So since I was home alone this week, with Jaimie in Austin on business, I had the chance to catch up on some math studying. Spent the time looking at two topics.
First, while poking around at B&N I found a couple of magazines for “active” traders that went into strategies for playing the market: looking at variations in volume, volatility, and price in order to develop purchasing models. What was a bit of a shock to me, though, was that as scientifically as these strategies were treated, in the end they were just strategies. There is no knowing; there are only statistics. This treatment is shockingly similar to the one you might use in blackjack. In fact, the measurements of success were very similar: winning plays vs. losing plays. It’s just that it is much harder to definitively say what a winning strategy is, and how to quantify whether you’re successfully following it.
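The blackjack framing boils down to expected value: a strategy can lose most of its plays and still have an edge if the wins are big enough. A minimal sketch, with completely made-up numbers (the win rate and payoff figures below are hypothetical, not from any magazine):

```python
# Hypothetical strategy stats -- the point is the arithmetic, not the numbers.
win_rate = 0.45   # fraction of plays that win (made up)
avg_win = 1.8     # average gain per winning play, in some unit (made up)
avg_loss = 1.0    # average loss per losing play (made up)

# Expected value per play: this is the same bookkeeping a card counter does.
expected_value = win_rate * avg_win - (1 - win_rate) * avg_loss
print(round(expected_value, 2))  # positive edge despite losing most plays
```

The hard part the magazines gloss over is that in the market, unlike at the blackjack table, you never actually know `win_rate` or the payoffs; you can only estimate them from past plays.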
The second area that I looked at came from a pointer from Greg Gliden here. It’s a link to the class notes for a data mining class at Stanford, where I found the discussion on determining similarity and duplication in documents incredibly interesting. This is a very hard problem, and I’m sure the major search players all have their own special sauce when it comes to doing it well. Some of the proposed solutions deal with using k-grams of the documents to create signatures that can easily be compared. There were several different approaches to speed up the process and reduce computation requirements, but it definitely gives you a feel for what Google is up to with those hundreds of thousands of servers.
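To make the k-gram idea concrete, here’s a rough sketch of the approach as I understood it from the notes: shingle each document into overlapping character k-grams, measure overlap with Jaccard similarity, and then approximate that cheaply with a min-hash signature. This is my own toy version (the choice of k=5, MD5 as the hash, and 64 hash functions are all arbitrary assumptions), not the actual Stanford or Google implementation:

```python
import hashlib

def kgrams(text, k=5):
    """Slide a k-character window across the text to get its shingle set."""
    text = " ".join(text.lower().split())  # crude normalization of case/whitespace
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def jaccard(a, b):
    """Exact Jaccard similarity: |A intersect B| / |A union B|."""
    return len(a & b) / len(a | b)

def minhash_signature(shingles, num_hashes=64):
    """Keep the minimum hash per seeded hash function; the fraction of
    matching positions between two signatures estimates their Jaccard
    similarity without comparing the full shingle sets."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(hashlib.md5((str(seed) + s).encode()).digest()[:8], "big")
            for s in shingles))
    return sig

doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "the quick brown fox jumped over a lazy dog"
s1, s2 = kgrams(doc1), kgrams(doc2)
exact = jaccard(s1, s2)

sig1, sig2 = minhash_signature(s1), minhash_signature(s2)
estimate = sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)
print(round(exact, 2), round(estimate, 2))
```

The signatures are fixed-size no matter how long the documents are, which is presumably what makes this tractable at web scale: you can compare or index millions of 64-number sketches far more cheaply than millions of full documents.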