Google Email Updates Update

Looks like there’s been an update to the Google Email Update application. I’m now getting about 20x the volume for the set of terms that I track. The system seems to be much more responsive to actual changes on the web; in the past I would genuinely question whether the system was working at all. I’m curious whether this change has anything to do with updates to, or integration with, Google’s Blog Search.

Google and YouTube, a marriage of necessity

It would appear that Google paid a huge price for a small company that doesn’t make any money. It makes you wonder what’s going on here.

There were some hints that YouTube had technology coming out for detecting copyrighted works, but I’m sure that alone isn’t worth $1.6B.

YouTube has a huge user base, but that’s because it’s the easiest service to use right now, which is no guarantee of future performance.

Google purchasing YouTube doesn’t make any sense unless you compare it to the scenario where a competitor purchases it instead. Google already has its own video product, but it’s nowhere near as easy to use as YouTube’s. Google is not paying $1.6B for the extra video capacity, but to keep YouTube out of the hands of Yahoo, Microsoft, and AOL. It probably would have been cheaper for Google if YouTube had gone bust and simply disappeared, but they couldn’t take that chance once it became apparent that someone was going to pony up money for those users.

Installing Fedora Core 5

I’ve upgraded my laptop to FC5. It only took a couple of tries to get it to go through. The main issue is with CD handling: once you’ve picked your packages, there’s no way to adjust if there’s a problem with a CD.

My initial install was going to be an absolute bare-bones install, since I only wanted to download the first CD. It turns out that even with all of the packages unselected, you still need at least the second CD with FC5. It wasn’t this way with FC4, but oh well.

Downloaded the second CD and got the laptop installed. The plan was then to install everything else I needed over the network. Usually this is really easy: just yum install kdebase, and that pulls KDE and X onto the system and away you go. This time, though, it installed KDE but didn’t mark Xorg as a dependency. I went back to install Xorg, but it was always missing something, which is exactly the problem RPM dependencies were designed to fix. It was late, so the plan was to start over in the morning.

For the next try, I downloaded all of the CDs and just did a full install from disc. Of course there was an issue mounting CD 3; the install was no good, there was no way to recover, and I had to start over again. Finally, on the third time through, all of the packages installed and everything was good to go.

Mandrake had a better way of handling CDs: if there was an issue at some point during the install, you could just tell it to keep going and skip that package. I wish Fedora would add this, since when you’re picking packages you have no idea which CDs you’ll need, and if you pick a package on a CD you don’t have, there’s no way to adjust other than starting the complete install over.

Next project: get the built-in wifi working.

Update:

Just had to get the ipw2200 firmware installed and reboot, and we’re off and running.

Webapp or Thick Client?

This seems to be one of the leading questions in software development right now: whether to go with the classic style of a thick client that is downloaded onto the customer’s computer, or with a webapp that is hosted centrally.

With the launch of Google’s Docs and Spreadsheets, the competing methodologies are laid out clearly. Microsoft has Office, the epitome of fat clients, while Google has its webapps. These aren’t just different application styles, but entirely different philosophies. Microsoft offers every feature a document-writing user could ever want; you could spend years digging into every little nook and cranny in Office and still not use everything it has to offer. Meanwhile, Google is taking a tack that has been pushed by a new breed of designers like 37Signals: offer just what most people need to get most of their work done, and focus on making that subset as strong as possible. By keeping things simple you actually create a more usable product.

The reality is that there’s room for both philosophies, as users have different needs. What will be interesting to see is whether Google can accomplish its goal with the webapp. Web-based applications have a lot of advantages over fat clients: maintenance, support, and upgrades are all much simpler when you’re working with a centralized application. The problem has always been how much you can make a browser do, and whether the browser will behave. IE, Firefox, and Safari all behave very differently in certain situations, and writing an application that works on all of these platforms can be difficult.

One thing that Google has going for it is that there’s a whole lot of work and excitement happening in browser-based apps right now, while there just isn’t anything new happening with fat clients. The network is a more important part of the computer every day, and available bandwidth has become the limiting factor now that the cheapest entry-level computer purchased at Walmart can handle most any user’s tasks.

Another advantage is that as the browser becomes a more standard way to run applications, bits of functionality can be developed to be shared across different applications. The whole mashup idea is still a bit ahead of its time, as early implementations were based on APIs that weren’t really meant to be used together. Companies were exposing bits of functionality without really thinking things through, and others were building applications on those APIs without concern for them being revoked at some point in the future. The relationships were often to only one party’s advantage; everyone was just experimenting to try it out. Now that the idea has settled down a bit and business models can be built around the APIs, they’re maturing to the point where they can be safely used.

The product that I’m currently working on, YOUnite, actually splits the difference between the two philosophies. We’ve created what we call a Webtop application: a downloaded client that runs an embedded application server on the local desktop, and the user interacts with the application through their browser. There were several technical reasons to go this route, but it remains to be seen whether we’re ahead of the game with this type of application or creating more trouble than it’s worth.
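
As a rough sketch of the Webtop shape (this is not YOUnite’s actual stack; the JDK’s built-in HttpServer just stands in for the embedded application server):

    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;

    public class WebtopSketch {
        public static void main(String[] args) throws Exception {
            // Bind to localhost only: the "webapp" is private to this machine,
            // which is what makes it feel like a desktop app rather than a hosted one.
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 8080), 0);
            server.createContext("/", exchange -> {
                byte[] page = "<html><body>Local webtop UI goes here</body></html>".getBytes();
                exchange.sendResponseHeaders(200, page.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(page);
                }
            });
            server.start();
            System.out.println("Point your browser at http://localhost:8080/");
        }
    }

Binding to 127.0.0.1 is the key design point: the user gets a browser UI, but maintenance, data, and execution all stay on their own machine.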

More Personalization

To add a few more thoughts on personalization, I think it’s going to be really interesting to see how the different algorithms shake out. All sorts of approaches have been taken, with a lot of false starts. The focus is primarily centered on search: trying to figure out what a user is really looking for.

In the area of personalization you’re faced with the problem of too much data and too little data at the same time. In terms of the actual queries that people use, there is usually way too little to work from. The search query is where the user tells you exactly what they want at that moment, and having too little data here is difficult to get around. The idea of natural-language queries is based on the assumption that if systems could answer actual questions, then people would ask actual questions. Even if the user asked a real question, though, the ambiguity problem doesn’t go away completely, but you do get more data to work with.

A more realistic method is query refinement, using tools like grouping and classification to let a user narrow in on the area of the results they’re interested in. Automatic classification still has its issues, though, as it’s a hard thing for a computer to get right. Personally, I think there’s still a long way to go with query refinement; creating tools that are intuitive and work the way a user expects would give people a whole lot more power.
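
As a rough illustration of the grouping side (the results and category labels here are made up), bucketing results by an automatically assigned category gives the user a simple handle to refine with:

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class RefineByCategory {
        public static void main(String[] args) {
            // Hypothetical results for the query "jaguar", tagged by an automatic classifier.
            Map<String, String> results = new LinkedHashMap<>();
            results.put("Jaguar XK road test", "cars");
            results.put("Jaguar habitat and range", "animals");
            results.put("Jaguar XJ buyer's guide", "cars");

            // Group titles by category; each group becomes a refinement the user can pick.
            Map<String, List<String>> byCategory = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : results.entrySet()) {
                byCategory.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
            }

            // Refining is then just narrowing the view to one group.
            System.out.println(byCategory.get("cars"));
        }
    }

The hard part, of course, is the classifier that assigns the categories, not the grouping itself.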

With the user’s query, there’s only so much you can extract. However, there is a whole load of data that can be used in conjunction with the query: all of the user’s past actions can come into play in determining what the user is currently looking for. In this area there is often too much data. A user can have months, if not years, of activity that can be used to build a profile. The issue then becomes which aspects of the profile are relevant to what the user is currently looking for. Past events can even work against what the user is looking for right now.

This is the area where I think the next big steps in personalization algorithms are going to come: systems learning to determine what is currently relevant and what isn’t, how long a data point should be stored and used, and at what point it becomes stale and needs to be discarded. The system also needs to determine which aspects of the profile are most important. Is viewing certain pages important, or is it combinations of pages, or combinations of other data points? There’s so much to look at that there’s definitely hope that some aspects of a user’s profile will be useful.
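
As a minimal sketch of the staleness idea (the 30-day half-life is an assumption, not a number from any real system), exponentially decaying each data point by its age keeps recent activity dominant and gives you a natural cutoff for discarding old points:

    public class ProfileDecay {
        // Assumed half-life: a profile data point loses half its influence every 30 days.
        static final double HALF_LIFE_DAYS = 30.0;

        static double weight(double ageDays) {
            return Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
        }

        public static void main(String[] args) {
            System.out.printf("1 day old:    %.3f%n", weight(1));   // recent, near full weight
            System.out.printf("90 days old:  %.3f%n", weight(90));  // 0.125
            System.out.printf("365 days old: %.5f%n", weight(365)); // effectively stale, safe to discard
        }
    }
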

Eclipse Issues

Today my Linux ThinkPad gave up the ghost. Eclipse just decided that it didn’t want to run anymore, continually crashing, usually during file access, to the point where it wouldn’t run for more than 5-15 minutes. That just isn’t acceptable for a work computer, which led me to spend the day digging through the Eclipse bug reports to see if I could figure out what was going on.

It turns out that it was a VM crash in native code. What was initially a bit disappointing was that there were several bug reports describing the exact problem in the Eclipse bug system, but once it was determined to be a JVM problem and not an Eclipse problem, all discussion stopped and the bugs were closed. That was definitely disappointing, since a pointer in the right direction would have been really helpful. I do understand that with a project the size of Eclipse you need to stay focused, but some of the comments were just a bit harsh.

While the stack traces let me know the problem was in the JVM’s native code, they weren’t any more helpful in tracking the issue down further. It appears the source of the problem is actually a library in the Linux install, because I tried installing several different JDKs and all of them had the same exact problem, and clean installs had no effect on it at all. So I’ve taken this as a sign that the upgrade I’d been planning for some time was finally due.

Another disconcerting issue I found while digging through the Eclipse bug reports is that Eclipse can hit OutOfMemoryErrors from running out of permanent generation (PermGen) space, due to all of the classes loaded through the Eclipse plugin system. I’ve found that the Callisto release has a whole slew of useful plugins, but what I hadn’t realized was that a default install from Callisto brought my Eclipse to 99.8% usage of PermGen space right after startup. Anything in addition would put the system over the edge. It appears you can address the problem by putting a fix like “-XX:PermSize=256m -XX:MaxPermSize=256m” in your eclipse.ini file.
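
For reference, a minimal sketch of where those flags go (the heap sizes here are just placeholder examples): everything after the -vmargs line in eclipse.ini is passed straight through to the JVM, so the PermGen flags have to come after it.

    -showsplash
    org.eclipse.platform
    -vmargs
    -Xms256m
    -Xmx512m
    -XX:PermSize=256m
    -XX:MaxPermSize=256m
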

This issue really highlights the inefficiency of Java classloaders under a plugin system, as well as the lack of standards for JVM startup parameters. With a full Eclipse IDE you really need at least 1GB of memory just to get going; anything less and you’re just not going to get the performance you need. The other issue is that Eclipse has a really tough time managing something as simple as PermGen size, because Java is designed not to care what’s running underneath, and if there is no standard way to set these parameters then you can’t address the problem in an elegant way.

Personalization

Netflix has an interesting little contest going on. The goal is to improve on their Cinematch software for making movie recommendations by 10%, and the prize for winning the contest is $1M.
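
The improvement is measured by root mean squared error (RMSE) on predicted ratings, so beating Cinematch by 10% means producing predictions with a 10% lower RMSE. A minimal sketch of the metric (the ratings here are made up):

    public class Rmse {
        // Root mean squared error: the metric the contest scores predictions by.
        static double rmse(double[] predicted, double[] actual) {
            double sumSq = 0;
            for (int i = 0; i < predicted.length; i++) {
                double err = predicted[i] - actual[i];
                sumSq += err * err;
            }
            return Math.sqrt(sumSq / predicted.length);
        }

        public static void main(String[] args) {
            // Made-up 1-5 star ratings: a system's predictions vs. actual user ratings.
            double[] predicted = {3.8, 2.1, 4.5, 3.0};
            double[] actual    = {4.0, 2.0, 5.0, 3.0};
            System.out.printf("RMSE: %.4f%n", rmse(predicted, actual));
        }
    }
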

I’ve been working on personalization software for a while now with my CloudGrove personal search engine. Now I’m looking forward to taking a look at the Netflix setup to play around with some more data.