More Personalization

To add a few more thoughts on personalization, I think it’s going to be really interesting to see how the different algorithms shake out. There have been all sorts of approaches taken and a lot of false starts. Most of the focus has been on search: trying to figure out what a user is really looking for.

With personalization you’re faced with the problem of too much data and too little data at the same time. In terms of the actual queries that people use, there is usually way too little to work with. The search query is where the user tells you exactly what they want at that moment, so having too little data here is difficult to get around. The idea of natural language queries is based on the assumption that if systems could answer actual questions, then people would ask actual questions. Even if the user asks a real question, though, the problem of ambiguity doesn’t go away completely, but you do get more data to work with.

A more realistic method involves query refinement, using tools like grouping and classification to let a user narrow the results down to the area they’re interested in. Automatic classification still has its issues, though, as it’s a hard thing for a computer to get right. I personally think there’s still a long way to go with query refinement, and creating tools that are intuitive and work the way a user expects would give users a whole lot more power.
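To make the grouping idea concrete, here is a rough sketch of what a refinement step could look like. It assumes each result already carries a category label, which is exactly the part that automatic classification struggles with. The function names and the sample "jaguar" results are made up for illustration.

```python
from collections import defaultdict


def group_by_category(results):
    """Group (title, category) pairs so the user can see which areas matched."""
    groups = defaultdict(list)
    for title, category in results:
        groups[category].append(title)
    return dict(groups)


def refine(results, category):
    """Keep only the titles in the category the user picked."""
    return [title for title, cat in results if cat == category]


# Hypothetical results for the ambiguous query "jaguar".
results = [
    ("Jaguar XF review", "cars"),
    ("Jaguar habitat and diet", "animals"),
    ("Jaguar F-Type price", "cars"),
]

groups = group_by_category(results)     # shown to the user as refinement choices
animals_only = refine(results, "animals")
```

The query alone can’t tell you whether the user means the car or the cat, but one click on a group can.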

With the user’s query, there’s only so much that you can extract. However, there is a whole load of data that can be used in conjunction with the query. All of the user’s past actions can come into play in determining what the user is currently looking for. In this area there is often too much data: a user can have months, if not years, of activity that can be used to create a profile. The issue then becomes which aspects of the profile are relevant in determining what the user is currently looking for. It can often happen that past activity actually works against what the user is currently looking for.

This is the area where I think the next big steps in personalization algorithms are going to come, as systems learn to determine what is currently relevant and what isn’t. They also need to work out how long a data point should be stored and used, and at what point it becomes stale and needs to be discarded. The system also needs to determine which aspects of the profile are most important. Is viewing certain pages important, or is it combinations of pages, or combinations of other data points? There’s so much to look at that there is definitely hope that some aspects of a user’s profile will be useful.
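One simple way to think about staleness is to let each data point lose weight as it ages and to drop it entirely past a cutoff. The sketch below uses an exponential half-life for that. The 30-day half-life, the 365-day cutoff, and the idea of reducing a profile to (topic, day) events are all assumptions I picked for illustration, not values any real system is known to use.

```python
def decayed_weight(age_days, half_life_days=30.0):
    """Weight of a data point that is age_days old; halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def score_profile(events, now_day, half_life_days=30.0, max_age_days=365.0):
    """Sum decayed weights per topic, discarding events that are too stale.

    events is a list of (topic, day) pairs, where day is a plain day number.
    """
    scores = {}
    for topic, day in events:
        age = now_day - day
        if age < 0 or age > max_age_days:
            continue  # in the future or too old to be useful: discard
        weight = decayed_weight(age, half_life_days)
        scores[topic] = scores.get(topic, 0.0) + weight
    return scores


# Hypothetical activity: two recent "cars" views and one very old "cooking" view.
events = [("cars", 100), ("cars", 70), ("cooking", -300)]
scores = score_profile(events, now_day=100)
```

With these made-up numbers, the view from today counts fully, the one from 30 days ago counts half, and the 400-day-old one is thrown away. Picking the right half-life and cutoff is the hard part, and it probably differs for each kind of data point.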