Hadoop Lives

I was a bit concerned after the announcement that Microsoft would be taking over search development for Yahoo that Hadoop would suffer. But according to this, development will continue if not be enhanced.

I’ve been using Hadoop and HBase with ReadPath and have been thoroughly impressed with the results. Prior to testing it out, I had read many comments about HBase being usable for batch processing but not really up to the task of replacing a database for live requests.

As an initial test I’ve replaced ReadPath’s dictionary system, which had been running on Mysql, with HBase. This system is responding to hundreds of requests a second and returning in ~10mS which is on par with what Mysql was doing. The big wins are with scalability though. I’ve currently got a five server cluster with which I’m seeing a 100x increase in inserts/updates over Mysql with a much larger data set. As I convert other databases to use HBase and add these servers into the Hadoop cluster, performance should only increase.