Since I’m on the waiting list for Google’s AppEngine, all of the buzz got me thinking about how cloud computing could augment the projects that I’m currently working on. After doing some homework on what’s available out there, it actually appears that Amazon’s EC2 is better suited to my needs.
EC2 can provide quickly requisitioned servers for an incredibly low price. This makes several aspects of scaling out a web service much simpler. Now I’m trying to wrap my head around a new way of thinking. I’ve got a small cluster of computers that I would like to expand through EC2; the question is where to draw the line. What stays in my datacenter and what goes to Amazon’s?
You have to pay for moving data in and out of EC2, so you want to be careful about how different parts of the system interact. I currently have several aspects of ReadPath that are implemented as web services to make it easier to change the configuration of the underlying systems. These would be excellent candidates for moving to EC2, but I’m also considering moving all of the background computation and crawling into EC2 and keeping only the public web interface in my datacenter. For the web portion, the crawl data is read-only while the user data is read/write, so the content data could be split cleanly from the user data. The main risk is that the EC2 systems turn out not to be stable and cause data loss. We’ll just have to see.
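To make the "where to draw the line" question a bit more concrete, here is a rough Python sketch of how I might compare candidate splits by the traffic that crosses the EC2 boundary. Everything in it is an assumption for illustration: the component names, the monthly volumes, and especially the per-GB prices are made-up placeholders, not ReadPath measurements and not Amazon’s actual rates.

```python
from typing import Dict, List, Tuple

# (source, destination, GB per month). Volumes are invented for illustration.
Flow = Tuple[str, str, float]


def cross_boundary_traffic(placement: Dict[str, str],
                           flows: List[Flow]) -> Tuple[float, float]:
    """Return (gb_into_ec2, gb_out_of_ec2) for a given placement.

    A flow is only counted when exactly one of its endpoints is in EC2;
    traffic between two EC2 components, or between two components outside
    EC2, does not cross the boundary in this simple model.
    """
    gb_in = 0.0
    gb_out = 0.0
    for src, dst, gb in flows:
        src_in_ec2 = placement[src] == "ec2"
        dst_in_ec2 = placement[dst] == "ec2"
        if src_in_ec2 and not dst_in_ec2:
            gb_out += gb
        elif dst_in_ec2 and not src_in_ec2:
            gb_in += gb
    return gb_in, gb_out


def monthly_transfer_cost(placement: Dict[str, str], flows: List[Flow],
                          price_in_per_gb: float,
                          price_out_per_gb: float) -> float:
    """Estimate the monthly transfer bill; the prices are caller-supplied."""
    gb_in, gb_out = cross_boundary_traffic(placement, flows)
    return gb_in * price_in_per_gb + gb_out * price_out_per_gb


# Hypothetical flows: crawler pulls feeds from the internet, writes content,
# the web tier reads content and reads/writes user data.
FLOWS: List[Flow] = [
    ("internet", "crawler", 100.0),
    ("crawler", "content_store", 40.0),
    ("content_store", "web", 10.0),
    ("web", "user_db", 2.0),
]

# Candidate split: crawling and content in EC2, web and user data at home.
CRAWL_IN_EC2 = {
    "internet": "internet",
    "crawler": "ec2",
    "content_store": "ec2",
    "web": "datacenter",
    "user_db": "datacenter",
}

if __name__ == "__main__":
    # Placeholder prices only; substitute the current published rates.
    cost = monthly_transfer_cost(CRAWL_IN_EC2, FLOWS,
                                 price_in_per_gb=0.10, price_out_per_gb=0.20)
    print(f"Estimated monthly transfer cost: ${cost:.2f}")
```

Under this model, the read-only content flowing to the web tier is the only EC2-to-datacenter traffic in the proposed split, which is part of why separating content data from user data looks attractive. The model ignores instance hours, storage, and request charges.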