To Partition or Archive?

I’m getting close to having to make a decision with Cloudgrove on what to do about the main content table. Cloudgrove already has 5M posts and is growing rather quickly. Currently the table indices still fit within memory on the DB server, but at some point in the near future that will no longer be the case. When that happens, performance is going to start to suffer dramatically.

The two main ways to handle this are to either partition the table with the possibility of moving some of the partitions off to other DB servers or to setup an archive system to move older content off to another system. The access pattern to this content strongly favors reading content less than 2 weeks old. So having a small primary table with the latest content and a larger archive table with the rest of the content could work. The more standard system though would be to partition the table into say 10 sub tables. If I went this route I could use something like Hibernate Shards for data access and have to change very little code.

An advantage of using partitions is that the backend system is hidden from the application. So if access patterns change in the future there would be minimal impact. With an archive system, if the older content became used it could cause problems.

I’ll have to do some experiments with Hibernate Shards to see if it’s ready for production use and will do what I need it to.