Really love the new Atlassian Bamboo

I’m not sure when exactly the feature came out, but the deployment projects in Bamboo are AWESOME. They make it really easy to continue the workflow from Task(Jira) -> Code(Stash) -> Build and Deploy(Bamboo). I’ve got a lot of different environments and it’s really easy to keep track of which code is where. Another great product from Atlassian.

Added A Github Page

Seems like you’ve got to have a GitHub page these days. Never mind that I’ve got tens of thousands of lines of code that I’d be happy to show to anyone that’s interested. It’s all living here. It’s just that at this point I’m just not willing to give all of that code away.

So just to make sure I’m not missing out on anything, I’ve posted some sample code that I can use as a reference.

Don’t forget to change your hdfs / mapred config when a drive fails with hadoop

Just ran into a little gotcha when running a huge job against my CDH4 cluster. One of the servers lost a drive at the 50% mark. Each server has 4 1TB drives mounted, so losing one isn’t a huge deal. With the new config “dfs.datanode.failed.volumes.tolerated” set to 2 it was possible for the datanode to keep right on going and not impact the larger job.

To get ready to replace the drive later, I unmounted the drive, leaving only the mount point dir. Then I made the mistake of bouncing the datanode so that I could start collecting ganglia stats, which are great by the way and really easy to set up.

Now the datanode determined that the mount point was back, nevermind that it was on the root device. So a day later, when the root device filled up and the tasks on that server started failing, I realized what I had done wrong.

If you’re going to temporarily take a drive out. Take it out of the config as well or else you’re going to forget about it and get yourself into trouble.

Leap Second Issues

There was so much hype with Y2K, but it turns out that it’s a leap second that takes out portions of the web. I had my Amazon EC2 instances taken out with this bug. This little code snippet brought the java cpu load back to normal.

/etc/init.d/ntpd stop; date; date `date +”%m%d%H%M%C%y.%S”`; date

Then you just need to restart ntpd. Some have reported having to wait awhile to restart ntpd so that the issue doesn’t happen again.