Added A GitHub Page

Seems like you’ve got to have a GitHub page these days. Never mind that I’ve got tens of thousands of lines of code that I’d be happy to show to anyone who’s interested. It’s all living here. At this point I’m just not willing to give all of that code away.

So just to make sure I’m not missing out on anything, I’ve posted some sample code that I can use as a reference.

Don’t forget to change your HDFS / MapReduce config when a drive fails with Hadoop

Just ran into a little gotcha when running a huge job against my CDH4 cluster. One of the servers lost a drive at the 50% mark. Each server has four 1TB drives mounted, so losing one isn’t a huge deal. With the new config “dfs.datanode.failed.volumes.tolerated” set to 2, the datanode was able to keep right on going without impacting the larger job.

To get ready to replace the drive later, I unmounted the drive, leaving only the mount point dir. Then I made the mistake of bouncing the datanode so that I could start collecting Ganglia stats, which are great by the way and really easy to set up.

On restart, the datanode found the empty mount point directory and decided the volume was back, never mind that the directory now lived on the root device. So a day later, when the root device filled up and the tasks on that server started failing, I realized what I had done wrong.

If you’re going to temporarily take a drive out, take it out of the config as well, or else you’re going to forget about it and get yourself into trouble.
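
A quick check before bouncing the datanode would have caught this. The sketch below is only an illustration: it assumes the `mountpoint` utility from util-linux is installed, and the function name and the /data/N paths in the comment are hypothetical, so substitute whatever directories your dfs and mapred configs actually list.

```shell
#!/bin/sh
# Report every directory in the argument list that is NOT a mount point,
# i.e. a bare directory whose contents would land on the parent filesystem.
# Returns non-zero if at least one such directory is found.
check_data_dirs() {
    status=0
    for dir in "$@"; do
        if ! mountpoint -q "$dir"; then
            echo "NOT MOUNTED: $dir"
            status=1
        fi
    done
    return $status
}

# Hypothetical usage before restarting the datanode:
#   check_data_dirs /data/1 /data/2 /data/3 /data/4 || echo "fix the config first"
```

Any directory it reports should either be remounted or removed from the config before the datanode comes back up.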

Leap Second Issues

There was so much hype with Y2K, but it turns out that it’s a leap second that takes out portions of the web. I had my Amazon EC2 instances taken out with this bug. This little code snippet brought the Java CPU load back to normal.

/etc/init.d/ntpd stop; date; date `date +"%m%d%H%M%C%y.%S"`; date

Then you just need to restart ntpd. Some have reported having to wait a while before restarting ntpd so that the issue doesn’t happen again.
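
If the format string in that one-liner looks cryptic: it prints the current time as MMDDhhmmCCYY.ss, which is the form `date` accepts when setting the clock, so the command simply sets the clock to what it already is. Here is a harmless dry run of that step (the function name is mine, purely for illustration) that only prints the command instead of running it, since actually setting the clock needs root.

```shell
#!/bin/sh
# Build the MMDDhhmmCCYY.ss string that `date` accepts for setting the clock.
now_as_set_string() {
    date +"%m%d%H%M%C%y.%S"
}

# Dry run: show the command that would be executed, without touching the clock.
echo "would run: date $(now_as_set_string)"
```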

iPad Kindle Update

This is just a quick post for any of the Amazon iPad Kindle Reader developers out there who might be listening. As someone who has purchased and read hundreds of Kindle books, I think the latest update is a step backwards.

Instapaper has better reading options, and that’s done by a single developer. The margins change is really bugging me. I realize that you just can’t pick a single setting that is going to make 100% of the people happy, so there should just be a slider for margins. Then everyone can choose what they’re comfortable with. A slider for brightness and a slider for font size. Then you could let people choose their favorite font and you’d have the perfect reader.

I know that some of these changes are difficult. How would you calculate page numbers with infinite variations in the text? But making difficult things look easy is why Amazon is great.

My Representatives

With MyElectedRep, I get a view into the people who are representing me, and into whether I agree with the votes that they’re making on my behalf or would like them to vote another way.

To help me make decisions on votes, I can look at how some trusted organizations recommend I vote and read the analysis that they’ve provided.

With each Representative, I can vote on upcoming legislation, so that they can determine how the district feels. I can also go through their past votes and either agree or disagree with the votes that they’ve made. If I feel strongly about certain votes, I can contact my Representatives directly to let them know how I feel.

Once I’ve gone through the different bills and voted on the ones that I wanted to, I can look at how each Representative scores. They get a district score as well as a personal score. I can then use this score to determine if I should vote for this person to be my Representative again next time or if someone else would do a better job.

My Representatives are:

Money in Politics

This article on CNN today expresses a lot of the reasoning behind why I created MyElectedRep.

With 3 of the 4 Republican candidates for President being heavily sponsored by single donors with $100+M in assets, are you feeling more or less confident that after the next election cycle our elected officials will represent us and not the super wealthy who paid for their campaigns?

The only way we can keep our Representatives accountable is if we vote and show them how we want to be represented.

We need a place to point to, where we can show elected officials that they’re not doing their job. A place where we can show that on a certain item a district wants to vote one way. Then we can score our Representatives on whether or not they listened.

By measuring the votes that each Representative makes, it becomes less important to control the money that they receive.

Android isn’t open source

There’s been a lot of discussion over documents coming out about Google’s strategy with Android. Google wants to use a carrot-and-stick strategy with Android to try to maintain control of the platform. This involves giving hardware manufacturers that behave early access to new code. It also means that the code is developed in private and only released after the fact.

I don’t care what Google or others say, this isn’t open source. This is published source. There is no way to see bugs, contribute a patch, or take part in discussions on development. The only saving grace is that Google is publishing their code with a fairly unrestrictive license.

What this situation is screaming for is someone who actually wants to behave in an open source manner to come along and fork the code, then allow developers to contribute to that branch and let Google go it on their own. My hope is that Amazon will do exactly that with their fork of the Android code base.

Getting EHCache to shut down cleanly in a webapp

I was seeing an annoying issue with a webapp having trouble shutting down cleanly. I would issue the Tomcat stop command and most everything would shut down, but I’d get this error:

Exception in thread "Multicast Heartbeat Receiver Thread" java.lang.NullPointerException
 at org.slf4j.impl.Log4jLoggerAdapter.error(
 at net.sf.ehcache.distribution.MulticastKeepaliveHeartbeatReceiver$

After the error, the Multicast Heartbeat Receiver would not shut down and would just hang. Eventually you would have to kill off the Java process manually.

A quick search turned up this page. After adding a small snippet to web.xml that registers a shutdown listener, everything shuts down cleanly now.
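
For reference, a listener registration of this kind typically looks like the fragment below. Treat it as a sketch: it assumes the listener in question is the `net.sf.ehcache.constructs.web.ShutdownListener` class that ships with Ehcache, so check that the class is present in the Ehcache version on your classpath.

```xml
<!-- Inside <web-app> in WEB-INF/web.xml -->
<listener>
  <!-- Assumed: Ehcache's bundled listener, which shuts down the CacheManager
       when the servlet context is destroyed -->
  <listener-class>net.sf.ehcache.constructs.web.ShutdownListener</listener-class>
</listener>
```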


My Own Search Engine

I guess every software engineer should be writing their own competitors to Google and Facebook in their garage (in a future post I’ll include pics of our garage data center). Because of issues I saw with Facebook, I created ReadPath, a social network with more of a focus on privacy and news sharing. It’s currently about 70% done. The UI still needs lots of tweaks and there are some features that need to be completed.

One nice bonus of running ReadPath is that it is constantly spidering content from RSS feeds for the news reader. The other day I realized that I’ve now stored a full billion content items going back several years. Of course, with that much content I had to create a search engine to mine it. So I created MiniSearch to play with the different concepts involved in running a general search engine. A lot of things turned out to be a lot harder than expected.

Currently the index is in the process of being built and only includes 20% of available content. There is also a lot of work to be done with ranking still. I’ll post again when I think it’s in a more usable state.

Moving back to Firefox

With the release of Firefox 4.0 today, I’ve switched my default browser back to Firefox. Chrome had taken over for a while because it was cleaner and faster. The latest Firefox seems to be just as fast now and I prefer having access to Firebug when I need it without having to open another browser.

Great Job Mozilla guys 🙂

Virtualizing Mission Critical Applications

Jaimie has organized a webinar to discuss what it takes to manage a large scale virtualization project.

One of the speakers, Mr. Brodhun, is uniquely qualified on this subject having previously served as Technical Director for Enterprise Standards and Technologies for the United States Marine Corps, where he oversaw the deployment of approximately 2,300 ESX hosts and nearly 7,000 virtual machines across 167 sites.

Regardless of the size of your virtualization project, you’ll learn how to maximize uptime and performance of mission critical applications, while eliminating hidden costs that can decrease virtualization ROI by upwards of 50%.

Click here to register for this webinar.