I just noticed that the last time that I actually published a post on my blog was almost exactly a year ago. Something about having time at the in-laws while I’m waiting for the 2yr old to fall asleep gets me thinking about writing again. Going to try and publish a bit more often this year.
Programming hadoop, kerberos
Kerberos is one of those items that I can’t imagine anyone ever truly enjoys. It’s a necessary evil if you want to have a secure Hadoop cluster setup because without it the permissions checks are trivially easy to sidestep.
When an application in this type of setup runs into an issue, the first thing you’ll want to do is turn on debug logging. If you start your application with the parameter:
Then you’ll get a whole lot of debug output on stdout. Issues often come up with ticket renewal, and in most setups ticket renewal happens every 24 hours, so it helps to redirect the console log to a file that you can go over later. An example of how you can start an application to accomplish this is:
nohup ./bin/app-start >> logs/app.log 2>&1 &
This will start your application in the background and redirect both stdout and stderr to the app.log file. It also appends to the log instead of overwriting it, so you won’t lose the log on restarts.
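The two steps above, enabling debug output and appending the console log to a file, can be combined in a small launcher function. This is only a sketch under two assumptions that the post does not confirm: that the debug parameter is the JVM's Kerberos debug property `-Dsun.security.krb5.debug=true`, and that `./bin/app-start` passes `JAVA_OPTS` through to the JVM. The function name `start_with_krb_debug` is made up for illustration.

```shell
# Start a command in the background with Kerberos debug output enabled,
# appending stdout and stderr to a log file so restarts do not wipe it.
# Assumption: the launcher hands $JAVA_OPTS to the JVM.
# Usage: start_with_krb_debug <log file> <command> [args...]
start_with_krb_debug() {
    krb_log_file=$1
    shift
    # Create the log directory if it is missing.
    mkdir -p "$(dirname "$krb_log_file")"
    # Keep any JAVA_OPTS already set and add the (assumed) debug property.
    JAVA_OPTS="${JAVA_OPTS:+$JAVA_OPTS }-Dsun.security.krb5.debug=true" \
        nohup "$@" >> "$krb_log_file" 2>&1 &
}

# Example, using the paths from the post:
#   start_with_krb_debug logs/app.log ./bin/app-start
```

Because the log is opened with `>>`, output from every restart lands in the same file, which is what you want when the problem only shows up at the next ticket renewal.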
When developing applications that communicate with a secure Hadoop cluster, I’ve found it helpful to change the default ticket renewal time to 10 minutes. Go lower than this and you’ll start to see unrelated errors, but a 10 minute renewal makes it much easier to verify that ticket renewal is happening properly.
I’ve really wanted to get my hands on a Linux laptop again. I used to work on one all the time, but most of my work these days is on a MacBook Pro. I’ve got a great ThinkPad sitting at home, but it’s running Windows 7 because there are a couple of applications that I need that require Windows.
Over the holiday break, I started playing with Amazon WorkSpaces. For $35 / month I can have a Windows 7 VDI setup. Along with WorkSpaces, I’ve started testing Zocalo, Amazon’s file syncing solution.
So far, I’m really happy with the solution. It seems to work really well. Once I’ve got everything up and running in the workspace I’ll be able to reinstall Linux on the ThinkPad and have the best of both worlds.
Programming encryption, security
Now that security on the net is becoming more of a concern, here are two websites for testing the security of your web connections. SSL/TLS is really a negotiation between client and server, so you need to see what settings each endpoint will accept.
For the server side, there is SSL Labs. With this site you can put any URL in and get a score for how well that server’s settings are configured.
For the client side, there is How’s My SSL. This will give you a readout on the browser that you’re currently using.
As an engineer at a storage company, I’m often working to characterize how different drives will perform in an environment. As you go down the stack from application to OS to hardware, there are a lot of different factors that come into play. It’s amazing to see what types of differences in performance you’ll see with varying drives and workloads.
Here are some example results from a test bed looking at a single Seagate 15K 600GB drive connected to an LSI 2008 HBA in a CentOS 6.5 machine.
The Random Read tests use fio to run 100% random reads of the specified block size and queue depth against the entire raw drive. There is no filesystem caching in these tests. The Random Write test is the same, but with 100% writes. The Mixed tests are 65% reads and 35% writes.
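The three workloads described above map fairly directly onto fio options. The helper below is a rough sketch, not the actual test harness: the function name `fio_args`, the `libaio` engine, the 60 second runtime, and the `/dev/sdX` device path are all assumptions, and the block sizes and queue depths in the usage line are placeholders rather than the values that were tested.

```shell
# Print fio arguments for one test case against a raw device.
# Usage: fio_args <read|write|mixed> <block size> <queue depth> <device>
fio_args() {
    case $1 in
        read)  fio_mode="--rw=randread" ;;                 # 100% random reads
        write) fio_mode="--rw=randwrite" ;;                # 100% random writes
        mixed) fio_mode="--rw=randrw --rwmixread=65" ;;    # 65% reads, 35% writes
        *)     echo "unknown workload: $1" >&2; return 1 ;;
    esac
    # --direct=1 bypasses the page cache, so no filesystem caching is involved.
    # The I/O engine and runtime are assumed values, not taken from the post.
    printf '%s %s\n' \
        "--name=$1-$2-qd$3 --filename=$4 $fio_mode --bs=$2 --iodepth=$3" \
        "--direct=1 --ioengine=libaio --runtime=60 --time_based"
}

# Example (placeholder device; write and mixed runs destroy data on it):
#   fio $(fio_args mixed 4k 32 /dev/sdX)
```

Printing the arguments instead of running fio directly makes it easy to review each case before pointing it at a real drive.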
I picked up another bitcoin this week. I had originally wanted to buy when they were worth < $100. Now they’re bumping up against $1000. I’m not too worried about them gaining or losing value; I just really wanted to see how they work.
The hardest part for normal users has been converting dollars into bitcoin. They’re just not the easiest thing to purchase. You’ve got to do a bank transfer to someone that can handle it for you, and trusting some party on the internet with any bank details whatsoever is always a bit of a leap of faith. I set up an account on Coinbase, which keeps the process fairly simple. The only downside is that it can take almost a full week for your purchase to go through.
Once you have a few bitcoins in your wallet, it becomes clear how revolutionary the system is. You can quickly bounce it around different wallets, in increments worth fractions of a penny. Bitcoin or a system like it is definitely going to take off. There are just too many advantages for it not to.
One last note, Jaimie just found a very good / simple explanation of bitcoin here.
I really don’t understand the decision in the Mac Settings to link together the scroll direction of the trackpad and mouse. There are two different check boxes for whether you want natural scroll direction or not for each of these inputs.
On the trackpad, I definitely want natural scroll direction. However, the so-called “natural” direction on the mouse is the opposite of every other computer that I use. Since there are two check boxes, one for each of these settings, I would REALLY like to be able to have them set opposite of each other. The problem is that if you check one, it checks the other and vice versa.
Otherwise the MacBook Pro is the state of the art when it comes to laptops, especially when paired with a Thunderbolt Display in the office.
I’m not sure when exactly the feature came out, but the deployment projects in Bamboo are AWESOME. They make it really easy to continue the workflow from Task(Jira) -> Code(Stash) -> Build and Deploy(Bamboo). I’ve got a lot of different environments and it’s really easy to keep track of which code is where. Another great product from Atlassian.
Seems like you’ve got to have a GitHub page these days. Never mind that I’ve got tens of thousands of lines of code that I’d be happy to show to anyone that’s interested. It’s all living here. It’s just that at this point I’m just not willing to give all of that code away.
So just to make sure I’m not missing out on anything, I’ve posted some sample code that I can use as a reference.
Programming CDH4, hadoop
Just ran into a little gotcha when running a huge job against my CDH4 cluster. One of the servers lost a drive at the 50% mark. Each server has four 1TB drives mounted, so losing one isn’t a huge deal. With the new config “dfs.datanode.failed.volumes.tolerated” set to 2, the datanode was able to keep right on going without impacting the larger job.
To get ready to replace the drive later, I unmounted the drive, leaving only the mount point dir. Then I made the mistake of bouncing the datanode so that I could start collecting ganglia stats, which are great by the way and really easy to set up.
When the datanode came back up, it found the mount point directory and treated it as a usable volume again, never mind that the directory now lived on the root device. So a day later, when the root device filled up and the tasks on that server started failing, I realized what I had done wrong.
If you’re going to temporarily take a drive out, take it out of the config as well, or else you’re going to forget about it and get yourself into trouble.
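One way to catch this before restarting a datanode is to check that every configured data directory is really a mount point and not just an empty directory on the root device. The sketch below assumes the `mountpoint` utility from util-linux is installed. The function name `unmounted_data_dirs` and the `/data/N` paths are hypothetical; substitute the directories from your own `dfs.datanode.data.dir` setting.

```shell
# Print every given directory that is NOT a mount point, i.e. one whose
# writes would land on the parent (often the root) filesystem.
# Returns non-zero if at least one such directory is found.
unmounted_data_dirs() {
    umd_status=0
    for umd_dir in "$@"; do
        # mountpoint -q exits 0 only when the directory is a mount point.
        if ! mountpoint -q "$umd_dir"; then
            echo "$umd_dir"
            umd_status=1
        fi
    done
    return $umd_status
}

# Example with hypothetical data directories:
#   unmounted_data_dirs /data/1 /data/2 /data/3 /data/4 \
#       || echo "remove these from the datanode config before restarting"
```

Running a check like this from the datanode's start script turns a mistake that takes a day to surface into one that shows up right away.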
There was so much hype with Y2K, but it turns out that it’s a leap second that takes out portions of the web. I had my Amazon EC2 instances taken out by this bug. This little code snippet brought the Java CPU load back to normal.
/etc/init.d/ntpd stop; date; date `date +"%m%d%H%M%C%y.%S"`; date
Then you just need to restart ntpd. Some have reported having to wait a while before restarting ntpd so that the issue doesn’t happen again.
Personal ipad, kindle
This is just a quick post for any of the Amazon iPad Kindle Reader developers out there that might be listening. As someone that has purchased and read hundreds of kindle books, the latest update is a step backwards.
Instapaper has better reading options and that’s done by a single developer. The margins change is really bugging me. I realize that you just can’t pick a single setting that is going to make 100% of the people happy, so there should just be a slider for margins. Then everyone can choose what they’re comfortable with. A slider for brightness and a slider for font size. Then you could let people choose their favorite font and you’d have the perfect reader.
I know that some of these changes are difficult. How would you calculate page numbers with infinite variations in the text? But making difficult things look easy is why Amazon is great.
I’ve noticed more and more lately, while reading the news folder in my RSS reader, that the stories all seem to revolve around humans killing themselves and others in the craziest and most horrific ways. These stories seem to account for at least half of the headlines. Is that all that the news can deliver these days?
Politics MyElectedRep, Politics
With MyElectedRep, I get a view into the people that are representing me. I can see whether I agree with the votes that they’re making on my behalf or whether I would like them to vote another way.
To help me make decisions on votes, I can look at how some trusted organizations recommend I vote and read the analysis that they’ve provided.
With each Representative, I can vote on upcoming legislation, so that they can determine how the district feels. I can also go through their past votes and either agree or disagree with the votes that they’ve made. If I feel strongly about certain votes, I can contact my Representatives directly to let them know how I feel.
Once I’ve gone through the different bills and voted on the ones that I wanted to, I can look at how each Representative scores. They get a district score as well as a personal score. I can then use this score to determine if I should vote for this person to be my Representative again next time or if someone else would do a better job.
My Representatives are:
Politics, Work MyElectedRep
This article on CNN today expresses a lot of the reasoning behind why I created MyElectedRep.
With 3 of the 4 Republican candidates for President being heavily sponsored by single donors with $100+M in assets, are you feeling more or less confident that after the next election cycle our elected officials will represent us and not the super wealthy that paid for their campaigns?
The only way we can keep our Representatives accountable is if we vote and show them how we want to be represented.
We need a place to point to, where we can show elected officials that they’re not doing their job. A place where we can show that on a certain item a district wants to vote one way. Then we can score our Representatives on whether or not they listened.
By measuring the votes that each Representative makes, it becomes less important to control the money that they receive.