Internet Scale Services Checklist

Adrian Colyer wrote a great summary of the takeaways from James Hamilton’s 2007 paper “On Designing and Deploying Internet-Scale Services.”

It’s a nice checklist to go down and remind yourself of the things that you can solve in the design phase of a project so that they don’t bite you later in the operations/deployment phase.

Why is software hard?

The Morning Paper blog has a great analysis of a paper from 1987, “No Silver Bullet – Essence and Accident in Software Engineering.” Even though the paper is 30 years old, it’s still relevant. If you find yourself asking why there continue to be bugs in software, it’s worth reading.



Terminal hack to make logging into a cluster easier

I’ve been looking for a simpler way to quickly log in to all the nodes of a cluster from a Mac terminal. I discovered a small npm module called ttab that helps. With it I can write a bash script:



rs6 () {
ttab -w ssh
ttab ssh
ttab ssh
ttab ssh
ttab ssh
ttab ssh

if [ "rs6" = "$TYPE" ]

This will open one new terminal window with 6 tabs all logged into the different servers.
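The same idea can be written as a small loop. This is only a sketch: the function name `open_cluster`, the `DRY_RUN` switch, and any host names you pass in are my own placeholders, not part of the original script. It relies on ttab being installed and on `ttab -w` opening a new window while plain `ttab` opens a new tab.

```shell
# Open one new terminal window for the first host, then one tab per
# remaining host. Hosts are passed as arguments (placeholders, not the
# hosts from the original script).
# Set DRY_RUN=1 to print the ttab commands instead of running them.
open_cluster() {
  first=1
  for host in "$@"; do
    if [ "$first" -eq 1 ]; then
      # -w asks ttab for a new window rather than a new tab
      ${DRY_RUN:+echo} ttab -w ssh "$host"
      first=0
    else
      ${DRY_RUN:+echo} ttab ssh "$host"
    fi
  done
}
```

Calling `open_cluster node1.example.com node2.example.com` (hypothetical hosts) would open a window with two tabs; prefix the call with `DRY_RUN=1` to check the commands first.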

Spring Boot with JSP and React Template Application

A while back I was playing around with moving some personal applications to Spring Boot. There are a lot of really nice aspects to this project. You get a whole lot of functionality without having to write boilerplate code.

  1. Executable jar file
    1. Packages the entire application as a single executable file.
  2. Service support
    1. Link to your executable jar from /etc/init.d and you get full Linux service support (start, stop, restart, status).
    2. Logs go to /var/log/{application name}.log.
    3. PID status support in /var/run/.
  3. Embedded web container
    1. Tomcat or Jetty is upgraded along with your Spring Boot version, so you don’t have a separate set of binaries to maintain.

The biggest downside that I could see was that JSPs were not supported out of the box because of a Tomcat issue. Spring Boot is great for developing REST microservices but somewhat limited as a general-purpose web application replacement. Without JSP support, you need to convert your display code to Thymeleaf templates, which could be a significant amount of work.

I came across a workaround to get JSPs working as expected. It involved moving files out of the /WEB-INF/ directory and putting them in /resources/META-INF/resources/WEB-INF/. It’s a little weird, but when you do it everything works as expected. You’re even able to use things like JSP tags just as you would have before.
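Here is a rough sketch of that move as a shell function. The `src/main/webapp` and `src/main/resources` prefixes are my assumption of a standard Maven layout (the paths above are given without a prefix), and `relocate_jsps` is a hypothetical helper name. Try it on a copy of your project first, since it deletes the original directory after copying.

```shell
# Move everything under WEB-INF from the webapp directory to the
# classpath location. Both prefixes assume a standard Maven layout.
relocate_jsps() {
  project=$1
  src="$project/src/main/webapp/WEB-INF"
  dest="$project/src/main/resources/META-INF/resources/WEB-INF"
  mkdir -p "$dest"
  # Copy the contents first; remove the originals only if the copy succeeded.
  cp -R "$src/." "$dest/" && rm -rf "$src"
}
```

Run it as `relocate_jsps /path/to/project`, where the path is whatever directory holds your `src` folder.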

Another thing that I’ve been spending time thinking about is how to integrate a Java development project with a modern JavaScript setup. Maven works as well as anything for a large multi-module setup, but how does the JavaScript code integrate? There are a lot of different options to choose from. I wanted a build pipeline that would let you use the more up-to-date JavaScript tools, not break for non-JavaScript developers, and allow fast iteration on JavaScript changes (without requiring a full ‘mvn clean package’ to see them).

After all of this experimentation, I believe I’ve found a good balance of tools for both Java and JavaScript developers. It uses Spring Boot with full JSP support as a base, with Gulp and Webpack for the JavaScript build pipeline. It has a development mode that uses the Webpack dev server so you can iterate quickly. It also includes support for development with React, so you can play with the cutting edge of web application development.

I’ve done all of the fiddling with different settings and created a working template application. Check it out and let me know what you think.

Changing Kerberos Expiration To Test Ticket Renewal

If you’re testing a Kerberos-enabled Hadoop cluster and want to make sure that ticket renewal is working as expected, you’ll probably want to shorten the ticket lifetime so that you don’t have to wait 24 hours for each test.

If you’re using “krb5-server” as the authentication source for a Hadoop cluster, you can run the following commands to change the default ticket lifespan.

kadmin.local: getprinc krbtgt/EXAMPLE.COM@EXAMPLE.COM

Expiration date: [never]
Last password change: [never]
Password expiration date: [none]
Maximum ticket life: 1 days 00:00:00
Maximum renewable life: 365 days 00:00:00
Last modified: Wed Nov 19 00:09:37 UTC 2014 (quick/admin@EXAMPLE.COM)
Last successful authentication: [never]
Last failed authentication: [never]
Failed password attempts: 0
Number of keys: 7
Key: vno 1, aes256-cts-hmac-sha1-96, no salt
Key: vno 1, aes128-cts-hmac-sha1-96, no salt
Key: vno 1, des3-cbc-sha1, no salt
Key: vno 1, arcfour-hmac, no salt
Key: vno 1, des-hmac-sha1, no salt
Key: vno 1, des-cbc-md5, no salt
Key: vno 1, des-cbc-crc, no salt
MKey: vno 1
Policy: [none]

To change the default ticket expiration from 1 day to 30 minutes, issue the following command:

kadmin.local:  modprinc -maxlife "30 minutes" krbtgt/EXAMPLE.COM@EXAMPLE.COM

You should then be able to verify that the settings have taken effect:

[root@host ~]# kinit -k -t hdfs.keytab hdfs
[root@host ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hdfs@EXAMPLE.COM
Valid starting     Expires            Service principal
08/22/16 20:24:18  08/22/16 20:54:18  krbtgt/EXAMPLE.COM@EXAMPLE.COM
renew until 08/29/16 20:24:18

You should see that the “Expires” time is 30 minutes in the future. The ticket can then be renewed every 30 minutes for up to 7 days.

The interesting point that isn’t well documented is that there is a hierarchy to the settings in Kerberos. You can modify each individual principal’s maxlife and maxrenewlife, but if a higher-level principal has stricter settings, those will be used. The krbtgt principal is the top-level principal, so changes made there apply to all other principals.
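One way to picture the hierarchy is as a minimum over the limits that apply. The sketch below is only arithmetic, not a call into Kerberos: `effective_maxlife` is a made-up helper, and the values (in seconds) are illustrative.

```shell
# Return the smallest of the lifetimes passed in (all values in seconds).
# This only models the "strictest setting wins" rule described above.
effective_maxlife() {
  min=$1
  shift
  for limit in "$@"; do
    if [ "$limit" -lt "$min" ]; then
      min=$limit
    fi
  done
  echo "$min"
}

# Example: a principal allowed 1 day (86400 s) under a krbtgt limited to
# 30 minutes (1800 s) ends up with the 30 minute lifetime.
effective_maxlife 86400 1800
```

This is why changing only the krbtgt principal in the example above was enough to shorten the hdfs ticket.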


Debugging Kerberos Issues

Kerberos is one of those items that I can’t imagine anyone ever truly enjoys. It’s a necessary evil if you want to have a secure Hadoop cluster setup because without it the permissions checks are trivially easy to sidestep.

When you’re running into an issue with an application running in this type of setup, the first thing you’ll want to do is turn on debug logging. If you start your application with the parameter:

then you’ll get a whole lot of debug output on stdout. Issues often come up with ticket renewal, and in most setups renewal happens every 24 hours, so it helps to pipe the console log to a file that you can go over later. An example of how you can start an application to accomplish this is:

nohup ./bin/app-start >> logs/app.log 2>&1 &

This will start your application and pipe stdout and stderr to the app.log file. It also appends to the log instead of overwriting it, so you won’t lose the log on restarts.

When developing applications that communicate with a secure Hadoop cluster, I’ve found it helpful to change the default ticket lifetime to 10 minutes. Any shorter and you’ll start to see unrelated errors, but 10 minutes makes it much easier to verify that ticket renewal is happening properly.

Trying out AWS Workspaces

I’ve really wanted to get my hands on a Linux laptop again. I used to work on one all the time, but most of my work these days is on a MacBook Pro. I’ve got a great ThinkPad sitting at home, but it’s running Windows 7 because there are a couple of applications that I need that require Windows.

Over the holiday break, I started playing with AWS WorkSpaces. For $35 / month I can have a Windows 7 VDI setup. Along with WorkSpaces, I’ve started testing Zocalo, Amazon’s file syncing solution.

So far, I’m really happy with the solution. It seems to work really well. Once I’ve got everything up and running in the workspace, I’ll be able to reinstall Linux on the ThinkPad and have the best of both worlds.

SSL Test Sites

Now that security on the net is becoming more of a concern, here are two websites for testing the security of your web connections. SSL/TLS is really a negotiation between client and server, so you need to see what settings each endpoint will accept.

For the server side, there is SSL Labs. With this site, you can put in any URL and get a score for how well that server is configured.

For the client side, there is How’s My SSL. This will give you a readout on the browser that you’re currently using.

Doing some drive characterizations

As an engineer at a storage company, I’m often working to characterize how different drives will perform in an environment. As you go down the stack from application to OS to hardware, there are a lot of different factors that come into play. It’s amazing to see the differences in performance you get with varying drives and workloads.

Here are some example results from a testbed looking at a single Seagate 15K RPM 600 GB drive connected to an LSI 2008 HBA on a CentOS 6.5 machine.

The Random Read tests are using fio with 100% random reads of the specified block size and queue depth against the entire raw drive. There is no filesystem caching in these tests. The Random Write test is the same, but with 100% writes. The Mixed tests are 65% read and 35% writes.
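For reference, here is a sketch of how fio command lines for those three workloads could be put together. The exact options I used aren’t listed above, so treat everything here as an assumption: the helper name `fio_cmd`, the libaio engine, the 60 second runtime, and the `/dev/sdX` placeholder. The function only prints the command so you can inspect it before running anything; write tests against a raw device destroy whatever is on it.

```shell
# Print a fio command line for one raw-device test.
# Arguments: device, pattern (randread | randwrite | mixed),
#            block size, queue depth.
fio_cmd() {
  dev=$1
  pattern=$2
  bs=$3
  qd=$4
  case "$pattern" in
    randread)  rw="--rw=randread" ;;
    randwrite) rw="--rw=randwrite" ;;
    # mixed workload: 65% reads, 35% writes
    mixed)     rw="--rw=randrw --rwmixread=65" ;;
    *) echo "unknown pattern: $pattern" >&2; return 1 ;;
  esac
  # --direct=1 bypasses the page cache, so no filesystem caching is involved
  echo "fio --name=$pattern-$bs-qd$qd --filename=$dev --direct=1 --ioengine=libaio $rw --bs=$bs --iodepth=$qd --runtime=60 --time_based"
}
```

For example, `fio_cmd /dev/sdX mixed 8k 16` prints the command for a mixed test at an 8k block size and a queue depth of 16.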

Dabbling in Bitcoin

I picked up another bitcoin this week. I had originally wanted to buy when they were worth less than $100. Now they’re bumping up against $1000. I’m not too worried about them gaining or losing value; I just really wanted to see how they work.

The hardest part for normal users has been converting dollars into bitcoin. They’re just not the easiest thing to purchase. You’ve got to do a bank transfer to someone that can handle it for you. Trusting a third party on the internet with any bank details whatsoever is always a bit of a leap of faith. I set up an account on Coinbase, which keeps the process fairly simple. The only downside is that it can take almost a full week for your purchase to go through.

Once you have a few bitcoins in your wallet, it becomes clear how revolutionary the system is. You can quickly bounce them around different wallets, in increments worth fractions of a penny. Bitcoin or a system like it is definitely going to take off; there are just too many advantages for it not to.

One last note: Jaimie just found a very good, simple explanation of bitcoin here.

Macbook Quibble

I really don’t understand the decision in the Mac Settings to link together the scroll direction of the trackpad and mouse. There are two different check boxes for whether you want natural scroll direction or not for each of these inputs.

On the trackpad, I definitely want natural scroll direction. However, the so-called “natural” direction on the mouse is the opposite of every other computer that I use. Since there are two check boxes, one for each of these settings, I would REALLY like to be able to set them opposite to each other. The problem is that if you check one, it checks the other, and vice versa.

Otherwise the MacBook Pro is the state of the art when it comes to laptops, especially when paired with a Thunderbolt Display in the office.

Really love the new Atlassian Bamboo

I’m not sure when exactly the feature came out, but the deployment projects in Bamboo are AWESOME. They make it really easy to continue the workflow from Task (Jira) -> Code (Stash) -> Build and Deploy (Bamboo). I’ve got a lot of different environments, and it’s really easy to keep track of which code is where. Another great product from Atlassian.

Added A Github Page

Seems like you’ve got to have a GitHub page these days. Never mind that I’ve got tens of thousands of lines of code that I’d be happy to show to anyone that’s interested. It’s all living here. It’s just that at this point I’m just not willing to give all of that code away.

So just to make sure I’m not missing out on anything, I’ve posted some sample code that I can use as a reference.

Don’t forget to change your hdfs / mapred config when a drive fails with hadoop

Just ran into a little gotcha when running a huge job against my CDH4 cluster. One of the servers lost a drive at the 50% mark. Each server has four 1 TB drives mounted, so losing one isn’t a huge deal. With the new config “dfs.datanode.failed.volumes.tolerated” set to 2, the datanode was able to keep right on going without impacting the larger job.

To get ready to replace the drive later, I unmounted the drive, leaving only the mount point directory. Then I made the mistake of bouncing the datanode so that I could start collecting Ganglia stats, which are great by the way and really easy to set up.

Now the datanode determined that the mount point was back, never mind that the directory was now on the root device. So a day later, when the root device filled up and the tasks on that server started failing, I realized what I had done wrong.

If you’re going to temporarily take a drive out, take it out of the config as well, or else you’re going to forget about it and get yourself into trouble.
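A quick check before restarting a datanode can catch this. The sketch below assumes the util-linux `mountpoint` command is available; `unmounted_dirs` is a hypothetical helper, and the directories you pass in should be the ones listed in your own hdfs / mapred config.

```shell
# Print every directory given that is NOT currently a mount point.
# An empty result means all configured data directories are real mounts.
unmounted_dirs() {
  for dir in "$@"; do
    if ! mountpoint -q "$dir"; then
      echo "$dir"
    fi
  done
}
```

For example, `unmounted_dirs /data/1 /data/2 /data/3 /data/4` (placeholder paths) would list any data directory that has fallen back to being a plain directory on the root device.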

Leap Second Issues

There was so much hype with Y2K, but it turns out that it’s a leap second that takes out portions of the web. I had my Amazon EC2 instances taken out by this bug. This little code snippet brought the Java CPU load back to normal.

/etc/init.d/ntpd stop; date; date `date +"%m%d%H%M%C%y.%S"`; date

Then you just need to restart ntpd. Some have reported having to wait a while before restarting ntpd so that the issue doesn’t happen again.
