When Natural Language Processing Goes Wrong

Was just cruising through my news updates over on ReadPath, where I’ve got a subscription to the Google EngEdu videos. These are always a great way to get some in-depth coverage of a geeky topic.

One of the latest ones gave me a quick laugh though. A video on Natural Language Processing had its title clipped by an algorithm down to “Practical Applications of Natural Language Processing in Ass…”. If there were any natural language processing going on with the syndication feed, it might have realized there was a problem.

Just checked out Powerset

Can’t say that I’m really all that impressed. It wasn’t the shockingly better experience that I remember Google being when I first came across it. Some of the searches that I tested out seemed to be slightly better than Google, but I had to use contrived examples that I wouldn’t really use in everyday activities. These searches were simply designed to test Powerset’s strengths.

Couple this lack of noticeable improvement in search results with the poor user experience due to site speed, and I can’t say that I’ll be making the switch any time soon.

Always check the time

Found another one of those gotchas when debugging caching with multiple servers. Don’t assume that ntpd is working properly and that the time is synchronized on all of the servers. I just spent a whole lot of time before finally discovering that one server somehow had its clock off by an hour. This caused some very weird effects to trickle out.
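A quick way to rule this out is to compare clocks directly instead of trusting ntpd. The following is only a rough sketch: the function names, the host names and the 5 second threshold are made up for illustration, and it assumes passwordless ssh to each server and a `date` that supports `+%s`.

```shell
#!/bin/sh
# Hypothetical helper: absolute difference, in seconds, between two epoch timestamps.
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - $diff ))
    fi
    echo "$diff"
}

# Hypothetical helper: report every host whose clock differs from the local
# clock by more than max_skew seconds. ssh latency adds a little noise, so
# this only catches large errors (such as a clock that is an hour off).
check_hosts() {
    max_skew=$1
    shift
    for host in "$@"; do
        remote=$(ssh "$host" date +%s) || { echo "$host: unreachable"; continue; }
        local_now=$(date +%s)
        skew=$(clock_skew "$local_now" "$remote")
        if [ "$skew" -gt "$max_skew" ]; then
            echo "$host: clock differs by ${skew}s"
        fi
    done
}

# Example call (placeholder host names):
#   check_hosts 5 web1 web2 backend1
```

Running something like this from cron on one machine would have flagged the hour-off server long before the cache started acting strangely.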

Configuring distributed caching with EHCache on a server with multiple nics

I ran into a small issue yesterday with a tomcat server running hibernate with a distributed ehcache setup for the second-level cache. My server setup currently uses two load-balanced tomcat servers for serving front end pages, along with several other backend servers that do offline processing. Because a lot of the work is done asynchronously, the frontend servers need to be updated when the processing has finished and the data has changed.

To start off with, all of the tomcat servers were on machines with only a single interface, on the private network. There were separate apache servers that bridged the public and private networks. However, to consolidate some of the servers, I moved one of the tomcats onto an apache server with two nics.

What I noticed then was that this server was not receiving any of the cache flush commands from the backend servers. Turning on debug logging for ehcache verified that this server was not communicating with the other servers. I did a lot of poking around and trying out different configs, but everything looked good. Since the configuration had been working fine just a few minutes before on the other server, I figured it had to be something specific to the new dual nic server.

Eventually I tracked it down to a routing issue with the multicast udp packets. Since the default route on the new server went out over the internet, the multicast messages were being sent out the public interface instead of the private one. This was quickly solved by putting in a new route to point the multicast packets to the private network.

route add -net netmask dev private_nic

Once the new route was in place everything worked as expected again.
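As a rough sketch of what such a route can look like, the snippet below assumes that all IPv4 multicast traffic (the 224.0.0.0/4 block, i.e. netmask 240.0.0.0) should go out the private interface. That range, the interface name `eth1`, the group address 230.0.0.1 and both function names are assumptions for illustration, not the values from my server. The helper only prints the command so it can be reviewed before being run as root.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the address is in the IPv4 multicast
# block 224.0.0.0/4, i.e. its first octet is between 224 and 239.
is_multicast() {
    first_octet=${1%%.*}
    [ "$first_octet" -ge 224 ] && [ "$first_octet" -le 239 ]
}

# Hypothetical helper: print (not run) a route command that sends the whole
# multicast block out the given interface.
multicast_route_cmd() {
    iface=$1
    echo "route add -net 224.0.0.0 netmask 240.0.0.0 dev $iface"
}

# Check that the cache's multicast group (assumed to be 230.0.0.1 here)
# would be covered by the route, then print the command for eth1.
if is_multicast 230.0.0.1; then
    multicast_route_cmd eth1
fi
```

After adding a route like this, `ip route get 230.0.0.1` (substituting the real group address) should report the private interface rather than the default one.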