I ran into a small issue yesterday with a Tomcat server running Hibernate with a distributed Ehcache setup for the second-level cache. My setup currently uses two load-balanced Tomcat servers for serving front-end pages, along with several backend servers that do offline processing. Because a lot of the work is done asynchronously, the front-end servers need to be told when processing has finished and the data has changed so they can flush their caches.
To start with, all of the Tomcat servers were on machines with only a single private interface on a 10.0.0.0 network. Separate Apache servers bridged the public and private networks. However, to consolidate some of the servers, I moved one of the Tomcats onto an Apache server with two NICs.
What I noticed then was that this server was not receiving any of the cache flush commands from the backend servers. Turning on debug logging for Ehcache confirmed that it was not communicating with the other servers at all. I did a lot of poking around and tried different configs, but everything looked good. Since the same configuration had been working fine just a few minutes earlier on the old server, I figured it had to be something specific to the new dual-NIC server.
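Eventually I tracked it down to a routing issue with the multicast UDP packets. On the new server the default route goes out over the internet-facing interface, and with no more specific route for multicast addresses, the multicast messages were being sent out that way instead of over the private network. This was quickly solved by adding a route that points multicast traffic at the private network (substitute the name of your private interface for private_nic):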
route add -net 224.0.0.0 netmask 240.0.0.0 dev private_nic
Once the new route was in place, everything worked as expected again.
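If you want to check for the same problem on your own box, here is a rough sketch of two small shell helpers. The function names are made up for this post. The first one shows why a netmask of 240.0.0.0 on 224.0.0.0 covers every IPv4 multicast address: the mask keeps only the top four bits of the first octet. The second simply wraps the iproute2 `ip route get` command, so it assumes a Linux machine with `ip` installed.

```shell
# Hypothetical helper: succeeds (exit status 0) if the IPv4 address in $1
# falls inside 224.0.0.0/4, the range matched by
# "-net 224.0.0.0 netmask 240.0.0.0".
in_multicast_net() {
    first_octet=${1%%.*}    # keep everything before the first dot
    # 240 is 11110000 in binary, so the AND keeps the top four bits.
    # Every multicast address has those bits set to 1110, which is 224.
    [ $(( first_octet & 240 )) -eq 224 ]
}

# Hypothetical helper: print the route (and so the interface) the kernel
# would choose for the destination in $1. Needs the iproute2 "ip" tool.
show_route_for() {
    ip route get "$1"
}
```

For example, `in_multicast_net 230.0.0.1 && show_route_for 230.0.0.1` should name the private interface once the route is in place. I am using 230.0.0.1 here only because it is the multicast group address that appears in the Ehcache documentation examples; use whatever group address your own ehcache.xml has.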