https://issues.apache.org/bugzilla/show_bug.cgi?id=46935
           Summary: Problem with and Patch for Using the Correct Multicast
                    Address in Tomcat 5.5.x
           Product: Tomcat 5
           Version: 5.5.27
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Catalina:Cluster
        AssignedTo: dev@tomcat.apache.org
        ReportedBy: oli...@neofonie.de

Created an attachment (id=23425)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=23425)
Patch for binding the multicast socket to the correct address in 5.5.27

Hi,

while trying out session replication with Apache Tomcat 5.5.27 and earlier, I ran into the same kind of problems as other people who have tried the cluster multicast membership service with TCP-based replication, as suggested in the Clustering/Session Replication HOWTO (http://tomcat.apache.org/tomcat-5.5-doc/cluster-howto.html). I found numerous reports from people having problems with clustering in 5.5. The usual response to their requests for help was to check their configuration, but I think there is a bug in the 5.5 clustering code that has survived up to 5.5.27.

Although the 5.5.x Tomcat series is now somewhat obsolete, and this very problem has been successfully addressed in 6.0.x (but obviously never backported to 5.5), I wanted to share what I found, because it might spare some users a headache and might reconcile others with Tomcat session replication altogether. :)

I am aware of the networking prerequisites for TCP replication, most of which stem from the multicast membership service:

- multicast support both in the operating system's networking stack and in the network infrastructure (see the Clustering FAQ at http://wiki.apache.org/tomcat/FAQ/Clustering#Q9 and many other places on the web)
- occasional problems with multicast routing on GNU/Linux (the OS of choice for said setups)
- specific problems with GNU/Linux, Java, multicast and IPv6 support (partially discussed in http://java.sun.com/j2se/1.5.0/docs/guide/net/ipv6_guide/index.html, although I did not rely on IPv6 at all in my setup)

I tried several configurations in different network environments, always double-checking that multicast worked (using both Java and non-Java software). Although the "Simple Cluster Configuration" from the Replication HOWTO seemed to work for a while, more sophisticated setups regularly failed. Session replication in Tomcat 6.0.18, however, worked in setups similar to those that made 5.5.27 break.

The usual symptoms were:

- multicast membership packets sent through the network (and reaching the network interfaces, but apparently never received by the application)
- no replicated sessions at all
- frequent exceptions in catalina.out like:

INFO Cluster-MembershipRecovery org.apache.catalina.cluster.mcast.McastService - Membership recovery was successful.
WARN Cluster-MembershipReceiver org.apache.catalina.cluster.mcast.McastService - Error receiving mcast package (errorCounter=10). Try Recovery!
java.net.SocketTimeoutException: Receive timed out
    at java.net.PlainDatagramSocketImpl.receive0(Native Method)
    at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
    at java.net.DatagramSocket.receive(DatagramSocket.java:712)
    at org.apache.catalina.cluster.mcast.McastServiceImpl.receive(McastServiceImpl.java:238)
    at org.apache.catalina.cluster.mcast.McastServiceImpl$ReceiverThread.run(McastServiceImpl.java:330)
INFO Cluster-MembershipRecovery org.apache.catalina.cluster.mcast.McastService - Cluster membership, running recovery thread, multicasting is not functional.
WARN Cluster-MembershipSender org.apache.catalina.cluster.mcast.McastService - Sender Thread ends with errorCounter=0.

I finally compared the code in Tomcat 6.0.18 and 5.5.27 that sets up the membership service socket and found this:

in org/apache/catalina/cluster/mcast/McastServiceImpl.java:167ff (5.5.27):

    protected void setupSocket() throws IOException {
        if (mcastBindAddress != null) socket = new MulticastSocket(new java.net.
            InetSocketAddress(mcastBindAddress, port));
        else socket = new MulticastSocket(port);
        socket.setLoopbackMode(false); //hint that we don't need loop back messages

and in org/apache/catalina/tribes/membership/McastServiceImpl.java:185ff (6.0.18):

    protected void setupSocket() throws IOException {
        if (mcastBindAddress != null) {
            try {
                log.info("Attempting to bind the multicast socket to "+address+":"+port);
                socket = new MulticastSocket(new InetSocketAddress(address,port));
            } catch (BindException e) {
                /*
                 * On some plattforms (e.g. Linux) it is not possible to bind
                 * to the multicast address. In this case only bind to the
                 * port.
                 */
                log.info("Binding to multicast address, failed. Binding to port only.");
                socket = new MulticastSocket(port);
            }
        } else {
            socket = new MulticastSocket(port);
        }

So, provided a mcastBindAddress property has been specified, 6.0.18 uses the (multicast) address to create the InetSocketAddress to bind to, while 5.5.27 uses the mcastBindAddress. The 5.5.27 socket is therefore bound to the wrong address and does not see any multicast packets at all, hence the exceptions about receives timing out.

Therefore, I suggest the following patch to make the Tomcat 5 multicast binding behave like Tomcat 6:

diff -u -r apache-tomcat-5.5.27-src.orig/container/modules/cluster/src/share/org/apache/catalina/cluster/mcast/McastServiceImpl.java apache-tomcat-5.5.27-src/container/modules/cluster/src/share/org/apache/catalina/cluster/mcast/McastServiceImpl.java
--- apache-tomcat-5.5.27-src.orig/container/modules/cluster/src/share/org/apache/catalina/cluster/mcast/McastServiceImpl.java  2008-08-29 05:13:58.000000000 +0200
+++ apache-tomcat-5.5.27-src/container/modules/cluster/src/share/org/apache/catalina/cluster/mcast/McastServiceImpl.java  2008-11-27 01:29:04.905529298 +0100
@@ -166,7 +166,7 @@
     protected void setupSocket() throws IOException {
         if (mcastBindAddress != null) socket = new MulticastSocket(new java.net.
-            InetSocketAddress(mcastBindAddress, port));
+            InetSocketAddress(address, port));
         else socket = new MulticastSocket(port);
         socket.setLoopbackMode(false); //hint that we don't need loop back messages
         if (mcastBindAddress != null) {

With the above patch, Tomcat 5.5.27 worked for me as expected, and as documented. A comment in Tomcat 6 mentions that binding to a multicast address on GNU/Linux might fail, but I saw none of the corresponding log messages in Tomcat 6, and I did not find similar exceptions in the Tomcat 5 logs either.
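
For illustration only, the binding behaviour discussed above can be reproduced outside Tomcat with a small standalone program. This is a minimal sketch, not Tomcat code: the class name MembershipSocketSketch and the method open are hypothetical, and the group 228.0.0.4 and port 45564 are merely example values that do not come from this report. It binds to the multicast group address first and falls back to a port-only bind when the platform refuses, mirroring the Tomcat 6 snippet quoted earlier.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;

/** Hypothetical standalone helper for illustration; not part of Tomcat. */
final class MembershipSocketSketch {

    /**
     * Opens a socket bound to the multicast group address and port.
     * If the platform rejects a bind to a multicast address, falls back
     * to binding to the port only (wildcard address).
     */
    public static MulticastSocket open(InetAddress group, int port) throws IOException {
        if (!group.isMulticastAddress()) {
            throw new IllegalArgumentException(group + " is not a multicast address");
        }
        try {
            // Bind to the group address, not to the local interface address.
            return new MulticastSocket(new InetSocketAddress(group, port));
        } catch (BindException e) {
            // Same fallback as the quoted Tomcat 6 code: bind to the port only.
            return new MulticastSocket(port);
        }
    }

    public static void main(String[] args) throws IOException {
        // Example values only (assumption); substitute your own group and port.
        InetAddress group = InetAddress.getByName("228.0.0.4");
        try (MulticastSocket socket = open(group, 45564)) {
            System.out.println("Bound to " + socket.getLocalSocketAddress());
        }
    }
}
```

Running it on a cluster node shows which of the two bindings the platform accepts; a real receiver would still have to join the group before any membership packets arrive.
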
Either way, the above issue stands; the possible bind failure just needs to be handled in addition, in a way similar to Tomcat 6, that is, by catching the exception and falling back to the MulticastSocket constructor with the port as its sole argument.

I would be glad about any kind of feedback on this. I hope I didn't miss any significant information on this topic that would justify a loud RTFM in my face, and I hope this can be my humble contribution to improving the already excellent Apache Tomcat that we all love so much. :)

For the record: this has been tested on SuSE OpenLinux 10.1 32bit and 10.2 64bit with JDK 1.6.0 (1.6.0_07-b06), both 32bit and 64bit versions.

Best regards,
Oliver