https://issues.apache.org/bugzilla/show_bug.cgi?id=56828
Bug ID: 56828
Summary: Cluster setup stopped working after 3 months in production
Product: Tomcat 6
Version: 6.0.39
Hardware: Other
OS: Linux
Status: NEW
Severity: major
Priority: P2
Component: Cluster
Assignee: dev@tomcat.apache.org
Reporter: krishna.saran...@gmail.com

We have a J2EE WAR application deployed in a two-node cluster. Tomcat 6.0.39 is installed on both nodes, and an identical WAR is deployed on each. The setup runs in Amazon AWS: the two EC2 nodes sit behind an ELB with session stickiness enabled on JSESSIONID, and session replication is enabled between the two Tomcat nodes.

Following is the Cluster configuration from the updated server.xml:
=============================================================================
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
         channelSendOptions="6"
         channelStartOptions="3">
  <Manager className="org.apache.catalina.ha.session.DeltaManager"
           expireSessionsOnShutdown="false"
           notifyListenersOnReplication="true" />
  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
              autoBind="0"
              selectorTimeout="5000"
              maxThreads="6"
              address="x.x.x.x"
              port="4444" />
    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
      <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"
                 timeout="60000"
                 keepAliveTime="10"
                 keepAliveCount="0" />
    </Sender>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpPingInterceptor" staticOnly="true"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">
      <Member className="org.apache.catalina.tribes.membership.StaticMember"
              host="x.x.x.x"
              port="4444"
              uniqueId="{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4}"/>
    </Interceptor>
  </Channel>
  <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter="" />
  <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve" />
  <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>
  <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
</Cluster>
==========================================================================
The Receiver address, the static Member host, and the uniqueId differ in the server.xml of the other node.

This ran fine in production for 3 months. Then the following exception was suddenly logged, and it kept repeating indefinitely:
==================================================
Aug 6, 2014 12:00:39 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://10.160.40.12:4444,10.160.40.12,4444, alive=0,id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 }, payload={}, command={}, domain={}, ]] message. Will verify.
Aug 6, 2014 12:00:39 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://10.160.40.12:4444,10.160.40.12,4444, alive=0,id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 }, payload={}, command={}, domain={}, ]]
Aug 6, 2014 12:00:39 AM org.apache.catalina.ha.tcp.SimpleTcpCluster send
SEVERE: Unable to send message through cluster sender.
org.apache.catalina.tribes.ChannelException: Operation has timed out(60000 ms.).; Faulty members:tcp://10.160.40.12:4444;
	at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:97)
	at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:53)
	at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:80)
	at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:76)
	at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
	at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
	at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:88)
	at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
	at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
	at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:216)
	at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:175)
	at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:817)
	at org.apache.catalina.ha.tcp.SimpleTcpCluster.sendClusterDomain(SimpleTcpCluster.java:791)
	at org.apache.catalina.ha.tcp.ReplicationValve.send(ReplicationValve.java:553)
	at org.apache.catalina.ha.tcp.ReplicationValve.sendMessage(ReplicationValve.java:537)
	at org.apache.catalina.ha.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:519)
	at org.apache.catalina.ha.tcp.ReplicationValve.sendReplicationMessage(ReplicationValve.java:430)
	at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:363)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
	at java.lang.Thread.run(Thread.java:662)
============================================================================
After this, the web application becomes inaccessible, and we have to manually kill the Tomcat process on one node, which effectively disables the cluster. We are unsure why this started so suddenly and why it blocks access to the application altogether. Any suggestions for a remedy would be appreciated.
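When the send timeouts start, one quick way to rule out a network-level block between the two EC2 nodes is a plain TCP connect from one node to the other node's Receiver port. The Java sketch below is a hypothetical standalone diagnostic, not part of Tomcat; the class name PeerPortCheck, its command-line arguments, and the 5000 ms default timeout are assumptions, and 10.160.40.12:4444 is simply the faulty member from the log above. A successful connect only shows that something accepts connections on that port. It does not show that the receiver is actually processing replication messages, which is consistent with the log above, where the member is reported as "still alive" yet sends still time out after 60000 ms.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical diagnostic (not part of Tomcat): checks whether a TCP
 * connection to a cluster peer's Receiver port can be opened within a timeout.
 */
public class PeerPortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean isReachable(String host, int port, int timeoutMs) {
        // try-with-resources closes the socket whether or not connect succeeds
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers connection refused, connect timeout and unresolvable host
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are the faulty member from the log; override with: <host> <port> [timeoutMs]
        String host = args.length > 0 ? args[0] : "10.160.40.12";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 4444;
        int timeoutMs = args.length > 2 ? Integer.parseInt(args[2]) : 5000;

        long start = System.nanoTime();
        boolean ok = isReachable(host, port, timeoutMs);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(host + ":" + port + (ok ? " reachable" : " NOT reachable")
                + " (" + elapsedMs + " ms)");
    }
}
```

Compile it with javac PeerPortCheck.java and run java PeerPortCheck 10.160.40.12 4444 on the other node while the errors are being logged. It uses try-with-resources, so it needs Java 7 or later.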