[ https://issues.apache.org/jira/browse/GEODE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hale Bales resolved GEODE-9906.
-------------------------------
    Resolution: Won't Fix

Relates to version 1.0.0-incubating

> Unable to reconnect a node after SO patching "15 seconds have elapsed while
> waiting for replies"
> ------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-9906
>                 URL: https://issues.apache.org/jira/browse/GEODE-9906
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Marco Baldessari
>            Priority: Major
>
> I have a cluster of 4 nodes (3 servers and 1 management node) that was
> working properly.
> At the beginning of the month we planned to patch the OS, and we started with
> the first server node, using this procedure:
> - Stop service
> - OS patching
> - Server restart
> - Start service
> The service on the first patched node, named "serverA", fails to restart with
> this error:
>
> Log entries from the cluster join:
>
> serverA:
> | INFO | region-dm-12 | ache.geode.internal.tcp.Connection |
> --> Connection: shared=true ordered=false failed to connect to peer
> 10.237.110.195( Server serverB:9993)<ec><v127>:1024 because:
> java.net.ConnectException: Connection timed out (Connection timed out)
> | WARN | region-dm-12 | ache.geode.internal.tcp.Connection |
> --> Connection: Attempting reconnect to peer 10.237.110.195( Server
> serverB:9993)<ec><v127>:1024
>
> ServerMgmt:
> | WARN | pool-3-thread-1 | tributed.internal.ReplyProcessor21
> | --> 15 seconds have elapsed while waiting for replies:
> <CreateRegionProcessor$CreateRegionReplyProcessor 44180 waiting for 1 replies
> from [10.237.110.194( Server serverA:632)<ec><v174>:1024]> on 10.237.110.225(
> Management:6033)<ec><v111>:1024 whose current membership list is:
> [[10.237.110.196( Server serverC:16805)<ec><v136>:1024, 10.237.110.225(
> Management:6033)<ec><v111>:1024, 10.237.110.195( Server
> serverB:9993)<ec><v127>:1024, 10.237.110.194( Server
> serverA:632)<ec><v174>:1024]]
>
> We verified the connection between the systems with tcpdump; UDP port 1024 is
> working fine.
>
> We have tried redeploying the service and made numerous attempts, but we
> always get the same error during startup.
> Any ideas? Thank you.
> Marco.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)