[ https://issues.apache.org/jira/browse/GEODE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hale Bales resolved GEODE-9906.
-------------------------------
Resolution: Won't Fix
Relates to version 1.0.0-incubating
> Unable to reconnect a node after SO patching "15 seconds have elapsed while
> waiting for replies"
> ------------------------------------------------------------------------------------------------
>
> Key: GEODE-9906
> URL: https://issues.apache.org/jira/browse/GEODE-9906
> Project: Geode
> Issue Type: Bug
> Reporter: Marco Baldessari
> Priority: Major
>
> I have a cluster of 4 nodes in total (3 servers and 1 management node) that
> was working properly.
> At the beginning of the month we planned to patch the OS, and we started with
> the first server node using this procedure:
> - Stop service
> - OS patching
> - Server restart
> - Start service
> The service on the first patched node, named "serverA", fails to restart with
> this error.
> Log entries from the cluster join:
> serverA:
> | INFO | region-dm-12 | ache.geode.internal.tcp.Connection |
> --> Connection: shared=true ordered=false failed to connect to peer
> 10.237.110.195( Server serverB:9993)<ec><v127>:1024 because:
> java.net.ConnectException: Connection timed out (Connection timed out)
> | WARN | region-dm-12 | ache.geode.internal.tcp.Connection |
> --> Connection: Attempting reconnect to peer 10.237.110.195( Server
> serverB:9993)<ec><v127>:1024
>
> ServerMgmt:
> | WARN | pool-3-thread-1 | tributed.internal.ReplyProcessor21
> | --> 15 seconds have elapsed while waiting for replies:
> <CreateRegionProcessor$CreateRegionReplyProcessor 44180 waiting for 1 replies
> from [10.237.110.194( Server serverA:632)<ec><v174>:1024]> on 10.237.110.225(
> Management:6033)<ec><v111>:1024 whose current membership list is:
> [[10.237.110.196( Server serverC:16805)<ec><v136>:1024, 10.237.110.225(
> Management:6033)<ec><v111>:1024, 10.237.110.195( Server
> serverB:9993)<ec><v127>:1024, 10.237.110.194( Server
> serverA:632)<ec><v174>:1024]]
>
> Connectivity between the systems was verified with tcpdump; UDP traffic on
> port 1024 is flowing normally.
>
> We have tried redeploying the service and have made numerous attempts, but we
> always get the same error during startup.
> Any ideas? Thank you.
> Marco.
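
The serverA log above reports a java.net.ConnectException (a TCP connect
timeout) toward serverB, while the check described in the report covered UDP
on port 1024. A minimal sketch of a TCP reachability probe that could be run
from serverA follows. The class name TcpProbe and the default port 41000 are
hypothetical placeholders, not values taken from the report; the port to test
would be whichever TCP port the peer member actually listens on.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Geode: checks plain TCP reachability.
class TcpProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, connect timeouts, and unreachable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default host is serverB's address from the log; the port is an assumed placeholder.
        String host = args.length > 0 ? args[0] : "10.237.110.195";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 41000;
        boolean ok = canConnect(host, port, 5000);
        System.out.println(host + ":" + port + " reachable over TCP: " + ok);
    }
}
```

If the probe returns false for a port on which the peer is known to be
listening, a firewall or routing change introduced by the OS patch would be
one possible cause worth ruling out.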