[ https://issues.apache.org/jira/browse/GEODE-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188507#comment-17188507 ]
ASF subversion and git services commented on GEODE-8467:
--------------------------------------------------------
Commit e402ed35102a4a885ad24a1052216b0542672bc7 in geode's branch
refs/heads/develop from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=e402ed3 ]
GEODE-8467: server fails to notify of a ForcedDisconnect and fails to tear down the cache (#5490)
Catch exceptions that occur during XML generation and disable auto
reconnect.
Ensure that the DisconnectThread is launched by placing it in a
"finally" block.
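The commit describes two changes. The first catches exceptions thrown while generating cache XML and disables auto-reconnect when one occurs. The second starts the DisconnectThread from a `finally` block so that teardown always begins. Below is a minimal Java sketch of that pattern. It is not the Geode code. `ForcedDisconnectSketch`, `generateCacheXml`, `forceDisconnect`, `forceDisconnectAndWait` and the simulated failure are hypothetical stand-ins. The thread named "DisconnectThread" here only sets a flag instead of tearing down a real cache.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch of the pattern described in the GEODE-8467 commit message.
 * None of these names are Geode's real API.
 */
public class ForcedDisconnectSketch {
  // Whether the member will try to rejoin the cluster after being expelled.
  private volatile boolean autoReconnect = true;
  // Set by the stand-in "DisconnectThread" so the outcome can be checked.
  private final AtomicBoolean cacheTornDown = new AtomicBoolean(false);
  // Cache description saved for the reconnect logic (stays null if generation fails).
  private volatile String savedCacheXml;
  // Lets the demo simulate a failure during XML generation.
  private final boolean simulateXmlFailure;

  public ForcedDisconnectSketch(boolean simulateXmlFailure) {
    this.simulateXmlFailure = simulateXmlFailure;
  }

  // Stand-in for generating XML that describes the cache to be rebuilt.
  private String generateCacheXml() {
    if (simulateXmlFailure) {
      // e.g. the cache is only partly built when the forced disconnect arrives
      throw new IllegalStateException("cache is still being initialized");
    }
    return "<cache/>";
  }

  /** Handles a forced disconnect and returns the (already started) teardown thread. */
  public Thread forceDisconnect() {
    Thread disconnectThread =
        new Thread(() -> cacheTornDown.set(true), "DisconnectThread");
    try {
      savedCacheXml = generateCacheXml();
    } catch (Exception e) {
      // Without this catch the exception would propagate to the caller with
      // auto-reconnect still enabled. Here it is logged and reconnecting is abandoned.
      System.err.println("could not generate cache XML, disabling auto-reconnect: " + e);
      autoReconnect = false;
    } finally {
      // Runs even if generation throws something the catch does not handle
      // (an Error, for instance), so teardown always starts.
      disconnectThread.start();
    }
    return disconnectThread;
  }

  /** Convenience for demos and tests: handles the disconnect and waits for teardown. */
  public void forceDisconnectAndWait() {
    Thread t = forceDisconnect();
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public boolean isAutoReconnect() { return autoReconnect; }

  public boolean isCacheTornDown() { return cacheTornDown.get(); }

  public static void main(String[] args) {
    ForcedDisconnectSketch member = new ForcedDisconnectSketch(true);
    member.forceDisconnectAndWait();
    // prints "torn down=true, autoReconnect=false"
    System.out.println("torn down=" + member.isCacheTornDown()
        + ", autoReconnect=" + member.isAutoReconnect());
  }
}
```

In this sketch the catch block and the `finally` block do different jobs. The catch decides whether reconnecting is still safe. The `finally` guarantees that the member shuts down regardless of that decision.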
> server fails to notify of a ForcedDisconnect and fails to tear down the cache
> -----------------------------------------------------------------------------
>
> Key: GEODE-8467
> URL: https://issues.apache.org/jira/browse/GEODE-8467
> Project: Geode
> Issue Type: Bug
> Components: membership
> Affects Versions: 1.10.0, 1.11.0, 1.12.0, 1.13.0, 1.14.0
> Reporter: Bruce J Schuchardt
> Assignee: Bruce J Schuchardt
> Priority: Major
> Labels: pull-request-available
>
> A test with auto-reconnect enabled failed and hung while restarting a server.
> The restarting server was building its cache when it was kicked out of the
> cluster because of very high load on the test machine. Membership initiated
> a forced disconnect:
> {noformat}
> [fatal 2020/08/22 00:51:04.508 PDT <unicast receiver,rs-GEM-3035-PG2231-2a2i3large-hydra-client-25-42721> tid=0x23] Membership service failure: Member isn't responding to heartbeat requests
> org.apache.geode.distributed.internal.membership.api.MemberDisconnectedException: Member isn't responding to heartbeat requests
>     at org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.forceDisconnect(GMSMembership.java:2012)
>     at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.forceDisconnect(GMSJoinLeave.java:1085)
>     at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processMessage(GMSJoinLeave.java:688)
>     at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1331)
>     at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1267)
> {noformat}
>
> and then logged that it was generating an XML description of the cache:
> {noformat}
> [info 2020/08/22 00:51:05.933 PDT <unicast receiver,rs-GEM-3035-PG2231-2a2i3large-hydra-client-25-42721> tid=0x23] generating XML to rebuild the cache after reconnect completes
> {noformat}
>
> but it never logged completion of this step and never forked a thread to tear
> down the cache. Any exception thrown during XML generation would have been
> caught by the JGroups code, which logs the problem at WARNING level. We have
> JGroups logging set to FATAL level, so that warning would never appear in the logs.
> We need to add exception handling around XML generation and, if an exception
> is detected, disable reconnect attempts and have the server shut down.
> The bug isn't easy to hit. I've run the test that failed over 5000 times
> without encountering it.
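The description explains why the failure was silent. The exception escaped into the JGroups receive path, which caught it and logged it at WARNING level, while JGroups logging was filtered at FATAL. The Java sketch below illustrates that mechanism under stated assumptions. `java.util.logging` stands in for the logging JGroups actually uses, and its `SEVERE` level plays the role of FATAL. `receive` and `handleForcedDisconnect` are hypothetical names, not JGroups or Geode methods.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical illustration of an exception being swallowed by a receive loop
 * and then hidden by a logger filtered to a higher severity.
 */
public class SwallowedExceptionSketch {
  // Strong reference keeps the configured logger from being garbage-collected.
  private static final Logger RECEIVER_LOG = Logger.getLogger("sketch.receiver");
  // Counts warnings that would actually pass the logger's level filter.
  static int warningsEmitted = 0;

  // Stand-in for a membership message handler that fails part way through.
  static void handleForcedDisconnect() {
    throw new IllegalStateException("XML generation failed");
  }

  // Stand-in for a messaging layer's receive loop: it must keep running, so it
  // catches the handler's exception and logs it instead of propagating it.
  static void receive() {
    try {
      handleForcedDisconnect();
    } catch (RuntimeException e) {
      if (RECEIVER_LOG.isLoggable(Level.WARNING)) {
        warningsEmitted++;
      }
      RECEIVER_LOG.log(Level.WARNING, "failed handling message", e);
    }
  }

  public static void main(String[] args) {
    RECEIVER_LOG.setLevel(Level.SEVERE); // only SEVERE records pass the filter
    receive();
    System.out.println("warnings emitted: " + warningsEmitted); // prints "warnings emitted: 0"
  }
}
```

The handler's exception never reaches the caller and its WARNING record is discarded, so nothing in the logs shows that the forced-disconnect handling stopped part way. Lowering the filter with `RECEIVER_LOG.setLevel(Level.WARNING)` would make the same failure visible.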
--
This message was sent by Atlassian Jira
(v8.3.4#803005)