If anyone really wants to look at and run the test directly, it's
*Bug41733DUnitTest* on develop.

Alternatively, you could pull the branch GEODE-1279-rename-regressionTests
from my fork (see *https://github.com/apache/geode/pull/1672
<https://github.com/apache/geode/pull/1672>*) where I've renamed the test
as *BucketCreationRequesterCrashHARegressionTest*, cleaned it up and
temporarily added a locator to play with membership config (to see what
effect that has on the membership logging and execution time).

On Thu, Mar 22, 2018 at 7:40 PM, Kirk Lund <kl...@apache.org> wrote:

> I'm looking through logs from a DUnit test that takes longer to run than I
> think it should. It looks to me like membership is chewing up more time
> than it should, and I have questions about some membership log messages.
>
> Specifically, I'm worried about:
> 1) double-logging by vm1 when shutdown message is received from vm0
>
> 2) double-logging by GMSHealthMonitor in vm0 AFTER it disconnected
>
> 3) vm1 KEEPS suspecting vm0 well after a) vm1 received shutdown message
> from vm0, b) processed a view with vm0 departed and c) even logged that vm0
> gracefully left. (vm1 seems a bit schizophrenic about vm0)
>
> Below are the log messages with my comments interspersed. Please help me
> understand what's actually going on here and if we're seeing undesired
> behavior (ie minor bugs)...
>
> KIRK: vm0 is shutting down gracefully
>
> [vm0] [info 2018/03/22 19:16:27.010 PDT <Pooled High Priority Message
> Processor 2> tid=87] Shutting down DistributionManager
> 192.168.1.18(34057)<ec><v1>:32771.
>
> [vm1] [info 2018/03/22 19:16:27.014 PDT <Pooled High Priority Message
> Processor 2> tid=66] received leave request from
> 192.168.1.18(34057)<ec><v1>:32771 for 192.168.1.18(34057)<ec><v1>:32771
>
> KIRK: vm1 states that vm0 gracefully left and logs it twice which seems
> like a minor bug
>
> [vm1] [info 2018/03/22 19:16:27.016 PDT <Pooled High Priority Message
> Processor 2> tid=66] Member at 192.168.1.18(34057)<ec><v1>:32771
> gracefully left the distributed cache: shutdown message received
>
> [vm1] [info 2018/03/22 19:16:27.016 PDT <Pooled High Priority Message
> Processor 2> tid=66] Member at 192.168.1.18(34057)<ec><v1>:32771
> gracefully left the distributed cache: shutdown message received
>
> [vm1] [info 2018/03/22 19:16:27.332 PDT <unicast
> receiver,Kirks-MacBook-Pro-22774> tid=41] received new view:
> View[192.168.1.18(34059:locator)<ec><v0>:32770|3] members:
> [192.168.1.18(34059:locator)<ec><v0>:32770, 
> 192.168.1.18(34058)<ec><v2>:32772{lead}]
> shutdown: [192.168.1.18(34057)<ec><v1>:32771]
> [vm1] old view is: View[192.168.1.18(34059:locator)<ec><v0>:32770|2]
> members: [192.168.1.18(34059:locator)<ec><v0>:32770,
> 192.168.1.18(34057)<ec><v1>:32771{lead}, 192.168.1.18(34058)<ec><v2>:
> 32772]
>
> KIRK: even tho vm1 just received a new view it initiates suspect
> processing for vm0 which also seems like a minor bug
>
> [vm1] [info 2018/03/22 19:16:28.186 PDT <unicast
> receiver,Kirks-MacBook-Pro-22774> tid=41] Membership ignoring suspect
> request for SuspectMembersMessage [suspectRequests=[SuspectRequest
> [member=192.168.1.18(34058)<ec><v2>:32772, reason=Member isn't responding
> to heartbeat requests]]] from non-member 192.168.1.18<ec><v1>:32771
>
> [vm1] [info 2018/03/22 19:16:28.683 PDT <unicast
> receiver,Kirks-MacBook-Pro-22774> tid=41] Membership ignoring suspect
> request for SuspectMembersMessage [suspectRequests=[SuspectRequest
> [member=192.168.1.18(34058)<ec><v2>:32772, reason=Member isn't responding
> to heartbeat requests]]] from non-member 192.168.1.18<ec><v1>:32771
>
> [vm1] [info 2018/03/22 19:16:29.180 PDT <unicast
> receiver,Kirks-MacBook-Pro-22774> tid=41] Membership ignoring suspect
> request for SuspectMembersMessage [suspectRequests=[SuspectRequest
> [member=192.168.1.18(34058)<ec><v2>:32772, reason=Member isn't responding
> to heartbeat requests]]] from non-member 192.168.1.18<ec><v1>:32771
>
> [vm1] [info 2018/03/22 19:16:29.681 PDT <unicast
> receiver,Kirks-MacBook-Pro-22774> tid=41] Membership ignoring suspect
> request for SuspectMembersMessage [suspectRequests=[SuspectRequest
> [member=192.168.1.18(34058)<ec><v2>:32772, reason=Member isn't responding
> to heartbeat requests]]] from non-member 192.168.1.18<ec><v1>:32771
>
> [vm1] [info 2018/03/22 19:16:30.183 PDT <unicast
> receiver,Kirks-MacBook-Pro-22774> tid=41] Membership ignoring suspect
> request for SuspectMembersMessage [suspectRequests=[SuspectRequest
> [member=192.168.1.18(34059:locator)<ec><v0>:32770, reason=Member isn't
> responding to heartbeat requests]]] from non-member
> 192.168.1.18<ec><v1>:32771
>
> KIRK: vm0 surprisingly continues logging -- why is it suspecting other
> members? why does it still have GMSHealthMonitor running? why does it log
> the message twice? and again logging it twice looks like a minor bug to me
> unless all of this double logging that I'm seeing means membership is doing
> twice the work which would be horrible (please put my mind to rest about
> this)
>
> [vm0] [info 2018/03/22 19:06:40.319 PDT <Geode Failure Detection
> Scheduler> tid=41] All other members are suspect at this point
>
> [vm0] [info 2018/03/22 19:06:40.320 PDT <Geode Failure Detection
> Scheduler> tid=41] All other members are suspect at this point
>
> KIRK: after the above 2 messages vm1 continues logging about suspect (why?
> vm0 gracefully left!)
>
> [vm1] [info 2018/03/22 19:16:30.682 PDT <unicast
> receiver,Kirks-MacBook-Pro-22774> tid=41] Membership ignoring suspect
> request for SuspectMembersMessage [suspectRequests=[SuspectRequest
> [member=192.168.1.18(34059:locator)<ec><v0>:32770, reason=Member isn't
> responding to heartbeat requests]]] from non-member
> 192.168.1.18<ec><v1>:32771
>
> [vm1] [info 2018/03/22 19:16:31.181 PDT <unicast
> receiver,Kirks-MacBook-Pro-22774> tid=41] Membership ignoring suspect
> request for SuspectMembersMessage [suspectRequests=[SuspectRequest
> [member=192.168.1.18(34059:locator)<ec><v0>:32770, reason=Member isn't
> responding to heartbeat requests]]] from non-member
> 192.168.1.18<ec><v1>:32771
>
> [vm1] [info 2018/03/22 19:16:31.680 PDT <unicast
> receiver,Kirks-MacBook-Pro-22774> tid=41] Membership ignoring suspect
> request for SuspectMembersMessage [suspectRequests=[SuspectRequest
> [member=192.168.1.18(34059:locator)<ec><v0>:32770, reason=Member isn't
> responding to heartbeat requests]]] from non-member
> 192.168.1.18<ec><v1>:32771
>
>

Reply via email to