If anyone really wants to look at and run the test directly, it's *Bug41733DUnitTest* on develop.
Alternatively, you could pull the branch GEODE-1279-rename-regressionTests from my fork (see *https://github.com/apache/geode/pull/1672 <https://github.com/apache/geode/pull/1672>*) where I've renamed the test as *BucketCreationRequesterCrashHARegressionTest*, cleaned it up and temporarily added a locator to play with membership config (to see what effect that has on the membership logging and execution time). On Thu, Mar 22, 2018 at 7:40 PM, Kirk Lund <kl...@apache.org> wrote: > I'm looking through logs from a DUnit test that takes longer to run than I > think it should. It looks to me like membership is chewing up more time > than it should, and I have questions about some membership log messages. > > Specifically, I'm worried about: > 1) double-logging by vm1 when shutdown message is received from vm0 > > 2) double-logging by GMSHealthMonitor in vm0 AFTER it disconnected > > 3) vm1 KEEPS suspecting vm0 well after a) vm1 received shutdown message > from vm0, b) processed a view with vm0 departed and c) even logged that vm0 > gracefully left. (vm1 seems a bit schizophrenic about vm0) > > Below are the log messages with my comments interspersed. Please help me > understand what's actually going on here and if we're seeing undesired > behavior (ie minor bugs)... > > KIRK: vm0 is shutting down gracefully > > [vm0] [info 2018/03/22 19:16:27.010 PDT <Pooled High Priority Message > Processor 2> tid=87] Shutting down DistributionManager > 192.168.1.18(34057)<ec><v1>:32771. > > [vm1] [info 2018/03/22 19:16:27.014 PDT <Pooled High Priority Message > Processor 2> tid=66] received leave request from > 192.168.1.18(34057)<ec><v1>:32771 for 192.168.1.18(34057)<ec><v1>:32771 > > KIRK: vm1 states that vm0 gracefully left and logs it twice which seems > like a minor bug > > [vm1] [info 2018/03/22 19:16:27.016 PDT <Pooled High Priority Message > Processor 2> tid=66] Member at 192.168.1.18(34057)<ec><v1>:32771 > gracefully left the distributed cache: shutdown message received > > [vm1] [info 2018/03/22 19:16:27.016 PDT <Pooled High Priority Message > Processor 2> tid=66] Member at 192.168.1.18(34057)<ec><v1>:32771 > gracefully left the distributed cache: shutdown message received > > [vm1] [info 2018/03/22 19:16:27.332 PDT <unicast > receiver,Kirks-MacBook-Pro-22774> tid=41] received new view: > View[192.168.1.18(34059:locator)<ec><v0>:32770|3] members: > [192.168.1.18(34059:locator)<ec><v0>:32770, > 192.168.1.18(34058)<ec><v2>:32772{lead}] > shutdown: [192.168.1.18(34057)<ec><v1>:32771] > [vm1] old view is: View[192.168.1.18(34059:locator)<ec><v0>:32770|2] > members: [192.168.1.18(34059:locator)<ec><v0>:32770, > 192.168.1.18(34057)<ec><v1>:32771{lead}, 192.168.1.18(34058)<ec><v2>: > 32772] > > KIRK: even tho vm1 just received a new view it initiates suspect > processing for vm0 which also seems like a minor bug > > [vm1] [info 2018/03/22 19:16:28.186 PDT <unicast > receiver,Kirks-MacBook-Pro-22774> tid=41] Membership ignoring suspect > request for SuspectMembersMessage [suspectRequests=[SuspectRequest > [member=192.168.1.18(34058)<ec><v2>:32772, reason=Member isn't responding > to heartbeat requests]]] from non-member 192.168.1.18<ec><v1>:32771 > > [vm1] [info 2018/03/22 19:16:28.683 PDT <unicast > receiver,Kirks-MacBook-Pro-22774> tid=41] Membership ignoring suspect > request for SuspectMembersMessage [suspectRequests=[SuspectRequest > [member=192.168.1.18(34058)<ec><v2>:32772, reason=Member isn't responding > to heartbeat requests]]] from non-member 192.168.1.18<ec><v1>:32771 > > [vm1] [info 2018/03/22 19:16:29.180 PDT <unicast > receiver,Kirks-MacBook-Pro-22774> tid=41] Membership ignoring suspect > request for SuspectMembersMessage [suspectRequests=[SuspectRequest > [member=192.168.1.18(34058)<ec><v2>:32772, reason=Member isn't responding > to heartbeat requests]]] from non-member 192.168.1.18<ec><v1>:32771 > > [vm1] [info 2018/03/22 19:16:29.681 PDT <unicast > receiver,Kirks-MacBook-Pro-22774> tid=41] Membership ignoring suspect > request for SuspectMembersMessage [suspectRequests=[SuspectRequest > [member=192.168.1.18(34058)<ec><v2>:32772, reason=Member isn't responding > to heartbeat requests]]] from non-member 192.168.1.18<ec><v1>:32771 > > [vm1] [info 2018/03/22 19:16:30.183 PDT <unicast > receiver,Kirks-MacBook-Pro-22774> tid=41] Membership ignoring suspect > request for SuspectMembersMessage [suspectRequests=[SuspectRequest > [member=192.168.1.18(34059:locator)<ec><v0>:32770, reason=Member isn't > responding to heartbeat requests]]] from non-member > 192.168.1.18<ec><v1>:32771 > > KIRK: vm0 surprisingly continues logging -- why is it suspecting other > members? why does it still have GMSHealthMonitor running? why does it log > the message twice? and again logging it twice looks like a minor bug to me > unless all of this double logging that I'm seeing means membership is doing > twice the work which would be horrible (please put my mind to rest about > this) > > [vm0] [info 2018/03/22 19:06:40.319 PDT <Geode Failure Detection > Scheduler> tid=41] All other members are suspect at this point > > [vm0] [info 2018/03/22 19:06:40.320 PDT <Geode Failure Detection > Scheduler> tid=41] All other members are suspect at this point > > KIRK: after the above 2 messages vm1 continues logging about suspect (why? > vm0 gracefully left!) > > [vm1] [info 2018/03/22 19:16:30.682 PDT <unicast > receiver,Kirks-MacBook-Pro-22774> tid=41] Membership ignoring suspect > request for SuspectMembersMessage [suspectRequests=[SuspectRequest > [member=192.168.1.18(34059:locator)<ec><v0>:32770, reason=Member isn't > responding to heartbeat requests]]] from non-member > 192.168.1.18<ec><v1>:32771 > > [vm1] [info 2018/03/22 19:16:31.181 PDT <unicast > receiver,Kirks-MacBook-Pro-22774> tid=41] Membership ignoring suspect > request for SuspectMembersMessage [suspectRequests=[SuspectRequest > [member=192.168.1.18(34059:locator)<ec><v0>:32770, reason=Member isn't > responding to heartbeat requests]]] from non-member > 192.168.1.18<ec><v1>:32771 > > [vm1] [info 2018/03/22 19:16:31.680 PDT <unicast > receiver,Kirks-MacBook-Pro-22774> tid=41] Membership ignoring suspect > request for SuspectMembersMessage [suspectRequests=[SuspectRequest > [member=192.168.1.18(34059:locator)<ec><v0>:32770, reason=Member isn't > responding to heartbeat requests]]] from non-member > 192.168.1.18<ec><v1>:32771 > >