Eric Shu created GEODE-8465:
-------------------------------

             Summary: A stray event can resurrect on a client due to a race 
condition
                 Key: GEODE-8465
                 URL: https://issues.apache.org/jira/browse/GEODE-8465
             Project: Geode
          Issue Type: Bug
          Components: client queues
            Reporter: Eric Shu


A member with primary queue could send the QRM for the event before it removes 
the event from its HARegionQueue – this causes a race that a newly started 
member can gii and get the events from the member with primary queue after it 
has already sent the QRM for the event to existing members holding the 
secondary queue.

The following analysis showing that the race exists. 

{noformat}
The stray create of Object_5234on edge 10634 shows this issue with this cause.
>From edge 10634, we know the stray create is for v3 of the object and from 
>bridge 15141.
[fine 2020/06/17 16:38:49.621 PDT edgegemfire_2_1_host1_10634 <Cache Client 
Updater Thread on 
rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_15141:15141)<ec><v35>:41007(version:UNKNOWN[ordinal=125])
 port 26905> tid=0x541] VersionedThinRegionEntryHeapStringKey2@1b56c386 
(key=Object_5234; rawValue=VMCachedDeserializable@526996532; version=

{v3; rv28; 
mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750)<ec><v27>:41007;
 ds=2; time=1592434556093}
;member=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750)<ec><v27>:41007)
 dispatching event 
EntryEventImpl[op=CREATE;region=/DefaultRegion;key=Object_5234;oldValue=null;newValue=VMCachedDeserializable@526996532;callbackArg=Update
 event originated in pid 
10655;originRemote=true;originMember=rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655;version=

{v3; rv28; 
mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750)<ec><v27>:41007;
 ds=2; time=1592434556093; remote}
;id=EventIDid=52bytes;threadID=52;sequenceID=108];isFromServer]
[fine 2020/06/17 16:38:49.621 PDT edgegemfire_2_1_host1_10634 <Cache Client 
Updater Thread on 
rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_15141:15141)<ec><v35>:41007(version:UNKNOWN[ordinal=125])
 port 26905> tid=0x541] Put entry for region: /DefaultRegion key: Object_5234 
callbackArgument: Update event originated in pid 10655

***This version (v3) is an update. The primary queue at the time is 
bridgegemfire_2_2_host1_10538. The event is enqueued and dispatched ***
[fine 2020/06/17 15:55:56.098 PDT bridgegemfire_2_2_host1_10538 <P2P message 
reader for 
rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125])
 unshared ordered uid=358 dom #1 port=43914> tid=0x16d] DefaultRegion: 
notifying 1 bridge servers of event: 
EntryEventImpl[op=UPDATE;region=/DefaultRegion;key=Object_5234;oldValue=VMCachedDeserializable@1538889434;newValue=VMCachedDeserializable@2141181294;callbackArg=Update
 event originated in pid 
10655;originRemote=true;originMember=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125]);version=

{v3; rv28; 
mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125]);
 ds=2; time=1592434556093; remote}
;context=identity(rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655,connection=1;id=EventIDid=52bytes;threadID=52;sequenceID=108];routing=]

[fine 2020/06/17 15:55:56.098 PDT bridgegemfire_2_2_host1_10538 <P2P message 
reader for 
rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125])
 unshared ordered uid=358 dom #1 port=43914> tid=0x16d] 
VMStatsRegionEntryHeapLongKey@38ccc0e5 (key=3867; 
rawValue=HAEventWrapper[refCount=2; 
msg=ClientUpdateMessageImpl[op=AFTER_UPDATE;region=/DefaultRegion;key=Object_5234;isObject=1;cbArg=Update
 event originated in pid 
10655;memberId=identity(rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655,connection=1;eventId=EventIDid=52bytes;threadID=52;sequenceID=108];shouldConflate=false;versionTag=

{v3; rv28; 
mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125]);
 ds=2; time=1592434556093; remote}
;hasCqs=false]]) dispatching event 
EntryEventImpl[op=CREATE;region=/_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue;key=3867;oldValue=null;newValue=HAEventWrapper[refCount=2;
 
msg=ClientUpdateMessageImpl[op=AFTER_UPDATE;region=/DefaultRegion;key=Object_5234;isObject=1;cbArg=Update
 event originated in pid 
10655;memberId=identity(rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655,connection=1;eventId=EventIDid=52bytes;threadID=52;sequenceID=108];shouldConflate=false;versionTag=

{v3; rv28; 
mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125]);
 ds=2; time=1592434556093; remote}
;hasCqs=false]];callbackArg=null;originRemote=false;originMember=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_2_host1_10538:10538)<ec><v2>:41006]

***QRM is sent to the 2 peers at the time with the region ***
[fine 2020/06/17 15:55:56.213 PDT bridgegemfire_2_2_host1_10538 <Queue Removal 
Thread> tid=0x83] Sending 
(QueueRemovalMessage[_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue,
 1, EventIDid=52bytes;threadID=52;sequenceID=108], 
_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_3_host1_10666:10666:loner):60462:26db73c4:edgegemfire_2_3_host1_10666_1_queue,
 1, EventIDid=52bytes;threadID=52;sequenceID=108]]) to 2 peers 
([rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125])@277,
 
rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_4_host1_23574:23574)<ec><v31>:41011(version:UNKNOWN[ordinal=125])@566])
 via tcp/ip

***the newly started bridgegemfire_2_3_24320 creates the HARegionQueue and gii 
from the member with primary queue (10538) ***
[info 2020/06/17 15:56:08.800 PDT <Client Queue Initialization Thread 1> 
tid=0x76] Initializing region 
_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue

[info 2020/06/17 15:56:09.285 PDT <Client Queue Initialization Thread 1> 
tid=0x76] 
_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue
 is done getting image from 
rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_2_host1_10538:10538)<ec><v2>:41006(version:GEODE
 1.3.0). isDeltaGII is false

[debug 2020/06/17 15:56:09.377 PDT <Client Queue Initialization Thread 1> 
tid=0x76] adding haContainerKey to HARegion at 
1:HAEventWrapper[region=/DefaultRegion; key=Object_5234; refCount=1; 
putInProgress=0; 
event=EventID[rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655;threadID=52;sequenceID=108];
 no message] for 
_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue

[debug 2020/06/17 15:56:09.377 PDT <Client Queue Initialization Thread 1> 
tid=0x76] invoking listeners: 
[org.apache.geode.internal.cache.ha.HARegionQueue$1@2e69472b]

[debug 2020/06/17 15:56:09.377 PDT <Client Queue Initialization Thread 1> 
tid=0x76] Adding position 1 to available IDs. Region: 
_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue

[debug 2020/06/17 15:56:09.377 PDT <Client Queue Initialization Thread 1> 
tid=0x76] RegionQueue on 
_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue(backup)
 done putting GII data into queue

***Please note that the event was being removed from its primary queue after 
QRM has been sent and bridge 24320 gii the events from it ***
[fine 2020/06/17 15:56:09.666 PDT bridgegemfire_2_2_host1_10538 <Client Message 
Dispatcher for 
rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634>
 tid=0x84] VMStatsRegionEntryHeapLongKey@38ccc0e5 (key=3867; 
rawValue=REMOVED_PHASE2) dispatching event 
EntryEventImpl[op=LOCAL_DESTROY;region=/_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue;key=3867;oldValue=HAEventWrapper[region=/DefaultRegion;
 key=Object_5234; refCount=1; inContainer=true; putInProgress=false; 
event=EventIDid=52bytes;threadID=52;sequenceID=108]; no 
message];newValue=null;callbackArg=null;originRemote=false;originMember=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_2_host1_10538:10538)<ec><v2>:41006]

***Bridge 24320 now has the event for Object_5234(v3) that has been sent to the 
client and won't get QRM again ***

***Later, bridge 15141 gii from 24320 and got the event ***
[info 2020/06/17 16:17:36.454 PDT <Client Queue Initialization Thread 2> 
tid=0x78] Initializing region 
_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue

[info 2020/06/17 16:17:40.187 PDT <Client Queue Initialization Thread 2> 
tid=0x78] 
_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue
 is done getting image from 
rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_3_host1_24320:24320)<ec><v33>:41010.
 isDeltaGII is false

[debug 2020/06/17 16:17:40.415 PDT <Client Queue Initialization Thread 2> 
tid=0x78] 
_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue
 processing queue key 1 and value HAEventWrapper[region=/DefaultRegion; 
key=Object_5234; refCount=1; putInProgress=0; 
event=EventID[rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655;threadID=52;sequenceID=108];
 no message]

***It will send the event to client after it becomes the primary queue holder 
for client 10634 ***
[debug 2020/06/17 16:38:49.125 PDT <ServerConnection on port 26905 Thread 5> 
tid=0xb4] 
CacheClientProxy[identity(rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634,connection=1;
 port=53006; primary=true; version=GEODE 1.3.0]: Queueing marker message. 
<org.apache.geode.internal.cache.tier.sockets.ClientMarkerMessageImpl@75bf95c8>.
 The queue contains 57 entries.
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to