[ https://issues.apache.org/jira/browse/GEODE-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Shu resolved GEODE-8465. ----------------------------- Fix Version/s: 1.14.0 Resolution: Fixed > A stray event can resurrect on a client due to a race condition > --------------------------------------------------------------- > > Key: GEODE-8465 > URL: https://issues.apache.org/jira/browse/GEODE-8465 > Project: Geode > Issue Type: Bug > Components: client queues > Affects Versions: 1.1.0 > Reporter: Eric Shu > Assignee: Eric Shu > Priority: Major > Labels: caching-applications, pull-request-available > Fix For: 1.14.0 > > > A member with primary queue could send the QRM for the event before it > removes the event from its HARegionQueue – this causes a race that a newly > started member can gii and get the events from the member with primary queue > after it has already sent the QRM for the event to existing members holding > the secondary queue. > The following analysis showing that the race exists. > {noformat} > The stray create of Object_5234on edge 10634 shows this issue with this cause. > From edge 10634, we know the stray create is for v3 of the object and from > bridge 15141. > [fine 2020/06/17 16:38:49.621 PDT edgegemfire_2_1_host1_10634 <Cache Client > Updater Thread on > rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_15141:15141)<ec><v35>:41007(version:UNKNOWN[ordinal=125]) > port 26905> tid=0x541] VersionedThinRegionEntryHeapStringKey2@1b56c386 > (key=Object_5234; rawValue=VMCachedDeserializable@526996532; version= > {v3; rv28; > mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750)<ec><v27>:41007; > ds=2; time=1592434556093} > ;member=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750)<ec><v27>:41007) > dispatching event > EntryEventImpl[op=CREATE;region=/DefaultRegion;key=Object_5234;oldValue=null;newValue=VMCachedDeserializable@526996532;callbackArg=Update > event originated in pid > 10655;originRemote=true;originMember=rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655;version= > {v3; rv28; > mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750)<ec><v27>:41007; > ds=2; time=1592434556093; remote} > ;id=EventIDid=52bytes;threadID=52;sequenceID=108];isFromServer] > [fine 2020/06/17 16:38:49.621 PDT edgegemfire_2_1_host1_10634 <Cache Client > Updater Thread on > rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_15141:15141)<ec><v35>:41007(version:UNKNOWN[ordinal=125]) > port 26905> tid=0x541] Put entry for region: /DefaultRegion key: Object_5234 > callbackArgument: Update event originated in pid 10655 > ***This version (v3) is an update. The primary queue at the time is > bridgegemfire_2_2_host1_10538. The event is enqueued and dispatched *** > [fine 2020/06/17 15:55:56.098 PDT bridgegemfire_2_2_host1_10538 <P2P message > reader for > rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125]) > unshared ordered uid=358 dom #1 port=43914> tid=0x16d] DefaultRegion: > notifying 1 bridge servers of event: > EntryEventImpl[op=UPDATE;region=/DefaultRegion;key=Object_5234;oldValue=VMCachedDeserializable@1538889434;newValue=VMCachedDeserializable@2141181294;callbackArg=Update > event originated in pid > 10655;originRemote=true;originMember=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125]);version= > {v3; rv28; > mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125]); > ds=2; time=1592434556093; remote} > ;context=identity(rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655,connection=1;id=EventIDid=52bytes;threadID=52;sequenceID=108];routing=] > [fine 2020/06/17 15:55:56.098 PDT bridgegemfire_2_2_host1_10538 <P2P message > reader for > rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125]) > unshared ordered uid=358 dom #1 port=43914> tid=0x16d] > VMStatsRegionEntryHeapLongKey@38ccc0e5 (key=3867; > rawValue=HAEventWrapper[refCount=2; > msg=ClientUpdateMessageImpl[op=AFTER_UPDATE;region=/DefaultRegion;key=Object_5234;isObject=1;cbArg=Update > event originated in pid > 10655;memberId=identity(rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655,connection=1;eventId=EventIDid=52bytes;threadID=52;sequenceID=108];shouldConflate=false;versionTag= > {v3; rv28; > mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125]); > ds=2; time=1592434556093; remote} > ;hasCqs=false]]) dispatching event > EntryEventImpl[op=CREATE;region=/_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue;key=3867;oldValue=null;newValue=HAEventWrapper[refCount=2; > > msg=ClientUpdateMessageImpl[op=AFTER_UPDATE;region=/DefaultRegion;key=Object_5234;isObject=1;cbArg=Update > event originated in pid > 10655;memberId=identity(rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655,connection=1;eventId=EventIDid=52bytes;threadID=52;sequenceID=108];shouldConflate=false;versionTag= > {v3; rv28; > mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125]); > ds=2; time=1592434556093; remote} > ;hasCqs=false]];callbackArg=null;originRemote=false;originMember=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_2_host1_10538:10538)<ec><v2>:41006] > ***QRM is sent to the 2 peers at the time with the region *** > [fine 2020/06/17 15:55:56.213 PDT bridgegemfire_2_2_host1_10538 <Queue > Removal Thread> tid=0x83] Sending > (QueueRemovalMessage[_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue, > 1, EventIDid=52bytes;threadID=52;sequenceID=108], > _gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_3_host1_10666:10666:loner):60462:26db73c4:edgegemfire_2_3_host1_10666_1_queue, > 1, EventIDid=52bytes;threadID=52;sequenceID=108]]) to 2 peers > ([rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125])@277, > > rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_4_host1_23574:23574)<ec><v31>:41011(version:UNKNOWN[ordinal=125])@566]) > via tcp/ip > ***the newly started bridgegemfire_2_3_24320 creates the HARegionQueue and > gii from the member with primary queue (10538) *** > [info 2020/06/17 15:56:08.800 PDT <Client Queue Initialization Thread 1> > tid=0x76] Initializing region > _gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue > [info 2020/06/17 15:56:09.285 PDT <Client Queue Initialization Thread 1> > tid=0x76] > _gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue > is done getting image from > rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_2_host1_10538:10538)<ec><v2>:41006(version:GEODE > 1.3.0). isDeltaGII is false > [debug 2020/06/17 15:56:09.377 PDT <Client Queue Initialization Thread 1> > tid=0x76] adding haContainerKey to HARegion at > 1:HAEventWrapper[region=/DefaultRegion; key=Object_5234; refCount=1; > putInProgress=0; > event=EventID[rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655;threadID=52;sequenceID=108]; > no message] for > _gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue > [debug 2020/06/17 15:56:09.377 PDT <Client Queue Initialization Thread 1> > tid=0x76] invoking listeners: > [org.apache.geode.internal.cache.ha.HARegionQueue$1@2e69472b] > [debug 2020/06/17 15:56:09.377 PDT <Client Queue Initialization Thread 1> > tid=0x76] Adding position 1 to available IDs. Region: > _gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue > [debug 2020/06/17 15:56:09.377 PDT <Client Queue Initialization Thread 1> > tid=0x76] RegionQueue on > _gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue(backup) > done putting GII data into queue > ***Please note that the event was being removed from its primary queue after > QRM has been sent and bridge 24320 gii the events from it *** > [fine 2020/06/17 15:56:09.666 PDT bridgegemfire_2_2_host1_10538 <Client > Message Dispatcher for > rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634> > tid=0x84] VMStatsRegionEntryHeapLongKey@38ccc0e5 (key=3867; > rawValue=REMOVED_PHASE2) dispatching event > EntryEventImpl[op=LOCAL_DESTROY;region=/_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue;key=3867;oldValue=HAEventWrapper[region=/DefaultRegion; > key=Object_5234; refCount=1; inContainer=true; putInProgress=false; > event=EventIDid=52bytes;threadID=52;sequenceID=108]; no > message];newValue=null;callbackArg=null;originRemote=false;originMember=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_2_host1_10538:10538)<ec><v2>:41006] > ***Bridge 24320 now has the event for Object_5234(v3) that has been sent to > the client and won't get QRM again *** > ***Later, bridge 15141 gii from 24320 and got the event *** > [info 2020/06/17 16:17:36.454 PDT <Client Queue Initialization Thread 2> > tid=0x78] Initializing region > _gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue > [info 2020/06/17 16:17:40.187 PDT <Client Queue Initialization Thread 2> > tid=0x78] > _gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue > is done getting image from > rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_3_host1_24320:24320)<ec><v33>:41010. > isDeltaGII is false > [debug 2020/06/17 16:17:40.415 PDT <Client Queue Initialization Thread 2> > tid=0x78] > _gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue > processing queue key 1 and value HAEventWrapper[region=/DefaultRegion; > key=Object_5234; refCount=1; putInProgress=0; > event=EventID[rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655;threadID=52;sequenceID=108]; > no message] > ***It will send the event to client after it becomes the primary queue holder > for client 10634 *** > [debug 2020/06/17 16:38:49.125 PDT <ServerConnection on port 26905 Thread 5> > tid=0xb4] > CacheClientProxy[identity(rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634,connection=1; > port=53006; primary=true; version=GEODE 1.3.0]: Queueing marker message. > <org.apache.geode.internal.cache.tier.sockets.ClientMarkerMessageImpl@75bf95c8>. > The queue contains 57 entries. > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)