[ https://issues.apache.org/jira/browse/GEODE-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352894#comment-17352894 ]
Barrett Oglesby commented on GEODE-9299: ---------------------------------------- If I simulate this behavior with a sleep on key=5 in Put65, I see the same extra event in the queue. Keys 0-4 are processed normally in servers 1 and 2: Server 1: {noformat} ServerConnection on port 57561 Thread 1: Put65.cmdExecute processing key=0 ServerConnection on port 57561 Thread 1: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=0; shadowKey=113 ServerConnection on port 57561 Thread 1: Put65.cmdExecute processing key=1 P2P message reader for 10.166.145.22(ln-2:85040)<v28>:41003 unshared ordered uid=8 dom #2 port=57607: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=1; shadowKey=114 ServerConnection on port 57561 Thread 1: Put65.cmdExecute processing key=2 P2P message reader for 10.166.145.22(ln-2:85040)<v28>:41003 unshared ordered uid=8 dom #2 port=57607: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=2; shadowKey=115 ServerConnection on port 57561 Thread 1: Put65.cmdExecute processing key=3 ServerConnection on port 57561 Thread 1: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=3; shadowKey=116 ServerConnection on port 57561 Thread 1: Put65.cmdExecute processing key=4 P2P message reader for 10.166.145.22(ln-2:85040)<v28>:41003 unshared ordered uid=8 dom #2 port=57607: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=4; shadowKey=117 {noformat} Server 2: {noformat} P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered uid=8 dom #1 port=57606: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=0; shadowKey=113 P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered uid=8 dom #1 port=57606: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=1; shadowKey=114 P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered uid=8 dom #1 port=57606: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=2; shadowKey=115 P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered uid=8 dom #1 port=57606: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=3; shadowKey=116 P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered uid=8 dom #1 port=57606: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=4; shadowKey=117 {noformat} The ServerConnection thread in server 1 sleeps before processing key=5: {noformat} ServerConnection on port 57561 Thread 1: Put65.cmdExecute processing key=5 ServerConnection on port 57561 Thread 1: Put65.cmdExecute sleeping key=5 {noformat} The client times out and fails over to server2 and retries key=5 and continues with keys 6-9. Notice the event with key=5 has shadowKey=118. Thats the key in the queue. {noformat} ServerConnection on port 57587 Thread 2: Put65.cmdExecute processing retried key=5 P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered uid=10 dom #2 port=57668: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=5; shadowKey=118 ServerConnection on port 57587 Thread 2: Put65.cmdExecute processing key=6 ServerConnection on port 57587 Thread 2: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=6; shadowKey=119 ServerConnection on port 57587 Thread 2: Put65.cmdExecute processing key=7 P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered uid=10 dom #2 port=57668: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=7; shadowKey=120 ServerConnection on port 57587 Thread 2: Put65.cmdExecute processing key=8 ServerConnection on port 57587 Thread 2: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=8; shadowKey=121 ServerConnection on port 57587 Thread 2: Put65.cmdExecute processing key=9 P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered uid=10 dom #2 port=57668: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=9; shadowKey=122 {noformat} Server 1 enqueues keys 5-9: {noformat} P2P message reader for 10.166.145.22(ln-2:85040)<v28>:41003 unshared ordered uid=10 dom #1 port=57664: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=5; shadowKey=118 P2P message reader for 10.166.145.22(ln-2:85040)<v28>:41003 unshared ordered uid=10 dom #1 port=57664: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=6; shadowKey=119 P2P message reader for 10.166.145.22(ln-2:85040)<v28>:41003 unshared ordered uid=10 dom #1 port=57664: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=7; shadowKey=120 P2P message reader for 10.166.145.22(ln-2:85040)<v28>:41003 unshared ordered uid=10 dom #1 port=57664: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=8; shadowKey=121 P2P message reader for 10.166.145.22(ln-2:85040)<v28>:41003 unshared ordered uid=10 dom #1 port=57664: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=9; shadowKey=122 {noformat} The ServerConnection thread in server 1 wakes up and processes key=5. DistributedEventTracker.hasSeenEvent realizes its a duplicate for the data region. That doesn't precent the enqueueing of the event. Notice the event is the same EventID, but a different shadowKey (231). So, this is a second event in the queue for the same event. {noformat} ServerConnection on port 57561 Thread 1: Put65.cmdExecute woke up key=5 ServerConnection on port 57561 Thread 1: DistributedEventTracker.hasSeenEvent event=EntryEventImpl[op=UPDATE;region=/__PR/_B__data_5;key=5;id=EventID[id=25 bytes;threadID=1;sequenceID=5];tailKey=231]; highestSequenceNumber=5 ServerConnection on port 57561 Thread 1: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=5; shadowKey=231 {noformat} Server 2 processes that duplicate event the same way: {noformat} P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered uid=8 dom #1 port=57606: DistributedEventTracker.hasSeenEvent event=EntryEventImpl[op=UPDATE;region=/__PR/_B__data_5;key=5;id=EventID[id=25 bytes;threadID=1;sequenceID=5];routing=;tailKey=231]; highestSequenceNumber=5 P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered uid=8 dom #1 port=57606: ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=5; shadowKey=231 {noformat} Dumping the queues on each server shows 11 events in each queue with 2 events for key=5. Server 1: {noformat} The queue for parallel GatewaySender ny contains the following 6 primary entries: queueKey=113; region=/data; operation=CREATE; eventKey=0; value=0; possibleDuplicate=true; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10000|1;sequenceID=0;bucketID=0] queueKey=116; region=/data; operation=CREATE; eventKey=3; value=3; possibleDuplicate=true; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10003|1;sequenceID=3;bucketID=3] queueKey=118; region=/data; operation=CREATE; eventKey=5; value=5; possibleDuplicate=true; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10005|1;sequenceID=5;bucketID=5] queueKey=120; region=/data; operation=CREATE; eventKey=7; value=7; possibleDuplicate=true; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10007|1;sequenceID=7;bucketID=7] queueKey=122; region=/data; operation=CREATE; eventKey=9; value=9; possibleDuplicate=true; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10009|1;sequenceID=9;bucketID=9] queueKey=231; region=/data; operation=UPDATE; eventKey=5; value=5; possibleDuplicate=true; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10005|1;sequenceID=5;bucketID=5] The queue for parallel GatewaySender ny contains the following 5 secondary entries: queueKey=114; region=/data; operation=CREATE; eventKey=1; value=1; possibleDuplicate=false; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10001|1;sequenceID=1;bucketID=1] queueKey=115; region=/data; operation=CREATE; eventKey=2; value=2; possibleDuplicate=false; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10002|1;sequenceID=2;bucketID=2] queueKey=117; region=/data; operation=CREATE; eventKey=4; value=4; possibleDuplicate=false; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10004|1;sequenceID=4;bucketID=4] queueKey=119; region=/data; operation=CREATE; eventKey=6; value=6; possibleDuplicate=false; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10006|1;sequenceID=6;bucketID=6] queueKey=121; region=/data; operation=CREATE; eventKey=8; value=8; possibleDuplicate=false; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10008|1;sequenceID=8;bucketID=8] {noformat} Server 2: {noformat} The queue for parallel GatewaySender ny contains the following 5 primary entries: queueKey=114; region=/data; operation=CREATE; eventKey=1; value=1; possibleDuplicate=true; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10001|1;sequenceID=1;bucketID=1] queueKey=115; region=/data; operation=CREATE; eventKey=2; value=2; possibleDuplicate=true; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10002|1;sequenceID=2;bucketID=2] queueKey=117; region=/data; operation=CREATE; eventKey=4; value=4; possibleDuplicate=true; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10004|1;sequenceID=4;bucketID=4] queueKey=119; region=/data; operation=CREATE; eventKey=6; value=6; possibleDuplicate=true; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10006|1;sequenceID=6;bucketID=6] queueKey=121; region=/data; operation=CREATE; eventKey=8; value=8; possibleDuplicate=true; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10008|1;sequenceID=8;bucketID=8] The queue for parallel GatewaySender ny contains the following 6 secondary entries: queueKey=113; region=/data; operation=CREATE; eventKey=0; value=0; possibleDuplicate=false; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10000|1;sequenceID=0;bucketID=0] queueKey=116; region=/data; operation=CREATE; eventKey=3; value=3; possibleDuplicate=false; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10003|1;sequenceID=3;bucketID=3] queueKey=118; region=/data; operation=CREATE; eventKey=5; value=5; possibleDuplicate=false; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10005|1;sequenceID=5;bucketID=5] queueKey=120; region=/data; operation=CREATE; eventKey=7; value=7; possibleDuplicate=false; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10007|1;sequenceID=7;bucketID=7] queueKey=122; region=/data; operation=CREATE; eventKey=9; value=9; possibleDuplicate=false; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10009|1;sequenceID=9;bucketID=9] queueKey=231; region=/data; operation=UPDATE; eventKey=5; value=5; possibleDuplicate=true; eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10005|1;sequenceID=5;bucketID=5] {noformat} If I start the remote site, I see logging in DistributedEventTracker.hasSeenEvent that ignores the duplicate event, and the region size is correct: {noformat} ServerConnection on port 5417 Thread 8: DistributedEventTracker.hasSeenEvent event=EntryEventImpl[op=UPDATE;region=/__PR/_B__data_5;key=5;id=EventID[id=25 bytes;threadID=0x10005|1;sequenceID=5;bucketId=5];tailKey=231]; highestSequenceNumber=5 Region name=data; size=10 {noformat} I think the way to address this issue is to change the equals check to greater than or equals. > CI Failure: > WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover > > testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover > -------------------------------------------------------------------------------------------------------------------------------------------------- > > Key: GEODE-9299 > URL: https://issues.apache.org/jira/browse/GEODE-9299 > Project: Geode > Issue Type: Bug > Components: wan > Affects Versions: 1.15.0 > Reporter: Hale Bales > Assignee: Barrett Oglesby > Priority: Major > > {code:java} > org.apache.geode.cache.wan.WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover > > testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from_v1.12.2] > FAILED > java.lang.AssertionError: expected:<100> but was:<101> > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.failNotEquals(Assert.java:835) > at org.junit.Assert.assertEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:633) > at > org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest.stopSenderAndVerifyEvents(WANRollingUpgradeDUnitTest.java:227) > at > org.apache.geode.cache.wan.WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover.testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover(WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover.java:98) > {code} > CI Failure: > https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UpgradeTestOpenJDK11/builds/229#B > Artifacts Available here: > http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0253/test-results/upgradeTest/1621635640/ -- This message was sent by Atlassian Jira (v8.3.4#803005)