[ 
https://issues.apache.org/jira/browse/GEODE-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352893#comment-17352893
 ] 

Barrett Oglesby commented on GEODE-9299:
----------------------------------------

The failing assertion is verifying the number of entries in the local secondary 
queues is 100 (which matches the number of puts). Instead, it is 101.
{noformat}
int localServer1QueueSize = localServer1.invoke(() -> 
getQueueRegionSize(senderId, false));
int localServer2QueueSize = localServer2.invoke(() -> 
getQueueRegionSize(senderId, false));
assertEquals(numPuts, localServer1QueueSize + localServer2QueueSize);
{noformat}
Here is some logging that shows the behavior in this test.

Client Starts:
{noformat}
[vm3_v1.12.2] [info 2021/05/21 21:12:16.982 GMT  <RMI TCP 
Connection(2)-172.17.0.2> tid=0x22] Received method: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$146/0x00000001008afc40.run
 with 0 args on object: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$146/0x00000001008afc40@59079e6c
[vm3_v1.12.2] [info 2021/05/21 21:12:17.599 GMT  <RMI TCP 
Connection(2)-172.17.0.2> tid=0x22] Using 
org.apache.geode.logging.log4j.internal.impl.Log4jLoggingProvider from 
ServiceLoader for service org.apache.geode.logging.internal.spi.LoggingProvider
[vm3_v1.12.2] [info 2021/05/21 21:12:24.490 GMT  <poolTimer-DEFAULT-3> 
tid=0x32] Updating membership port.  Port changed from 0 to 46166.  ID is now 
7e72072330df(13685:loner):0:6094c590
[vm3_v1.12.2] [info 2021/05/21 21:12:24.526 GMT  <RMI TCP 
Connection(2)-172.17.0.2> tid=0x22] Got result: null
[vm3_v1.12.2]  from 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$146/0x00000001008afc40.run
 with 0 args on object: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$146/0x00000001008afc40@59079e6c
 (took 7538 ms)
{noformat}
Client does 100 puts in 22069ms with a SocketTimeoutException:
{noformat}
[vm3_v1.12.2] [info 2021/05/21 21:12:24.567 GMT  <RMI TCP 
Connection(2)-172.17.0.2> tid=0x22] Received method: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$339/0x0000000100959840.run
 with 0 args on object: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$339/0x0000000100959840@2e8f97c1
[vm3_v1.12.2] [warn 2021/05/21 21:12:42.233 GMT  <RMI TCP 
Connection(2)-172.17.0.2> tid=0x22] Pool unexpected socket timed out on client 
connection=Pooled Connection to 7e72072330df:21250: 
Connection[7e72072330df:21250]@93891194)
[vm3_v1.12.2] [info 2021/05/21 21:12:46.638 GMT  <RMI TCP 
Connection(2)-172.17.0.2> tid=0x22] Got result: null
[vm3_v1.12.2]  from 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$339/0x0000000100959840.run
 with 0 args on object: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$339/0x0000000100959840@2e8f97c1
 (took 22069 ms)
{noformat}
The SocketTimeoutException means the client retried the put. That ends up being 
2 puts for the same event.

Server 1 returns secondary queue size:
{noformat}
[vm1_v1.12.2] [info 2021/05/21 21:12:46.668 GMT  <RMI TCP 
Connection(3)-172.17.0.2> tid=0x22] Received method: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$520/0x0000000100ad1040.run
 with 0 args on object: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$520/0x0000000100ad1040@79d1c376
[vm1_v1.12.2] [info 2021/05/21 21:12:47.598 GMT  <RMI TCP 
Connection(3)-172.17.0.2> tid=0x22] Got result: null
[vm1_v1.12.2]  from 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$520/0x0000000100ad1040.run
 with 0 args on object: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$520/0x0000000100ad1040@79d1c376
 (took 929 ms)
{noformat}
Server 2 returns secondary queue size:
{noformat}
[vm2_v1.12.2] [info 2021/05/21 21:12:47.617 GMT  <RMI TCP 
Connection(3)-172.17.0.2> tid=0x22] Received method: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$517/0x0000000100ae2c40.run
 with 0 args on object: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$517/0x0000000100ae2c40@751350b6
[vm2_v1.12.2] [info 2021/05/21 21:12:47.782 GMT  <RMI TCP 
Connection(3)-172.17.0.2> tid=0x22] Got result: null
[vm2_v1.12.2]  from 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$517/0x0000000100ae2c40.run
 with 0 args on object: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$517/0x0000000100ae2c40@751350b6
 (took 161 ms)
{noformat}
The assertEquals check fails right after this, and the test shuts down. 

Here is some more detail.

Server 1 buckets are created:
{noformat}
[vm1_v1.12.2] [info 2021/05/21 21:12:24.771 GMT  <Pooled Waiting Message 
Processor 1> tid=0x39] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_0
[vm1_v1.12.2] [info 2021/05/21 21:12:24.847 GMT  <Pooled Waiting Message 
Processor 1> tid=0x39] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_0
 completed
[vm1_v1.12.2] [info 2021/05/21 21:12:25.418 GMT  <Pooled Waiting Message 
Processor 1> tid=0x39] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_1
[vm1_v1.12.2] [info 2021/05/21 21:12:25.439 GMT  <Pooled Waiting Message 
Processor 1> tid=0x39] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_1
 completed
[vm1_v1.12.2] [info 2021/05/21 21:12:26.012 GMT  <Pooled Waiting Message 
Processor 1> tid=0x39] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_2
[vm1_v1.12.2] [info 2021/05/21 21:12:26.055 GMT  <Pooled Waiting Message 
Processor 1> tid=0x39] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_2
 completed
[vm1_v1.12.2] [info 2021/05/21 21:12:26.823 GMT  <Pooled Waiting Message 
Processor 1> tid=0x39] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_3
[vm1_v1.12.2] [info 2021/05/21 21:12:26.842 GMT  <Pooled Waiting Message 
Processor 1> tid=0x39] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_3
 completed
[vm1_v1.12.2] [info 2021/05/21 21:12:28.299 GMT  <Pooled Waiting Message 
Processor 1> tid=0x39] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_4
[vm1_v1.12.2] [info 2021/05/21 21:12:28.317 GMT  <Pooled Waiting Message 
Processor 1> tid=0x39] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_4
 completed
[vm1_v1.12.2] [info 2021/05/21 21:12:42.903 GMT  <ServerConnection on port 
27499 Thread 1> tid=0x81] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_5
[vm1_v1.12.2] [info 2021/05/21 21:12:42.929 GMT  <ServerConnection on port 
27499 Thread 1> tid=0x81] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_5
 completed
[vm1_v1.12.2] [info 2021/05/21 21:12:43.102 GMT  <ServerConnection on port 
27499 Thread 1> tid=0x81] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_6
[vm1_v1.12.2] [info 2021/05/21 21:12:43.107 GMT  <ServerConnection on port 
27499 Thread 1> tid=0x81] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_6
 completed
[vm1_v1.12.2] [info 2021/05/21 21:12:43.661 GMT  <ServerConnection on port 
27499 Thread 1> tid=0x81] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_7
[vm1_v1.12.2] [info 2021/05/21 21:12:43.692 GMT  <ServerConnection on port 
27499 Thread 1> tid=0x81] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_7
 completed
[vm1_v1.12.2] [info 2021/05/21 21:12:44.095 GMT  <ServerConnection on port 
27499 Thread 1> tid=0x81] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_8
[vm1_v1.12.2] [info 2021/05/21 21:12:44.098 GMT  <ServerConnection on port 
27499 Thread 1> tid=0x81] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_8
 completed
[vm1_v1.12.2] [info 2021/05/21 21:12:44.366 GMT  <ServerConnection on port 
27499 Thread 1> tid=0x81] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_9
[vm1_v1.12.2] [info 2021/05/21 21:12:44.386 GMT  <ServerConnection on port 
27499 Thread 1> tid=0x81] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_9
 completed
{noformat}
Server 2 buckets are created:
{noformat}
[vm2_v1.12.2] [info 2021/05/21 21:12:24.643 GMT  <ServerConnection on port 
21250 Thread 1> tid=0x6f] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_0
[vm2_v1.12.2] [info 2021/05/21 21:12:24.657 GMT  <ServerConnection on port 
21250 Thread 1> tid=0x6f] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_0
 completed
[vm2_v1.12.2] [info 2021/05/21 21:12:25.506 GMT  <ServerConnection on port 
21250 Thread 1> tid=0x6f] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_1
[vm2_v1.12.2] [info 2021/05/21 21:12:25.539 GMT  <ServerConnection on port 
21250 Thread 1> tid=0x6f] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_1
 completed
[vm2_v1.12.2] [info 2021/05/21 21:12:25.935 GMT  <ServerConnection on port 
21250 Thread 1> tid=0x6f] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_2
[vm2_v1.12.2] [info 2021/05/21 21:12:25.939 GMT  <ServerConnection on port 
21250 Thread 1> tid=0x6f] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_2
 completed
[vm2_v1.12.2] [info 2021/05/21 21:12:27.963 GMT  <ServerConnection on port 
21250 Thread 1> tid=0x6f] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_3
[vm2_v1.12.2] [info 2021/05/21 21:12:28.107 GMT  <ServerConnection on port 
21250 Thread 1> tid=0x6f] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_3
 completed
[vm2_v1.12.2] [info 2021/05/21 21:12:28.364 GMT  <ServerConnection on port 
21250 Thread 1> tid=0x6f] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_4
[vm2_v1.12.2] [info 2021/05/21 21:12:28.410 GMT  <ServerConnection on port 
21250 Thread 1> tid=0x6f] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_4
 completed
[vm2_v1.12.2] [info 2021/05/21 21:12:42.804 GMT  <Pooled Waiting Message 
Processor 1> tid=0x39] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_5
[vm2_v1.12.2] [info 2021/05/21 21:12:42.809 GMT  <Pooled Waiting Message 
Processor 1> tid=0x39] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_5
 completed
[vm2_v1.12.2] [info 2021/05/21 21:12:43.179 GMT  <Pooled Waiting Message 
Processor 2> tid=0x7a] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_6
[vm2_v1.12.2] [info 2021/05/21 21:12:43.240 GMT  <Pooled Waiting Message 
Processor 2> tid=0x7a] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_6
 completed
[vm2_v1.12.2] [info 2021/05/21 21:12:43.543 GMT  <Pooled Waiting Message 
Processor 3> tid=0x7c] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_7
[vm2_v1.12.2] [info 2021/05/21 21:12:43.550 GMT  <Pooled Waiting Message 
Processor 3> tid=0x7c] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_7
 completed
[vm2_v1.12.2] [info 2021/05/21 21:12:44.132 GMT  <Pooled Waiting Message 
Processor 1> tid=0x39] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_8
[vm2_v1.12.2] [info 2021/05/21 21:12:44.176 GMT  <Pooled Waiting Message 
Processor 1> tid=0x39] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_8
 completed
[vm2_v1.12.2] [info 2021/05/21 21:12:44.297 GMT  <Pooled Waiting Message 
Processor 1> tid=0x39] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_9
[vm2_v1.12.2] [info 2021/05/21 21:12:44.300 GMT  <Pooled Waiting Message 
Processor 1> tid=0x39] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_9
 completed
{noformat}
There is a big gap between buckets 4 and 5 being created.

Server 2 has several interesting messages in the gap between 21:12:28 and 
21:12:42:
{noformat}
[vm2_v1.12.2] [warn 2021/05/21 21:12:35.588 GMT  <StatSampler> tid=0x3f] 
Statistics sampling thread detected a wakeup delay of 5627 ms, indicating a 
possible resource issue. Check the GC, memory, and CPU statistics.
[vm2_v1.12.2] [warn 2021/05/21 21:12:42.715 GMT  <ServerConnection on port 
21250 Thread 1> tid=0x6f] Server connection from 
[identity(172.17.0.2(13685:loner):46166:6094c590,connection=1; port=46166]: 
connection disconnect detected by EOF.
{noformat}
And, the client timed out at that time:
{noformat}
[vm3_v1.12.2] [warn 2021/05/21 21:12:42.233 GMT  <RMI TCP 
Connection(2)-172.17.0.2> tid=0x22] Pool unexpected socket timed out on client 
connection=Pooled Connection to 7e72072330df:21250: 
Connection[7e72072330df:21250]@93891194)
{noformat}
So, server 2 had a resource issue (CPU or GC most likely) that caused the 
client to timeout.


> CI Failure: 
> WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover > 
> testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-9299
>                 URL: https://issues.apache.org/jira/browse/GEODE-9299
>             Project: Geode
>          Issue Type: Bug
>          Components: wan
>    Affects Versions: 1.15.0
>            Reporter: Hale Bales
>            Assignee: Barrett Oglesby
>            Priority: Major
>
> {code:java}
> org.apache.geode.cache.wan.WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover
>  > testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from_v1.12.2] 
> FAILED
>     java.lang.AssertionError: expected:<100> but was:<101>
>         at org.junit.Assert.fail(Assert.java:89)
>         at org.junit.Assert.failNotEquals(Assert.java:835)
>         at org.junit.Assert.assertEquals(Assert.java:647)
>         at org.junit.Assert.assertEquals(Assert.java:633)
>         at 
> org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest.stopSenderAndVerifyEvents(WANRollingUpgradeDUnitTest.java:227)
>         at 
> org.apache.geode.cache.wan.WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover.testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover(WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover.java:98)
> {code}
> CI Failure: 
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UpgradeTestOpenJDK11/builds/229#B
> Artifacts Available here: 
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0253/test-results/upgradeTest/1621635640/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to