[ https://issues.apache.org/jira/browse/GEODE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768402#comment-15768402 ]

Jason Huynh commented on GEODE-2205:
------------------------------------

Commit 164f04fbd85f20de6c7f9edef267d3f48463a954 was committed with the incorrect JIRA number GEODE-2215. It should have been GEODE-2205.

Commit 164f04fbd85f20de6c7f9edef267d3f48463a954 in geode's branch refs/heads/develop from Jason Huynh
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=164f04f ]

GEODE-2215: GatewaySenderAdvisor checks the current processor to see if it has started

Previously it was checking the top level sender (possibly a concurrent sender).
This allowed a race condition where the top level sender was still starting up
but the individual processors were ready to process. They would check the flag
and, because the sender was not ready, the processors would act and start
initiating failover, which left the processor in a very weird state.


> Race condition in startup of ConcurrentSerialGatewaySenderProcessor
> -------------------------------------------------------------------
>
>                 Key: GEODE-2205
>                 URL: https://issues.apache.org/jira/browse/GEODE-2205
>             Project: Geode
>          Issue Type: Bug
>          Components: wan
>            Reporter: Jason Huynh
>            Assignee: Jason Huynh
>
> ConcurrentSerialGatewayEventSenderProcessor spins up the individual
> SerialGatewayEventSenderProcessors. During this time, the individual
> processors will call waitForPrimary on the GatewaySenderAdvisor. The advisor
> uses the stopped flag from ConcurrentSerialGatewayEventSenderProcessor, which
> starts off as false (only set to true after all Serial processors are
> started).
> This is where the timing issue arises. If the serial processors start up and
> the GatewaySenderAdvisor uses the flag from the Concurrent processor, the
> serial senders will break out of the waitForPrimary loop and then try to
> handle failover. The Concurrent processor eventually sets its flag to true
> and everything continues to run.
> If the serial processor was not a primary, it stays a secondary and is left
> in a weird state where enqueueing anything throws an assertion error.
> This issue began due to changes in GEODE-2107:
> c4ae846aa1689e2c5659b6ecc17e38689dd93976