[ https://issues.apache.org/jira/browse/GEODE-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bruce Schuchardt closed GEODE-2865.
-----------------------------------

> data loss in initial-image replication with multicast
> -----------------------------------------------------
>
>                 Key: GEODE-2865
>                 URL: https://issues.apache.org/jira/browse/GEODE-2865
>             Project: Geode
>          Issue Type: Bug
>          Components: messaging
>            Reporter: Bruce Schuchardt
>             Fix For: 1.2.0
>
>
> During initial image replication ("get initial image") a state-flush
> operation is performed to ensure that all in-flight operations are applied to
> the region being replicated before replication starts. If multicast is
> enabled for a region, the state-flush can currently miss one or more
> in-flight operations, so that the new replicate is missing changes
> that are reflected in the region being replicated.
>
> For example, process A sends a multicast put() replication message to process
> B. Simultaneously, process C is replicating the affected region and performs
> a state-flush. Process A sends a state-stabilization message to process B
> noting its multicast channel state (NAKACK2 outgoing message counter).
> Process B receives this and waits for the multicast channel state to show
> that it has received all of the messages. Process B then sends a
> state-stabilized message to process C (the new replicate).
>
> The state-stabilization algorithm in this case is faulty because it is
> performed in the waiting-thread pool. The algorithm assumes that it is
> executing in the serial-executor thread pool, so that any messages that
> happened before it have already been applied to the region. As a result,
> messages can be received and scheduled on the serial-executor but not yet
> applied to the region when replication begins.
>
> The membership manager should be modified to ensure that the serial-executor
> queue has been flushed before giving the state-flush operation the go-ahead
> to begin replication.
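
The proposed remedy, draining the serial-executor queue before the state-flush is allowed to proceed, can be illustrated with a small stand-alone sketch. This is not Geode's actual fix or API: the class SerialFlushSketch, the method flushSerialExecutor, and the in-memory "region" list are hypothetical names, and a plain single-threaded java.util.concurrent executor stands in for the serial-executor pool. The idea shown is to queue a marker task behind the pending operations and wait for it to run.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from Geode.
public class SerialFlushSketch {

  // Queues a marker task behind everything already scheduled on a
  // single-threaded (FIFO) executor and waits for it to run. When the
  // marker has run, every earlier task has been applied.
  static boolean flushSerialExecutor(ExecutorService serialExecutor, long timeoutMs)
      throws InterruptedException {
    CountDownLatch marker = new CountDownLatch(1);
    serialExecutor.execute(marker::countDown);
    return marker.await(timeoutMs, TimeUnit.MILLISECONDS);
  }

  // Schedules three slow "put" operations, flushes, and returns a snapshot
  // of the stand-in region taken only after the flush completed.
  static List<String> demo() throws InterruptedException {
    ExecutorService serialExecutor = Executors.newSingleThreadExecutor();
    List<String> region = Collections.synchronizedList(new ArrayList<>());
    try {
      for (int i = 0; i < 3; i++) {
        final int n = i;
        serialExecutor.execute(() -> {
          try {
            Thread.sleep(20); // simulate a message that is queued but not yet applied
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
          region.add("put-" + n);
        });
      }
      // Without this call, the snapshot below could miss queued operations.
      if (!flushSerialExecutor(serialExecutor, 5000)) {
        throw new IllegalStateException("serial executor did not drain in time");
      }
      return new ArrayList<>(region);
    } finally {
      serialExecutor.shutdown();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(demo());
  }
}
```

The marker approach relies on the executor running tasks one at a time in submission order. Waiting only on the channel's message counter, as in the faulty case described above, says that messages were received but not that the queued work was applied.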