Blake Bender created GEODE-9147:
-----------------------------------

             Summary: Dropped keys in single-hop PUTALL request when one or 
more servers is unreachable
                 Key: GEODE-9147
                 URL: https://issues.apache.org/jira/browse/GEODE-9147
             Project: Geode
          Issue Type: Bug
          Components: native client
            Reporter: Blake Bender


For single-hop PUTALL, the request from the app is broken up in Geode native as 
follows:

i. Each key is hashed to a bucket, the server that hosts the bucket is looked 
up in the metadata, and the entry is added to the list specific to that 
server.

ii. Once every entry has been added to a list, Geode native spins up one thread 
per list and sends a PUTALL to each server.
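
The grouping step can be sketched roughly as follows. This is only an 
illustration, not the native client's actual code: BucketToServer, 
groupKeysByServer and the use of std::hash as the partitioning hash are all 
assumptions made for the example. Keys whose bucket has no server in the 
metadata are collected separately.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the client metadata: bucket id -> server name.
using BucketToServer = std::map<std::size_t, std::string>;

// Groups keys into one list per server. Keys whose bucket has no known
// server in the metadata are appended to `leftover`.
std::map<std::string, std::vector<std::string>> groupKeysByServer(
    const std::vector<std::string>& keys, const BucketToServer& metadata,
    std::size_t bucketCount, std::vector<std::string>& leftover) {
  assert(bucketCount > 0);
  std::map<std::string, std::vector<std::string>> perServer;
  for (const auto& key : keys) {
    // std::hash is only a placeholder for the real partitioning hash.
    const std::size_t bucket = std::hash<std::string>{}(key) % bucketCount;
    const auto it = metadata.find(bucket);
    if (it == metadata.end()) {
      leftover.push_back(key);  // bucket-to-server lookup failed
    } else {
      perServer[it->second].push_back(key);
    }
  }
  return perServer;
}
```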

 

When Geode native can't reach a server, that server's entries are removed from 
the metadata, and the bucket-to-server lookup fails for its buckets.  The 
affected keys end up in a "leftover keys" list, which is handled as follows:

i. The size of the "leftover keys" list is divided by the number of servers, 
then 1 is added to compensate for any fractional piece.

ii. That many keys are added to each remaining list going to a server that is 
still reachable.

iii. We proceed normally, and send one list to each server, on its own thread.
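
As a rough sketch of steps i and ii, assuming the divisor is the number of 
servers that are still reachable (the description above does not say which 
count is used), and using the hypothetical names PerServerKeys and 
spreadLeftoverKeys:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical type: server name -> keys destined for that server.
using PerServerKeys = std::map<std::string, std::vector<std::string>>;

// Spreads the leftover keys across the lists of the reachable servers.
void spreadLeftoverKeys(const std::vector<std::string>& leftover,
                        PerServerKeys& perServer) {
  assert(!perServer.empty());
  // Integer division drops the fraction, so add 1 to cover the remainder.
  const std::size_t chunk = leftover.size() / perServer.size() + 1;
  std::size_t next = 0;
  for (auto& entry : perServer) {
    const std::size_t end = std::min(next + chunk, leftover.size());
    entry.second.insert(
        entry.second.end(),
        leftover.begin() + static_cast<std::ptrdiff_t>(next),
        leftover.begin() + static_cast<std::ptrdiff_t>(end));
    next = end;
  }
}
```

With 5 leftover keys and 2 reachable servers, the chunk size is 5 / 2 + 1 = 3, 
so one list receives 3 of the leftover keys and the other receives 2.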

 

_Unfortunately_, this scenario can lead to data loss.  Each of the fractional 
pieces of the list originally bound for the unreachable server carries eventIds 
with the same threadId and incrementing sequenceIds.  Thus, if our PUTALL 
threads send these pieces out of order, the earlier sequenceIds are treated as 
already "seen" on the server, and those entries are _dropped_.

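The effect can be illustrated with a deliberately simplified model of duplicate 
detection.  This is not Geode's server code: EventId and DuplicateFilter are 
hypothetical names, and the model assumes the server remembers only the highest 
sequenceId it has seen per threadId.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical, simplified event identifier.
struct EventId {
  std::uint64_t threadId;
  std::uint64_t sequenceId;
};

// Simplified model: remembers the highest sequenceId seen per threadId
// and rejects anything at or below it as a duplicate.
class DuplicateFilter {
 public:
  // Returns true if the event is accepted, false if it is dropped.
  bool accept(const EventId& id) {
    const auto it = highestSeen_.find(id.threadId);
    if (it != highestSeen_.end() && id.sequenceId <= it->second) {
      return false;  // treated as already seen
    }
    highestSeen_[id.threadId] = id.sequenceId;
    return true;
  }

 private:
  std::map<std::uint64_t, std::uint64_t> highestSeen_;
};
```

In this model, if the piece carrying sequenceId 2 arrives before the piece 
carrying sequenceId 1 for the same threadId, the second arrival is rejected 
even though its keys were never written.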
 

We have identified 3 ways to solve this problem:

i. In the "big" PUTALL, tack all the keys for the unreachable server onto a 
single one of the existing server-specific lists.

ii. Keep the keys for the unreachable server in their own separate list, and 
just send that in a PUTALL to a randomly-selected server we _can_ reach.

iii. Just punt completely and drop back to multi-hop, sending _all_ the keys in 
the "big" PUTALL in a single list.
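
For comparison, option i might look something like the sketch below.  The names 
PerServerKeys and appendLeftoverToOneList are hypothetical, and choosing the 
first list as the target is an arbitrary assumption; the point is only that 
the leftover keys stay together in one request instead of being split across 
several threads.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical type: server name -> keys destined for that server.
using PerServerKeys = std::map<std::string, std::vector<std::string>>;

// Option i: append every leftover key to a single existing per-server list.
// Picking the first list is an arbitrary choice made for this sketch.
void appendLeftoverToOneList(const std::vector<std::string>& leftover,
                             PerServerKeys& perServer) {
  assert(!perServer.empty());
  auto& target = perServer.begin()->second;
  target.insert(target.end(), leftover.begin(), leftover.end());
}
```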

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
