[ 
https://issues.apache.org/jira/browse/GEODE-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155779#comment-17155779
 ] 

ASF subversion and git services commented on GEODE-8338:
--------------------------------------------------------

Commit 25bb3b53fdb31a28bde5376bb105ee0ed2414c9a in geode's branch 
refs/heads/develop from Sarah Abbey
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=25bb3b5 ]

GEODE-8338: change redis commands to not be repeated when a server dies (#5351)

The redis functions are no longer HA (highly available).
The product still has some cases in which it can safely retry the function,
but if a server dies the client will see a redis error containing
"memberDeparted".
In that case the client app can check whether the redis operation should be
done again, or whether it already happened even though a server died.

Co-authored-by: Sarah Abbey <sab...@vmware.com>
Co-authored-by: Darrel Schneider <dar...@vmware.com>
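
The client-side check that the commit message describes could look roughly like the sketch below. It is only an illustration, not code from the commit: `RedisError`, `append_once`, and `FakeClient` are hypothetical names, the exact error text beyond the "memberDeparted" substring is assumed, and the length comparison is only valid if this caller is the sole writer of the key.

```python
class RedisError(Exception):
    """Hypothetical stand-in for the error type a real client library raises."""


def append_once(client, key, suffix):
    """APPEND that decides, after a memberDeparted error, whether to redo it.

    Assumption: no other writer touches `key`, so a length check is enough
    to tell whether the failed APPEND was applied before the server died.
    """
    before = client.strlen(key)
    try:
        return client.append(key, suffix)
    except RedisError as err:
        if "memberDeparted" not in str(err):
            raise  # unrelated error: leave it to the caller
        if client.strlen(key) == before + len(suffix):
            return before + len(suffix)  # already applied, do not repeat
        return client.append(key, suffix)  # not applied, safe to redo


class FakeClient:
    """In-memory stand-in for a redis client; its first APPEND always fails."""

    def __init__(self, applied_before_crash):
        self.value = b""
        self.applied_before_crash = applied_before_crash
        self.crash_next = True

    def strlen(self, key):
        return len(self.value)

    def append(self, key, suffix):
        if self.crash_next:
            self.crash_next = False
            if self.applied_before_crash:
                self.value += suffix  # the write landed before the crash
            raise RedisError("ERR memberDeparted")  # assumed message text
        self.value += suffix
        return len(self.value)


for applied in (True, False):
    client = FakeClient(applied_before_crash=applied)
    append_once(client, "k", b"ab")
    print(applied, client.value)
```

In both runs the stored value ends up as `b"ab"`: the append is applied exactly once whether or not the failed attempt had already reached the data. With several concurrent writers a length check like this would not be sufficient, and the application would need its own way to recognize its writes.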

> Redis commands may be repeated when server dies
> -----------------------------------------------
>
>                 Key: GEODE-8338
>                 URL: https://issues.apache.org/jira/browse/GEODE-8338
>             Project: Geode
>          Issue Type: Bug
>          Components: redis
>            Reporter: Sarah Abbey
>            Priority: Major
>
> Since we have one redundant copy of the data, and since we modify the data 
> using a function, I think we may have a data corruption issue with 
> non-idempotent operations. With an operation like APPEND, the following can
> happen:
>  0) the executor is called on a non-primary redis server,
>  1) the operation modifies the primary (by sending a function exec to it),
>  2) the operation modifies the secondary (by sending a geode delta to it),
>  3) the primary server now fails (before the function executing on it
> completes),
>  4) the non-primary redis server sees the function fail and, because it is
> marked as HA, retries it. This time it sends it to the secondary, which is
> the new primary, but the operation was already done on the secondary, so this
> retry ends up doing the operation twice.
> This may be okay for certain ops (like SADD) that are idempotent (but even 
> they could cause extra key events in the future), but for ops like APPEND we 
> end up appending twice.
> This will only happen when a server executing a function dies and our
> function service retries the function on another server because it is marked
> HA. The easy way to fix this is to change our function to not be HA. This is
> just a one-line change.
>  Note that our clients can already see exceptions/errors if the server they
> are connected to dies. When that happens the operation they requested may
> have happened, and if they have multiple geode redis servers running it may
> have been stored and still be in memory. So clients will need some logic to
> decide whether they should redo such an operation or not (because it is
> already done).
> *Note:* Making the function non-HA should just give the client one more case
> in which they need to handle a server crash. The crash can now be on a server
> they were not connected to but that was involved in performing the operation
> they requested.
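
The sequence in the quoted description can be reproduced with a small toy model. This is a sketch for illustration only, not Geode code: `Bucket`, `execute`, and the `retry_on_failure` flag are hypothetical names, with the flag standing in for whether the function is marked HA.

```python
class Bucket:
    """Toy model of one key's data: a primary copy and one redundant copy."""

    def __init__(self):
        self.primary = {"s": "", "set": set()}
        self.secondary = {"s": "", "set": set()}


def append(copy, suffix):
    copy["s"] += suffix  # non-idempotent: repeating it changes the result


def sadd(copy, member):
    copy["set"].add(member)  # idempotent: repeating it changes nothing


def execute(bucket, op, arg, primary_dies, retry_on_failure):
    """Apply op to both copies, optionally losing the primary before the reply."""
    op(bucket.primary, arg)    # step 1: modify the primary
    op(bucket.secondary, arg)  # step 2: the delta reaches the secondary
    if not primary_dies:
        return "OK"
    # step 3: the primary dies before the function completes,
    # and the secondary becomes the new primary
    bucket.primary = bucket.secondary
    if retry_on_failure:
        op(bucket.primary, arg)  # step 4: the HA retry applies the op again
        return "OK"
    return "ERR memberDeparted"  # non-HA: report the failure to the caller


ha = Bucket()
execute(ha, append, "ab", primary_dies=True, retry_on_failure=True)
print(ha.primary["s"])  # abab

non_ha = Bucket()
print(execute(non_ha, append, "ab", primary_dies=True, retry_on_failure=False))
print(non_ha.primary["s"])  # ab
```

With the retry enabled, APPEND is applied twice and the value becomes "abab". Without it, the value stays "ab" and the caller receives an error it has to handle. Running the same scenario with `sadd` leaves the set unchanged by the retry, which matches the description's remark about idempotent ops.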



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
