[ https://issues.apache.org/jira/browse/GEODE-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155779#comment-17155779 ]
ASF subversion and git services commented on GEODE-8338:
--------------------------------------------------------

Commit 25bb3b53fdb31a28bde5376bb105ee0ed2414c9a in geode's branch refs/heads/develop from Sarah Abbey
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=25bb3b5 ]

GEODE-8338: change redis commands not be repeated when a server dies (#5351)

The redis functions are no longer HA.
The product does have some cases when it can safely
retry the function but if a server dies the client
will see a redis error containing "memberDeparted".
In that case the client app can check to see if the
redis operation should be done again, or if it
already happened even though a server died.

Co-authored-by: Sarah Abbey <sab...@vmware.com>
Co-authored-by: Darrel Schneider <dar...@vmware.com>


> Redis commands may be repeated when server dies
> -----------------------------------------------
>
>                 Key: GEODE-8338
>                 URL: https://issues.apache.org/jira/browse/GEODE-8338
>             Project: Geode
>          Issue Type: Bug
>          Components: redis
>            Reporter: Sarah Abbey
>            Priority: Major
>
> Since we have one redundant copy of the data, and since we modify the data
> using a function, I think we may have a data corruption issue with
> non-idempotent operations. What can happen is that an operation like APPEND
> can:
>  0) executor called on non-primary redis server,
>  1) modify the primary (by sending a function exec to it),
>  2) modify the secondary (by sending a geode delta to it),
>  3) the primary server fails now (before the function executing on it
> completes),
>  4) the non-primary redis server sees the function fail and that it is marked
> as HA so it retries it. This time it sends it to the secondary, which is the
> new primary, but the operation was actually done on the secondary so this
> retry will end up doing the operation twice.
> This may be okay for certain ops (like SADD) that are idempotent (but even
> they could cause extra key events in the future), but for ops like APPEND we
> end up appending twice.
> This will only happen when a server executing a function dies and our
> function service retries the function on another server because it is marked
> HA. The easy way to fix this is to change our function to not be HA. This is
> just a one-line change.
> Note that our clients can already see exceptions/errors if the server they
> are connected to dies. When that happens the operation they requested may
> have happened, and if they have multiple geode redis servers running it may
> have been stored and still be in memory. So clients will need some logic to
> decide if they should redo such an operation or not (because it is already
> done).
> *Note:* By making the function non-HA, it should just give the client another
> case in which they need to handle a server crash. It can now be for servers
> they were not connected to but that were involved in performing the operation
> they requested.
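
The client-side logic described above (deciding whether to redo an operation after a "memberDeparted" error) could look roughly like the sketch below. It is only an illustration under stated assumptions: FakeRedis is a hypothetical in-memory stand-in rather than a real client API, the error text "ERR memberDeparted" is an assumed message (the commit only says the error contains "memberDeparted"), and the before/after comparison is valid only when no other client writes the same key concurrently.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis connection (not a real client)."""

    def __init__(self, fail_mode=None):
        self.data = {}
        # None: never fail; "before": fail before applying; "after": fail after applying.
        self.fail_mode = fail_mode

    def get(self, key):
        return self.data.get(key, "")

    def append(self, key, suffix):
        mode, self.fail_mode = self.fail_mode, None  # fail at most once
        if mode == "before":
            raise RuntimeError("ERR memberDeparted")  # assumed error text
        self.data[key] = self.get(key) + suffix
        if mode == "after":
            # The write happened, but the caller still sees an error.
            raise RuntimeError("ERR memberDeparted")
        return len(self.data[key])


def append_once(client, key, suffix):
    """Append suffix to key, redoing the operation only if it did not happen.

    Assumes this caller is the only writer of the key.
    """
    before = client.get(key)
    try:
        client.append(key, suffix)
        return "applied"
    except Exception as err:
        if "memberDeparted" not in str(err):
            raise  # unrelated error: let the caller deal with it

    current = client.get(key)
    if current == before + suffix:
        return "already-applied"  # the server died after the write landed
    if current == before:
        client.append(key, suffix)  # the write never happened, so redo it
        return "retried"
    raise RuntimeError("cannot tell whether the append was applied")


if __name__ == "__main__":
    for mode in (None, "before", "after"):
        client = FakeRedis(fail_mode=mode)
        client.data["k"] = "a"
        print(mode, append_once(client, "k", "b"), client.get("k"))
```

In all three failure modes the key ends up with the suffix appended exactly once, which is the property the HA retry could not guarantee for APPEND. With concurrent writers the comparison is not reliable, and the application would need some other way to recognize its own write.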