[jira] [Created] (LUCENE-9630) Allow Shard Leader to give up leadership gracefully via shard terms

Mike Drob (Jira) Wed, 02 Dec 2020 20:36:08 -0800

Mike Drob created LUCENE-9630:
---------------------------------

             Summary: Allow Shard Leader to give up leadership gracefully via 
shard terms
                 Key: LUCENE-9630
                 URL: https://issues.apache.org/jira/browse/LUCENE-9630
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Mike Drob



Currently (via SOLR-12412), when a leader sees an index writing error during 
an update, it gives up leadership by deleting the replica and adding a new 
replica. One stated benefit of this was that, because it uses the overseer and 
a known code path, it is done asynchronously and very efficiently.

I would argue that this approach is too heavy-handed.

In the case of a corrupt index exception, it makes some sense to completely 
delete the index dir and attempt to sync from a good peer. Even in this case, 
however, it might be better to let fingerprinting and other index delta 
mechanisms take over and allow for a more efficient data transfer.

In an alternate case, where the index error arises from a disconnected file 
system (possible with shared file systems, e.g. S3, HDFS, some k8s systems) and 
the required solution is some kind of reconnect, this approach has several 
shortcomings: the core deletion and creation are going to fail, leaving 
dangling replicas. Further, the data is still present, so there is no need to 
make so many extra copies.

I propose that we bring in a mechanism to give up leadership via the existing 
shard terms language. I believe we would be able to set all replicas currently 
equal to leader term T to T+1, and then trigger a new leader election. The 
current leader would know it is ineligible, while the other replicas that were 
current before the failed update would be eligible. This improvement would 
entail adding one more possible operation to the terms state machine.
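
To make the proposed operation concrete, here is a minimal sketch. It is not 
Solr's actual shard-terms code: the class ShardTermsSketch and the methods 
giveUpLeadership and eligibleForLeadership are hypothetical names, and terms 
are modeled as a plain in-memory map rather than state kept in ZooKeeper. It 
also assumes that the leader itself stays at term T while its in-sync peers 
move to T+1, and that the replicas holding the highest term are the ones 
eligible for election.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not Solr's actual shard-terms class. */
public final class ShardTermsSketch {

    private ShardTermsSketch() {}

    /**
     * Returns a copy of the terms in which every replica that shared the
     * leader's term T, except the leader itself (an assumption of this
     * sketch), is moved to T+1. The input map is left unmodified.
     */
    public static Map<String, Long> giveUpLeadership(Map<String, Long> terms, String leader) {
        Long leaderTerm = terms.get(leader);
        if (leaderTerm == null) {
            throw new IllegalArgumentException("Unknown replica: " + leader);
        }
        Map<String, Long> updated = new TreeMap<>(terms);
        for (Map.Entry<String, Long> e : terms.entrySet()) {
            boolean inSyncPeer = !e.getKey().equals(leader) && e.getValue().equals(leaderTerm);
            if (inSyncPeer) {
                updated.put(e.getKey(), leaderTerm + 1);
            }
        }
        return updated;
    }

    /** Replicas holding the highest term, in name order (assumed election rule). */
    public static List<String> eligibleForLeadership(Map<String, Long> terms) {
        List<String> eligible = new ArrayList<>();
        if (terms.isEmpty()) {
            return eligible;
        }
        long max = Collections.max(terms.values());
        for (Map.Entry<String, Long> e : new TreeMap<>(terms).entrySet()) {
            if (e.getValue() == max) {
                eligible.add(e.getKey());
            }
        }
        return eligible;
    }
}
```

With a leader and one peer at the same term and a third replica one term 
behind, giveUpLeadership leaves only the in-sync peer eligible. Note that in 
this sketch, if no other replica shares the leader's term, nothing is bumped 
and the leader remains the highest-term replica; a real implementation would 
need to decide how to handle that case.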



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
