[ https://issues.apache.org/jira/browse/SOLR-14368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cao Manh Dat updated SOLR-14368:
--------------------------------
    Description: 
h2. History

Since the early days of SolrCloud, a replica has needed to _sync_ with the 
other replicas before it can become leader. This process includes:
 * Comparing the tlog of the current replica (the leader candidate) with the 
tlogs of the other replicas. For example, if the candidate’s data is too far 
behind the others, that replica should not become leader.
 * Requesting the other replicas to sync back before the candidate becomes 
leader. Imagine the old leader was shut down while it was sending multiple 
updates (u1, u2, u3, u4) to the others:
 ** Replica A may receive updates (u1, u2)
 ** Replica B may receive updates (u3, u4)
 ** If replica A becomes leader and does not request replica B to sync back, 
replica B then needs to go into a recovery process, which is costly.
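
A minimal Java sketch of the sync-back scenario above, assuming each replica's tlog can be modelled as a plain set of update ids. The class and method names (SyncBackSketch, syncFrom) are hypothetical, and this is a simplification rather than how Solr's peer sync is implemented; it only shows why syncing in both directions leaves A and B holding the same updates.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model: a replica's update log is just a sorted set of update ids.
class SyncBackSketch {

    // "Sync" here means: copy every update the peer has that we are missing.
    // This is a simplification of tlog-based peer sync, not Solr's implementation.
    static Set<String> syncFrom(Set<String> self, Set<String> peer) {
        Set<String> merged = new TreeSet<>(self);
        merged.addAll(peer);
        return merged;
    }

    public static void main(String[] args) {
        Set<String> replicaA = new TreeSet<>(Set.of("u1", "u2"));
        Set<String> replicaB = new TreeSet<>(Set.of("u3", "u4"));

        // Step 1: candidate A syncs with B before becoming leader (pulls u3, u4).
        replicaA = syncFrom(replicaA, replicaB);
        // Step 2: A asks B to sync back, so B pulls u1 and u2 from the new leader.
        replicaB = syncFrom(replicaB, replicaA);

        System.out.println("A = " + replicaA); // A = [u1, u2, u3, u4]
        System.out.println("B = " + replicaB); // B = [u1, u2, u3, u4]
    }
}
```

Without step 2, B would still be missing u1 and u2, which is the case that otherwise forces the costly recovery.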

But this process has some problems:
 # We only sync with live replicas, so if no other replicas are live at the 
time of the election, the current replica can blindly become leader, causing 
data loss. This problem was fixed with SOLR-11702.
 # Any IOException that is not caught properly during communication between 
the current replica and the others can prevent that replica from becoming 
leader.

h2. Idea

Basically, with the new ShardTerms information, we can pick any replica with 
the highest _term_ to become leader, because a replica’s _term_ effectively 
represents how up-to-date that replica is with the leader.
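
The election rule can be sketched as follows, assuming the shard terms are available as a map from replica name to term. TermEligibilitySketch, hasHighestTerm and the core_node names are hypothetical illustrations, not Solr's ShardTerms API; the point is that eligibility depends only on comparing terms, not on contacting other replicas.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not Solr's ShardTerms API.
class TermEligibilitySketch {

    // A replica is eligible to become leader only if its term equals the highest term in the shard.
    static boolean hasHighestTerm(Map<String, Long> termsByReplica, String replica) {
        Long term = termsByReplica.get(replica);
        if (term == null) {
            return false; // unknown replica: no evidence that it is up to date
        }
        long maxTerm = Collections.max(termsByReplica.values());
        return term == maxTerm; // unboxes term, compares numerically
    }

    public static void main(String[] args) {
        // Assumed state: core_node2 missed updates the others acknowledged, so its term lags.
        Map<String, Long> terms = Map.of("core_node1", 3L, "core_node2", 2L, "core_node3", 3L);
        for (String replica : new TreeSet<>(terms.keySet())) {
            System.out.println(replica + " eligible: " + hasHighestTerm(terms, replica));
        }
        // core_node1 eligible: true
        // core_node2 eligible: false
        // core_node3 eligible: true
    }
}
```

Either core_node1 or core_node3 could be picked here; which one wins is left to the election itself.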

The only remaining purpose of _sync_ with other replicas is to prevent costly 
recovery processes. Therefore, SyncStrategy should not prevent a replica from 
becoming leader.
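
A sketch of the proposed behaviour, assuming a hypothetical PeerSyncer interface standing in for the sync step (it is not Solr's SyncStrategy API): the sync is still attempted to spare other replicas a recovery, but a failed or incomplete sync, including an IOException, is only logged and no longer blocks a replica that already has the highest term.

```java
import java.io.IOException;

// Hypothetical stand-in for the sync step; not Solr's SyncStrategy API.
interface PeerSyncer {
    boolean syncWithReplicas() throws IOException;
}

class BestEffortLeaderSketch {

    // Leadership is decided by term alone; sync is attempted only as an optimisation.
    static boolean tryBecomeLeader(boolean hasHighestTerm, PeerSyncer syncer) {
        if (!hasHighestTerm) {
            return false; // a lagging replica must not become leader
        }
        try {
            if (!syncer.syncWithReplicas()) {
                System.out.println("sync incomplete; some replicas may need recovery");
            }
        } catch (IOException e) {
            // An error like this previously could keep the replica from becoming leader.
            System.out.println("sync failed (" + e.getMessage() + "); becoming leader anyway");
        }
        return true;
    }

    public static void main(String[] args) {
        PeerSyncer failing = () -> { throw new IOException("connection reset"); };
        System.out.println(tryBecomeLeader(true, failing));
        System.out.println(tryBecomeLeader(false, () -> true));
    }
}
```

The design choice is that the term check is the only gate; the outcome of the sync affects how much recovery work follows, not who leads.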

  was:Update later...


> SyncStrategy result should not prevent a replica to become leader
> -----------------------------------------------------------------
>
>                 Key: SOLR-14368
>                 URL: https://issues.apache.org/jira/browse/SOLR-14368
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
