[ 
https://issues.apache.org/jira/browse/KAFKA-18345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alyssa Huang updated KAFKA-18345:
---------------------------------
    Description: 
Right now this exponential backoff method can return a min backoff of 200ms and 
cap out at the electionBackoffMaxMs (1s default). The logic for calculating 
this exponential backoff isn't very straightforward, and probably no longer 
needed now that Candidate transitions to Prospective after election loss. We 
can simplify states by not tracking "retries" as well.
We _do_ still need a bit of randomness to ensure we can get past gridlocked 
elections, but as Diego's Raft thesis shows, 50ms of random range is sufficient 
to avoid repeat split votes in elections. 

> Using more randomness improves worst-case behavior: with a 50 ms random 
> range, the worst-case completion time (over 1,000 trials) was 513 ms.

We should investigate if it would make sense to replace the exponential backoff 
with a non-exponential backoff that is more straightforward.
e.g.
{code:java}
int randomElectionBackoffMs() {
    long backoff = random(0, 50ms)
    return Math.min(state.remainingElectionTimeoutMs, backoff)
}{code}
 Note, it's fine that the max is set to state.remainingElectionTimeoutMs 
because that value is initialized with randomness in it (up to another 
quorumConfig.electionTimeoutMs).

The alternative could also be using a lower default electionTimeoutMs (like 
300ms mentioned in the Raft paper) and on election failure, waiting the extent 
of state.remainingElectionTimeoutMs (which contains some jitter).

  was:
Right now this exponential backoff method can return a min backoff of 200ms and 
cap out at the electionBackoffMaxMs (1s default). The logic for calculating 
this exponential backoff isn't very straightforward, and probably no longer 
needed now that Candidate transitions to Prospective after election loss. We 
can simplify states by not tracking "retries" as well.
We _do_ still need a bit of randomness to ensure we can get past gridlocked 
elections, but as Diego's Raft thesis shows, 50ms of random range is sufficient 
to avoid repeat split votes in elections. 

> Using more randomness improves worst-case behavior: with a 50 ms random 
> range, the worst-case completion time (over 1,000 trials) was 513 ms.

We should investigate if it would make sense to replace the exponential backoff 
with a non-exponential backoff that is more straightforward.
e.g.
{code:java}
int randomElectionBackoffMs() {
    long backoff = random(0, 50ms)
    return Math.min(state.remainingElectionTimeoutMs, backoff)
}{code}
 
Note, it's fine to return state.remainingElectionTimeoutMs because that also 
has randomness in it.

The alternative could also be using a lower default electionTimeoutMs (like 
300ms mentioned in the Raft paper) and on election failure, waiting the extent 
of state.remainingElectionTimeoutMs (which contains some jitter).


> Investigate if binaryExponentialElectionBackoffMs can be removed or improved
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-18345
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18345
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Alyssa Huang
>            Priority: Minor
>
> Right now this exponential backoff method can return a min backoff of 200ms 
> and cap out at the electionBackoffMaxMs (1s default). The logic for 
> calculating this exponential backoff isn't very straightforward, and probably 
> no longer needed now that Candidate transitions to Prospective after election 
> loss. We can simplify states by not tracking "retries" as well.
> We _do_ still need a bit of randomness to ensure we can get past gridlocked 
> elections, but as Diego's Raft thesis shows, 50ms of random range is 
> sufficient to avoid repeat split votes in elections. 
> > Using more randomness improves worst-case behavior: with a 50 ms random 
> > range, the worst-case completion time (over 1,000 trials) was 513 ms.
> We should investigate if it would make sense to replace the exponential 
> backoff with a non-exponential backoff that is more straightforward.
> e.g.
> {code:java}
> int randomElectionBackoffMs() {
>     long backoff = random(0, 50ms)
>     return Math.min(state.remainingElectionTimeoutMs, backoff)
> }{code}
>  Note, it's fine that the max is set to state.remainingElectionTimeoutMs 
> because that value is initialized with randomness in it (up to another 
> quorumConfig.electionTimeoutMs).
> The alternative could also be using a lower default electionTimeoutMs (like 
> 300ms mentioned in the Raft paper) and on election failure, waiting the 
> extent of state.remainingElectionTimeoutMs (which contains some jitter).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to