[
https://issues.apache.org/jira/browse/KAFKA-18345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alyssa Huang updated KAFKA-18345:
---------------------------------
Description:
Right now this exponential backoff method can return a min backoff of 200ms and
cap out at the electionBackoffMaxMs (1s default). The logic for calculating
this exponential backoff isn't very straightforward, and probably no longer
needed now that Candidate transitions to Prospective after election loss. We
can simplify states by not tracking "retries" as well.
We _do_ still need a bit of randomness to ensure we can get past gridlocked
elections, but as Diego's Raft thesis shows, 50ms of random range is sufficient
to avoid repeat split votes in elections.
> Using more randomness improves worst-case behavior: with a 50 ms random
> range, the worst-case completion time (over 1,000 trials) was 513 ms.
We should investigate if it would make sense to replace the exponential backoff
with a non-exponential backoff that is more straightforward.
e.g.
{code:java}
int randomElectionBackoffMs() {
long backoff = random(0, 50ms)
return Math.min(state.remainingElectionTimeoutMs, backoff)
}{code}
Note, it's fine that the max is set to state.remainingElectionTimeoutMs
because that value is initialized with randomness in it (up to another
quorumConfig.electionTimeoutMs).
The alternative could also be using a lower default electionTimeoutMs (like
300ms mentioned in the Raft paper) and on election failure, waiting the extent
of state.remainingElectionTimeoutMs (which contains some jitter).
was:
Right now this exponential backoff method can return a min backoff of 200ms and
cap out at the electionBackoffMaxMs (1s default). The logic for calculating
this exponential backoff isn't very straightforward, and probably no longer
needed now that Candidate transitions to Prospective after election loss. We
can simplify states by not tracking "retries" as well.
We _do_ still need a bit of randomness to ensure we can get past gridlocked
elections, but as Diego's Raft thesis shows, 50ms of random range is sufficient
to avoid repeat split votes in elections.
> Using more randomness improves worst-case behavior: with a 50 ms random
> range, the worst-case completion time (over 1,000 trials) was 513 ms.
We should investigate if it would make sense to replace the exponential backoff
with a non-exponential backoff that is more straightforward.
e.g.
{code:java}
int randomElectionBackoffMs() {
long backoff = random(0, 50ms)
return Math.min(state.remainingElectionTimeoutMs, backoff)
}{code}
Note, it's fine to return state.remainingElectionTimeoutMs because that also
has randomness in it.
The alternative could also be using a lower default electionTimeoutMs (like
300ms mentioned in the Raft paper) and on election failure, waiting the extent
of state.remainingElectionTimeoutMs (which contains some jitter).
> Investigate if binaryExponentialElectionBackoffMs can be removed or improved
> ----------------------------------------------------------------------------
>
> Key: KAFKA-18345
> URL: https://issues.apache.org/jira/browse/KAFKA-18345
> Project: Kafka
> Issue Type: Improvement
> Reporter: Alyssa Huang
> Priority: Minor
>
> Right now this exponential backoff method can return a min backoff of 200ms
> and cap out at the electionBackoffMaxMs (1s default). The logic for
> calculating this exponential backoff isn't very straightforward, and probably
> no longer needed now that Candidate transitions to Prospective after election
> loss. We can simplify states by not tracking "retries" as well.
> We _do_ still need a bit of randomness to ensure we can get past gridlocked
> elections, but as Diego's Raft thesis shows, 50ms of random range is
> sufficient to avoid repeat split votes in elections.
> > Using more randomness improves worst-case behavior: with a 50 ms random
> > range, the worst-case completion time (over 1,000 trials) was 513 ms.
> We should investigate if it would make sense to replace the exponential
> backoff with a non-exponential backoff that is more straightforward.
> e.g.
> {code:java}
> int randomElectionBackoffMs() {
> long backoff = random(0, 50ms)
> return Math.min(state.remainingElectionTimeoutMs, backoff)
> }{code}
> Note, it's fine that the max is set to state.remainingElectionTimeoutMs
> because that value is initialized with randomness in it (up to another
> quorumConfig.electionTimeoutMs).
> The alternative could also be using a lower default electionTimeoutMs (like
> 300ms mentioned in the Raft paper) and on election failure, waiting the
> extent of state.remainingElectionTimeoutMs (which contains some jitter).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)