Denis Chudov created IGNITE-28420:
-------------------------------------

             Summary: Leaseholder balancing: Timeout-based replica switch
                 Key: IGNITE-28420
                 URL: https://issues.apache.org/jira/browse/IGNITE-28420
             Project: Ignite
          Issue Type: Improvement
          Components: placement driver ai3
            Reporter: Denis Chudov


This improves on the ungraceful solution by letting the user trade transaction 
failures for a latency spike.

The algorithm is:
 # The user invokes partitions rebalance-primaries --wait-lease 
--extra-wait-time 30sec.
 # Placement driver identifies all partitions P that need to be rebalanced from 
their current replica leases L.
 # Placement driver marks L.isCondemned = true, then sleeps for extraWaitTime.
 # isCondemned signals to txns that the current lease will soon expire. On txn 
coordinators, awaitPrimaryReplica treats isCondemned == true as if the lease 
didn't exist, i.e. it waits for the next lease to start.
 # After extraWaitTime passes, the placement driver tells the node 
L.leaseholderId to give up the lease at the end of the current term, and not to 
attempt to be elected in the next term.
 # On the next election, a new primary is elected and the lease is saved 
normally, with isCondemned = false. Awaiting txns all proceed.
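
A minimal sketch of how steps 3 and 4 could look, assuming a simplified lease 
model. LeaseSketch, expirationMillis, condemn and usableForNewTxn are 
hypothetical names used only for illustration; only isCondemned and 
leaseholderId come from the description above, and this is not the actual 
placement driver code.

```java
/**
 * Hypothetical, simplified model of a replica lease. Only isCondemned and
 * leaseholderId come from the description; everything else is assumed.
 */
final class LeaseSketch {
    final String leaseholderId;
    final long expirationMillis; // assumed: absolute end of the current term
    final boolean isCondemned;

    LeaseSketch(String leaseholderId, long expirationMillis, boolean isCondemned) {
        this.leaseholderId = leaseholderId;
        this.expirationMillis = expirationMillis;
        this.isCondemned = isCondemned;
    }

    /** Step 3: mark the lease condemned; holder and expiration stay the same. */
    LeaseSketch condemn() {
        return new LeaseSketch(leaseholderId, expirationMillis, true);
    }

    /**
     * Step 4: the check a coordinator could apply before using a lease.
     * A condemned lease is treated as if it did not exist, so the caller
     * keeps waiting for the next lease.
     */
    static boolean usableForNewTxn(LeaseSketch lease, long nowMillis) {
        return lease != null && !lease.isCondemned && nowMillis < lease.expirationMillis;
    }
}
```

Under this sketch, a lease saved after the next election with 
isCondemned = false passes the check again, which is the point at which the 
awaiting txns in step 6 would proceed.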

The trade-off of this solution vs ungraceful is that we get a latency spike on 
new txns but potentially avoid txn failures:
 * New txns see latency spikes of up to extraWaitTime + leaseExpirationInterval.
 * Existing txns that don't finish before extraWaitTime + 
leaseExpirationInterval still fail.
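
The two bounds above can be written out as simple arithmetic. The sketch below 
is illustrative only: WaitLeaseTradeOff and its method names are hypothetical, 
and the 5-second leaseExpirationInterval in the usage note is an assumed value, 
not one taken from this issue.

```java
/** Hypothetical helper that restates the two trade-off bullets as arithmetic. */
final class WaitLeaseTradeOff {
    private WaitLeaseTradeOff() {
    }

    /** Upper bound on the latency spike a new txn may see. */
    static long maxNewTxnLatencyMillis(long extraWaitTimeMillis, long leaseExpirationIntervalMillis) {
        // addExact throws ArithmeticException on overflow instead of wrapping silently.
        return Math.addExact(extraWaitTimeMillis, leaseExpirationIntervalMillis);
    }

    /**
     * True if an existing txn that still needs remainingMillis does not finish
     * before extraWaitTime + leaseExpirationInterval and therefore still fails.
     */
    static boolean existingTxnStillFails(
            long remainingMillis, long extraWaitTimeMillis, long leaseExpirationIntervalMillis) {
        return remainingMillis >= maxNewTxnLatencyMillis(extraWaitTimeMillis, leaseExpirationIntervalMillis);
    }
}
```

For example, with --extra-wait-time 30sec and an assumed 
leaseExpirationInterval of 5 seconds, maxNewTxnLatencyMillis(30_000, 5_000) 
gives 35000 ms, so a txn with 40 seconds of work left would still fail while 
one with 2 seconds left would not.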

The advantage here is that the user is in control: they know their system and 
can decide whether they can handle failed txns or latency spikes, how long 
their txns take, etc. If all their txns are shorter than 
leaseExpirationInterval (very common), then rebalance-primaries --wait-lease 
allows them to have no failures and just a latency spike of up to 
leaseExpirationInterval, which is already possible in other failure scenarios. 
Note also that if the primary is now overloaded, the user is already risking 
or even experiencing latency spikes from the overload.

Note that the wait time could be managed more cleverly than proposed, 
especially for cases when all txns are sub-second: in that case we may only 
need to condemn the lease for its last fraction.



