[
https://issues.apache.org/jira/browse/IGNITE-28395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexandr Shapkin reassigned IGNITE-28395:
-----------------------------------------
Assignee: Denis Chudov
> Lease updater accumulates concurrent in-flight invocations causing constant
> CAS failures
> ----------------------------------------------------------------------------------------
>
> Key: IGNITE-28395
> URL: https://issues.apache.org/jira/browse/IGNITE-28395
> Project: Ignite
> Issue Type: Bug
> Reporter: Denis Chudov
> Assignee: Denis Chudov
> Priority: Major
> Labels: ignite-3
>
> The Lease Updater fires an invoke to the meta storage asynchronously every
> 500 ms, without waiting for the previous one to complete. This produces
> multiple concurrent invocations with the same expected lease state: only one
> wins the CAS, and the rest fail with
> {code:java}
> Lease update invocation failed because of outdated lease data on this
> node{code}
> As a result, roughly once per minute the lease expires before renewal.
> Simply reading fresh data from storage before each invoke does not help:
> previous invocations are already in-flight and will complete after the read,
> making the freshly-read state outdated by the time the new invoke reaches
> storage.
> *Fix*
> Track the in-flight invoke as a future. On each tick, if the previous future
> is not complete, block with `future.get(timeout)` before reading from the
> lease tracker and firing the next invoke. This guarantees that at most one
> invoke is in flight at any time and that the lease state is read only after
> the previous update has landed. The timeout should be around leaseInterval/2;
> after that, the leases will most likely expire anyway.
> Also, there may be a lag between future completion and the lease map update
> in the lease tracker, so the lease map may still be stale. To avoid this, a
> successful invoke can return the leases it wrote. If the invoke fails, the
> map from the lease tracker should be used.
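
The fix described above could be sketched roughly as follows. This is an
illustration only, not the actual Ignite code: the class LeaseUpdaterSketch,
its fields, and the Map<String, Long> lease representation are all
hypothetical. It also assumes that a tick is skipped when the wait times out,
which the description does not specify.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names come from the Ignite code base. */
final class LeaseUpdaterSketch {
    /** Stand-in for the lease tracker: returns its (possibly stale) lease map. */
    private final Supplier<Map<String, Long>> leaseTracker;

    /**
     * Stand-in for the meta storage invoke: takes the expected leases and
     * completes with the written leases, or exceptionally if the CAS fails.
     */
    private final Function<Map<String, Long>, CompletableFuture<Map<String, Long>>> invoke;

    /** Wait limit per tick; the description suggests about leaseInterval/2. */
    private final long timeoutMillis;

    /** The single tracked invoke; null result means "nothing written yet". */
    private CompletableFuture<Map<String, Long>> inFlight =
            CompletableFuture.completedFuture(null);

    LeaseUpdaterSketch(
            Supplier<Map<String, Long>> leaseTracker,
            Function<Map<String, Long>, CompletableFuture<Map<String, Long>>> invoke,
            long timeoutMillis) {
        this.leaseTracker = leaseTracker;
        this.invoke = invoke;
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Intended to be called from one scheduler thread on every tick.
     * Returns true if a new invoke was fired, false if the tick was skipped.
     */
    boolean tick() {
        Map<String, Long> expected;
        try {
            // Block until the previous invoke has landed (or the timeout hits).
            Map<String, Long> written = inFlight.get(timeoutMillis, TimeUnit.MILLISECONDS);
            // Prefer the leases returned by the successful invoke itself,
            // since the tracker's map may lag behind future completion.
            expected = written != null ? written : leaseTracker.get();
        } catch (ExecutionException e) {
            // Previous invoke failed: fall back to the lease tracker's map.
            expected = leaseTracker.get();
        } catch (TimeoutException e) {
            // Assumption: skip this tick so that no second invoke is started.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        inFlight = invoke.apply(expected);
        return true;
    }
}
```

In this sketch tick() would be driven by the existing 500 ms schedule. Because
the next invoke is built only from the result of the previous one (or from the
tracker after a failure), two invokes never carry the same expected state.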
--
This message was sent by Atlassian Jira
(v8.20.10#820010)