[
https://issues.apache.org/jira/browse/IGNITE-28395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Denis Chudov updated IGNITE-28395:
----------------------------------
Description:
The Lease Updater fires an invoke to Meta Storage asynchronously every 500 ms
without waiting for the previous one to complete. As a result, several
invocations run concurrently with the same expected lease state: only one wins
the CAS, and the rest fail with `Lease update invocation failed because of
outdated lease data on this node`. As a side effect, roughly once per minute
the lease expires before it is renewed.
Simply reading fresh data from storage before each invoke does not help:
previous invocations are already in-flight and will complete after the read,
making the freshly-read state outdated by the time the new invoke reaches
storage.
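To illustrate the failure mode described above, here is a minimal sketch in which the stored lease state is reduced to a version counter. `StaleCasDemo`, `applyAll`, and the version numbers are hypothetical and are not Ignite code; the sketch only assumes that each invoke is a compare-and-set against the state the updater read when the invoke was created.

```java
import java.util.concurrent.atomic.AtomicLong;

public class StaleCasDemo {
    /**
     * Applies several pending invokes that were all built from the same
     * expected version and returns how many of them win the CAS.
     */
    static int applyAll(AtomicLong storedVersion, long expectedVersion, int pendingInvokes) {
        int succeeded = 0;
        for (int i = 0; i < pendingInvokes; i++) {
            // Every invoke carries the same expectation, so only the first can match.
            if (storedVersion.compareAndSet(expectedVersion, expectedVersion + 1)) {
                succeeded++;
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        AtomicLong stored = new AtomicLong(7);   // hypothetical stored lease version
        int ok = applyAll(stored, 7, 3);         // three invokes fired from the same read
        System.out.println(ok + " of 3 invokes succeeded, stored version = " + stored.get());
    }
}
```

Re-reading the state before a fourth invoke would not change the picture if the earlier three were still queued: they would land first and invalidate the fresh read.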
*Fix*
Track the in-flight invoke as a future. On each tick, if the previous future
has not completed, block on `future.get(timeout)` before reading from the lease
tracker and firing the next invoke. This guarantees that at most one invoke is
in flight at any time and that the lease state is read only after the previous
update has landed. The timeout should be well below the lease duration so that
renewal still happens under a degraded network.
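The proposed fix could be sketched roughly as follows. This is not the actual Lease Updater code: `SingleInFlightUpdater`, `readStateAndInvoke`, and the 200 ms timeout are hypothetical names and values chosen for illustration. The ticket does not say what should happen when the wait times out; the sketch assumes the updater then proceeds with a new invoke so that renewal is still attempted.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class SingleInFlightUpdater {
    /** Hypothetical callback: reads the lease state and fires one invoke. */
    private final Supplier<CompletableFuture<Void>> readStateAndInvoke;
    private final long waitTimeoutMillis;
    private CompletableFuture<Void> inFlight = CompletableFuture.completedFuture(null);

    public SingleInFlightUpdater(Supplier<CompletableFuture<Void>> readStateAndInvoke,
                                 long waitTimeoutMillis) {
        this.readStateAndInvoke = readStateAndInvoke;
        this.waitTimeoutMillis = waitTimeoutMillis;
    }

    /** Called on every tick of the updater loop. */
    public synchronized void tick() {
        if (!inFlight.isDone()) {
            try {
                // Wait for the previous invoke before reading state again.
                inFlight.get(waitTimeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Assumption: give up waiting and try to renew anyway.
            } catch (ExecutionException e) {
                // Previous invoke failed; retry below with freshly read state.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return; // do not fire a new invoke while shutting down
            }
        }
        // State is read only here, after the previous invoke has settled.
        inFlight = readStateAndInvoke.get();
    }

    public static void main(String[] args) {
        CompletableFuture<Void> slow = new CompletableFuture<>();
        int[] fired = {0};
        SingleInFlightUpdater updater = new SingleInFlightUpdater(() -> {
            fired[0]++;
            return fired[0] == 1 ? slow : CompletableFuture.<Void>completedFuture(null);
        }, 200);
        updater.tick();                                        // first invoke stays in flight
        CompletableFuture.runAsync(() -> slow.complete(null)); // storage eventually answers
        updater.tick();                                        // waits, then fires the second
        System.out.println("invokes fired: " + fired[0]);
    }
}
```

Blocking inside the tick keeps the logic simple at the cost of stalling the updater thread for up to the timeout. Skipping the tick on timeout instead would keep the at-most-one property strict, but risks missing a renewal.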
> Lease Updater accumulates concurrent in-flight invocations causing constant
> CAS failures
> ----------------------------------------------------------------------------------------
>
> Key: IGNITE-28395
> URL: https://issues.apache.org/jira/browse/IGNITE-28395
> Project: Ignite
> Issue Type: Bug
> Reporter: Denis Chudov
> Priority: Major
> Labels: ignite-3