[
https://issues.apache.org/jira/browse/CASSANDRA-21249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071542#comment-18071542
]
Runtian Liu edited comment on CASSANDRA-21249 at 4/7/26 3:47 AM:
-----------------------------------------------------------------
Here is the 4.1 PR. If this looks good, I can also add the change for 4.0 and 5.0:
https://github.com/apache/cassandra/pull/4713
was (Author: JIRAUSER291682):
4.1 PR, if this looks good, I can also add change for 4.0 and 5.0:
https://github.com/apache/cassandra/pull/4711
> nodetool assassinate blocks the GOSSIP stage, causing the executing node to
> be marked down and the liveness check to be ineffective
> -----------------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-21249
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21249
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Runtian Liu
> Priority: Normal
>
> h2. Summary
> When {{nodetool assassinate}} is executed against a remote endpoint, two
> problems occur:
> # The node running the command is marked as DOWN by other nodes in the
> cluster. Peers lose the ability to gossip with the executing node for ~34
> seconds, causing the phi accrual failure detector to convict it.
> # The liveness check is ineffective. The command is supposed to refuse to
> assassinate a live node by checking whether the target's heartbeat changes
> during a 30-second observation window. In practice, the check always passes
> trivially, meaning a fully healthy node can be assassinated without any
> warning.
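
To put the ~34 seconds in context, the sketch below estimates how long a peer can go without a heartbeat before a phi-based detector convicts it. It is a simplified model, not Cassandra's {{FailureDetector}} code: it assumes exponentially distributed heartbeat arrivals, so phi = elapsed / (mean interval * ln 10), a mean inter-arrival time of about 1 second (an assumption), and the default {{phi_convict_threshold}} of 8. The class and method names are hypothetical.

```java
public class PhiSketch {
    // Default phi_convict_threshold from cassandra.yaml.
    static final double PHI_CONVICT_THRESHOLD = 8.0;

    /**
     * Phi under an exponential arrival model:
     * P(no heartbeat for t) = exp(-t / mean), phi = -log10(P) = t / (mean * ln 10).
     */
    static double phi(double secondsSinceLastHeartbeat, double meanIntervalSeconds) {
        return secondsSinceLastHeartbeat / (meanIntervalSeconds * Math.log(10.0));
    }

    /** Seconds of silence after which phi reaches the given threshold. */
    static double secondsToConvict(double meanIntervalSeconds, double threshold) {
        return threshold * meanIntervalSeconds * Math.log(10.0);
    }

    public static void main(String[] args) {
        double mean = 1.0; // assumed mean heartbeat inter-arrival time, in seconds
        System.out.printf("silence needed to convict: %.1f s%n",
                          secondsToConvict(mean, PHI_CONVICT_THRESHOLD));
        System.out.printf("phi after 34 s of silence: %.1f%n", phi(34.0, mean));
    }
}
```

Under these assumptions the threshold is crossed after roughly 18 seconds of silence, well inside the ~34-second window during which the executing node cannot gossip.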
> h2. Root Cause
> The {{assassinateEndpoint}} method in {{Gossiper.java}} runs its entire body,
> including a 30-second sleep ({{RING_DELAY}}) and a subsequent 4-second
> sleep, inside {{runInGossipStageBlocking()}}. This submits the work to
> {{Stage.GOSSIP}}, which is a single-threaded executor.
> While the ~34 seconds of sleep execute on the GOSSIP stage thread:
> # All inbound gossip message processing is blocked. {{GossipDigestSyn}},
> {{GossipDigestAck}}, and {{GossipDigestAck2}} handlers are all dispatched
> to {{Stage.GOSSIP}}. They queue up and cannot execute.
> # The failure detector on peer nodes receives no heartbeat updates from the
> executing node, because heartbeat reporting ({{notifyFailureDetector}})
> only happens during ACK/ACK2 processing, which is blocked. After ~10–20
> seconds of silence, peers convict the executing node as dead.
> # The liveness check cannot detect a live target. The check compares the
> target's heartbeat before and after the 30-second sleep. However, the only
> code path that updates a remote endpoint's heartbeat on the local node is
> {{applyNewStates}} → {{setHeartBeatState}}, which runs on
> {{Stage.GOSSIP}} during gossip message processing, the very processing that
> is blocked. The heartbeat cannot change during the sleep, so the comparison
> always passes.
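
The blocking effect can be reproduced outside Cassandra with a plain single-threaded executor. The sketch below is illustrative only: the executor stands in for {{Stage.GOSSIP}}, an {{AtomicInteger}} stands in for the target's heartbeat, and all class and method names are hypothetical. When the sleep runs on the stage, the queued heartbeat update cannot execute until the sleep finishes, so the before/after comparison sees no change.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class BlockedStageDemo {
    /**
     * Returns true if the heartbeat was seen to change across the sleep.
     * sleepOnStage = true mimics the current layout (sleep on the stage thread);
     * sleepOnStage = false mimics sleeping on the caller thread instead.
     */
    static boolean heartbeatChangedDuringSleep(boolean sleepOnStage, long sleepMillis) {
        ExecutorService stage = Executors.newSingleThreadExecutor(); // stand-in for Stage.GOSSIP
        AtomicInteger heartbeat = new AtomicInteger(0);
        try {
            if (sleepOnStage) {
                Future<Boolean> result = stage.submit(() -> {
                    int before = heartbeat.get();
                    // An inbound gossip message arrives, but it can only queue
                    // behind this task on the single stage thread.
                    stage.execute(heartbeat::incrementAndGet);
                    Thread.sleep(sleepMillis);
                    return heartbeat.get() != before;
                });
                return result.get();
            }
            int before = heartbeat.get();
            // The stage is idle, so the update is applied while the caller sleeps.
            stage.execute(heartbeat::incrementAndGet);
            Thread.sleep(sleepMillis);
            // Re-read on the stage; FIFO ordering guarantees the update ran first.
            return stage.submit(() -> heartbeat.get() != before).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            stage.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("sleep on stage:  changed = " + heartbeatChangedDuringSleep(true, 200));
        System.out.println("sleep on caller: changed = " + heartbeatChangedDuringSleep(false, 200));
    }
}
```

In this model the first call reports no change and the second reports a change, which mirrors why the liveness check passes trivially today.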
> h2. Regression Origin
> This was introduced by CASSANDRA-15059 ("Fix assorted gossip races and add
> related runtime checks", 2019-03-21).
> The commit's goal was to ensure that gossip state mutations only happen on
> the GOSSIP stage thread, adding {{runInGossipStageBlocking()}} wrappers
> throughout {{Gossiper}}. This was correct for methods like
> {{convict}}, {{markAsShutdown}}, {{removeEndpoint}}, and
> {{evictFromMembership}}. However, wrapping the entire
> {{assassinateEndpoint}} moved the long-duration sleeps onto the GOSSIP stage
> as an unintended side effect.
> * Before CASSANDRA-15059: {{assassinateEndpoint}} ran entirely on the JMX
> thread. The GOSSIP stage remained free to process messages during the
> 30-second sleep, heartbeat updates from the target were visible, and the
> liveness check correctly detected live nodes.
> * After CASSANDRA-15059: The entire method body runs on the GOSSIP stage. The
> liveness check is broken and the executing node gets convicted.
> h2. Affected Versions
> All versions carrying CASSANDRA-15059 that use gossip-based assassination:
> 3.0.28+, 3.11.x, 4.0.x, 4.1.x, 5.0.x. Trunk (5.1+) is not affected because
> gossip-based assassination was replaced by the TCM system.
> h2. Proposed Fix
> Split {{assassinateEndpoint}} into phases so that only the state mutations
> run on the GOSSIP stage, while reads and sleeps run on the caller (JMX)
> thread:
> * Phase 1 (JMX thread): Snapshot the target's heartbeat generation and
> version. This is safe off the GOSSIP stage because {{endpointStateMap}} is a
> {{ConcurrentHashMap}} and {{HeartBeatState}} fields are {{volatile}}.
> * Phase 2 (JMX thread): Sleep for {{RING_DELAY}}. The GOSSIP stage remains
> free to process messages, so if the target is alive, its heartbeat updates
> will be received.
> * Phase 3 (GOSSIP stage): Re-read the target's heartbeat. If generation or
> version changed, throw {{RuntimeException("Endpoint still alive")}}.
> Otherwise, perform the assassination (set status to LEFT, call
> {{handleMajorStateChange}}).
> * Phase 4 (JMX thread): Sleep for {{intervalInMillis * 4}} to allow gossip
> propagation.
> This restores the pre-CASSANDRA-15059 behavior: the GOSSIP stage is only held
> for the brief duration of state mutations in Phase 3, the liveness check
> works because heartbeat updates are processed during the sleep, and the
> executing node is not convicted by peers.
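
A minimal sketch of the four phases, under stated assumptions, might look like the following. It is not the code in the PR: {{Target}}, {{assassinate}}, and {{demo}} are hypothetical names, a single-threaded scheduled executor stands in for {{Stage.GOSSIP}}, and the delays are shortened to milliseconds so the example runs quickly.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PhasedAssassinateSketch {
    /** Hypothetical stand-in for a remote endpoint's gossip state. */
    static final class Target {
        volatile int generation = 1;
        volatile int version = 0;
        volatile String status = "NORMAL";
    }

    static void assassinate(Target target, ExecutorService gossipStage,
                            long ringDelayMillis, long propagationMillis) {
        // Phase 1 (caller thread): snapshot the heartbeat; the fields are volatile.
        final int generation = target.generation;
        final int version = target.version;
        try {
            // Phase 2 (caller thread): wait while the stage keeps processing messages.
            Thread.sleep(ringDelayMillis);
            // Phase 3 (stage thread): re-check, then mutate state. Only this runs on the stage.
            gossipStage.submit(() -> {
                if (target.generation != generation || target.version != version)
                    throw new RuntimeException("Endpoint still alive");
                target.status = "LEFT";
            }).get();
            // Phase 4 (caller thread): allow time for propagation.
            Thread.sleep(propagationMillis);
        } catch (ExecutionException e) {
            // Surface the failure raised on the stage thread to the caller.
            throw new RuntimeException(e.getCause().getMessage(), e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    /** Runs the sketch; a live target bumps its heartbeat on the stage mid-sleep. */
    static String demo(boolean targetAlive) {
        ScheduledExecutorService stage = Executors.newSingleThreadScheduledExecutor();
        Target target = new Target();
        try {
            if (targetAlive)
                stage.schedule(() -> { target.version++; }, 50, TimeUnit.MILLISECONDS);
            assassinate(target, stage, 200, 20);
            return target.status;
        } catch (RuntimeException e) {
            return e.getMessage();
        } finally {
            stage.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("dead target: " + demo(false));
        System.out.println("live target: " + demo(true));
    }
}
```

In this model a silent target ends up with status LEFT, while a target whose heartbeat advances during the sleep is rejected with "Endpoint still alive", because the stage was free to apply the update before Phase 3 ran.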