[ 
https://issues.apache.org/jira/browse/CASSANDRA-21249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071542#comment-18071542
 ] 

Runtian Liu edited comment on CASSANDRA-21249 at 4/7/26 3:47 AM:
-----------------------------------------------------------------

Here is the 4.1 PR. If this looks good, I can also add the change for 4.0 and 5.0: 
https://github.com/apache/cassandra/pull/4713


was (Author: JIRAUSER291682):
4.1 PR, if this looks good, I can also add change for 4.0 and 5.0: 
https://github.com/apache/cassandra/pull/4711

> nodetool assassinate blocks the GOSSIP stage, causing the executing node to 
> be marked down and the liveness check to be ineffective
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-21249
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21249
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: Runtian Liu
>            Priority: Normal
>
> h2. Summary
> When {{nodetool assassinate}} is executed against a remote endpoint, two 
> problems occur:
> # The node running the command is marked as DOWN by other nodes in the 
> cluster. Peers lose the ability to gossip with the executing node for ~34 
> seconds, causing the phi accrual failure detector to convict it.
> # The liveness check is ineffective. The command is supposed to refuse to 
> assassinate a live node by checking whether the target's heartbeat changes 
> during a 30-second observation window. In practice, the check always passes 
> trivially, meaning a fully healthy node can be assassinated without any 
> warning.
> h2. Root Cause
> The {{assassinateEndpoint}} method in {{Gossiper.java}} runs its entire body, 
> including a 30-second sleep ({{RING_DELAY}}) and a subsequent 4-second 
> sleep, inside {{runInGossipStageBlocking()}}. This submits the work to 
> {{Stage.GOSSIP}}, which is a single-threaded executor.
> While the ~34 seconds of sleep execute on the GOSSIP stage thread:
> # All inbound gossip message processing is blocked. The {{GossipDigestSyn}}, 
> {{GossipDigestAck}}, and {{GossipDigestAck2}} handlers are all dispatched 
> to {{Stage.GOSSIP}}, so they queue up and cannot execute.
> # The failure detector on peer nodes receives no heartbeat updates from the 
> executing node, because heartbeat reporting ({{notifyFailureDetector}}) 
> only happens during ACK/ACK2 processing, which is blocked. After ~10–20 
> seconds of silence, peers convict the executing node as dead.
> # The liveness check cannot detect a live target. The check compares the 
> target's heartbeat before and after the 30-second sleep. However, the only 
> code path that updates a remote endpoint's heartbeat on the local node is 
> {{applyNewStates}} → {{setHeartBeatState}}, which runs on 
> {{Stage.GOSSIP}} during gossip message processing, the very processing that 
> is blocked. The heartbeat cannot change during the sleep, so the comparison 
> always passes.
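
The blocking effect described above can be reproduced outside Cassandra. The following is a minimal sketch, not Cassandra code: the class name, method name, and the `AtomicInteger` standing in for a remote heartbeat version are all hypothetical, and a plain `Executors.newSingleThreadExecutor()` is assumed to behave like {{Stage.GOSSIP}} only in the sense that it has one worker thread and a FIFO queue.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the situation in the ticket; none of these names are Cassandra APIs.
class BlockedStageDemo {
    /**
     * Runs a "liveness check" (read, sleep, re-read) on a single-threaded executor
     * and returns true if the heartbeat was seen to change during the sleep.
     */
    static boolean heartbeatChangedWhenSleepingOnStage(long sleepMillis) {
        ExecutorService gossipStage = Executors.newSingleThreadExecutor();
        AtomicInteger heartbeatVersion = new AtomicInteger(0);
        try {
            // The whole check, including the sleep, occupies the only stage thread.
            Future<Boolean> livenessCheck = gossipStage.submit(() -> {
                int before = heartbeatVersion.get();
                Thread.sleep(sleepMillis);
                return heartbeatVersion.get() != before;
            });

            // Stand-in for an inbound gossip message from a live target.
            // It is queued behind the sleeping task and runs only after the check returns.
            gossipStage.submit(() -> { heartbeatVersion.incrementAndGet(); });

            return livenessCheck.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            gossipStage.shutdown();
        }
    }
}
```

Calling `BlockedStageDemo.heartbeatChangedWhenSleepingOnStage(200)` returns `false` even though a heartbeat update was submitted during the window, because the update cannot run until the sleeping task releases the thread.
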
> h2. Regression Origin
> This was introduced by CASSANDRA-15059 ("Fix assorted gossip races and add 
> related runtime checks", 2019-03-21).
> The commit's goal was to ensure that gossip state mutations only happen on 
> the GOSSIP stage thread, adding {{runInGossipStageBlocking()}} wrappers 
> throughout {{Gossiper}}. This was correct for methods like 
> {{convict}}, {{markAsShutdown}}, {{removeEndpoint}}, and 
> {{evictFromMembership}}. However, wrapping the entire 
> {{assassinateEndpoint}} moved the long-duration sleeps onto the GOSSIP stage 
> as an unintended side effect.
> Before CASSANDRA-15059: {{assassinateEndpoint}} ran entirely on the JMX 
> thread. The GOSSIP stage remained free to process messages during the 
> 30-second sleep, heartbeat updates from the target were visible, and the 
> liveness check correctly detected live nodes.
> After CASSANDRA-15059: The entire method body runs on the GOSSIP stage. The 
> liveness check is broken and the executing node gets convicted.
> h2. Affected Versions
> All versions carrying CASSANDRA-15059 that use gossip-based assassination: 
> 3.0.28+, 3.11.x, 4.0.x, 4.1.x, 5.0.x. Trunk (5.1+) is not affected because 
> gossip-based assassination was replaced by the TCM system.
> h2. Proposed Fix
> Split {{assassinateEndpoint}} into phases so that only the state mutations 
> run on the GOSSIP stage, while reads and sleeps run on the caller (JMX) 
> thread:
> Phase 1 (JMX thread): Snapshot the target's heartbeat generation and version. 
> This is safe off the GOSSIP stage because {{endpointStateMap}} is a 
> {{ConcurrentHashMap}} and {{HeartBeatState}} fields are {{volatile}}.
> Phase 2 (JMX thread): Sleep for {{RING_DELAY}}. The GOSSIP stage remains 
> free to process messages, so if the target is alive, its heartbeat updates 
> will be received.
> Phase 3 (GOSSIP stage): Re-read the target's heartbeat. If generation or 
> version changed, throw {{RuntimeException("Endpoint still alive")}}. 
> Otherwise, perform the assassination (set status to LEFT, call 
> {{handleMajorStateChange}}).
> Phase 4 (JMX thread): Sleep for {{intervalInMillis * 4}} to allow gossip 
> propagation.
> This restores the pre-CASSANDRA-15059 behavior: the GOSSIP stage is only held 
> for the brief duration of state mutations in Phase 3, the liveness check 
> works because heartbeat updates are processed during the sleep, and the 
> executing node is not convicted by peers.
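
The four phases above can be sketched in isolation. This is an illustrative sketch under stated assumptions, not the code in the PR: the class, method, and parameter names are hypothetical, an `AtomicInteger` stands in for the target's heartbeat version, and the method returns `false` where the real method is described as throwing {{RuntimeException("Endpoint still alive")}}.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the phased approach; none of these names are Cassandra APIs.
class PhasedAssassinateSketch {
    /**
     * Returns true if the (simulated) assassination went ahead,
     * false if the liveness check saw a heartbeat change and refused.
     */
    static boolean assassinate(boolean targetIsAlive, long ringDelayMillis, long propagationMillis) {
        ExecutorService gossipStage = Executors.newSingleThreadExecutor();
        AtomicInteger heartbeatVersion = new AtomicInteger(0);
        try {
            // Phase 1 (caller thread): snapshot the heartbeat.
            int before = heartbeatVersion.get();

            // Stand-in for gossip from a live target arriving during the window.
            // The stage is idle, so this update is processed promptly.
            if (targetIsAlive) {
                gossipStage.submit(() -> { heartbeatVersion.incrementAndGet(); });
            }

            // Phase 2 (caller thread): wait without occupying the stage.
            Thread.sleep(ringDelayMillis);

            // Phase 3 (stage thread): re-read, then mutate only if nothing changed.
            Future<Boolean> result = gossipStage.submit(() -> {
                if (heartbeatVersion.get() != before) {
                    return false; // the real method is described as throwing here
                }
                // The real method would set the status to LEFT at this point.
                return true;
            });
            boolean assassinated = result.get();

            // Phase 4 (caller thread): allow time for propagation.
            if (assassinated) {
                Thread.sleep(propagationMillis);
            }
            return assassinated;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            gossipStage.shutdown();
        }
    }
}
```

With this structure, `PhasedAssassinateSketch.assassinate(true, 50, 10)` returns `false` and `PhasedAssassinateSketch.assassinate(false, 50, 10)` returns `true`: the stage thread is held only for the short Phase 3 task, so updates submitted during the wait are applied before the re-read.
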



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
