[jira] [Commented] (CASSANDRA-18866) Node sends multiple inflight echos

Stefan Miklosovic (Jira) Tue, 10 Mar 2026 10:57:50 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18064584#comment-18064584
 ]


Stefan Miklosovic commented on CASSANDRA-18866:
-----------------------------------------------

Thank you [~cam1982] for the next batch of fixes. Builds for that are here; I 
do not see any failures anymore. 

[4.0 circle|https://app.circleci.com/pipelines/github/instaclustr/cassandra/6295/workflows/f98dca01-45c2-429d-ae65-94f8b96117c2]
[4.1 circle|https://app.circleci.com/pipelines/github/instaclustr/cassandra/6297/workflows/39cb7427-a64b-4978-96f4-c9f8dfb3075c]
[5.0 pre-ci|https://pre-ci.cassandra.apache.org/job/cassandra-5.0/101/]
[trunk|https://pre-ci.cassandra.apache.org/job/cassandra/475/#showFailuresLink]

> Node sends multiple inflight echos
> ----------------------------------
>
>                 Key: CASSANDRA-18866
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18866
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Cluster/Gossip
>            Reporter: Cameron Zemek
>            Assignee: Cameron Zemek
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: 18866-regression.patch, CASSANDRA-18866-4.0.patch, 
> CASSANDRA-18866-4.1.patch, CASSANDRA-18866-5.0.patch, duplicates.log, echo.log
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> CASSANDRA-18854 rolled back the changes from CASSANDRA-18845. In particular, 
> 18845 had a change to allow only 1 inflight ECHO request at a time. As per 
> 18854, some tests have an error rate due to this change. Creating this ticket 
> to discuss this further. The current state also has no retry logic; it just 
> allows multiple ECHO requests inflight at the same time, so it is less likely 
> that all ECHOs will time out or get lost.
> With the change from 18845 and some extra logging added to track what is 
> going on, I do see it retrying ECHOs. Likewise, I patched a node to drop ECHO 
> requests from a node and also see it retrying ECHOs when it doesn't get a 
> reply.
> Therefore, I think the problem is more specific than the dropping of one ECHO 
> request. Yes, there is no retry logic for failed ECHO requests, but this is 
> the case both before and after 18845. ECHO requests are only sent via gossip 
> verb handlers calling applyStateLocally. In these failed tests I am therefore 
> assuming there are cases where it won't call markAlive when other nodes 
> consider the node UP but it's marked DOWN by a node.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

