> Sounds like the request was to hit the pause button until TCM merged rather
> than skipping the work entirely so that's promising.
Correct, I was only asked to wait a few days and to rebase after TCM merged. The issue was that I had to time-box this work, and the fact that it hit issues kinda became the reason this didn't get merged… I didn't have time to debug them! I would be more than glad to see someone pick up this work, and I can spare cycles for review =)

> On May 16, 2024, at 10:57 AM, Josh McKenzie <jmcken...@apache.org> wrote:
>
> I'm +1 to continuing work on CASSANDRA-18917 for all the reasons Jordan listed.
>
> Sounds like the request was to hit the pause button until TCM merged rather than skipping the work entirely, so that's promising.
>
> On Thu, May 16, 2024, at 1:43 PM, Jon Haddad wrote:
>> I have also recently worked with a team that lost critical data as a result of gossip issues combined with a collision in our token allocation. I haven't filed a JIRA yet as it slipped my mind, but I've seen it in my own testing as well. I'll get a JIRA in describing it in detail.
>>
>> It's severe enough that it should probably block 5.0.
>>
>> Jon
>>
>> On Thu, May 16, 2024 at 10:37 AM Jordan West <jw...@apache.org <mailto:jw...@apache.org>> wrote:
>> I'm a big +1 on 18917 or more testing of gossip. While I appreciate that it makes TCM more complicated, gossip and schema propagation bugs have been the source of our two worst data loss events in the last 3 years. Data loss should immediately cause us to evaluate what we can do better.
>>
>> We will likely live with gossip for at least 1, maybe 2, more years. Otherwise, outside of bug fixes (and to some degree even still), I think the only other solution is to not touch gossip *at all* until we are all TCM-only, which I don't think is practical or realistic. Recent changes to gossip in 4.1 introduced several subtle bugs that had serious impact (from data loss to loss of the ability to safely replace nodes in the cluster).
>>
>> I am happy to contribute some time to this if lack of folks is the issue.
>>
>> Jordan
>>
>> On Mon, May 13, 2024 at 17:05 David Capwell <dcapw...@apple.com <mailto:dcapw...@apple.com>> wrote:
>> So, I created https://issues.apache.org/jira/browse/CASSANDRA-18917, which lets you do deterministic gossip simulation testing across large clusters within seconds… I stopped this work as it conflicted with TCM (they were trying to merge that week) and it hit issues where some nodes never converged… I didn't have time to debug, so I had to drop the patch…
>>
>> This type of change would be a good reason to resurrect that patch, as testing gossip is super dangerous right now… its behavior is only in a few people's heads, and even then it's just bits and pieces scattered across multiple people (and likely missing pieces)…
>>
>> My brain is far too fried right now to say whether your idea is safe or not, but I honestly feel that we would need to improve our tests (we have 0) before making such a change…
>>
>> I do welcome the patch though...
>>
>>
>>> On May 12, 2024, at 8:05 PM, Zemek, Cameron via dev <dev@cassandra.apache.org <mailto:dev@cassandra.apache.org>> wrote:
>>>
>>> In looking into CASSANDRA-19580 I noticed something that raises a question. With Gossip SYN it doesn't check for missing digests. If it's empty for the shadow round, it will add everything from endpointStateMap to the reply. But why not include missing entries in normal replies?
>>> The branching for reply handling of SYN requests could then be merged into a single code path (though the shadow round handles empty state differently with CASSANDRA-16213). The potential downside is a performance impact, as this requires doing a set difference.
>>>
>>> For example, something along the lines of:
>>>
>>> ```
>>> Set<InetAddressAndPort> missing = new HashSet<>(endpointStateMap.keySet());
>>> missing.removeAll(gDigestList.stream().map(GossipDigest::getEndpoint).collect(Collectors.toSet()));
>>> for (InetAddressAndPort endpoint : missing)
>>> {
>>>     gDigestList.add(new GossipDigest(endpoint, 0, 0));
>>> }
>>> ```
>>>
>>> It seems odd to me that after the shadow round a new node has an endpointStateMap with only itself as an entry. Then the only way it gets the gossip state is by another node choosing to send the new node a gossip SYN, and the choosing of this is random. Yeah, this happens every second so eventually it's going to receive one (outside the issue of CASSANDRA-19580, where it doesn't if it's in a dead state like hibernate), but doesn't this open up bootstrapping to failures on very large clusters, as it can take longer before it's sent a SYN (since the odds of being chosen for a SYN get lower)? For years I've been seeing bootstrap failures with 'Unable to contact any seeds', but they are infrequent and I've never been able to figure out how to reproduce them in order to open a ticket. I wonder if some of them have been due to not receiving a SYN message before the seenAnySeed check.
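A rough back-of-the-envelope sketch of the "odds of being chosen for a SYN" concern, under a simplified model rather than Cassandra's actual peer-selection logic: assume M nodes currently know about the joining node, and each of them gossips once per second to one peer picked uniformly at random from the N-1 endpoints it knows. Under that assumption, when only a handful of peers (e.g. the seeds) are aware of the new node, the chance of going a long time without receiving a single SYN grows with cluster size:

```
// Simplified model only -- not Cassandra's actual selection logic. M nodes are aware of the
// joining node; each picks one gossip target per second uniformly from its N-1 known endpoints.
public class SynWaitSketch
{
    // Probability the joining node receives no SYN at all within the given number of seconds.
    static double probNoSyn(int clusterSize, int awarePeers, int seconds)
    {
        double missOneRound = Math.pow(1.0 - 1.0 / (clusterSize - 1), awarePeers);
        return Math.pow(missOneRound, seconds);
    }

    public static void main(String[] args)
    {
        for (int n : new int[]{ 10, 100, 1000 })
        {
            // e.g. only a few seeds are aware of the new node so far (hypothetical M = 3)
            System.out.printf("N=%d, M=3:   P(no SYN after 30s) = %.3e%n", n, probNoSyn(n, 3, 30));
            // vs. the whole cluster already being aware of it
            System.out.printf("N=%d, M=N-1: P(no SYN after 30s) = %.3e%n", n, probNoSyn(n, n - 1, 30));
        }
    }
}
```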