> Sounds like the request was to hit the pause button until TCM merged rather
> than skipping the work entirely so that's promising.
Correct, I was only asked to wait a few days and to rebase after TCM merged. The issue was that I had to time-box this work, and the fact that it hit issues kinda became the reason this didn't get merged… I didn't have time to debug them! I would be more than glad to see someone pick up this work, and I can spare cycles for review =)

> On May 16, 2024, at 10:57 AM, Josh McKenzie <jmcken...@apache.org> wrote:
>
> I'm +1 to continuing work on CASSANDRA-18917 for all the reasons Jordan listed.
>
> Sounds like the request was to hit the pause button until TCM merged rather than skipping the work entirely, so that's promising.
>
> On Thu, May 16, 2024, at 1:43 PM, Jon Haddad wrote:
>> I have also recently worked with a team that lost critical data as a result of gossip issues combined with a collision in our token allocation. I haven't filed a JIRA yet as it slipped my mind, but I've seen it in my own testing as well. I'll get a JIRA in describing it in detail.
>>
>> It's severe enough that it should probably block 5.0.
>>
>> Jon
>>
>> On Thu, May 16, 2024 at 10:37 AM Jordan West <jw...@apache.org <mailto:jw...@apache.org>> wrote:
>> I'm a big +1 on 18917 or more testing of gossip. While I appreciate that it makes TCM more complicated, gossip and schema propagation bugs have been the source of our two worst data loss events in the last 3 years. Data loss should immediately cause us to evaluate what we can do better.
>>
>> We will likely live with gossip for at least 1, maybe 2, more years. Otherwise, outside of bug fixes (and to some degree even still), I think the only other solution is to not touch gossip *at all* until we are all TCM-only, which I don't think is practical or realistic. Recent changes to gossip in 4.1 introduced several subtle bugs that had serious impact (from data loss to loss of the ability to safely replace nodes in the cluster).
>>
>> I am happy to contribute some time to this if lack of folks is the issue.
>>
>> Jordan
>>
>> On Mon, May 13, 2024 at 17:05 David Capwell <dcapw...@apple.com <mailto:dcapw...@apple.com>> wrote:
>> So, I created https://issues.apache.org/jira/browse/CASSANDRA-18917, which lets you do deterministic gossip simulation testing across large clusters within seconds… I stopped this work as it conflicted with TCM (they were trying to merge that week) and it hit issues where some nodes never converged… I didn't have time to debug, so I had to drop the patch…
>>
>> This type of change would be a good reason to resurrect that patch, as testing gossip is super dangerous right now… its behavior is only in a few people's heads, and even then it's just bits and pieces scattered across multiple people (and likely missing pieces)…
>>
>> My brain is far too fried right now to say whether your idea is safe or not, but I honestly feel that we would need to improve our tests (we have 0) before making such a change…
>>
>> I do welcome the patch though...
>>
>>
>>> On May 12, 2024, at 8:05 PM, Zemek, Cameron via dev <dev@cassandra.apache.org <mailto:dev@cassandra.apache.org>> wrote:
>>>
>>> In looking into CASSANDRA-19580 I noticed something that raises a question. With Gossip SYN it doesn't check for missing digests. If it's empty for the shadow round, it will add everything from endpointStateMap to the reply. But why not include missing entries in normal replies?
>>> The branching for reply handling of SYN requests could then be merged into a single code path (though the shadow round handles empty state differently with CASSANDRA-16213). The potential downside is a performance impact, as this requires doing a set difference.
>>>
>>> For example, something along the lines of:
>>>
>>> ```
>>> Set<InetAddressAndPort> missing = new HashSet<>(endpointStateMap.keySet());
>>> missing.removeAll(gDigestList.stream().map(GossipDigest::getEndpoint).collect(Collectors.toSet()));
>>> for (InetAddressAndPort endpoint : missing)
>>> {
>>>     gDigestList.add(new GossipDigest(endpoint, 0, 0));
>>> }
>>> ```
>>>
>>> It seems odd to me that after the shadow round a new node has an endpointStateMap with only itself as an entry. Then the only way it gets the gossip state is by another node choosing to send the new node a gossip SYN, and the choosing of this is random. Yeah, this happens every second so eventually it's going to receive one (outside the issue of CASSANDRA-19580, where it doesn't if it's in a dead state like hibernate), but doesn't this open up bootstrapping to failures on very large clusters, as it can take longer before it's sent a SYN (since the odds of being chosen for a SYN get lower)? For years I've been seeing bootstrap failures with 'Unable to contact any seeds', but they are infrequent and I've never been able to figure out how to reproduce them in order to open a ticket. I wonder if some of them have been due to not receiving a SYN message before the seenAnySeed check.
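A rough back-of-the-envelope sketch of the "odds of being chosen for a SYN" concern, under a simplified model rather than Cassandra's actual peer-selection logic: assume M nodes currently know about the joining node, and each of them gossips once per second to one peer picked uniformly at random from the N-1 endpoints it knows. Under that assumption, when only a handful of peers (e.g. the seeds) are aware of the new node, the chance of going a long time without receiving a single SYN grows with cluster size:

```
// Simplified model only -- not Cassandra's actual selection logic. M nodes are aware of the
// joining node; each picks one gossip target per second uniformly from its N-1 known endpoints.
public class SynWaitSketch
{
    // Probability the joining node receives no SYN at all within the given number of seconds.
    static double probNoSyn(int clusterSize, int awarePeers, int seconds)
    {
        double missOneRound = Math.pow(1.0 - 1.0 / (clusterSize - 1), awarePeers);
        return Math.pow(missOneRound, seconds);
    }

    public static void main(String[] args)
    {
        for (int n : new int[]{ 10, 100, 1000 })
        {
            // e.g. only a few seeds are aware of the new node so far (hypothetical M = 3)
            System.out.printf("N=%d, M=3:   P(no SYN after 30s) = %.3e%n", n, probNoSyn(n, 3, 30));
            // vs. the whole cluster already being aware of it
            System.out.printf("N=%d, M=N-1: P(no SYN after 30s) = %.3e%n", n, probNoSyn(n, n - 1, 30));
        }
    }
}
```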