I'm on leave, so I'm not really going to look too closely, but as someone who
has worked on gossip a lot, I'm hesitant to add more state to it: that means
more chances for hard-to-understand race bugs that brick gossip (it has taken
years to get stable-ish… there are still issues that we don't know how to
repro/fix).

Just looking at the linked JIRA summary, "Instances from a 2nd ring join
another ring when running on the same nodes", it feels like internode auth
(blocking nodes from joining the wrong ring) is the best solution here. Also,
gossip does validate the cluster id / partitioner; we do this in
`GossipDigestSyn`. So it feels like there is something else going on, and
modifying gossip isn't the right track. This metadata isn't part of the
application state, but it's already part of the gossip protocol, so I'm not
sure how adding a JSON payload carrying the same details solves the reported
problem.
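
To make the point concrete, here is a rough sketch of the kind of check that
already happens when a SYN arrives. This is a simplified illustration, not the
actual Cassandra code: the class `SynValidationSketch`, the `Syn` holder and
the `accept` method are hypothetical names, and the real handler has more
cases than this.

```java
import java.util.Objects;

// Hypothetical, simplified model of the SYN-time check; not Cassandra's classes.
public class SynValidationSketch {

    // Stand-in for the cluster id and partitioner name carried in a gossip SYN.
    public static final class Syn {
        final String clusterId;
        final String partitioner;

        public Syn(String clusterId, String partitioner) {
            this.clusterId = clusterId;
            this.partitioner = partitioner;
        }
    }

    // Accept the SYN only if both the cluster id and the partitioner match the
    // local node's values; otherwise the message would be ignored.
    public static boolean accept(Syn syn, String localClusterId, String localPartitioner) {
        if (!Objects.equals(syn.clusterId, localClusterId)) {
            return false; // sender belongs to a different ring
        }
        return Objects.equals(syn.partitioner, localPartitioner);
    }

    public static void main(String[] args) {
        Syn sameRing = new Syn("ring-a", "Murmur3Partitioner");
        Syn otherRing = new Syn("ring-b", "Murmur3Partitioner");
        System.out.println(accept(sameRing, "ring-a", "Murmur3Partitioner"));
        System.out.println(accept(otherRing, "ring-a", "Murmur3Partitioner"));
    }
}
```

If a check along these lines is already rejecting mismatched SYNs, then a node
from the second ring getting in suggests a different path, which is why
repeating the same fields in a new payload doesn't obviously help.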

> On Mar 3, 2026, at 10:39 AM, Caleb Rackliffe <[email protected]> wrote:
> 
> I'm a little hesitant to allow a generic JSON payload, but still need to 
> think a bit on it.
> 
> On Tue, Mar 3, 2026 at 1:43 AM Berenguer Blasi <[email protected]> 
> wrote:
> Hi,
> 
> We've seen this issue in some production systems and I've been asked to 
> raise this to the list for visibility.
> 
> The main idea[1] is to propagate the partitioner and cluster name through
> Gossip and validate them. The approach I took is to JSON-encode those
> in a new generic JSON_PAYLOAD AppState, but I lack the historical context
> as to why enum ordinals were used in the first place. IMO, JSON encoding
> going forward:
> 
> - Prevents burning extra AppStates
> 
> - Prevents forks with custom AppStates from conflicting on the mapping
> during online rolling upgrades to OSS (scary)
> 
> - Is friendlier to being extended and customized, and more robust towards
> modifications
> 
> Options:
> 
> A. Introduce this new generic state (4.0 -> trunk) and use it from now on
> 
> B. Drop the idea of a generic JSON AppState and just add one new
> AppState for this ticket (4.0 -> 5.0), as this is not an issue in trunk due
> to TCM. This option de-risks the upcoming trunk release and could be
> repurposed in the future to become A if we so choose.
> 
> Thoughts welcome, thanks in advance.
> 
> 
> [1] https://issues.apache.org/jira/browse/CASSANDRA-20910
> 
