Re: Potential issues during 4.0 upgrade

Scott Andreas Mon, 23 Aug 2021 11:37:23 -0700

Thank you for raising this, Sam!

Agreed this is a bug that warrants releasing 4.0.1 and notifying user@.


To elaborate on impact, this issue can produce a state in rolling 3.x -> 4.0 
upgrades in which 4.0 nodes fail to serialize gossip state during the shadow 
round once the size of this state exceeds 128kb. This prevents new instances 
from coming up. Once in this state, it is also not possible for new instances 
to start up and join the ring. If existing 4.0 instances restart, they will 
also be unable to gossip and remain down.

It's a pretty serious situation without an obvious way out aside from deploying 
this patch. We should get a new release out quickly.

– Scott

________________________________________
From: Sam Tunnicliffe <[email protected]>
Sent: Monday, August 23, 2021 11:27 AM
To: [email protected]
Subject: Potential issues during 4.0 upgrade

Hi all,

I just opened a JIRA that is relevant to anyone running large clusters (around 
the 400-node range) who plans to upgrade to 4.0 soon.

https://issues.apache.org/jira/browse/CASSANDRA-16877

The issue is that in large clusters, the size of gossip messages sent when a 
node (re)starts may exceed the hard limit of the urgent message channel. This 
causes an error on the sender and ultimately the message is dropped. This in 
turn can cause startup failures and/or partial loss of availability.
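
A minimal sketch of the failure mode described above, offered only as an 
illustration. It assumes a channel with a hard size cap and uses the 128kb 
figure mentioned elsewhere in this thread. Every name here (send_urgent, 
gossip_on_startup, MessageTooLargeError) and the per-endpoint state size are 
hypothetical and are not taken from Cassandra's code.

```python
# Illustrative model only: all names and the per-endpoint sizes used with it
# are hypothetical, not Cassandra's actual implementation.
URGENT_CHANNEL_LIMIT = 128 * 1024  # bytes; the "128kb" figure from this thread


class MessageTooLargeError(Exception):
    """Raised when a payload exceeds the channel's hard size limit."""


def send_urgent(payload: bytes, limit: int = URGENT_CHANNEL_LIMIT) -> int:
    """Pretend to send on a size-capped channel and return the bytes sent."""
    if len(payload) > limit:
        raise MessageTooLargeError(
            f"{len(payload)} bytes exceeds limit of {limit}"
        )
    return len(payload)


def gossip_on_startup(endpoint_states: list[bytes]) -> bool:
    """Pack every endpoint's state into one message and try to send it.

    Returns True if the message went out, False if it was dropped.
    """
    payload = b"".join(endpoint_states)
    try:
        send_urgent(payload)
        return True
    except MessageTooLargeError:
        # The sender hits an error and the message is dropped.
        return False
```

In this toy model the total message size grows with the number of endpoints, 
so a small cluster stays under the cap while a large one crosses it, for 
example gossip_on_startup([b"x" * 400] * 400) with an assumed 400 bytes of 
state per endpoint.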

Fortunately, the fix is quite simple and I’ve submitted a patch. Other 
contributors and I have been running it since discovering this issue and can 
confirm that it resolves the problem. It would be great to get it reviewed and merged 
ASAP and then cut a 4.0.1 release. In the meantime, it may be wise to suggest 
that operators of large clusters hold off on any planned 4.0 upgrades.

Thanks,
Sam


