Hi all,

I just opened a JIRA ticket that is relevant to anyone running large clusters
(around the 400-node range) who plans to upgrade to 4.0 soon.

https://issues.apache.org/jira/browse/CASSANDRA-16877 

The issue is that in large clusters, the size of gossip messages sent when a 
node (re)starts may exceed the hard limit of the urgent message channel. This 
causes an error on the sender and ultimately the message is dropped. This in 
turn can cause startup failures and/or partial loss of availability.  

Fortunately, the fix is quite simple. I’ve submitted a patch that other
contributors and I have been running since we discovered the issue, and we can
confirm that it resolves the problem. It would be great to get it reviewed and
merged ASAP and then cut a 4.0.1 release. In the meantime, it may be wise to
suggest that operators of large clusters hold off on any planned 4.0 upgrades.

Thanks,
Sam
