Re: Gossip issues after upgrading to 4.0.4

2022-06-07 Thread Gil Ganz
Will do.

On Tue, Jun 7, 2022 at 6:12 PM Jeff Jirsa wrote:
> This deserves a JIRA ticket please.
>
> (I assume the sending host is randomly choosing the bad IP and blocking on it for some period of time, causing other tasks to pile up, but it should be investigated as a regression).
…

Re: Gossip issues after upgrading to 4.0.4

2022-06-07 Thread Jeff Jirsa
This deserves a JIRA ticket please.

(I assume the sending host is randomly choosing the bad IP and blocking on it for some period of time, causing other tasks to pile up, but it should be investigated as a regression).

On Tue, Jun 7, 2022 at 7:52 AM Gil Ganz wrote:
> Yes, I know the issue wit…
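To make the hypothesis above concrete, here is a toy Python model (not Cassandra code, and not a claim about how the Gossiper is actually implemented): a single worker runs tasks in FIFO order, one task arrives per second (roughly one gossip round), and the first task blocks on an unreachable address for an assumed 5 s connect timeout. The timeout and the 0.01 s normal task duration are illustrative assumptions.

```python
def fifo_waits(arrivals, durations):
    """Single-worker FIFO queue: return how long each task waits before it starts."""
    free_at = 0.0  # time at which the worker next becomes idle
    waits = []
    for arrival, duration in zip(arrivals, durations):
        start = max(arrival, free_at)  # can't start before arriving or before the worker is free
        waits.append(start - arrival)
        free_at = start + duration
    return waits


if __name__ == "__main__":
    arrivals = [float(t) for t in range(7)]  # one task per second
    durations = [5.0] + [0.01] * 6           # task 0 blocks on a dead address (assumed 5 s timeout)
    for i, w in enumerate(fifo_waits(arrivals, durations)):
        print(f"task {i}: waited {w:.2f}s")
```

A single blocked connect delays every task queued behind it (task 1 waits 4 s here), and the backlog only drains once the worker catches up, which matches the "tasks pile up" picture.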

Re: Gossip issues after upgrading to 4.0.4

2022-06-07 Thread Gil Ganz
Yes, I know the issue with the peers table; we have had it in other clusters. In this case, it appears the cause of the problem was indeed a bad IP in the seed list. After removing it from all nodes and reloading seeds, a rolling restart no longer causes any gossip issues, and in general the nu…
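A minimal Python sketch for catching this kind of mistake before reloading seeds. It assumes the seeds value is a comma-separated list of IPv4 addresses with an optional `:port` (defaulting to the storage port 7000), flags entries that are not valid IPv4 addresses, and optionally tries a short TCP connect to each one. `parse_seeds` and `find_bad_seeds` are hypothetical helpers, not part of Cassandra or its tooling.

```python
import ipaddress
import socket


def parse_seeds(seeds: str, default_port: int = 7000) -> list[tuple[str, int]]:
    """Split a comma-separated seeds string (assumed IPv4, optional ':port')."""
    result = []
    for entry in seeds.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.partition(":")
        result.append((host, int(port) if sep else default_port))
    return result


def find_bad_seeds(seeds: str, default_port: int = 7000,
                   timeout: float = 2.0, probe: bool = True) -> list[tuple[str, str]]:
    """Return (host, problem) pairs for seeds that look wrong or are unreachable."""
    bad = []
    for host, port in parse_seeds(seeds, default_port):
        try:
            ipaddress.IPv4Address(host)  # raises a ValueError subclass if malformed
        except ValueError:
            bad.append((host, "not a valid IPv4 address"))
            continue
        if probe:
            try:
                # A plain TCP connect; success only means something is listening.
                with socket.create_connection((host, port), timeout=timeout):
                    pass
            except OSError as exc:
                bad.append((host, f"connect to port {port} failed: {exc}"))
    return bad
```

With `probe=False` it only checks syntax, which is safe to run anywhere; with probing enabled, run it from a node in the cluster so the connect test reflects what Cassandra itself would see.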

Re: Gossip issues after upgrading to 4.0.4

2022-06-07 Thread Bowen Song
Regarding the "ghost IP", you may want to check the system.peers_v2 table by running "select * from system.peers_v2 where peer = '123.456.789.012';". I've seen this (non-)issue many times, and I had to run "delete from system.peers_v2 where peer=..." to fix it, as on our client side, the Python c…
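As a sketch of how one might spot such ghost entries when there are many addresses: the hypothetical Python helpers below take the `peer` values returned by a `select peer from system.peers_v2` query and the set of addresses you expect to be in the cluster (both assumed to be gathered separately, e.g. with cqlsh and `nodetool status`), and print `delete` statements in the form used above for review rather than executing them.

```python
import ipaddress


def stale_peers(peer_rows, expected):
    """Peers present in system.peers_v2 but not in the expected node set, numerically sorted."""
    return sorted(set(peer_rows) - set(expected), key=ipaddress.ip_address)


def delete_statements(peers):
    """Build CQL delete statements matching the form quoted in the message."""
    return [f"delete from system.peers_v2 where peer = '{p}';" for p in peers]


if __name__ == "__main__":
    rows = ["10.0.0.1", "10.0.0.2", "10.0.0.99"]  # made-up example values
    live = {"10.0.0.1", "10.0.0.2"}
    for stmt in delete_statements(stale_peers(rows, live)):
        print(stmt)
```

Each node keeps its own system.peers_v2, so the comparison would need to be repeated per node.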

Re: Gossip issues after upgrading to 4.0.4

2022-06-06 Thread Gil Ganz
The only errors I see in the logs prior to the gossip pending issue are things like this:

INFO [Messaging-EventLoop-3-32] 2022-06-02 20:29:44,833 NoSpamLogger.java:92 - /X:7000->/Y:7000-URGENT_MESSAGES-[no-channel] failed to connect io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect…
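When a log is full of lines like this, counting the failed destinations can point at a single bad address. A rough Python sketch, assuming the `/src:port->/dst:port-CONNECTION_TYPE-[...] failed to connect` shape shown above; the regex and helper name are my own, and the sample lines in the check use made-up 10.x addresses in place of X and Y.

```python
import re
import sys
from collections import Counter

# Matches e.g. "/10.0.0.1:7000->/10.0.0.9:7000-URGENT_MESSAGES-[no-channel] failed to connect"
FAILED_CONNECT = re.compile(
    r"/(?P<src>[^:\s]+):(?P<sport>\d+)->/(?P<dst>[^:\s]+):(?P<dport>\d+)"
    r"-(?P<kind>\w+)-\[[^\]]*\] failed to connect"
)


def count_failed_destinations(lines):
    """Count 'failed to connect' lines per destination address."""
    counts = Counter()
    for line in lines:
        m = FAILED_CONNECT.search(line)
        if m:
            counts[m.group("dst")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_failed.py /path/to/system.log
    with open(sys.argv[1]) as f:
        for dst, n in count_failed_destinations(f).most_common(10):
            print(f"{n:8d}  {dst}")
```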

Re: Gossip issues after upgrading to 4.0.4

2022-06-06 Thread C. Scott Andreas
Hi Gil, thanks for reaching out. Can you check Cassandra's logs to see if any uncaught exceptions are being thrown? What you described suggests the possibility of an uncaught exception being thrown in the Gossiper thread, preventing further tasks from making progress; however I'm not aware of any…
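One way to follow up on this suggestion, sketched in Python: scan system.log for ERROR lines logged from gossip-related threads. The thread-name prefixes `GossipStage` and `GossipTasks` and the `LEVEL [thread] ...` line layout are assumptions (the layout is taken from the log excerpt elsewhere in this thread); adjust both to your version and logback configuration.

```python
import re
import sys

# Assumed layout: "LEVEL [thread-name] date time File.java:line - message"
LOG_LINE = re.compile(r"^(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]")
GOSSIP_THREADS = ("GossipStage", "GossipTasks")  # assumed thread-name prefixes


def gossip_errors(lines):
    """Return ERROR lines whose thread name starts with a gossip-related prefix."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # continuation lines (e.g. stack frames) don't match the layout
        if m.group("level") == "ERROR" and m.group("thread").startswith(GOSSIP_THREADS):
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for hit in gossip_errors(f):
            print(hit)
```

Stack-trace lines following a hit are skipped by this filter, so the matching timestamps are best used to jump to the full trace in the log itself.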

Gossip issues after upgrading to 4.0.4

2022-06-06 Thread Gil Ganz
Hey,

We have a big cluster (>500 nodes, on-prem, multiple datacenters, most with vnodes=32 but some with 128) that was recently upgraded from 3.11.9 to 4.0.4. Servers are all CentOS 7. We have been dealing with a few issues related to gossip since then:

1 - The moment the last node in the cluster was…