RE: [EXTERNAL] Re: Bursts of Thrift threads make cluster unresponsive

2019-06-28 Thread Durity, Sean R
> Is there an order in which the events you described happened, or is the order with which you presented them the order you notice things going wrong? At first, threads count (Thrift)

Re: Bursts of Thrift threads make cluster unresponsive

2019-06-27 Thread Dmitry Simonov
> Is there an order in which the events you described happened, or is the order with which you presented them the order you notice things going wrong?

At first, the Thrift thread count starts increasing. After 2 or 3 minutes, those threads consume all CPU cores. After that, simultaneously: message drops occur
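To answer the ordering question from monitoring data rather than from memory, one option is to record when each metric first crossed an alert level and sort by that time. This is a minimal sketch under stated assumptions: the metric names, the thresholds, and the (timestamp, value) sample format are hypothetical and do not come from the thread.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sample format: (timestamp_seconds, value), sorted by time.
Series = List[Tuple[float, float]]


def first_crossing(samples: Series, threshold: float) -> Optional[float]:
    """Return the timestamp of the first sample at or above threshold, or None."""
    for ts, value in samples:
        if value >= threshold:
            return ts
    return None


def onset_order(series: Dict[str, Series],
                thresholds: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order metrics by when each first crossed its (assumed) alert threshold.

    Metrics that never crossed their threshold are left out of the result.
    """
    onsets = []
    for name, samples in series.items():
        ts = first_crossing(samples, thresholds[name])
        if ts is not None:
            onsets.append((name, ts))
    return sorted(onsets, key=lambda item: item[1])
```

Fed with per-node series for the Thrift thread count, CPU utilization, and dropped messages, the returned list shows which symptom appeared first; ties suggest the events really were simultaneous at the sampling resolution.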

Re: Bursts of Thrift threads make cluster unresponsive

2019-06-27 Thread Avinash Mandava
Yeah, I skimmed too fast. Don't add more work if the CPU is pegged, and if you're using the Thrift protocol, the NTR (Native-Transport-Requests) metrics would not have values. Is there an order in which the events you described happened, or is the order with which you presented them the order you notice things going wrong? On Thu, Jun 27, 2019 at 1:29

Re: Bursts of Thrift threads make cluster unresponsive

2019-06-27 Thread Dmitry Simonov
Thanks for your reply!

> Have you tried increasing concurrent reads until you see more activity on disk?

When the problem occurs, 1.2k-2k freshly created Thrift threads consume all CPU on all cores. Could increasing concurrent reads help in this situation?

> org.apache.cassandra.metrics.type=Thr

Re: Bursts of Thrift threads make cluster unresponsive

2019-06-27 Thread Avinash Mandava
Have you tried increasing concurrent reads until you see more activity on disk? If you've always got 32 active reads (the default concurrent_reads value) and high pending reads, Cassandra could just be dropping the reads because the queues are saturated. It could be an artificial bottleneck at the C* process level. Also, what does this metric
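The check suggested here (all read slots busy plus a pending backlog), combined with the later caveat not to add work while the CPU is pegged, can be sketched as a small heuristic over `nodetool tpstats` output. Assumptions: pool rows look like `<PoolName> <Active> <Pending> <Completed> ...`, the read pool is named `ReadStage`, and the CPU ceiling of 0.9 is an arbitrary illustrative value; `parse_tpstats` and `should_raise_concurrent_reads` are hypothetical helper names.

```python
from typing import Dict, Tuple


def parse_tpstats(text: str) -> Dict[str, Tuple[int, int]]:
    """Map pool name -> (active, pending) from `nodetool tpstats` text.

    Assumes pool rows are: <PoolName> <Active> <Pending> <Completed> ...
    Rows whose 2nd and 3rd fields are not integers (the header, the
    dropped-message section, N/A latency columns) are skipped.
    """
    stats: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1].isdigit() and fields[2].isdigit():
            stats[fields[0]] = (int(fields[1]), int(fields[2]))
    return stats


def should_raise_concurrent_reads(stats: Dict[str, Tuple[int, int]],
                                  cpu_utilization: float,
                                  concurrent_reads: int = 32,
                                  cpu_ceiling: float = 0.9) -> bool:
    """Heuristic: reads look queue-bound AND the CPU is not already pegged.

    cpu_utilization is a fraction in [0, 1]; cpu_ceiling is an assumed cutoff.
    """
    active, pending = stats.get("ReadStage", (0, 0))
    queue_bound = active >= concurrent_reads and pending > 0
    return queue_bound and cpu_utilization < cpu_ceiling
```

In the situation described in this thread (all cores at 100%), the second condition would veto raising concurrent_reads even if ReadStage looks saturated.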

Bursts of Thrift threads make cluster unresponsive

2019-06-27 Thread Dmitry Simonov
Hello! We've run into the following problem several times. Our Cassandra cluster (5 nodes) becomes unresponsive for ~30 minutes:
- all CPUs are at 100% load (normally we have a load average of 5 on a 16-core machine)
- Cassandra's thread count rises from 300 to 1,300-2,000, most of them Thrift threads in java.net.Sock