Re: Dropped messages on random nodes.

2017-01-23 Thread Brandon Williams
The lion's share of your drops are from cross-node timeouts, which require clock synchronization, so check that first. If your clocks are synced, that means not only are you showing eager dropping based on time, but despite the eager dropping you are still facing overload. That local, non-gc paus
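
To illustrate why cross-node timeouts depend on clock synchronization, here is a minimal sketch. It is not Cassandra's actual code: the function name, the 2-second timeout, and the 3-second skew are hypothetical values chosen for illustration. It assumes only that the receiver compares the sender's timestamp against its own clock to decide whether a message has expired.

```python
def is_expired(sent_at_ms: int, received_at_ms: int, timeout_ms: int) -> bool:
    """Return True when the message looks older than the timeout to the receiver.

    sent_at_ms is read from the sender's clock, received_at_ms from the
    receiver's clock, so any skew between the two is counted as latency.
    """
    return received_at_ms - sent_at_ms > timeout_ms


TIMEOUT_MS = 2000        # hypothetical request timeout
TRUE_LATENCY_MS = 10     # real time the message spent in flight
SKEW_MS = 3000           # hypothetical: receiver clock runs 3 s ahead

sent_at_ms = 1_000_000   # timestamp taken on the sender's clock

# Clocks in sync: the receiver sees only the true latency.
in_sync = is_expired(sent_at_ms, sent_at_ms + TRUE_LATENCY_MS, TIMEOUT_MS)

# Receiver clock ahead: the skew is added to the apparent latency.
skewed = is_expired(sent_at_ms, sent_at_ms + TRUE_LATENCY_MS + SKEW_MS, TIMEOUT_MS)

print(in_sync, skewed)  # prints: False True
```

With skewed clocks, a message that took 10 ms in flight is dropped as if it had timed out, which is why checking clock synchronization comes first.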

Re: Dropped messages on random nodes.

2017-01-23 Thread Roopa Tangirala
Dikang, Did you take a look at the heap health on those nodes? A quick heap histogram or dump would help you figure out if it is related to a data issue (wide rows or a bad model) where a few nodes may be coming under heap pressure and dropping messages. Thanks, Roopa *Regards,* *Roopa Tangirala*

Re: Dropped messages on random nodes.

2017-01-23 Thread Blake Eggleston
Hi Dikang, Do you have any GC logging or metrics you can correlate with the dropped messages? A 13 second pause sounds like a bad GC pause. Thanks, Blake On January 22, 2017 at 10:37:22 PM, Dikang Gu (dikan...@gmail.com) wrote: Btw, the C* version is 2.2.5, with several backported patches.

Re: WriteTimeoutException when doing parallel DELETE IF EXISTS

2017-01-23 Thread Blake Eggleston
Hi Jaroslav, That's pretty much expected behavior for the current LWT implementation, which has problems with key contention (the usage pattern you're describing here). Typically, you want to avoid having multiple clients doing LWT operations on the same partition key at the same time. Thanks,

Re: [VOTE] Release Apache Cassandra 3.10 (Take 4)

2017-01-23 Thread Michael Shuler
This vote is being failed for CASSANDRA-13058 (committed after tentative tag) and CASSANDRA-13025 (patch available). Vote count was 5 binding +1, 1 binding -1, and one non-binding -1. I'll re-roll a "Take 5" when CASSANDRA-13025 gets committed, tests appear stable, and we'll try again. -- Kind

Re: [VOTE] Release Apache Cassandra 3.10 (Take 4)

2017-01-23 Thread Nate McCall
Indeed I conflated the two - thanks Sylvain. On Mon, Jan 23, 2017 at 11:19 PM, Sylvain Lebresne wrote: > On Mon, Jan 23, 2017 at 2:31 AM, Nate McCall wrote: > >> What was the resolution on this? >> >> Looks like we resolved/Fixed CASSANDRA-13058. Can we re-roll and go again? >> > > As I mentione

Re: [VOTE] Release Apache Cassandra 3.10 (Take 4)

2017-01-23 Thread Sylvain Lebresne
On Mon, Jan 23, 2017 at 2:31 AM, Nate McCall wrote: > What was the resolution on this? > > Looks like we resolved/Fixed CASSANDRA-13058. Can we re-roll and go again? > As I mentioned, CASSANDRA-13025 is also a regression and should be fixed before we re-roll. It's ready for review if someone's int