Re: Dropped messages on random nodes.

2017-01-24 Thread Dikang Gu
Thanks guys! Jeff Jirsa helped me take a look, and I found a ~10-second young GC pause in the GC log:

    3071128K->282000K(3495296K), 0.1144648 secs] 25943529K->23186623K(66409856K), 9.8971781 secs] [Times: user=2.33 sys=0.00, real=9.89 secs]

I'm trying to get a histogram or heap dump. Thanks! On Mo...
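A minimal sketch of how such a histogram and heap dump can be collected with the stock JDK tools (the PID and output paths below are placeholders, not details from the thread):

    # Live-object class histogram; the :live option forces a full GC first
    jmap -histo:live <cassandra_pid> > /tmp/cassandra_histo.txt

    # Full heap dump in HPROF format, for analysis in a tool such as Eclipse MAT
    jmap -dump:live,format=b,file=/tmp/cassandra_heap.hprof <cassandra_pid>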

Re: Dropped messages on random nodes.

2017-01-23 Thread Brandon Williams
The lion's share of your drops are from cross-node timeouts, which require clock synchronization, so check that first. If your clocks are synced, that means you are not only dropping eagerly based on time, but are still overloaded despite that eager dropping. That local, non-GC pause...
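As an aside, a quick way to verify clock synchronization on each node, assuming ntpd is the time daemon in use (commands differ if chrony or another daemon is running):

    # Peer status, including the current offset in milliseconds
    ntpq -p

    # One-line summary, if ntpstat is installed
    ntpstat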

Re: Dropped messages on random nodes.

2017-01-23 Thread Roopa Tangirala
Dikang, did you take a look at the heap health on those nodes? A quick heap histogram or dump would help you figure out whether it is related to a data issue (wide rows or a bad data model) where a few nodes may be coming under heap pressure and dropping messages. Thanks, Roopa

Regards,
Roopa Tangirala
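For reference, a rough way to spot oversized partitions without a heap dump, assuming nodetool is available on the node (<keyspace> and <table> are placeholders):

    # Per-table stats; look at "Compacted partition maximum bytes"
    nodetool cfstats <keyspace>

    # Partition-size and cell-count percentiles for a single table
    nodetool cfhistograms <keyspace> <table>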

Re: Dropped messages on random nodes.

2017-01-23 Thread Blake Eggleston
Hi Dikang, do you have any GC logging or metrics you can correlate with the dropped messages? A 13-second pause sounds like a bad GC pause. Thanks, Blake

On January 22, 2017 at 10:37:22 PM, Dikang Gu (dikan...@gmail.com) wrote: Btw, the C* version is 2.2.5, with several backported patches.
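For anyone following along, GC logging on a 2.2-era JVM (Java 7/8) can be enabled with flags along these lines in cassandra-env.sh; the log path is a placeholder and some of these flags may already be present in the stock file:

    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"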

Re: Dropped messages on random nodes.

2017-01-22 Thread Dikang Gu
Btw, the C* version is 2.2.5, with several backported patches.

On Sun, Jan 22, 2017 at 10:36 PM, Dikang Gu wrote:
> Hello there,
>
> We have a cluster of roughly 100 nodes, and I find that there are dropped
> messages on random nodes in the cluster, which caused error spikes and P99
> latency spikes as well...
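For context, the dropped-message counts discussed in this thread appear at the bottom of nodetool tpstats, broken down by verb. The output looks roughly like the sketch below; the figures are invented and the exact verb list varies by version:

    $ nodetool tpstats
    ...
    Message type           Dropped
    READ                      1021
    MUTATION                  4307
    REQUEST_RESPONSE             0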

Re: dropped messages

2010-09-24 Thread Benjamin Black
Complete information, including everything in tpstats, is available to your monitoring systems via JMX. For production clusters, it is essential that you at least collect the JMX stats, if not alarm on various problems (such as backed-up stages). b

On Wed, Sep 22, 2010 at 6:47 AM, Carl Bruecken wrote:...
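If no JMX-aware monitoring system is in place yet, one crude stopgap is to poll nodetool from cron and keep the snapshots (a sketch only; paths are placeholders and this is no substitute for real JMX collection):

    # crontab entry: append a tpstats snapshot every minute
    * * * * * /opt/cassandra/bin/nodetool -h localhost tpstats >> /var/log/cassandra/tpstats.log 2>&1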

Re: dropped messages

2010-09-22 Thread Jonathan Ellis
That's your cluster's way of telling you to set up monitoring.

On Wed, Sep 22, 2010 at 8:47 AM, Carl Bruecken wrote:
> On 9/22/10 9:37 AM, Jonathan Ellis wrote:
>>
>> it's easy to tell from tpstats which stage(s) are overloaded
>>
>> On Wed, Sep 22, 2010 at 8:29 AM, Carl Bruecken wrote:
>>>

Re: dropped messages

2010-09-22 Thread Carl Bruecken
On 9/22/10 9:37 AM, Jonathan Ellis wrote:
> it's easy to tell from tpstats which stage(s) are overloaded
>
> On Wed, Sep 22, 2010 at 8:29 AM, Carl Bruecken wrote:
>> With the current implementation, it's impossible to tell from the logs which
>> message types (verbs) were dropped. I read this was changed for spamming...

Re: dropped messages

2010-09-22 Thread Jonathan Ellis
It's easy to tell from tpstats which stage(s) are overloaded.

On Wed, Sep 22, 2010 at 8:29 AM, Carl Bruecken wrote:
> With the current implementation, it's impossible to tell from the logs which
> message types (verbs) were dropped. I read this was changed for spamming,
> but I think the behavior should ...
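For readers unfamiliar with it, the stage view at the top of nodetool tpstats looks roughly like the sketch below; stage names and figures are illustrative and vary by version. A stage with a persistently large Pending count is the overload signal being described here:

    Pool Name              Active   Pending   Completed
    ReadStage                  32      1874     9437120
    MutationStage               4         0    22714530
    RequestResponseStage        1         0    31002991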