Re: Intermittent long application pauses on nodes

2014-10-31 Thread graham sanderson
I have to admit that I haven’t tried the SafepointTimeout (I just noticed that it was actually a production VM option in the JVM code, after my initial suggestions below for debugging without it). There doesn’t seem to be an obvious bug in SafepointTimeout, though I may not be looking at the sa

Re: Intermittent long application pauses on nodes

2014-10-31 Thread Dan van Kley
Well I tried the SafepointTimeout option, but unfortunately it seems like the long safepoint syncs don't actually trigger the SafepointTimeout mechanism, so we didn't get any logs on it. It's possible I'm just doing it wrong, I used the following options: JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticV

Re: Intermittent long application pauses on nodes

2014-10-27 Thread Dan van Kley
Excellent, thanks for the tips, Graham. I'll give SafepointTimeout a try and see if that gives us anything to act on. On Fri, Oct 24, 2014 at 3:52 PM, graham sanderson wrote: > And -XX:SafepointTimeoutDelay=xxx > > to set how long before it dumps output (defaults to 1 I believe)… > > Note it

Re: Intermittent long application pauses on nodes

2014-10-24 Thread graham sanderson
And -XX:SafepointTimeoutDelay=xxx to set how long before it dumps output (defaults to 1 I believe)… Note it doesn’t actually timeout by default, it just prints the problematic threads after that time and keeps on waiting > On Oct 24, 2014, at 2:44 PM, graham sanderson wrote: > > Actually
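
The two flags discussed in this message and the one before it could be combined roughly as below. This is only a sketch: it assumes options are appended through `JVM_OPTS` in the way a later message in the thread does, that the delay is given in milliseconds, and `SAFEPOINT_TIMEOUT_MS` with its default of 2000 is a made-up name and value for illustration.

```shell
# Sketch only: ask the JVM to report threads that are slow to reach a safepoint.
JVM_OPTS="${JVM_OPTS:-}"

# Hypothetical variable; the delay is assumed to be in milliseconds.
SAFEPOINT_TIMEOUT_MS="${SAFEPOINT_TIMEOUT_MS:-2000}"

# Print the offending threads once a safepoint sync exceeds the delay.
JVM_OPTS="$JVM_OPTS -XX:+SafepointTimeout"
JVM_OPTS="$JVM_OPTS -XX:SafepointTimeoutDelay=$SAFEPOINT_TIMEOUT_MS"
```

As noted in the message, this does not abort the wait; the JVM prints the threads and keeps waiting.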

Re: Intermittent long application pauses on nodes

2014-10-24 Thread graham sanderson
Actually - there is -XX:+SafepointTimeout which will print out offending threads (assuming you reach a 10 second pause)… That is probably your best bet. > On Oct 24, 2014, at 2:38 PM, graham sanderson wrote: > > This certainly sounds like a JVM bug. > > We are running C* 2.0.9 on pretty hig

Re: Intermittent long application pauses on nodes

2014-10-24 Thread graham sanderson
This certainly sounds like a JVM bug. We are running C* 2.0.9 on pretty high end machines with pretty large heaps, and don’t seem to have seen this (note we are on 7u67, so that might be an interesting data point, though since the old thread predated that probably not) 1) From the app/java side

Re: Intermittent long application pauses on nodes

2014-10-24 Thread Dan van Kley
I'm also curious to know if this was ever resolved or if there's any other recommended steps to take to continue to track it down. I'm seeing the same issue in our production cluster, which is running Cassandra 2.0.10 and JVM 1.7u71, using the CMS collector. Just as described above, the issue is lo

Re: Intermittent long application pauses on nodes

2014-04-14 Thread Ken Hancock
Searching my list archives shows this thread evaporated. Was a root cause ever found? Very curious. On Mon, Feb 3, 2014 at 11:52 AM, Benedict Elliott Smith < belliottsm...@datastax.com> wrote: > Hi Frank, > > The "9391" under RevokeBias is the number of milliseconds spent > synchronising

Re: Intermittent long application pauses on nodes

2014-02-27 Thread Benedict Elliott Smith
>> Sum: 120] >>>>>>>>> [Scan RS (ms): Min: 23.2, Avg: 23.2, Max: 23.3, Diff: 0.1, >>>>>>>>> Sum: 46.5] >>>>>>>>> [Object Copy (ms): Min: 112.3, Avg: 112.3, Max: 112.4, Diff: >>>>>>>

Re: Intermittent long application pauses on nodes

2014-02-27 Thread Frank Ng
163.8, Avg: 163.8, Max: 163.8, >>>>>>>> Diff: 0.0, Sum: 327.6] >>>>>>>> [GC Worker End (ms): Min: 222346382.1, Avg: 222346382.1, Max: >>>>>>>> 222346382.1, Diff: 0.0] >>>>>>>> [Code Root Fixup: 0

Re: Intermittent long application pauses on nodes

2014-02-27 Thread Benedict Elliott Smith
T: 0.4 ms] >>>>>>> [Other: 2.1 ms] >>>>>>> [Choose CSet: 0.0 ms] >>>>>>> [Ref Proc: 1.1 ms] >>>>>>> [Ref Enq: 0.0 ms] >>>>>>> [Free CSet: 0.4 ms] >>>>>>

Re: Intermittent long application pauses on nodes

2014-02-27 Thread Frank Ng
, >>>>>> 0x0007f5c0, 0x0007f5c0) >>>>>> region size 4096K, 17 young (69632K), 17 survivors (69632K) >>>>>> compacting perm gen total 28672K, used 27428K [0x0007f5c0, >>>>>> 0x0007f780, 0x

Re: Intermittent long application pauses on nodes

2014-02-21 Thread Joel Samuelsson
>>> 0x0007f76c9200, 0x0007f780) >>>>> No shared spaces configured. >>>>> } >>>>> [Times: user=0.35 sys=0.00, real=27.58 secs] >>>>> 222346.219: G1IncCollectionPause [ 111 0 >>>>>

Re: Intermittent long application pauses on nodes

2014-02-20 Thread Joel Samuelsson
CMS behaves in a similar manner. We thought it would be GC, waiting for >>>> mmaped files being read from disk (the thread cannot reach safepoint during >>>> this operation), but it doesn't explain the huge time. >>>> >>>> We'll try jhiccup to see i

Re: Intermittent long application pauses on nodes

2014-02-17 Thread Benedict Elliott Smith
rovides any additional information. The >>> test was done on mixed aws/openstack environment, openjdk 1.7.0_45, >>> cassandra 1.2.11. Upgrading to 2.0.x is no option for us. >>> >>> regards, >>> >>> ondrej cernos >>> >>> >>>

Re: Intermittent long application pauses on nodes

2014-02-17 Thread Ondřej Černoš
chance to file a JIRA ticket. We have not been >>> able to resolve the issue. But since Joel mentioned that upgrading to >>> Cassandra 2.0.X solved it for them, we may need to upgrade. We are >>> currently on Java 1.7 and Cassandra 1.2.8 >>> >>> >>>

Re: Intermittent long application pauses on nodes

2014-02-17 Thread Benedict Elliott Smith
Cassandra 2.0.X solved it for them, we may need to upgrade. We are >> currently on Java 1.7 and Cassandra 1.2.8 >> >> >> >> On Thu, Feb 13, 2014 at 12:40 PM, Keith Wright wrote: >> >>> You’re running 2.0.* in production? May I ask what C* version and OS? >

Re: Intermittent long application pauses on nodes

2014-02-17 Thread Ondřej Černoš
as well. Thx! >> >> From: Joel Samuelsson >> Reply-To: "user@cassandra.apache.org" >> Date: Thursday, February 13, 2014 at 11:39 AM >> >> To: "user@cassandra.apache.org" >> Subject: Re: Intermittent long application pauses on nodes >> >

Re: Intermittent long application pauses on nodes

2014-02-14 Thread Frank Ng
> > To: "user@cassandra.apache.org" > Subject: Re: Intermittent long application pauses on nodes > > We have had similar issues and upgrading C* to 2.0.x and Java to 1.7 seems > to have helped our issues. > > > 2014-02-13 Keith Wright : > >> Frank did you

Re: Intermittent long application pauses on nodes

2014-02-13 Thread Keith Wright
r@cassandra.apache.org>> Date: Thursday, February 13, 2014 at 11:39 AM To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" mailto:user@cassandra.apache.org>> Subject: Re: Intermittent long application pauses on nodes We have had similar issues and upgrading

Re: Intermittent long application pauses on nodes

2014-02-13 Thread Joel Samuelsson
nks > > From: Robert Coli > Reply-To: "user@cassandra.apache.org" > Date: Monday, February 3, 2014 at 6:10 PM > To: "user@cassandra.apache.org" > Subject: Re: Intermittent long application pauses on nodes > > On Mon, Feb 3, 2014 at 8:52 AM, Bene

Re: Intermittent long application pauses on nodes

2014-02-12 Thread Keith Wright
>" mailto:user@cassandra.apache.org>> Date: Monday, February 3, 2014 at 6:10 PM To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" mailto:user@cassandra.apache.org>> Subject: Re: Intermittent long application pauses on nodes On Mon, Feb 3, 2014 at

Re: Intermittent long application pauses on nodes

2014-02-03 Thread Robert Coli
On Mon, Feb 3, 2014 at 8:52 AM, Benedict Elliott Smith < belliottsm...@datastax.com> wrote: > > It's possible that this is a JVM issue, but if so there may be some > remedial action we can take anyway. There are some more flags we should > add, but we can discuss that once you open a ticket. If yo

Re: Intermittent long application pauses on nodes

2014-02-03 Thread Benedict Elliott Smith
Hi Frank, The "9391" under RevokeBias is the number of milliseconds spent synchronising on the safepoint prior to the VM operation, i.e. the time it took to ensure all application threads were stopped. So this is the culprit. Notice that the time spent spinning/blocking for the threads we are supp

Re: Intermittent long application pauses on nodes

2014-02-03 Thread Frank Ng
I was able to send SafePointStatistics to another log file via the additional JVM flags and recently noticed a pause of 9.3936600 seconds. Here are the log entries: GC Log file: --- 2014-01-31T07:49:14.755-0500: 137460.842: Total time for which application threads were stopped: 0.1
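
To pick out pauses like this one from a busy GC log, a small filter can help. The sketch below assumes each line carries the marker text quoted above followed by the pause length in seconds; `long_pauses` and its default threshold of 1.0 seconds are hypothetical, not something used in the thread.

```shell
# Hypothetical helper: read a GC log on stdin and print only the
# "application threads were stopped" lines above a threshold (seconds).
long_pauses() {
    threshold="${1:-1.0}"
    awk -v t="$threshold" '
        /Total time for which application threads were stopped:/ {
            s = $0
            # Drop everything up to and including the marker text.
            sub(/.*stopped: */, "", s)
            # Numeric coercion reads the leading number, e.g. "9.3936600 seconds".
            secs = s + 0
            if (secs > t) print
        }'
}
```

It would be called as `long_pauses 1.0 < "$GC_LOG"`, where `GC_LOG` stands for whatever file `-Xloggc` points at.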

Re: Intermittent long application pauses on nodes

2014-01-30 Thread Sylvain Lebresne
> > > I never figured out what kills stdout for C*. It's a library we depend on, > didn't try too hard to figure out which one. > Nah, it's Cassandra itself (in org.apache.cassandra.service.CassandraDaemon.activate()), but you can pass -f (for 'foreground') to not do it. > > > On 29 January 2014

Re: Intermittent long application pauses on nodes

2014-01-29 Thread Benedict Elliott Smith
Add some more flags: -XX:+UnlockDiagnosticVMOptions -XX:LogFile=${path} -XX:+LogVMOutput I never figured out what kills stdout for C*. It's a library we depend on, didn't try too hard to figure out which one. On 29 January 2014 21:07, Frank Ng wrote: > Benedict, > Thanks for the advice. I've
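
Spelled out as a startup-script fragment, the flags above might look like the following. This is a sketch that assumes options are appended through `JVM_OPTS`; `VM_LOG_PATH` and its default location are placeholders standing in for `${path}`.

```shell
# Sketch only: route VM output (including safepoint statistics) to a file,
# since stdout is closed once Cassandra has started.
JVM_OPTS="${JVM_OPTS:-}"

# Hypothetical location; substitute a path the Cassandra user can write to.
VM_LOG_PATH="${VM_LOG_PATH:-/tmp/cassandra-vm.log}"

# LogVMOutput and LogFile are diagnostic options, so they must be unlocked first.
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+LogVMOutput -XX:LogFile=$VM_LOG_PATH"
```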

Re: Intermittent long application pauses on nodes

2014-01-29 Thread Frank Ng
Benedict, Thanks for the advice. I've tried turning on PrintSafepointStatistics. However, that info is only sent to the STDOUT console. The cassandra startup script closes the STDOUT when it finishes, so nothing is shown for safepoint statistics once it's done starting up. Do you know how to sta

Re: Intermittent long application pauses on nodes

2014-01-29 Thread Benedict Elliott Smith
Frank, The same advice for investigating holds: add the VM flags -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 (you could put something above 1 there, to reduce the amount of logging, since a pause of 52s will be pretty obvious even if aggregated with lots of other safe points
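
As a startup-script fragment, the suggestion above could be written along these lines. It is a sketch that assumes options are appended through `JVM_OPTS`; `SAFEPOINT_STATS_COUNT` is a hypothetical name for the count being tuned.

```shell
# Sketch only: enable per-safepoint statistics.
JVM_OPTS="${JVM_OPTS:-}"

# 1 prints after every safepoint; a larger value batches entries to cut log volume.
SAFEPOINT_STATS_COUNT="${SAFEPOINT_STATS_COUNT:-1}"

JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics"
JVM_OPTS="$JVM_OPTS -XX:PrintSafepointStatisticsCount=$SAFEPOINT_STATS_COUNT"
```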

Re: Intermittent long application pauses on nodes

2014-01-29 Thread Frank Ng
Thanks for the update. Our logs indicated that there were 0 pending for CompactionManager at that time. Also, there were no nodetool repairs running at that time. The log statements above state that the application had to stop to reach a safepoint. Yet, it doesn't say what is requesting the saf

Re: Intermittent long application pauses on nodes

2014-01-29 Thread Shao-Chuan Wang
We had similar latency spikes when pending compactions couldn't keep up or repair/streaming was taking too many cycles. On Wed, Jan 29, 2014 at 10:07 AM, Frank Ng wrote: > All, > > We've been having intermittent long application pauses (version 1.2.8) and > not sure if it's a cassandra bug. During

Intermittent long application pauses on nodes

2014-01-29 Thread Frank Ng
All, We've been having intermittent long application pauses (version 1.2.8) and not sure if it's a cassandra bug. During these pauses, there are dropped messages in the cassandra log file along with the node seeing other nodes as down. We've turned on gc logging and the following is an example o