Thanks. That's good to know.

On Thu, Mar 3, 2016 at 8:48 PM, Benedict Elliott Smith <bened...@apache.org> wrote:
> Yep, definitely a bug. Introduced by CASSANDRA-9240 (me; mea culpa).
>
> I've filed a JIRA for you: CASSANDRA-11301
>
> On 3 March 2016 at 14:10, horschi <hors...@gmail.com> wrote:
>
> > Hi,
> >
> > I just found another one. It's basically the same, but I'll post it anyway:
> >
> > Thread 84311: (state = BLOCKED)
> > - java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be imprecise)
> > - java.lang.Thread.sleep(long, int) @bci=57, line=340 (Compiled frame)
> > - java.util.concurrent.TimeUnit.sleep(long) @bci=23, line=386 (Compiled frame)
> > - com.google.common.util.concurrent.Uninterruptibles.sleepUninterruptibly(long, java.util.concurrent.TimeUnit) @bci=22, line=273 (Compiled frame)
> > - com.google.common.util.concurrent.RateLimiter$SleepingTicker$1.sleepMicrosUninterruptibly(long) @bci=10, line=701 (Compiled frame)
> > - com.google.common.util.concurrent.RateLimiter.acquire(int) @bci=42, line=405 (Compiled frame)
> > - org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer() @bci=11, line=43 (Compiled frame)
> > - org.apache.cassandra.io.util.RandomAccessReader.seek(long) @bci=147, line=287 (Compiled frame)
> > - org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(long) @bci=22, line=65 (Compiled frame)
> > - org.apache.cassandra.io.sstable.format.SSTableReader.getFileDataInput(long) @bci=5, line=1751 (Compiled frame)
> > - org.apache.cassandra.io.sstable.format.big.SimpleSliceReader.<init>(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput, org.apache.cassandra.db.composites.Composite) @bci=36, line=57 (Compiled frame)
> > - org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.createReader(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=38, line=66 (Compiled frame)
> > - org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.<init>(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=36, line=43 (Compiled frame)
> > - org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=8, line=75 (Compiled frame)
> > - org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.DecoratedKey) @bci=10, line=246 (Compiled frame)
> > - org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(org.apache.cassandra.io.sstable.format.SSTableReader) @bci=9, line=62 (Compiled frame)
> > - org.apache.cassandra.db.CollationController.collectAllData(boolean) @bci=350, line=270 (Compiled frame)
> > - org.apache.cassandra.db.CollationController.getTopLevelColumns(boolean) @bci=39, line=64 (Compiled frame)
> > - org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(org.apache.cassandra.db.filter.QueryFilter, int) @bci=40, line=2011 (Compiled frame)
> > - org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(org.apache.cassandra.db.filter.QueryFilter) @bci=141, line=1815 (Compiled frame)
> > - org.apache.cassandra.db.Keyspace.getRow(org.apache.cassandra.db.filter.QueryFilter) @bci=11, line=360 (Compiled frame)
> > - org.apache.cassandra.db.SliceFromReadCommand.getRow(org.apache.cassandra.db.Keyspace) @bci=222, line=85 (Compiled frame)
> > - org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow() @bci=16, line=1587 (Compiled frame)
> > - org.apache.cassandra.service.StorageProxy$DroppableRunnable.run() @bci=37, line=2232 (Compiled frame)
> > - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
> > - org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run() @bci=5, line=164 (Compiled frame)
> > - org.apache.cassandra.concurrent.SEPWorker.run() @bci=87, line=105 (Compiled frame)
> > - java.lang.Thread.run() @bci=11, line=745 (Compiled frame)
> >
> > kind regards,
> > Christian
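
For illustration (this is not Cassandra code): the trace above is parked inside Guava's RateLimiter, which puts the calling thread into an uninterruptible sleep whenever the requested permits exceed the configured rate. Below is a minimal, self-contained sketch of that behaviour, where the limiter is a hypothetical stand-in for a shared throughput limiter sized in bytes per second:

    import com.google.common.util.concurrent.RateLimiter;

    public class ThrottledReadSketch {
        public static void main(String[] args) {
            // Hypothetical stand-in for a shared throughput limiter,
            // expressed in permits (bytes) per second: 2 MB/s, i.e. a low setting.
            RateLimiter limiter = RateLimiter.create(2 * 1024 * 1024);

            int bufferSize = 64 * 1024; // pretend each reBuffer() pulls one 64 KB chunk

            long start = System.nanoTime();
            // A "read" thread re-buffering 100 chunks through that limiter.
            for (int i = 0; i < 100; i++) {
                limiter.acquire(bufferSize); // sleeps uninterruptibly once the rate is exceeded
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // 100 x 64 KB = 6.4 MB at 2 MB/s: roughly 3 seconds spent sleeping.
            System.out.println("100 throttled acquires took " + elapsedMs + " ms");
        }
    }

If a reader used by the client read path shares such a limiter, lowering the configured rate makes every acquire() sleep proportionally longer, which is consistent with pending reads piling up when the compaction throughput is reduced.
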
> >
> > On Thu, Mar 3, 2016 at 2:31 PM, horschi <hors...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > we just had such an incident again, and this time we were able to grab a jstack thread dump.
> > >
> > > One thread from this stack dump actually shows a ReadVerbHandler calling the RateLimiter. From my understanding this should not happen:
> > >
> > > Thread 30346: (state = BLOCKED)
> > > - java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be imprecise)
> > > - java.lang.Thread.sleep(long, int) @bci=57, line=340 (Compiled frame)
> > > - java.util.concurrent.TimeUnit.sleep(long) @bci=23, line=386 (Compiled frame)
> > > - com.google.common.util.concurrent.Uninterruptibles.sleepUninterruptibly(long, java.util.concurrent.TimeUnit) @bci=22, line=273 (Compiled frame)
> > > - com.google.common.util.concurrent.RateLimiter$SleepingTicker$1.sleepMicrosUninterruptibly(long) @bci=10, line=701 (Compiled frame)
> > > - com.google.common.util.concurrent.RateLimiter.acquire(int) @bci=42, line=405 (Compiled frame)
> > > - org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer() @bci=11, line=43 (Compiled frame)
> > > - org.apache.cassandra.io.util.RandomAccessReader.seek(long) @bci=147, line=287 (Compiled frame)
> > > - org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(long) @bci=22, line=65 (Compiled frame)
> > > - org.apache.cassandra.io.sstable.format.SSTableReader.getFileDataInput(long) @bci=5, line=1751 (Compiled frame)
> > > - org.apache.cassandra.io.sstable.format.big.IndexedSliceReader.setToRowStart(org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput) @bci=13, line=108 (Compiled frame)
> > > - org.apache.cassandra.io.sstable.format.big.IndexedSliceReader.<init>(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=110, line=84 (Compiled frame)
> > > - org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.createReader(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=54, line=66 (Compiled frame)
> > > - org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.<init>(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=36, line=43 (Compiled frame)
> > > - org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=8, line=75 (Compiled frame)
> > > - org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.DecoratedKey) @bci=10, line=246 (Compiled frame)
> > > - org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(org.apache.cassandra.io.sstable.format.SSTableReader) @bci=9, line=62 (Compiled frame)
> > > - org.apache.cassandra.db.CollationController.collectAllData(boolean) @bci=350, line=270 (Compiled frame)
> > > - org.apache.cassandra.db.CollationController.getTopLevelColumns(boolean) @bci=39, line=64 (Compiled frame)
> > > - org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(org.apache.cassandra.db.filter.QueryFilter, int) @bci=40, line=2011 (Compiled frame)
> > > - org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(org.apache.cassandra.db.filter.QueryFilter) @bci=141, line=1815 (Compiled frame)
> > > - org.apache.cassandra.db.Keyspace.getRow(org.apache.cassandra.db.filter.QueryFilter) @bci=11, line=360 (Compiled frame)
> > > - org.apache.cassandra.db.SliceFromReadCommand.getRow(org.apache.cassandra.db.Keyspace) @bci=222, line=85 (Compiled frame)
> > > - org.apache.cassandra.db.ReadVerbHandler.doVerb(org.apache.cassandra.net.MessageIn, int) @bci=39, line=38 (Compiled frame)
> > > - org.apache.cassandra.net.MessageDeliveryTask.run() @bci=82, line=67 (Compiled frame)
> > > - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
> > > - org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run() @bci=5, line=164 (Compiled frame)
> > > - org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run() @bci=12, line=136 (Compiled frame)
> > > - org.apache.cassandra.concurrent.SEPWorker.run() @bci=87, line=105 (Compiled frame)
> > > - java.lang.Thread.run() @bci=11, line=745 (Compiled frame)
> > >
> > > Dominik and I would appreciate it if you could give some feedback. We think it is a bug :-)
> > >
> > > Cheers,
> > > Christian
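
Both dumps end in CompressedThrottledReader.reBuffer() even though the callers (StorageProxy$LocalReadRunnable and ReadVerbHandler) belong to the client read path. As a rough sketch only (hypothetical class names, not the actual CASSANDRA-11301 patch), this is the separation one would expect: background work such as compaction, anticompaction and validation goes through a rate-limited reader, while client reads get an unthrottled one.

    import com.google.common.util.concurrent.RateLimiter;

    // Hypothetical, simplified reader hierarchy; not Cassandra's real classes.
    interface ChunkReader {
        void reBuffer();
    }

    class UnthrottledChunkReader implements ChunkReader {
        @Override
        public void reBuffer() {
            // read the next chunk from disk with no throttling:
            // this is what the client read path should be handed
        }
    }

    class ThrottledChunkReader implements ChunkReader {
        private final RateLimiter limiter;
        private final int bufferSize;

        ThrottledChunkReader(RateLimiter limiter, int bufferSize) {
            this.limiter = limiter;
            this.bufferSize = bufferSize;
        }

        @Override
        public void reBuffer() {
            limiter.acquire(bufferSize); // may sleep; acceptable for background I/O
            // read the next chunk from disk
        }
    }

    class ReaderFactory {
        private final RateLimiter compactionLimiter;

        ReaderFactory(RateLimiter compactionLimiter) {
            this.compactionLimiter = compactionLimiter;
        }

        // For ReadVerbHandler / LocalReadRunnable style callers.
        ChunkReader forClientRead() {
            return new UnthrottledChunkReader();
        }

        // For compaction, anticompaction and validation.
        ChunkReader forBackground(int bufferSize) {
            return new ThrottledChunkReader(compactionLimiter, bufferSize);
        }
    }

The traces suggest the read path is being handed the throttled variant, which would explain why read latency tracks the compaction throughput setting.
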
> > >
> > > On Thu, Mar 3, 2016 at 1:08 PM, Dominik Keil <dominik.k...@movilizer.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> seeing this here as well. Basically, at some point during a repair within the cluster (incremental, parallel, partitioner range, only one node at a time), some node (sometimes the repairing node, sometimes another) starts piling up READs, i.e. pending reads go through the roof. Despite only one node being affected at a time, and enough other well-behaving nodes being available to satisfy our quorum reads, this impacts the read performance of the whole cluster.
> > >>
> > >> We have multiple C* 2.2 (2.2.5) clusters, basically running the same application but with different amounts of load. This symptom only appears on one of our clusters, which has significantly more usage than most others. This is also our largest cluster, with about 3 times as many machines as most other ones (and RF 5 instead of RF 3).
> > >>
> > >> We did not see this on 2.0, and we have only started to see it on that particular cluster. We didn't see this on any other cluster after upgrading from 2.0 (but then again, they're not as loaded).
> > >>
> > >> At first we suspected the incremental repair, because we also had heap-pressure issues with it on a 4 GB heap. We went to 6 GB and the heap pressure is gone, but the problem persists. Repair remains the suspect, though, as this bad behaviour coincides with repair and specifically with anticompaction going on.
> > >>
> > >> We see a clear temporal correlation between open file handles rising and, at the same time, active validations skyrocketing, and the beginning of such an "event". The same goes for the end of the event, which clearly correlates in time with the validations being completed and the number of file handles dropping dramatically. However, it seems both of these "beginning markers" need to be present: active validations skyrocketing without open file handles skyrocketing at the same time does not produce these symptoms.
> > >>
> > >> The only thing we have found so far that made a difference in these situations is compaction throughput. When we decreased the compaction throughput during these events, pending reads piled up even more and even more quickly. Beyond that we're still pretty much in the dark. Either way, something is locking up Cassandra internally.
> > >>
> > >> We suspect that there may be a "synchronized" somewhere it shouldn't be (or should be solved differently), but that's just a guess. We'll try to produce some jstacks, but the events are pretty elusive because they happen suddenly and don't last very long (except when we're not watching closely -_-).
> > >>
> > >> Anyway, here are some graphs to illustrate what I've tried to describe:
> > >>
> > >> (1) CPU usage of Cassandra (green) and open file descriptors (yellow, second y-axis)
> > >>
> > >> (2) Threads: new threads per second (orange line, second y-axis)
> > >>
> > >> (3) See below; active and pending tasks on second y-axis
> > >>
> > >> (4) Compactions and validations: active tasks (bars, second y-axis) and completed tasks / s (lines)
> > >>
> > >> You can see that around 08:03 an event started, with a sudden rise in active validations and multiple sudden increases in open file descriptors. The event lasts until 08:46, with a sudden drop in open file descriptors and a huge peak in new threads per second.
> > >>
> > >> During the event you can see that Cassandra's CPU usage drops significantly. The same goes for GC activity (graph not included here, because STW GC only happens about once every 50 minutes and then takes only a fraction of a second).
> > >>
> > >> As you can see, there's another such event later on, but much smaller and shorter, and between the events the pattern with the validations continues the same way without problems - only difference: no significant change in open file descriptor count.
> > >>
> > >> I have system graphs as well, but they're not included because they show no problems: CPU usage goes down during the event, there is no I/O wait on the CPU, and disk ops/s as well as throughput actually go down too.
> > >>
> > >> During the depicted time frame there was a repair (incremental, parallel, partitioner range) running on a different machine within the cluster. We've switched back to -pr because when running it without -pr these events happen more often and are more pronounced, but I think it's just affected by the same underlying problem.
> > >>
> > >> Interestingly, we had a similar issue in another cluster last night, which runs C* 2.1.13 and does NOT yet use incremental repair (just full repair with -pr).
> > >>
> > >> Any chance something in the read path is affected by the configured compaction throughput and/or running compactions? It definitely seems that Cassandra is severely restricting itself here.
> > >>
> > >> Best regards,
> > >> Dominik
> > >>
> > >> On 26.02.2016 at 17:42, horschi wrote:
> > >>
> > >> Hi,
> > >>
> > >> I just had a weird behaviour on one of our Cassandra nodes, which I would like to share.
> > >>
> > >> Short version:
> > >> My pending reads went up from ~0 to the hundreds when I reduced the compactionthroughput from 16 to 2.
> > >>
> > >> Long version:
> > >>
> > >> One of our more powerful nodes had a few pending reads, while the other ones didn't. So far nothing special.
> > >>
> > >> Strangely, neither CPU, nor I/O wait, nor disk ops/s, nor C* heap was particularly high. So I was wondering.
> > >>
> > >> That machine had two compactions and a validation (incremental) running, so I set the compactionthroughput to 2. To my surprise I saw the pending reads go up to the hundreds within 5-10 seconds. I set the compactionthroughput back to 16 and the pending reads went back to 0 (or at least close to zero).
> > >>
> > >> I kept the compactionthroughput on 2 for less than a minute, so the issue is not compactions falling behind.
> > >>
> > >> I was able to reproduce this behaviour 5-10 times. The pending reads went up every time I *de*creased the compactionthroughput. I watched the pending reads while the compactionthroughput was on 16, and I never observed even a two-digit pending read count while it was on compactionthroughput 16.
> > >>
> > >> Unfortunately the machine does not show this behaviour any more. Also, it was only a single machine.
> > >>
> > >> Our setup:
> > >> C* 2.2.5 with 256 vnodes + 9 nodes + incremental repair + 6 GB heap
> > >>
> > >> My question:
> > >> Did someone else ever observe such a behaviour?
> > >>
> > >> Is it perhaps possible that the read path shares a lock with repair/compaction that waits on ThrottledReader while holding that lock?
> > >>
> > >> kind regards,
> > >> Christian
> > >>
> > >> --
> > >> Dominik Keil
> > >> Movilizer GmbH, Julius-Hatry-Strasse 1, 68163 Mannheim, Germany
> > >> movilizer.com
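
On Christian's closing question (a lock shared between the read path and repair/compaction, held while waiting on the throttled reader): a minimal sketch of that hypothesis, with made-up names, showing how one thread sleeping inside RateLimiter.acquire() while holding a monitor stalls every other thread that needs the same monitor, even though those threads never touch the limiter themselves.

    import com.google.common.util.concurrent.RateLimiter;

    public class LockHeldWhileThrottledSketch {
        private static final Object SHARED_LOCK = new Object();              // hypothetical shared lock
        private static final RateLimiter LIMITER = RateLimiter.create(1.0);  // 1 permit/s, a very low rate

        public static void main(String[] args) throws InterruptedException {
            // "Background" thread: throttled while holding the lock.
            Thread background = new Thread(() -> {
                synchronized (SHARED_LOCK) {
                    for (int i = 0; i < 5; i++) {
                        LIMITER.acquire(); // after the first call, each acquire sleeps ~1s
                    }
                }
            }, "background");

            // "Read" thread: never calls the limiter, but needs the same lock.
            Thread read = new Thread(() -> {
                synchronized (SHARED_LOCK) {
                    // the read would be served here
                }
            }, "read");

            background.start();
            Thread.sleep(100); // let the background thread grab the lock first
            long start = System.nanoTime();
            read.start();
            read.join();
            System.out.printf("read was blocked for ~%d ms%n",
                    (System.nanoTime() - start) / 1_000_000);
        }
    }

In the dumps above, however, the read threads are sleeping inside the limiter itself rather than waiting on a monitor, so the simpler explanation is that the read path is throttled directly; the sketch only illustrates the alternative Christian asks about.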