Thanks. That's good to know.

On Thu, Mar 3, 2016 at 8:48 PM, Benedict Elliott Smith <bened...@apache.org> wrote:
> Yep, definitely a bug. Introduced by CASSANDRA-9240 (me; mea culpa).
>
> I've filed a JIRA for you: CASSANDRA-11301
>
> On 3 March 2016 at 14:10, horschi <hors...@gmail.com> wrote:
>
> > Hi,
> >
> > I just found another one. It's basically the same, but I'll post it anyway:
> >
> > Thread 84311: (state = BLOCKED)
> > - java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be imprecise)
> > - java.lang.Thread.sleep(long, int) @bci=57, line=340 (Compiled frame)
> > - java.util.concurrent.TimeUnit.sleep(long) @bci=23, line=386 (Compiled frame)
> > - com.google.common.util.concurrent.Uninterruptibles.sleepUninterruptibly(long, java.util.concurrent.TimeUnit) @bci=22, line=273 (Compiled frame)
> > - com.google.common.util.concurrent.RateLimiter$SleepingTicker$1.sleepMicrosUninterruptibly(long) @bci=10, line=701 (Compiled frame)
> > - com.google.common.util.concurrent.RateLimiter.acquire(int) @bci=42, line=405 (Compiled frame)
> > - org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer() @bci=11, line=43 (Compiled frame)
> > - org.apache.cassandra.io.util.RandomAccessReader.seek(long) @bci=147, line=287 (Compiled frame)
> > - org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(long) @bci=22, line=65 (Compiled frame)
> > - org.apache.cassandra.io.sstable.format.SSTableReader.getFileDataInput(long) @bci=5, line=1751 (Compiled frame)
> > - org.apache.cassandra.io.sstable.format.big.SimpleSliceReader.<init>(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput, org.apache.cassandra.db.composites.Composite) @bci=36, line=57 (Compiled frame)
> > - org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.createReader(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=38, line=66 (Compiled frame)
> > - org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.<init>(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=36, line=43 (Compiled frame)
> > - org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=8, line=75 (Compiled frame)
> > - org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.DecoratedKey) @bci=10, line=246 (Compiled frame)
> > - org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(org.apache.cassandra.io.sstable.format.SSTableReader) @bci=9, line=62 (Compiled frame)
> > - org.apache.cassandra.db.CollationController.collectAllData(boolean) @bci=350, line=270 (Compiled frame)
> > - org.apache.cassandra.db.CollationController.getTopLevelColumns(boolean) @bci=39, line=64 (Compiled frame)
> > - org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(org.apache.cassandra.db.filter.QueryFilter, int) @bci=40, line=2011 (Compiled frame)
> > - org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(org.apache.cassandra.db.filter.QueryFilter) @bci=141, line=1815 (Compiled frame)
> > - org.apache.cassandra.db.Keyspace.getRow(org.apache.cassandra.db.filter.QueryFilter) @bci=11, line=360 (Compiled frame)
> > - org.apache.cassandra.db.SliceFromReadCommand.getRow(org.apache.cassandra.db.Keyspace) @bci=222, line=85 (Compiled frame)
> > - org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow() @bci=16, line=1587 (Compiled frame)
> > - org.apache.cassandra.service.StorageProxy$DroppableRunnable.run() @bci=37, line=2232 (Compiled frame)
> > - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
> > - org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run() @bci=5, line=164 (Compiled frame)
> > - org.apache.cassandra.concurrent.SEPWorker.run() @bci=87, line=105 (Compiled frame)
> > - java.lang.Thread.run() @bci=11, line=745 (Compiled frame)
> >
> > kind regards,
> > Christian
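
For illustration (this is not Cassandra code): the trace above is parked inside Guava's RateLimiter, which puts the calling thread into an uninterruptible sleep whenever the requested permits exceed the configured rate. Below is a minimal, self-contained sketch of that behaviour, where the limiter is a hypothetical stand-in for a shared throughput limiter sized in bytes per second:

    import com.google.common.util.concurrent.RateLimiter;

    public class ThrottledReadSketch {
        public static void main(String[] args) {
            // Hypothetical stand-in for a shared throughput limiter,
            // expressed in permits (bytes) per second: 2 MB/s, i.e. a low setting.
            RateLimiter limiter = RateLimiter.create(2 * 1024 * 1024);

            int bufferSize = 64 * 1024; // pretend each reBuffer() pulls one 64 KB chunk

            long start = System.nanoTime();
            // A "read" thread re-buffering 100 chunks through that limiter.
            for (int i = 0; i < 100; i++) {
                limiter.acquire(bufferSize); // sleeps uninterruptibly once the rate is exceeded
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // 100 x 64 KB = 6.4 MB at 2 MB/s: roughly 3 seconds spent sleeping.
            System.out.println("100 throttled acquires took " + elapsedMs + " ms");
        }
    }

If a reader used by the client read path shares such a limiter, lowering the configured rate makes every acquire() sleep proportionally longer, which is consistent with pending reads piling up when the compaction throughput is reduced.
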
> >
> > On Thu, Mar 3, 2016 at 2:31 PM, horschi <hors...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > we just had such an incident again, and this time we were able to grab a jstack thread dump.
> > >
> > > One thread from this stack dump actually shows a ReadVerbHandler calling the RateLimiter. From my understanding this should not happen:
> > >
> > > Thread 30346: (state = BLOCKED)
> > > - java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be imprecise)
> > > - java.lang.Thread.sleep(long, int) @bci=57, line=340 (Compiled frame)
> > > - java.util.concurrent.TimeUnit.sleep(long) @bci=23, line=386 (Compiled frame)
> > > - com.google.common.util.concurrent.Uninterruptibles.sleepUninterruptibly(long, java.util.concurrent.TimeUnit) @bci=22, line=273 (Compiled frame)
> > > - com.google.common.util.concurrent.RateLimiter$SleepingTicker$1.sleepMicrosUninterruptibly(long) @bci=10, line=701 (Compiled frame)
> > > - com.google.common.util.concurrent.RateLimiter.acquire(int) @bci=42, line=405 (Compiled frame)
> > > - org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer() @bci=11, line=43 (Compiled frame)
> > > - org.apache.cassandra.io.util.RandomAccessReader.seek(long) @bci=147, line=287 (Compiled frame)
> > > - org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(long) @bci=22, line=65 (Compiled frame)
> > > - org.apache.cassandra.io.sstable.format.SSTableReader.getFileDataInput(long) @bci=5, line=1751 (Compiled frame)
> > > - org.apache.cassandra.io.sstable.format.big.IndexedSliceReader.setToRowStart(org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput) @bci=13, line=108 (Compiled frame)
> > > - org.apache.cassandra.io.sstable.format.big.IndexedSliceReader.<init>(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=110, line=84 (Compiled frame)
> > > - org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.createReader(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=54, line=66 (Compiled frame)
> > > - org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.<init>(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=36, line=43 (Compiled frame)
> > > - org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=8, line=75 (Compiled frame)
> > > - org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.DecoratedKey) @bci=10, line=246 (Compiled frame)
> > > - org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(org.apache.cassandra.io.sstable.format.SSTableReader) @bci=9, line=62 (Compiled frame)
> > > - org.apache.cassandra.db.CollationController.collectAllData(boolean) @bci=350, line=270 (Compiled frame)
> > > - org.apache.cassandra.db.CollationController.getTopLevelColumns(boolean) @bci=39, line=64 (Compiled frame)
> > > - org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(org.apache.cassandra.db.filter.QueryFilter, int) @bci=40, line=2011 (Compiled frame)
> > > - org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(org.apache.cassandra.db.filter.QueryFilter) @bci=141, line=1815 (Compiled frame)
> > > - org.apache.cassandra.db.Keyspace.getRow(org.apache.cassandra.db.filter.QueryFilter) @bci=11, line=360 (Compiled frame)
> > > - org.apache.cassandra.db.SliceFromReadCommand.getRow(org.apache.cassandra.db.Keyspace) @bci=222, line=85 (Compiled frame)
> > > - org.apache.cassandra.db.ReadVerbHandler.doVerb(org.apache.cassandra.net.MessageIn, int) @bci=39, line=38 (Compiled frame)
> > > - org.apache.cassandra.net.MessageDeliveryTask.run() @bci=82, line=67 (Compiled frame)
> > > - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
> > > - org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run() @bci=5, line=164 (Compiled frame)
> > > - org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run() @bci=12, line=136 (Compiled frame)
> > > - org.apache.cassandra.concurrent.SEPWorker.run() @bci=87, line=105 (Compiled frame)
> > > - java.lang.Thread.run() @bci=11, line=745 (Compiled frame)
> > >
> > > Dominik and I would appreciate it if you could give some feedback. We think it is a bug :-)
> > >
> > > Cheers,
> > > Christian
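
Both dumps end in CompressedThrottledReader.reBuffer() even though the callers (StorageProxy$LocalReadRunnable and ReadVerbHandler) belong to the client read path. As a rough sketch only (hypothetical class names, not the actual CASSANDRA-11301 patch), this is the separation one would expect: background work such as compaction, anticompaction and validation goes through a rate-limited reader, while client reads get an unthrottled one.

    import com.google.common.util.concurrent.RateLimiter;

    // Hypothetical, simplified reader hierarchy; not Cassandra's real classes.
    interface ChunkReader {
        void reBuffer();
    }

    class UnthrottledChunkReader implements ChunkReader {
        @Override
        public void reBuffer() {
            // read the next chunk from disk with no throttling:
            // this is what the client read path should be handed
        }
    }

    class ThrottledChunkReader implements ChunkReader {
        private final RateLimiter limiter;
        private final int bufferSize;

        ThrottledChunkReader(RateLimiter limiter, int bufferSize) {
            this.limiter = limiter;
            this.bufferSize = bufferSize;
        }

        @Override
        public void reBuffer() {
            limiter.acquire(bufferSize); // may sleep; acceptable for background I/O
            // read the next chunk from disk
        }
    }

    class ReaderFactory {
        private final RateLimiter compactionLimiter;

        ReaderFactory(RateLimiter compactionLimiter) {
            this.compactionLimiter = compactionLimiter;
        }

        // For ReadVerbHandler / LocalReadRunnable style callers.
        ChunkReader forClientRead() {
            return new UnthrottledChunkReader();
        }

        // For compaction, anticompaction and validation.
        ChunkReader forBackground(int bufferSize) {
            return new ThrottledChunkReader(compactionLimiter, bufferSize);
        }
    }

The traces suggest the read path is being handed the throttled variant, which would explain why read latency tracks the compaction throughput setting.
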
> > >
> > > On Thu, Mar 3, 2016 at 1:08 PM, Dominik Keil <dominik.k...@movilizer.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> seeing this here as well. Basically, at some point during a repair within the cluster (incremental, parallel, partitioner range, only one node at a time), some node (sometimes the repairing node, sometimes another) starts piling up READs, i.e. pending reads go through the roof. Despite only one node being affected at a time, and enough other well-behaving nodes being available to satisfy our quorum reads, this impacts the read performance of the whole cluster.
> > >>
> > >> We have multiple C* 2.2 (2.2.5) clusters, basically running the same application but with different amounts of load. This symptom only appears on one of our clusters, which has significantly more usage than most others. This is also our largest cluster, with about 3 times as many machines as most other ones (and RF 5 instead of RF 3).
> > >>
> > >> We did not see this on 2.0, and we have only started to see it on that particular cluster. We didn't see this on any other cluster after upgrading from 2.0 (but then again, they're not as loaded).
> > >>
> > >> At first we suspected the incremental repair, because we also had heap-pressure issues with it on a 4 GB heap. We went to 6 GB and the heap pressure is gone, but the problem persists. Repair remains the suspect, though, as this bad behaviour coincides with repair and specifically with anticompaction going on.
> > >>
> > >> We see a clear temporal correlation between open file handles rising and, at the same time, active validations skyrocketing, and the beginning of such an "event". The same goes for the end of the event, which clearly correlates in time with the validations being completed and the number of file handles dropping dramatically. However, it seems both of these "beginning markers" need to be present: active validations skyrocketing without open file handles skyrocketing at the same time does not produce these symptoms.
> > >>
> > >> The only thing we have found so far that made a difference in these situations is compaction throughput. When we decreased the compaction throughput during these events, pending reads piled up even more and even more quickly. Beyond that we're still pretty much in the dark. Either way, something is locking up Cassandra internally.
> > >>
> > >> We suspect that there may be a "synchronized" somewhere it shouldn't be (or should be solved differently), but that's just a guess. We'll try to produce some jstacks, but the events are pretty elusive because they happen suddenly and don't last very long (except when we're not watching closely -_-).
> > >>
> > >> Anyway, here are some graphs to illustrate what I've tried to describe:
> > >>
> > >> (1) CPU usage of Cassandra (green) and open file descriptors (yellow, second y-axis)
> > >>
> > >> (2) Threads: new threads per second (orange line, second y-axis)
> > >>
> > >> (3) See below; active and pending tasks on second y-axis
> > >>
> > >> (4) Compactions and validations: active tasks (bars, second y-axis) and completed tasks / s (lines)
> > >>
> > >> You can see that around 08:03 an event started, with a sudden rise in active validations and multiple sudden increases in open file descriptors. The event lasts until 08:46, with a sudden drop in open file descriptors and a huge peak in new threads per second.
> > >>
> > >> During the event you can see that Cassandra's CPU usage drops significantly. The same goes for GC activity (graph not included here, because STW GC only happens about once every 50 minutes and then takes only a fraction of a second).
> > >>
> > >> As you can see, there's another such event later on, but much smaller and shorter, and between the events the pattern with the validations continues the same way without problems - only difference: no significant change in open file descriptor count.
> > >>
> > >> I have system graphs as well, but they're not included because they show no problems: CPU usage goes down during the event, there is no I/O wait on the CPU, and disk ops/s as well as throughput actually go down too.
> > >>
> > >> During the depicted time frame there was a repair (incremental, parallel, partitioner range) running on a different machine within the cluster. We've switched back to -pr because when running it without -pr these events happen more often and are more pronounced, but I think it's just affected by the same underlying problem.
> > >>
> > >> Interestingly, we had a similar issue in another cluster last night, which runs C* 2.1.13 and does NOT yet use incremental repair (just full repair with -pr).
> > >>
> > >> Any chance something in the read path is affected by the configured compaction throughput and/or running compactions? It definitely seems that Cassandra is severely restricting itself here.
> > >>
> > >> Best regards,
> > >> Dominik
> > >>
> > >> On 26.02.2016 at 17:42, horschi wrote:
> > >>
> > >> Hi,
> > >>
> > >> I just had a weird behaviour on one of our Cassandra nodes, which I would like to share.
> > >>
> > >> Short version:
> > >> My pending reads went up from ~0 to the hundreds when I reduced the compactionthroughput from 16 to 2.
> > >>
> > >> Long version:
> > >>
> > >> One of our more powerful nodes had a few pending reads, while the other ones didn't. So far nothing special.
> > >>
> > >> Strangely, neither CPU, nor I/O wait, nor disk ops/s, nor C* heap was particularly high. So I was wondering.
> > >>
> > >> That machine had two compactions and a validation (incremental) running, so I set the compactionthroughput to 2. To my surprise I saw the pending reads go up to the hundreds within 5-10 seconds. I set the compactionthroughput back to 16 and the pending reads went back to 0 (or at least close to zero).
> > >>
> > >> I kept the compactionthroughput on 2 for less than a minute, so the issue is not compactions falling behind.
> > >>
> > >> I was able to reproduce this behaviour 5-10 times. The pending reads went up every time I *de*creased the compactionthroughput. I watched the pending reads while the compactionthroughput was on 16, and I never observed even a two-digit pending read count while it was on compactionthroughput 16.
> > >>
> > >> Unfortunately the machine does not show this behaviour any more. Also, it was only a single machine.
> > >>
> > >> Our setup:
> > >> C* 2.2.5 with 256 vnodes + 9 nodes + incremental repair + 6 GB heap
> > >>
> > >> My question:
> > >> Did someone else ever observe such a behaviour?
> > >>
> > >> Is it perhaps possible that the read path shares a lock with repair/compaction that waits on ThrottledReader while holding that lock?
> > >>
> > >> kind regards,
> > >> Christian
> > >>
> > >> --
> > >> Dominik Keil
> > >> Movilizer GmbH, Julius-Hatry-Strasse 1, 68163 Mannheim, Germany
> > >> movilizer.com
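
On Christian's closing question (a lock shared between the read path and repair/compaction, held while waiting on the throttled reader): a minimal sketch of that hypothesis, with made-up names, showing how one thread sleeping inside RateLimiter.acquire() while holding a monitor stalls every other thread that needs the same monitor, even though those threads never touch the limiter themselves.

    import com.google.common.util.concurrent.RateLimiter;

    public class LockHeldWhileThrottledSketch {
        private static final Object SHARED_LOCK = new Object();              // hypothetical shared lock
        private static final RateLimiter LIMITER = RateLimiter.create(1.0);  // 1 permit/s, a very low rate

        public static void main(String[] args) throws InterruptedException {
            // "Background" thread: throttled while holding the lock.
            Thread background = new Thread(() -> {
                synchronized (SHARED_LOCK) {
                    for (int i = 0; i < 5; i++) {
                        LIMITER.acquire(); // after the first call, each acquire sleeps ~1s
                    }
                }
            }, "background");

            // "Read" thread: never calls the limiter, but needs the same lock.
            Thread read = new Thread(() -> {
                synchronized (SHARED_LOCK) {
                    // the read would be served here
                }
            }, "read");

            background.start();
            Thread.sleep(100); // let the background thread grab the lock first
            long start = System.nanoTime();
            read.start();
            read.join();
            System.out.printf("read was blocked for ~%d ms%n",
                    (System.nanoTime() - start) / 1_000_000);
        }
    }

In the dumps above, however, the read threads are sleeping inside the limiter itself rather than waiting on a monitor, so the simpler explanation is that the read path is throttled directly; the sketch only illustrates the alternative Christian asks about.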