Without having read the code yet, what you describe sounds like a reasonable 
optimization / fix, suitable for 3.0+ (probably not 2.2, definitely not 2.1).

-- 
Jeff Jirsa


> On Nov 23, 2016, at 7:52 AM, Marcus Olsson <marcus.ols...@ericsson.com> wrote:
> 
> Hi everyone,
> 
> TL;DR
> Should LCS be changed to always prefer an STCS compaction in L0 when L0 is 
> falling behind (assuming that STCS in L0 is enabled)?
> Currently LCS seems to check whether an L0->L1 compaction is possible before 
> checking whether L0 is falling behind, a check that in our case used 15-30% 
> of the compaction thread CPU.
> TL;DR
> 
> So first some background:
> We have an Apache Cassandra 2.2 cluster running under high load. In that 
> cluster there is a table with a moderate number of writes per second that 
> uses LeveledCompactionStrategy. The test was to run repair on that table 
> while we monitored the cluster through JMC with Flight Recordings 
> enabled. This resulted in a large number of sstables for that table, which I 
> assume others have experienced as well. In this case I think it was between 
> 15k and 20k.
> 
> From the Flight Recording, one thing we saw was that 15-30% of the CPU time in 
> each of the compaction threads was spent in "getNextBackgroundTask()", which 
> retrieves the next compaction job. Further investigation suggests that most 
> of this time goes to checking for overlap among L0 sstables before performing 
> an L0->L1 compaction. There is a JIRA that seems to be related, 
> https://issues.apache.org/jira/browse/CASSANDRA-11571, which we backported to 
> 2.2 and tested. In our testing it seemed to improve the situation, but the 
> check was still using noticeable CPU.
> 
> My interpretation of the current logic of LCS is (if STCS in L0 is enabled):
> 1. Check each level (L1+)
>  - If an L1+ compaction is needed, check if L0 is behind and do STCS if that's 
> the case, otherwise do the L1+ compaction.
> 2. Check L0 -> L1 compactions and if none is needed/possible check for STCS 
> in L0.
> 
> My proposal is to change this behavior to always check first whether L0 is 
> far behind and do an STCS compaction in that case. This would avoid the overlap 
> check for L0 -> L1 compactions when L0 is behind, and I think it makes sense 
> since we already prefer STCS to L1+ compactions. This would not solve the 
> repair situation, but it would lower some of the impact that repair has on 
> LCS.
> 
> As for which version this could go into, I think trunk would be enough since 
> compaction is pluggable.
> 
> 
> -- 
>  
> MARCUS OLSSON 
> Software Developer
> 
> Ericsson
> Sweden
> marcus.ols...@ericsson.com
> www.ericsson.com
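
The ordering change proposed in the quoted message can be illustrated with a small toy model. This is only a sketch of the decision order as described in the message, not Cassandra's actual code: the names Levels, next_task_current, next_task_proposed and L0_BEHIND_THRESHOLD are hypothetical, the threshold value is an arbitrary placeholder, and the assumption that the final STCS fallback only fires when L0 is behind is mine. The expensive L0 overlap scan is represented by a counter.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder: L0 counts as "far behind" above this many sstables.
L0_BEHIND_THRESHOLD = 32


@dataclass
class Levels:
    """Toy model holding only the facts the ordering decision depends on."""
    l0_sstable_count: int
    higher_level_needs_compaction: bool  # some L1+ level wants a compaction
    l0_to_l1_possible: bool              # outcome of the costly overlap check
    overlap_checks: int = 0              # how many times the overlap check ran

    def l0_is_behind(self) -> bool:
        return self.l0_sstable_count > L0_BEHIND_THRESHOLD

    def find_l0_to_l1_candidates(self) -> bool:
        # Stands in for the expensive overlap scan across L0 sstables.
        self.overlap_checks += 1
        return self.l0_to_l1_possible


def next_task_current(lv: Levels) -> Optional[str]:
    """Decision order as interpreted in the message (STCS in L0 enabled)."""
    # 1. L1+ levels first; STCS in L0 wins only if L0 is behind.
    if lv.higher_level_needs_compaction:
        return "STCS_L0" if lv.l0_is_behind() else "L1+"
    # 2. Then L0->L1, which runs the overlap check before looking at STCS.
    if lv.find_l0_to_l1_candidates():
        return "L0->L1"
    return "STCS_L0" if lv.l0_is_behind() else None


def next_task_proposed(lv: Levels) -> Optional[str]:
    """Proposed order: check whether L0 is far behind before anything else."""
    if lv.l0_is_behind():
        return "STCS_L0"  # the overlap check is skipped entirely
    if lv.higher_level_needs_compaction:
        return "L1+"
    if lv.find_l0_to_l1_candidates():
        return "L0->L1"
    return None
```

With a large L0 backlog and no pending L1+ work, the current order in this model runs the overlap check and picks an L0->L1 compaction, while the proposed order picks STCS in L0 without running the check at all. When L0 is not behind, both orders give the same answer.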
