Hi,
I ran some benchmarks on my laptop:
https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16656821&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16656821
For a random read workload, varying chunk size:
Chunk size    Time
64k           25:20
64k           25:33
32k           20:01
16k           19:19
16k           19:14
8k            16:51
4k            15:39
Ariel
On Thu, Oct 18, 2018, at 2:55 PM, Ariel Weisberg wrote:
> Hi,
>
> For those who were asking about the performance impact of block size on
> compression I wrote a microbenchmark.
>
> https://pastebin.com/RHDNLGdC
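>
> The shape of the benchmark is roughly the following, a simplified
> sketch of what is in the pastebin, assuming JMH and lz4-java on the
> classpath. The real harness also covers Snappy, LZ4 high, and the
> other chunk sizes, and its input data is more realistic than the
> random bytes used here:
>
> import java.util.concurrent.ThreadLocalRandom;
> import net.jpountz.lz4.LZ4Compressor;
> import net.jpountz.lz4.LZ4Factory;
> import org.openjdk.jmh.annotations.Benchmark;
> import org.openjdk.jmh.annotations.Scope;
> import org.openjdk.jmh.annotations.State;
>
> @State(Scope.Thread)
> public class CompressBenchSketch
> {
>     static final LZ4Compressor COMPRESSOR =
>         LZ4Factory.fastestInstance().fastCompressor();
>
>     // One 16k chunk of input plus a worst-case sized output buffer
>     final byte[] src = randomBytes(16 * 1024);
>     final byte[] dest = new byte[COMPRESSOR.maxCompressedLength(16 * 1024)];
>
>     static byte[] randomBytes(int size)
>     {
>         byte[] bytes = new byte[size];
>         ThreadLocalRandom.current().nextBytes(bytes);
>         return bytes;
>     }
>
>     @Benchmark
>     public int benchCompressLZ4Fast16k()
>     {
>         // Return the compressed length so JMH can't dead-code the call
>         return COMPRESSOR.compress(src, 0, src.length, dest, 0);
>     }
> }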
>
> [java] Benchmark                                               Mode  Cnt          Score           Error  Units
> [java] CompactIntegerSequenceBench.benchCompressLZ4Fast8k     thrpt   15  305518114.172 ± 11043705.883  ops/s
> [java] CompactIntegerSequenceBench.benchCompressLZ4Fast16k    thrpt   15  331190055.685 ±  8079758.044  ops/s
> [java] CompactIntegerSequenceBench.benchCompressLZ4Fast32k    thrpt   15  353024925.655 ±  7980400.003  ops/s
> [java] CompactIntegerSequenceBench.benchCompressLZ4Fast64k    thrpt   15  365664477.654 ± 10083336.038  ops/s
> [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast8k   thrpt   15  727725713.128 ±  4252436.331  ops/s
> [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast16k  thrpt   15  688369529.911 ± 25620873.933  ops/s
> [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast32k  thrpt   15  703635848.895 ±  5296941.704  ops/s
> [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast64k  thrpt   15  695537044.676 ± 17400763.731  ops/s
>
> To summarize, compression is 8.5% slower and decompression is 1%
> faster at the smaller chunk sizes. This measures only the direct cost
> of compression/decompression, not the much larger win from less often
> decompressing data we don't need.
>
> I didn't test decompression of Snappy and LZ4 high, but I did test
> compression.
>
> Snappy:
> [java] CompactIntegerSequenceBench.benchCompressSnappy8k   thrpt  2  186040175.059  ops/s
> [java] CompactIntegerSequenceBench.benchCompressSnappy16k  thrpt  2  196574766.116  ops/s
> [java] CompactIntegerSequenceBench.benchCompressSnappy32k  thrpt  2  198538643.844  ops/s
> [java] CompactIntegerSequenceBench.benchCompressSnappy64k  thrpt  2  194600497.613  ops/s
>
> LZ4 high compressor:
> [java] CompactIntegerSequenceBench.bench8k   thrpt  2  32254619.594  ops/s
> [java] CompactIntegerSequenceBench.bench16k  thrpt  2  20822947.578  ops/s
> [java] CompactIntegerSequenceBench.bench32k  thrpt  2  12037342.253  ops/s
> [java] CompactIntegerSequenceBench.bench64k  thrpt  2   6782534.469  ops/s
>
> LZ4 high is the one instance where block size mattered a lot. It's a
> bit suspicious that throughput scales almost exactly 1:1 with block
> size, but I couldn't spot a bug in the benchmark.
>
> Compression ratios with LZ4 fast for the text of Alice in Wonderland
> were:
>
> Chunk size 8192, ratio 0.709473
> Chunk size 16384, ratio 0.667236
> Chunk size 32768, ratio 0.634735
> Chunk size 65536, ratio 0.607208
>
> By way of comparison, I also ran deflate with maximum compression:
>
> Chunk size 8192, ratio 0.426434
> Chunk size 16384, ratio 0.402423
> Chunk size 32768, ratio 0.381627
> Chunk size 65536, ratio 0.364865
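>
> For anyone who wants to reproduce the ratio numbers, this is roughly
> how they can be computed. A sketch, assuming lz4-java on the
> classpath; the file path is illustrative:
>
> import java.nio.file.Files;
> import java.nio.file.Paths;
> import net.jpountz.lz4.LZ4Compressor;
> import net.jpountz.lz4.LZ4Factory;
>
> public class RatioSketch
> {
>     public static void main(String[] args) throws Exception
>     {
>         byte[] text = Files.readAllBytes(Paths.get("alice.txt"));
>         LZ4Compressor lz4 = LZ4Factory.fastestInstance().fastCompressor();
>         for (int chunk = 8192; chunk <= 65536; chunk *= 2)
>         {
>             byte[] dest = new byte[lz4.maxCompressedLength(chunk)];
>             long compressed = 0;
>             // Compress the text one chunk at a time, as the sstable would
>             for (int off = 0; off < text.length; off += chunk)
>             {
>                 int len = Math.min(chunk, text.length - off);
>                 compressed += lz4.compress(text, off, len, dest, 0);
>             }
>             System.out.printf("Chunk size %d, ratio %f%n",
>                               chunk, compressed / (double) text.length);
>         }
>     }
> }
>
> The deflate numbers fall out of the same loop using
> java.util.zip.Deflater at Deflater.BEST_COMPRESSION instead of LZ4.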
>
> Ariel
>
> On Thu, Oct 18, 2018, at 5:32 AM, Benedict Elliott Smith wrote:
> > FWIW, I’m not -0, just think that long after the freeze date a change
> > like this needs a strong mandate from the community. I think the change
> > is a good one.
> >
> > > On 17 Oct 2018, at 22:09, Ariel Weisberg <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > It's really not appreciably slower compared to the decompression we
> > > are going to do, which is going to take several microseconds.
> > > Decompression is also going to be faster overall: we'll decompress
> > > data we don't need less often, and the decompression itself may speed
> > > up since smaller chunks may fit better in a higher-level cache. I ran
> > > a microbenchmark comparing the representations.
> > >
> > > https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16653988&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16653988
> > >
> > > Fetching a long from memory: 56 nanoseconds
> > > Compact integer sequence:    80 nanoseconds
> > > Summing integer sequence:    165 nanoseconds
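> > >
> > > To give a sense of the shape of such a representation, here is a
> > > sketch of the general idea (not the actual code on the ticket).
> > > Chunk offsets increase monotonically, and each compressed chunk is at
> > > most chunk_length plus a small overhead, so instead of a full 8-byte
> > > long per chunk we can store an 8-byte base per group of chunks plus a
> > > small per-chunk delta:
> > >
> > > final class CompactChunkOffsets
> > > {
> > >     private static final int GROUP = 16;
> > >     private final long[] bases;  // one full offset per GROUP chunks
> > >     private final int[] deltas; // each chunk's offset within its group
> > >
> > >     CompactChunkOffsets(long[] offsets)
> > >     {
> > >         bases = new long[(offsets.length + GROUP - 1) / GROUP];
> > >         deltas = new int[offsets.length];
> > >         for (int i = 0; i < offsets.length; i++)
> > >         {
> > >             if (i % GROUP == 0)
> > >                 bases[i / GROUP] = offsets[i];
> > >             deltas[i] = (int) (offsets[i] - bases[i / GROUP]);
> > >         }
> > >     }
> > >
> > >     long offset(int chunk)
> > >     {
> > >         return bases[chunk / GROUP] + deltas[chunk];
> > >     }
> > > }
> > >
> > > This naive layout costs about 4.5 bytes per chunk instead of 8; the
> > > 37% figure quoted below implies a tighter encoding than this.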
> > >
> > > Currently we have one +1 from Kurt to change the representation and
> > > possibly a -0 from Benedict. That's not really enough to make an
> > > exception to the code freeze. If you want it to happen (or not), you
> > > need to speak up; otherwise only the default will change.
> > >
> > > Regards,
> > > Ariel
> > >
> > > On Wed, Oct 17, 2018, at 6:40 AM, kurt greaves wrote:
> > >> I think if we're going to drop it to 16k, we should invest in the
> > >> compact sequencing as well. Just lowering it to 16k could have a
> > >> painful impact on anyone running low-memory nodes, but if we can do
> > >> it without the memory impact I don't think there's any reason to wait
> > >> another major version to implement it.
> > >>
> > >> Having said that, we should probably benchmark the two representations
> > >> Ariel has come up with.
> > >>
> > >> On Wed, 17 Oct 2018 at 20:17, Alain RODRIGUEZ <[email protected]> wrote:
> > >>
> > >>> +1
> > >>>
> > >>> I would guess a lot of C* clusters/tables have this option set to
> > >>> the default value, and not many of them need to read such big chunks
> > >>> of data.
> > >>> I believe this will greatly limit disk overreads for a fair amount
> > >>> (a big majority?) of new users. It seems fair enough to change this
> > >>> default value, and I also think 4.0 is a nice place to do this.
> > >>>
> > >>> Thanks for taking care of this, Ariel, and for making sure there is
> > >>> a consensus here as well,
> > >>>
> > >>> C*heers,
> > >>> -----------------------
> > >>> Alain Rodriguez - [email protected]
> > >>> France / Spain
> > >>>
> > >>> The Last Pickle - Apache Cassandra Consulting
> > >>> http://www.thelastpickle.com
> > >>>
> > >>> Le sam. 13 oct. 2018 à 08:52, Ariel Weisberg <[email protected]> a
> > >>> écrit :
> > >>>
> > >>>> Hi,
> > >>>>
> > >>>> This would only impact new tables; existing tables would get their
> > >>>> chunk_length_in_kb from the existing schema. It's something we
> > >>>> record in a system table.
> > >>>>
> > >>>> I have an implementation of a compact integer sequence that only
> > >>>> requires 37% of the memory required today. So we could do this with
> > >>>> only slightly more than doubling the memory used. I'll post that to
> > >>>> the JIRA soon.
> > >>>>
> > >>>> Ariel
> > >>>>
> > >>>> On Fri, Oct 12, 2018, at 1:56 AM, Jeff Jirsa wrote:
> > >>>>>
> > >>>>>
> > >>>>> I think 16k is a better default, but it should only affect new tables.
> > >>>>> Whoever changes it, please make sure you think about the upgrade path.
> > >>>>>
> > >>>>>
> > >>>>>> On Oct 12, 2018, at 2:31 AM, Ben Bromhead <[email protected]> wrote:
> > >>>>>>
> > >>>>>> This is something that's bugged me for ages; tbh the performance
> > >>>>>> gain for most use cases far outweighs the increase in memory usage,
> > >>>>>> and I would even be in favor of changing the default now and
> > >>>>>> optimizing the storage cost later (if it's found to be worth it).
> > >>>>>>
> > >>>>>> For some anecdotal evidence: 4kb is usually what we end up setting
> > >>>>>> it to. 16kb feels more reasonable given the memory impact, but what
> > >>>>>> would be the point if, practically, most folks set it to 4kb anyway?
> > >>>>>>
> > >>>>>> Note that chunk_length will largely depend on your read sizes, but
> > >>>>>> 4k is the floor for most physical devices in terms of their block
> > >>>>>> size.
> > >>>>>>
> > >>>>>> +1 for making this change in 4.0 given the small size of the change
> > >>>>>> and the large improvement to new users' experience (as long as we
> > >>>>>> are explicit in the documentation about memory consumption).
> > >>>>>>
> > >>>>>>
> > >>>>>>> On Thu, Oct 11, 2018 at 7:11 PM Ariel Weisberg <[email protected]>
> > >>>> wrote:
> > >>>>>>>
> > >>>>>>> Hi,
> > >>>>>>>
> > >>>>>>> This is regarding
> > >>>> https://issues.apache.org/jira/browse/CASSANDRA-13241
> > >>>>>>>
> > >>>>>>> This ticket has languished for a while. IMO it's too late in 4.0
> > >>>>>>> to implement a more memory-efficient representation for compressed
> > >>>>>>> chunk offsets. However, I don't think we should put out another
> > >>>>>>> release with the current 64k default, as it's pretty unreasonable.
> > >>>>>>>
> > >>>>>>> I propose that we lower the value to 16kb. 4k might never be the
> > >>>>>>> correct default anyway, as there is a cost to compression, and 16k
> > >>>>>>> will still be a large improvement.
> > >>>>>>>
> > >>>>>>> Benedict and Jon Haddad are both +1 on making this change for
> > >>>>>>> 4.0. In the past there has been some consensus about reducing
> > >>>>>>> this value, although maybe with more memory efficiency.
> > >>>>>>>
> > >>>>>>> The napkin math for what this costs is:
> > >>>>>>> "If you have 1TB of uncompressed data, with 64k chunks that's 16M
> > >>>>>>> chunks at 8 bytes each (128MB).
> > >>>>>>> With 16k chunks, that's 512MB.
> > >>>>>>> With 4k chunks, it's 2G.
> > >>>>>>> Per terabyte of data (pre-compression)."
> > >>>>>>>
> > >>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=15886621&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15886621
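> > >>>>>>>
> > >>>>>>> That math is easy to sanity-check; a quick sketch:
> > >>>>>>>
> > >>>>>>> public class NapkinMath
> > >>>>>>> {
> > >>>>>>>     public static void main(String[] args)
> > >>>>>>>     {
> > >>>>>>>         // One 8-byte offset per chunk, per 1TB of uncompressed data
> > >>>>>>>         long terabyte = 1L << 40;
> > >>>>>>>         for (int chunkKb : new int[]{ 64, 16, 4 })
> > >>>>>>>         {
> > >>>>>>>             long chunks = terabyte / (chunkKb * 1024L);
> > >>>>>>>             System.out.printf("%dk chunks: %dM entries, %d MB%n",
> > >>>>>>>                               chunkKb, chunks >> 20, (chunks * 8) >> 20);
> > >>>>>>>         }
> > >>>>>>>     }
> > >>>>>>> }
> > >>>>>>> // Prints 16M entries / 128 MB at 64k, 64M / 512 MB at 16k,
> > >>>>>>> // and 256M / 2048 MB at 4k.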
> > >>>>>>>
> > >>>>>>> By way of comparison, memory mapping the files has a similar cost
> > >>>>>>> of 8 bytes per 4k page. Multiple mappings make this more expensive.
> > >>>>>>> With a default of 16kb this would be 4x less expensive than memory
> > >>>>>>> mapping a file. I only mention this to give a sense of the costs we
> > >>>>>>> are already paying; I am not saying they are directly related.
> > >>>>>>>
> > >>>>>>> I'll wait a week for discussion and, if there is consensus, make
> > >>>>>>> the change.
> > >>>>>>>
> > >>>>>>> Regards,
> > >>>>>>> Ariel
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>> Ben Bromhead
> > >>>>>> CTO | Instaclustr <https://www.instaclustr.com/>
> > >>>>>> +1 650 284 9692
> > >>>>>> Reliability at Scale
> > >>>>>> Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]