Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-11-03 Thread Carl Mueller
IMO slightly bigger memory requirements for substantial improvements is a good exchange, especially for a 4.0 release of the database. Optane and lots of other memory are coming down the hardware pipeline, and risk-wise almost all cassandra people know to testbed the major versions, so major versio

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-29 Thread Jonathan Haddad
Looks straightforward, I can review today. On Mon, Oct 29, 2018 at 12:25 PM Ariel Weisberg wrote: > Hi, > > Seeing too many -'s for changing the representation and essentially no +1s > so I submitted a patch for just changing the default. I could use a > reviewer for https://issues.apache.org/ji

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-29 Thread Ariel Weisberg
Hi, Seeing too many -'s for changing the representation and essentially no +1s so I submitted a patch for just changing the default. I could use a reviewer for https://issues.apache.org/jira/browse/CASSANDRA-13241 I created https://issues.apache.org/jira/browse/CASSANDRA-14857 "Use a more spa

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-24 Thread Joshua McKenzie
+1. I use the smiley to let you know I'm mostly just giving you shit. ;) On Wed, Oct 24, 2018 at 11:43 AM Benedict Elliott Smith wrote: > If you undertake sufficiently many low risk things, some will bite you, I > think everyone understands that. It’s still valuable to factor a risk > assessmen

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-24 Thread Benedict Elliott Smith
If you undertake sufficiently many low risk things, some will bite you, I think everyone understands that. It’s still valuable to factor a risk assessment into the equation, I think? Either way, somebody asked who didn’t have the context to easily answer, so I did my best to offer them that in

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-24 Thread Joshua McKenzie
| The risk from such a patch is very low If I had a nickel for every time I've heard that... ;) I'm neutral on the default change, -.5 (i.e. don't agree with it but won't die on that hill) on the data structure change post-freeze. We put this in, and that's a slippery slope as I'm sure we can find

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-23 Thread Jeff Jirsa
My objection (-0.5) is based on the freeze, not on code complexity -- Jeff Jirsa > On Oct 23, 2018, at 8:59 AM, Benedict Elliott Smith > wrote: > > To discuss the concerns about the patch for a more efficient representation: > > The risk from such a patch is very low. It’s a very simple in-me

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-23 Thread Benedict Elliott Smith
To discuss the concerns about the patch for a more efficient representation: The risk from such a patch is very low. It’s a very simple in-memory data structure, that we can introduce thorough fuzz tests for. The reason to exclude it would be for reasons of wanting to begin strictly enforcing

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-23 Thread Ariel Weisberg
Hi, I just asked Jeff. He is -0 and -0.5 respectively. Ariel On Tue, Oct 23, 2018, at 11:50 AM, Benedict Elliott Smith wrote: > I’m +1 change of default. I think Jeff was -1 on that though. > > > > On 23 Oct 2018, at 16:46, Ariel Weisberg wrote: > > > > Hi, > > > > To summarize who we have

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-23 Thread Benedict Elliott Smith
I’m +1 change of default. I think Jeff was -1 on that though. > On 23 Oct 2018, at 16:46, Ariel Weisberg wrote: > > Hi, > > To summarize who we have heard from so far > > WRT to changing just the default: > > +1: > Jon Haddadd > Ben Bromhead > Alain Rodriguez > Sankalp Kohli (not explicit)

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-23 Thread Ariel Weisberg
Hi, To summarize who we have heard from so far
WRT changing just the default:
+1: Jon Haddad, Ben Bromhead, Alain Rodriguez, Sankalp Kohli (not explicit)
-0: Sylvain Lebresne, Jeff Jirsa
Not sure: Kurt Greaves, Joshua McKenzie, Benedict Elliott Smith
WRT changing the representation:
+1: Ther

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-19 Thread Jonathan Haddad
Sorry, to be clear - I'm +1 on changing the configuration default, but I think changing the compression in memory representations warrants further discussion and investigation before making a case for or against it yet. An optimization that reduces in memory cost by over 50% sounds pretty good and

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-19 Thread Jonathan Haddad
I think we should try to do the right thing for the most people that we can. The number of folks impacted by 64KB is huge. I've worked on a lot of clusters created by a lot of different teams, going from brand new to pretty damn knowledgeable. I can't think of a single time over the last 2 years

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-19 Thread Sankalp Kohli
(We should definitely harden the definition for freeze in a separate thread) My thinking is that this is the best time to do this change as we have not even cut alpha or beta. All the people involved in the test will definitely be testing it again when we have these releases. > On Oct 19, 2018

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-19 Thread Michael Shuler
On 10/19/18 9:16 AM, Joshua McKenzie wrote: > > At the risk of hijacking this thread, when are we going to transition from > "no new features, change whatever else you want including refactoring and > changing years-old defaults" to "ok, we think we have something that's > stable, time to start te

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-19 Thread Benedict Elliott Smith
Shall we move this discussion to a separate thread? I agree it needs to be had, but this will definitely derail this discussion. To respond only to the relevant portion for this thread: > changing years-old defaults I don’t see how age is relevant? This isn’t some ‘battle hardened’ feature w

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-19 Thread Joshua McKenzie
> > The predominant phrased used in that thread was 'feature freeze'. At the risk of hijacking this thread, when are we going to transition from "no new features, change whatever else you want including refactoring and changing years-old defaults" to "ok, we think we have something that's stable,

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-19 Thread Ariel Weisberg
Hi, I ran some benchmarks on my laptop https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16656821&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16656821
For a random read workload, varying chunk size:
Chunk size   Time
64k          25:20

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-19 Thread Benedict Elliott Smith
The change of default property doesn’t seem to violate the freeze? The predominant phrase used in that thread was 'feature freeze'. A lot of people are now interpreting it more broadly, so perhaps we need to revisit, but that’s probably a separate discussion? The current default is really ba

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-19 Thread Jeff Jirsa
Agree with Sylvain (and I think Benedict) - there’s no compelling reason to violate the freeze here. We’ve had the wrong default for years - add a note to the docs that we’ll be changing it in the future, but let’s not violate the freeze now. -- Jeff Jirsa > On Oct 19, 2018, at 10:06 AM, Syl

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-19 Thread Sylvain Lebresne
Fwiw, as much as I agree this is a change worth doing in general, I am -0 for 4.0, for both the "compact sequencing" and the change of default really. We're closing in on 2 months within the freeze, and for me a freeze does include not changing defaults, because changing a default ideally implies a decent am

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-18 Thread Ariel Weisberg
Hi, For those who were asking about the performance impact of block size on compression I wrote a microbenchmark. https://pastebin.com/RHDNLGdC
[java] Benchmark Mode Cnt Score Error Units
[java] CompactIntegerSequenceB

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-18 Thread Benedict Elliott Smith
FWIW, I’m not -0, just think that long after the freeze date a change like this needs a strong mandate from the community. I think the change is a good one. > On 17 Oct 2018, at 22:09, Ariel Weisberg wrote: > > Hi, > > It's really not appreciably slower compared to the decompression we ar

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-17 Thread Ariel Weisberg
Hi, It's really not appreciably slower compared to the decompression we are going to do which is going to take several microseconds. Decompression is also going to be faster because we are going to do less unnecessary decompression and the decompression itself may be faster since it may fit in

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-17 Thread kurt greaves
I think if we're going to drop it to 16k, we should invest in the compact sequencing as well. Just lowering it to 16k will have a potentially painful impact on anyone running low-memory nodes, but if we can do it without the memory impact I don't think there's any reason to wait another major versi

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-17 Thread Alain RODRIGUEZ
+1 I would guess a lot of C* clusters/tables have this option set to the default value, and not many of them need to read such big chunks of data. I believe this will greatly limit disk overreads for a fair number (a big majority?) of new users. It seems fair enough to change this
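
To make the overread point concrete, here is a rough sketch (not from the thread) of how much uncompressed data a point read has to materialize for each chunk size. The function name and the 1 KiB row size are hypothetical, and it assumes each read touches exactly one chunk and that the whole chunk must be decompressed to serve it.

```python
def read_amplification(chunk_kb: int, row_bytes: int) -> float:
    """Uncompressed bytes materialized per byte actually requested.

    Assumption: a point read touches exactly one chunk, and the whole
    chunk is decompressed even if only one small row is needed.
    """
    return chunk_kb * 1024 / row_bytes


# Hypothetical 1 KiB row, comparing the chunk sizes discussed in the thread.
for chunk_kb in (64, 16, 4):
    factor = read_amplification(chunk_kb, 1024)
    print(f"{chunk_kb:>2} KiB chunk: {factor:.0f}x the requested bytes")
```

Under these assumptions, dropping from 64 KiB to 16 KiB cuts the per-read overhead by a factor of four; real workloads with larger rows or range reads would see a smaller difference.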

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-12 Thread Ariel Weisberg
Hi, This would only impact new tables; existing tables would get their chunk_length_in_kb from the existing schema. It's something we record in a system table. I have an implementation of a compact integer sequence that only requires 37% of the memory required today. So we could do this with o
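
A back-of-the-envelope sketch of what the 37% figure could mean for chunk offset memory when moving from 64 KiB to 16 KiB chunks. It assumes offset memory scales linearly with the number of chunks and that the 37% ratio holds regardless of chunk size; neither assumption is confirmed in the message, and the function name is illustrative only.

```python
def relative_offset_memory(old_chunk_kb: int, new_chunk_kb: int,
                           compact_ratio: float = 1.0) -> float:
    """Offset memory after the change, relative to today's (1.0 = unchanged).

    Assumptions (not from the thread): one offset entry per chunk, so memory
    grows with old_chunk_kb / new_chunk_kb; compact_ratio is the fraction of
    today's per-chunk memory that a more compact representation would need.
    """
    return (old_chunk_kb / new_chunk_kb) * compact_ratio


# Default change only: four times as many chunks, same representation.
print(relative_offset_memory(64, 16))
# Default change plus a representation needing 37% of today's memory.
print(round(relative_offset_memory(64, 16, compact_ratio=0.37), 2))
```

Under these assumptions the default change alone would quadruple offset memory, while combining it with the compact representation would leave it at roughly 1.5 times today's footprint.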

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-12 Thread Jeff Jirsa
> On Oct 12, 2018, at 6:46 AM, Pavel Yaskevich wrote: > >> On Thu, Oct 11, 2018 at 4:31 PM Ben Bromhead wrote: >> >> This is something that's bugged me for ages, tbh the performance gain for >> most use cases far outweighs the increase in memory usage and I would even >> be in favor of chan

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-11 Thread Jeff Jirsa
I think 16k is a better default, but it should only affect new tables. Whoever changes it, please make sure you think about the upgrade path. > On Oct 12, 2018, at 2:31 AM, Ben Bromhead wrote: > > This is something that's bugged me for ages, tbh the performance gain for > most use cases fa

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-11 Thread Pavel Yaskevich
On Thu, Oct 11, 2018 at 4:31 PM Ben Bromhead wrote: > This is something that's bugged me for ages, tbh the performance gain for > most use cases far outweighs the increase in memory usage and I would even > be in favor of changing the default now, optimizing the storage cost later > (if it's foun

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-11 Thread Ben Bromhead
This is something that's bugged me for ages, tbh the performance gain for most use cases far outweighs the increase in memory usage and I would even be in favor of changing the default now, optimizing the storage cost later (if it's found to be worth it). For some anecdotal evidence: 4kb is usuall

CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-11 Thread Ariel Weisberg
Hi, This is regarding https://issues.apache.org/jira/browse/CASSANDRA-13241 This ticket has languished for a while. IMO it's too late in 4.0 to implement a more memory efficient representation for compressed chunk offsets. However I don't think we should put out another release with the current
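
For readers without the ticket context, the memory concern can be sketched with simple arithmetic. This is an illustration rather than Cassandra's actual code: it assumes one 8-byte offset is kept in memory per compressed chunk and that chunks are counted against uncompressed data size, and the function name is hypothetical.

```python
def chunk_offset_bytes(data_bytes: int, chunk_kb: int,
                       bytes_per_offset: int = 8) -> int:
    """Memory needed to hold one offset per chunk.

    Assumptions (illustrative, not taken from the thread): each chunk
    covers chunk_kb KiB of uncompressed data and costs bytes_per_offset
    bytes of memory for its offset entry.
    """
    chunk_bytes = chunk_kb * 1024
    chunks = -(-data_bytes // chunk_bytes)  # ceiling division
    return chunks * bytes_per_offset


TIB = 1024 ** 4
MIB = 1024 ** 2

# Offset memory per 1 TiB of uncompressed data at different chunk sizes.
for chunk_kb in (64, 16, 4):
    mib = chunk_offset_bytes(TIB, chunk_kb) / MIB
    print(f"{chunk_kb:>2} KiB chunks: {mib:.0f} MiB of offsets")
```

With these assumptions, 1 TiB of data needs 128 MiB of offsets at 64 KiB chunks and 512 MiB at 16 KiB, which is the trade-off the thread weighs against the read performance gain.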