Re: Should we change 4.1 to G1 and offheap_objects ?

2022-11-16 Thread Aleksey Yeshchenko
All right. I’ll clarify then.

-0 on switching the default to G1 *this late* just before RC1.
-1 on switching the default to offheap_objects *for 4.1 RC1*, but all for it in 
principle, for 4.2, after we run some more tests and resolve the concerns raised 
by Jeff.

Let’s please try to avoid this kind of super late defaults switch going forward?

—
AY

> On 16 Nov 2022, at 03:27, Derek Chen-Becker  wrote:
> 
> For the record, I'm +100 on G1. Take it with whatever sized grain of
> salt you think appropriate for a relative newcomer to the list, but
> I've spent my last 7-8 years dealing with the intersection of
> high-throughput, low latency systems and their interaction with GC and
> in my personal experience G1 outperforms CMS in all cases and with
> significantly less work (zero work, in many cases). The only things
> I've seen perform better *with a similar heap footprint* are GenShen
> (currently experimental) and Rust (beyond the scope of this topic).
> 
> Derek
> 
> On Tue, Nov 15, 2022 at 4:51 PM Jon Haddad  wrote:
>> 
>> I'm curious what it would take for folks to be OK with merging this into 
>> 4.1?  How much additional time would you want to feel comfortable?
>> 
>> I should probably have been a little more vigorous in my +1 of Mick's PR.  
>> For a little background - I worked on several hundred clusters while at TLP, 
>> mostly dealing with stability and performance issues.  A lot of them stemmed 
>> partially or wholly from the GC settings we ship in the project. ParNew 
>> with CMS and a small new gen results in a lot of premature promotion, leading 
>> to high pause times in the hundreds of ms, which pushes p99 latency through 
>> the roof.
>> 
>> I'm a big +1 in favor of G1 because it's not just better for most people but 
>> it's better for _every_ new Cassandra user.  The first experience that 
>> people have with the project is important, and our current GC settings are 
>> quite bad - so bad they lead to problems with stability in production.  The 
>> G1 settings are mostly hands off, result in shorter pause times and are a 
>> big improvement over the status quo.
>> 
>> Most folks don't do GC tuning, they use what we supply, and what we 
>> currently supply leads to a poor initial experience with the database.  I 
>> think we owe the community our best effort even if it means pushing the 
>> release back a little bit.
>> 
>> Just for some additional context, we're (Netflix) running 25K nodes on G1 
>> across a variety of hardware in AWS with wildly varying workloads, and I 
>> haven't seen G1 be the root cause of a problem even once.  The settings that 
>> Mick is proposing are almost identical to what we use (we use half of heap 
>> up to 30GB).
>> 
>> I'd really appreciate it if we took a second to consider the community 
>> effect of another release that ships settings that cause significant pain 
>> for our users.
>> 
>> Jon
>> 
>> On 2022/11/10 21:49:36 Mick Semb Wever wrote:
 
>>>> In case of GC, reasonably extensive performance testing should be the
>>>> expectation. Potentially revisiting some of the G1 params for the 4.1
>>>> reality - quite a lot has changed since those optional defaults were
>>>> picked.
 
>>> 
>>> 
>>> I've put our battle-tested g1 opts (from consultants at TLP and DataStax)
>>> in the patch for CASSANDRA-18027
>>> 
>>> In reality it is really not much of a change, g1 does make it simple.
>>> Picking the correct ParallelGCThreads and ConcGCThreads and the floor for
>>> the new generation (-XX:NewSize) is still required, though we could do a much
>>> better job of providing dynamic defaults for them.
>>> 
>>> Alex Dejanovski's blog is a starting point:
>>> https://thelastpickle.com/blog/2020/06/29/cassandra_4-0_garbage_collectors_performance_benchmarks.html
>>> where this gc opt set was used (though it doesn't prove why those options
>>> are chosen)
>>> 
>>> The bar for objection to sneaking these into 4.1 was intended to be low,
>>> and I stand by those that raise concerns.
>>> 
> 
> 
> 
> -- 
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and   |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---+



Re: Should we change 4.1 to G1 and offheap_objects ?

2022-11-16 Thread Josh McKenzie
To clarify: -0 here on G1 as default for 4.1 as well; I'd like us to prioritize 
digging into G1's behavior on small heaps vs. CMS w/our default tuning sooner 
rather than later. With that info I'd likely be a strong +1 on the shift.

-1 on switching to offheap_objects for 4.1 RC; again, think this is just a 
small step away from being a +1 w/some more rigor around seeing the current 
state of the technology's intersections.

On Wed, Nov 16, 2022, at 7:47 AM, Aleksey Yeshchenko wrote:
> All right. I’ll clarify then.
> 
> -0 on switching the default to G1 *this late* just before RC1.
> -1 on switching the default to offheap_objects *for 4.1 RC1*, but all for it in 
> principle, for 4.2, after we run some more tests and resolve the concerns 
> raised by Jeff.
> 
> Let’s please try to avoid this kind of super late defaults switch going 
> forward?
> 
> —
> AY
> 

Re: Should we change 4.1 to G1 and offheap_objects ?

2022-11-16 Thread J. D. Jordan
Heap -
+1 for G1 in trunk
+0 for G1 in 4.1 - I think it’s worthwhile and fairly well tested but I 
understand pushback against changing this so late in the game.

Memtable -
-1 for off heap in 4.1. I think this needs more testing and isn’t something 
to change at the last minute.
+1 for running performance/fuzz tests against the alternate memtable choices 
in trunk and switching if they don’t show regressions.

> On Nov 16, 2022, at 10:48 AM, Josh McKenzie wrote:
> 
> To clarify: -0 here on G1 as default for 4.1 as well; I'd like us to 
> prioritize digging into G1's behavior on small heaps vs. CMS w/our default 
> tuning sooner rather than later. With that info I'd likely be a strong +1 on 
> the shift.
> 
> -1 on switching to offheap_objects for 4.1 RC; again, think this is just a 
> small step away from being a +1 w/some more rigor around seeing the current 
> state of the technology's intersections.

Re: Should we change 4.1 to G1 and offheap_objects ?

2022-11-16 Thread David Capwell
Getting poked in Slack to be more explicit in this thread… 

Switching to G1 on trunk, +1
Switching to G1 on 4.1, -1.  4.1 is about to be released, and this isn’t a bug 
fix but a perf improvement ticket, so it should go through validation that the 
perf improvements are actually seen. There is not enough time left for that 
added performance work, so I strongly feel it should be pushed to 4.2/5.0, where 
there is plenty of time to validate it.  The ticket even asks to avoid 
validating the claims, saying 'Hoping we can skip due diligence on this ticket 
because the data is "in the past" already'.  Others have attempted both 
Shenandoah and ZGC and found mixed results, so nothing leads me to believe that 
won’t be true here either.

> On Nov 16, 2022, at 9:15 AM, J. D. Jordan  wrote:
> 
> Heap -
> +1 for G1 in trunk
> +0 for G1 in 4.1 - I think it’s worthwhile and fairly well tested but I 
> understand pushback against changing this so late in the game.
> 
> Memtable -
> -1 for off heap in 4.1. I think this needs more testing and isn’t something 
> to change at the last minute.
> +1 for running performance/fuzz tests against the alternate memtable choices 
> in trunk and switching if they don’t show regressions.
> 

Re: Should we change 4.1 to G1 and offheap_objects ?

2022-11-16 Thread C. Scott Andreas
I share David and Aleksey’s views on this.

We shouldn’t make major defaults changes right before RC. Might be worth adding 
a release note recommending users try them, and that they may become default in 
a future release though.

— Scott

> On Nov 16, 2022, at 3:38 PM, David Capwell  wrote:
> 
> Getting poked in Slack to be more explicit in this thread… 
> 
> Switching to G1 on trunk, +1
> Switching to G1 on 4.1, -1.  4.1 is about to be released, and this isn’t a bug 
> fix but a perf improvement ticket, so it should go through validation that the 
> perf improvements are actually seen. There is not enough time left for that 
> added performance work, so I strongly feel it should be pushed to 4.2/5.0, where 
> there is plenty of time to validate it.  The ticket even asks to avoid 
> validating the claims, saying 'Hoping we can skip due diligence on this ticket 
> because the data is "in the past" already'.  Others have attempted both 
> Shenandoah and ZGC and found mixed results, so nothing leads me to believe that 
> won’t be true here either.
> 

Re: Should we change 4.1 to G1 and offheap_objects ?

2022-11-16 Thread Derek Chen-Becker
I'm fine with not including G1 in 4.1, but would we consider inclusion
for 4.1.X down the road once validation has been done?

Derek


On Wed, Nov 16, 2022 at 4:39 PM David Capwell  wrote:
>
> Getting poked in Slack to be more explicit in this thread…
>
> Switching to G1 on trunk, +1
> Switching to G1 on 4.1, -1.  4.1 is about to be released, and this isn’t a bug 
> fix but a perf improvement ticket, so it should go through validation that the 
> perf improvements are actually seen. There is not enough time left for that 
> added performance work, so I strongly feel it should be pushed to 4.2/5.0, where 
> there is plenty of time to validate it.  The ticket even asks to avoid 
> validating the claims, saying 'Hoping we can skip due diligence on this ticket 
> because the data is "in the past" already'.  Others have attempted both 
> Shenandoah and ZGC and found mixed results, so nothing leads me to believe that 
> won’t be true here either.
>

Re: Should we change 4.1 to G1 and offheap_objects ?

2022-11-16 Thread C. Scott Andreas
We have precedent for changing defaults that have near-universal positive 
impact in patchlevel releases, yep.

disk_access_mode: auto -> mmap_index_only comes to mind.

- Scott

> On Nov 16, 2022, at 6:49 PM, Derek Chen-Becker  wrote:
> 
> I'm fine with not including G1 in 4.1, but would we consider inclusion
> for 4.1.X down the road once validation has been done?
> 
> Derek
> 
> 