Re: Should we change 4.1 to G1 and offheap_objects ?
Ok, wrt the G1 default, this won't go ahead for 4.1-rc1. We can revisit it for 4.1.x.
We have a lot of voices here adamantly positive for it, and those of us that have done the performance testing over the years know why. But being called to prove it is totally valid; if you have data from any such tests, please add it to ticket 18027.
[ANNOUNCE] Apache Cassandra 4.1-rc1 test artifact available
The test build of Cassandra 4.1-rc1 is available.
sha1: d6822c45ae3d476bc2ff674cedf7d4107b8ca2d0
Git: https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.1-rc1-tentative
Maven Artifacts: https://repository.apache.org/content/repositories/orgapachecassandra-1280/org/apache/cassandra/cassandra-all/4.1-rc1/
The Source and Build Artifacts, and the Debian and RPM packages and repositories, are available here: https://dist.apache.org/repos/dist/dev/cassandra/4.1-rc1/
A vote of this test build will be initiated tomorrow.
[1]: CHANGES.txt: https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.1-rc1-tentative
[2]: NEWS.txt: https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.1-rc1-tentative
Re: Should we change 4.1 to G1 and offheap_objects ?
I noticed nobody answered my actual question - what would it take for you to be comfortable?
It seems that the need to do a release is now more important than the best interests of the new user's experience - despite having plenty of *production* experience showing that what we ship isn't even remotely close to usable.
I tried to offer a compromise, and it's not cool with me that it was ignored by everyone objecting.
Jon
On 2022/11/17 08:34:53 Mick Semb Wever wrote:
> Ok, wrt G1 default, this is won't go ahead for 4.1-rc1
>
> We can revisit it for 4.1.x
>
> We have a lot of voices here adamantly positive for it, and those of us
> that have done the performance testing over the years know why. But being
> called to prove it is totally valid, if you have data to any such tests
> please add them to the ticket 18027
Re: Should we change 4.1 to G1 and offheap_objects ?
Have we ever discussed including multiple profiles that are simple to swap between and documented for their tested / intended use cases?
Then the burden of having a “sane” default for the wild variance of workloads people use it for would be somewhat mitigated. Sure, there are always going to be folks that run the default and never think to change it, but the UX could be as simple as a one-line config change to swap between GC profiles, and we could add and deprecate / remove profiles over time.
Concretely, having config files such as:
  jvm11-CMS-write.options
  jvm11-CMS-mixed.options
  jvm11-CMS-read.options
  jvm11-G1.options
  jvm11-ZGC.options
  jvm11-Shen.options
Arguably we could take it a step further and not actually allow a C* node to start up without pointing to one of the config files from your primary config, and provide a clean mechanism to integrate that selection on headless installs.
Notably, this could be a terrible idea. But it *does* seem like we keep butting up against the complexity and mixed pressures of having the One True Way to GC via the default config and the lift to change that.
On Wed, Nov 16, 2022, at 9:49 PM, Derek Chen-Becker wrote: > I'm fine with not including G1 in 4.1, but would we consider inclusion > for 4.1.X down the road once validation has been done? > > Derek > > > On Wed, Nov 16, 2022 at 4:39 PM David Capwell wrote: > > > > Getting poked in Slack to be more explicit in this thread… > > > > Switching to G1 on trunk, +1 > > Switching to G1 on 4.1, -1. 4.1 is about to be released and this isn’t a > > bug fix but a perf improvement ticket and as such should go through > > validation that the perf improvements are seen, there is not enough time > > left for that added performance work burden so strongly feel it should be > > pushed to 4.2/5.0 where it has plenty of time to be validated against. 
The > > ticket even asks to avoid validating the claims; saying 'Hoping we can skip > > due diligence on this ticket because the data is "in the past” already”'. > > Others have attempted both shenandoah and ZGC and found mixed results, so > > nothing leads me to believe that won’t be true here either. > > > > > On Nov 16, 2022, at 9:15 AM, J. D. Jordan > > > wrote: > > > > > > Heap - > > > +1 for G1 in trunk > > > +0 for G1 in 4.1 - I think it’s worthwhile and fairly well tested but I > > > understand pushback against changing this so late in the game. > > > > > > Memtable - > > > -1 for off heap in 4.1. I think this needs more testing and isn’t > > > something to change at the last minute. > > > +1 for running performance/fuzz tests against the alternate memtable > > > choices in trunk and switching if they don’t show regressions. > > > > > >> On Nov 16, 2022, at 10:48 AM, Josh McKenzie wrote: > > >> > > >> > > >> To clarify: -0 here on G1 as default for 4.1 as well; I'd like us to > > >> prioritize digging into G1's behavior on small heaps vs. CMS w/our > > >> default tuning sooner rather than later. With that info I'd likely be a > > >> strong +1 on the shift. > > >> > > >> -1 on switching to offheap_objects for 4.1 RC; again, think this is just > > >> a small step away from being a +1 w/some more rigor around seeing the > > >> current state of the technology's intersections. > > >> > > >> On Wed, Nov 16, 2022, at 7:47 AM, Aleksey Yeshchenko wrote: > > >>> All right. I’ll clarify then. > > >>> > > >>> -0 on switching the default to G1 *this late* just before RC1. > > >>> -1 on switching the default offheap_objects *for 4.1 RC1*, but all for > > >>> it in principle, for 4.2, after we run some more test and resolve the > > >>> concerns raised by Jeff. > > >>> > > >>> Let’s please try to avoid this kind of super late defaults switch going > > >>> forward? 
> > >>> > > >>> — > > >>> AY > > >>> > > >>> > On 16 Nov 2022, at 03:27, Derek Chen-Becker > > >>> > wrote: > > >>> > > > >>> > For the record, I'm +100 on G1. Take it with whatever sized grain of > > >>> > salt you think appropriate for a relative newcomer to the list, but > > >>> > I've spent my last 7-8 years dealing with the intersection of > > >>> > high-throughput, low latency systems and their interaction with GC and > > >>> > in my personal experience G1 outperforms CMS in all cases and with > > >>> > significantly less work (zero work, in many cases). The only things > > >>> > I've seen perform better *with a similar heap footprint* are GenShen > > >>> > (currently experimental) and Rust (beyond the scope of this topic). > > >>> > > > >>> > Derek > > >>> > > > >>> > On Tue, Nov 15, 2022 at 4:51 PM Jon Haddad > > >>> > wrote: > > >>> >> > > >>> >> I'm curious what it would take for folks to be OK with merging this > > >>> >> into 4.1? How much additional time would you want to feel > > >>> >> comfortable? > > >>> >> > > >>> >> I should probably have been a little more vigorous in my +1 of > > >>> >> Mick's PR. For a little background - I worked on several hundred > > >>> >> c
Re: Should we change 4.1 to G1 and offheap_objects ?
On Thu, Nov 17, 2022 at 9:06 AM Josh McKenzie wrote:
>
> Arguably we could take it a step further and not actually allow a C* node to
> startup without pointing to one of the config files from your primary config,
> and provide a clean mechanism to integrate that selection on headless
> installs.
We could also automatically choose one based on the heap size (which, by default, we already have to choose automatically as well).
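A minimal sketch of choosing a profile from the heap size, assuming the default-heap rule described in the comments of `cassandra-env.sh`, max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8 GB)), and the profile file names proposed earlier in the thread. The 4 GB cut-over and which collector sits on which side of it are hypothetical placeholders, not a measured recommendation.

```python
# Sketch: pick a GC options file from the automatically computed heap size.
# default_heap_mb() follows the rule described in cassandra-env.sh comments:
#   max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))
# HYPOTHETICAL: the 4 GB threshold and the CMS/G1 split are illustrative only.

HYPOTHETICAL_G1_MIN_HEAP_MB = 4096


def default_heap_mb(system_memory_mb: int) -> int:
    """Default max heap (MB) for a machine with the given RAM (MB)."""
    half_capped = min(system_memory_mb // 2, 1024)
    quarter_capped = min(system_memory_mb // 4, 8192)
    return max(half_capped, quarter_capped)


def choose_gc_profile(heap_mb: int) -> str:
    """Map a heap size to one of the profile files proposed in the thread."""
    if heap_mb >= HYPOTHETICAL_G1_MIN_HEAP_MB:
        return "jvm11-G1.options"
    return "jvm11-CMS-mixed.options"


if __name__ == "__main__":
    for ram_mb in (4096, 8192, 16384, 65536):
        heap = default_heap_mb(ram_mb)
        print(f"{ram_mb} MB RAM -> {heap} MB heap -> {choose_gc_profile(heap)}")
```

Because the default heap is capped at 8 GB, any threshold above 8192 MB would never select the second profile, so a real cut-over would have to come from measurement on the small heaps that the default actually produces.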
Re: Should we change 4.1 to G1 and offheap_objects ?
I'm surprised we released 4.0 without changing the default to G1, given that many Cassandra deployments have changed the project's default because it is incorrect. I know that 7486 broke a user 7 years ago, but I think we have had a ton of testing in the community since then to build our confidence. Not to mention that Java 9+ (released 2017) made G1 the default and Java 14 (2020) removed CMS entirely.
I have personally done targeted A/B testing of G1GC vs CMS in a controlled fashion using NDBench, and our team had enough confidence in ~2019 to roll it out to Netflix's entire fleet of O(1k) clusters and O(10k) instances running Java 8. We found it vastly superior to CMS in practically every way (no more 10s+ compacting STW phases after heap fragmentation, better tail latency at a coordinator/replica level, better average throughput, etc.), and only identified a single very minor p99 regression on one cluster (~5%), which we didn't consider severe enough to roll back.
Right now our project defaults are hurting 99 users to help 1; why not let that one user change the defaults? 4.1 seems like a great place to fix the bug; absent being able to do that, let's at least fix it in trunk?
-Joey
On Thu, Nov 17, 2022 at 8:27 AM Jon Haddad wrote: > > I noticed nobody answered my actual question - what would it take for you to > be comfortable? > > It seems that the need to do a release is now more important than the best > interests of the new user's experience - despite having plenty of > *production* experience showing that what we ship isn't even remotely close > to usable. > > I tried to offer a compromise, and it's not cool with me that it was ignored > by everyone objecting. > > Jon > > On 2022/11/17 08:34:53 Mick Semb Wever wrote: > > Ok, wrt G1 default, this is won't go ahead for 4.1-rc1 > > > > We can revisit it for 4.1.x > > > > We have a lot of voices here adamantly positive for it, and those of us > > that have done the performance testing over the years know why. 
But being > > called to prove it is totally valid, if you have data to any such tests > > please add them to the ticket 18027 > >
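Comparing tail latency between a control group (CMS) and a candidate group (G1), as in the A/B testing described above, comes down to computing the same percentile on both sides and reporting the relative change. A minimal sketch using the nearest-rank percentile; the latency samples are synthetic and only show the arithmetic, they do not reproduce any result from this thread.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of values <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def relative_change(baseline: float, candidate: float) -> float:
    """Fractional change of candidate vs. baseline (0.05 means 5% higher)."""
    return (candidate - baseline) / baseline


if __name__ == "__main__":
    # Synthetic latencies in ms (100 samples per group), illustration only.
    cms_ms = [10] * 98 + [100, 200]
    g1_ms = [10] * 98 + [105, 150]
    cms_p99 = percentile(cms_ms, 99)
    g1_p99 = percentile(g1_ms, 99)
    print(f"p99 CMS={cms_p99} ms, G1={g1_p99} ms, "
          f"change={relative_change(cms_p99, g1_p99):+.1%}")
```

In this synthetic data p99 gets 5% worse while the maximum improves from 200 ms to 150 ms, which is why a single percentile is usually reported alongside others.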
Re: Should we change 4.1 to G1 and offheap_objects ?
Jon, thanks for flagging that I didn't get a reply to your question on the thread.
My main point in this thread is that I don't think post-beta is an appropriate time for a major prop change like this in the release cycle. Ideally at this point in the release cycle, major contributors and large users of Cassandra are running the build at minimum in pre-production environments, and hopefully in production environments too. Prop changes reset much of what's been learned by exercising the beta shortly before RC.
Adding some detail on your question re: G1 – which mostly boils down to some experience to the contrary. I don't have data from past tests easily accessible to me, so I'm writing from memory and deductive reasoning here.
The problem of garbage collection is minimizing a function of "memory overhead required to safely operate, program pause time, and CPU time burnt." ParNew+CMS are throughput-oriented collectors that commonly have higher throughput, lower CPU usage, and higher pause times than newer collectors like G1 and Shenandoah. This is a poor tradeoff for most applications.
Cassandra is unique here: internode requests speculate, masking latency within the cluster that can be incurred by the pause phase of a collection. The Java Driver is also great at speculating, masking the latency of a coordinator that may be pausing for a collection as well. While ParNew+CMS are an objectively poor choice for many systems, Cassandra's architecture as a majority-quorum database that can speculate at both the client and coordinator level avoids the worst of those pitfalls.
In cases where my colleagues and I have evaluated other collectors like G1 and Shenandoah, we've found lower pause times, ~unchanged or slightly higher client latency, and lower throughput. G1 testing may predate me, so I'll offer a more recent Shenandoah example. In a ~12-instance cluster that runs hot - averaging about 80% CPU - enabling Shenandoah resulted in about 5-10% lower request throughput after a couple of days and a roughly equal increase in latency. While its micro-pause behavior was nice relative to ParNew's ~100-200ms pauses, it didn't make much of a difference due to internode and client speculation around it.
Again, my point in this thread is that I wouldn't alter defaults on the eve of an RC in a release cycle. We do know this will need to change soon. CMS is gone in JDK17, so consider this email an elegy :). As part of JDK17 readiness, our collector defaults must change. If someone is interested in picking up the work, I think now would be a great time to perform that measurement and propose new defaults for the project based on it - and I don't even have an objection to those landing in a patchlevel release if the measurements look really good.
But I wouldn't change the defaults on the eve of RC.
– Scott
On Nov 17, 2022, at 7:26 AM, Joseph Lynch wrote:
> I'm surprised we released 4.0 without changing the default to G1 given that many Cassandra deployments have changed the project's default because it is incorrect. I know that 7486 broke a user 7 years ago, but I think we have had a ton of testing since then in the community to build our confidence. Not to mention that Java 9+ (released 2017) made G1 the default and Java 14 (2020) removes CMS entirely.
> I have personally done targeted AB testing of G1GC vs CMS in a controlled fashion using NDBench and our team had enough confidence in ~2019 to roll it to Netflix's entire fleet of O(1k) clusters and O(10k) instances running Java 8. We found it vastly superior to CMS in practically every way (no more 10s+ compacting STW phases after heap fragmentation, better tail latency at a coordinator/replica level, better average throughput, etc ...), and only identified a single very minor p99 regression on one cluster (~5%) which we didn't consider severe enough to roll back.
> Right now our project defaults are hurting 99 users to help 1; let that one user change the defaults? 4.1 seems like a great place to fix the bug, absent being able to do that let's at least fix it in trunk?
> -Joey
> On Thu, Nov 17, 2022 at 8:27 AM Jon Haddad wrote:
>> I noticed nobody answered my actual question - what would it take for you to be comfortable?
>> It seems that the need to do a release is now more important than the best interests of the new user's experience - despite having plenty of *production* experience showing that what we ship isn't even remotely close to usable.
>> I tried to offer a compromise, and it's not cool with me that it was ignored by everyone objecting.
>> Jon
>> On 2022/11/17 08:34:53 Mick Semb Wever wrote:
>>> Ok, wrt G1 default, this is won't go ahead for 4.1-rc1
>>> We can revisit it for 4.1.x
>>> We have a lot of voices here adamantly positive for it, and those of us that have done the performance testing over the years know why. But being called to prove it is totally valid, if you have data to any such tests please add them to the ticket 18027
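The point about speculation masking collector pauses can be illustrated with a deliberately simplified model: a request goes to one replica, and if no reply arrives within a speculation threshold, a second request goes to another replica; whichever answers first wins. This is a sketch under assumptions: it ignores consistency levels that need several replies, queueing and network time, and all numbers are hypothetical (the 150 ms pause is picked from the ~100-200ms ParNew range mentioned above).

```python
def observed_latency_ms(primary_ms: float, speculate_after_ms: float,
                        backup_ms: float) -> float:
    """Latency seen by the caller under a simple speculative-retry model.

    primary_ms: time the first replica takes (e.g. while stuck in a GC pause).
    speculate_after_ms: delay before a second request is sent elsewhere.
    backup_ms: time the second replica takes once it is asked.
    """
    if primary_ms <= speculate_after_ms:
        return primary_ms  # answered before speculation kicked in
    # Both requests are in flight; the first reply wins.
    return min(primary_ms, speculate_after_ms + backup_ms)


if __name__ == "__main__":
    # Hypothetical: replica paused for 150 ms, speculation after 20 ms,
    # healthy replica answers in 5 ms.
    print(observed_latency_ms(150, 20, 5))  # 25
    print(observed_latency_ms(5, 20, 5))    # 5 (no pause, no speculation)
```

With these numbers a 150 ms pause surfaces as about 25 ms to the caller, at the cost of one extra request to the backup replica; that is the sense in which longer pauses hurt Cassandra less than a system without speculation.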
Re: Should we change 4.1 to G1 and offheap_objects ?
It seems like this is a choice most users might not know how to make? On Thu, Nov 17, 2022 at 7:06 AM Josh McKenzie wrote: > > Have we ever discussed including multiple profiles that are simple to swap > between and documented for their tested / intended use cases? > > Then the burden of having a “sane” default for the wild variance of workloads > people use it for would be somewhat mitigated. Sure, there’s always going to > be folks that run the default and never think to change it but the UX could > be as simple as a one line config change to swap between GC profiles and we > could add and deprecate / remove over time. > > Concretely, having config files such as: > > jvm11-CMS-write.options > jvm11-CMS-mixed.options > jvm11-CMS-read.options > jvm11-G1.options > jvm11-ZGC.options > jvm11-Shen.options > > > Arguably we could take it a step further and not actually allow a C* node to > startup without pointing to one of the config files from your primary config, > and provide a clean mechanism to integrate that selection on headless > installs. > > Notably, this could be a terrible idea. But it does seem like we keep butting > up against the complexity and mixed pressures of having the One True Way to > GC via the default config and the lift to change that. > > On Wed, Nov 16, 2022, at 9:49 PM, Derek Chen-Becker wrote: > > I'm fine with not including G1 in 4.1, but would we consider inclusion > for 4.1.X down the road once validation has been done? > > Derek > > > On Wed, Nov 16, 2022 at 4:39 PM David Capwell wrote: > > > > Getting poked in Slack to be more explicit in this thread… > > > > Switching to G1 on trunk, +1 > > Switching to G1 on 4.1, -1. 
4.1 is about to be released and this isn’t a > > bug fix but a perf improvement ticket and as such should go through > > validation that the perf improvements are seen, there is not enough time > > left for that added performance work burden so strongly feel it should be > > pushed to 4.2/5.0 where it has plenty of time to be validated against. The > > ticket even asks to avoid validating the claims; saying 'Hoping we can skip > > due diligence on this ticket because the data is "in the past” already”'. > > Others have attempted both shenandoah and ZGC and found mixed results, so > > nothing leads me to believe that won’t be true here either. > > > > > On Nov 16, 2022, at 9:15 AM, J. D. Jordan > > > wrote: > > > > > > Heap - > > > +1 for G1 in trunk > > > +0 for G1 in 4.1 - I think it’s worthwhile and fairly well tested but I > > > understand pushback against changing this so late in the game. > > > > > > Memtable - > > > -1 for off heap in 4.1. I think this needs more testing and isn’t > > > something to change at the last minute. > > > +1 for running performance/fuzz tests against the alternate memtable > > > choices in trunk and switching if they don’t show regressions. > > > > > >> On Nov 16, 2022, at 10:48 AM, Josh McKenzie wrote: > > >> > > >> > > >> To clarify: -0 here on G1 as default for 4.1 as well; I'd like us to > > >> prioritize digging into G1's behavior on small heaps vs. CMS w/our > > >> default tuning sooner rather than later. With that info I'd likely be a > > >> strong +1 on the shift. > > >> > > >> -1 on switching to offheap_objects for 4.1 RC; again, think this is just > > >> a small step away from being a +1 w/some more rigor around seeing the > > >> current state of the technology's intersections. > > >> > > >> On Wed, Nov 16, 2022, at 7:47 AM, Aleksey Yeshchenko wrote: > > >>> All right. I’ll clarify then. > > >>> > > >>> -0 on switching the default to G1 *this late* just before RC1. 
> > >>> -1 on switching the default offheap_objects *for 4.1 RC1*, but all for > > >>> it in principle, for 4.2, after we run some more test and resolve the > > >>> concerns raised by Jeff. > > >>> > > >>> Let’s please try to avoid this kind of super late defaults switch going > > >>> forward? > > >>> > > >>> — > > >>> AY > > >>> > > >>> > On 16 Nov 2022, at 03:27, Derek Chen-Becker > > >>> > wrote: > > >>> > > > >>> > For the record, I'm +100 on G1. Take it with whatever sized grain of > > >>> > salt you think appropriate for a relative newcomer to the list, but > > >>> > I've spent my last 7-8 years dealing with the intersection of > > >>> > high-throughput, low latency systems and their interaction with GC and > > >>> > in my personal experience G1 outperforms CMS in all cases and with > > >>> > significantly less work (zero work, in many cases). The only things > > >>> > I've seen perform better *with a similar heap footprint* are GenShen > > >>> > (currently experimental) and Rust (beyond the scope of this topic). > > >>> > > > >>> > Derek > > >>> > > > >>> > On Tue, Nov 15, 2022 at 4:51 PM Jon Haddad > > >>> > wrote: > > >>> >> > > >>> >> I'm curious what it would take for folks to be OK with merging this > > >>> >> into 4.1? How much additional time would you want to feel > > >>> >> comfortable? > > >>> >>
Re: [ANNOUNCE] Apache Cassandra 4.1-rc1 test artifact available
> The test build of Cassandra 4.1-rc1 is available.
Our requirements for 4.1-rc were one green CI run and no regression flakies (except the two we have waivers for).
The green CircleCI runs are here:
- https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/40/workflows/fa742c33-7d3c-41ea-a52f-44eb760860ea
- https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/40/workflows/2106a359-4d84-41dd-bfbe-c1dbce0139ad
And the ci-cassandra.a.o run, with 12 flakies: https://ci-cassandra.apache.org/job/Cassandra-4.1/221/
Re: Should we change 4.1 to G1 and offheap_objects ?
-1 on providing a bunch of choices and forcing users to pick one. We should have a default and it should be “good enough” for most people. The people who want to dig in and try other GC settings can still do it, and we could provide them some profiles to start from, but there needs to be a default. We need to be asking new operators fewer questions on install, not more.
Re: experience with Shenandoah under high load, I have in the past seen the exact same thing for both Shenandoah and ZGC. Both of them have issues at high loads while performing great at moderate loads. I have not seen G1 ever have such issues. So I would not be fine with a switch to Shenandoah or ZGC as the default without extensive testing on current JVM versions that have hopefully improved the behavior under load.
> On Nov 17, 2022, at 9:39 AM, Joseph Lynch wrote: > It seems like this is a choice most users might not know how to make? > > On Thu, Nov 17, 2022 at 7:06 AM Josh McKenzie wrote: >> >> Have we ever discussed including multiple profiles that are simple to swap >> between and documented for their tested / intended use cases? >> >> Then the burden of having a “sane” default for the wild variance of >> workloads people use it for would be somewhat mitigated. Sure, there’s >> always going to be folks that run the default and never think to change it >> but the UX could be as simple as a one line config change to swap between GC >> profiles and we could add and deprecate / remove over time. >> >> Concretely, having config files such as: >> >> jvm11-CMS-write.options >> jvm11-CMS-mixed.options >> jvm11-CMS-read.options >> jvm11-G1.options >> jvm11-ZGC.options >> jvm11-Shen.options >> >> >> Arguably we could take it a step further and not actually allow a C* node to >> startup without pointing to one of the config files from your primary >> config, and provide a clean mechanism to integrate that selection on >> headless installs. >> >> Notably, this could be a terrible idea. 
But it does seem like we keep >> butting up against the complexity and mixed pressures of having the One True >> Way to GC via the default config and the lift to change that. >> >> On Wed, Nov 16, 2022, at 9:49 PM, Derek Chen-Becker wrote: >> >> I'm fine with not including G1 in 4.1, but would we consider inclusion >> for 4.1.X down the road once validation has been done? >> >> Derek >> >> >> On Wed, Nov 16, 2022 at 4:39 PM David Capwell wrote: >>> Getting poked in Slack to be more explicit in this thread… >>> Switching to G1 on trunk, +1 >>> Switching to G1 on 4.1, -1. 4.1 is about to be released and this isn’t a >>> bug fix but a perf improvement ticket and as such should go through >>> validation that the perf improvements are seen, there is not enough time >>> left for that added performance work burden so strongly feel it should be >>> pushed to 4.2/5.0 where it has plenty of time to be validated against. The >>> ticket even asks to avoid validating the claims; saying 'Hoping we can skip >>> due diligence on this ticket because the data is "in the past” already”'. >>> Others have attempted both shenandoah and ZGC and found mixed results, so >>> nothing leads me to believe that won’t be true here either. On Nov 16, 2022, at 9:15 AM, J. D. Jordan wrote: Heap - +1 for G1 in trunk +0 for G1 in 4.1 - I think it’s worthwhile and fairly well tested but I understand pushback against changing this so late in the game. Memtable - -1 for off heap in 4.1. I think this needs more testing and isn’t something to change at the last minute. +1 for running performance/fuzz tests against the alternate memtable choices in trunk and switching if they don’t show regressions. > On Nov 16, 2022, at 10:48 AM, Josh McKenzie wrote: > > To clarify: -0 here on G1 as default for 4.1 as well; I'd like us to > prioritize digging into G1's behavior on small heaps vs. CMS w/our > default tuning sooner rather than later. With that info I'd likely be a > strong +1 on the shift. 
> -1 on switching to offheap_objects for 4.1 RC; again, think this is just > a small step away from being a +1 w/some more rigor around seeing the > current state of the technology's intersections. > On Wed, Nov 16, 2022, at 7:47 AM, Aleksey Yeshchenko wrote: >> All right. I’ll clarify then. >> -0 on switching the default to G1 *this late* just before RC1. >> -1 on switching the default offheap_objects *for 4.1 RC1*, but all for >> it in principle, for 4.2, after we run some more test and resolve the >> concerns raised by Jeff. >> Let’s please try to avoid this kind of super late defaults switch going >> forward? >> — >> AY >>> On 16 Nov 2022, at 03:27, Derek Chen-Becker >>> wrote: >>> For the record, I'm +100 on G1. Take it with whatever sized grain of >>> salt you think appropriate for a relative newcomer to
Re: Should we change 4.1 to G1 and offheap_objects ?
> -1 on providing a bunch of choices and forcing users to pick one. We should
> have a default and it should be “good enough” for most people.
These are 2 different things (providing choices and whether we provide a default). Sounds like you're against both not having a default *and* providing choices independently; I assume you're not in favor of having something "good enough" as the default but also providing other tuning options should operators be interested in testing them out?
I could see there being potentially 3 tiers of operator expertise / interest in this space:
1) No interest. Give me a good enough default; I don't want to think about this.
2) Moderate expertise. Give me a one-line config change where I can bounce 3 nodes in a replica set to 3 different pre-configured profiles, see how each works for my workloads, and pick one.
3) Expert: Leave me alone. I tune my own GC.
So the above is possibly moot if we don't have the resources on the project to *test and provide* alternative GC profiles, but it sounds to me like we're not actually short on differently tuned GC configs; instead we're butting up against timing relative to the release, plus differing views on what the right default should be.
On Thu, Nov 17, 2022, at 3:47 PM, J. D. Jordan wrote: > -1 on providing a bunch of choices and forcing users to pick one. We should > have a default and it should be “good enough” for most people. The people who > want to dig in and try other gc settings can still do it, and we could > provide them some profiles to start from, but there needs to be a default. > We need to be asking new operators less questions on install, not more. > > Re:experience with Shenandoah under high load, I have in the past seen the > exact same thing for both Shenandoah and ZGC. Both of them have issues at > high loads while performing great at moderate loads. I have not seen G1 ever > have such issues. 
So I would not be fine with a switch to Shenandoah or ZGC > as the default without extensive testing on current JVM versions that have > hopefully improved the behavior under load. > > > On Nov 17, 2022, at 9:39 AM, Joseph Lynch wrote: > > It seems like this is a choice most users might not know how to make? > > > > On Thu, Nov 17, 2022 at 7:06 AM Josh McKenzie wrote: > >> > >> Have we ever discussed including multiple profiles that are simple to swap > >> between and documented for their tested / intended use cases? > >> > >> Then the burden of having a “sane” default for the wild variance of > >> workloads people use it for would be somewhat mitigated. Sure, there’s > >> always going to be folks that run the default and never think to change it > >> but the UX could be as simple as a one line config change to swap between > >> GC profiles and we could add and deprecate / remove over time. > >> > >> Concretely, having config files such as: > >> > >> jvm11-CMS-write.options > >> jvm11-CMS-mixed.options > >> jvm11-CMS-read.options > >> jvm11-G1.options > >> jvm11-ZGC.options > >> jvm11-Shen.options > >> > >> > >> Arguably we could take it a step further and not actually allow a C* node > >> to startup without pointing to one of the config files from your primary > >> config, and provide a clean mechanism to integrate that selection on > >> headless installs. > >> > >> Notably, this could be a terrible idea. But it does seem like we keep > >> butting up against the complexity and mixed pressures of having the One > >> True Way to GC via the default config and the lift to change that. > >> > >> On Wed, Nov 16, 2022, at 9:49 PM, Derek Chen-Becker wrote: > >> > >> I'm fine with not including G1 in 4.1, but would we consider inclusion > >> for 4.1.X down the road once validation has been done? 
> >> > >> Derek > >> > >> > >> On Wed, Nov 16, 2022 at 4:39 PM David Capwell wrote: > >>> Getting poked in Slack to be more explicit in this thread… > >>> Switching to G1 on trunk, +1 > >>> Switching to G1 on 4.1, -1. 4.1 is about to be released and this isn’t a > >>> bug fix but a perf improvement ticket and as such should go through > >>> validation that the perf improvements are seen, there is not enough time > >>> left for that added performance work burden so strongly feel it should be > >>> pushed to 4.2/5.0 where it has plenty of time to be validated against. > >>> The ticket even asks to avoid validating the claims; saying 'Hoping we > >>> can skip due diligence on this ticket because the data is "in the past” > >>> already”'. Others have attempted both shenandoah and ZGC and found mixed > >>> results, so nothing leads me to believe that won’t be true here either. > On Nov 16, 2022, at 9:15 AM, J. D. Jordan > wrote: > Heap - > +1 for G1 in trunk > +0 for G1 in 4.1 - I think it’s worthwhile and fairly well tested but I > understand pushback against changing this so late in the game. > Memtable - > -1 for off heap in 4.1. I think this needs more testin
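The three tiers above could map onto configuration resolution roughly as follows. This is a minimal sketch under assumptions: the profile file names come from the list proposed earlier in the thread, while the `gc_profile` and `custom_options_file` settings are hypothetical, and using G1 as the shipped default is only for illustration, since the default is exactly what the thread is debating.

```python
from typing import Optional

# Profile files named in the proposal earlier in the thread.
PROFILES = {
    "CMS-write": "jvm11-CMS-write.options",
    "CMS-mixed": "jvm11-CMS-mixed.options",
    "CMS-read": "jvm11-CMS-read.options",
    "G1": "jvm11-G1.options",
    "ZGC": "jvm11-ZGC.options",
    "Shen": "jvm11-Shen.options",
}

# HYPOTHETICAL: which profile ships as the default is still undecided.
DEFAULT_PROFILE = "G1"


def resolve_gc_options(gc_profile: Optional[str] = None,
                       custom_options_file: Optional[str] = None) -> str:
    """Pick the JVM options file; tier 3 beats tier 2 beats tier 1."""
    # Tier 3: the expert supplies their own options file.
    if custom_options_file:
        return custom_options_file
    # Tier 1: no interest, fall back to the shipped default.
    if gc_profile is None:
        return PROFILES[DEFAULT_PROFILE]
    # Tier 2: a one-line switch to one of the shipped profiles.
    if gc_profile not in PROFILES:
        raise ValueError(f"unknown gc_profile {gc_profile!r}; "
                         f"expected one of {sorted(PROFILES)}")
    return PROFILES[gc_profile]
```

Failing fast on an unknown profile name, rather than silently falling back to the default, keeps a typo in the one-line switch from going unnoticed, while an operator who sets nothing still starts with the default instead of being forced to choose.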
Re: Should we change 4.1 to G1 and offheap_objects ?
On Thu, Nov 17, 2022 at 12:47 PM J. D. Jordan wrote:
> -1 on providing a bunch of choices and forcing users to pick one. We
> should have a default and it should be “good enough” for most people. The
> people who want to dig in and try other gc settings can still do it, and we
> could provide them some profiles to start from, but there needs to be a
> default. We need to be asking new operators less questions on install, not
> more.
>
> Re:experience with Shenandoah under high load, I have in the past seen the
> exact same thing for both Shenandoah and ZGC. Both of them have issues at
> high loads while performing great at moderate loads. I have not seen G1
> ever have such issues. So I would not be fine with a switch to Shenandoah
> or ZGC as the default without extensive testing on current JVM versions
> that have hopefully improved the behavior under load
>
I have personally reverted hundreds of machines off of G1 with 12G heaps on jdk8, where (intelligently tuned) CMS with the same workload/heap size was fine. It was many years ago, and G1 has changed a lot, but "zero problems with G1" is really AT LEAST 1 problem with G1, seen by someone who knew how both Cassandra and the JVM work.
Re: Should we change 4.1 to G1 and offheap_objects ?
I wouldn't recommend Shenandoah or ZGC, period. They're not designed for the kind of workload you'll typically see running a database (high throughput of objects that don't tenure) and both will fall over in interesting ways under high allocation rates.
GenShen is intended to combine the generational goodness of G1 with the concurrent collection of Shenandoah, and will likely perform better than G1 in terms of pause time and better than Shenandoah for allocation rate and heap utilization. GenShen, however, is experimental. Right now I would say G1 is the best collector generally available.
In terms of providing data (beyond anecdotes), do we even agree on what the baseline load test looks like? Are we going off of something that's in dtest, or do we have a defined benchmarking suite somewhere?
Cheers,
Derek
On Thu, Nov 17, 2022 at 1:47 PM J. D. Jordan wrote: > > -1 on providing a bunch of choices and forcing users to pick one. We should > have a default and it should be “good enough” for most people. The people who > want to dig in and try other gc settings can still do it, and we could > provide them some profiles to start from, but there needs to be a default. > We need to be asking new operators less questions on install, not more. > > Re:experience with Shenandoah under high load, I have in the past seen the > exact same thing for both Shenandoah and ZGC. Both of them have issues at > high loads while performing great at moderate loads. I have not seen G1 ever > have such issues. So I would not be fine with a switch to Shenandoah or ZGC > as the default without extensive testing on current JVM versions that have > hopefully improved the behavior under load. > > > On Nov 17, 2022, at 9:39 AM, Joseph Lynch wrote: > > It seems like this is a choice most users might not know how to make? 
> > > > On Thu, Nov 17, 2022 at 7:06 AM Josh McKenzie wrote: > >> > >> Have we ever discussed including multiple profiles that are simple to swap > >> between and documented for their tested / intended use cases? > >> > >> Then the burden of having a “sane” default for the wild variance of > >> workloads people use it for would be somewhat mitigated. Sure, there’s > >> always going to be folks that run the default and never think to change it > >> but the UX could be as simple as a one line config change to swap between > >> GC profiles and we could add and deprecate / remove over time. > >> > >> Concretely, having config files such as: > >> > >> jvm11-CMS-write.options > >> jvm11-CMS-mixed.options > >> jvm11-CMS-read.options > >> jvm11-G1.options > >> jvm11-ZGC.options > >> jvm11-Shen.options > >> > >> > >> Arguably we could take it a step further and not actually allow a C* node > >> to startup without pointing to one of the config files from your primary > >> config, and provide a clean mechanism to integrate that selection on > >> headless installs. > >> > >> Notably, this could be a terrible idea. But it does seem like we keep > >> butting up against the complexity and mixed pressures of having the One > >> True Way to GC via the default config and the lift to change that. > >> > >> On Wed, Nov 16, 2022, at 9:49 PM, Derek Chen-Becker wrote: > >> > >> I'm fine with not including G1 in 4.1, but would we consider inclusion > >> for 4.1.X down the road once validation has been done? > >> > >> Derek > >> > >> > >> On Wed, Nov 16, 2022 at 4:39 PM David Capwell wrote: > >>> Getting poked in Slack to be more explicit in this thread… > >>> Switching to G1 on trunk, +1 > >>> Switching to G1 on 4.1, -1. 
4.1 is about to be released and this isn’t a > >>> bug fix but a perf improvement ticket and as such should go through > >>> validation that the perf improvements are seen, there is not enough time > >>> left for that added performance work burden so strongly feel it should be > >>> pushed to 4.2/5.0 where it has plenty of time to be validated against. > >>> The ticket even asks to avoid validating the claims; saying 'Hoping we > >>> can skip due diligence on this ticket because the data is "in the past” > >>> already”'. Others have attempted both shenandoah and ZGC and found mixed > >>> results, so nothing leads me to believe that won’t be true here either. > On Nov 16, 2022, at 9:15 AM, J. D. Jordan > wrote: > Heap - > +1 for G1 in trunk > +0 for G1 in 4.1 - I think it’s worthwhile and fairly well tested but I > understand pushback against changing this so late in the game. > Memtable - > -1 for off heap in 4.1. I think this needs more testing and isn’t > something to change at the last minute. > +1 for running performance/fuzz tests against the alternate memtable > choices in trunk and switching if they don’t show regressions. > > On Nov 16, 2022, at 10:48 AM, Josh McKenzie > > wrote: > > > > To clarify: -0 here on G1 as default for 4.1 as well; I'd like us to > > prioritize digging into G1's behavior on
Re: Should we change 4.1 to G1 and offheap_objects ?
On Thu, Nov 17, 2022 at 2:01 PM Josh McKenzie wrote:
> 3) Expert: Leave me alone. I tune my own GC

This is increasingly not a thing. I haven't looked at ZGC, but G1 and Shenandoah provide a lot of knobs... that the collector will happily ignore if it decides it knows better :)

--
Derek Chen-Becker
GPG Key available at https://keybase.io/dchenbecker and https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org
Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC