It definitely looks like a good thing to investigate and fix. However, it's not a regression and it isn't new in 5.0. I think we should push forward with 5.0 and fix it separately, releasing the fix in subsequent 4.1.x and 5.0.x releases.
> On Jun 27, 2024, at 12:46 PM, Brandon Williams <dri...@gmail.com> wrote:
>
> I don't know that we expect to fix anything if we don't know it is
> affected yet. ¯\_(ツ)_/¯
>
> Kind Regards,
> Brandon
>
> On Thu, Jun 27, 2024 at 12:37 PM Aleksey Yeshchenko <alek...@apple.com> wrote:
>>
>> Not voting on this, however, if we expect to fix something specific between
>> an RC and GA, then we shouldn't be starting a vote on an RC. In that case it
>> should be another beta.
>>
>>> On 27 Jun 2024, at 18:30, Brandon Williams <dri...@gmail.com> wrote:
>>>
>>> The last time paxos v2 blocked us, in CASSANDRA-19617 (which also
>>> affected 4.1), I didn't get a sense of strong usage from the community,
>>> so I agree that the RC shouldn't be blocked, but this can get fixed before
>>> GA. +1 from me.
>>>
>>> Kind Regards,
>>> Brandon
>>>
>>> On Tue, Jun 25, 2024 at 11:11 PM Jon Haddad <j...@jonhaddad.com> wrote:
>>>>
>>>> 5.0 is a massive milestone. A huge thank you to everyone who has invested
>>>> their time into the release. I've done a lot of testing, benchmarking,
>>>> and tire kicking, and it's truly mind-blowing how much has gone into 5.0
>>>> and how great it is for the community.
>>>>
>>>> I am a bit concerned that CASSANDRA-19668, which I found in 4.1, will also
>>>> affect 5.0. This is a pretty serious bug, where using Paxos v2 + off-heap
>>>> memtables can cause a SIGSEGV process crash. I've seen this happen about a
>>>> dozen times with a client over the last 3 months. Since the new trie
>>>> memtables rely on off-heap memory, and both trie memtables & Paxos v2 are so
>>>> compelling (especially for multi-DC users), I think there's a good chance that
>>>> we'll be making an already bad problem even worse for folks that use LWTs.
>>>>
>>>> Unfortunately, until next week I'm unable to put any time into this; I'm
>>>> on vacation with my family. I wish I had been able to confirm and raise
>>>> this issue as a 5.0 blocker sooner, but I've deliberately tried to keep
>>>> work stuff out of my mind. Since I'm not 100% sure whether this affects 5.0,
>>>> I'm not blocking the RC, but I don't feel comfortable putting a +1 on a
>>>> release that I'm at least 80% certain contains a process-crashing bug.
>>>>
>>>> I have a simple 4.1 patch in CASSANDRA-19668, but I haven't landed a
>>>> commit in several years and I have zero recollection of the entire process
>>>> of getting it in, nor have I spent any time writing unit or dtests in the
>>>> C* repo. I ran a test of 160MM LWTs over several hours with my 4.1 branch
>>>> and didn't hit any issues, but my client ran for weeks without hitting it,
>>>> so it's hard to say whether I've actually addressed the problem, as it's a rare
>>>> race condition. FWIW, I don't need to be the one to handle
>>>> CASSANDRA-19668, so if someone wants to address it before me, please feel
>>>> free. It will likely take me a lot longer to deal with than someone more
>>>> involved with the process, and I'd want two sets of eyes on it anyway, given
>>>> what I already mentioned about committing and testing.
>>>>
>>>> Jon
>>>>
>>>>
>>>> On Tue, Jun 25, 2024 at 2:53 PM Mick Semb Wever <m...@apache.org> wrote:
>>>>>
>>>>>
>>>>>
>>>>> .
>>>>>
>>>>>> Proposing the test build of Cassandra 5.0-rc1 for release.
>>>>>>
>>>>>> sha1: b43f0b2e9f4cb5105764ef9cf4ece404a740539a
>>>>>> Git: https://github.com/apache/cassandra/tree/5.0-rc1-tentative
>>>>>> Maven Artifacts:
>>>>>> https://repository.apache.org/content/repositories/orgapachecassandra-1336/org/apache/cassandra/cassandra-all/5.0-rc1/
>>>>>
>>>>>
>>>>>
>>>>> The three green CI runs for this are:
>>>>> - https://app.circleci.com/pipelines/github/driftx/cassandra?branch=5.0-rc1-2
>>>>> - https://app.circleci.com/pipelines/github/driftx/cassandra?branch=5.0-rc1-3
>>>>> - https://app.circleci.com/pipelines/github/driftx/cassandra?branch=5.0-rc1-4
>>>>>
>>>>>
>>
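For anyone who wants to exercise the combination Jon describes above, here is a minimal, hypothetical LWT stress loop along the lines of his "160MM LWTs" test, using the DataStax Python driver. It is only a sketch, not the test from CASSANDRA-19668: it assumes a throwaway cluster already configured with the settings under discussion (paxos_variant: v2 and off-heap memtables in cassandra.yaml; verify exact option names against your version), and the keyspace, table, and iteration count are placeholders I made up for illustration.

# Hedged sketch: repeated lightweight transactions to exercise the Paxos path.
# Assumes the target cluster is already running with paxos_variant: v2 and
# off-heap memtables; lwt_stress/items and the loop bounds are placeholders.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS lwt_stress
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS lwt_stress.items (
        id int PRIMARY KEY,
        val int
    )
""")

# Both statements carry IF clauses, so every execution is a lightweight
# transaction and goes through Paxos.
insert = session.prepare(
    "INSERT INTO lwt_stress.items (id, val) VALUES (?, ?) IF NOT EXISTS")
update = session.prepare(
    "UPDATE lwt_stress.items SET val = ? WHERE id = ? IF EXISTS")

for i in range(1_000_000):
    key = i % 1000
    session.execute(insert, (key, i))
    session.execute(update, (i, key))

cluster.shutdown()

A single loop like this is far gentler than the multi-hour, highly concurrent runs described above; reproducing a rare race condition would presumably need many client threads or processes hammering the same keys for much longer.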