Re: Cassandra 5+ JDK Minimum Compatibility Requirement

2025-05-09 Thread Blake Eggleston
No one is treating the codebase like a house of cards that can’t be touched.

In this case I think the cost/risk of doing this change outweighs the potential 
benefits the project might see from it. Josh counts ~2000 instances where we’re 
casting objects, so we’re talking about a not-insignificant change which may 
introduce its own bugs. Even if no new bugs are introduced, this will be a 
refactoring annoyance for projects in development, but the real concern I have 
with any large change is how it complicates the process of fixing bugs across 
versions. On the other hand, I don’t think that incorrectly casting objects has 
historically been a source of pain for us, so it seems like the benefit would 
be small, if any.

On Fri, May 9, 2025, at 10:38 AM, Jon Haddad wrote:
> Why not?
> 
> Personally, I hate the idea of treating a codebase (any codebase) like a 
> house of cards that can't be touched.  It never made sense to me to try to 
> bundle new features / bug fixes with improvements to code quality.
> 
> Making the code more reliable should be a goal in itself, rather than a side 
> effect of other work.
> 
> Jon
> 
> 
> 
> On Fri, May 9, 2025 at 10:31 AM Blake Eggleston  wrote:
>> This seems like a cool feature that will be useful in future development 
>> work, but not something we should be proactively refactoring the project to 
>> make use of.
>> 
>> On Fri, May 9, 2025, at 10:18 AM, Vivekanand Koya wrote:
>>> I would say that https://openjdk.org/jeps/394 (instanceof) aims to provide 
>>> safer code and less poisoning by default. Instead of a production server 
>>> being halted or impaired by a RuntimeException, the cast is verified at 
>>> compile time. If a new language feature makes code more robust and 
>>> addresses a hazardous, historical design choice, I believe its time has 
>>> arrived. Curious to see what everyone thinks.
>>> 
>>> Thanks,
>>> Vivekanand K.
>>> 
>>> On Fri, May 9, 2025 at 9:51 AM Josh McKenzie  wrote:
> I would like to refactor the codebase (Trunk 5+) to eliminate unsafe 
> explicit casting with instanceOf. 
 We have a rich history of broad sweeping refactors dying on the rocks of 
 the community's aversion to instability and risk w/out a concrete outcome 
 we're trying to achieve. :)
 
 All of which is to say: do we have examples of instanceOf casting blowing 
 things up for users that would warrant going through the codebase to tidy 
 this up? Between src/java and test/unit and test/distributed we have 
 around 2,000 occurrences of this pattern.
 
 On Fri, May 9, 2025, at 10:14 AM, Vivekanand Koya wrote:
> Sounds great. I would like to refactor the codebase (Trunk 5+) to 
> eliminate unsafe explicit casting with instanceOf. 
> 
> Thanks,
> Vivekanand
> 
> On Fri, May 9, 2025, 5:19 AM Benedict Elliott Smith  
> wrote:
>> Yep, that approach seems more than sufficient to me. No need for lots of 
>> ceremony, but good to keep everyone in the decision loop.
>> 
>>> On 9 May 2025, at 13:10, Josh McKenzie  wrote:
>>> 
 I think it doesn’t cost us much to briefly discuss new language 
 features before using them.
>>> I had that thought as well but on balance my intuition was there were 
>>> enough new features that the volume of discussion to do that would be a 
>>> poor cost/benefit compared to the "lazy consensus, revert" approach.
>>> 
>>> So if I actually do the work required to have an opinion ;):
>>> https://docs.oracle.com/en/java/javase/21/language/java-language-changes-release.html#GUID-6459681C-6881-45D8-B0DB-395D1BD6DB9B
>>> 
>>> JDK21:
>>> - Record Patterns 
>>> 
>>> - Pattern Matching for switch Expressions and Statements 
>>> 
>>> - String Templates 
>>> 
>>> - Unnamed Patterns and Variables 
>>> 
>>> - Unnamed Classes and Instance Main Methods 
>>> 
>>> JDK17:
>>> - Sealed Classes 
>>> 
>>> JDK16:
>>> - Pattern Matching for instanceof 
>>> 
>>> JDK15:
>>> - Text Blocks 
>>> 
>>> JDK14:
>>> - Switch Expressions 
>>> 
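
For readers unfamiliar with JEP 394, a minimal sketch of the before/after under 
discussion (illustrative names, not code from the Cassandra tree):

    class FlowScopingExample {
        // Pre-JDK 16: test, then cast. The compiler can't tie the two
        // together, so a mismatched cast only fails at runtime with a
        // ClassCastException.
        static void legacy(Object message) {
            if (message instanceof String) {
                String text = (String) message;
                System.out.println(text.length());
            }
        }

        // JDK 16+ (JEP 394): the pattern variable is flow-scoped, so there
        // is no separate cast to get wrong.
        static void patterned(Object message) {
            if (message instanceof String text) {
                System.out.println(text.length());
            }
        }

        public static void main(String[] args) {
            legacy("hello");
            patterned("hello");
        }
    }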

Re: Cassandra 5+ JDK Minimum Compatibility Requirement

2025-05-09 Thread Josh McKenzie
> I think it doesn’t cost us much to briefly discuss new language features 
> before using them.
I had that thought as well but on balance my intuition was there were enough 
new features that the volume of discussion to do that would be a poor 
cost/benefit compared to the "lazy consensus, revert" approach.

So if I actually do the work required to have an opinion ;):
https://docs.oracle.com/en/java/javase/21/language/java-language-changes-release.html#GUID-6459681C-6881-45D8-B0DB-395D1BD6DB9B

JDK21:
- Record Patterns 

- Pattern Matching for switch Expressions and Statements 

- String Templates 

- Unnamed Patterns and Variables 

- Unnamed Classes and Instance Main Methods 

JDK17:
- Sealed Classes 

JDK16:
- Pattern Matching for instanceof 

JDK15:
- Text Blocks 

JDK14:
- Switch Expressions 

JDK11:
- Local Variable Type Inference 

 (test only, not prod code is where we landed)

Assuming we just lazily evaluate and deal with new features as people *actually 
care about them* and see them add value, a simple "[DISCUSS] I'm thinking 
about using new language feature X; any objection?" lazy consensus that we then 
dump onto a wiki article / code style page as "stuff we're good to use" would 
probably be fine?


On Fri, May 9, 2025, at 7:58 AM, Benedict wrote:
> 
> I think it doesn’t cost us much to briefly discuss new language features 
> before using them. Lambdas, Streams and var all have problems - and even with 
> the guidance we publish some are still misused.
> 
> The flow scoping improvement to instanceof seems obviously good though.
> 
> 
>> On 9 May 2025, at 12:30, Josh McKenzie  wrote:
>> 
>> For new feature work on trunk, targeting the highest supported language 
>> level featureset (jdk17 right now, jdk21 within the next couple of weeks) 
>> makes sense to me. For bugfixing, targeting the oldest supported GA branch 
>> and the highest language level that works there would allow maximum 
>> flexibility with minimal re-implementation.
>> 
>> If anyone has any misgivings with certain features (i.e. the debate around 
>> usage of "var") they can bring it up on the dev ML and we can adjust, but 
>> otherwise I'd prefer to see us have more modern evolving options on how 
>> contributors engage rather than less.
>> 
>> On Fri, May 9, 2025, at 1:56 AM, Vivekanand Koya wrote:
>>> Hello,
>>> 
>>> I want to understand the community's thoughts on using newer features (post 
>>> JDK11) in upcoming releases in Cassandra. An example is flow scoping 
>>> instead of explicitly casting types with instanceOf: 
>>> https://openjdk.org/jeps/395. I want your thoughts on JDK requirements for 
>>> the main Cassandra repository, Accord, and Sidecar. 
>>> 
>>> Much appreciated.
>>> Thanks,
>>> Vivekanand K.  
>> 
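
As a concrete taste of the JDK21 items in Josh's list above, a small sketch 
combining sealed classes (JDK17) with record patterns and pattern matching for 
switch (JDK21); the types are hypothetical, not Cassandra code:

    sealed interface Shape permits Circle, Square {}
    record Circle(double radius) implements Shape {}
    record Square(double side) implements Shape {}

    class ShapeExample {
        // The switch is checked for exhaustiveness over the sealed hierarchy,
        // so no default arm is needed; record patterns destructure in place.
        static double area(Shape shape) {
            return switch (shape) {
                case Circle(double radius) -> Math.PI * radius * radius;
                case Square(double side) -> side * side;
            };
        }

        public static void main(String[] args) {
            System.out.println(area(new Circle(2.0)));
            System.out.println(area(new Square(3.0)));
        }
    }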


Re: Cassandra 5+ JDK Minimum Compatibility Requirement

2025-05-09 Thread C. Scott Andreas
My thinking is most closely aligned with Blake and Benedict’s views here.

For the specific refactor in question, I support adoption of the language 
feature for new code or to cut existing code over to the new syntax as changes 
are made to the respective areas of the codebase. But I don’t support a 
sweeping project-wide refactor on trunk in this case.

Here is my thinking:
- If there are 2000 target sites for the refactor, that means this is going to 
be a 5000+ line diff.
- The safety improvement here is marginal but nonzero.
- If we have a 5000 line refactor, it should accomplish a significant and 
compelling purpose in the project.
- Any execution of the refactor will require manual review of each of those 
2000 refactor sites on the part of the implementer and two reviewers.
- Since the check is compile-time, we’d learn of any incorrect casts during the 
initial refactor, the first time it’s compiled, and we’d short-circuit to 
having gained 100% of the value by fixing the broken call sites.
- The act of that per-call-site review would inform us as to whether we had 
incorrect casts; and we would immediately achieve the value of the “safer” 
approach by having identified the bugs.
- 2x reviewer coverage for a 5000 line patch set is a significant commitment of 
reviewer resources. These reviewer resources have significant opportunity cost 
and can be put to a better purpose.
- Blake/others mention that such refactors create conflicts when bug fixes are 
backported to previous releases, requiring refactors of those rebased patches 
to bring fixes to versions that predate the large refactor.

I think this is a good language feature. I think we should use it. I think it’d 
be completely reasonable to cut existing syntax over to it as we make changes 
to the respective subsystems.

But I wouldn’t do a big bang refactor in this case. The juice isn’t worth the 
squeeze for me.

- Scott

On May 9, 2025, at 11:33 AM, Blake Eggleston wrote:
> No one is treating the codebase like a house of cards that can’t be touched.

Fwd: Cassandra 5 JDK21 Command Line actions

2025-05-09 Thread Vivekanand Koya
Hello,

I've talked to members of the OpenJDK ZGC team about GC strategies and
wanted to share this information with Cassandra devs.

Thanks,
Vivekanand

-- Forwarded message -
From: Stefan Johansson 
Date: Fri, May 9, 2025, 2:00 AM
Subject: Re: Cassandra 5 JDK21 Command Line actions
To: Vivekanand Koya <13vivekk...@gmail.com>, 


Hi Vivekanand,

I'm actually currently trying to figure out how much information I can
share around the setup we used. I can summarize what is already public
information from my presentations.

I'm using Cassandra 4 and JDK 21, with tweaked command-line options
(--add-opens and so on) to get it to run properly. I try to do very minimal
JVM/GC tuning apart from setting the heap size and enabling the use of NUMA
and large pages. G1 is using a 31G fixed heap (-Xmx31g -Xms31g) to be able
to make use of compressed oops, while ZGC is using a 32G fixed heap. The
only GC tuning I do is setting a 50ms pause time goal
(-XX:MaxGCPauseMillis=50) for one of the G1 runs to see how lowering the
pause target affects performance.

The testing methodology is running the same scenario with more and more
threads to put increasing pressure on the server instance. The server is
running on its own host and it is a single-node setup. The clients all run
on a different machine, and we use the cassandra-stress tool with more and
more threads to generate the load.

Regarding tuning, one of the big things with ZGC is that you generally
should not have to do any tuning. That said, when using JDK 21 you need to
supply the -XX:+ZGenerational option to enable the generational mode. In
JDK 23 and later the generational mode is on by default and in JDK 24 ZGC
is generational and it can't be turned off.

Something to look out for when using ZGC is allocation stalls, which happen
when ZGC can't collect and free up memory fast enough. Generally, if you see
allocation stalls, you need to increase the heap size to give ZGC more head
room to complete the GC work in time. If the stalls happen due to
unexpected spikes of allocations, the SoftMaxHeapSize option can be used to
set the heap size ZGC should aim to use, while still allowing it to go over
during spikes of allocations. For example, -Xmx32g -XX:SoftMaxHeapSize=28g
would give ZGC 4g of "reserved head room" that will only be used when it
can't keep up.

I hope this helps,
StefanJ

On 2025-05-09 07:27, Vivekanand Koya wrote:

Hello ZGC members,

I work on the Apache Cassandra team implementing support for JDK21 in the
upcoming Cassandra 5+ release. I need insight into JVM options providing
comparable and perhaps improved performance compared to G1GC. Some work has
been done using the previous defaults:
https://github.com/apache/cassandra/commit/b15d4d6980e787ab5f3405ca8cb17a9c92a4aa47.
Also, can you please provide the testing/benchmarking methodology used by
Stefan Johansson when presenting at Devoxx?

Hope to achieve greater outcomes together,
Thanks,
Vivekanand K.
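
Pieced together from Stefan's description above, the two option sets would look
roughly like this (a sketch only; -XX:+UseNUMA and -XX:+UseLargePages stand in
for "enabling the use of NUMA and large pages", and the exact flags for any
given run may have differed):

    # G1: 31G fixed heap to keep compressed oops; the 50ms pause goal was
    # used on one of the G1 runs only
    -Xms31g -Xmx31g -XX:+UseNUMA -XX:+UseLargePages
    -XX:MaxGCPauseMillis=50

    # ZGC on JDK 21: 32G fixed heap, generational mode enabled explicitly
    -Xms32g -Xmx32g -XX:+UseNUMA -XX:+UseLargePages
    -XX:+UseZGC -XX:+ZGenerational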


Re: [DISCUSS] CEP-46 Finish Transient Replication/Witnesses

2025-05-09 Thread Ariel Weisberg
Hi,

Planning to call a vote on Monday since there don't seem to be any major 
concerns.

Ariel

On Tue, May 6, 2025, at 4:32 PM, Bernardo Botella wrote:
> +1 (nb)
> 
>> On May 6, 2025, at 1:19 PM, Josh McKenzie  wrote:
>> 
>> +1
>> 
>> On Tue, May 6, 2025, at 4:06 PM, Yifan Cai wrote:
>>> +1 (nb)
>>> 
>>> 
>>> 
>>> *From:* Ariel Weisberg 
>>> *Sent:* Tuesday, May 6, 2025 12:59:09 PM
>>> *To:* Claude Warren, Jr 
>>> *Subject:* Re: [DISCUSS] CEP-46 Finish Transient Replication/Witnesses
>>>  
>>> Hi,
>>> 
>>> On Sun, May 4, 2025, at 4:57 PM, Jordan West wrote:
 I’m generally supportive. The concept is one that I can see the benefits 
 of and I also think the current implementation adds a lot of complexity to 
 the codebase for being stuck in experimental mode. It will be great to 
 have a more robust version built on a better approach. 
>>> 
>>> One of the great things about this is that it actually deletes and 
>>> simplifies implementation code (if you ignore, of course, the hat trick of 
>>> mutation tracking making log-only replication possible).
>>> 
>>> So far it's been mostly deleted and changed lines to get the single 
>>> partition read, range read, and write path working. A lot of the code 
>>> already exists for transient replication, so it's changed rather than new 
>>> code. PaxosV2 and Accord will both need to become witness-aware and that 
>>> will be new code, but it's relatively straightforward in that it's just 
>>> picking full replicas for reads.
>>> 
>>> On Mon, May 5, 2025, at 1:21 PM, Nate McCall wrote:
 I'd like to see a note on the CEP about documentation overhead as this is 
 an important feature to communicate correctly, but that's just a nit. +1 
 on moving forward with this overall. 
>>> There is documentation for transient replication 
>>> https://cassandra.apache.org/doc/4.0/cassandra/new/transientreplication.html
>>>  which needs to be promoted out of "What's new", updated, and linked to the 
>>> documentation for mutation tracking. I'll update the CEP to cover this.
>>> 
>>> 
>>> On Mon, May 5, 2025, at 1:49 PM, Jon Haddad wrote:
 It took me a bit to wrap my head around how this works, but now that I 
 think I understand the idea, it sounds like a solid improvement.  Being 
 able to achieve the same results as quorum but costing 1/3 less is a *big 
 deal* and I know several teams that would be interested.
>>> 1/3 is the "free" threshold where you don't increase your probability of 
>>> experiencing data loss using quorums for common topologies. If you have a 
>>> lot of replicas, because say you want copies in many places, you might be 
>>> able to reduce further. What value gets voted on is basically decoupled 
>>> from how redundantly that value is stored long term.
 One thing I'm curious about (and we can break it out into a separate 
 discussion), is how all the functionality that requires coordination and 
 global state (repaired vs non-repaired) will affect backups.  Without a 
 synchronization primitive to take a cluster-wide snapshot, how can we 
 safely restore from eventually consistent backups without risking 
 consistency issues due to out-of-sync repaired status?
>>> Witnesses don't make the consistency of backups better or worse, but they 
>>> do add a little bit of complexity if your backups are copying only the 
>>> repaired data.
>>> 
>>> The procedure you follow today, where you copy the repaired sstables for a 
>>> range from a single replica and copy the unrepaired sstables from a quorum, 
>>> would continue to apply. The added constraint with witnesses is that the 
>>> single replica you pick to copy repaired sstables from needs to be a full 
>>> replica, not a witness, for that range.
>>> 
>>> I don't think we have a way to get a consistent snapshot right now? Like 
>>> there isn't even "run repair and repair will create a consistent snapshot 
>>> for you to copy as a backup". And then as Benedict points out LWT (with 
>>> async commit) and Accord (also defaults to async commit, has multi-key 
>>> transactions that can be torn) both don't make for consistent backups.
>>> 
>>> We definitely need to follow up with leveraging new replication/transaction 
>>> schemes to produce more consistent backups.
>>> 
>>> Ariel
 
 On Sun, May 4, 2025 at 00:27 Benedict  wrote:
> +1
> 
> This is an obviously good feature for operators that are storage-bound in 
> multi-DC deployments but want to retain their latency characteristics 
> during node maintenance. Log replicas are the right approach.
> 
> > On 3 May 2025, at 23:42, sc...@paradoxica.net wrote:
> >
> > Hey everybody, bumping this CEP from Ariel in case you'd like some 
> > weekend reading.
> >
> > We’d like to finish witnesses and bring them out of “experimental” 
> > status now that Transactional Metadata and Mutation Tracking provide 
> > the building blocks need
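
To make the "1/3 less" intuition in Ariel's reply above concrete (an 
illustrative reading, not taken from the CEP): with RF=3 and QUORUM requiring 2 
replicas, turning one of the three full replicas per range into a witness that 
durably holds writes only until they are reconciled to the full replicas 
shrinks steady-state storage from three full copies to two, i.e. one third 
less, while two full replicas still remain to serve quorum reads.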

Re: [DISCUSS] CEP-48: First-Class Materialized View Support

2025-05-09 Thread Ariel Weisberg
Hi,

Great to see MVs getting some attention and it's a good time to start 
addressing their shortcomings.

Looking at the details of the CEP it seems to describe Paxos as PaxosV1, but 
PaxosV2 works slightly differently (it can read during the prepare phase). I 
assume that supporting Paxos means supporting both V1 and V2 for materialized 
views?
 
As has been mentioned, Accord doesn't have many restrictions on what you can do 
logically in a transaction. The only significant restrictions are that you must 
know the keys that are going to be read/written to when the transaction starts 
(known in this case) and you can only read once, process the result of the 
read, and then generate a single set of writes to apply. Here are some docs 
explaining how CQL is implemented on Accord 
https://github.com/aweisberg/cassandra-website/blob/20637/content/doc/trunk/cassandra/architecture/cql-on-accord.html

Why mandate `LOCAL_QUORUM` instead of using the consistency level requested by 
the application? If they want to use `LOCAL_QUORUM` they can always request it.

Using a transaction system as a better batch log for a key seems like a pretty 
reasonable way to make reads/writes for materialized views safer. A really good 
implementation of this would overlap the MV read with the read from the 
transaction system for unfinished updates and then discard or augment it if the 
MV read is incomplete. Accord will do this for you automatically just by virtue 
of supporting multi-key transactions.

I am a *big* fan of getting repair really working with MVs. It does seem 
problematic that the number of merkle trees will be equal to the number of 
ranges in the cluster and that repair of MVs would become an all-node 
operation. How would down nodes be handled, and how many nodes would 
simultaneously be working to validate a given base table range? How many base 
table ranges could be repairing MVs simultaneously?

If a row containing a column that creates an MV partition is deleted, and the 
MV isn't updated, then how does the merkle tree approach propagate the deletion 
to the MV? The CEP says that anti-compaction would remove extra rows, but I am 
not clear on how that works. When is anti-compaction performed in the repair 
process and what is/isn't included in the outputs?

Thanks,
Ariel

On Tue, May 6, 2025, at 6:51 PM, Runtian Liu wrote:
> Hi everyone,
> 
> We’d like to propose a new Cassandra Enhancement Proposal: CEP-48: 
> First-Class Materialized View Support 
> .
> 
> This CEP focuses on addressing the long-standing consistency issues in the 
> current Materialized View (MV) implementation by introducing a new 
> architecture that keeps base tables and MVs reliably in sync. It also adds a 
> new validation and repair type to Cassandra’s repair process to support MV 
> repair based on the base table. The goal is to make MV a first-class, 
> production-ready feature that users can depend on—without relying on external 
> reconciliation tools or custom workarounds.
> 
> We’d really appreciate your feedback—please keep the discussion on this 
> mailing list thread.
> 
> 
> Thanks,
> Runtian
> 
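
The read-once / compute / write-once shape Ariel describes can be pictured with 
a toy model (hypothetical names, purely illustrative; this is not the Accord 
API, see the cql-on-accord doc linked in Ariel's mail for the real interface):

    import java.util.Map;
    import java.util.Optional;
    import java.util.function.Function;

    // Toy model: keys are known up front, there is exactly one read phase,
    // and the transaction emits a single computed set of writes.
    class TxnShapeExample {
        record WriteSet(Map<String, String> writes) {}

        static WriteSet run(Map<String, String> store, String readKey,
                            Function<Optional<String>, WriteSet> compute) {
            Optional<String> value = Optional.ofNullable(store.get(readKey)); // the single read
            return compute.apply(value);                                      // the single write set
        }

        public static void main(String[] args) {
            Map<String, String> store = Map.of("base:1", "v1");
            // e.g. project a base-table row into a materialized-view row
            WriteSet out = run(store, "base:1",
                    v -> new WriteSet(v.isPresent()
                            ? Map.of("view:1", v.get())
                            : Map.of()));
            System.out.println(out.writes());
        }
    }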


Re: [DISCUSS] CEP-48: First-Class Materialized View Support

2025-05-09 Thread Jeff Jirsa


> On May 9, 2025, at 12:59 PM, Ariel Weisberg  wrote:
> 
> 
> I am a *big* fan of getting repair really working with MVs. It does seem 
> problematic that the number of merkle trees will be equal to the number of 
> ranges in the cluster and that repair of MVs would become an all-node 
> operation. How would down nodes be handled, and how many nodes would 
> simultaneously be working to validate a given base table range? How many 
> base table ranges could be repairing MVs simultaneously?
> 
> If a row containing a column that creates an MV partition is deleted, and the 
> MV isn't updated, then how does the merkle tree approach propagate the 
> deletion to the MV? The CEP says that anti-compaction would remove extra 
> rows, but I am not clear on how that works. When is anti-compaction performed 
> in the repair process and what is/isn't included in the outputs?


I thought about these two points last night after I sent my email.

There’s 2 things in this proposal that give me a lot of pause.

One is the lack of tombstones / deletions in the merle trees, which makes 
properly dealing with writes/deletes/inconsistency very hard (afaict)

The second is the reality that repairing a single partition in the base table 
may repair all hosts/ranges in the MV table, and vice versa. Basically scanning 
either base or MV is effectively scanning the whole cluster (modulo what you 
can avoid in the clean/dirty repaired sets). This makes me really, really 
concerned with how it scales, and how likely it is to be able to schedule 
automatically without blowing up. 

The paxos vs accord comments so far are interesting in that I think both could 
be made to work, but I am very concerned about how the merkle tree comparisons 
are likely to work with wide partitions leading to massive fanout in ranges. 




Re: Cassandra 5+ JDK Minimum Compatibility Requirement

2025-05-09 Thread Jon Haddad
There’s a pretty simple solution here - breaking it up into several smaller
patches.

* Any changes should include tests that validate the checks are used
correctly.
* It should also alleviate any issues with code conflicts and rebasing as
the merges would happen slowly over time rather than all at once.
* If there’s two committers willing to spend time and work with OP on this,
that should be enough to move it forward.
* There's a thread on user@ right now [1] where someone *just* ran into
this issue, so I'd say addressing that one is a reasonable starting point.

[1] https://lists.apache.org/thread/ykkwhjdpgyqzw5xtol4v5ysz664bxxl3



Jon


On Fri, May 9, 2025 at 12:16 PM C. Scott Andreas 
wrote:

> My thinking is most closely aligned with Blake and Benedict’s views here.
>
> For the specific refactor in question, I support adoption of the language
> feature for new code or to cut existing code over to the new syntax as
> changes are made to the respective areas of the codebase. But I don’t
> support a sweeping project-wide refactor on trunk in this case.
>
> Here is my thinking:
>
> - If there are 2000 target sites for the refactor, that means this is
> going to be a 5000+ line diff.
> - The safety improvement here is marginal but nonzero.
> - If we have a 5000 line refactor, it should accomplish a significant and
> compelling purpose in the project.
> - Any execution of the refactor will require manual review of each of
> those 2000 refactor sites on the part of the implementer and two reviewers.
> - Since the check is compile-time, we’d learn of any incorrect casts during
> the initial refactor, the first time it’s compiled, and we’d short-circuit to
> having gained 100% of the value by fixing the broken call sites.
> - The act of that per-call-site review would inform us as to whether we had
> incorrect casts; and we would immediately achieve the value of the
> “safer” approach by having identified the bugs.
> - 2x reviewer coverage for a 5000 line patch set is a significant
> commitment of reviewer resources. These reviewer resources have significant
> opportunity cost and can be put to a better purpose.
> - Blake/others mention that such refactors create conflicts when bug fixes
> are backported to previous releases, requiring refactors of those rebased
> patches to bring fixes to versions that predate the large refactor.
>
> I think this is a good language feature. I think we should use it. I think
> it’d be completely reasonable to cut existing syntax over to it as we make
> changes to the respective subsystems.
>
> But I wouldn’t do a big bang refactor in this case. The juice isn’t worth
> the squeeze for me.
>
> - Scott
>
> On May 9, 2025, at 11:33 AM, Blake Eggleston  wrote:
>
> 
>
> No one is treating the codebase like a house of cards that can’t be
> touched.
>
> In this case I think the cost/risk of doing this change outweighs the
> potential benefits the project might see from it. Josh counts ~2000
> instances where we’re casting objects, so we’re talking about a
> not-insignificant change which may introduce its own bugs. Even if no new
> bugs are introduced, this will be a refactoring annoyance for projects in
> development, but the real concern I have with any large change is how it
> complicates the process of fixing bugs across versions. On the other hand,
> I don’t think that incorrectly casting objects has historically been a
> source of pain for us, so it seems like the benefit would be small, if any.
>
> On Fri, May 9, 2025, at 10:38 AM, Jon Haddad wrote:
>
> Why not?
>
> Personally, I hate the idea of treating a codebase (any codebase) like a
> house of cards that can't be touched.  It never made sense to me to try to
> bundle new features / bug fixes with improvements to code quality.
>
> Making the code more reliable should be a goal in itself, rather than a
> side effect of other work.
>
> Jon
>
>
>
> On Fri, May 9, 2025 at 10:31 AM Blake Eggleston 
> wrote:
>
>
> This seems like a cool feature that will be useful in future development
> work, but not something we should be proactively refactoring the project to
> make use of.
>
> On Fri, May 9, 2025, at 10:18 AM, Vivekanand Koya wrote:
>
> I would say that https://openjdk.org/jeps/394 (instanceof) aims to
> provide safer code and less poisoning by default. Instead of a production
> server being halted or impaired by a RuntimeException, the cast is verified
> at compile time. If a new language feature makes code more robust and
> addresses a hazardous, historical design choice, I believe its time
> has arrived. Curious to see what everyone thinks.
>
> Thanks,
> Vivekanand K.
>
> On Fri, May 9, 2025 at 9:51 AM Josh McKenzie  wrote:
>
>
> I would like to refactor the codebase (Trunk 5+) to eliminate unsafe
> explicit casting with instanceOf.
>
> We have a rich history of broad sweeping refactors dying on the rocks of
> the community's aversion to instability and risk w/out a concrete outcome
> we're trying to achieve. :)
>
> All of which i

Re: Cassandra 5+ JDK Minimum Compatibility Requirement

2025-05-09 Thread Vivekanand Koya
Sounds great. I would like to refactor the codebase (Trunk 5+) to eliminate
unsafe explicit casting with instanceOf.

Thanks,
Vivekanand

On Fri, May 9, 2025, 5:19 AM Benedict Elliott Smith 
wrote:

> Yep, that approach seems more than sufficient to me. No need for lots of
> ceremony, but good to keep everyone in the decision loop.
>
> On 9 May 2025, at 13:10, Josh McKenzie  wrote:
>
> I think it doesn’t cost us much to briefly discuss new language features
> before using them.
>
> I had that thought as well but on balance my intuition was there were
> enough new features that the volume of discussion to do that would be a
> poor cost/benefit compared to the "lazy consensus, revert" approach.
>
> So if I actually do the work required to have an opinion ;):
>
> https://docs.oracle.com/en/java/javase/21/language/java-language-changes-release.html#GUID-6459681C-6881-45D8-B0DB-395D1BD6DB9B
>
> JDK21:
> - Record Patterns
> 
> - Pattern Matching for switch Expressions and Statements
> 
> - String Templates
> 
> - Unnamed Patterns and Variables
> 
> - Unnamed Classes and Instance Main Methods
> 
> JDK17:
> - Sealed Classes
> 
> JDK16:
> - Pattern Matching for instanceof
> 
> JDK15:
> - Text Blocks
> 
> JDK14:
> - Switch Expressions
> 
> JDK11:
> - Local Variable Type Inference
> 
>  (test
> only, not prod code is where we landed)
>
> Assuming we just lazily evaluate and deal with new features as people
> *actually care about them* and see them add value, a simple "[DISCUSS]
> I'm thinking about using new language feature X; any objection?" lazy
> consensus that we then dump onto a wiki article / code style page as
> "stuff we're good to use" would probably be fine?
>
>
> On Fri, May 9, 2025, at 7:58 AM, Benedict wrote:
>
>
> I think it doesn’t cost us much to briefly discuss new language features
> before using them. Lambdas, Streams and var all have problems - and even
> with the guidance we publish some are still misused.
>
> The flow scoping improvement to instanceof seems obviously good though.
>
>
> On 9 May 2025, at 12:30, Josh McKenzie  wrote:
>
> 
> For new feature work on trunk, targeting the highest supported language
> level featureset (jdk17 right now, jdk21 within the next couple of weeks)
> makes sense to me. For bugfixing, targeting the oldest supported GA branch
> and the highest language level that works there would allow maximum
> flexibility with minimal re-implementation.
>
> If anyone has any misgivings with certain features (i.e. the debate around
> usage of "var") they can bring it up on the dev ML and we can adjust, but
> otherwise I'd prefer to see us have more modern evolving options on how
> contributors engage rather than less.
>
> On Fri, May 9, 2025, at 1:56 AM, Vivekanand Koya wrote:
>
> Hello,
>
> I want to understand the community's thoughts on using newer features
> (post JDK11) in upcoming releases in Cassandra. An example is flow scoping
> instead of explicitly casting types with instanceOf:
> https://openjdk.org/jeps/395. I want your thoughts on JDK requirements
> for the main Cassandra repository, Accord, and Sidecar.
>
> Much appreciated.
> Thanks,
> Vivekanand K.
>
>
>
>
>


Re: Cassandra 5+ JDK Minimum Compatibility Requirement

2025-05-09 Thread Vivekanand Koya
I would say that https://openjdk.org/jeps/394 (instanceof) aims to provide
safer code and less poisoning by default. Instead of a production server
being halted or impaired by a RuntimeException, the cast is verified at
compile time. If a new language feature makes code more robust and addresses
a hazardous, historical design choice, I believe its time has arrived.
Curious to see what everyone thinks.

Thanks,
Vivekanand K.

On Fri, May 9, 2025 at 9:51 AM Josh McKenzie  wrote:

> I would like to refactor the codebase (Trunk 5+) to eliminate unsafe
> explicit casting with instanceOf.
>
> We have a rich history of broad sweeping refactors dying on the rocks of
> the community's aversion to instability and risk w/out a concrete outcome
> we're trying to achieve. :)
>
> All of which is to say: do we have examples of instanceOf casting blowing
> things up for users that would warrant going through the codebase to tidy
> this up? Between src/java and test/unit and test/distributed we have around
> 2,000 occurrences of this pattern.
>
> On Fri, May 9, 2025, at 10:14 AM, Vivekanand Koya wrote:
>
> Sounds great. I would like to refactor the codebase (Trunk 5+) to
> eliminate unsafe explicit casting with instanceOf.
>
> Thanks,
> Vivekanand
>
> On Fri, May 9, 2025, 5:19 AM Benedict Elliott Smith 
> wrote:
>
> Yep, that approach seems more than sufficient to me. No need for lots of
> ceremony, but good to keep everyone in the decision loop.
>
> On 9 May 2025, at 13:10, Josh McKenzie  wrote:
>
> I think it doesn’t cost us much to briefly discuss new language features
> before using them.
>
> I had that thought as well but on balance my intuition was there were
> enough new features that the volume of discussion to do that would be a
> poor cost/benefit compared to the "lazy consensus, revert" approach.
>
> So if I actually do the work required to have an opinion ;):
>
> https://docs.oracle.com/en/java/javase/21/language/java-language-changes-release.html#GUID-6459681C-6881-45D8-B0DB-395D1BD6DB9B
>
> JDK21:
> - Record Patterns
> 
> - Pattern Matching for switch Expressions and Statements
> 
> - String Templates
> 
> - Unnamed Patterns and Variables
> 
> - Unnamed Classes and Instance Main Methods
> 
> JDK17:
> - Sealed Classes
> 
> JDK16:
> - Pattern Matching for instanceof
> 
> JDK15:
> - Text Blocks
> 
> JDK14:
> - Switch Expressions
> 
> JDK11:
> - Local Variable Type Inference
> 
>  (test
> only, not prod code is where we landed)
>
> Assuming we just lazily evaluate and deal with new features as people
> *actually care about them* and see them add value, a simple "[DISCUSS]
> I'm thinking about using new language feature X; any objection?" lazy
> consensus that we then dump onto a wiki article / code style page as
> "stuff we're good to use" would probably be fine?
>
>
> On Fri, May 9, 2025, at 7:58 AM, Benedict wrote:
>
>
> I think it doesn’t cost us much to briefly discuss new language features
> before using them. Lambdas, Streams and var all have problems - and even
> with the guidance we publish some are still misused.
>
> The flow scoping improvement to instanceof seems obviously good though.
>
>
> On 9 May 2025, at 12:30, Josh McKenzie  wrote:
>
> 
> For new feature work on trunk, targeting the highest supported language
> level featureset (jdk17 right now, jdk21 within the next couple of weeks)
> makes sense to me. For bugfixing, targeting the oldest supported GA branch
> and the highest language level that works there would allow maximum
> flexibility with minimal re-implementation.
>
> If anyone has any misgivings with certain features (i.e. the debate around
> usage of "var") they can bring it up on the dev ML and we can adjust, but
> otherwise I'd prefer to see us have more modern evolving options on how
> contributors engage rather than less.
>
> On Fri, May 9, 2025, at 1:56 AM, Vivekanand Koya wrote:
>
> Hello,
>
> I want to understand the community's thoughts on using newer f

Re: Cassandra 5+ JDK Minimum Compatibility Requirement

2025-05-09 Thread Vivekanand Koya
I can also provide potential examples if you'd like.

Thanks,
Vivekanand K.

On Fri, May 9, 2025 at 10:18 AM Vivekanand Koya <13vivekk...@gmail.com>
wrote:

> I would say that https://openjdk.org/jeps/394 (instanceof) aims to
> provide safer code and less poisoning by default. Instead of a production
> server being halted or impaired by a RuntimeException, the cast is verified
> at compile time. If a new language feature makes code more robust and
> addresses a hazardous, historical design choice, I believe its time
> has arrived. Curious to see what everyone thinks.
>
> Thanks,
> Vivekanand K.
>
> On Fri, May 9, 2025 at 9:51 AM Josh McKenzie  wrote:
>
>> I would like to refactor the codebase (Trunk 5+) to eliminate unsafe
>> explicit casting with instanceOf.
>>
>> We have a rich history of broad sweeping refactors dying on the rocks of
>> the community's aversion to instability and risk w/out a concrete outcome
>> we're trying to achieve. :)
>>
>> All of which is to say: do we have examples of instanceOf casting blowing
>> things up for users that would warrant going through the codebase to tidy
>> this up? Between src/java and test/unit and test/distributed we have around
>> 2,000 occurrences of this pattern.
>>
>> On Fri, May 9, 2025, at 10:14 AM, Vivekanand Koya wrote:
>>
>> Sounds great. I would like to refactor the codebase (Trunk 5+) to
>> eliminate unsafe explicit casting with instanceOf.
>>
>> Thanks,
>> Vivekanand
>>
>> On Fri, May 9, 2025, 5:19 AM Benedict Elliott Smith 
>> wrote:
>>
>> Yep, that approach seems more than sufficient to me. No need for lots of
>> ceremony, but good to keep everyone in the decision loop.
>>
>> On 9 May 2025, at 13:10, Josh McKenzie  wrote:
>>
>> I think it doesn’t cost us much to briefly discuss new language features
>> before using them.
>>
>> I had that thought as well but on balance my intuition was there were
>> enough new features that the volume of discussion to do that would be a
>> poor cost/benefit compared to the "lazy consensus, revert" approach.
>>
>> So if I actually do the work required to have an opinion ;):
>>
>> https://docs.oracle.com/en/java/javase/21/language/java-language-changes-release.html#GUID-6459681C-6881-45D8-B0DB-395D1BD6DB9B
>>
>> JDK21:
>> - Record Patterns
>> 
>> - Pattern Matching for switch Expressions and Statements
>> 
>> - String Templates
>> 
>> - Unnamed Patterns and Variables
>> 
>> - Unnamed Classes and Instance Main Methods
>> 
>> JDK17:
>> - Sealed Classes
>> 
>> JDK16:
>> - Pattern Matching for instanceof
>> 
>> JDK15:
>> - Text Blocks
>> 
>> JDK14:
>> - Switch Expressions
>> 
>> JDK11:
>> - Local Variable Type Inference
>> 
>>  (test
>> only, not prod code is where we landed)
>>
>> Assuming we just lazily evaluate and deal with new features as people
>> *actually care about them* and see them add value, a simple "[DISCUSS]
>> I'm thinking about using new language feature X; any objection?" lazy
>> consensus that we then dump onto a wiki article / code style page as
>> "stuff we're good to use" would probably be fine?
>>
>>
>> On Fri, May 9, 2025, at 7:58 AM, Benedict wrote:
>>
>>
>> I think it doesn’t cost us much to briefly discuss new language features
>> before using them. Lambdas, Streams and var all have problems - and even
>> with the guidance we publish some are still misused.
>>
>> The flow scoping improvement to instanceof seems obviously good though.
>>
>>
>> On 9 May 2025, at 12:30, Josh McKenzie  wrote:
>>
>> 
>> For new feature work on trunk, targeting the highest supported language
>> level featureset (jdk17 right now, jdk21 within the next couple of weeks)
>> makes sense to me. For bugfixing, targeting the oldest supported GA branch
>> and the highest language level that works there would allow maximum
>> flexibility with minimal re-implementation.
>>
>> If anyone has any misgivings with certain features (i.e. the debate
>> around usage of "var") they can bring it up on the dev ML 

Re: Cassandra 5+ JDK Minimum Compatibility Requirement

2025-05-09 Thread Blake Eggleston
This seems like a cool feature that will be useful in future development work, 
but not something we should be proactively refactoring the project to make use 
of.

On Fri, May 9, 2025, at 10:18 AM, Vivekanand Koya wrote:
> I would say that https://openjdk.org/jeps/394 (instanceof) aims to provide 
> safer code and less poisoning by default. Instead of a production server 
> being halted or impaired by a RuntimeException, the cast is verified at 
> compile time. If a new language feature makes code more robust and addresses 
> a hazardous, historical design choice, I believe its time has arrived. 
> Curious to see what everyone thinks.
> 
> Thanks,
> Vivekanand K.
> 
> On Fri, May 9, 2025 at 9:51 AM Josh McKenzie  wrote:
>>> I would like to refactor the codebase (Trunk 5+) to eliminate unsafe 
>>> explicit casting with instanceOf. 
>> We have a rich history of broad sweeping refactors dying on the rocks of the 
>> community's aversion to instability and risk w/out a concrete outcome we're 
>> trying to achieve. :)
>> 
>> All of which is to say: do we have examples of instanceOf casting blowing 
>> things up for users that would warrant going through the codebase to tidy 
>> this up? Between src/java and test/unit and test/distributed we have around 
>> 2,000 occurrences of this pattern.
>> 
>> On Fri, May 9, 2025, at 10:14 AM, Vivekanand Koya wrote:
>>> Sounds great. I would like to refactor the codebase (Trunk 5+) to eliminate 
>>> unsafe explicit casting with instanceOf. 
>>> 
>>> Thanks,
>>> Vivekanand
>>> 
>>> On Fri, May 9, 2025, 5:19 AM Benedict Elliott Smith  
>>> wrote:
 Yep, that approach seems more than sufficient to me. No need for lots of 
 ceremony, but good to keep everyone in the decision loop.
 
> On 9 May 2025, at 13:10, Josh McKenzie  wrote:
> 
>> I think it doesn’t cost us much to briefly discuss new language features 
>> before using them.
> I had that thought as well but on balance my intuition was there were 
> enough new features that the volume of discussion to do that would be a 
> poor cost/benefit compared to the "lazy consensus, revert" approach.
> 
> So if I actually do the work required to have an opinion ;):
> https://docs.oracle.com/en/java/javase/21/language/java-language-changes-release.html#GUID-6459681C-6881-45D8-B0DB-395D1BD6DB9B
> 
> JDK21:
> - Record Patterns 
> 
> - Pattern Matching for switch Expressions and Statements 
> 
> - String Templates 
> 
> - Unnamed Patterns and Variables 
> 
> - Unnamed Classes and Instance Main Methods 
> 
> JDK17:
> - Sealed Classes 
> 
> JDK16:
> - Pattern Matching for instanceof 
> 
> JDK15:
> - Text Blocks 
> 
> JDK14:
> - Switch Expressions 
> 
> JDK11:
> - Local Variable Type Inference 
> 
>  (test only, not prod code is where we landed)
> 
> Assuming we just lazily evaluate and deal with new features as people 
> *actually care about them* and see them add value, a simple "[DISCUSS] 
> I'm thinking about using new language feature X; any objection?" lazy 
> consensus that we then dump onto a wiki article / code style page as 
> "stuff we're good to use" would probably be fine?
> 
> 
> On Fri, May 9, 2025, at 7:58 AM, Benedict wrote:
>> 
>> I think it doesn’t cost us much to briefly discuss new language features 
>> before using them. Lambdas, Streams and var all have problems - and even 
>> with the guidance we publish some are still misused.
>> 
>> The flow scoping improvement to instanceof seems obviously good though.
>> 
>> 
>>> On 9 May 2025, at 12:30, Josh McKenzie  wrote:
>>> 
>>> For new feature work on trunk, targeting the highest supported language 
>>> level featureset (jdk17 right now, jdk21 within the next couple of 
>>> weeks) makes sense t

Re: Cassandra 5+ JDK Minimum Compatibility Requirement

2025-05-09 Thread Jon Haddad
Why not?

Personally, I hate the idea of treating a codebase (any codebase) like a
house of cards that can't be touched.  It never made sense to me to try to
bundle new features / bug fixes with improvements to code quality.

Making the code more reliable should be a goal in itself, rather than a
side effect of other work.

Jon



On Fri, May 9, 2025 at 10:31 AM Blake Eggleston 
wrote:

> This seems like a cool feature that will be useful in future development
> work, but not something we should be proactively refactoring the project to
> make use of.
>
> On Fri, May 9, 2025, at 10:18 AM, Vivekanand Koya wrote:
>
> I would say that https://openjdk.org/jeps/394 (instanceof) aims to
> provide safer code and less poisoning by default. Instead of a production
> server being halted or impaired by a RuntimeException, the cast is verified
> at compile time. If a new language feature makes code more robust and
> addresses a hazardous, historical design choice, I believe its time
> has arrived. Curious to see what everyone thinks.
>
> Thanks,
> Vivekanand K.
>
> On Fri, May 9, 2025 at 9:51 AM Josh McKenzie  wrote:
>
>
> I would like to refactor the codebase (Trunk 5+) to eliminate unsafe
> explicit casting with instanceOf.
>
> We have a rich history of broad sweeping refactors dying on the rocks of
> the community's aversion to instability and risk w/out a concrete outcome
> we're trying to achieve. :)
>
> All of which is to say: do we have examples of instanceOf casting blowing
> things up for users that would warrant going through the codebase to tidy
> this up? Between src/java and test/unit and test/distributed we have around
> 2,000 occurrences of this pattern.
>
> On Fri, May 9, 2025, at 10:14 AM, Vivekanand Koya wrote:
>
> Sounds great. I would like to refactor the codebase (Trunk 5+) to
> eliminate unsafe explicit casting with instanceOf.
>
> Thanks,
> Vivekanand
>
> On Fri, May 9, 2025, 5:19 AM Benedict Elliott Smith 
> wrote:
>
> Yep, that approach seems more than sufficient to me. No need for lots of
> ceremony, but good to keep everyone in the decision loop.
>
> On 9 May 2025, at 13:10, Josh McKenzie  wrote:
>
> I think it doesn’t cost us much to briefly discuss new language features
> before using them.
>
> I had that thought as well but on balance my intuition was there were
> enough new features that the volume of discussion to do that would be a
> poor cost/benefit compared to the "lazy consensus, revert" approach.
>
> So if I actually do the work required to have an opinion ;):
>
> https://docs.oracle.com/en/java/javase/21/language/java-language-changes-release.html#GUID-6459681C-6881-45D8-B0DB-395D1BD6DB9B
>
> JDK21:
> - Record Patterns
> 
> - Pattern Matching for switch Expressions and Statements
> 
> - String Templates
> 
> - Unnamed Patterns and Variables
> 
> - Unnamed Classes and Instance Main Methods
> 
> JDK17:
> - Sealed Classes
> 
> JDK16:
> - Pattern Matching for instanceof
> 
> JDK15:
> - Text Blocks
> 
> JDK14:
> - Switch Expressions
> 
> JDK11:
> - Local Variable Type Inference
> 
>  (test
> only, not prod code is where we landed)
>
> Assuming we just lazily evaluate and deal with new features as people*
> actually care about them* and seeing them add value, a simple "[DISCUSS]
> I'm thinking about using new language feature X; any objection?" lazy
> consensus that we then dumped onto a wiki article / code style page as
> "stuff we're good to use" would probably be fine?
>
>
> On Fri, May 9, 2025, at 7:58 AM, Benedict wrote:
>
>
> I think it doesn’t cost us much to briefly discuss new language features
> before using them. Lambdas, Streams and var all have problems - and even
> with the guidance we publish some are still misused.
>
> The flow scoping improvement to instanceof seems obviously good though.
>
>
> On 9 May 2025, at 12:30, Josh McKenzie  wrote:
>
> 
> For new feature work on trunk, targeting the highest supported language
> level featureset (jdk17 right

Re: Cassandra 5+ JDK Minimum Compatibility Requirement

2025-05-09 Thread Benedict
Agreed. This is a change that’s fine to include when editing related (and new) 
code, but doesn’t come close to warranting a wide-scale change.

On 9 May 2025, at 18:32, Blake Eggleston wrote:
> This seems like a cool feature that will be useful in future development 
> work, but not something we should be proactively refactoring the project to 
> make use of.

Re: Cassandra 5+ JDK Minimum Compatibility Requirement

2025-05-09 Thread Josh McKenzie
For new feature work on trunk, targeting the highest supported language level 
featureset (jdk17 right now, jdk21 within the next couple of weeks) makes sense 
to me. For bugfixing, targeting the oldest supported GA branch and the highest 
language level that works there would allow maximum flexibility with minimal 
re-implementation.

If anyone has any misgivings with certain features (i.e. the debate around 
usage of "var") they can bring it up on the dev ML and we can adjust, but 
otherwise I'd prefer to see us have more modern evolving options on how 
contributors engage rather than less.

On Fri, May 9, 2025, at 1:56 AM, Vivekanand Koya wrote:
> Hello,
> 
> I want to understand the community's thoughts on using newer features (post 
> JDK11) in upcoming releases in Cassandra. An example is flow scoping instead 
> of explicitly casting types with instanceOf: https://openjdk.org/jeps/395. I 
> want your thoughts on JDK requirements for the main Cassandra repository, 
> Accord, and Sidecar. 
> 
> Much appreciated.
> Thanks,
> Vivekanand K.  


Re: Cassandra 5+ JDK Minimum Compatibility Requirement

2025-05-09 Thread Benedict Elliott Smith
Yep, that approach seems more than sufficient to me. No need for lots of 
ceremony, but good to keep everyone in the decision loop.

> On 9 May 2025, at 13:10, Josh McKenzie  wrote:
> 
>> I think it doesn’t cost us much to briefly discuss new language features 
>> before using them.
> I had that thought as well but on balance my intuition was there were enough 
> new features that the volume of discussion to do that would be a poor 
> cost/benefit compared to the "lazy consensus, revert" approach.
> 
> So if I actually do the work required to have an opinion ;):
> https://docs.oracle.com/en/java/javase/21/language/java-language-changes-release.html#GUID-6459681C-6881-45D8-B0DB-395D1BD6DB9B
> 
> JDK21:
> - Record Patterns 
> 
> - Pattern Matching for switch Expressions and Statements 
> 
> - String Templates 
> 
> - Unnamed Patterns and Variables 
> 
> - Unnamed Classes and Instance Main Methods 
> 
> JDK17:
> - Sealed Classes 
> 
> JDK16:
> - Pattern Matching for instanceof 
> 
> JDK15:
> - Text Blocks 
> 
> JDK14:
> - Switch Expressions 
> 
> JDK11:
> - Local Variable Type Inference 
> 
>  (test only, not prod code is where we landed)
> 
> Assuming we just lazily evaluate and deal with new features as people 
> actually care about them and see them add value, a simple "[DISCUSS] I'm 
> thinking about using new language feature X; any objection?" lazy consensus 
> that we then dump onto a wiki article / code style page as "stuff we're 
> good to use" would probably be fine?
> 
> 
> On Fri, May 9, 2025, at 7:58 AM, Benedict wrote:
>> 
>> I think it doesn’t cost us much to briefly discuss new language features 
>> before using them. Lambdas, Streams and var all have problems - and even 
>> with the guidance we publish some are still misused.
>> 
>> The flow scoping improvement to instanceof seems obviously good though.
>> 
>> 
>>> On 9 May 2025, at 12:30, Josh McKenzie  wrote:
>>> 
>>> For new feature work on trunk, targeting the highest supported language 
>>> level featureset (jdk17 right now, jdk21 within the next couple of weeks) 
>>> makes sense to me. For bugfixing, targeting the oldest supported GA branch 
>>> and the highest language level that works there would allow maximum 
>>> flexibility with minimal re-implementation.
>>> 
>>> If anyone has any misgivings with certain features (i.e. the debate around 
>>> usage of "var") they can bring it up on the dev ML and we can adjust, but 
>>> otherwise I'd prefer to see us have more modern evolving options on how 
>>> contributors engage rather than less.
>>> 
>>> On Fri, May 9, 2025, at 1:56 AM, Vivekanand Koya wrote:
 Hello,
 
 I want to understand the community's thoughts on using newer features 
 (post JDK11) in upcoming releases in Cassandra. An example is flow scoping 
 instead of explicitly casting types with instanceOf: 
 https://openjdk.org/jeps/395. I want your thoughts on JDK requirements for 
 the main Cassandra repository, Accord, and Sidecar. 
 
 Much appreciated.
 Thanks,
 Vivekanand K.  
>>> 
> 
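
To make the JDK21 items in the list above concrete, here is a small sketch 
using an invented Shape hierarchy (nothing here is from the Cassandra 
codebase), combining sealed classes (JDK17) with record patterns and pattern 
matching for switch (JDK21):

class ShapeDemo
{
    sealed interface Shape permits Circle, Square {}
    record Circle(double radius) implements Shape {}
    record Square(double side) implements Shape {}

    static double area(Shape s)
    {
        return switch (s)
        {
            case Circle(double r) -> Math.PI * r * r;  // record pattern deconstructs in place
            case Square(double side) -> side * side;
            // no default branch needed: the sealed hierarchy makes the switch exhaustive
        };
    }
}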



Re: Cassandra 5+ JDK Minimum Compatibility Requirement

2025-05-09 Thread Josh McKenzie
> I would like to refactor the codebase (Trunk 5+) to eliminate unsafe explicit 
> casting with instanceOf. 
We have a rich history of broad sweeping refactors dying on the rocks of the 
community's aversion to instability and risk w/out a concrete outcome we're 
trying to achieve. :)

All of which is to say: do we have examples of instanceOf casting blowing 
things up for users that would warrant going through the codebase to tidy this 
up? Between src/java and test/unit and test/distributed we have around 2,000 
occurrences of this pattern.
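
For anyone skimming the thread, the pattern under discussion looks roughly 
like this; the Filter type and apply() method below are placeholders for 
illustration, not actual call sites:

class CastDemo
{
    interface Filter {}
    static void apply(Filter f) {}

    static void handle(Object obj)
    {
        // Today's idiom, repeated at roughly 2,000 sites: test, then cast
        // explicitly. A mismatched cast compiles fine and only fails at
        // runtime with a ClassCastException.
        if (obj instanceof Filter)
        {
            Filter filter = (Filter) obj;
            apply(filter);
        }

        // JEP 394 flow scoping: the binding is declared once, and the
        // compiler guarantees the cast cannot disagree with the check.
        if (obj instanceof Filter filter)
            apply(filter);
    }
}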

On Fri, May 9, 2025, at 10:14 AM, Vivekanand Koya wrote:
> Sounds great. I would like to refactor the codebase (Trunk 5+) to eliminate 
> unsafe explicit casting with instanceOf. 
> 
> Thanks,
> Vivekanand
> 
> On Fri, May 9, 2025, 5:19 AM Benedict Elliott Smith  
> wrote:
>> Yep, that approach seems more than sufficient to me. No need for lots of 
>> ceremony, but good to keep everyone in the decision loop.
>> 
>>> On 9 May 2025, at 13:10, Josh McKenzie  wrote:
>>> 
 I think it doesn’t cost us much to briefly discuss new language features 
 before using them.
>>> I had that thought as well but on balance my intuition was there were 
>>> enough new features that the volume of discussion to do that would be a 
>>> poor cost/benefit compared to the "lazy consensus, revert" approach.
>>> 
>>> So if I actually do the work required to have an opinion ;):
>>> https://docs.oracle.com/en/java/javase/21/language/java-language-changes-release.html#GUID-6459681C-6881-45D8-B0DB-395D1BD6DB9B
>>> 
>>> JDK21:
>>> - Record Patterns 
>>> 
>>> - Pattern Matching for switch Expressions and Statements 
>>> 
>>> - String Templates 
>>> 
>>> - Unnamed Patterns and Variables 
>>> 
>>> - Unnamed Classes and Instance Main Methods 
>>> 
>>> JDK17:
>>> - Sealed Classes 
>>> 
>>> JDK16:
>>> - Pattern Matching for instanceof 
>>> 
>>> JDK15:
>>> - Text Blocks 
>>> 
>>> JDK14:
>>> - Switch Expressions 
>>> 
>>> JDK11:
>>> - Local Variable Type Inference 
>>> 
>>>  (test only, not prod code, is where we landed)
>>> 
>>> Assuming we just lazily evaluate and deal with new features as people 
>>> actually care about them and see them add value, a simple "[DISCUSS] 
>>> I'm thinking about using new language feature X; any objection?" lazy 
>>> consensus that we then dump onto a wiki article / code style page as 
>>> "stuff we're good to use" would probably be fine?
>>> 
>>> 
>>> On Fri, May 9, 2025, at 7:58 AM, Benedict wrote:
 
 I think it doesn’t cost us much to briefly discuss new language features 
 before using them. Lambdas, Streams and var all have problems - and even 
 with the guidance we publish some are still misused.
 
 The flow scoping improvement to instanceof seems obviously good though.
 
 
> On 9 May 2025, at 12:30, Josh McKenzie  wrote:
> 
> For new feature work on trunk, targeting the highest supported language 
> level featureset (jdk17 right now, jdk21 within the next couple of weeks) 
> makes sense to me. For bugfixing, targeting the oldest supported GA 
> branch and the highest language level that works there would allow 
> maximum flexibility with minimal re-implementation.
> 
> If anyone has any misgivings with certain features (i.e. the debate 
> around usage of "var") they can bring it up on the dev ML and we can 
> adjust, but otherwise I'd prefer to see us have more modern evolving 
> options on how contributors engage rather than less.
> 
> On Fri, May 9, 2025, at 1:56 AM, Vivekanand Koya wrote:
>> Hello,
>> 
>> I want to understand the community's thoughts on using newer features 
>> (post JDK11) in upcoming releases in Cassandra. An example is flow 
>> scoping instead of explicitly casting types with instanceOf: 
>> https://openjdk.org/jeps/395. I want your thoughts on JDK requirements 
>> for the main Cassandra repository, Accord, 

Re: Cassandra 5+ JDK Minimum Compatibility Requirement

2025-05-09 Thread Brandon Williams
We thought we had this figured out when we did the big bang switch to
ByteBuffers, then spent years finding subtle bugs that the tests
didn't.

Kind Regards,
Brandon

On Fri, May 9, 2025 at 3:24 PM Jon Haddad  wrote:
>
> There’s a pretty simple solution here - breaking it up into several smaller 
> patches.
>
> * Any changes should include tests that validate the checks are used 
> correctly.
> * It should also alleviate any issues with code conflicts and rebasing as the 
> merges would happen slowly over time rather than all at once.
> * If there’s two committers willing to spend time and work with OP on this, 
> that should be enough to move it forward.
> * There's a thread on user@ right now [1] where someone *just* ran into this 
> issue, so I'd say addressing that one is a reasonable starting point.
>
> [1] https://lists.apache.org/thread/ykkwhjdpgyqzw5xtol4v5ysz664bxxl3
>
>
>
> Jon
>
>
> On Fri, May 9, 2025 at 12:16 PM C. Scott Andreas  wrote:
>>
>> My thinking is most closely aligned with Blake and Benedict’s views here.
>>
>> For the specific refactor in question, I support adoption of the language 
>> feature for new code or to cut existing code over to the new syntax as 
>> changes are made to the respective areas of the codebase. But I don’t 
>> support a sweeping project-wide refactor on trunk in this case.
>>
>> Here is my thinking:
>>
>> - If there are 2000 target sites for the refactor, that means this is going 
>> to be a 5000+ line diff.
>> - The safety improvement here is marginal but nonzero.
>> - If we have a 5000 line refactor, it should accomplish a significant and 
>> compelling purpose in the project.
>> - Any execution of the refactor will require manual review of each of those 
>> 2000 refactor sites on the part of the implementer and two reviewers.
>> - Since the check is compile-time, we’d learn that by the initial refactor 
>> the first time it’s compiled, and we short-circuit to having gained 100% of 
>> the value by being able to fix the broken callsites.
>> - The act of that per-call site review would inform us as to whether we had 
>> incorrect casts; and we would immediately achieve the value of the “safer” 
>> approach by having identified the bugs.
>> - 2x reviewer coverage for a 5000 line patch set is a significant commitment 
>> of reviewer resources. These reviewer resources have significant opportunity 
>> cost and can be put to a better purpose.
>> - Blake/others mention that such refactors create conflicts when bug fixes 
>> are backported to previous releases, requiring refactors of those rebased 
>> patches to bring fixes to versions that predate the large refactor.
>>
>> I think this is a good language feature. I think we should use it. I think 
>> it’d be completely reasonable to cut existing syntax over to it as we make 
>> changes to the respective subsystems.
>>
>> But I wouldn’t do a big bang refactor in this case. The juice isn’t worth 
>> the squeeze for me.
>>
>> - Scott
>>
>> On May 9, 2025, at 11:33 AM, Blake Eggleston  wrote:
>>
>> 
>>
>> No one is treating the codebase like a house of cards that can’t be touched.
>>
>> In this case I think the cost/risk of doing this change outweighs the 
>> potential benefits the project might see from it. Josh counts ~2000 
>> instances where we’re casting objects so we’re talking about a 
>> not-insignificant change which may introduce its own bugs. Even if no new 
>> bugs are introduced, this will be a refactor annoyance for projects in 
>> development, but the real concern I have with any large change is how it 
>> complicates the process of fixing bugs across versions. On the other hand, I 
>> don’t think that incorrectly casting objects has historically been a source 
>> of pain for us, so it seems like the benefit would be small if any.
>>
>> On Fri, May 9, 2025, at 10:38 AM, Jon Haddad wrote:
>>
>> Why not?
>>
>> Personally, I hate the idea of treating a codebase (any codebase) like a 
>> house of cards that can't be touched.  It never made sense to me to try to 
>> bundle new features / bug fixes with improvements to code quality.
>>
>> Making the code more reliable should be a goal in itself, rather than a side 
>> effect of other work.
>>
>> Jon
>>
>>
>>
>> On Fri, May 9, 2025 at 10:31 AM Blake Eggleston  wrote:
>>
>>
>> This seems like a cool feature that will be useful in future development 
>> work, but not something we should be proactively refactoring the project to 
>> make use of.
>>
>> On Fri, May 9, 2025, at 10:18 AM, Vivekanand Koya wrote:
>>
>> I would say that https://openjdk.org/jeps/394 (instanceOf) aims to provide 
>> safer code with less type poisoning by default. Instead of having a 
>> production server halted/impaired due to a RuntimeException, the cast is 
>> verified at compile time. If a new language feature makes code more robust 
>> and addresses a hazardous, historical design choice, I believe its time has 
>> arrived. Curious to see what everyone thinks.
>>
>> Thanks,
>> Viveka

Re: [DISCUSS] CEP-48: First-Class Materialized View Support

2025-05-09 Thread David Capwell
> The MV repair tool in Cassandra is intended to address inconsistencies that 
> may occur in materialized views due to various factors. This component is the 
> most complex and demanding part of the development effort, representing 
> roughly 70% of the overall work.

> but I am very concerned about how the merkle tree comparisons are likely to 
> work with wide partitions leading to massive fanout in ranges. 

As far as I can tell, being based off Accord means you don’t need to care about 
repair, as Accord will manage the consistency for you; you can’t get out of 
sync.

Being based off Accord also means you can deal with multiple partitions/tokens, 
whereas LWT is limited to a single token.  I am not sure how the following 
would work with the proposed design and LWT:

CREATE TABLE tbl (pk int, ck int, v int, PRIMARY KEY (pk, ck));
CREATE MATERIALIZED VIEW tbl2
AS SELECT * FROM tbl WHERE ck > 42 PRIMARY KEY (pk, ck);

-- mutations
UPDATE tbl SET v=42 WHERE pk IN (0, 1) AND ck IN (50, 74); -- this touches 2 partition keys
BEGIN BATCH -- also touches 2 partition keys
  INSERT INTO tbl (pk, ck, v) VALUES (0, 47, 0);
  INSERT INTO tbl (pk, ck, v) VALUES (1, 48, 0);
END BATCH;



> On May 9, 2025, at 1:03 PM, Jeff Jirsa  wrote:
> 
> 
> 
>> On May 9, 2025, at 12:59 PM, Ariel Weisberg  wrote:
>> 
>> 
>> I am *big* fan of getting repair really working with MVs. It does seem 
>> problematic that the number of merkle trees will be equal to the number of 
>> ranges in the cluster and repair of MVs would become an all-node operation.  
>> How would down nodes be handled, and how many nodes would simultaneously be 
>> working to validate a given base table range at once? How many base table 
>> ranges could simultaneously be repairing MVs?
>> 
>> If a row containing a column that creates an MV partition is deleted, and 
>> the MV isn't updated, then how does the merkle tree approach propagate the 
>> deletion to the MV? The CEP says that anti-compaction would remove extra 
>> rows, but I am not clear on how that works. When is anti-compaction 
>> performed in the repair process and what is/isn't included in the outputs?
> 
> 
> I thought about these two points last night after I sent my email.
> 
> There’s 2 things in this proposal that give me a lot of pause.
> 
> One is the lack of tombstones / deletions in the merkle trees, which makes 
> properly dealing with writes/deletes/inconsistency very hard (afaict)
> 
> The second is the reality that repairing a single partition in the base table 
> may repair all hosts/ranges in the MV table, and vice versa. Basically 
> scanning either base or MV is effectively scanning the whole cluster (modulo 
> what you can avoid in the clean/dirty repaired sets). This makes me really, 
> really concerned with how it scales, and how likely it is to be able to 
> schedule automatically without blowing up. 
> 
> The paxos vs accord comments so far are interesting in that I think both 
> could be made to work, but I am very concerned about how the merkle tree 
> comparisons are likely to work with wide partitions leading to massive fanout 
> in ranges. 
> 
> 



Re: [DISCUSS] Replace airlift/airline library with Picocli

2025-05-09 Thread Maxim Muzafarov
Hello everyone,

The commands have been migrated to picocli and are ready for review,
and we need a second committer to review them.
Would anyone be able to help?

Key points:
- All the commands are backwards-compatible with everything we have in
the trunk now (including the accord commands);
- The commands' help output also matches the trunk (no difference from
the UX point of view);
- Test coverage has also been significantly increased (most of the
changes are new tests);

https://github.com/apache/cassandra/pull/2497/files
https://issues.apache.org/jira/browse/CASSANDRA-17445
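
For readers unfamiliar with Picocli, here is a minimal sketch of the 
annotation style the migration relies on. The @Command/@Option annotations and 
CommandLine.execute() are real Picocli API; the command itself is an invented 
example, not one of the migrated nodetool commands:

import java.util.concurrent.Callable;
import picocli.CommandLine;
import picocli.CommandLine.Command;
import picocli.CommandLine.Option;

// Invented example command; real nodetool commands would also carry the
// JMX connection options mentioned above, factored out into shared mixins.
@Command(name = "status", description = "Print cluster status")
public class StatusCommand implements Callable<Integer>
{
    @Option(names = { "-r", "--resolve-ip" },
            description = "Show node domain names instead of IPs")
    private boolean resolveIp;

    @Override
    public Integer call()
    {
        // A real command would talk to the server over JMX here.
        System.out.println("resolve-ip=" + resolveIp);
        return 0;  // exit code
    }

    public static void main(String[] args)
    {
        System.exit(new CommandLine(new StatusCommand()).execute(args));
    }
}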

On Mon, 15 Jul 2024 at 20:53, Maxim Muzafarov  wrote:
>
> Hello everyone,
>
>
> I want to continue the discussion that was originally started here
> [2], however, it's better to move it to a new thread with an
> appropriate title, so that everyone is aware of the replacement
> library we're trying to agree on.
>
> The question is:
> Does everyone agree with using Picocli as an airlift/airline
> replacement for our cli tools?
> The prototype to look at is here [1].
>
>
> The reasons are as follows:
>
> Why to replace?
>
> There are several cli tools that rely on the airlift/airline library
> to mark up the commands: NodeTool, JMXTool, FullQueryLogTool,
> CompactionStress (with the size of the NodeTool dominating the rest of
> the tools). The airline is no longer maintained, so we will have to
> update it sooner or later anyway.
>
>
> What criteria?
>
> Before we dive into the pros and cons of each candidate, I think we
> have to formulate criteria for the libraries we are considering, based
> on what we already have in the source code (from Cassandra's
> perspective). This in itself limits the libraries we can consider.
>
> Criteria can be as follows:
> - Library licensing, including risks that it may change in the future
> (the asf libs are the safest for us from this perspective);
> - Similarity of library design (to the airline). This means that the
> closer the libraries are, the easier it is to migrate to them, and the
> easier it is to get guarantees that we haven't broken anything. The
> further away the libraries are, the more extra code and testing we
> need;
> - Backward compatibility. The ideal case is where the user doesn't
> even notice that a different library is being used under the hood.
> This includes both the help output and command output.
>
> Of course, all libraries need to be known and well-maintained.
>
> What candidates?
>
>
> Picocli
> https://picocli.info/
>
> This is the well-known cli library under the Apache 2.0 license, which
> is similar to what we have in source code right now. This also means
> that the amount of changes (despite the number of the commands)
> required to migrate what we have is quite small.
> In particular, I would like to point out that:
> - It allows us to unbind the jmx-specific command options from the
> commands themselves, so that they can be reused in other APIs (my
> goal);
> - We can customize the help output so that the user doesn't notice
> anything while using of the nodetool;
> - The cli parser is the same as what we now do with cli arguments.
>
> This makes the library a good candidate, but leaves open the question
> of changing the license of the lib in the future. However, these risks
> are relatively small because the CLI library is not a monetizable
> thing, as I believe. We can also mitigate the risks copying the lib to
> sources, as it mentioned here:
> https://picocli.info/#_getting_started
>
>
> commons-cli
> https://commons.apache.org/proper/commons-cli/
>
> In terms of licenses, it is the easiest candidate for us to use as
> it's under the asf, and in fact the library is already used in e.g.
> BulkLoader, SSTableExport.
> However, I'd like to point out the following disadvantages the library
> has for our case:
> - This is not a drop-in replacement for the airline commands, as the
> lib does not have annotation for markup commands. We have to flesh out
> all the options we have as java classes, or create our owns;
> - Subcommands have to be supported manually, which requires extra
> effort to adopt the cli parser (correct me if I'm wrong here). We have
> at least several subcommands in the NodeTool e.g. cms describe, cms
> snapshot;
> - Apart from parsing the cli arguments, we need to manually initialize
> the command class and set the input arguments we have.
>
>
> JCommander
> https://jcommander.org/
>
> The library is licensed under the Apache 2.0 license, so the situation
> is the same as for Picocli. Here I'd like to point out a few things I
> encountered while prototyping:
> - Unbinding the jmx-specific options from commands is quite tricky and
> requires touching an internal API (which I won't do). Option
> inheritance is not the way to go if we want to have a clear command
> hierarchy regardless of the API used.
> - We won't be able to inject a Logger (the Output class in terms of
> NodeTool) or other abstractions (e.g. MBeans) directly into the
> comm

Re: Cassandra 5+ JDK Minimum Compatibility Requirement

2025-05-09 Thread Benedict
I think it doesn’t cost us much to briefly discuss new language features before 
using them. Lambdas, Streams and var all have problems - and even with the 
guidance we publish some are still misused.

The flow scoping improvement to instanceof seems obviously good though.

> On 9 May 2025, at 12:30, Josh McKenzie  wrote:
> 
> 
> For new feature work on trunk, targeting the highest supported language level 
> featureset (jdk17 right now, jdk21 within the next couple of weeks) makes 
> sense to me. For bugfixing, targeting the oldest supported GA branch and the 
> highest language level that works there would allow maximum flexibility with 
> minimal re-implementation.
> 
> If anyone has any misgivings with certain features (i.e. the debate around 
> usage of "var") they can bring it up on the dev ML and we can adjust, but 
> otherwise I'd prefer to see us have more modern evolving options on how 
> contributors engage rather than less.
> 
>> On Fri, May 9, 2025, at 1:56 AM, Vivekanand Koya wrote:
>> Hello,
>> 
>> I want to understand the community's thoughts on using newer features (post 
>> JDK11) in upcoming releases in Cassandra. An example is flow scoping instead 
>> of explicitly casting types with instanceOf: https://openjdk.org/jeps/395. I 
>> want your thoughts on JDK requirements for the main Cassandra repository, 
>> Accord, and Sidecar. 
>> 
>> Much appreciated.
>> Thanks,
>> Vivekanand K.  
> 


Re: Cassandra 5+ JDK Minimum Compatibility Requirement

2025-05-09 Thread Vivekanand Koya
Made some progress. After adding 
throughout build.xml and compiling the 5.03 branch with openjdk 17.0.15 
(OpenJDK Runtime Environment Temurin-17.0.15+6, build 17.0.15+6, 2025-04-15), 
I got a BUILD FAILED error at the same position in the exception. Please see:
https://github.com/apache/cassandra/pull/4152

While debugging, it appears there is an idiosyncrasy in how Netty was used for
efficient network operations. The unsafe casting was highlighted by the
compiler and eventually made its way to runtime. I drew a dependency graph
between types. It appears Java natively supports such functionality with
Project Loom (https://openjdk.org/jeps/444) (
https://inside.java/2021/05/10/networking-io-with-virtual-threads/). I
understand that this is only part of the story. Please correct me if my
reasoning is wrong; I wish to learn from your experience and see your
insights.
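
For a sense of what flow scoping looks like at a Netty call site, here is a 
small hypothetical handler, invented for illustration; it is not the code 
referenced in the PR above:

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// Hypothetical handler: Netty hands messages over as Object, so handlers
// historically test-and-cast. Flow scoping keeps the same runtime behaviour
// with the binding checked by the compiler.
class FrameHandler extends ChannelInboundHandlerAdapter
{
    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg)
    {
        if (msg instanceof ByteBuf buf)
        {
            try
            {
                System.out.println("readable bytes: " + buf.readableBytes());
            }
            finally
            {
                buf.release();  // reference-counted buffers must be released
            }
        }
        else
        {
            ctx.fireChannelRead(msg);  // pass anything else down the pipeline
        }
    }
}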

Thanks,
Vivekanand K.

On Fri, May 9, 2025 at 1:30 PM Brandon Williams  wrote:

> We thought we had this figured out when we did the big bang switch to
> ByteBuffers, then spent years finding subtle bugs that the tests
> didn't.
>
> Kind Regards,
> Brandon
>
> On Fri, May 9, 2025 at 3:24 PM Jon Haddad  wrote:
> >
> > There’s a pretty simple solution here - breaking it up into several
> smaller patches.
> >
> > * Any changes should include tests that validate the checks are used
> correctly.
> > * It should also alleviate any issues with code conflicts and rebasing
> as the merges would happen slowly over time rather than all at once.
> > * If there’s two committers willing to spend time and work with OP on
> this, that should be enough to move it forward.
> > * There's a thread on user@ right now [1] where someone *just* ran into
> this issue, so I'd say addressing that one is a reasonable starting point.
> >
> > [1] https://lists.apache.org/thread/ykkwhjdpgyqzw5xtol4v5ysz664bxxl3
> >
> >
> >
> > Jon
> >
> >
> > On Fri, May 9, 2025 at 12:16 PM C. Scott Andreas 
> wrote:
> >>
> >> My thinking is most closely aligned with Blake and Benedict’s views
> here.
> >>
> >> For the specific refactor in question, I support adoption of the
> language feature for new code or to cut existing code over to the new
> syntax as changes are made to the respective areas of the codebase. But I
> don’t support a sweeping project-wide refactor on trunk in this case.
> >>
> >> Here is my thinking:
> >>
> >> - If there are 2000 target sites for the refactor, that means this is
> going to be a 5000+ line diff.
> >> - The safety improvement here is marginal but nonzero.
> >> - If we have a 5000 line refactor, it should accomplish a significant
> and compelling purpose in the project.
> >> - Any execution of the refactor will require manual review of each of
> those 2000 refactor sites on the part of the implementer and two reviewers.
> >> - Since the check is compile-time, we’d learn that by the initial
> refactor the first time it’s compiled, and we short-circuit to having
> gained 100% of the value by being able to fix the broken callsites.
> >> - The act of that per-call site review would inform us as to whether we
> had incorrect casts; and we would immediately achieve the value of the
> “safer” approach by having identified the bugs.
> >> - 2x reviewer coverage for a 5000 line patch set is a significant
> commitment of reviewer resources. These reviewer resources have significant
> opportunity cost and can be put to a better purpose.
> >> - Blake/others mention that such refactors create conflicts when bug
> fixes are backported to previous releases, requiring refactors of those
> rebased patches to bring fixes to versions that predate the large refactor.
> >>
> >> I think this is a good language feature. I think we should use it. I
> think it’d be completely reasonable to cut existing syntax over to it as we
> make changes to the respective subsystems.
> >>
> >> But I wouldn’t do a big bang refactor in this case. The juice isn’t
> worth the squeeze for me.
> >>
> >> - Scott
> >>
> >> On May 9, 2025, at 11:33 AM, Blake Eggleston 
> wrote:
> >>
> >> 
> >>
> >> No one is treating the codebase like a house of cards that can’t be
> touched.
> >>
> >> In this case I think the cost/risk of doing this change outweighs the
> potential benefits the project might see from it. Josh counts ~2000
> instances where we’re casting objects so we’re talking about a
> not-insignificant change which may introduce its own bugs. Even if no new
> bugs are introduced, this will be a refactor annoyance for projects in
> development, but the real concern I have with any large change is how it
> complicates the process of fixing bugs across versions. On the other hand,
> I don’t think that incorrectly casting objects has historically been a
> source of pain for us, so it seems like the benefit would be small if any.
> >>
> >> On Fri, May 9, 2025, at 10:38 AM, Jon Haddad wrote:
> >>
> >> Why not?
> >>
> >> Personally, I hate the idea of treating a codebase (any codebase) like
> a house of cards that c

Re: [DISCUSS] CEP-48: First-Class Materialized View Support

2025-05-09 Thread Runtian Liu
I’ve added a new section on isolation and consistency.
In our current design, materialized-view tables stay eventually consistent,
while the base table offers linearizability. Here, “strict consistency”
refers to linearizable base-table updates, with every successful write
ensuring that the corresponding MV change is applied and visible.

>Why mandate `LOCAL_QUORUM` instead of using the consistency level
requested by the application? If they want to use `LOCAL_QUORUM` they can
always request it.

I think you meant LOCAL_SERIAL? Right, LOCAL_SERIAL should not be mandatory
and users should be able to select which consistency to use. Updated the
page for this one.

For the example David mentioned, LWT cannot support it. Since LWTs operate on
a single token, we’ll need to restrict base-table updates to one
partition—and ideally one row—at a time. A current MV base-table command
can delete an entire partition, but doing so might touch hundreds of MV
partitions, making consistency guarantees impossible. Limiting each
operation’s scope lets us ensure that every successful base-table write is
accurately propagated to its MV. Even with an Accord-backed MV, I think we
will need to limit the number of rows that get modified each time.

Regarding repair, due to bugs, operator errors, or hardware faults, MVs can
become out of sync with their base tables—regardless of the chosen
synchronization method during writes. The purpose of MV repair is to detect
and resolve these mismatches using the base table as the source of truth.
As a result, if data resurrection occurs in the base table, the repair
process will propagate that resurrected data to the MV.

>One is the lack of tombstones / deletions in the merkle trees, which makes
properly dealing with writes/deletes/inconsistency very hard (afaict)

Tombstones are excluded because a base table update can produce a tombstone
in the MV—for example, when the updated cell is part of the MV's primary
key. Since such tombstones may not exist in the base table, we can only
compare live data during MV repair.

> repairing a single partition in the base table may repair all
hosts/ranges in the MV table,

That’s correct. To avoid repeatedly scanning both tables, the proposed
solution is for all nodes to take a snapshot first. Then, each node scans
the base table once and the MV table once, generating a list of Merkle
trees from each scan. These lists are then compared to identify mismatches.
This means MV repair must be performed at the table level rather than one
token range at a time to be efficient.

>If a row containing a column that creates an MV partition is deleted, and
the MV isn't updated, then how does the merkle tree approach propagate the
deletion to the MV? The CEP says that anti-compaction would remove extra
rows, but I am not clear on how that works. When is anti-compaction
performed in the repair process and what is/isn't included in the outputs?

Let me illustrate this with an example:

We have the following base table and MV:

CREATE TABLE base (pk int, ck int, v int, PRIMARY KEY (pk, ck));
CREATE MATERIALIZED VIEW mv AS SELECT * FROM base PRIMARY KEY (ck, pk);

Assume there are 100 rows in the base table (e.g., (1,1), (2,2), ...,
(100,100)), and accordingly, the MV also has 100 rows. Now, suppose the row
(55,55) is deleted from the base table, but due to some issue, it still
exists in the MV.

Let's say each Merkle tree covers 20 rows in both the base and MV tables,
so we have a 5x5 grid—25 Merkle tree comparisons in total. Suppose the
repair job detects a mismatch in the range base(40–59) vs MV(40–59).

On the node that owns the MV range (40–59), anti-compaction will be
triggered. If all 100 rows were in a single SSTable, it would be split into
two SSTables: one containing the 20 rows in the (40–59) range, and the
other containing the remaining 80 rows.

On the base table side, the node will scan the (40–59) range, identify all
rows that map to the MV range (40–59)—which in this example would be 19
rows—and stream them to the MV node. Once streaming completes, the MV node
can safely mark the 20-row SSTable as obsolete. In this way, the extra row
in MV is removed.

The core idea is to reconstruct the MV data for base range (40–59) and MV
range (40–59) using the corresponding base table range as the source of
truth.
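
To make the 5x5 grid walkthrough concrete, here is a toy, self-contained 
sketch, assuming 20-row ranges and simple XOR-combined row hashes in place of 
real Merkle trees. Everything here is illustrative; the actual proposal 
operates on token ranges, SSTables, and anti-compaction:

import java.util.*;

class MvRepairGridSketch
{
    static final int RANGES = 5;  // 5x5 grid as in the example above

    // 20 rows per range: keys 1-20 -> 0, 21-40 -> 1, 41-60 -> 2, ...
    static int rangeOf(int key) { return Math.min((key - 1) / 20, RANGES - 1); }

    // grid[baseRange][mvRange], accumulated from (pk, ck) rows; XOR keeps the
    // cell hash independent of scan order.
    static long[][] scan(List<int[]> rows)
    {
        long[][] grid = new long[RANGES][RANGES];
        for (int[] row : rows)
            grid[rangeOf(row[0])][rangeOf(row[1])] ^= Objects.hash(row[0], row[1]);
        return grid;
    }

    public static void main(String[] args)
    {
        List<int[]> base = new ArrayList<>(), mv = new ArrayList<>();
        for (int i = 1; i <= 100; i++) { base.add(new int[]{i, i}); mv.add(new int[]{i, i}); }
        base.removeIf(r -> r[0] == 55);  // (55,55) deleted from base but still present in the MV

        long[][] b = scan(base), m = scan(mv);
        for (int i = 0; i < RANGES; i++)
            for (int j = 0; j < RANGES; j++)
                if (b[i][j] != m[i][j])
                    System.out.printf("mismatch: base range %d vs MV range %d%n", i, j);
        // prints a single mismatch for the cell covering rows 41-60,
        // mirroring the base(40-59) vs MV(40-59) mismatch in the example
    }
}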


On Fri, May 9, 2025 at 2:26 PM David Capwell  wrote:

> The MV repair tool in Cassandra is intended to address inconsistencies
> that may occur in materialized views due to various factors. This component
> is the most complex and demanding part of the development effort,
> representing roughly 70% of the overall work.
>
>
> but I am very concerned about how the merkle tree comparisons are likely
> to work with wide partitions leading to massive fanout in ranges.
>
>
> As far as I can tell, bein

Re: [DISCUSS] CEP-48: First-Class Materialized View Support

2025-05-09 Thread Benedict
I should add that I’m in favour of this proposal in principle, and support the 
proposal to utilise Paxos.

> On 9 May 2025, at 08:21, Benedict Elliott Smith  wrote:
> 
> I’d also like to explore a bit further the isolation guarantees we’re 
> promising with "Strict Consistency Mode” - and the protocol details. By 
> strict, do we mean linearizable? Either way, we should state the guarantees 
> explicitly so we can evaluate whether the protocol can meet them. Also, if 
> the protocol is not linearisable, we should leave design space for a 
> genuinely strict mode later.
> 
> It isn’t clearly stated in the design document, but it seems to me that 
> safety with this approach requires a SERIAL base-table read for every MV 
> read to ensure the view is consistent with the base table. This means the 
> MV cannot meaningfully replicate any data, only keys that can be used to 
> consult the base table. Is that a reasonable inference for “strict” mode?
> 
> Using LOCAL_SERIAL for this purpose (as it seems the document proposes) 
> cannot provide strict guarantees, and mixing LOCAL_SERIAL with SERIAL is 
> generally considered unsafe - so we need to explore a bit more in the 
> design document what this means, but once we understand the isolation 
> guarantees we're promising that will be easier.
> 
>> On 9 May 2025, at 02:13, Jeff Jirsa  wrote:
>> 
>> Setting aside the paxos vs accord conversation (though admittedly my first 
>> question would have been “why not accord”), I’m curious from folks who 
>> have thought about this how you’re thinking about correctness of repair
>> 
>> I ask because I have seen far more data resurrection cases than I have 
>> lost write cases, so repair here propagates that resurrection? Is that the 
>> expected primary behavior? I know repair also propagates resurrection in 
>> many cases (once tombstones purge), but has anyone running MVs in real 
>> life seen mismatches caused by lost writes instead of by something else 
>> (like resurrection)?
>> 
>>> On May 8, 2025, at 5:44 PM, Runtian Liu  wrote:
>>> 
>>> Here’s my perspective:
>>> 
>>> #1 Accord vs. LWT round trips
>>> 
>>> Based on the insights shared by the Accord experts, it appears that 
>>> implementing MV using Accord can achieve a comparable number of round 
>>> trips as the LWT solution proposed in CEP-48. Additionally, it seems that 
>>> the number of WAN RTTs might be fewer than the LWT solution through 
>>> Accord. This suggests that Accord is either equivalent or better in terms 
>>> of performance for CEP-48.
>>> 
>>> Given this, it seems appropriate to set aside performance as a deciding 
>>> factor when evaluating LWT versus Accord. I've also updated the CEP-48 
>>> page to reflect this clarification.
>>> 
>>> #2 Accord vs. LWT current state
>>> 
>>> Accord
>>> 
>>> Accord is poised to significantly reshape Apache Cassandra's future and 
>>> stands out as one of the most impactful developments on the horizon. The 
>>> community is genuinely excited about its potential.
>>> 
>>> That said, the recent mailing list update on Accord (CEP-15) highlights 
>>> that substantial work remains to mature the protocol entirely. In 
>>> addition, real-world testing is still needed to validate its readiness. 
>>> Beyond that, users will require additional time to evaluate and adopt 
>>> Cassandra 6.x in their environments.
>>> 
>>> LWT
>>> 
>>> On the other hand, LWT has been proven and has been hitting production at 
>>> scale for many years.
>>> 
>>> #3 Dev work for CEP-48
>>> 
>>> The CEP-48 design has two major components.
>>> 
>>> Online path (CQL Mutations)
>>> 
>>> This section focuses on the LWT code path where any mutation to a base 
>>> table (via CQL insert, update, or delete) reliably triggers the 
>>> corresponding materialized view (MV) update. The development effort 
>>> required for this part is relatively limited, accounting for 
>>> approximately 30% of the total work.
>>> 
>>> If we need to implement this on Accord, this would be a similar effort as 
>>> the LWT.
>>> 
>>> Offline path (MV Data Repair)
>>> 
>>> The MV repair tool in Cassandra is intended to address inconsistencies 
>>> that may occur in materialized views due to various factors. This 
>>> component is the most complex and demanding part of the development 
>>> effort, representing roughly 70% of the overall work.
>>> 
>>> #4 Accord is mentioned as a Future Alternative in CEP-48
>>> 
>>> Accord has always been top of mind, and we genuinely appreciate the 
>>> thought and effort that has gone into its design and implementation - 
>>> We’re excited about the changes, and if you look at the CEP-48 proposal, 
>>> Accord is listed as a 'Future Alternative' — not as a 'Rejected 
>>> Alternative' — to make clear that we continue to see value in its 
>>> approach and are not opposed to it.
>>> 
>>> Based on #1, #2, #3, and #4, here is my thinking:
>>> 
>>> Scenario#1: CEP-15 prod takes longer than CEP-48 merge
>>> 
>>> Since we're starting with LWT, there is no dependency on the progress of 
>>> CEP-15. This means the community can benefit from CEP-48 independently of 
>>> CEP-15's timeline. Additionally, it's possible to backport the changes 
>>> from trunk to the current broadly adopted Cassandra release (4.1.x), 
>>> enabling adoption before upgrading to 6.x.
>>> 
>>> Scenario#2: CEP-15 prod qualified before CEP-48 merge
>>> 
>>> As noted in #3, developing on top of Accord is a relatively small effort 
>>> of the overall CEP-48 scope. Therefore, we can imple

Re: [DISCUSS] CEP-48: First-Class Materialized View Support

2025-05-09 Thread Benedict Elliott Smith
I’d also like to explore a bit further the isolation guarantees we’re promising 
with "Strict Consistency Mode” - and the protocol details. By strict, do we 
mean linearizable? Either way, we should state the guarantees explicitly so we 
can evaluate whether the protocol can meet them. Also, if the protocol is not 
linearisable, we should leave design space for a genuinely strict mode later.

It isn’t clearly stated in the design document, but it seems to me that safety 
with this approach requires a SERIAL base-table read for every MV read to 
ensure the view is consistent with the base table. This means the MV cannot 
meaningfully replicate any data, only keys that can be used to consult the base 
table. Is that a reasonable inference for “strict” mode?

Using LOCAL_SERIAL for this purpose (as it seems the document proposes) cannot 
provide strict guarantees, and mixing LOCAL_SERIAL with SERIAL is generally 
considered unsafe - so we need to explore a bit more in the design document 
what this means, but once we understand the isolation guarantees we're 
promising that will be easier.


> On 9 May 2025, at 02:13, Jeff Jirsa  wrote:
> 
> Setting aside the paxos vs accord conversation (though admittedly my first 
> question would have been “why not accord”), I’m curious from folks who have 
> thought about this how you’re thinking about correctness of repair
> 
> I ask because I have seen far more data resurrection cases than I have lost 
> write cases, so repair here propagates that resurrection? Is that the 
> expected primary behavior? I know repair also propagates resurrection in many 
> cases (once tombstones purge), but has anyone running MVs in real life seen 
> mismatches caused by lost writes instead of by something else (like 
> resurrection)?
> 
> 
>> On May 8, 2025, at 5:44 PM, Runtian Liu  wrote:
>> 
>> 
>> Here’s my perspective:
>> 
>> #1 Accord vs. LWT round trips
>> 
>> Based on the insights shared by the Accord experts, it appears that 
>> implementing MV using Accord can achieve a comparable number of round trips 
>> as the LWT solution proposed in CEP-48. Additionally, it seems that the 
>> number of WAN RTTs might be fewer than the LWT solution through Accord. This 
>> suggests that Accord is either equivalent or better in terms of performance 
>> for CEP-48.
>> 
>> Given this, it seems appropriate to set aside performance as a deciding 
>> factor when evaluating LWT versus Accord. I've also updated the CEP-48 page 
>> to reflect this clarification.
>> 
>> #2 Accord vs. LWT current state
>> 
>> Accord 
>> 
>> Accord is poised to significantly reshape Apache Cassandra's future and 
>> stands out as one of the most impactful developments on the horizon. The 
>> community is genuinely excited about its potential.
>> 
>> That said, the recent mailing list update on Accord (CEP-15) highlights 
>> that substantial work remains to mature the 
>> protocol entirely. In addition, real-world testing is still needed to 
>> validate its readiness. Beyond that, users will require additional time to 
>> evaluate and adopt Cassandra 6.x in their environments.
>> 
>> LWT
>> 
>> On the other hand, LWT has been proven and has been hitting production at 
>> scale for many years.
>> 
>> #3 Dev work for CEP-48
>> 
>> The CEP-48 design has two major components.
>> 
>> Online path (CQL Mutations)
>> 
>> This section focuses on the LWT code path where any mutation to a base table 
>> (via CQL insert, update, or delete) reliably triggers the corresponding 
>> materialized view (MV) update. The development effort required for this part 
>> is relatively limited, accounting for approximately 30% of the total work.
>> 
>> If we need to implement this on Accord, this would be a similar effort as 
>> the LWT.
>> 
>> Offline path (MV Data Repair)
>> 
>> The MV repair tool in Cassandra is intended to address inconsistencies that 
>> may occur in materialized views due to various factors. This component is 
>> the most complex and demanding part of the development effort, representing 
>> roughly 70% of the overall work.
>> 
>> #4 Accord is mentioned as a Future Alternative in CEP-48
>> 
>> Accord has always been top of mind, and we genuinely appreciate the thought 
>> and effort that has gone into its design and implementation -  We’re excited 
>> about the changes, and if you look at the CEP-48 proposal, Accord is listed 
>> as a 'Future Alternative' — not as a 'Rejected Alternative' — to make clear 
>> that we continue to see value in its approach and are not opposed to it.
>> 
>> 
>> 
>> Based on #1, #2, #3, and #4, here is my thinking:
>> 
>> Scenario#1: CEP-15 prod takes longer than CEP-48 merge
>> 
>> Since we're starting with LWT, there is no dependency on the progress of 
>> CEP-15. This means the community can benefit from CEP-48 independently of 
>> CEP-15's timeline. Additionally, it's possible to backport the changes from 
>> trunk t