I've taken a closer look at the code. I've been able to reproduce the issue
and have produced a fix. Along the way, I've noticed a few things.


   1. initiateStreaming() method of OutboundConnectionInitiator is called
   from NettyStreamingConnectionFactory located in the streaming.async
   package. Similar method invocation initiateMessaging() method
   of OutboundConnectionInitiator is made from OutboundConnection (net
   package). Thus, the enum Outcome is package-private. The sanity check based
   on OutboundConnectionInitiator.Result.Outcome could be performed from
   OutboundConnection class. The same check cannot be performed
   from NettyStreamingConnectionFactory since the outcome field of Result is
   inaccessible.
   2. There is an attempt to cast to the generic type SuccessType in
   OutboundConnectionIntiator.Result class. This behavior is inconsistent with
   the retry() and incompatible() methods which return their respective
   classes.
   3. In NettyStreamingConnectionFactory, there appears to be some
   confusion in the involcation of isSuccess() method. It actually is making
   the invocation on Netty Future. It should have been on the Result object.
   On making a successful connect, NettyStreamingConnectionFactory calls
   success() on Future' s getNow() without checking the type of the cast.
   4. There are no tests for initiateStreaming() method
   of OutboundConnectionInitiator as there are for initiateMessaging() method
   of OutboundConnectionInitiator.

What I've done.

   1. I wrote a test (StreamingTest) that reproduces the issue in
   https://lists.apache.org/thread/ykkwhjdpgyqzw5xtol4v5ysz664bxxl3.
   2. I used the instanceof in https://openjdk.org/jeps/394 to make
   incorrect comparisons a compile-time error. This is done in
   OutboundConnection where I check if result.success() instanceof
   MessagingSuccess and OutboundConnectionInitiator where I return Success
   safely instead.

Please see the pull request: https://github.com/apache/cassandra/pull/4152

Thanks,
Vivekanand Koya




On Mon, Jun 2, 2025 at 4:23 PM Vivekanand Koya <[email protected]>
wrote:

> Hello Everyone,
>
> I was debugging
> https://lists.apache.org/thread/ykkwhjdpgyqzw5xtol4v5ysz664bxxl3 and
> found the issue. The Result inner class has a circular dependency on its
> inner classes. (
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundConnectionInitiator.java#L457).
> I have refactored the Result class into an individual file (Result.java).
> The refactored code compiles successfully by Ant. I am now working to
> resolve the dependency between the superclass and subclass. (
> https://github.com/vivekkoya/cassandra/commit/1e5178dd8a8a523eb490c753ee28ff966abe9fc3
> )
>
> At the same time, I found the deprecated warnings from using
> SecurityManager annoying so I commented out all the code in src/ and test/
> directories which caused those warnings and got the code to compile with
> only one warning from an Ant jar file.
> (
> https://github.com/vivekkoya/cassandra/commit/8b7f5e150ddb678a46bd661f595a8875d1329451
> )
>
> Appreciate any feedback
>
> Thanks,
> Vivekanand K.
>
>
> On Sun, May 11, 2025 at 5:27 AM Josh McKenzie <[email protected]>
> wrote:
>
>> breaking it up into several smaller patches.
>>
>> This immediately made me think of poor Blake and the "remove singletons"
>> sisyphean task. That was 700 smaller patches! :D  Which to be fair isn't
>> "several" by any measure, but...
>>
>> All of which is to say - it's a continuum between big-banging it in one
>> or death by 700 cuts. Not to grind an axe (I'm totally grinding an axe),
>> but if we head cleanly delineated modular boundaries in this codebase with
>> clear separation of concerns, we could tackle a refactor like this on a
>> subsystem by subsystem basis (i.e. batch: the middle ground between big
>> bang and death-by-a-thousand) to strike a sweet spot between two extremes.
>>
>> I'm sympathetic to the pragmatic reality of the disruption sweeping
>> changes to modernize cause to a project ecosystem, but my opinion is that
>> the march of time and evolution of our language ecosystem is *really*
>> leaving us behind without some batched, focused work on modernization. This
>> codebase has some jekyll-and-hyde vibes; when you git blame and see <= 2010
>> and svn import commit messages all over a file it's very much a red flag
>> that you're probably in shark-infested waters.
>>
>> On Sat, May 10, 2025, at 11:53 AM, Vivekanand Koya wrote:
>>
>> It looks like there is a potential solution to the indeterministic
>> bytebuffer:
>> https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/foreign/MemoryLayout.html
>> & https://archive.fosdem.org/2020/schedule/event/bytebuffers/
>>
>> Thanks,
>> Vivekanand K.
>>
>>
>> On Fri, May 9, 2025, 8:59 PM Vivekanand Koya <[email protected]>
>> wrote:
>>
>> Made some progress. After adding <compilerarg value="-Xlint:unchecked"/>
>> throughout build.xml and compiling the 5.03 branch with openjdk 17.0.15
>> 2025-04-15
>> OpenJDK Runtime Environment Temurin-17.0.15+6 (build 17.0.15+6) I got a
>> build Failed error at the same position in exception. Please see:
>> https://github.com/apache/cassandra/pull/4152
>>
>> While debugging, it appears there is an idiosyncrasy how Netty was used
>> for efficient network operations. The unsafe casting was highlighted by the
>> compiler and eventually made its way to runtime. I drew a dependency graph
>> between types. It appears Java natively supports such functionality with
>> Project Loom (https://openjdk.org/jeps/444) (
>> https://inside.java/2021/05/10/networking-io-with-virtual-threads/). I
>> understand that this is only part of the story. Please correct me if my
>> reasoning is wrong, wish to learn from your experience. Wish to see your
>> insights.
>>
>> Thanks,
>> Vivekanand K.
>>
>> On Fri, May 9, 2025 at 1:30 PM Brandon Williams <[email protected]> wrote:
>>
>> We thought we had this figured out when we did the big bang switch to
>> ByteBuffers, then spent years finding subtle bugs that the tests
>> didn't.
>>
>> Kind Regards,
>> Brandon
>>
>> On Fri, May 9, 2025 at 3:24 PM Jon Haddad <[email protected]>
>> wrote:
>> >
>> > There’s a pretty simple solution here - breaking it up into several
>> smaller patches.
>> >
>> > * Any changes should include tests that validate the checks are used
>> correctly.
>> > * It should also alleviate any issues with code conflicts and rebasing
>> as the merges would happen slowly over time rather than all at once.
>> > * If there’s two committers willing to spend time and work with OP on
>> this, that should be enough to move it forward.
>> > * There's a thread on user@ right now [1] where someone *just* ran
>> into this issue, so I'd say addressing that one is a reasonable starting
>> point.
>> >
>> > [1] https://lists.apache.org/thread/ykkwhjdpgyqzw5xtol4v5ysz664bxxl3
>> >
>> >
>> >
>> > Jon
>> >
>> >
>> > On Fri, May 9, 2025 at 12:16 PM C. Scott Andreas <[email protected]>
>> wrote:
>> >>
>> >> My thinking is most closely aligned with Blake and Benedict’s views
>> here.
>> >>
>> >> For the specific refactor in question, I support adoption of the
>> language feature for new code or to cut existing code over to the new
>> syntax as changes are made to the respective areas of the codebase. But I
>> don’t support a sweeping project-wide refactor on trunk in this case.
>> >>
>> >> Here is my thinking:
>> >>
>> >> - If there are 2000 target sites for the refactor, that means this is
>> going to be a 5000+ line diff.
>> >> - The safety improvement here is marginal but nonzero.
>> >> - If we have a 5000 line refactor, it should accomplish a significant
>> and compelling purpose in the project.
>> >> - Any execution of the refactor will require manual review of each of
>> those 2000 refactor sites on the part of the implementer and two reviewers.
>> >> - Since the check is compile-time, we’d learn that by the initial
>> refactor the first time it’s compiled, and we short-circuit to having
>> gained 100% of the value by being able to fix the broken callsites.
>> >> - The act of that per-call site review would inform us as to whether
>> we had incorrect casts; and we would immediately achieve the value of the
>> “safer” approach by having identified the bugs.
>> >> - 2x reviewer coverage for a 5000 line patch set is a significant
>> commitment of reviewer resources. These reviewer resources have significant
>> opportunity cost and can put to a better purpose.
>> >> - Blake/others mention that such refactors create conflicts when bug
>> fixes are backported to previous releases, requiring refactors of those
>> rebased patches to bring fixes to versions that predate the large refactor.
>> >>
>> >> I think this is a good language feature. I think we should use it. I
>> think it’d be completely reasonable to cut existing syntax over to it as we
>> make changes to the respective subsystems.
>> >>
>> >> But I wouldn’t do a big bang refactor in this case. The juice isn’t
>> worth the squeeze for me.
>> >>
>> >> - Scott
>> >>
>> >> On May 9, 2025, at 11:33 AM, Blake Eggleston <[email protected]>
>> wrote:
>> >>
>> >> 
>> >>
>> >> No one is treating the codebase like a house of cards that can’t be
>> touched.
>> >>
>> >> In this case I think the cost/risk of doing this change outweighs the
>> potential benefits the project might see from it. Josh counts ~2000
>> instances where we’re casting objects so we’re talking about a
>> not-insignificant change which may introduce it’s own bugs. Even if no new
>> bugs are introduced, this will be an refactor annoyance for projects in
>> development, but the real concern I have with any large change is how it
>> complicates the process of fixing bugs across versions. On the other hand,
>> I don’t think that incorrectly casting objects has historically been a
>> source of pain for us, so it seems like the benefit would be small if any.
>> >>
>> >> On Fri, May 9, 2025, at 10:38 AM, Jon Haddad wrote:
>> >>
>> >> Why not?
>> >>
>> >> Personally, I hate the idea of treating a codebase (any codebase) like
>> a house of cards that can't be touched.  It never made sense to me to try
>> to bundle new features / bug fixes with improvements to code quality.
>> >>
>> >> Making the code more reliable should be a goal in itself, rather than
>> a side effect of other work.
>> >>
>> >> Jon
>> >>
>> >>
>> >>
>> >> On Fri, May 9, 2025 at 10:31 AM Blake Eggleston <[email protected]>
>> wrote:
>> >>
>> >>
>> >> This seems like a cool feature that will be useful in future
>> development work, but not something we should be proactively refactoring
>> the project to make use of.
>> >>
>> >> On Fri, May 9, 2025, at 10:18 AM, Vivekanand Koya wrote:
>> >>
>> >> I would say that https://openjdk.org/jeps/394 (instanceOf) aims to
>> provide safer and less poisoning in the code by default. Instead of having
>> a production server halt/impaired due to a RuntimeException, instead it is
>> verified at compile time. If a new language feature makes code more robust
>> and addresses a hazardous, historical design choice, I believe it's time
>> has arrived. Curious to see what everyone thinks.
>> >>
>> >> Thanks,
>> >> Vivekanand K.
>> >>
>> >> On Fri, May 9, 2025 at 9:51 AM Josh McKenzie <[email protected]>
>> wrote:
>> >>
>> >>
>> >> I would like to refactor the codebase (Trunk 5+) to eliminate unsafe
>> explicit casting with instanceOf.
>> >>
>> >> We have a rich history of broad sweeping refactors dying on the rocks
>> of the community's aversion to instability and risk w/out a concrete
>> outcome we're trying to achieve. :)
>> >>
>> >> All of which is to say: do we have examples of instanceOf casting
>> blowing things up for users that would warrant going through the codebase
>> to tidy this up? Between src/java and test/unit and test/distributed we
>> have around 2,000 occurrences of this pattern.
>> >>
>> >> On Fri, May 9, 2025, at 10:14 AM, Vivekanand Koya wrote:
>> >>
>> >> Sounds great. I would like to refactor the codebase (Trunk 5+) to
>> eliminate unsafe explicit casting with instanceOf.
>> >>
>> >> Thanks,
>> >> Vivekanand
>> >>
>> >> On Fri, May 9, 2025, 5:19 AM Benedict Elliott Smith <
>> [email protected]> wrote:
>> >>
>> >> Yep, that approach seems more than sufficient to me. No need for lots
>> of ceremony, but good to keep everyone in the decision loop.
>> >>
>> >> On 9 May 2025, at 13:10, Josh McKenzie <[email protected]> wrote:
>> >>
>> >> I think it doesn’t cost us much to briefly discuss new language
>> features before using them.
>> >>
>> >> I had that thought as well but on balance my intuition was there were
>> enough new features that the volume of discussion to do that would be a
>> poor cost/benefit compared to the "lazy consensus, revert" approach.
>> >>
>> >> So if I actually do the work required to have an opinion ;):
>> >>
>> https://docs.oracle.com/en/java/javase/21/language/java-language-changes-release.html#GUID-6459681C-6881-45D8-B0DB-395D1BD6DB9B
>> >>
>> >> JDK21:
>> >> - Record Patterns
>> >> - Pattern Matching for switch Expressions and Statements
>> >> - String Templates
>> >> - Unnamed Patterns and Variables
>> >> - Unnamed Classes and Instance Main Methods
>> >> JDK17:
>> >> - Sealed Classes
>> >> JDK16:
>> >> - Pattern Matching for instanceof
>> >> JDK15:
>> >> - Text Blocks
>> >> JDK14:
>> >> - Switch Expressions
>> >> JDK11:
>> >> - Local Variable Type Inference (test only, not prod code is where we
>> landed)
>> >>
>> >> Assuming we just lazily evaluate and deal with new features as people
>> actually care about them and seeing them add value, a simple "[DISCUSS] I'm
>> thinking about using new language feature X; any objection?" lazy consensus
>> that we then dumped onto a wiki article / code style page as "stuff we're
>> good to use" would probably be fine?
>> >>
>> >>
>> >> On Fri, May 9, 2025, at 7:58 AM, Benedict wrote:
>> >>
>> >>
>> >> I think it doesn’t cost us much to briefly discuss new language
>> features before using them. Lambdas, Streams and var all have problems -
>> and even with the guidance we publish some are still misused.
>> >>
>> >> The flow scoping improvement to instanceof seems obviously good though.
>> >>
>> >>
>> >> On 9 May 2025, at 12:30, Josh McKenzie <[email protected]> wrote:
>> >>
>> >> 
>> >> For new feature work on trunk, targeting the highest supported
>> language level featureset (jdk17 right now, jdk21 within the next couple of
>> weeks) makes sense to me. For bugfixing, targeting the oldest supported GA
>> branch and the highest language level that works there would allow maximum
>> flexibility with minimal re-implementation.
>> >>
>> >> If anyone has any misgivings with certain features (i.e. the debate
>> around usage of "var") they can bring it up on the dev ML and we can
>> adjust, but otherwise I'd prefer to see us have more modern evolving
>> options on how contributors engage rather than less.
>> >>
>> >> On Fri, May 9, 2025, at 1:56 AM, Vivekanand Koya wrote:
>> >>
>> >> Hello,
>> >>
>> >> I want to understand the community's thoughts on using newer features
>> (post JDK11) in upcoming releases in Cassandra. An example is flow scoping
>> instead of explicitly casting types with instanceOf:
>> https://openjdk.org/jeps/395. I want your thoughts on JDK requirements
>> for the main Cassandra repository, Accord, and Sidecar.
>> >>
>> >> Much appreciated.
>> >> Thanks,
>> >> Vivekanand K.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>>
>>
>>

Reply via email to