That gets my vote as well Joan Haddad

On 17/3/25 8:19, Jon Haddad wrote:
It should be possible to modularize the code without breaking it into separate repos.  I don't know how to do it with ant, but with Gradle [1] you use subprojects.  I think Maven calls them multi module projects [2].  They're just folders in the same repo that are treated as dependencies. For example, we could have, at a top level:

accord
interfaces
server
sstable-reader

We'd publish all 4 artifacts as separate dependencies, but we can consume them internally without publishing.

At least with Gradle, you can get incremental builds, so you don't have to recompile the subprojects if they don't change.

Jon

[1] https://docs.gradle.org/current/userguide/multi_project_builds.html
[2] https://maven.apache.org/guides/mini/guide-multiple-modules.html


On Mon, Mar 17, 2025 at 7:15 AM Benedict <bened...@apache.org> wrote:

    I can only speak for myself, but the overhead of managing the
    accord submodule has been fairly low. It does mean opening two PRs
    when a change touches both projects (which is often the case for
    accord), but I think for utility classes this would be infrequent
    anyway.

    I think any modules within Cassandra proper (including eg APIs)
    should live in the same repository though, since they are already
    necessarily coupled.

    On 17 Mar 2025, at 03:08, Dinesh Joshi <djo...@apache.org> wrote:

    
    Definitely supportive of modularizing code but from a developer
    productivity standpoint we should discuss the overhead of
    managing changes across multiple repos.

    On Sun, Mar 16, 2025 at 4:26 AM Benedict Elliott Smith
    <bened...@apache.org> wrote:

        I want to break out at least one or two shared library
        projects. Both accord and in-jvm-dtest-api should share code
        with the Cassandra main project, particularly
        executors/futures/collections/concurrency utilities. This is
        something that has caused me some recurring friction over the
        past few years, so if there’s appetite I may try to pursue it
        in the near future.

        I also like the idea of defining our public APIs in a
        separate jar/folder/source tree. This helpfully also solves
        the never-ending discussion topic of how we define what our
        public APIs are. I don’t have any cycles for this, but I
        doubt it would be controversial.

        I am less sure about how we might go about breaking up the
        internals of Cassandra itself, but the accord project is
        perhaps a step in this direction.

        That all said, plugin dependencies are a much easier problem
        than this. We don’t need to run the plugins on their own
        threads; they just need their own class loader - which is
        anyway probably a good idea. We can perhaps even reuse the
        logic we already have for loading UDFs, but relax some of the
        restrictions.
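
A minimal sketch of the "own class loader" idea, assuming a child-first loader per plugin: the plugin's jars are searched first, so its copy of a library wins inside the plugin, and only the JDK plus a configured list of shared API packages are delegated to the host. The names PluginClassLoader and sharedPrefixes are hypothetical and are not taken from Cassandra's UDF loading code.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

// Hypothetical child-first class loader for one plugin (illustration only).
final class PluginClassLoader extends URLClassLoader {
    // Package prefixes that must resolve to the host's classes, e.g. the plugin API.
    private final List<String> sharedPrefixes;

    PluginClassLoader(URL[] pluginJars, ClassLoader host, List<String> sharedPrefixes) {
        super(pluginJars, host);
        this.sharedPrefixes = List.copyOf(sharedPrefixes);
    }

    // True when the class must come from the parent (host) loader.
    boolean isShared(String className) {
        if (className.startsWith("java.")) {
            return true; // java.* may only be defined by the JDK's own loaders
        }
        for (String prefix : sharedPrefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                if (isShared(name)) {
                    // Normal parent-first delegation for shared classes.
                    loaded = super.loadClass(name, false);
                } else {
                    try {
                        // Child-first: look in the plugin's own jars before the host.
                        loaded = findClass(name);
                    } catch (ClassNotFoundException notInPlugin) {
                        loaded = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }
}
```

With something like this, a plugin that bundles its own snakeyaml would see its bundled version, while a shared prefix such as "org.apache.cassandra.auth." would keep interface types identical on both sides.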


        On 6 Mar 2025, at 21:27, Josh McKenzie
        <jmcken...@apache.org> wrote:

            I've gotten the impression that there's not a lot of
            enthusiasm for breaking apart the main Cassandra module,
            but I have wondered if it'd be worth making an exception
            for the interfaces plugins are supposed to code against

        Oh, there's /plenty/ of enthusiasm. There's been a shortage
        of consensus however. /For now./ :D

        I think breaking out the interfaces first makes a lot of
        sense as that'd allow us to focus almost purely on build
        dependency and environmental factors w/out having to reason
        through implementation code movements and encapsulation
        breakage. I believe there's folks working on exploring the
        current build system through the lens of requirements to
        break out shared deps; I'll see if I can't rustle them up.

        On Thu, Mar 6, 2025, at 4:06 PM, Joel Shepherd wrote:

        Splitting this out from the CEP-36 thread.

        I agree: dependency collisions at run-time are a problem.
        It's made even worse by the possibility of users using
        multiple plugins (authn, authz, compression, storage, etc.).

        It also cuts two ways. E.g. the interfaces that plugin
        authenticators need to implement are defined in
        org.apache.cassandra.auth, so as far as I know the plugin
        has to take a build-time dependency on the main Cassandra
        module itself, and pull in all of its dependencies. (I'd
        love to be told that I'm mistaken.) In addition to the risk
        of version conflicts, it increases the risk of a change to
        Cassandra's own dependencies inadvertently breaking a
        plugin that's taken a transitive dependency. Might be bad
        form on the plugin's part, but certainly possible.

        I've gotten the impression that there's not a lot of
        enthusiasm for breaking apart the main Cassandra module,
        but I have wondered if it'd be worth making an exception
        for the interfaces plugins are supposed to code against.
        It'd be nice to depend on those without pulling in the rest
        of the project, and it'd be another step towards reducing
        the risk of plugins breaking because of dependency changes
        in the main project.

        -- Joel.

        On 3/6/2025 10:52 AM, Jon Haddad wrote:
        Hey Joel, thanks for chiming in!

        Regarding dependencies - while it's possible to provide
        pluggable interfaces, the issue I'm concerned about is
        conflicting versions of transitive dependencies at
        runtime.  For example, I used a java agent that had a
        different version of snakeyaml, and it ended up breaking
        C*'s startup sequence [1].  I suggest putting external
        modules on separate threads with their own classpath to
        avoid this issue.
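
When chasing this kind of conflict, it can help to ask the JVM which jar and which class loader actually supplied a class. A small diagnostic sketch using only JDK calls; the helper name ClassOrigin is hypothetical.

```java
import java.security.CodeSource;

// Hypothetical helper: reports where a loaded class came from.
final class ClassOrigin {
    static String describe(Class<?> type) {
        // The code source is null for classes supplied by the bootstrap loader.
        CodeSource source = type.getProtectionDomain().getCodeSource();
        String from = (source == null || source.getLocation() == null)
                ? "<bootstrap or unknown>"
                : source.getLocation().toString();

        // getClassLoader() returns null for bootstrap classes.
        ClassLoader loader = type.getClassLoader();
        String by = (loader == null)
                ? "bootstrap class loader"
                : loader.getClass().getName();

        return type.getName() + " loaded from " + from + " by " + by;
    }
}
```

Calling ClassOrigin.describe on a class from the suspect library (for example the YAML parser class in use) should show whether it was resolved from the agent's jar or from the server's own lib directory.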

        I think there's quite a bit of overlap between the two
        desires expressed in this thread, even though they achieve
        very different results.  I personally can't see myself
        using something that treats an object store as cold
        storage where SSTables are moved (implying they weren't
        there before), and I've expressed my concerns with this,
        but other folks seem to want it and that's OK.  I feel
        very strongly that treating local storage as a cache with
        the full dataset on object store is a better approach, but
        ultimately different people have different priorities.
        Either way, stuff is moved to object store at some point,
        and pulled to the local disk on demand.

        I am *firmly* of the position that this CEP should not
        exclude the local storage as cache option, and should be
        accounted for in the design.

        Jon

        [1] https://issues.apache.org/jira/browse/CASSANDRA-19663


        On Thu, Mar 6, 2025 at 10:31 AM Joel Shepherd
        <sheph...@amazon.com> wrote:

            On 3/6/2025 7:16 AM, Jon Haddad wrote:
            Assuming everything else is identical, might not
            matter for S3. However, not every object store has a
            filesystem mount.

            Regarding sprawling dependencies, we can always make
            the provider specific libraries available as a
            separate download and put them on their own thread
            with a separate class path. I think the in-JVM dtest does
            this already. Someone just started asking about IAM
            for login; it sounds like a similar problem.

            That was me. :-) Cassandra's auth already has fairly
            well defined interfaces and a plug-in mechanism, so
            it's easy to vend alternative auth solutions without
            polluting the main project's dependency graph, at
            build-time anyway. A similar approach could be
            beneficial for CEP-36, particularly (IMO) for
            cold-storage purposes. I suspect decoupling pluggable
            alternate channel proxies for cold storage from
            configurable alternate channel proxies for redirecting
            data locally to free up space, migrate to a different
            storage device, etc., would make both easier. The CEP
            seems to be trying to do both, but they smell like
            pretty different goals to me.

            Thanks -- Joel.


            On Thu, Mar 6, 2025 at 12:53 AM Benedict
            <bened...@apache.org> wrote:

                I think another way of saying what Stefan may be
                getting at is what does a library give us that an
                appropriately configured mount dir doesn’t?

                We don’t want to treat S3 the same as local disk,
                but this can be achieved easily with config. Is
                there some other benefit of direct integration?
                Well defined exceptions if we need to distinguish
                cases is one that maybe springs to mind but
                perhaps there are others?


                On 6 Mar 2025, at 08:39, Štefan Miklošovič
                <smikloso...@apache.org> wrote:
                
                That is cool, but this still does not show or
                explain what it would look like when it comes to
                the dependencies needed for actually talking to
                storage like s3.

                Maybe I am missing something here, and please
                correct me if I am mistaken, but if I understand
                correctly, to talk to s3 we would need to use a
                library like this (1), right? So that would be
                added to Cassandra's dependencies?
                Hence Cassandra starts to be biased against s3?
                Why s3? Every time somebody comes up with a new
                remote storage support, that would be added to
                classpath as well? How are these dependencies
                going to play with each other and with Cassandra
                in general? Will all these storage
                provider libraries for arbitrary clouds be even
                compatible with Cassandra licence-wise?

                I am sorry I keep repeating these questions, but
                this is the part I just don't get at all.

                We can indeed add an API for this, sure, why
                not. But for people who do not want to deal
                with this at all and are OK with just a mounted
                FS, why would we block them from doing that?

                (1) https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/pom.xml

                On Wed, Mar 5, 2025 at 3:28 PM Mick Semb Wever
                <m...@apache.org> wrote:



                        It’s not an area where I can currently
                        dedicate engineering effort. But if
                        others are interested in contributing a
                        feature like this, I’d see it as
                        valuable for the project and would be
                        happy to collaborate on
                        design/architecture/goals.



                    Jake mentioned 17 months ago a custom
                    FileSystemProvider we could offer.

                    None of us at DataStax has gotten around to
                    providing that, but to quickly throw
                    something over the wall this is it:
                    https://github.com/datastax/cassandra/blob/main/src/java/org/apache/cassandra/io/storage/StorageProvider.java
                    (with a few friend classes under o.a.c.io.util)

                    We then have a RemoteStorageProvider,
                    private in another repo, that implements
                    that and also provides the
                    RemoteFileSystemProvider that Jake refers to.
                    Hopefully that's a start to get people
                    thinking about CEP level details, while we
                    get a cleaned-up abstraction of
                    RemoteStorageProvider and friends to offer.
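
For readers unfamiliar with the JDK mechanism behind the FileSystemProvider idea: code written against java.nio.file.Path and Files is dispatched by URI scheme to whichever java.nio.file.spi.FileSystemProvider is installed. A rough sketch using only JDK classes follows; it does not show the DataStax StorageProvider, and the class name ProviderAgnosticIo is hypothetical. Only the built-in "file" scheme is exercised here; a remote scheme would need a provider supplied separately, which is an assumption about how such a plugin could be packaged.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.spi.FileSystemProvider;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing provider-agnostic I/O through java.nio.file.
final class ProviderAgnosticIo {
    // URI schemes of the providers the running JVM currently knows about.
    static List<String> installedSchemes() {
        List<String> schemes = new ArrayList<>();
        for (FileSystemProvider provider : FileSystemProvider.installedProviders()) {
            schemes.add(provider.getScheme());
        }
        return schemes;
    }

    // Reads a whole file; the provider is chosen from the URI scheme.
    static String readAll(URI location) {
        try {
            Path path = Paths.get(location);
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes content to a temporary local file and returns its file: URI.
    static URI writeTemp(String content) {
        try {
            Path tmp = Files.createTempFile("provider-demo", ".txt");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
            return tmp.toUri();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The readAll method never names a concrete file system, which is the property a custom provider would rely on: the calling code stays the same and only the URI scheme and the installed provider change.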
