Re: [DISCUSS] Plugins and dependencies

2025-03-17 Thread Jon Haddad
It should be possible to modularize the code without breaking it into
separate repos.  I don't know how to do it with ant, but with Gradle [1]
you use subprojects.  I think Maven calls them multi module projects [2].
They're just folders in the same repo that are treated as dependencies.
For example, we could have, at a top level:

accord
interfaces
server
sstable-reader

We'd publish all 4 artifacts as separate dependencies, but we can consume
them internally without publishing.

At least with Gradle, you can get incremental builds, so you don't have to
recompile the subprojects if they don't change.

Jon

[1] https://docs.gradle.org/current/userguide/multi_project_builds.html
[2] https://maven.apache.org/guides/mini/guide-multiple-modules.html


On Mon, Mar 17, 2025 at 7:15 AM Benedict wrote:

> I can only speak for myself, but the overhead of managing the accord
> submodule has been fairly low. It does mean opening two PRs when a change
> touches both projects (which is often the case for accord), but I think for
> utility classes this would be infrequent anyway.
>
> I think any modules within Cassandra proper (including eg APIs) should
> live in the same repository though, since they are already necessarily
> coupled.
>
> On 17 Mar 2025, at 03:08, Dinesh Joshi wrote:
>
> 
> Definitely supportive of modularizing code but from a developer
> productivity standpoint we should discuss the overhead of managing changes
> across multiple repos.
>
> On Sun, Mar 16, 2025 at 4:26 AM Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
>> I want to break out at least one or two shared library projects. Both
>> accord and in-jvm-dtest-api should share code with the Cassandra main
>> project, particularly executors/futures/collections/concurrency utilities.
>> This is something that has caused me some recurring friction over the past
>> few years, so if there’s appetite I may try to pursue it in the near future.
>>
>> I also like the idea of defining our public APIs in a separate
>> jar/folder/source tree. This helpfully also solves the never-ending
>> discussion topic of how we define what our public APIs are. I don’t have
>> any cycles for this, but I doubt it would be controversial.
>>
>> I am less sure about how we might go about breaking up the internals of
>> Cassandra itself, but the accord project is perhaps a step in this
>> direction.
>>
>> That all said, plugin dependencies are a much easier problem than this.
>> We don’t need to run the plugins on their own threads; they just need their
>> own class loader - which is anyway probably a good idea. We can perhaps
>> even reuse the logic we already have for loading UDFs, but relax some of
>> the restrictions.
>>
>>
>> On 6 Mar 2025, at 21:27, Josh McKenzie wrote:
>>
>> I've gotten the impression that there's not a lot of enthusiasm for
>> breaking apart the main Cassandra module, but I have wondered if it'd be
>> worth making an exception for the interfaces plugins are supposed to code
>> against
>>
>> Oh, there's *plenty* of enthusiasm. There's been a shortage of consensus
>> however. *For now.* :D
>>
>> I think breaking out the interfaces first makes a lot of sense as that'd
>> allow us to focus almost purely on build dependency and environmental
>> factors w/out having to reason through implementation code movements and
>> encapsulation breakage. I believe there's folks working on exploring the
>> current build system through the lens of requirements to break out shared
>> deps; I'll see if I can't rustle them up.
>>
>> On Thu, Mar 6, 2025, at 4:06 PM, Joel Shepherd wrote:
>>
>> Splitting this out from the CEP-36 thread.
>>
>> I agree: dependency collisions at run-time are a problem. It's made even
>> worse by the possibility of users using multiple plugins (authn, authz,
>> compression, storage, etc.).
>>
>> It also cuts two ways. E.g. the interfaces that plugin authenticators
>> need to implement are defined in org.apache.cassandra.auth, so as far as I
>> know the plugin has to take a build-time dependency on the main Cassandra
>> module itself, and pull in all of its dependencies. (I'd love to be told
>> that I'm mistaken.) In addition to the risk of version conflicts, it
>> increases the risk of a change to Cassandra's own dependencies
>> inadvertently breaking a plugin that's taken a transitive dependency. Might
>> be bad form on the plugin's part, but certainly possible.
>>
>> I've gotten the impression that there's not a lot of enthusiasm for
>> breaking apart the main Cassandra module, but I have wondered if it'd be
>> worth making an exception for the interfaces plugins are supposed to code
>> against. It'd be nice to depend on those without pulling in the rest of the
>> project, and it'd be another step towards reducing the risk of plugins
>> breaking because of dependency changes in the main project.
>>
>> -- Joel.
>> On 3/6/2025 10:52 AM, Jon Haddad wrote:
>>
>> Hey Joel, thanks for chiming in!
>>
>> Regarding dependencies - while it's

Re: [DISCUSS] Plugins and dependencies

2025-03-17 Thread Berenguer Blasi

That gets my vote as well, Jon Haddad.

On 17/3/25 8:19, Jon Haddad wrote:
It should be possible to modularize the code without breaking it into 
separate repos.  I don't know how to do it with ant, but with Gradle 
[1] you use subprojects.  I think Maven calls them multi module 
projects [2].  They're just folders in the same repo that are treated 
as dependencies. For example, we could have, at a top level:


accord
interfaces
server
sstable-reader

We'd publish all 4 artifacts as separate dependencies, but we can 
consume them internally without publishing.


At least with Gradle, you can get incremental builds, so you don't 
have to recompile the subprojects if they don't change.


Jon

[1] https://docs.gradle.org/current/userguide/multi_project_builds.html
[2] https://maven.apache.org/guides/mini/guide-multiple-modules.html


On Mon, Mar 17, 2025 at 7:15 AM Benedict wrote:

I can only speak for myself, but the overhead of managing the
accord submodule has been fairly low. It does mean opening two PRs
when a change touches both projects (which is often the case for
accord), but I think for utility classes this would be infrequent
anyway.

I think any modules within Cassandra proper (including eg APIs)
should live in the same repository though, since they are already
necessarily coupled.


On 17 Mar 2025, at 03:08, Dinesh Joshi wrote:


Definitely supportive of modularizing code but from a developer
productivity standpoint we should discuss the overhead of
managing changes across multiple repos.

On Sun, Mar 16, 2025 at 4:26 AM Benedict Elliott Smith wrote:

I want to break out at least one or two shared library
projects. Both accord and in-jvm-dtest-api should share code
with the Cassandra main project, particularly
executors/futures/collections/concurrency utilities. This is
something that has caused me some recurring friction over the
past few years, so if there’s appetite I may try to pursue it
in the near future.

I also like the idea of defining our public APIs in a
separate jar/folder/source tree. This helpfully also solves
the never-ending discussion topic of how we define what our
public APIs are. I don’t have any cycles for this, but I
doubt it would be controversial.

I am less sure about how we might go about breaking up the
internals of Cassandra itself, but the accord project is
perhaps a step in this direction.

That all said, plugin dependencies are a much easier problem
than this. We don’t need to run the plugins on their own
threads; they just need their own class loader - which is
anyway probably a good idea. We can perhaps even reuse the
logic we already have for loading UDFs, but relax some of the
restrictions.
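
A minimal sketch of the class-loader isolation idea described above. It is not Cassandra's UDF loading code; the class name PluginIsolationSketch and the helpers isolatedLoader and canSee are hypothetical, chosen only for illustration. It assumes Java 9 or later and uses the JDK's platform class loader as the parent, so classes on the application classpath (and therefore the server's own versions of shared libraries) are not visible through the plugin's loader.

```java
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.List;

public class PluginIsolationSketch {

    /**
     * Hypothetical helper: builds a loader that resolves classes only from
     * the given plugin jars and from the JDK itself.
     */
    static URLClassLoader isolatedLoader(List<Path> pluginJars) {
        URL[] urls = new URL[pluginJars.size()];
        for (int i = 0; i < urls.length; i++) {
            try {
                urls[i] = pluginJars.get(i).toUri().toURL();
            } catch (MalformedURLException e) {
                throw new UncheckedIOException(e);
            }
        }
        // The parent is the platform loader, not the application loader, so
        // nothing from the server's classpath leaks into the plugin's view.
        return new URLClassLoader(urls, ClassLoader.getPlatformClassLoader());
    }

    /** Returns true if the named class can be resolved through the loader. */
    static boolean canSee(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // No plugin jars are supplied here; the point is only to show what
        // the isolated loader can and cannot resolve.
        try (URLClassLoader loader = isolatedLoader(List.of())) {
            // JDK classes remain visible through the platform loader.
            System.out.println(canSee(loader, "java.lang.String"));
            // Application classes, including this one, are not visible.
            System.out.println(canSee(loader, PluginIsolationSketch.class.getName()));
        }
    }
}
```

A real plugin loader would also have to expose the interfaces that plugins code against, for example through a parent loader that only delegates for an allow-listed set of packages. That part is omitted here, and how much of the existing UDF loading logic could be reused for it is left open, as in the message above.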



On 6 Mar 2025, at 21:27, Josh McKenzie wrote:


I've gotten the impression that there's not a lot of
enthusiasm for breaking apart the main Cassandra module,
but I have wondered if it'd be worth making an exception
for the interfaces plugins are supposed to code against

Oh, there's /plenty/ of enthusiasm. There's been a shortage
of consensus however. /For now./ :D

I think breaking out the interfaces first makes a lot of
sense as that'd allow us to focus almost purely on build
dependency and environmental factors w/out having to reason
through implementation code movements and encapsulation
breakage. I believe there's folks working on exploring the
current build system through the lens of requirements to
break out shared deps; I'll see if I can't rustle them up.

On Thu, Mar 6, 2025, at 4:06 PM, Joel Shepherd wrote:


Splitting this out from the CEP-36 thread.

I agree: dependency collisions at run-time are a problem.
It's made even worse by the possibility of users using
multiple plugins (authn, authz, compression, storage, etc.).

It also cuts two ways. E.g. the interfaces that plugin
authenticators need to implement are defined in
org.apache.cassandra.auth, so as far as I know the plugin
has to take a build-time dependency on the main Cassandra
module itself, and pull in all of its dependencies. (I'd
love to be told that I'm mistaken.) In addition to the risk
of version conflicts, it increases the risk of a change to
Cassandra's own dependencies inadvertently breaking a
plugin that's taken a transitive dependency. Might be bad
form on the plugin's part, but certainly possible.

I've gotten the impression that there's not a lot of
enthusiasm for breaking apart the main Cassandra module,
but I have wondered if it'd be worth making an exception
for the interfaces plugins are su

Re: [DISCUSS] Plugins and dependencies

2025-03-17 Thread Benedict
Yes this is what I’m suggesting - except we already have a separate repository for accord and several other systems (like dtest api, sidecar etc), and for these it makes sense to have a separate repository for shared functionality.

On 17 Mar 2025, at 07:20, Jon Haddad wrote:

It should be possible to modularize the code without breaking it into separate repos.  I don't know how to do it with ant, but with Gradle [1] you use subprojects.  I think Maven calls them multi module projects [2].  They're just folders in the same repo that are treated as dependencies.  For example, we could have, at a top level:

accord
interfaces
server
sstable-reader

We'd publish all 4 artifacts as separate dependencies, but we can consume them internally without publishing.

At least with Gradle, you can get incremental builds, so you don't have to recompile the subprojects if they don't change.

Jon

[1] https://docs.gradle.org/current/userguide/multi_project_builds.html
[2] https://maven.apache.org/guides/mini/guide-multiple-modules.html

On Mon, Mar 17, 2025 at 7:15 AM Benedict wrote:

I can only speak for myself, but the overhead of managing the accord submodule has been fairly low. It does mean opening two PRs when a change touches both projects (which is often the case for accord), but I think for utility classes this would be infrequent anyway.

I think any modules within Cassandra proper (including eg APIs) should live in the same repository though, since they are already necessarily coupled.

On 17 Mar 2025, at 03:08, Dinesh Joshi wrote:

Definitely supportive of modularizing code but from a developer productivity standpoint we should discuss the overhead of managing changes across multiple repos.

On Sun, Mar 16, 2025 at 4:26 AM Benedict Elliott Smith wrote:

I want to break out at least one or two shared library projects. Both accord and in-jvm-dtest-api should share code with the Cassandra main project, particularly executors/futures/collections/concurrency utilities. This is something that has caused me some recurring friction over the past few years, so if there’s appetite I may try to pursue it in the near future.

I also like the idea of defining our public APIs in a separate jar/folder/source tree. This helpfully also solves the never-ending discussion topic of how we define what our public APIs are. I don’t have any cycles for this, but I doubt it would be controversial.

I am less sure about how we might go about breaking up the internals of Cassandra itself, but the accord project is perhaps a step in this direction.

That all said, plugin dependencies are a much easier problem than this. We don’t need to run the plugins on their own threads; they just need their own class loader - which is anyway probably a good idea. We can perhaps even reuse the logic we already have for loading UDFs, but relax some of the restrictions.

On 6 Mar 2025, at 21:27, Josh McKenzie wrote:

I've gotten the impression that there's not a lot of enthusiasm for breaking apart the main Cassandra module, but I have wondered if it'd be worth making an exception for the interfaces plugins are supposed to code against

Oh, there's plenty of enthusiasm. There's been a shortage of consensus however. For now. :D

I think breaking out the interfaces first makes a lot of sense as that'd allow us to focus almost purely on build dependency and environmental factors w/out having to reason through implementation code movements and encapsulation breakage. I believe there's folks working on exploring the current build system through the lens of requirements to break out shared deps; I'll see if I can't rustle them up.

On Thu, Mar 6, 2025, at 4:06 PM, Joel Shepherd wrote:

Splitting this out from the CEP-36 thread.

I agree: dependency collisions at run-time are a problem. It's made even worse by the possibility of users using multiple plugins (authn, authz, compression, storage, etc.).

It also cuts two ways. E.g. the interfaces that plugin authenticators need to implement are defined in org.apache.cassandra.auth, so as far as I know the plugin has to take a build-time dependency on the main Cassandra module itself, and pull in all of its dependencies. (I'd love to be told that I'm mistaken.) In addition to the risk of version conflicts, it increases the risk of a change to Cassandra's own dependencies inadvertently breaking a plugin that's taken a transitive dependency. Might be bad form on the plugin's part, but certainly possible.

I've gotten the impression that there's not a lot of enthusiasm for breaking apart the main Cassandra module, but I have wondered if it'd be worth making an exception for the interfaces plugins are supposed to code against. It'd be nice to depend on those without pulling in the rest of the project, and it'd be another step towards reducing the risk of plugins breaking because of dependency changes in the main project.

-- Joel.

On 3/6/2025 10:52 AM, Jon Haddad wrote:

Hey Joel, th

Re: [DISCUSS] Plugins and dependencies

2025-03-17 Thread Benedict
I can only speak for myself, but the overhead of managing the accord submodule has been fairly low. It does mean opening two PRs when a change touches both projects (which is often the case for accord), but I think for utility classes this would be infrequent anyway.

I think any modules within Cassandra proper (including eg APIs) should live in the same repository though, since they are already necessarily coupled.

On 17 Mar 2025, at 03:08, Dinesh Joshi wrote:

Definitely supportive of modularizing code but from a developer productivity standpoint we should discuss the overhead of managing changes across multiple repos.

On Sun, Mar 16, 2025 at 4:26 AM Benedict Elliott Smith wrote:

I want to break out at least one or two shared library projects. Both accord and in-jvm-dtest-api should share code with the Cassandra main project, particularly executors/futures/collections/concurrency utilities. This is something that has caused me some recurring friction over the past few years, so if there’s appetite I may try to pursue it in the near future.

I also like the idea of defining our public APIs in a separate jar/folder/source tree. This helpfully also solves the never-ending discussion topic of how we define what our public APIs are. I don’t have any cycles for this, but I doubt it would be controversial.

I am less sure about how we might go about breaking up the internals of Cassandra itself, but the accord project is perhaps a step in this direction.

That all said, plugin dependencies are a much easier problem than this. We don’t need to run the plugins on their own threads; they just need their own class loader - which is anyway probably a good idea. We can perhaps even reuse the logic we already have for loading UDFs, but relax some of the restrictions.

On 6 Mar 2025, at 21:27, Josh McKenzie wrote:

I've gotten the impression that there's not a lot of enthusiasm for breaking apart the main Cassandra module, but I have wondered if it'd be worth making an exception for the interfaces plugins are supposed to code against

Oh, there's plenty of enthusiasm. There's been a shortage of consensus however. For now. :D

I think breaking out the interfaces first makes a lot of sense as that'd allow us to focus almost purely on build dependency and environmental factors w/out having to reason through implementation code movements and encapsulation breakage. I believe there's folks working on exploring the current build system through the lens of requirements to break out shared deps; I'll see if I can't rustle them up.

On Thu, Mar 6, 2025, at 4:06 PM, Joel Shepherd wrote:

Splitting this out from the CEP-36 thread.

I agree: dependency collisions at run-time are a problem. It's made even worse by the possibility of users using multiple plugins (authn, authz, compression, storage, etc.).

It also cuts two ways. E.g. the interfaces that plugin authenticators need to implement are defined in org.apache.cassandra.auth, so as far as I know the plugin has to take a build-time dependency on the main Cassandra module itself, and pull in all of its dependencies. (I'd love to be told that I'm mistaken.) In addition to the risk of version conflicts, it increases the risk of a change to Cassandra's own dependencies inadvertently breaking a plugin that's taken a transitive dependency. Might be bad form on the plugin's part, but certainly possible.

I've gotten the impression that there's not a lot of enthusiasm for breaking apart the main Cassandra module, but I have wondered if it'd be worth making an exception for the interfaces plugins are supposed to code against. It'd be nice to depend on those without pulling in the rest of the project, and it'd be another step towards reducing the risk of plugins breaking because of dependency changes in the main project.

-- Joel.

On 3/6/2025 10:52 AM, Jon Haddad wrote:

Hey Joel, thanks for chiming in!

Regarding dependencies - while it's possible to provide pluggable interfaces, the issue I'm concerned about is conflicting versions of transitive dependencies at runtime.  For example, I used a java agent that had a different version of snakeyaml, and it ended up breaking C*'s startup sequence [1].  I suggest putting external modules on separate threads with their own classpath to avoid this issue.

I think there's quite a bit of overlap between the two desires expressed in this thread, even though they achieve very different results.  I personally can't see myself using something that treats an object store as cold storage where SSTables are moved (implying they weren't there before), and I've expressed my concerns with this, but other folks seem to want it and that's OK.  I feel very strongly that treating local storage as a cache with the full dataset on object store is a better approach, but ultimately different people have different priorities.  Either way, stuff is moved to object store at some point, and pulled to the local disk on demand.

I am *firmly* of the position t