Sounds like a reasonable plan to me. +1
For the tests, maybe we can have two test classpaths for a while? One for driver 3 and one for driver 4? That way we don’t need to migrate them all in one big bang patch. They could be moved over a few at a time, making review much easier.
On Feb 12, 2025, at 6:35 PM, Jon Haddad <j...@rustyrazorblade.com> wrote:
Hey Andy,
This seems like a reasonable proposal.
We can probably skip cassandra-stress, since it looks like easy-cass-stress can be donated. That does need a driver upgrade to support a vector workload, but imo there's no point in investing more in cassandra-stress when we have an alternative with more features available. Not a hill I'm going to die on, just an opportunity to do less work.
Jon
Hi All,
I'd like to propose decoupling the java driver as a dependency from the core
Cassandra server code.
I also want to propose a path towards eventually migrating test and tools code
from Apache Cassandra java driver 3.x to 4.x when the time is right for the
project.
Refactoring test code to 4.x is likely to be quite invasive, as I count
128 source files utilizing driver code. We'd want to find a good time to do
this to minimize disruption to ongoing development.
Java driver 4.x is effectively a rewrite of the 3.x driver. Its first release
was in March of 2019. While it has similar APIs, it is not binary compatible
with the 3.x driver [1].
While there hasn't been a clear decision on how the 3.x driver will be
supported going forward (although we should consider discussing this!), we
expect, and have seen, active development take place almost exclusively
on the 4.x driver.
It would be useful to migrate to the 4.x driver to test new and future features
that the 4.x driver will actively support. For example, the 4.x driver
supports Vector types, where the 3.x driver does not.
I've gone through the codebase and identified the following uses of the driver:
0. Core code that uses the driver
* UntypedResultSet uses CodecUtils.fromUnsignedToSignedInt from the driver,
which just adds Integer.MIN_VALUE to an int, so it can easily be removed.
* PreparedStatementHelper is used only by dtest fuzz tests to validate
Prepared Statements. Can be moved to test code.
* ThreadAwareSecurityManager.checkPermission makes reference to skipping
checking accessDeclaredMembers due to use of CodecUtils; that can probably be
removed once CodecUtils is no longer used.
* sstableloader uses the driver to fetch schema and metadata
1. Tools that use the driver
* fqltool replay (replaying queries from captured logs)
* cassandra-stress (making queries to generate load)
2. Test code
* Understandably, quite a bit of test code uses the driver. This is where I
anticipate the most work would be needed.
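To illustrate the UntypedResultSet item above, here is a minimal sketch of a local replacement for the driver helper. The class name UnsignedInts is hypothetical, and the body assumes the helper does nothing beyond the addition described above; where it would actually live in the server code is an open question.

```java
// Hypothetical local replacement for the driver's
// CodecUtils.fromUnsignedToSignedInt, so that UntypedResultSet
// no longer needs the driver on the server classpath.
final class UnsignedInts {
    private UnsignedInts() {}

    // Shifts the unsigned range [0, 2^32) onto the signed int range by
    // adding Integer.MIN_VALUE. Java int arithmetic wraps on overflow,
    // so inputs with the high bit set map to non-negative results.
    static int fromUnsignedToSignedInt(int unsigned) {
        return unsigned + Integer.MIN_VALUE;
    }
}
```

With this in place, unsigned 0 maps to Integer.MIN_VALUE and the midpoint 2^31 (the bit pattern of Integer.MIN_VALUE) maps to 0.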
I'd like to propose doing the following:
Can be done now:
* Move sstableloader source into its own tools directory, much like fqltool
and cassandra-stress. For compatibility, we could retain the existing shell
script entry point (bin/sstableloader).
* Update remaining core code to remove all use of the driver. As shown above,
there is not much to change here and this should be relatively easy to
accomplish.
* Update the build and scripts to establish separate classpaths for the server
and the respective tools. We would exclude the driver and its dependencies
(that aren't required otherwise) from the server. The driver would still be
included in the built package, so this wouldn't reduce the size of the
binary, but it would remove the driver from the server's classpath, which
would de-risk upgrading the driver and having it or its dependencies cause
possible runtime issues.
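One way the classpath separation above might be guarded against regressions is a small check that the driver is not loadable from the server's classpath. This is only a sketch: the class DriverClasspathCheck and its method are hypothetical, and it assumes com.datastax.driver.core.Cluster is a suitable marker class for the 3.x driver.

```java
// Hypothetical helper for a test that runs with the server's classpath.
final class DriverClasspathCheck {
    // Assumed marker class for the 3.x driver.
    static final String DRIVER_3X_MARKER_CLASS = "com.datastax.driver.core.Cluster";

    private DriverClasspathCheck() {}

    // Returns true if the named class can be located by the given loader.
    // The class is not initialized, so no static initializers are run.
    static boolean isOnClasspath(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

A server-side test could then assert that isOnClasspath(DRIVER_3X_MARKER_CLASS, loader) is false, while the tools' test classpaths would expect it to be true.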
To be done next:
* Refactor sstableloader, fqltool and cassandra-stress to use the 4.x driver.
To be done when the timing works for the project:
* Refactor tests to use the 4.x driver.
Hopefully this proposed approach makes sense. I'd be eager to hear any
feedback or suggestions!
Thanks,
Andy
[1]: https://docs.datastax.com/en/developer/java-driver/4.17/upgrade_guide/index.html#4-0-0