Which cassandra client for Python should we use (in the context of Python 3.12) ?
Hello dear Cassandra community,

I am a PMC member of Apache Airflow, and recently we started looking at our Cassandra provider in the context of the Python 3.12 migration; the integration caught my interest.

TL;DR: I am quite confused about which client we should use to be future-proof, and I would appreciate the community's advice on it. I would also like to understand why there is no community-managed client, as it seems that with the current approach any Python project (including ASF ones) is pretty much forced to use a 3rd-party-managed way to use Cassandra, which I find rather strange.

Context:

So far in Apache Airflow we have been using https://github.com/datastax/python-driver/ to connect to Cassandra. While working on Python 3.12 compatibility and looking at the driver, I discovered something strange.

This driver is published on PyPI as "Cassandra driver" [1], which raises a bit of a question about trademarks - until now I was convinced this driver was managed by the Cassandra community, but on closer inspection it turned out that it is - in fact - the Datastax driver. I find it pretty confusing, to be honest, and with all the debate about ASF trademarks, this should IMHO raise a few eyebrows and prompt a PMC reaction - if you ask me. As a PMC member of Apache Airflow I am responsible for raising trademark issues when I see them, and this one seems to be at odds with the ASF rules. And if I am confused by the PyPI naming, then I am pretty sure many of the users are as well.

Note that I am not attacking anyone with that; I just noticed that this should likely be handled by the PMC somehow (or that would be my advice, at least, as a fellow ASF member and PMC member of a friendly ASF project).

But that's a bit tangential to the problem. Coming back to the main problem.
I did quite some research and it turned out that the driver still uses the asyncore stdlib module by default (which is removed in Python 3.12). Even if theoretically we could use the libev reactor, it does not work out of the box with the released .whl, even if the proper libraries are installed - you really have to take an sdist and build the package with gcc configured and libev4/libev-devel installed.

Another option, as far as I understand, is to use the asyncio reactor [2] - but as I understand from the issue [3], this support is still experimental and it's not ready for prime time.

This is all captured in the PR [4] where I work on Python 3.12 compatibility, and Cassandra is - literally - the last remaining provider for which we have to decide what to do.

Having to build from the sdist makes the driver rather useless for us - not only would we complicate our testing / tooling setup (we have ~90 providers and a pretty complicated system to manage dependencies already), but our users who want to use Python 3.12 would also be required to do the same, which is quite a blocker. And handling user issues in this case would become rather tiring.

In the same PR, Israel Fruchter - who helped us with the Cassandra issue - suggested that another option is to use the ScyllaDB driver, which is 100% compatible and is published and released by Scylla [5]. I tested it and the .whl packages work nicely with libev installed - as expected (and initially Israel thought the Datastax driver would work similarly). From Israel's explanation, Datastax and Scylla are cooperating on the driver (in fact the Scylla one is a fork of the Datastax one), but there is no insight into who builds the packages and how (which also raised my eyebrow, because it seems that - unlike in the ASF - the process of building and releasing the package is not transparent and verifiable).
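The reactor options discussed above (libev, asyncio, asyncore) can be probed at runtime. What follows is only a minimal sketch: the module and class names are assumed to match the layout of the Datastax driver, and the helper `first_importable` is hypothetical and not part of any driver. It walks the candidates in order of preference and returns `None` when none of them can be imported, for example when the driver is not installed at all.

```python
import importlib

# Candidate reactors in order of preference. The module and class names are
# assumptions based on the Datastax driver layout; adjust for other drivers.
REACTOR_CANDIDATES = [
    ("cassandra.io.libevreactor", "LibevConnection"),        # needs the libev C extension
    ("cassandra.io.asyncioreactor", "AsyncioConnection"),    # experimental per the thread
    ("cassandra.io.asyncorereactor", "AsyncoreConnection"),  # asyncore is gone in 3.12
]


def first_importable(candidates):
    """Return the first attribute that can be imported, or None if none can."""
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except Exception:
            # Broad on purpose: a missing C extension or stdlib module may
            # surface as ImportError or as a driver-specific exception.
            continue
    return None


connection_class = first_importable(REACTOR_CANDIDATES)
print("usable reactor:", connection_class)
```

If a class is found, it could be handed to the driver through the `connection_class` argument of `Cluster`; if the result is `None`, no reactor is usable in that environment.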
Now - we have two choices:

1) We can use "cassandra-driver" (which really is a "datastax driver") and disable the Cassandra provider for users of Airflow on Python 3.12 until Datastax fixes the compatibility with Python 3.12

2) We can switch to the Scylla driver and release the next provider with Python 3.12 support

So ... having provided all the context, I have two questions:

Q1: What solution would the community recommend here? I understand the community has no impact on Datastax's decisions and effort in releasing those drivers, so you can at most ask Datastax to fix the compatibility issue. As a user I have no insight into the relations between the Cassandra community, Datastax and Scylla, so I am reaching out here as the place to advise me on which option is best. (This I am asking as a confused user.)

Q2: I find it pretty worrying that such an important interface (the data world is driven by Python) is not under the community "umbrella" - it seems that a very important thing for the users of Cassandra is managed and controlled by 3rd parties, and the users (as in this case) are pretty much left at the "mercy" (for lack of a better word) of those 3rd parties - they are the ones who decide whether Python 3.12 users are able to use Cassandra. If I had such a situation in Airflow, I would be deeply worried as part of the PMC. Also what adds to that is the potent
Re: Which cassandra client for Python should we use (in the context of Python 3.12) ?
Ah. And also to add - I created this issue in the Datastax tracker asking to add libev support to the compiled .whl package they release:

[6] cassandra-driver for Python 3.12 Linux is compiled without libev support: https://datastax-oss.atlassian.net/jira/software/c/projects/PYTHON/issues/PYTHON-1378

On Wed, Feb 21, 2024 at 10:26 AM Jarek Potiuk wrote:

> [...]
Re: Which cassandra client for Python should we use (in the context of Python 3.12) ?
This is cool - thanks Jeff for this explanation, that helps us make informed decisions. Really appreciate it!

Very encouraging for the future :) - I think then, if the donation is ongoing, choosing cassandra-driver (which I understand will become ASF-owned) is definitely our preference.

And no - we do not have to release it now. We can definitely wait - we can just exclude Python 3.12 support until the .whl has libev support (I hope my issue will be handled soon by Datastax :). Then we can re-enable Python 3.12 support and add instructions for our users to make sure libev is installed on Python 3.12. So it does not block us now, and we have a clear vision of the way forward.

BTW, I looked at the links - they were mostly about the Java driver and mention the Python driver as the next logical step (agreed) - is there anything happening currently with it? There is a doc link that I have no access to, but it would be great to know when it might happen. I am just eager to see it happen.

J.

On Wed, Feb 21, 2024 at 12:53 PM Jeff Jirsa wrote:

> On 2024/02/21 09:26:53 Jarek Potiuk wrote:
> > [...]
> > So far in Apache Airflow we were using
> > https://github.com/datastax/python-driver/ to connect to Cassandra, but
> > when we worked on Python 3.12 compatibility. While looking at it, I
> > discovered something strange
>
> Mid-donated to the foundation:
>
> CEP: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation
>
> [Private@]: https://lists.apache.org/thread/gor4b5l1hc4yokmcmpnhkfvg52w7rpp0
>
> Status in board report: https://apache.org/foundation/records/minutes/2023/board_minutes_2023_08_16.txt
>
> The Scylla version is a fork WITH ADDITIONS that work with implementation details of ScyllaDB not present in Apache Cassandra.
>
> Preference: use the "Datastax" driver under donation if at all possible, and get it fixed as rapidly as is practical, but given that Scylla has already fixed the issue in theirs and it's an Apache-licensed fork of the same code, if you have to ship something to remain functional, that seems like a reasonable fallback.
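The plan in this message (keep cassandra-driver, leave Python 3.12 out until the wheel ships with libev) could be expressed as a small runtime guard. This is only a sketch under stated assumptions: `cassandra.io.libevwrapper` is assumed to be the name of the driver's libev C extension, and both helper functions are hypothetical, not Airflow or driver APIs.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be located (parents are imported, `name` is not)."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. the driver itself) is missing.
        return False


def cassandra_provider_enabled(python_version, libev_available):
    """Hypothetical gate: on Python 3.12+ require libev, since asyncore is gone."""
    if python_version >= (3, 12):
        return libev_available
    return True  # older Pythons can still fall back to asyncore


# Assumed name of the compiled libev extension inside the driver package.
libev_ok = module_available("cassandra.io.libevwrapper")
print("provider enabled:", cassandra_provider_enabled(sys.version_info[:2], libev_ok))
```

Once a wheel with libev support is published, the same check would start returning `True` on Python 3.12 without any change to the guard itself.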
Re: Which cassandra client for Python should we use (in the context of Python 3.12) ?
That all sounds great! Thanks for all the information, Bret.

On Wed, Feb 21, 2024 at 8:57 PM Bret McGuire wrote:

> To add some additional information to what's already on this thread: PYTHON-1378 is actively being looked into. An initial look has suggested a likely cause; it's very likely this was an oversight stemming from the move to cibuildwheel. Assuming I can confirm that, a fix will then be provided. All of that work will be managed on PYTHON-1378.
>
> Regarding the question about asyncore vs. asyncio: as Jarek correctly pointed out, we have PYTHON-1375 to represent the work of moving to asyncio. I'll also mention that we've begun defining what will be included in the next Python driver release. Let's call it 3.30.0, although (as always) that's subject to change. This release is currently slated to include three major changes:
>
> * Stabilize asyncio reactor and make it the default (PYTHON-1375 <https://datastax-oss.atlassian.net/browse/PYTHON-1375>)
> * Officially get off nose and move to pytest (PYTHON-1297 <https://datastax-oss.atlassian.net/browse/PYTHON-1297>)
> * Extend vector support to variable length types (PYTHON-1369 <https://datastax-oss.atlassian.net/browse/PYTHON-1369>)
>
> As mentioned above, everything is subject to change, but as of this writing the current plan is that PYTHON-1375 will be included in the next release. This can be tracked via the "Fix version" on the various tickets above (yup, we already have a 3.30.0 release in JIRA). You can also follow along on the Python driver mailing list <https://groups.google.com/a/lists.datastax.com/g/python-driver-user>; I'll likely be starting a more detailed discussion on some of these points there soon.
>
> Thanks!
>
> - Bret -
>
> On Wed, Feb 21, 2024 at 7:58 AM Jarek Potiuk wrote:
>
> > [...]