Re: Cassandra Java Driver and DataStax
Hi All,

Thanks for the replies so far. A few last questions:

1. Is Apache Cassandra useful *without* a driver? That is, can you use the database without a driver to connect to it or in the real world would your users all have to download at least one driver in order to use the DB?

2. To confirm again, at one point at least the Java driver code lived in the code-base, and further, at one point, people did submit some patches to add drivers, but the PMC didn’t want to maintain that code (and apparently they didn’t want to create any new PMC members and/or committers to do so) and so thus people started their own new projects? That right?

Thanks,
Chris

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++

On 6/4/16, 11:59 AM, "Brandon Williams" wrote:

>First off, full disclosure: contributor, committer, PMC member, and
>finally, Datastax employee, in about that order chronologically.
>
>All of the drivers, as far as I know, are Apache licensed, just as is
>Cassandra itself. There is no 'control', there is only momentum, since
>anyone can fork the code if needed and then perhaps gain the momentum if
>Datastax loses it. Nobody is locked in to anything, and no sufficient
>traction has been found to take the momentum away from Datastax yet,
>because at least in my own admittedly biased opinion, our drivers team has
>done an excellent job of accepting community feedback and requests.
>
>tl;dr don't fix what is not broken
>
>On Fri, Jun 3, 2016 at 11:11 PM, Mattmann, Chris A (3980) <
>chris.a.mattm...@jpl.nasa.gov> wrote:
>
>> Thanks Jason for the information - I’m going to continue
>> researching and hope more people will chime in that are on
>> the PMC.
>>
>> Thank you.
>>
>> Cheers,
>> Chris
>>
>> On 6/3/16, 8:33 PM, "Jason Brown" wrote:
>>
>> >The client-server protocol is well defined in the Cassandra repo, so any
>> >one may implement a client library for any language. However, it is a far
>> >from trivial task, so not many folks build their own. Thus, already-built
>> >drivers tend to become the de facto standard, but we (the Apache
>> >Cassandra committers/PMC) do not/have not blessed any vendor's driver(s)
>> >as official.
>> >
>> >As to why there is not a canonical set of drivers in the Cassandra repo,
>> >well, we've just never gotten into that game as an OSS community.
>> >
>> >-Jason (not affiliated with DataStax)
>> >
>> >On Friday, June 3, 2016, Johan Edstrom wrote:
>> >
>> >On Jun 3, 2016, at 9:14 PM, Jeff Jirsa wrote:
>> >
>> >https://github.com/hector-client/hector
>> >
>> >So - that isn’t doing CQL, Right?
>> >
>> >https://github.com/Netflix/astyanax
>> >
>> >Upgrading to CQL?
>> >
>> >http://doanduyhai.github.io/Achilles/
>> >
>> >Which driver do you use?
>> >
>> >https://github.com/noorq/casser
>> >
>> >2.1.5
>> >
>> >https://github.com/impetus-opensource/Kundera
>> >
>> >ds-driver false cassandra-core cassandra-ds-driver
>> >thrift false cassandra-core cassandra-thrift
>> >
>> >https://github.com/deanhiller/playorm
>> >
>> >- Jeff ( Not affiliated with datastax )
>> >
>> >On 6/3/16, 7:58 PM, "Johan Edstrom" wrote:
>> >
>> >How many Java drivers could you point out?
>> >Doesn’t it strike you slightly off that you’d not have a driver for a
Re: Cassandra Java Driver and DataStax
Chris,

We technically do have a barebones Java client in tree [1].
CQL was designed as an open protocol anyone can implement [2].

We really want to see a thriving ecosystem for drivers. By making CQL an open protocol vs making it some internally controlled document/code we feel it's the best way to achieve this. I'll throw up as a random example [3].

-Jake

[1]: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/SimpleClient.java
[2]: https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v4.spec
[3]: https://github.com/matehat/cqerl

On Sun, Jun 5, 2016 at 9:32 AM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:

> Hi All,
>
> Thanks for the replies so far. A few last questions:
>
> 1. Is Apache Cassandra useful *without* a driver? That is, can
> you use the database without a driver to connect to it or in the
> real world would your users all have to download at least one
> driver in order to use the DB?
>
> 2. To confirm again, at one point at least the Java driver code
> lived in the code-base, and further, at one point, people did
> submit some patches to add drivers, but the PMC didn’t want to
> maintain that code (and apparently they didn’t want to create
> any new PMC members and/or committers to do so) and so thus
> people started their own new projects? That right?
>
> Thanks,
> Chris
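As a small illustration of the "open protocol anyone can implement" point above: the native protocol v4 spec linked as [2] defines a fixed 9-byte frame header (version, flags, stream id, opcode, body length). The sketch below builds the bytes of an OPTIONS request, which has an empty body. It is only a minimal sketch under that reading of the spec, not code taken from SimpleClient or from any driver; the class name `CqlFrameSketch` and the helper `optionsFrame` are made up for this example.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only: encodes a CQL native protocol v4 frame
// header for an OPTIONS request. Names are invented for this sketch.
public class CqlFrameSketch {
    static final byte REQUEST_VERSION_V4 = 0x04; // request direction, protocol version 4
    static final byte OPCODE_OPTIONS = 0x05;     // OPTIONS opcode
    static final int HEADER_LENGTH = 9;          // 1 + 1 + 2 + 1 + 4 bytes

    // Builds a complete OPTIONS frame: a header followed by an empty body.
    public static byte[] optionsFrame(short streamId) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_LENGTH); // big-endian by default
        buf.put(REQUEST_VERSION_V4); // version byte
        buf.put((byte) 0x00);        // flags: none set
        buf.putShort(streamId);      // stream id chosen by the client
        buf.put(OPCODE_OPTIONS);     // opcode
        buf.putInt(0);               // body length: OPTIONS carries no body
        return buf.array();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : optionsFrame((short) 1)) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim()); // prints "04 00 00 01 05 00 00 00 00"
    }
}
```

Encoding one header is the easy part. A usable driver also has to manage connections, negotiate STARTUP, and decode results, which is consistent with the remark earlier in the thread that writing one is a far from trivial task.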
Re: Cassandra Java Driver and DataStax
Hi Jake,

Thanks for the email. So back to my 2 questions - and in particular #1 - a driver is needed to use Apache Cassandra, right? As in, you wouldn’t expect users of Apache Cassandra to get the database core from the ASF, and then use it without a driver (from somewhere else)?

Cheers,
Chris

On 6/5/16, 9:25 AM, "jak...@gmail.com on behalf of Jake Luciani" wrote:

>Chris,
>
>We technically do have a barebones Java client in tree [1].
>CQL was designed as an open protocol anyone can implement [2].
>
>We really want to see a thriving ecosystem for drivers. By making CQL an
>open protocol vs making it some internally controlled document/code we feel
>it's the best way to achieve this. I'll throw up as a random example [3].
>
>-Jake
>
>[1]: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/SimpleClient.java
>[2]: https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v4.spec
>[3]: https://github.com/matehat/cqerl
Re: Cassandra Java Driver and DataStax
On Sun, Jun 5, 2016 at 8:32 AM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:

> 1. Is Apache Cassandra useful *without* a driver? That is, can
> you use the database without a driver to connect to it or in the
> real world would your users all have to download at least one
> driver in order to use the DB?

The users do need to download a driver--but this is pretty normal for community-driven OSS databases. Besides the Apache projects I listed, PostgreSQL also runs on a community-maintained driver model.

> 2. To confirm again, at one point at least the Java driver code
> lived in the code-base, and further, at one point, people did
> submit some patches to add drivers, but the PMC didn’t want to
> maintain that code (and apparently they didn’t want to create
> any new PMC members and/or committers to do so) and so thus
> people started their own new projects? That right?

I think that summary over-emphasizes the governance aspect at the expense of more important considerations:

0. The very first Cassandra driver interface was Thrift. No Thrift clients were ever part of the Cassandra tree.

1. When we created the CQL protocol, we initially had a Java driver in tree as a reference implementation.

2. But due primarily to the project management issues mentioned by Nate, and secondarily to the governance aspects above, we moved quickly back to the pure community-driven drivers approach that had worked for us before.

2a. While some Apache databases do ship a Java driver in tree, I think that this hinders adoption because it signals to users that non-Java drivers are second-class citizens. (No doubt this is not the *intent* of that decision, but it is a likely consequence nevertheless.)

2b. DataStax saw CQL adoption as a key driver for Cassandra adoption and hence its own success, and hired a team to accelerate the production of drivers for the new CQL protocol. These drivers are Apache licensed and see broad community participation, e.g. with ~70 contributors to the Java driver.

2c. Neither has DataStax "sucked the oxygen out of the room." Lots of non-DataStax drivers exist as well.

As Aleksey pointed out earlier, I don't see anyone being harmed by this state of affairs. Cassandra PMC doesn't want to run drivers projects, driver authors don't want to be run by Cassandra PMC, and meanwhile users have Apache licensed drivers that let them be productive with Cassandra.
Re: Cassandra Java Driver and DataStax
Thanks for the info Jonathan. I think I have assessed, based on the replies thus far and my studying of the archives and the commit and project history, the following situation.

Unfortunately it seems like there is a bit of control going on. I’m going to call a spade a spade here. A key portion of your software’s stack, a client driver to use it, exists outside of Apache in separate communities. This is an inherent risk to the project. Some of you cite flexibility and adaptability as reasons for this - I’ve seen it in so many communities over the last 12+ years in the foundation - it’s not really due to those issues. There is definitely some control going on.

I would ask you all this - has there been a PR or patch in the past year or two that wasn’t singularly reviewed by DataStax committers and PMC? Also, as to the composition of the PMC, when was the last time a non-DataStax person was elected to the PMC and/or as a committer?

By themselves the diversity issues are not damning to the project, but taken together with the citation of other project communities, even those outside of Apache (e.g., the comments “Postgres does it this way, so it’s a good example to compare us to” or “these other 4 projects at the ASF do it like this, so X” [sic]), and with the perception being created for those who don’t work at DataStax, there is an issue here.

I would like to see a discussion in your next board report about the diversity and health issues of the project, and also some ideas about potential strategies for mitigation.

I appreciate the open and honest conversation thus far. Let’s keep it up.

Cheers,
Chris

On 6/5/16, 1:51 PM, "Jonathan Ellis" wrote:

>On Sun, Jun 5, 2016 at 8:32 AM, Mattmann, Chris A (3980) <
>chris.a.mattm...@jpl.nasa.gov> wrote:
>
>> 1. Is Apache Cassandra useful *without* a driver? That is, can
>> you use the database without a driver to connect to it or in the
>> real world would your users all have to download at least one
>> driver in order to use the DB?
>
>The users do need to download a driver--but this is pretty normal for
>community-driven OSS databases. Besides the Apache projects I listed,
>PostgreSQL also runs on a community-maintained driver model.
>
>> 2. To confirm again, at one point at least the Java driver code
>> lived in the code-base, and further, at one point, people did
>> submit some patches to add drivers, but the PMC didn’t want to
>> maintain that code (and apparently they didn’t want to create
>> any new PMC members and/or committers to do so) and so thus
>> people started their own new projects? That right?
>
>I think that summary over-emphasizes the governance aspect at the expense
>of more important considerations:
>
>0. The very first Cassandra driver interface was Thrift. No Thrift clients
>were ever part of the Cassandra tree.
>
>1. When we created the CQL protocol, we initially had a Java driver in tree
>as a reference implementation.
>
>2. But due primarily to the project management issues mentioned by Nate,
>and secondarily to the governance aspects above, we moved quickly back to
>the pure community-driven drivers approach that had worked for us before.
>
>2a. While some Apache databases do ship a Java driver in tree, I think that
>this hinders adoption because it signals to users that non-Java drivers are
>second-class citizens. (No doubt this is not the *intent* of that
>decision, but it is a likely consequence nevertheless.)
>
>2b. DataStax saw CQL adoption as a key driver for Cassandra adoption and
>hence its own success, and hired a team to accelerate the production of
>drivers for the new CQL protocol. These drivers are Apache licensed and
>see broad community participation, e.g. with ~70 contributors to the Java
>driver.
>
>2c. Neither has DataStax "sucked the oxygen out of the room." Lots of
>non-DataStax drivers exist as well.
>
>As Aleksey pointed out earlier, I don't see anyone being harmed by this
>state of affairs. Cassandra PMC doesn't want to run drivers projects,
>driver authors don't want to be run by Cassandra PMC, and meanwhile users
>have Apache licensed drivers that let them be productive with Cassandra.
Re: Cassandra Java Driver and DataStax
I am a non-DataStax-employee committer, and a large percentage of my commits are not reviewed by DataStax employees.

I see problems or areas of improvement in the code base, and directly commit them. No questions asked, no oversight, no direction at all from DataStax or their employees.

I have had a minor number of commits that were reviewed by Cassandra committers, some of whom are DataStax employees, but the overwhelming number have not been that way.

If you go by pure commit counts (an admittedly dubious rating, but still), I am #4 on number of commits.

On 06/05/2016 06:33 PM, Mattmann, Chris A (3980) wrote:

> Thanks for the info Jonathan. I think I have assessed, based on the replies thus far and my studying of the archives and the commit and project history, the following situation.
>
> Unfortunately it seems like there is a bit of control going on. I’m going to call a spade a spade here. A key portion of your software’s stack, a client driver to use it, exists outside of Apache in separate communities. This is an inherent risk to the project. Some of you cite flexibility and adaptability as reasons for this - I’ve seen it in so many communities over the last 12+ years in the foundation - it’s not really due to those issues. There is definitely some control going on.
>
> I would ask you all this - has there been a PR or patch in the past year or two that wasn’t singularly reviewed by DataStax committers and PMC? Also, as to the composition of the PMC, when was the last time a non-DataStax person was elected to the PMC and/or as a committer?
>
> By themselves the diversity issues are not damning to the project, but taken together with the citation of other project communities, even those outside of Apache (e.g., the comments “Postgres does it this way, so it’s a good example to compare us to” or “these other 4 projects at the ASF do it like this, so X” [sic]), and with the perception being created for those who don’t work at DataStax, there is an issue here.
>
> I would like to see a discussion in your next board report about the diversity and health issues of the project, and also some ideas about potential strategies for mitigation.
>
> I appreciate the open and honest conversation thus far. Let’s keep it up.
>
> Cheers,
> Chris