Re: CASSANDRA-18654 - start publishing CQLSH to PyPI as part of the release process

2023-07-09 Thread Jeff Widman
Thanks Max, always encouraging to hear that the time I spend on open source
is helping others.

Your use case is very similar to what drove my original desire to get
involved with the project. Being able to `pip install cqlsh` from a dev
machine was so much lighter weight than the alternatives.

Anyone else care to weigh in on this?

What are the next steps to move to a decision?

Cheers,
Jeff

On Sat, Jul 8, 2023, 7:23 PM Max C. wrote:

> As a user, I really appreciate your efforts Jeff & Brad.  I would *love*
> for the C* project to officially support this.
>
> In our environment we have a lot of client machines that all share common
> NFS mounted directories.  It's much easier for us to create a Python
> virtual environment on a file server with the cqlsh PyPI package installed
> than it is to install the Cassandra RPMs on every single machine.  Before I
> discovered your PyPI package, our developers needed to log in to a
> Cassandra node in order to run cqlsh.  The cqlsh PyPI package, however, is
> in our standard "python dev tools" virtual environment -- along with
> Ansible, black, isort and various other Python packages; which means it's
> accessible to everyone, everywhere.
>
> I agree that this should not *replace* packaging cqlsh in the Cassandra
> RPM, so much as provide an additional *option* for installing cqlsh without
> the baggage of installing the full Cassandra package.
>
> Thanks again for your work Jeff & Brad.
>
> - Max
> On 7/6/2023 5:55 PM, Jeff Widman wrote:
>
> Brad Schoening and I currently maintain
> https://pypi.org/project/cqlsh/ which repackages the CQLSH that ships with
> every Cassandra release.
>
> This way:
>
>    - anyone who wants a lightweight client to talk to a remote cassandra
>      can simply `pip install cqlsh` without having to download the full
>      cassandra source, unzip it, etc.
>    - it's very easy for folks to use it as scaffolding in their python
>      scripts/tooling since they can simply include it in the list of their
>      required dependencies.
>
> We currently handle the packaging by waiting for a release, then manually
> copy/pasting the code out of the cassandra source tree into
> https://github.com/jeffwidman/cqlsh which has some additional
> build/python package configuration files, then using standard
> python tooling to publish to PyPI.
>
> Given that our project is simply a build/packaging project, I wanted to
> start a conversation about upstreaming this into core Cassandra. I realize
> that Cassandra has no interest in maintaining lots of build targets... but
> given that cqlsh is written in Python and publishing to PyPI enables DBAs
> to share more complicated tooling built on top of it, this seems like a
> natural fit for core cassandra rather than a standalone project.
>
> Goal:
> When a Cassandra release happens, the build/release process automatically
> publishes cqlsh to https://pypi.org/project/cqlsh/.
>
> Non-Goal: This is _not_ about having cassandra itself rely on PyPI. There
> was some initial chatter about that in
> https://issues.apache.org/jira/browse/CASSANDRA-18654, but that adds a
> lot of complexity, and I'm honestly not sure it's a great idea. Even if
> folks later want to go that route, the first hurdle is publishing to PyPI,
> so for now let's keep the scope of the discussion limited to treating PyPI
> purely as a release target, and not as an ingredient to a release.
>
> From an implementation perspective, this should be very straightforward.
> We don't have any differences from the CQLSH source that's in cassandra;
> instead we point folks to make changes to cqlsh in the Cassandra source. In
> fact we've made multiple contributions back to `cqlsh` ourselves and have
> drastically cleaned up the code:
> https://github.com/search?q=repo%3Aapache%2Fcassandra%20is%3Apr%20author%3Ajeffwidman%20author%3Abschoening&type=pullrequests.
> So the only real change is adding the package config files and the build /
> release pipeline.
>
> We realize the Cassandra team aren't python/PyPI experts, so we'd be more
> than happy to help wire this up and maintain it. I am also a maintainer of
> kazoo and kafka-python which are both popular python clients for other
> distributed databases. So I'm very familiar with open source, python, and
> distributed databases.
>
> My one hesitation around this discussion is that I'm a little concerned
> that we might lose the nimbleness we've currently got from having a
> separate project. I.e., if something is screwed up on PyPI / the build
> process, we can quickly get it fixed and get a new release out so that
> users aren't blocked. Would it be possible as part of this process for
> Brad and me to keep commit rights to the build process for PyPI?
> To be clear, I'm not asking for commit rights to the Java code or anything
> outside of Python, I just want to be sure that if we go to the trouble of
> working with you to upstream this that there's a commitment from Cassandra
> to keeping this b

Re: Changing the output of tooling between majors

2023-07-09 Thread Dinesh Joshi
> On Jul 8, 2023, at 8:43 AM, Miklosovic, Stefan wrote:
>  
> If we have been providing CQL / JSON / YAML for a couple of years, I do not believe that
> the argument "let's not break it for folks in nodetool" is still relevant. CQL
> output has been there since at least 4.0 (at least!) and YAML / JSON is
> also not something completely new. It is not like we are suddenly forcing
> people to change their habits; there was enough time to update the stuff to
> CQL / JSON / YAML etc ...

What % of Cassandra users are using 4.0+? Operators who upgrade to 4.0 and 
beyond may still use their existing scripts. Therefore keeping things stable is 
important. Until nodetool can support JSON as an output format for all interactions
and there is significant adoption in the user community, I would strongly
advise against making breaking changes to the CLI output.

Dinesh
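
To make the stability concern concrete, here is a minimal sketch of why a script that scrapes human-readable text is more fragile than one that reads a structured format. The sample strings, the "load_bytes" key, and both helper functions are made up for illustration; they are NOT actual nodetool output or any real nodetool API.

```python
import json

# Illustrative samples only: these are NOT real nodetool output.
OLD_TEXT = "Load: 1.5 GiB"
RENAMED_TEXT = "Data load: 1.5 GiB"         # hypothetical relabeling in a new major
STRUCTURED = '{"load_bytes": 1610612736}'   # hypothetical JSON key


def load_from_text(line: str) -> str:
    """Scrape the value by matching the exact label, as many old scripts do."""
    prefix = "Load: "
    if not line.startswith(prefix):
        raise ValueError(f"unexpected output format: {line!r}")
    return line[len(prefix):]


def load_from_json(doc: str) -> int:
    """Read the value by key; unaffected by changes to human-readable labels."""
    return json.loads(doc)["load_bytes"]


print(load_from_text(OLD_TEXT))
print(load_from_json(STRUCTURED))
try:
    load_from_text(RENAMED_TEXT)
except ValueError as exc:
    print("text scraper broke:", exc)
```

The text scraper only breaks once the label changes, which is the situation an operator with pre-4.0 scripts would hit after an upgrade.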

Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-09 Thread Berenguer Blasi
+1 to Josh, which is exactly my line of thought as well. But that is only 
valid if we have a solid Jenkins that will eventually run all test 
configs. So I think I lost track a bit here. Are you proposing:


1- CircleCI: Runs pre-commit a single config of tests (the most 
common/meaningful, TBD)


2- Jenkins: Runs post-commit _all_ test configs and emails/notifies you 
in case of problems?


Or something different, like having 1 also in Jenkins?

On 7/7/23 17:55, Andrés de la Peña wrote:
I think 500 runs combining all configs could be reasonable, since it's 
unlikely to have config-specific flaky tests. As in five configs with 
100 repetitions each.


On Fri, 7 Jul 2023 at 16:14, Josh McKenzie wrote:

Maybe. Kind of depends on how long we write our tests to run
doesn't it? :)

But point taken. Any non-trivial test would start to be something
of a beast under this approach.

On Fri, Jul 7, 2023, at 11:12 AM, Brandon Williams wrote:

On Fri, Jul 7, 2023 at 10:09 AM Josh McKenzie wrote:
> 3. Multiplexed tests (changed, added) run against all JDKs and
> a broader range of configs (no-vnode, vnode default, compression,
> etc)

I think this is going to be too heavy...we're taking 500 iterations
and multiplying that by like 4 or 5?
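
The run counts being debated in this thread can be compared with a quick calculation. This is only a sketch: the config count of 5 is an assumption (the thread says "like 4 or 5"), and total_runs is a hypothetical helper, not part of any CI tooling.

```python
def total_runs(repeats_per_config: int, num_configs: int) -> int:
    """Total multiplexed executions of one test across all configs."""
    return repeats_per_config * num_configs


NUM_CONFIGS = 5  # assumption: upper end of "like 4 or 5" from the thread

# Concern raised: 500 iterations repeated for every config.
multiplied = total_runs(500, NUM_CONFIGS)

# Counter-proposal: 500 runs in total, i.e. 100 repetitions per config.
split = total_runs(100, NUM_CONFIGS)

print(f"500 per config:  {multiplied} runs")
print(f"100 per config:  {split} runs")
print(f"ratio:           {multiplied // split}x")
```

Under these assumptions, splitting the 500 repetitions across configs keeps the total equal to today's single-config cost, while repeating all 500 per config multiplies it by the number of configs.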