Re: Should we change 4.1 to G1 and offheap_objects ?

2023-01-12 Thread Brad
*+1* to changing to G1 on trunk for 5.0 and 4.1.1.  We have over a thousand
clusters and over 10K nodes running on Java 8 and 11 with G1GC, and memory
management is excellent. Two observations. First, we reverted
MaxGCPauseMillis to 200, which is the JVM default; Cassandra's
jvm{8,11}-server.options has 500 (commented out) for some reason. Second, on
a few clusters with very large partitions we saw 'humongous allocations' and
had to increase G1HeapRegionSize.
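
To make the region-size point concrete, here is a minimal sketch of how one might estimate a suitable G1HeapRegionSize. It assumes the JDK 8/11 rules: regions are powers of two from 1 MiB to 32 MiB, and an object counts as humongous once it is at least half a region. The function names and the 6 MiB sample size are hypothetical and only for illustration.

```python
MIB = 1024 * 1024

# Region sizes G1 accepts on JDK 8 and 11: powers of two from 1 MiB to 32 MiB.
G1_REGION_SIZES = [MIB * (2 ** i) for i in range(6)]


def is_humongous(object_bytes, region_bytes):
    """G1 treats an object as humongous when it is at least half a region."""
    return object_bytes >= region_bytes // 2


def smallest_non_humongous_region(object_bytes):
    """Return the smallest region size (in bytes) at which the object is no
    longer humongous, or None if even 32 MiB regions are too small."""
    for region in G1_REGION_SIZES:
        if not is_humongous(object_bytes, region):
            return region
    return None


if __name__ == "__main__":
    # Hypothetical 6 MiB allocation, e.g. a buffer for a very large partition.
    region = smallest_non_humongous_region(6 * MIB)
    print(f"-XX:G1HeapRegionSize={region // MIB}m")
```

This only picks a lower bound from a single allocation size; in practice the GC logs are the better guide to which allocation sizes actually occur.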

CMS was deprecated in Java 9, so I don't know why Cassandra would still use
it as the default.

JEP 291: Deprecate the Concurrent Mark Sweep (CMS) Garbage Collector
https://openjdk.org/jeps/291


The change to off-heap memory sounds good, but maybe change on trunk (5.0)
not 4.1.

On Thu, Jan 12, 2023 at 8:16 AM Mick Semb Wever  wrote:

> > Ok, wrt G1 default, this won't go ahead for 4.1-rc1
> >
> > We can revisit it for 4.1.x
> >
> > We have a lot of voices here adamantly positive for it, and those of us
> > that have done the performance testing over the years know why. But being
> > called to prove it is totally valid, if you have data from any such tests
> > please add them to the ticket 18027
>
>
> Revisiting. Are there any vetoes to making G1 the default (and
> updating the G1 settings, see the patch on
> https://issues.apache.org/jira/browse/CASSANDRA-18027 ) for 4.1.1 ?
>
> IIUC , the summary of this thread till now was: there were no vetoes
> to the change in trunk, but there were vetoes to 4.1.0 (because we
> were inside the beta to GA window), and there was a desire to have
> benchmarking data presented.
>
> WRT benchmarking, we have a separate thread for performance testing in
> the project.  The ticket admittedly does not do its due diligence on
> data presentation and analysis of smaller heaps: a precedent we should
> be creating; but instead relies upon experience from many. Are we ok
> with this this time around, or shall the patch only be applied to
> trunk (where we have no choice w/ jdk17 landing)?
>


Re: Is simplenative in cassandra-stress still relevant?

2023-05-30 Thread Brad
+1 on removing it from cassandra-stress

If you're performing stress testing, why would you not want to use the
official driver?  I've spoken to several people who all have said they've
never used simplenative mode.

On Sat, May 27, 2023 at 3:57 AM Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> I am doing some fixes for cassandra-stress and I stumbled upon this
>
> https://issues.apache.org/jira/browse/CASSANDRA-18529
>
> There is
>
> Usage: -mode native [unprepared] cql3 [compression=?] [port=?] [user=?]
> [password=?] [auth-provider=?] [maxPending=?] [connectionsPerHost=?]
> [protocolVersion=?]
>  OR
> Usage: -mode simplenative [prepared] cql3 [port=?]
>
> "-mode simplenative prepared cql3" throws: (it works without "prepared").
>
> java.lang.ClassCastException: [B cannot be cast to
> org.apache.cassandra.transport.messages.ResultMessage$Prepared
> java.io.IOException: Operation x10 on key(s) [373038504b3436363830]: Error
> executing: (ClassCastException): [B cannot be cast to
> org.apache.cassandra.transport.messages.ResultMessage$Prepared
>
> at org.apache.cassandra.stress.Operation.error(Operation.java:127)
> at
> org.apache.cassandra.stress.Operation.timeWithRetry(Operation.java:105)
> at
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:91)
> at
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:99)
> at
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:242)
> at
> org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:467)
> java.io.IOException: Operation x10 on key(s) [4e334f364c4c4b373530]: Error
> executing: (ClassCastException): [B cannot be cast to
> org.apache.cassandra.transport.messages.ResultMessage$Prepared
>
>
> I want to ask if this "simplenative" is still relevant and people are
> still using it. It seems to me that nobody is actually using this / I've
> never heard of anybody doing that but I may be wrong and people are using
> it all day and night ...
>
> simplenative uses SimpleClient which is used through the code base, e.g.
> in CQLTester so we are not going to get rid of that for sure.
>
> If simplenative in stress is not relevant, that whole -mode is
> questionable, if we get rid of simplenative, we would end up having "-mode
> native cql3" and since there is nothing but "native" as there is no Thrift
> anymore, "native" is a constant which can go away. If we end up having
> "-mode cql3" as the only mode possible, whole -mode can go away and we can
> rename it to "-cql3".
>
> Thoughts?


Re: Is simplenative in cassandra-stress still relevant?

2023-05-31 Thread Brad
We all agree that we're not suggesting removing SimpleClient from
Cassandra, just from its use in cassandra-stress.

For debugging the native transport protocol, in addition to the standalone
Java Driver, there are the python drivers and ODBC drivers which can be
exercised with cqlsh and Intellij respectively.  Are they not sufficient?

The main issue I see with maintaining the SimpleClient in cassandra-stress
is the burden it puts on a user to understand the options available when
connecting with *-mode*:

> cassandra-stress help -mode

Usage: -mode native [unprepared] cql3 [compression=?] [port=?] [user=?]
[password=?] [auth-provider=?] [maxPending=?] [connectionsPerHost=?]
[protocolVersion=?]

 OR

Usage: -mode simplenative [prepared] cql3 [port=?]





A user trying to work out how to specify a username and password is
presented with the option to use simplenative and prepared statements
(which appear broken).  That can lead down a rabbit hole of sparse
documentation, trying to figure out what the simplenative option is and
whether it is better than cql3.




On Wed, May 31, 2023 at 1:58 AM Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> Interesting point about the debuggability.
>
> Yes, I agree that SimpleClient (as class) should not be removed because we
> are using it in tests. I have already mentioned in my original e-mail that
> for this reason that class is not going anywhere and we still need to use
> it.
>
> The cost of keeping it there is not big, sure, but we clearly see that
> e.g. the usage of "prepared" is buggy and it does not work. That somehow
> indicates to me that it kind of atrophied and nobody seems to notice which
> further supports my case that it is actually not used too much if it went
> undetected for so long.
>
> Anyway, I think that we might just look at that bug with "prepared" and
> fix it and keep it all there. I do not see any tests which would test
> cassandra-stress command, similarly what we have for nodetool in JUnit. We
> could cover cassandra-stress similarly, just to be sure that its invocation
> on the most important commands does not fail over time.
>
>
> 
> From: Brandon Williams 
> Sent: Wednesday, May 31, 2023 2:33
> To: dev@cassandra.apache.org
> Subject: Re: Is simplenative in cassandra-stress still relevant?
>
> On Tue, May 30, 2023 at 7:15 PM Brad  wrote:
> > If you're performing stress testing, why would you not want to use the
> official driver?  I've spoken to several people who all have said they've
> never used simplenative mode.
>
> I agree that it shouldn't be used normally, but I'm not sure we should
> remove it, because we can't remove it fully: SimpleClient is still
> used in many tests, and I think that should continue.
>
> If you suspect any kind of native proto or driver issue it may be
> useful to have another implementation easily accessible to aid in
> debugging the problem, and the maintenance cost of keeping it in
> stress is roughly zero in my opinion.  We can make it clear that it's
> not recommended for use and is intended only as a debugging tool,
> though.
>
> Kind Regards,
> Brandon
>


Re: CASSANDRA-18654 - start publishing CQLSH to PyPI as part of the release process

2023-07-06 Thread Brad
The 'cqlsh' package has been maintained at pypi.org since 2013, see
https://pypi.org/project/cqlsh/#history.  There is a solid 10-year history
of support and interest in the Python package distribution for cqlsh, and it
has about 11K downloads per week.

A few additions to Jeff's comments:

   - The 'cqlsh' package has one primary file which is all of 36 lines,
   *setup.cfg*. The suggestion is to 1) move this file into the Apache
   Cassandra repository together with a README.md, and 2) add pypi as a
   distribution target for new Apache Cassandra releases of cqlsh.


   - As it exists today, the 'cqlsh' project is really just a stub which
   exists outside of Apache Cassandra to package cqlsh for distribution onto
   pypi.org.


   - For Windows clients (and yes, there are lots), 'pip install cqlsh' is
   the best way to install and run cqlsh.



On Thu, Jul 6, 2023 at 9:50 PM guo Maxwell  wrote:

> Hi :
> First of all, thank you very much for your work. I have a question: what
> is your long-term evolution plan for this project? How to achieve long-term
> continuous maintenance of this project? I have encountered some situations
> where some people's work is related to a certain project, and then they may
> have time to maintain, but once they change jobs, they may not have enough
> time to do this.  Besides, can you share more about the code management
> mechanism?
>
> On Fri, Jul 7, 2023 at 08:56, Jeff Widman wrote:
>
>> Myself and Brad Schoening currently maintain
>> https://pypi.org/project/cqlsh/ which repackages CQLSH that ships with
>> every Cassandra release.
>>
>> This way:
>>
>>    - anyone who wants a lightweight client to talk to a remote cassandra
>>    can simply `pip install cqlsh` without having to download the full
>>    cassandra source, unzip it, etc.
>>    - it's very easy for folks to use it as scaffolding in their python
>>    scripts/tooling since they can simply include it in the list of their
>>    required dependencies.
>>
>> We currently handle the packaging by waiting for a release, then manually
>> copy/pasting the code out of the cassandra source tree into
>> https://github.com/jeffwidman/cqlsh which has some additional
>> build/python package configuration files, then using standard
>> python tooling to publish to PyPI.
>>
>> Given that our project is simply a build/packaging project, I wanted to
>> start a conversation about upstreaming this into core Cassandra. I realize
>> that Cassandra has no interest in maintaining lots of build targets... but
>> given that cqlsh is written in Python and publishing to PyPI enables DBA's
>> to share more complicated tooling built on top of it this seems like a
>> natural fit for core cassandra rather than a standalone project.
>>
>> Goal:
>> When a Cassandra release happens, the build/release process automatically
>> publishes cqlsh to https://pypi.org/project/cqlsh/.
>>
>> Non-Goal: This is _not_ about having cassandra itself rely on PyPI. There
>> was some initial chatter about that in
>> https://issues.apache.org/jira/browse/CASSANDRA-18654, but that adds a
>> lot of complexity, and I'm honestly not sure it's a great idea. Even if
>> folks later want to go that route, the first hurdle is publishing to PyPI,
>> so for now let's keep the scope of the discussion limited to treating PyPI
>> purely as a release target, and not as an ingredient to a release.
>>
>> From an implementation perspective, this should be very straightforward.
>> We don't have any differences from the CQLSH source that's in cassandra,
>> instead we point folks to make changes to cqlsh in the Cassandra source. In
>> fact we've made multiple contributions back to `cqlsh` ourselves and have
>> drastically cleaned up the code:
>> https://github.com/search?q=repo%3Aapache%2Fcassandra%20is%3Apr%20author%3Ajeffwidman%20author%3Abschoening&type=pullrequests.
>> So the only real change is adding the package config files and the build /
>> release pipeline.
>>
>> We realize the Cassandra team isn't python/PyPI experts, so we'd be more
>> than happy to help wire this up and maintain it. I am also a maintainer of
>> kazoo and kafka-python which are both popular python clients for other
>> distributed databases. So I'm very familiar with open source, python, and
>> distributed databases.
>>
>> My one hesitation around this discussion is that I'm a little concerned
>> that we might lose the nimbleness we've currently got from having a
>> separate project. Ie, if something is screwed up on PyPI / the build
>> 

Proposed update to cassandra-stress to use Apache Commons CLI

2023-07-10 Thread Brad
The Apache Commons CLI library provides an API for parsing command line
options in the package org.apache.commons.cli, and it is already used by
about a dozen existing Cassandra utilities, including:

SSTableMetadataViewer, StandaloneScrubber, StandaloneSplitter,
SSTableExport, BulkLoader, and others.


However, cassandra-stress is an outlier: it parses command line options
with its own custom classes, such as OptionsSimple.  In addition, the
option syntax for username, password, and others is not aligned with the
format used by CQLSH.

This suggestion is to:

a) Upgrade cassandra-stress to use Apache Commons CLI (no new dependencies
are required as this library is already used by the project)

b) Align the cassandra-stress CLI options with those in CQLSH,

For example, using the new syntax like CQLSH:


cassandra-stress -username foo -password bar


and replacing the old syntax:

cassandra-stress -mode native cql3 user=foo password=bar


This will simplify and unify the code base, eliminate code, and reduce the
confusion between similarly named classes such
as org.apache.cassandra.stress.settings.{Option, OptionsMulti,
OptionsSimple} and org.apache.commons.cli.{Option, OptionGroup, Options}.
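
As a rough illustration of the syntax change only, the sketch below maps old-style `-mode` key=value tokens onto the proposed flag style. It is written in Python for brevity and is not how the change would be implemented (that would be Java with Commons CLI). The function name is hypothetical, the `user=` and `password=` keys come from the `cassandra-stress help -mode` usage text, and the `-username`/`-password` flags are the ones proposed here, not options that exist today.

```python
def translate_mode_args(old_args):
    """Map old '-mode' key=value tokens onto the proposed flag style.

    Only 'user' and 'password' are handled. Both target flag names are
    taken from the proposal and are assumptions, not implemented options.
    """
    flag_for = {"user": "-username", "password": "-password"}
    new_args = []
    for token in old_args:
        key, sep, value = token.partition("=")
        # Tokens without '=' (e.g. 'native', 'cql3') carry no value and are
        # dropped, since the proposal removes them from the syntax.
        if sep and key in flag_for:
            new_args += [flag_for[key], value]
    return new_args


if __name__ == "__main__":
    old = ["native", "cql3", "user=foo", "password=bar"]
    print("cassandra-stress", *translate_mode_args(old))
```

A helper along these lines could also serve as a compatibility shim during a deprecation period, if old-style invocations need to keep working for a release.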

If there are no significant objections, I can raise a Jira for this
proposal.

Regards,

Brad Schoening


Re: CASSANDRA-18654 - start publishing CQLSH to PyPI as part of the release process

2023-07-13 Thread Brad
I agree that a CEP is a good idea. I'll discuss with Jeff and hope to draft
something.

On Wed, Jul 12, 2023 at 6:13 AM Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> CEP is a great idea. The devil is in details and while this looks cool, it
> will definitely not hurt to have the nuances ironed out.
>
> 
> From: Patrick McFadin 
> Sent: Tuesday, July 11, 2023 2:24
> To: dev@cassandra.apache.org; German Eichberger
> Subject: Re: CASSANDRA-18654 - start publishing CQLSH to PyPI as part of
> the release process
>
> I would say it helps a lot of people. 45k downloads in just last month:
> https://pypistats.org/packages/cqlsh
>
> I feel like a CEP would be in order, along the lines of CEP-8:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation
>
> Unless anyone objects, I can help you get the CEP together and we can get
> a vote, then a JIRA in place for any changes in trunk.
>
> Patrick
>
> On Mon, Jul 10, 2023 at 4:58 PM German Eichberger via dev <
> dev@cassandra.apache.org> wrote:
> Same - really appreciate those efforts and also welcome the upstreaming
> and release automation...
>
> German
> 
> From: Jeff Widman <j...@jeffwidman.com>
> Sent: Sunday, July 9, 2023 1:44 PM
> To: Max C. <mc_cassand...@core43.com>
> Cc: dev@cassandra.apache.org; Brad Schoening <bscho...@gmail.com>
> Subject: [EXTERNAL] Re: CASSANDRA-18654 - start publishing CQLSH to PyPI
> as part of the release process
>
> Thanks Max, always encouraging to hear that the time I spend on open
> source is helping others.
>
> Your use case is very similar to what drove my original desire to get
> involved with the project. Being able to `pip install cqlsh` from a dev
> machine was so much lighter weight than the alternatives.
>
> Anyone else care to weigh in on this?
>
> What are the next steps to move to a decision?
>
> Cheers,
> Jeff
>
> On Sat, Jul 8, 2023, 7:23 PM Max C. <mc_cassand...@core43.com> wrote:
>
> As a user, I really appreciate your efforts Jeff & Brad.  I would *love*
> for the C* project to officially support this.
>
> In our environment we have a lot of client machines that all share common
> NFS mounted directories.  It's much easier for us to create a Python
> virtual environment on a file server with the cqlsh PyPI package installed
> than it is to install the Cassandra RPMs on every single machine.  Before I
> discovered your PyPI package, our developers would need to log in to a
> Cassandra node in order to run cqlsh.  The cqlsh PyPI package, however, is
> in our standard "python dev tools" virtual environment -- along with
> Ansible, black, isort and various other Python packages; which means it's
> accessible to everyone, everywhere.
>
> I agree that this should not replace packaging cqlsh in the Cassandra RPM,
> so much as provide an additional option for installing cqlsh without the
> baggage of installing the full Cassandra package.
>
> Thanks again for your work Jeff & Brad.
>
> - Max
>
> On 7/6/2023 5:55 PM, Jeff Widman wrote:
> Myself and Brad Schoening currently maintain
> https://pypi.org/project/cqlsh/ which
> repackages CQLSH that ships with every Cassandra release.
>
> This way:
>
>   *   anyone who wants a lightweight client to talk to a remote cassandra
> can simply `pip install cqlsh` without having to download the full
> cassandra source, unzip it, etc.
>   *   it's very easy for folks to use it as scaffolding in their python
> scripts/tooling since they can simply include it in the list of their
> required dependencies.
>
> We currently handle the packaging by waiting for a release, then manually
> copy/pasting the code out of the cassandra source tree into
> https://github.com/jeffwidman/cqlsh
> which has some additional build/python package configuration files, then
> usin

[Discuss] CEP-35: Add PIP support for CQLSH

2023-08-09 Thread Brad
As per the CEP process guidelines, I'm starting a formal DISCUSS thread to
resume the conversation started here[1].

The developers who maintain the Python CQLSH client on the official Python
PyPI repository would like to integrate and donate their open source work
to the Apache Cassandra project so it can be more tightly and seamlessly
integrated.

The Apache Cassandra project pre-dates Python 3.4, the release that began
bundling pip as the default package installer for PyPI. As a result, an
unofficial distribution has been provided by a group of developers who have
maintained the repository there since October 2013.

The installable version of CQLSH on PyPI.org allows end users to install a
cqlsh client with PIP - no tarball or path setup required. I.e.,

  $ pip install cqlsh

This popular package has 50K downloads per month and is maintained today by
Jeff Widman and Brad Schoening. The PyPI package is updated upon every
major release by simply repackaging the CQLSH that ships with that
Cassandra release.

CQLSH PyPI Repository:  https://pypi.org/project/cqlsh/


This CEP proposes incorporating PyPI as a regular part of the
Cassandra release process and making the CQLSH project on PyPI an official
distribution point.

The full CEP can be reviewed at:

Wiki: CEP-35: Add PIP support for CQLSH
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425995>

Jira: CASSANDRA-18654
<https://issues.apache.org/jira/browse/CASSANDRA-18654>


But in brief, the proposal will:

   - Add PyPI.org as an official distribution point for CQLSH
   - Allow end users to install CQLSH with a simple 'pip install cqlsh' on
   macOS, Windows and Linux platforms.
   - Donate the authors' modest set of existing configuration files to
   Apache Cassandra
   - Involve only the Python CQLSH client; no changes to the distribution
   of Java server-side code and tools are involved.

We welcome further discussion and suggestions regarding this proposal on
the mailing list here.

Regards,

Jeff Widman &
Brad Schoening

[1] https://lists.apache.org/thread/sy3p2b2tncg1bk6x3r0r60y10dm6l18d
<https://lists.apache.org/thread.html/ra7caa1dd42ccaa04bcabfbc33233995c125c655f9a3cdb2c7bd8e9f7%40%3Cdev.cassandra.apache.org%3E>


Re: [Discuss] CEP-35: Add PIP support for CQLSH

2023-08-09 Thread Brad
Hi Dinesh,

You are correct that the scope of this CEP is practical, narrow and limited
to having an official distribution of CQLSH on the official Python package
repository. Cassandra end-users, who use the CQLSH command line, would
benefit in several direct ways:

   - A timely distribution of new CQLSH versions on the official Python
   package repository aligned with Apache Cassandra releases
   - A trusted distribution overseen by Apache Cassandra instead of third
   party maintainers. Today, users can only take it on trust that the PyPI
   distribution of CQLSH matches the Apache open source one.
   - A lightweight distribution of CQLSH clocking in at 110KB vs
   downloading a 50MB tarball.

Perhaps those are modest goals, but I would suggest they are big wins for
the Cassandra user community. If you haven't tried it yet, please run '*pip
install cqlsh*' on your desktop and see how nicely it works. Indeed, the
return on investment here should be really high, as the work is
mostly already done; it's just run from a personal repo at
https://github.com/jeffwidman/cqlsh and has been maintained continually
since 2013.

Other initiatives such as subdividing the project(s) or re-writing the REPL
in another language would be out-of-scope. It would be entirely appropriate
to have a separate discussion on those two topics, if you wish to start
that discussion.

The process and degree of overhead required to publish to PyPI will require
some discovery and discussion. Ideally, it would be possible to automate
it. That is definitely a topic on which we need further input from the
engineers involved in the build-release process.

A pre-CEP discussion of this proposal was started by Jeff on the mailing
list back in early July, see
https://lists.apache.org/thread/sy3p2b2tncg1bk6x3r0r60y10dm6l18d.

Regards,

Brad

On Wed, Aug 9, 2023 at 3:31 PM Dinesh Joshi  wrote:

> Brad,
>
> Thanks for starting this discussion. My understanding is that we're
> simply adding pip support for cqlsh and Apache Cassandra project will
> officially publish a cqlsh pip package. This is a good goal but other
> than having an official pip package, what is it that we're gaining?
> Please don't interpret this as push back on your proposal but I am
> unclear on what we're trying to solve by making this official
> distribution. There are several distribution channels and it is
> untenable to officially support all of them.
>
> If we do adopt this, there will be non-zero overhead of the release
> process. This is fine but we need volunteers to run this process. My
> understanding is that they need to be ideally PMC or at least Committers
> on the project to go through all the steps to successfully release a new
> artifact for our users.
>
> I would have liked this CEP to go a bit further than just packaging
> cqlsh in pip. IMHO we should have cqlsh as a separate sub-project. It
> doesn't need to live in the cassandra repo. Extracting cqlsh into its
> own repo would allow us to truly decouple cqlsh from the server.
> This is already true for the most part as we rely on the Python driver
> which is compatible with several cassandra releases. As it stands today
> it is not possible for us to update cqlsh without making a Cassandra
> release.
>
> If you truly want to go a bit further, we should consider rewriting
> cqlsh in Java so we can easily share code from the server. We can then
> potentially use Java Native Image[1] to produce a truly platform
> independent binary like golang. Python has its strengths but it does get
> hairy as it expects certain runtime components on the target. With Java
> Native Image we can make things very simple from a user's perspective,
> very similar to how golang produces statically linked binaries. This might be
> a very far out thought but it is worth exploring. I believe GraalVM's
> license might allow us to produce binaries that we can incorporate in
> our release but IANAL so maybe we can ask ASF legal on their opinion.
>
> Giving cqlsh its own identity as a sub-project might help us build a
> roadmap and evolve it along these lines.
>
> I would like other folks to chime in with their opinions.
>
> Dinesh
>
> On 8/9/23 09:18, Brad wrote:
> >
> > As per the CEP process guidelines, I'm starting a formal DISCUSS thread
> > to resume the conversation started here[1].
> >
> > The developers who maintain the Python CQLSH client on the official
> > Python PYPI repository would like to integrate and donate their open
> > source work to the Apache Cassandra project so it can be more tightly
> > and seamlessly integrated.
> >
> > The Apache Cassandra project pre-dates the adoption in Python 3.4 of
> > PyPI as the default package manager. As a resul

Re: [Discuss] CEP-35: Add PIP support for CQLSH

2023-08-10 Thread Brad
Distribution is pretty straightforward. This is the process I follow after
'build' is complete:

## Release Cassandra CQLSH

You must have maintainer privileges on `https://pypi.org/project/cqlsh/`.

1. Verify the package locally with pip

    pip install -e .

2. Test the upload on TestPyPI

    twine upload --repository testpypi dist/*

    pip install --index-url https://test.pypi.org/simple/ cqlsh

3. Upload to production PyPI

    twine upload --repository pypi dist/*
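
If this were to be automated, the steps above could be captured roughly as follows. This is only a sketch: the function name and the artifact file name are hypothetical, and it builds and prints the commands as a dry run rather than executing them.

```python
import shlex

TEST_INDEX = "https://test.pypi.org/simple/"


def release_commands(dist_files):
    """Return the release steps above, in order, as argument lists."""
    files = sorted(dist_files)
    return [
        # 1. Verify the package locally.
        ["pip", "install", "-e", "."],
        # 2. Upload to TestPyPI, then install from it to check the result.
        ["twine", "upload", "--repository", "testpypi", *files],
        ["pip", "install", "--index-url", TEST_INDEX, "cqlsh"],
        # 3. Upload to production PyPI.
        ["twine", "upload", "--repository", "pypi", *files],
    ]


if __name__ == "__main__":
    # Hypothetical artifact name; real names depend on the release version.
    for cmd in release_commands(["dist/cqlsh-0.0.0.tar.gz"]):
        print(shlex.join(cmd))
```

To actually run a step, each list could be passed to `subprocess.run(cmd, check=True)`, so that a failed TestPyPI upload stops the process before the production upload.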

Regards,


Brad

On Thu, Aug 10, 2023 at 3:27 PM Patrick McFadin  wrote:

> Dinesh raises some good points.
>
> If we do adopt this, there will be non-zero overhead of the release
> process. This is fine but we need volunteers to run this process. My
> understanding is that they need to be ideally PMC or at least Committers
> on the project to go through all the steps to successfully release a new
> artifact for our users.
>
> Which was addressed in the proposed changes part of the CEP:
>
>
>    - A document detailing procedures for releasing to PyPI.org. This
>    document should include details on:
>
>       1. How release to PyPI can be integrated into the build process. Can
>       this be done with automation?
>       2. How will credentials, permissions and ownership of packages on PyPI
>       be managed?
>
>
> My first thought was automation and integration into the build release.
>
> Can you briefly outline the steps that need to be followed for a PyPI
> release, Brad?
>
> Patrick
>
>
> On Wed, Aug 9, 2023 at 2:54 PM Abe Ratnofsky  wrote:
>
>> I think it would be good for the project to have an official PyPI
>> distribution, and the signal from users (40K downloads per month) is a
>> clear indication that this is useful. Timely releases would help us get
>> future improvements to cqlsh out faster, and moving this to an official
>> distribution would protect users against any changes in this volunteer
>> effort in case something happens in the future.
>>
>> +1 (nb)
>>
>> --
>> Abe
>>
>> On Aug 9, 2023, at 1:33 PM, Brad  wrote:
>>
>> Hi Dinesh,
>>
>> You are correct that the scope of this CEP is practical, narrow and
>> limited to having an official distribution of CQLSH on the official Python
>> package repository. Cassandra end-users, who use the CQLSH command line,
>> would benefit in several direct ways:
>>
>>    - A timely distribution of new CQLSH versions on the official Python
>>    package repository aligned with Apache Cassandra releases
>>    - A trusted distribution overseen by Apache Cassandra instead of
>>    third party maintainers. Today, there is only trust-based faith that the
>>    PyPI distribution of CQLSH matches the Apache Open Source one.
>>    - A lightweight distribution of CQLSH clocking in at 110KB vs
>>    downloading a 50MB tarball.
>>
>> Perhaps those are modest goals, but I would suggest they are big wins for
>> the Cassandra user community. If you haven't tried it yet, please run '*pip
>> install cqlsh*' on your desktop and see how nicely it works. Indeed, the
>> return-on-investment of effort here should be really high, as the work is
>> mostly already done, it's just run from a private repo at
>> https://github.com/jeffwidman/cqlsh and has been maintained continually
>> since 2013.
>>
>> Other initiatives such as subdividing the project(s) or re-writing the
>> REPL in another language would be out-of-scope. It would be entirely
>> appropriate to have a separate discussion on those two topics, if you wish
>> to start that discussion.
>>
>> The process and degree of overhead required to publish to PyPI will
>> require some discovery and discussion. Ideally, it would be possible to
>> automate it. That is definitely a topic we need further input from the
>> engineers involved in the build-release process.
>>
>> A pre-CEP discussion of this proposal was started by Jeff on the mailing
>> list back in early July, see
>> https://lists.apache.org/thread/sy3p2b2tncg1bk6x3r0r60y10dm6l18d.
>>
>> Regards,
>>
>> Brad
>>
>> On Wed, Aug 9, 2023 at 3:31 PM Dinesh Joshi  wrote:
>>
>>> Brad,
>>>
>>> Thanks for starting this discussion. My understanding is that we're
>>> simply adding pip support for cqlsh and Apache Cassandra project will
>>> officially publish a cqlsh pip package. This is a good goal but other
>>> than having an official pip package, what is it that we're gaining?
>>> Please don't interpret this as push back on your proposal but I am
>>> unclear on what we're trying to solve b

[DISCUSS] CASSANDRA-19104: Standardize tablestats formatting and data units

2023-12-04 Thread Brad
Tablestats currently reports output in a mixed format which is ideal for
neither human nor machine readability.  Spaces are also used
inconsistently, and 13-digit numbers without separators or larger units are
hard to read.

For example, 'Bytes repaired / un-repaired / pending' uses KiB, MiB units,
but 'Space used live / total' uses bytes.

Space used (live): 1463210998523
Space used (total): 1463210998523

Bytes repaired: 0.000KiB
Bytes unrepaired: 4315.386GiB
Bytes pending repair: 0.000KiB

Given that tablestats already supports machine-readable output (json or
yaml) via the -F/--format option, this Jira proposes:

   - standardizing the output to be human readable (-H) by default and
   - eliminating the current mixed mode of formatting.

The above example would become:

Space used (live): 1463.210 GiB
Space used (total): 1463.210 GiB

Bytes repaired: 0.000 KiB
Bytes unrepaired: 4315.386 GiB
Bytes pending repair: 0.000 KiB

Existing machine-readable formatting (with -F) will be unchanged.  More
detailed examples can be found in the Jira CASSANDRA-19104 and the associated
Google spreadsheet detailing the existing and proposed output:
https://tinyurl.com/38edebjd
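
A minimal sketch of the proposed human-readable style (a number, a space, then a 1024-based unit, with three decimal places) might look like the following. The function name and the unit-selection rule are assumptions for illustration, not the actual tablestats code.

```python
UNITS = ["bytes", "KiB", "MiB", "GiB", "TiB", "PiB"]


def human_readable(num_bytes, decimals=3):
    """Format a byte count as '<value> <unit>' using 1024-based units."""
    value = float(num_bytes)
    unit = 0
    # Assumed rule: step up one unit for as long as the value is 1024 or more.
    while value >= 1024 and unit < len(UNITS) - 1:
        value /= 1024
        unit += 1
    if unit == 0:
        # Plain byte counts are printed as integers.
        return f"{num_bytes} bytes"
    return f"{value:.{decimals}f} {UNITS[unit]}"


if __name__ == "__main__":
    for sample in (0, 1536, 5 * 1024 ** 3, 1463210998523):
        print(f"Space used (live): {human_readable(sample)}")
```

Because this sketch always steps up to the largest unit that keeps the value at or above 1, the 13-digit sample is reported in TiB; pinning the output to a fixed unit such as GiB would need an extra parameter.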

We welcome feedback and thoughts on this.


Re: [DISCUSS] CASSANDRA-19104: Standardize tablestats formatting and data units

2023-12-04 Thread Brad
Thanks, Jacek.  Using three significant digits for disk space is a good
suggestion.
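
One way the three-significant-digit idea could be sketched, assuming 1024-based units and a hypothetical function name, is to vary the number of decimal places with the magnitude of the scaled value:

```python
UNITS = ["bytes", "KiB", "MiB", "GiB", "TiB", "PiB"]


def three_sig_digits(num_bytes):
    """Format a byte count with roughly three significant digits."""
    value = float(num_bytes)
    unit = 0
    while value >= 1024 and unit < len(UNITS) - 1:
        value /= 1024
        unit += 1
    if unit == 0:
        # Plain byte counts are printed as integers.
        return f"{num_bytes} bytes"
    # Fewer decimals as the integer part grows: 1.50, 15.0, 150.
    if value >= 100:
        decimals = 0
    elif value >= 10:
        decimals = 1
    else:
        decimals = 2
    return f"{value:.{decimals}f} {UNITS[unit]}"


if __name__ == "__main__":
    for sample in (1536, 15 * 1024 ** 2, 150 * 1024 ** 3):
        print(three_sig_digits(sample))
```

Scaled values from 1000 up to 1023 still show four digits (for example "1023 KiB"), since they do not yet roll over to the next unit.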

On Mon, Dec 4, 2023 at 9:58 AM Jacek Lewandowski <
lewandowski.ja...@gmail.com> wrote:

> This looks great,
>
> I'd consider limiting the number of significant digits to 3 in the human
> readable format. In the above example it would translate to:
>
> Space used (live): 1.46 TiB
> Space used (total): 1.46 TiB
>
> Bytes repaired: 0.00 KiB
> Bytes unrepaired: 4.31 TiB
> Bytes pending repair: 0.00 KiB
>
> I just think with human readable format we just expect to have a grasp
> view of the stats and 4th significant digit has very little meaning in that
> case.
>
>
> thanks,
> Jacek
>
>
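
The three-significant-digit suggestion can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `human_size_sig` is hypothetical, units are strictly 1024-based, and the unit ladder extends to TiB and PiB. It is not the code from the Jira patch.

```python
# Hypothetical helper: pick the largest binary unit, then keep roughly
# `sig` significant digits by shrinking the number of decimals as the
# integer part grows.
def human_size_sig(num_bytes: int, sig: int = 3) -> str:
    units = ["KiB", "MiB", "GiB", "TiB", "PiB"]
    value = num_bytes / 1024
    idx = 0
    while value >= 1024 and idx < len(units) - 1:
        value /= 1024
        idx += 1
    int_digits = len(str(int(value)))    # digits before the decimal point
    decimals = max(0, sig - int_digits)  # e.g. 1.33, 13.3, 133
    return f"{value:.{decimals}f} {units[idx]}"
```

A value just below a power of ten can round up to an extra digit (for example 9.999 prints as 10.00), which a production version would need to handle.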


Re: [Discuss] ​​CEP-35: Add PIP support for CQLSH

2023-12-11 Thread Brad
I'll add, the process for packaging CQLSH is similar if not identical to how
the Python driver will be packaged by Apache for pypi.org once the driver
donation in CEP-8
<https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation>
is completed.  The Python driver already has an official PyPI distribution
at https://pypi.org/project/cassandra-driver/.

On Wed, Dec 6, 2023 at 3:03 PM Jeff Widman  wrote:

> 👋 I'm the other current maintainer of https://github.com/jeffwidman/cqlsh
> .
>
> *> Knowing nothing about the pypi release/publish process, I'm curious how
> you would stage and then publish the signed convenience package.
> Background: what we publish post-release-vote needs to be signed and
> identical to what is staged when the release vote starts. See the two
> scripts prepare_release.sh and finish_release.sh
> in https://github.com/apache/cassandra-builds/tree/trunk/cassandra-release
> <https://github.com/apache/cassandra-builds/tree/trunk/cassandra-release> ,
> where all the packaging is done in prepare_ and finish_ is just about
> pushing what's in staging to the correct public locations.  I am assuming
> that the CEP would be patching these two files, as well as adding files
> in-tree to the pylib/ directory*
>
> From a tactical implementation standpoint, there's a few ingredients to a
> release:
>
>    1. Packaging code used by PyPI such as `pyproject.toml`:
>    2. Code freeze of the functional code.
>    3. Versioning.
>    4. Secret management of the PyPI credentials
>    5. The actual publishing to PyPI.
>    6. Maintaining the above, ie, fixing breakage and keeping it up to
>    date.
>
> Looking at them in more detail:
>
>    1. Packaging code used by PyPI such as `pyproject.toml` - This should
>    be easy to add into the tree. Brad / myself would be happy to contribute
>    and we should be able to pull most of it directly from
>    https://github.com/jeffwidman/cqlsh.
>    2. Code freeze of the functional code - This already happens today
>    upon every Cassandra release.
>    3. Versioning - Versioning is a pain since currently CQLSH versions
>    are not aligned with Cassandra. Furthermore the internal CQLSH version
>    number doesn't always increment when a new version of Cassandra / CQLSH is
>    released. However, PyPI requires every release artifact to have a unique
>    version number. So we work around this currently by saying "Here's pypi
>    version X, which contains the cqlsh version from Y extracted from Cassandra
>    version Z".
>       1. If you want to keep CQLSH releases in-lockstep with Cassandra,
>       then life would be _much_ simpler if the CQLSH version directly pulled
>       from the Cassandra version.
>       2. However, there's still the problem that sometimes the CQLSH
>       python packaging may have a bug, which forces a new release of CQLSH.
>       Seems a bit heavyweight to require a new release of Cassandra just to
>       fix a bug in the python packaging.
>       3. Another option is to have CQLSH release *not* tied at the hip to
>       Cassandra releases. Extract it to a separate project/repo and then pull
>       in specific releases of CQLSH into the Cassandra final release. Probably
>       too heavyweight right now given we are trying to simplify life, but
>       wanted to mention it.
>       4. I don't feel strongly on the above, other than the current state
>       of affairs of requiring three different versions is worse than either of
>       the above options.
>    4. Secret management of the PyPI credentials
>       1. I'm not sure if Apache projects have a special "apache account"
>       that they typically use, or if they add multiple maintainers as admins
>       on PyPI, and then add/remove them as they join/drop the core team. Either
>       works for me.
>       2. We'd probably want to keep Brad / myself as admins on PyPI as
>       we'll be more attentive to breakage / fixing things but that's really
>       up to the discretion of the core team... I'm fine if you folks prefer to
>       remove our access.
>       3. Making the secrets available to the publishing tool can be
>       managed using PyPI's trusted publishing:
>       https://packaging.python.org/en/latest/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/#configuring-trusted-publishing
>       .
>    5. The actual publishing to PyPI.
>       1. The "staged" releases could be pushed to
>       https://test.pypi.org/project/cqlsh/ and then the final released
>       pushed to the normal https://pypi.org/pr

[Discuss] CQLSH should left-align numbers, right-align text (CASSANDRA-19150)

2024-01-09 Thread Brad
CQLSH currently left-aligns all output, affecting both numbers and text.
While this works well for numbers, a better approach adopted by many is to
left align numbers and right align text.

For example, both Excel and Postgres shell use the latter:

psql

# select * from employee;

 empid |  name   |    dept
-------+---------+------------
     1 | Clark   | Sales
   200 | Dave    | Accounting
    33 | Johnson | Sales


while CQLSH simply left aligns all the columns

cqlsh> select * from employee;

 empid | dept       | name
-------+------------+---------
    33 |      Sales | Johnson
     1 |      Sales |   Clark
   200 | Accounting |    Dave



Left aligned text looks much worse on text values which share common
prefixes


cqlsh> select * from system_views.system_properties limit 7 ;


 name                                       | value
--------------------------------------------+--------------------------------------------
                                  JAVA_HOME |              /Users/brad/.jenv/versions/17
                   cassandra.jmx.local.port |                                       7199
                           cassandra.logdir | /usr/local/cassandra-5.0-beta1/bin/../logs
                       cassandra.storagedir | /usr/local/cassandra-5.0-beta1/bin/../data
  com.sun.management.jmxremote.authenticate |                                      false
 com.sun.management.jmxremote.password.file |          /etc/cassandra/jmxremote.password
    io.netty.transport.estimateSizeOnSubmit |                                      false



The Jira CASSANDRA-19150
<https://issues.apache.org/jira/browse/CASSANDRA-19150> discusses this in
further detail with some additional examples.


I wanted to raise the issue here to propose changing CQLSH to right-align
text while continuing to left-align numbers.


Regards,


Brad Schoening
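
To make the proposed convention concrete, here is a minimal Python sketch of type-aware column alignment. It pads numeric cells on the left (so digits line up on the right edge of the column) and pads text cells on the right, which is the layout of the psql output shown earlier. The function name `render_table` is hypothetical and this is not cqlsh's formatting code; it also assumes at least one row.

```python
# Hypothetical sketch of psql-style alignment: numbers flush to the right
# edge of their column, text flush to the left edge.
def render_table(headers, rows):
    columns = list(zip(*rows))
    # A column counts as numeric only if every value in it is a number.
    numeric = [all(isinstance(v, (int, float)) for v in col) for col in columns]
    widths = [
        max(len(h), max(len(str(v)) for v in col))
        for h, col in zip(headers, columns)
    ]

    def fmt(value, width, is_num):
        text = str(value)
        return text.rjust(width) if is_num else text.ljust(width)

    header = " | ".join(h.ljust(w) for h, w in zip(headers, widths))
    lines = [(" " + header).rstrip()]
    lines.append("-" + "-+-".join("-" * w for w in widths) + "-")
    for row in rows:
        cells = (fmt(v, w, n) for v, w, n in zip(row, widths, numeric))
        lines.append((" " + " | ".join(cells)).rstrip())
    return lines
```

Printing `"\n".join(render_table(["empid", "name", "dept"], rows))` for the three employee rows gives the same row layout as the psql example; only the header differs, since psql centres header labels.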




Re: [Discuss] CQLSH should left-align numbers, right-align text (CASSANDRA-19150)

2024-01-09 Thread Brad
Derek,

I'm proposing a switch or blanket change to a convention of right aligned
text and left aligned numbers in CQLSH.

I took a look at two other examples, Excel and Postgres shell and that's
how they work when displaying tabular data.  The Jira was originally to
make right or left alignment an option, but making it configurable seems
less useful than choosing a better standard.

On Tue, Jan 9, 2024 at 9:58 AM Derek Chen-Becker 
wrote:

> Just to clarify, per the ticket you're proposing a configuration option to
> control this on a per-column basis, correct? Your email makes it sound like
> a blanket change.
>
> Cheers,
>
> Derek


Re: [Discuss] CQLSH should left-align numbers, right-align text (CASSANDRA-19150)

2024-01-16 Thread Brad
Hi Shailaja,

In the case of machine readable output, CQL uses delimited output ('|')
with whitespace on either side of the data values.

To better support machine readable output, it might be useful to allow
user specified delimiters (in a separate Jira). E.g.:

cqlsh *-s","* -e"CAPTURE '/tmp/props.csv';select * from
system_views.system_properties limit 7"

We could make something like that a precondition here. But it is just
whitespace. I'd agree this shouldn't go into a patch release.
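
A script that consumes the existing '|'-delimited output does not need to depend on the padding at all. The following Python sketch shows one way to parse it by splitting on the delimiter and stripping whitespace; the function name `parse_cqlsh_table` is hypothetical, and the sketch assumes no cell value itself contains a '|' character.

```python
# Hypothetical parser for cqlsh's tabular output. Splitting on '|' and
# stripping each cell makes the result independent of column alignment.
def parse_cqlsh_table(output: str):
    rows = []
    header = None
    for line in output.splitlines():
        if "|" not in line:
            continue  # skips blank lines, the dashed separator, "(N rows)"
        cells = [cell.strip() for cell in line.split("|")]
        if header is None:
            header = cells  # first delimited line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows
```

Because every cell is stripped, the same parser returns identical rows whether text columns are padded on the left or on the right.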

On Mon, Jan 15, 2024 at 1:34 PM  wrote:

> Hi Brad,
>
> While I prefer the indentation style that Postgres follows for better
> readability of text, changing it may break existing scripts of
> users/operators that are tightly coupled to the current format/spaces etc.
> (Ideally they shouldn't be, but with Cassandra being used all over the
> world, such scenarios are possible.) To avoid breaking such existing
> scripts, I believe these changes need to happen either in a major release
> or under a feature flag (which can be deprecated over time), so that
> existing scripts continue to work until they are fixed.
>
> Thanks,
> Shailaja
>
>
> On Jan 9, 2024, at 5:23 PM, Derek Chen-Becker 
> wrote:
>
> Actually, now that I'm looking at the original email on my browser and not
> my phone (and can see the formatting properly), I think we have the
> nomenclature backward here. Left-alignment in the printing world means that
> text in each cell starts at the left-most column for the cell, but in your
> examples you're calling that right-aligned (and vice-versa). Along the
> lines of what Stefan said, I think this probably came about more as a
> "we'll just keep things simple and use the same alignment everywhere"
> rather than an intentional right-alignment of text for a specific purpose.
> I would actually be fine with left-aligning text to fit what appears to be
> standard practice in other systems.
>
> Cheers,
>
> Derek


Re: [DISCUSS] Cassandra 5.0 support for RHEL 7

2024-03-11 Thread Brad
RHEL 7 will reach the end of maintenance on June 30th, 2024 (extended
lifecycle support is an option).

Is it not possible to install and run python 3.8 on RHEL 7?  I assume that
would be necessary to run Java 11 on RHEL 7 with Cassandra 5.0.  It would
be a burden for contributors to test with an obsolete version of python --
you can't 'brew install python@3.6' for example.


% brew install python@3.6

Warning: No available formula with the name "python@3.6"

% brew install python@3.7

Error: python@3.7 has been disabled because it is deprecated upstream!


On Mon, Mar 11, 2024 at 3:38 PM Caleb Rackliffe 
wrote:

> I can try this out on trunk. Will report back...
>
> On Mon, Mar 11, 2024 at 2:23 PM J. D. Jordan 
> wrote:
>
>> The Python driver dropped official support for older EOL Python versions
>> because they are EOL and no longer tested by the newer driver CI. I don’t
>> think there are actually any changes yet that it won’t work in 3.6 still?
>> Maybe someone with Python 3.6 installed can change the if and see?  I think
>> we have some cqlsh tests in dtest?  As long as we as a project run those on
>> RHEL 7, I would be comfortable with adding that back to being supported.
>> Though maybe just in the rpm package?
>>
>> -Jeremiah
>>
>> On Mar 11, 2024, at 1:33 PM, Josh McKenzie  wrote:
>>
>>
>> Looks like we bumped from 3.6 requirement to 3.7 in CASSANDRA-18960
>> <https://issues.apache.org/jira/browse/CASSANDRA-18960> as well -
>> similar thing. Vector support in python, though that patch took it from
>> "return a simple blob" to "return something the python driver knows about,
>> but apparently not variable types so we'll need to upgrade again."
>>
>> The version of the Python driver that is used by cqlsh (3.25.0) doesn't
>> entirely support the new vector data type introduced by CASSANDRA-18504
>> <https://issues.apache.org/jira/browse/CASSANDRA-18504>. While we can
>> perfectly write data, read vectors are presented as blobs:
>>
>>
>> As far as I can tell, support for vector types in cqlsh is the sole
>> reason we've bumped to 3.7 and 3.8 to support that python driver. That
>> correct Andres / Brandon?
>>
>> On Mon, Mar 11, 2024, at 1:22 PM, Caleb Rackliffe wrote:
>>
>> The vector issue itself was a simple error message change:
>> https://github.com/datastax/python-driver/commit/e90c0f5d71f4cac94ed80ed72c8789c0818e11d0
>>
>> Was there something else in 3.29.0 that actually necessitated the move to
>> a floor of Python 3.8? Do we generally change runtime requirements in minor
>> releases for the driver?
>>
>> On Mon, Mar 11, 2024 at 12:12 PM Brandon Williams 
>> wrote:
>>
>> Given that 3.6 has been EOL for 2+ years[1], I don't think it makes
>> sense to add support for it back.
>>
>> Kind Regards,
>> Brandon
>>
>> [1] https://devguide.python.org/versions/
>>
>> On Mon, Mar 11, 2024 at 12:08 PM David Capwell 
>> wrote:
>> >
>> > Originally we had planned to support RHEL 7 but in testing 5.0 we found
>> out that cqlsh no longer works on RHEL 7[1].  This was changed in
>> CASSANDRA-19245 which upgraded python-driver from 3.28.0 to 3.29.0. For
>> some reason this minor version upgrade also dropped support for python 3.6
>> which is the supported python version on RHEL 7.
>> >
>> > We wanted to bring this to the attention of the community to figure out
>> next steps; do we wish to say that RHEL 7 is no longer supported (making
>> upgrades tied to OS upgrades, which can be very hard for users), or do we
>> want to add python 3.6 support back to python-driver?
>> >
>> >
>> > 1: the error seen by users is
>> > $ cqlsh
>> > Warning: unsupported version of Python, required 3.8-3.11 but found 3.6
>> > Warning: unsupported version of Python, required 3.8-3.11 but found 2.7
>> > No appropriate Python interpreter found.
>> > $
>> >
>> >
>>
>>
>>
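
For readers unfamiliar with the gate behind the warning quoted above, here is a minimal Python sketch of an interpreter version range check. The bounds come from the warning text (3.8 through 3.11); the function name `is_supported` and the overall shape are assumptions for illustration, not the actual cqlsh launcher logic.

```python
import sys

# Bounds taken from the warning text; everything else is hypothetical.
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def is_supported(version=None, minimum=MIN_VERSION, maximum=MAX_VERSION):
    """Return True if the (major, minor) pair falls inside the range."""
    if version is None:
        version = sys.version_info  # default to the running interpreter
    major_minor = tuple(version[:2])
    return minimum <= major_minor <= maximum
```

Under this sketch, trying an older interpreter amounts to lowering `minimum` to `(3, 6)` and running the cqlsh tests against it.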


Re: [DISCUSS] Cassandra 5.0 support for RHEL 7

2024-03-11 Thread Brad
Is it different for Java?  How do you get Java 11 on RHEL 7?

On Mon, Mar 11, 2024 at 5:58 PM David Capwell  wrote:

> Is it not possible to install and run python 3.8 on RHEL 7?
>
>
> You have a few options, none really good.
>
> 1) build from source
> 2) a RPM from outside of RHEL; this means you don’t have support and must
> trust a different third party (not managed by python or RedHat)
> 3) you use SCL which means every time you want to touch CQLSH you have to
> remember to enable it (as its per-bash session)
>
> These 3 options don’t really work for most deployments


Re: discuss: add to_human_size function

2024-04-10 Thread Brad
It's a useful idea and something supported in other databases.

MySQL has a FORMAT function:

FORMAT(X,D[,locale])


Formats the number X to a format like '#,###,###.##', rounded to D decimal
places, and returns the result as a string. If D is 0, the result has no
decimal point or fractional part. If X or D is NULL, the function returns
NULL.



ex:


SELECT FORMAT(250500.5634, 2);

250,500.56


SELECT FORMAT(250500.5634,0);

250,501


https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_format
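
For comparison, the behaviour described above (thousands separators, rounding to D decimals, NULL in gives NULL out) can be approximated in a few lines of Python. This is only an illustrative analogue: the name `mysql_style_format` is hypothetical, the locale argument is ignored, and Python's rounding of binary floats may differ from MySQL's on exact ties.

```python
# Hypothetical analogue of MySQL's FORMAT(X, D); locale handling omitted.
def mysql_style_format(x, d):
    if x is None or d is None:
        return None  # mirrors "If X or D is NULL, the function returns NULL"
    decimals = max(d, 0)  # treat negative D as 0 in this sketch
    return f"{x:,.{decimals}f}"  # ',' adds thousands separators
```

A size-oriented function such as the proposed to_human_size would differ mainly in choosing a unit before formatting the number.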


On Tue, Apr 9, 2024 at 8:10 AM Štefan Miklošovič <
stefan.mikloso...@gmail.com> wrote:

> Hi,
>
> I want to propose CASSANDRA-19546. It would be possible to convert raw
> numbers to something human-friendly.
> There are cases when we write just a number of bytes in our system tables
> but these numbers are just hard to parse visually. Users can indeed use
> this for their tables too if they find it useful.
>
> Also, a user can indeed write a UDF for this but I would prefer if we had
> something baked in.
>
> Does this make sense to people? Are there any other approaches to do this?
>
> https://issues.apache.org/jira/browse/CASSANDRA-19546
> https://github.com/apache/cassandra/pull/3239/files
>
> Regards
>


Re: [DISCUSS] Donating easy-cass-stress to the project

2024-04-27 Thread Brad
The current cassandra-stress is in poor condition and clocks in at a hefty
16k lines of Java code.  I was involved in some work with it last Summer
(CASSANDRA-18529) and it was tricky.

I'm strongly in favor of replacing it with a modern tool which is easier to
configure and more user friendly.  While it's a valid concern to ask who
might help maintain it, the same could be asked about the
existing cassandra-stress which has not been well maintained recently.

A separate subproject like dtest and the Java driver would maybe help
address concerns with introducing a gradle build system and Kotlin.



On Fri, Apr 26, 2024 at 2:30 PM Jon Haddad  wrote:

> @mck I haven't done anything with IP clearance.  Not sure how to, and I
> want to get a feel for if we even want it in the project before I invest
> time in.  Jeff's question about people willing to maintain the project is a
> good one and if people aren't willing to maintain it with me, it's only
> going to make my life harder to move under the project umbrella.  I don't
> want to go from my wild west style of committing whatever I want to waiting
> around for days or weeks to get features committed.
>
>
> Project rename happened here:
>
> commit 6c9493254f7bed57f19aaf5bda19f0b7734b5333
> Author: Jon Haddad 
> Date:   Wed Feb 14 13:21:36 2024 -0800
>
> Renamed the project
>
>
>
>
>
> On Fri, Apr 26, 2024 at 12:50 AM Mick Semb Wever  wrote:
>
>>
>>
>> On Fri, 26 Apr 2024 at 00:11, Jon Haddad  wrote:
>>
>>> I should probably have noted - since TLP is no more, I renamed
>>> tlp-stress to easy-cass-stress around half a year ago when I took it over
>>> again.
>>>
>>
>>
>> Do we have the IP cleared for donation ?
>> At what SHA did you take and rename tlp-stress, and who was the copyright
>> holder til that point ?
>> We can fix this I'm sure, but we will need the paperwork.
>>
>>
>>


Re: [DISCUSS] Feature branch to update a nodetool obsolete dependency (airline)

2024-07-08 Thread Brad
Many of the utilities in the tools directory (BulkLoader, SSTableExport,
etc) already use apache.commons.cli.

On Mon, Jul 8, 2024 at 4:28 PM Dinesh Joshi  wrote:

> I agree about picking libraries on their merit but a major factor for any
> open source project should consider today is the possibility of
> unfavorable/hostile licensing changes.
>
> On Mon, Jul 8, 2024 at 1:15 PM Jon Haddad  wrote:
>
>> Without getting into the pros and cons of both libraries, I have to point
>> out there's something unsettling about making decisions about libraries we
>> used based on arbitrary rules an employer has put into place on its
>> employees.  The project isn't governed by Apple, it's governed by
>> individual contributors to open source.
>>
>> We need to pick libraries based on their merits.  Apple's draconian rules
>> should not prevent us from using the best option available.
>>
>> Jon
>>
>>
>> On Mon, Jul 8, 2024 at 12:57 PM Dinesh Joshi  wrote:
>>
>>> I agree, having a DISCUSS thread with a specific subject line is less
>>> likely to be overlooked.
>>>
>>> One thing I'd like to note here is PicoCLI and Airline 2 are independent
>>> projects that are ALv2 licensed. A subset of the Cassandra contributors may
>>> have difficulty contributing to such projects due to preexisting policies
>>> that their employers may have in place.
>>>
>>> I am concerned about hostile licensing changes in the future which will
>>> necessitate another migration for us. That said, is there a specific reason
>>> to not consider Apache Commons CLI[1]?
>>>
>>> Dinesh
>>>
>>> [1] https://commons.apache.org/proper/commons-cli/
>>>
>>> On Mon, Jul 8, 2024 at 10:22 AM David Capwell 
>>> wrote:
>>>
 I don't think that a separate thread would add extra visibility


 Disagree.  This thread is about adding a feature branch, so many could
 ignore if they don’t care.  The fact you are switching the library (and
 which one) is something we have to hunt for.  By having a new DISCUSS
 thread it makes it clear which library you wish to add, and people can sign
 off if they care or not.

 I wouldn’t create this thread until you settle on which one you wish to
 move forward with.

 Is adding the PicoCLI library as a project dependency getting any 
 objections
 from the Community?


 Thats the point of the new DISCUSS thread.  By being very clear you
 wish to add PicoCLI people can either validate we are allowed to, or raise
 any objections.  I have not really seen any pushback so far outside of 1
 case that wasn’t legally allowed to be used.

 Take a look at previous threads about adding different libraries.

 On Jul 8, 2024, at 7:58 AM, Caleb Rackliffe 
 wrote:

 +1 on picocli

 RE the feature branch, I would just maintain the feature branch in your
 own fork to break out whatever "reviewable units" of code you want. When
 all the incremental review is done (I have no problem going back and
 forth), squash everything together, do whatever additional testing you
 need, and commit.

 On Fri, Jul 5, 2024 at 10:40 AM Maxim Muzafarov 
 wrote:

> > Once you are happy with your chosen library, we need a DISCUSS
> thread to add this new library (current protocol).
>
> Thanks, David. This is a good point, do we need a separate DISCUSS
> thread or can we just use this one? I'm in favour of keeping the
> discussion in one place, especially when topics are closely related. I
> don't think that a separate thread would add extra visibility, but if
> that is the way the community has adopted - no problem at all, I'll
> repost.
>
>
> The reasons for replacing the Airlift/Airline [1] with the PicoCli [2]
> are as follows (in order of priority):
>
> 1. The library is under the Apache-2.0 License
> https://github.com/remkop/picocli?tab=Apache-2.0-1-ov-file#readme
>
> 2. The project is active and well-maintained (last release on 8 May
> 2024)
> https://github.com/remkop/picocli/releases
>
> 3. The library has ZERO dependencies, in some of the cases a single
> file can just be dropped into the sources (it's even pointed out in
> the documentation)
> https://picocli.info/#_add_as_source
>
> 4. Compared to the Airlift library, the PicoCLI uses the same markup
> design concepts, so we don't have to rewrite our command or make
> complex changes, which in turn minimizes the migration.
>
>
> Is adding the PicoCLI library as a project dependency getting any
> objections from the Community? Please, share your thoughts.
>
> There are a few other alternatives (commons-cli, airline2, jcommander)
> but they are not as well known and/or not as elegantly suited to our
> needs based on what we have now.
>
>
> [1] https://github.com/airlift/airline
> [2] https://github.com/rem

Re: Welcome Jordan West and Stefan Miklosovic as Cassandra PMC members!

2024-08-30 Thread Brad
Congrats Stefan & Jordan!

On Fri, Aug 30, 2024 at 4:20 PM Jon Haddad  wrote:

> The PMC's members are pleased to announce that Jordan West and Stefan
> Miklosovic have accepted invitations to become PMC members.
>
> Thanks a lot, Jordan and Stefan, for everything you have done for the
> project all these years.
>
> Congratulations and welcome!!
>
> The Apache Cassandra PMC
>


Re: [DISCUSS] Improve Commitlog write path

2022-07-22 Thread Brad
When thinking about compaction vs commit log bottlenecks, there would be
very different profiles between TWCS vs STCS as well as for transient
tables with short TTLs which never accumulate large data, but have heavy
I/O.

Amit's analysis strikes me as insightful.  Multi-threading the commit log
might resolve a pinch point for some classes of workloads, particularly if
it could be done in a reactive manner and wasn't too complex.
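
For context, the quoted message further down describes commit log segments being mapped with mmap and persisted with msync from a single thread. The write-then-sync pattern can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the 4096-byte segment size, and the temporary file are hypothetical, and Cassandra's real implementation is in Java and considerably more involved.

```python
import mmap
import os
import tempfile


def write_and_sync(path: str, payload: bytes, segment_size: int = 4096) -> None:
    """Map a file, copy payload into the mapping, then force it to disk."""
    with open(path, "r+b") as f:
        f.truncate(segment_size)  # size the segment before mapping it
        with mmap.mmap(f.fileno(), segment_size) as segment:
            segment[: len(payload)] = payload  # write into mapped memory
            segment.flush()  # on POSIX this issues msync()


def demo() -> bytes:
    """Write one record to a temporary segment and read it back."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_and_sync(path, b"mutation-1")
        with open(path, "rb") as f:
            return f.read(10)
    finally:
        os.remove(path)
```

In this pattern the flush call is the blocking step, which is why a single syncing thread can become the limit on storage that rewards concurrent I/O.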

On Fri, Jul 22, 2022 at 6:19 AM Pawar, Amit  wrote:

> [Public]
>
>
>
> Thank you Bowen for your reply. Took some time to respond due to testing
> issue.
>
>
>
> I tested again multi-threaded feature with number of records from 260
> million to 2 billion and still improvement is seen around 80% of Ramdisk
> score. It is still possible that compaction can become new bottleneck and
> could be new opportunity to fix it. I am newbie here and possible that I
> failed to understand your suggestion completely.  At-least with this
> testing multi-threading benefit is reflecting in score.
>
>
>
> Do you think multi-threading is good to have now ? else please suggest if
> I need to test further.
>
>
>
> Thanks,
>
> Amit
>
>
>
> *From:* Bowen Song via dev 
> *Sent:* Wednesday, July 20, 2022 4:13 PM
> *To:* dev@cassandra.apache.org
> *Subject:* Re: [DISCUSS] Improve Commitlog write path
>
>
>
> [CAUTION: External Email]
>
> From my past experience, the bottleneck for insert heavy workload is
> likely to be compaction, not commit log. You initially may see commit log
> as the bottleneck when the table size is relatively small, but as the table
> size increases, compaction will likely take its place and become the new
> bottleneck.
>
> On 20/07/2022 11:11, Pawar, Amit wrote:
>
> [Public]
>
>
>
> Hi all,
>
>
>
> (My previous mail is not appearing in mailing list and resending again
> after 2 days)
>
>
>
> Myself Amit and working at AMD Bangalore, India. I am new to Cassandra and
> need to do Cassandra testing on large core systems. Usually should test on
> multi-nodes Cassandra but started with Single node testing to understand
> how Cassandra scales with increasing core counts.
>
>
>
> Test details:
>
> Operation: Insert > 90% (insert heavy)
>
> Operation: Scan < 10%
>
> Cassandra: 3.11.10 and trunk
>
> Benchmark: TPCx-IOT (similar to YCSB)
>
>
>
> Results shows scaling is poor beyond 16 cores and it is almost linear.
> Following settings are the common settings helped to get the better scores.
>
>    1. Memtable heap allocation: offheap_objects
>    2. memtable_flush_writers > 4
>    3. Java heap: 8-32GB with survivor ratio tuning
>    4. Separate storage space for Commitlog and Data.
>
>
>
> Many online blogs suggest to add new Cassandra node when unable to take
> high writes. But with large systems, high writes should be easily taken due
> to many cores. Need was to improve the scaling with more cores so this
> suggestion didn’t help. After many rounds of testing it was observed that
> current implementation uses single thread for Commitlog syncing activity.
> Commitlog files are mapped using mmap system call and changes are written
> with msync. Periodic syncing with JVisualvm tool shows
>
>    1. thread is not 100% busy with Ramdisk usage for Commitlog storage
>    and scaling improved on large systems. Ramdisk scores > 2 X NVME score.
>    2. thread becomes 100% busy with NVME usage for Commitlog and score
>    does not improve much beyond 16 cores.
>
>
>
> Linux kernel uses 4K pages for mapped memory with mmap system call. So, to
> understand this further, disk I/O testing was done using FIO tool and
> results shows
>
>    1. NVME 4K random R/W throughput is very less with single thread and
>    it improves with multi-threaded.
>    2. Ramdisk 4K random R/W throughput is good with single thread only
>    and also better with multi-threaded
>
>
>
> Based on the FIO test results following two ideas were tested for
> Commitlog files with Cassandra-3.11.10 sources.
>
>    1. Enable Direct IO feature for Commitlog files (similar to
>    [CASSANDRA-14466] Enable Direct I/O - ASF JIRA (apache.org))
>    2. Enable Multi-threaded syncing for Commitlog files.
>
>
>
> First one need to retest. Interestingly second one helped to improve the
> score with “NVME” disk. NVME disk configuration score is almost within
> 80-90% of ramdisk and 2 times of single threaded implementation.
> Multithreading enabled by adding new thread pool in
> “AbstractCommitLogSegmentManager” class and changed syncing thread as
> manager thread for this new thread pool to take care synchro

Re: [DISCUSS] Removing support for java 8

2022-08-30 Thread Brad
+1 on removing jdk8.  We should also remove python 3.6 (EOL 12/21) on trunk
at the same time.

On Mon, Aug 29, 2022 at 9:40 PM Blake Eggleston 
wrote:

> Sorry, I meant trunk, not 4.1 :)
>
> > On Aug 29, 2022, at 1:09 PM, Blake Eggleston 
> wrote:
> >
> > Hi all, I wanted to propose removing jdk8 support for 4.1. Active
> support ended back in March of this year, and I believe the community has
> built enough confidence in java 11 to make it an uncontroversial change for
> our next major release. Let me know what you think.
> >
> > Thanks,
> >
> > Blake
>
>


[Discuss] CASSANDRA-17914: Modernize CQLSH with argparse for CLI args

2022-09-29 Thread Brad

The Python standard library introduced argparse a decade ago in Python 2.7 to 
replace optparse as described in PEP-0389 for command line argument parsing.  
Optparse is no longer maintained, and has been deprecated since Python 3.2, 
although there are no plans to remove it from the std library.  

As part of modernizing CQLSH, I have proposed in CASSANDRA-17914 that we 
upgrade from optparse to argparse.  Argparse is part of the Python standard 
library and has been since 2011 so this upgrade involves no new library 
dependencies and should be self-contained and transparent.

The primary benefit is removing dependencies on deprecated classes and 
components.  Consensus seems to be that argparse has more meaningful help 
messages and is more intuitive to use.
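
For anyone who has not used argparse, here is a minimal sketch of what
argparse-based option handling looks like. The options below are only an
illustrative subset picked for this example, and build_parser() is a
hypothetical helper; none of this is the actual cqlsh code or the patch
on CASSANDRA-17914.

```python
import argparse


def build_parser():
    """Build a parser for a small, illustrative subset of cqlsh-style options."""
    parser = argparse.ArgumentParser(
        prog="cqlsh",
        description="CQL shell (argparse sketch, not the real option list)",
    )
    # Optional positional argument with a default, as optparse users
    # typically had to pick out of the leftover args by hand.
    parser.add_argument("host", nargs="?", default="127.0.0.1",
                        help="node to connect to (default: %(default)s)")
    parser.add_argument("-u", "--username",
                        help="authenticate as this user")
    parser.add_argument("-e", "--execute", metavar="STATEMENT",
                        help="execute the statement and quit")
    # type=int makes argparse validate and convert the value for us.
    parser.add_argument("--connect-timeout", type=int, default=5,
                        help="connection timeout in seconds (default: %(default)s)")
    parser.add_argument("--debug", action="store_true",
                        help="show additional debugging information")
    return parser


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["10.0.0.5", "-u", "cassandra", "--debug"])
print(args.host, args.username, args.connect_timeout, args.debug)
```

The generated --help output and the error messages for bad values (for
example a non-integer --connect-timeout) come for free from the
declarations above.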


Regards,

Brad Schoening

Re: [Discuss] CEP-24 Password validation and generation

2022-10-10 Thread Brad
I would suggest reviewing the guidelines in section 5.1.1.2 of NIST Special
Publication 800-63B
<https://pages.nist.gov/800-63-3/sp800-63b.html#memsecretver> and the NCSC
guidance "Password policy: updating your approach"
<https://www.ncsc.gov.uk/collection/passwords/updating-your-approach#PasswordGuidance:UpdatingYourApproach-Don'tenforceregularpasswordexpiry>

Regards,

Brad


On Mon, Sep 19, 2022 at 7:27 AM Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> Hi list,
>
> together with my colleague Jackson Fleming we put together CEP-24 about
> password validation and password generation in Cassandra.
>
> https://cwiki.apache.org/confluence/x/QoueDQ
>
> We are looking forward to discuss this CEP with you in depth.
>
> The outcome of this thread would be to sort out any issues / concerns you
> have so we might eventually vote and implement that in upstream if our
> contribution is found to be useful.
>
> There is a reference implementation provided we would like to build our
> solution on top.
>
> Regards
>
> Stefan Miklosovic
>


Re: [Discuss] CEP-24 Password validation and generation

2022-10-11 Thread Brad
I'd agree that password expiry should be avoided. Regarding password
complexity, could we offer a meter instead of specific rules?  The NIST
guideline states:

Verifiers SHOULD NOT impose other composition rules (e.g., requiring
mixtures of different character types or prohibiting consecutively repeated
characters) for memorized secrets.


The CEP-24 draft has a different perspective and states:

   - it has to fulfil n out of these 4 characteristics, number of
   characters per characteristic is again configurable both for warning and
   failure thresholds
      - contains upper case characters
      - contains lower case characters
      - contains digits
      - contains special characters (only ascii chars)
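
To make the contrast concrete, the "n out of 4 characteristics" rule could
be sketched roughly as below. This is only one reading of the draft text,
not the CEP-24 reference implementation: the function names, the
one-character-per-class minimum, and the warn/fail thresholds are all
assumptions for illustration.

```python
import string

# The four characteristics named in the draft, as ASCII character sets.
CHARACTER_CLASSES = {
    "upper": string.ascii_uppercase,
    "lower": string.ascii_lowercase,
    "digit": string.digits,
    "special": string.punctuation,
}


def satisfied_classes(password, min_per_class=1):
    """Count how many of the four classes have at least min_per_class characters."""
    satisfied = 0
    for chars in CHARACTER_CLASSES.values():
        if sum(1 for c in password if c in chars) >= min_per_class:
            satisfied += 1
    return satisfied


def validate(password, warn_n=4, fail_n=3):
    """Return 'FAIL', 'WARN' or 'OK' (hypothetical thresholds on n)."""
    n = satisfied_classes(password)
    if n < fail_n:
        return "FAIL"
    if n < warn_n:
        return "WARN"
    return "OK"


for candidate in ("abc", "Abc123", "Abc123!"):
    print(candidate, validate(candidate))
```

Note that a long passphrase made only of lower case words fails this
check, which is the kind of composition rule the NIST text above advises
against.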


One thing to bear in mind is that the majority of enterprises with
Cassandra will use a SSO solution for authentication.  But test and dev
installations will more frequently use passwords.

Regards,

Brad
On Mon, Oct 10, 2022 at 4:09 PM Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> Hi Brad,
>
> your link about not enforcing regular password expiration for users is
> spot on. For these reasons I decided to not expand that CEP in that
> direction. Sure, technically possible, but practically questionable. I
> think that all these guides and recommendations should be looked at from
> the perspective of the system they are meant to be implemented in.
> Enforcing password to be changed in a database system is rather interesting
> take. After I briefly took a look, I do not think there is a database on
> the market which is enforcing this. On the other hand, for example, Neo4j
> forces you to change the password on the first login as the default one is
> "neo4j" for user "neo4j". This does make sense to implement for Cassandra
> as well. I do consider password "cassandra" for role "cassandra" very
> insecure and it is not forced by anybody to change it. However, it is quite
> interesting problem how to achieve that.
>
> Also, the reason I want to leave out historical verification of passwords
> in (at least the initial) implementation is that if we do that, we should
> also restrict the frequency how often a user can change the password. Let's
> think this through. If the depth of historical verification is 5 passwords,
> a user just has to regenerate a password 5 times in a row and he can use the
> same one. So implementing this without restricting how often he can change
> his password does not make sense. We can indeed explore this further but I
> feel like the initial implementation should not deal with this for now.
>
> When it comes to section 5.1.1.2 of NIST document, as already mentioned at
> the bottom of the CEP, we used Appendix A of this (1) to model what the
> good password should be. Your link is way more descriptive though.
> Particularly interesting points are these:
>
> - Passwords obtained from previous breach corpuses.
> - Dictionary words.
> - Repetitive or sequential characters (e.g. ‘aa’, ‘1234abcd’).
> - Context-specific words, such as the name of the service, the username,
> and derivatives thereof.
>
> I believe that points 1), 2) and 4) can be implemented easily as checking
> the password against a dictionary. The library we want to use is able to
> check the password against a dictionary. Dictionary check can be also
> implemented as a separate ticket which would just expand the functionality
> of DefaultPasswordValidator. I do not have a problem to include dictionary
> check into the first iteration as well.
>
> Repetitive or sequential characters are already covered in the POC
> implementation.
>
> The document you linked also contains this:
>
> Verifiers SHOULD offer guidance to the subscriber, such as a
> password-strength meter [Meters], to assist the user in choosing a strong
> memorized secret. This is particularly important following the rejection of
> a memorized secret on the above list as it discourages trivial modification
> of listed (and likely very weak) memorized secrets
>
> We are already doing this, quite intelligently, by telling a user what is
> wrong with his password that it can not be used (e.g. that it does not
> contain so and so number of specific characters). The "meter" is also there
> - we have three levels - OK password, password with a warning and failed
> password. We inform a user about the strength of his password retroactively
> - we do not tell him what the password should be before he tries to set one
> however I think that is acceptable when using Cassandra and cqlsh in
> console environment.
>
> (1) https://pages.nist.gov/800-63-3/sp800-63b.html#appA
> 
> From: Brad 
> Sent: Monday, October 10, 2022 17:43
> To: dev@cassand

Re: [Discuss] CEP-24 Password validation and generation

2022-10-12 Thread Brad
Jackson,

You make a good case for implementing a solution that works with existing
policies vs perhaps better but less common practices.

There was an OSS password complexity meter in the OWASP Enterprise Security
API (ESAPI) Java toolkit in ESAPI 2.x.  It was a pass/fail meter testing
for complexity and throwing an exception "New password is not long and
complex enough" for invalid passwords. ESAPI seems to have died as a project
after 2.x.

See verifyPasswordStrength() in:

https://github.com/ESAPI/esapi-java-legacy/blob/develop/src/main/java/org/owasp/esapi/reference/FileBasedAuthenticator.java
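
As a rough illustration of what a pass/fail meter of this kind looks like,
here is a small sketch. It is not the ESAPI code: the scoring formula
(length multiplied by the number of character classes used), the threshold
of 16, and all names are assumptions made for this example; only the
exception message is taken from the description above.

```python
import string

CHARACTER_CLASSES = (
    string.ascii_lowercase,
    string.ascii_uppercase,
    string.digits,
    string.punctuation,
)


class WeakPasswordError(ValueError):
    """Raised when a password scores below the required strength."""


def strength(password):
    """Score = length * number of character classes present (assumed formula)."""
    used = sum(1 for cls in CHARACTER_CLASSES if any(c in cls for c in password))
    return len(password) * used


def verify_password_strength(password, minimum=16):
    """Pass/fail check: raise if the score is below the (hypothetical) minimum."""
    if strength(password) < minimum:
        raise WeakPasswordError("New password is not long and complex enough")


for candidate in ("abc", "Abc123!x", "correct horse battery staple"):
    try:
        verify_password_strength(candidate)
        print(candidate, "accepted, score", strength(candidate))
    except WeakPasswordError as err:
        print(candidate, "rejected:", err)
```

Unlike a fixed set of composition rules, a score like this lets a long
single-class passphrase pass, but the caller only learns that the password
failed, not which property was missing.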

Regards,

Brad

On Wed, Oct 12, 2022 at 4:16 AM Fleming, Jackson 
wrote:

> Password Meter - This is an interesting use case, password meters work
> really well when users are using a visual aid (like a website sign up
> page). I’d be concerned by just limiting the complexity that we would
> require to a single number, when a user attempts to create or update a
> password that’s too weak, how do we specify the issue/issues we see with
> said password?
>
>
>
> To an operator saying “A role must have a password that has a strength of
> 90/100” doesn’t have much meaning outside of it’s probably a strong
> password. This makes meeting organisational password requirements near
> impossible, which I would argue is the key use case we are trying to
> satisfy.
>
>
>
> Most organisations I’ve been in have very prescriptive password policies,
> ‘a password must be a minimum n characters long’, ‘must have so many
> special characters’ etc (I am sure most people in this mailing list have
> had the same experience). A meter circumvents this in some regards, while
> in practice a password that does not meet an arbitrary length and set of
> composition rules could be stronger than a password that does meet that set
> of rules, I can see problems trying to get this setup in organisations
> where these kinds of strict rules exists, since to an IT Security
> team/department the number that a meter outputs would be fairly
> subjective/arbitrary (even though the algorithm to generate that score
> would be in the public domain).
>
>
>
> While I agree we should align as closely to NIST as possible, we shouldn’t
> be restricted by it, given the requirement is SHOULD and not SHALL (per the
> verbiage outlined under Requirements Notation and Conventions). I would be
> extremely interested in seeing an implementation that implements a password
> meter that also covers these problems, I think that the current approach is
> more implementable and more palatable to operators and organisations that
> want to use Cassandra.
>
>
>
> Regards,
>
>
>
> Jackson
>
>
>
> *From: *Brad 
> *Date: *Wednesday, 12 October 2022 at 2:42 am
> *To: *dev@cassandra.apache.org 
> *Subject: *Re: [Discuss] CEP-24 Password validation and generation
>
> *NetApp Security WARNING*: This is an external email. Do not click links
> or open attachments unless you recognize the sender and know the content is
> safe.
>
>
>
> I'd agree that password expiry should be avoided. Regarding password
> complexity, could we offer a meter instead of specific rules?  The NIST
> guideline states:
>
>
>
> Verifiers SHOULD NOT impose other composition rules (e.g., requiring
> mixtures of different character types or prohibiting consecutively repeated
> characters) for memorized secrets.
>
>
>
> The CEP-24 draft has a different perspective and states:
>
>    - it has to fulfil n out of these 4 characteristics, number of
>    characters per characteristic is again configurable both for warning and
>    failure thresholds
>       - contains upper case characters
>       - contains lower case characters
>       - contains digits
>       - contains special characters (only ascii chars)
>
>
>
> One thing to bear in mind is that the majority of enterprises with
> Cassandra will use a SSO solution for authentication.  But test and dev
> installations will more frequently use passwords.
>
>
>
> Regards,
>
>
>
> Brad
>
> On Mon, Oct 10, 2022 at 4:09 PM Miklosovic, Stefan <
> stefan.mikloso...@netapp.com> wrote:
>
> Hi Brad,
>
> your link about not enforcing regular password expiration for users is
> spot on. For these reasons I decided to not expand that CEP in that
> direction. Sure, technically possible, but practically questionable. I
> think that all these guides and recommendations should be looked at from
> the perspective of the system they are meant to be implemented in.
> Enforcing password to be changed in a database system is rather interesting
> take. After I briefly took a look, I do not think there is a database on
> the market wh

Re: Should we change 4.1 to G1 and offheap_objects ?

2022-11-09 Thread Brad
The default garbage collector in Java 11 is G1.  It's designed to be
self-tuning, so I'd call it friendly.  We have run Java 8 and 11 on G1 in
production on all of our 1,000+ clusters for several years.

I'd agree with Jeremiah that it's worth changing in trunk at the very least
and consider backporting.

On Wed, Nov 9, 2022 at 5:10 PM Brandon Williams  wrote:

> If CMS is gone, is there a friendlier alternative to G1?
>
> On Wed, Nov 9, 2022 at 3:53 PM Josh McKenzie  wrote:
> >
> > My recollection (and brief sleuthing now) surfaces: we've gone back and
> forth on the G1 vs. CMS debate over the years and I think we settled on "it
> all depends on your environment, workload, and you need to tune it anyway.
> It might be worth having a 'default' mode that selects one of the two based
> on heap size unless otherwise specified".
> >
> > I certainly wouldn't make changes to any defaults on a release between
> beta and rc personally.
> >
> > On Wed, Nov 9, 2022, at 4:20 PM, Jeff Jirsa wrote:
> >
> > G1 you can argue for with the changes in the JDK, though it's MUCH  less
> friendly to small heaps (e.g. probably our default simple user).
> >
> > Offheap memtables are different though. If someone wants to attest that
> offheap_objects get the same level of rigorous testing as the existing
> default, that'd be useful, but I'm pretty sure that's not true, and bugs
> like https://issues.apache.org/jira/browse/CASSANDRA-12125  (which
> remains undiagnosed) reinforce that it's less commonly used and may have
> latent undiscovered bugs for default users.
> >
> >
> >
> >
> >
> > On Wed, Nov 9, 2022 at 11:23 AM Mick Semb Wever  wrote:
> >
> > Any objections to making these changes, at the very last minute, for
> 4.1-rc1 ?
> > This is CASSANDRA-12029 and CASSANDRA-7486
> >
> > Provided we figure out patches for them in the next day or two.
> >
> >
>


Re: [DISCUSS] Donating easy-cass-stress to the project

2024-10-13 Thread Brad
I'm +1 on replacing the existing cassandra-stress.  My team did some work
last Summer to remove Thrift related CLI args, but arg parsing alone is a
5K line mess. It's certainly not being well-maintained and could use a
replacement.

On Sun, Oct 13, 2024 at 10:25 PM Josh McKenzie  wrote:

> Unsolicited .02:
>
> - If this will eventually replace the in-tree cassandra-stress, does it
> warrant a CEP ?  (i'm ok with skipping, though that step might have
> encouraged the questions above)
>
> I'm +1 to this replacing, -0 on requiring a CEP.
>
> Given the current tool is unmaintained and doesn't (to my knowledge) have
> a workflow-based usage paradigm that could be easily extended, seems like a
> clear win.
>
>
> On Sat, Oct 12, 2024, at 7:31 AM, Mick Semb Wever wrote:
>
>  reply below.
>
>
> In terms of next steps: Mick what do we need to do next? Figure out the
> answers to your questions re: getting contributor sign off?
>
>
>
> The process of donation is as follows… (feel free to correct me, or add
> anything)
>
>
> 1. General pre-agreement from the PMC that we'll take this project in, and
> how it will fit in.
>
> Some questions I (personally) have are,
> - Is the PMC ok with accepting a kotlin repository into the main part of
> the project ? (I assume so, kotlin == java, just asking the question.  this
> was asked before, maybe i missed any response)
> - Who are the initial three PMC members that are volunteering to be active
> ? (Jon, Jordan, and ?)
> - How will the activity in this repository maintain visibility to the rest
> of the project ? (see recent discussions wrt sidecar's activity silo-ing)
> - Is the repo intending to adopt general project practices ? (e.g. release
> formalities, "patch by ; reviewed by for " commit messages, etc etc etc.
>  if not, what is planned…)
> - If this will eventually replace the in-tree cassandra-stress, does it
> warrant a CEP ?  (i'm ok with skipping, though that step might have
> encouraged the questions above)
>
>
> 2. IP Donation.  Start filling out the IP Donation¹ form².
>
> Part of this process is to get approval to donate and an ICLA from each
> individual past contributor.  In addition any company involved in past
> works must consent through either an SGA or their CCLA.  In this case, all
> work before SHA 2d4542c27d3f1c0e24899c01247b9a8ee3c9a238 was copyrighted³ to
> The Last Pickle which is now owned by DataStax. Given that copyright was
> over an entire body of work I would say that the SGA⁴ is appropriate.   (I'm
> happy to handle this.)   We only need approval and ICLA's from
> contributors after⁵ that SHA, as the previous copyright to The Last
> Pickle applied to all past contributions.
>
>
> 3. When the form, and all its steps are complete, raise a vote on
> dev@cassandra.a.o and general@incubator.a.o
>
>
> 4. When the vote passes, request ASF Infra (create INFRA jira ticket) to
> move the repository to github.com/apache/cassandra-stress  (or whatever,
> but keep the cassandra- prefix IMO).
>
> --
>
> ¹)  https://incubator.apache.org/ip-clearance/ip-clearance-template.html
>
> ²)
> https://svn.apache.org/repos/asf/incubator/public/trunk/content/ip-clearance/cassandra-java-driver.xml
>
>
> ³)  https://github.com/thelastpickle/tlp-stress/blob/master/LICENSE.txt#L1
>
>
> ⁴)  https://www.apache.org/licenses/contributor-agreements.html
>
> ⁵)
> https://github.com/rustyrazorblade/easy-cass-stress/compare/2d4542c27d3f1c0e24899c01247b9a8ee3c9a238...main
>
>
>
>
>
>
>


Re: [DISCUSS] 5.1 should be 6.0

2024-12-10 Thread Brad
Usually, a major release would bump the Java and Python supported versions.
Both Java and Python are on well-published and faster release cycles.


On Tue, Dec 10, 2024 at 3:40 PM Paulo Motta  wrote:

> I share this sentiment. Outside of marketing and API compatibility
> considerations, I think the changes are significant enough to warrant a
> major version bump, since it represents a new generation of the database.
>
> On Tue, Dec 10, 2024 at 1:02 PM Brandon Williams  wrote:
>
>> Even if TCM is api-compatible, it will change how operators run
>> Cassandra in a significant way (like, different procedures from every
>> previous version.)  I think that justifies a major.
>>
>> Kind Regards,
>> Brandon
>>
>> On Tue, Dec 10, 2024 at 11:51 AM Jeff Jirsa  wrote:
>> >
>> > You’ve added a ton of API surface to transaction behavior and cluster
>> management. The TCM may or may not be strictly breaking, but they’re
>> fundamentally very very different, so even with semver as the only
>> standard, I think you can justify a major.
>> >
>> > But also, let’s just acknowledge that marketing is a thing and bump the
>> major to acknowledge the huge, massive, database-changing features, even if
>> they’re not meant to be disruptive.
>> >
>> >
>> >
>> > On Dec 10, 2024, at 9:46 AM, Josh McKenzie 
>> wrote:
>> >
>> > Currently we reserve MAJOR in semver changes for API breaking only:
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199530302#Patching,versioning,andLTSreleases-Versioningandtargeting
>> :
>> >
>> > That's consistent w/semver itself: link:
>> >
>> > Given a version number MAJOR.MINOR.PATCH, increment the:
>> >
>> > MAJOR version when you make incompatible API changes
>> > MINOR version when you add functionality in a backward compatible manner
>> > PATCH version when you make backward compatible bug fixes
>> >
>> >
>> > So absolute literal "correctness" of what we're doing aside, our
>> version numbers mean something to us as a dev community but also mean
>> something to Cassandra users. I'm not confident they mean the same thing to
>> each constituency. I'm also not comfortable with us prioritizing our own
>> version number needs over that of our users, should they differ in meaning.
>> >
>> > Does anybody have insight into how other well known widely adopted
>> projects do things we might be able to learn from? I generally only think
>> about this topic when a discussion like this comes up on our dev list so
>> don't have much insight to bring to the discussion.
>> >
>> > On Tue, Dec 10, 2024, at 11:52 AM, Jeremiah Jordan wrote:
>> >
>> > The question is if we are signaling compatibility or purely marketing
>> with the release number.
>> > We dropped compatibility with a few things in 5.0, which was the reason
>> for the .0 rather than 4.2.  I don’t know if we are breaking any
>> compatibility with current trunk?  Though maybe some of the TCM stuff could
>> be considered that.
>> > If we are purely going for marketing value, then yes, I agree
>> TCM+Accord would be 6.0 worthy.
>> >
>> > -Jeremiah
>> >
>> > On Dec 10, 2024 at 10:48:21 AM, Jon Haddad 
>> wrote:
>> >
>> > Keeping this short.  I'm not sure why we're calling the next release
>> 5.1.  TCM and Accord are a massive thing.  Other .1 / .2 releases were the
>> .0 with some smaller things added.  Imo this is a huge step forward, as big
>> as 5.0 was, so we should call it 6.0.
>> >
>> >
>>
>


Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Brad
I'm -1 on LCS being the default; I've seen far too many people use it for
disk storage management.

On Fri, Dec 6, 2024 at 10:08 PM Jon Haddad  wrote:

> I'm -1 on LCS being the default, since using it in the wrong situations
> renders clusters inoperable.
>
>
> On Fri, Dec 6, 2024 at 7:03 PM Paulo Motta  wrote:
>
>> > I'd prefer to see the default go from STCS to UCS
>>
>> I’m proposing this for latest unstable (cassandra_latest.yaml) since it’s
>> a more recent strategy still being adopted. For latest stable
>> (cassandra.yaml) I’d prefer LCS since it does not need tuning to support
>> mutable workloads (UPDATE/DELETE) and is battle-tested.
>>
>> On Fri, 6 Dec 2024 at 21:37 Jon Haddad  wrote:
>>
>>> I'd prefer to see the default go from STCS to UCS, probably with
>>> scaling_parameters T4.  That's essentially the same as STCS but without the
>>> ridiculous SSTable growth, allowing us to leverage the fast streaming path
>>> more often.  I don't think there's any valid use cases for STCS anymore now
>>> that we have UCS.
>>>
>>> That said, many have taken issue with the state of UCS docs, myself
>>> included, so that would need to be addressed with any default change.
>>>
>>> I don't think we should mark TWCS as experimental.  Maybe we prevent
>>> repairs to tables using TWCS, or do a better job of encouraging folks to
>>> use incremental repair at higher frequencies.  It's definitely not
>>> experimental though.
>>>
>>> Side note: I think experimental has been over-used and has lost all
>>> meaning.  How is Java 17 experimental?  Very confusing for the community.
>>>
>>> I think TWCS should use UCS under the hood which would address streaming
>>> performance (and thus node density) or UCS could be updated to allow for
>>> time window's options.  Either would solve issue #3 in your list.
>>>
>>> Jon
>>>
>>>
>>>
>>> On Fri, Dec 6, 2024 at 5:36 PM Paulo Motta  wrote:
>>>
>>>> Hi,
>>>>
>>>> It’s 2024 and users are still facing issues due to misconfigured
>>>> compaction when using default configuration.
>>>>
>>>> I would like to start a conversation around improving compaction
>>>> defaults in 5.1/trunk, so users trying out CQL transactions don’t need to
>>>> worry about tuning compaction.
>>>>
>>>> A few suggestions:
>>>>
>>>> 1) Make LeveledCompactionStrategy default on cassandra.yaml, UCS
>>>> default on cassandra_latest.yaml ?
>>>>
>>>> 2) Does TWCS work out of the box with repairs and hints? My
>>>> understanding is that due to CASSANDRA-10496 this causes droppable
>>>> tombstone issues when in combination with repair and hints (see more on
>>>> this thread [1]). We should either fix this or mark TWCS experimental.
>>>>
>>>> 3) When STCS is used with deletions/TTL, tombstones accumulate in
>>>> higher level stables when unchecked_tombstone_compaction is disabled (see
>>>> CASSANDRA-6563). I propose having adding a new setting “auto” enabled by
>>>> default that will have this set to true when STCS/TWCS is used.
>>>>
>>>> I believe addressing these points will improve user experience with
>>>> Cassandra.
>>>>
>>>> I apologize in advance if these topics were discussed in recent
>>>> threads. I would be happy to get  pointers of related discussions on this
>>>> topic.
>>>>
>>>> I will be happy to create JIRA if there’s agreement on addressing these
>>>> items.
>>>>
>>>> Thanks,
>>>>
>>>> Paulo
>>>>
>>>> [1] -
>>>>
>>>> https://user.cassandra.apache.narkive.com/VQOacfnT/twcs-repair-create-new-buckets-with-old-data
>>>>
>>>


Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Brad
> Could you elaborate what you mean by 'disk storage management'?

I often see clusters use LCS as an easy fix to avoid the 50% free disk
space recommendation of STCS without considering the write
amplification implications.

On Fri, Dec 6, 2024 at 10:46 PM Dinesh Joshi  wrote:

> Could you elaborate what you mean by 'disk storage management'?
>
> On Fri, Dec 6, 2024 at 7:30 PM Brad  wrote:
>
>> I'm -1 on LCS being the default, seen far too many people use it for disk
>> storage management
>>
>> On Fri, Dec 6, 2024 at 10:08 PM Jon Haddad 
>> wrote:
>>
>>> I'm -1 on LCS being the default, since using it in the wrong situations
>>> renders clusters inoperable.
>>>
>>>
>>> On Fri, Dec 6, 2024 at 7:03 PM Paulo Motta  wrote:
>>>
>>>> > I'd prefer to see the default go from STCS to UCS
>>>>
>>>> I’m proposing this for latest unstable (cassandra_latest.yaml) since
>>>> it’s a more recent strategy still being adopted. For latest stable
>>>> (cassandra.yaml) I’d prefer LCS since it does not need tuning to support
>>>> mutable workloads (UPDATE/DELETE) and is battle-tested.
>>>>
>>>> On Fri, 6 Dec 2024 at 21:37 Jon Haddad  wrote:
>>>>
>>>>> I'd prefer to see the default go from STCS to UCS, probably with
>>>>> scaling_parameters T4.  That's essentially the same as STCS but without 
>>>>> the
>>>>> ridiculous SSTable growth, allowing us to leverage the fast streaming path
>>>>> more often.  I don't think there's any valid use cases for STCS anymore 
>>>>> now
>>>>> that we have UCS.
>>>>>
>>>>> That said, many have taken issue with the state of UCS docs, myself
>>>>> included, so that would need to be addressed with any default change.
>>>>>
>>>>> I don't think we should mark TWCS as experimental.  Maybe we prevent
>>>>> repairs to tables using TWCS, or do a better job of encouraging folks to
>>>>> use incremental repair at higher frequencies.  It's definitely not
>>>>> experimental though.
>>>>>
>>>>> Side note: I think experimental has been over-used and has lost all
>>>>> meaning.  How is Java 17 experimental?  Very confusing for the community.
>>>>>
>>>>> I think TWCS should use UCS under the hood which would address
>>>>> streaming performance (and thus node density) or UCS could be updated to
>>>>> allow for time window's options.  Either would solve issue #3 in your 
>>>>> list.
>>>>>
>>>>> Jon
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Dec 6, 2024 at 5:36 PM Paulo Motta  wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> It’s 2024 and users are still facing issues due to misconfigured
>>>>>> compaction when using default configuration.
>>>>>>
>>>>>> I would like to start a conversation around improving compaction
>>>>>> defaults in 5.1/trunk, so users trying out CQL transactions don’t need to
>>>>>> worry about tuning compaction.
>>>>>>
>>>>>> A few suggestions:
>>>>>>
>>>>>> 1) Make LeveledCompactionStrategy default on cassandra.yaml, UCS
>>>>>> default on cassandra_latest.yaml ?
>>>>>>
>>>>>> 2) Does TWCS work out of the box with repairs and hints? My
>>>>>> understanding is that due to CASSANDRA-10496 this causes droppable
>>>>>> tombstone issues when in combination with repair and hints (see more on
>>>>>> this thread [1]). We should either fix this or mark TWCS experimental.
>>>>>>
>>>>>> 3) When STCS is used with deletions/TTL, tombstones accumulate in
>>>>>> higher level stables when unchecked_tombstone_compaction is disabled (see
>>>>>> CASSANDRA-6563). I propose having adding a new setting “auto” enabled by
>>>>>> default that will have this set to true when STCS/TWCS is used.
>>>>>>
>>>>>> I believe addressing these points will improve user experience with
>>>>>> Cassandra.
>>>>>>
>>>>>> I apologize in advance if these topics were discussed in recent
>>>>>> threads. I would be happy to get  pointers of related discussions on this
>>>>>> topic.
>>>>>>
>>>>>> I will be happy to create JIRA if there’s agreement on addressing
>>>>>> these items.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Paulo
>>>>>>
>>>>>> [1] -
>>>>>>
>>>>>> https://user.cassandra.apache.narkive.com/VQOacfnT/twcs-repair-create-new-buckets-with-old-data
>>>>>>
>>>>>


Re: [DISCUSS] 5.1 should be 6.0

2025-04-10 Thread Brad
> I assume JDK 21 may lead to removal of JDK 11 which is breaking change

If we name it 6.0, I would hope we bump both Java and Python supported
versions to align with their EOL status.

   - Java 11 (OpenJDK) EOL was October 2024
   - Python 3.8 EOL was October 7, 2024


On Thu, Apr 10, 2025 at 2:44 PM Ekaterina Dimitrova 
wrote:

> +1 on calling it 6.0. I assume JDK 21 may lead to removal of JDK 11 which
> is breaking change (people need to upgrade to the common JDK version - 17
> before upgrading to the next release)
>
> On Thu, 10 Apr 2025 at 14:40, Štefan Miklošovič 
> wrote:
>
>> +1, I am also getting questions about the versioning recently and people
>> themselves do not know what to call the next version like.
>>
>> On Thu, Apr 10, 2025 at 8:28 PM Jon Haddad 
>> wrote:
>>
>>> Bringing this back up.
>>>
>>> I don't think we have any reason to hold up renaming the version.  We
>>> can have a separate discussion about what upgrade paths are supported, but
>>> let's at least address this one issue of version number so we can have
>>> consistent messaging.  When I talk to people about the next release, I'd
>>> like to be consistent with what I call it, and have a unified voice as a
>>> project.
>>>
>>> Jon
>>>
>>> On Thu, Jan 30, 2025 at 1:41 AM Mick Semb Wever  wrote:
>>>


> If you mean only 4.1 and 5.0 would be online upgrade targets, I would
> suggest we change that to T-3 so you encompass all “currently supported”
> releases at the time the new branch is GAed.
>
> I think that's better actually, yeah. I was originally thinking T-2
> from the "what calendar time frame is reasonable" perspective, but saying
> "if you're on a currently supported branch you can upgrade to a release
> that comes out" makes clean intuitive sense. That'd mean:
>
> 6.0: 5.0, 4.1, 4.0 online upgrades supported. Drop support for 4.0.
> API compatibility guaranteed w/5.0.
> 7.0: 6.0, 5.0, 4.1 online upgrades supported. Drop support for 4.1.
> API compatibility guaranteed w/6.0.
> 8.0: 7.0, 6.0, 5.0 online upgrades supported. Drop support for 5.0.
> API compatibility guaranteed w/7.0.
>



 I like this.
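
For reference, the T-3 rule quoted above can be sketched as a small calculation. This is only an illustration of the arithmetic in the quoted list: the function name, the release list, and the `window` parameter are hypothetical, and the release order is taken from the versions named in the thread.

```python
# Illustrative sketch of the "T-3" online upgrade window discussed above.
# The release list and all names are assumptions for this example.

RELEASES = ["4.0", "4.1", "5.0", "6.0", "7.0", "8.0"]  # oldest to newest


def online_upgrade_sources(releases, target, window=3):
    """Return the releases that can upgrade online to `target`, newest first."""
    idx = releases.index(target)
    # Take up to `window` releases immediately preceding the target.
    return list(reversed(releases[max(0, idx - window):idx]))


def upgrade_summary(releases, target, window=3):
    """Summarise one release in the same terms as the quoted list."""
    sources = online_upgrade_sources(releases, target, window)
    return {
        "online_upgrades_from": sources,
        # The oldest release in the window is the one whose support is dropped.
        "drop_support_for": sources[-1] if sources else None,
        # API compatibility is guaranteed with the immediately preceding release.
        "api_compatible_with": sources[0] if sources else None,
    }


for version in ("6.0", "7.0", "8.0"):
    print(version, upgrade_summary(RELEASES, version))
```

With `window=2` the same function gives the original T-2 proposal, which is why the window is left as a parameter here.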




Re: [VOTE][IP CLEARANCE] easy-cass-stress

2025-04-30 Thread Brad
+1

On Wed, Apr 30, 2025 at 11:16 AM Jordan West  wrote:

> (general@incubator cc'd)
>
> Please vote on the acceptance of the easy-cass-stress (to be renamed
> cassandra-stress) and its IP Clearance:
>
> https://incubator.apache.org/ip-clearance/cassandra-easy-cass-stress.html
>
> All consent from original authors of the donation, and tracking of
> collected CLAs, is found in
>
> https://github.com/rustyrazorblade/easy-cass-stress/pull/41/files and
> 
> https://delicate-tail-8c0.notion.site/easy-cass-stress-submission-141ac849cc9d80a4972cc8623aa54667
>
> These do not all require acknowledgement before the vote.
>
> The code is prepared for donation at
> https://github.com/rustyrazorblade/easy-cass-stress
>
> Once this vote passes we will request ASF Infra to move the
> rustyrazorblade/easy-cass-stress as-is to apache/cassandra-stress. The main
> branch and gh-pages branches, all tags, and all history, will be kept.  The
> main branch will continue to be named main.
>
> PMC members, please check carefully the IP Clearance requirements before
> voting.
>
> The vote will be open for 72 hours (or longer). Votes by PMC members
> are considered binding. A vote passes if there are at least three binding
> +1s and no -1s.
>
> Thanks,
>
> Jordan
>


Re: Welcome Abe Ratnofsky as Cassandra committer!

2025-05-12 Thread Brad
Congrats, Abe!

On Mon, May 12, 2025 at 12:46 PM Alex Petrov  wrote:

> Hello folks of the dev list,
>
> The Apache Cassandra PMC is very glad to announce that Abe Ratnofsky has
> accepted our invitation to become a committer!
>
> Abe has been actively contributing to Cassandra itself, made outstanding
> contributions to the Cassandra drivers, played a key role in the recently
> accepted CEP-45 [1], and has been active in the community, including on
> this mailing list and at Cassandra conferences and meetups.
>
> Please join us in congratulating and welcoming Abe!
>
> Alex Petrov
> on behalf of the Apache Cassandra PMC
> [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-45:+Mutation+Tracking
>
>