Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-11 Thread Jacek Lewandowski
Nobody has so far referred to the idea of moving to JUnit 5 - what are the
opinions?



On Sun, 10 Dec 2023 at 11:03, Benedict wrote:

> Alex’s suggestion was that we meta-randomise, i.e. we randomise the config
> parameters to gain better rather than lesser coverage overall. This means
> we cover these specific configs and more - just not necessarily on any
> single commit.
>
> I strongly endorse this approach over the status quo.
>
> On 8 Dec 2023, at 13:26, Mick Semb Wever  wrote:
>
> 
>
>
>
>>
>> I think everyone agrees here, but…. these variations are still catching
>>> failures, and until we have an improvement or replacement we do rely on
>>> them.   I'm not in favour of removing them until we have proof / confidence
>>> that any replacement is catching the same failures.  Especially oa, tries,
>>> vnodes. (Not tries and offheap is being replaced with "latest", which
>>> will be valuable simplification.)
>>
>>
>> What kind of proof do you expect? I cannot imagine how we could prove
>> that, because the ability to detect failures results from the randomness
>> of those tests. That's why when such a test fails you usually cannot
>> reproduce it easily.
>>
>
>
> Unit tests that fail consistently but only on one configuration should
> not be removed/replaced until the replacement also catches the failure.
>
>
>
>> We could extrapolate that to - why do we only have those configurations? Why
>> don't we test trie / oa + compression, or CDC, or the system memtable?
>>
>
>
> Because, along the way, people have decided a certain configuration
> deserves additional testing and it has been done this way in lieu of any
> other more efficient approach.
>
>
>
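The meta-randomisation idea can be sketched as: derive every config toggle from a single logged seed, so each CI run covers a random point in the config space while any failure stays reproducible. All names below are illustrative - this is not the actual Cassandra test harness API.

```java
import java.util.Random;

// Sketch of meta-randomised test configuration: one seed drives every
// config toggle, and logging the seed makes any failing run reproducible.
class RandomisedConfig {
    final long seed;
    final boolean vnodes;          // illustrative toggles, not the real
    final boolean trieMemtables;   // Cassandra configuration surface
    final boolean offheap;
    final String compression;

    RandomisedConfig(long seed) {
        this.seed = seed;
        Random rnd = new Random(seed);
        this.vnodes = rnd.nextBoolean();
        this.trieMemtables = rnd.nextBoolean();
        this.offheap = rnd.nextBoolean();
        this.compression = new String[]{"none", "lz4", "zstd"}[rnd.nextInt(3)];
    }

    public static void main(String[] args) {
        // In CI the seed would come from the environment; print it so a
        // failing run can be replayed with exactly the same configuration.
        long seed = args.length > 0 ? Long.parseLong(args[0]) : System.nanoTime();
        RandomisedConfig cfg = new RandomisedConfig(seed);
        System.out.println("test config seed=" + seed
                + " vnodes=" + cfg.vnodes
                + " trie=" + cfg.trieMemtables
                + " offheap=" + cfg.offheap
                + " compression=" + cfg.compression);
    }
}
```

The point is that a single commit no longer pins a fixed matrix of configs; over many runs the random draws cover the named configurations and more.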


Re: Welcome Mike Adamson as Cassandra committer

2023-12-11 Thread Berenguer Blasi

Well done Sir! Congrats

On 9/12/23 13:47, Piotr Kołaczkowski wrote:
Congratulations, Mike! Well deserved, working with you has always been 
a pleasure!



On 09.12.2023, at 02:35, Melissa Logan wrote:



Congratulations, Mike!

On Fri, Dec 8, 2023 at 11:13 AM David Capwell  wrote:

Congrats!


On Dec 8, 2023, at 11:00 AM, Lorina Poland 
wrote:

Congratulations, Mike!


Re: [VOTE] Release Apache Cassandra Java Driver 4.18.0

2023-12-11 Thread Alexandre DUTRA
+1 (non-binding)

I tested checksums and GPG signatures.

I also built the Cassandra Quarkus extension with the new driver, which is
imho a good test since it also exercises the reactive API. All tests passed
locally:
https://github.com/datastax/cassandra-quarkus/compare/main...adutra:cassandra-quarkus:asf-driver?expand=1

The only caveat for integrators, apart from the GAV coordinates change, is
that two dependencies were moved to the provided scope: spotbugs-annotations
and jcip-annotations, so these must be provided explicitly.
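For Maven-based integrators, re-declaring the now-provided annotation dependencies would look roughly like this (the groupIds and versions below are my best guess - verify them against the driver's POM):

```xml
<!-- These were compile-scope transitives of the driver before 4.18.0;
     declare them yourself if your build references the annotations. -->
<dependency>
  <groupId>com.github.spotbugs</groupId>
  <artifactId>spotbugs-annotations</artifactId>
  <version>4.7.3</version> <!-- illustrative version -->
</dependency>
<dependency>
  <groupId>com.github.stephenc.jcip</groupId>
  <artifactId>jcip-annotations</artifactId>
  <version>1.0-1</version> <!-- illustrative version -->
</dependency>
```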

Thanks,

Alex Dutra

On Sat, 9 Dec 2023 at 08:42, Mick Semb Wever wrote:

> Proposing the test build of Cassandra Java Driver 4.18.0 for release.
>
> sha1: 105d378fce16804a8af4c26cf732340a0c63b3c9
> Git: https://github.com/apache/cassandra-java-driver/tree/4.18.0
> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1322
>
> 
> The Source release and Binary convenience artifacts are available here:
>
> https://dist.apache.org/repos/dist/dev/cassandra/cassandra-java-driver/4.18.0/
>
> This is the first release post-donation of the Java Driver.  The maven
> coordinates have changed from com.datastax.oss to org.apache.cassandra,
> while all package names remain the same.  There is still work to be done on
> a number of fronts, e.g. vendor neutrality, covered
> under CASSANDRA-18611.
>
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
>
>
>


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-11 Thread Benedict
Why do we want to move to JUnit 5? I’m generally opposed to churn unless well
justified, which it may be - just not immediately obvious to me.

On 11 Dec 2023, at 08:33, Jacek Lewandowski wrote:

> Nobody referred so far to the idea of moving to JUnit 5, what are the
> opinions?
>
> [the earlier messages in this thread are quoted in full above]



Can't make it to Cassandra Summit but want to see the talks?

2023-12-11 Thread Patrick McFadin
Hi everyone,

The Linux Foundation will be streaming all of the talks from the Cassandra
Summit. Finding the streams is very easy. Go to the conference schedule:

https://events.linuxfoundation.org/cassandra-summit/program/schedule/

Each talk has a YouTube link associated with it. The Keynotes and each room
have their own stream. Find the time and the room, and show up!

If you miss the live stream, the talks will all be available on YouTube
afterward. Join us in the #cassandra-summit channel in the ASF Slack and
start a thread on any talk you have questions about. We'll try to get the
speakers to join in.

Patrick


Re: [VOTE] Release Apache Cassandra Java Driver 4.18.0

2023-12-11 Thread Mick Semb Wever
Thanks Alex.



> The only caveat for integrators, apart from the GAV coordinates change, is
> that two dependencies were moved to the provided scope: spotbugs-annotations
> and jcip-annotations, so these must be provided explicitly.
>


We will need to document this.  They needed to be made provided for
licensing reasons.


Re: [VOTE] Release Apache Cassandra Java Driver 4.18.0

2023-12-11 Thread Mick Semb Wever
>
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
>



+1


Re: [VOTE] Release Apache Cassandra Java Driver 4.18.0

2023-12-11 Thread Brandon Williams
+1

Kind Regards,
Brandon

On Sat, Dec 9, 2023 at 1:43 AM Mick Semb Wever  wrote:
>
> Proposing the test build of Cassandra Java Driver 4.18.0 for release.
>
> sha1: 105d378fce16804a8af4c26cf732340a0c63b3c9
> Git: https://github.com/apache/cassandra-java-driver/tree/4.18.0
> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1322
>
> The Source release and Binary convenience artifacts are available here:
> https://dist.apache.org/repos/dist/dev/cassandra/cassandra-java-driver/4.18.0/
>
> This is the first release post-donation of the Java Driver.  The maven 
> coordinates have changed from com.datastax.oss to org.apache.cassandra, while 
> all package names remain the same.  There is still work to be done on a 
> number of fronts, e.g. vendor neutrality, covered under CASSANDRA-18611.
>
> The vote will be open for 72 hours (longer if needed). Everyone who has 
> tested the build is invited to vote. Votes by PMC members are considered 
> binding. A vote passes if there are at least three binding +1s and no -1's.
>
>


Custom FSError and CommitLog Error Handling

2023-12-11 Thread Raymond Huffman
Hello All,

On our fork of Cassandra, we've implemented some custom behavior for
handling CommitLog and SSTable corruption errors. Specifically, if a node
detects one of those errors, we want the node to stop itself, and if the
node is restarted, we want initialization to fail. This is useful in
Kubernetes when you expect nodes to be restarted frequently and makes our
corruption remediation workflows less error-prone. I think we could make
this behavior more pluggable by allowing users to provide custom
implementations of the FSErrorHandler, and the error handler that's
currently implemented at
org.apache.cassandra.db.commitlog.CommitLog#handleCommitError via config in
the same way one can provide custom Partitioners and
Authenticators/Authorizers.

Would you accept one of the following as a contribution?
1. user provided implementations of FSErrorHandler and
CommitLogErrorHandler, set via config; and/or
2. new commit failure and disk failure policies that write a poison pill
file to disk and fail on startup if that file exists

The poison pill implementation is what we currently use - we call this a
"Non Transient Error" and we want these states to always require manual
intervention to resolve, including manual action to clear the error. I'd be
happy to contribute this if other users would find it beneficial. I had
initially shared this question in Slack, but I'm now sharing it here for
broader visibility.

-Raymond Huffman
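A minimal sketch of the poison-pill approach Raymond describes (class and file names are illustrative, not the actual implementation):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of a "non-transient error" poison pill: the error handler drops a
// marker file before halting, startup refuses to proceed while the marker
// exists, and only an explicit operator action clears it.
class PoisonPill {
    private final Path marker;

    PoisonPill(Path dataDir) {
        this.marker = dataDir.resolve("non_transient_error"); // illustrative name
    }

    // Convenience for demos: a fresh temp dir with no marker.
    static PoisonPill inTempDir() {
        try {
            return new PoisonPill(Files.createTempDirectory("poison-pill-demo"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Called from the FSError / commit log error handler before halting. */
    void record(String reason) {
        try {
            Files.writeString(marker, reason + System.lineSeparator());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Checked early in startup so initialization fails fast after a restart. */
    boolean startupAllowed() {
        return !Files.exists(marker);
    }

    /** Manual remediation step: the operator clears the marker explicitly. */
    void clear() {
        try {
            Files.deleteIfExists(marker);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In Kubernetes this pairs naturally with frequent restarts: the pod crash-loops on the marker instead of silently serving from a corrupted state.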


Ext4 data corruption in stable kernels

2023-12-11 Thread Jon Haddad
Hey folks,

Just wanted to raise awareness about an I/O issue that seems to be affecting
some Linux kernel releases that were listed as STABLE, causing corruption
when using the ext4 filesystem with direct I/O.  I don't have time to get a
great understanding of the full scope of the issue, what versions are
affected, etc, I just want to get this in front of the project.  I am
disappointed that this might negatively affect our ability to leverage
direct I/O for both the commitlog (recently merged) and SSTables
(potentially a future use case), since users won't be able to discern
between a bug we ship and one that we hit as a result of our filesystem
choices.

I think it might be worth putting a note in our docs and in the config to
warn the user to ensure they're not affected, and we may even want to
consider hiding this feature if the blast radius is significant enough that
users would be affected.

https://lwn.net/Articles/954285/

Jon


Re: Ext4 data corruption in stable kernels

2023-12-11 Thread Jacek Lewandowski
Aren't only specific kernels affected? If we can detect the kernel version,
the feature can be force-disabled on the problematic kernels.
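If the affected releases were pinned down, a version guard could force-disable the direct I/O path. A rough sketch - the "bad" ranges below are placeholders, not a vetted list of affected kernels:

```java
// Sketch of a kernel-version guard for direct I/O. On Linux,
// System.getProperty("os.version") returns the kernel release string,
// e.g. "6.1.64-generic". The BAD ranges here are placeholders only.
class KernelGuard {
    // Illustrative: inclusive [from, to] ranges reported as affected.
    private static final String[][] BAD = { {"6.1.64", "6.1.66"} };

    static int[] parse(String release) {
        String[] parts = release.split("[^0-9]+");
        return new int[] { Integer.parseInt(parts[0]),
                           Integer.parseInt(parts[1]),
                           Integer.parseInt(parts[2]) };
    }

    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++)
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        return 0;
    }

    /** True when the running kernel is outside all known-bad ranges. */
    static boolean directIoSafe(String release) {
        int[] v = parse(release);
        for (String[] range : BAD)
            if (compare(v, parse(range[0])) >= 0 && compare(v, parse(range[1])) <= 0)
                return false;
        return true;
    }
}
```

A startup check could then log a warning, or refuse to enable direct I/O, when `directIoSafe(System.getProperty("os.version"))` is false. Vendor backports complicate this, as Jon notes below, since a distro kernel's version string may not reflect the patches it carries.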


On Mon, 11 Dec 2023 at 20:45, Jon Haddad wrote:

> Hey folks,
>
> Just wanted to raise awareness about an I/O issue that seems to be
> affecting some Linux kernel releases that were listed as STABLE, causing
> corruption when using the ext4 filesystem with direct I/O.  I don't have
> time to get a great understanding of the full scope of the issue, what
> versions are affected, etc, I just want to get this in front of the
> project.  I am disappointed that this might negatively affect our ability
> to leverage direct I/O for both the commitlog (recently merged) and
> SSTables (potentially a future use case), since users won't be able to
> discern between a bug we ship and one that we hit as a result of our
> filesystem choices.
>
> I think it might be worth putting a note in our docs and in the config to
> warn the user to ensure they're not affected, and we may even want to
> consider hiding this feature if the blast radius is significant enough that
> users would be affected.
>
> https://lwn.net/Articles/954285/
>
> Jon
>


Re: Ext4 data corruption in stable kernels

2023-12-11 Thread Jon Haddad
Like I said, I didn't have time to verify the full scope and what's
affected, just that some stable kernels are affected.  Adding to the
problem is that it might be vendor specific as well.  For example, RH might
backport an upstream patch in the kernel they ship that's non-standard.

Hopefully someone compiles a list.

Jon

On Mon, Dec 11, 2023 at 11:51 AM Jacek Lewandowski <
lewandowski.ja...@gmail.com> wrote:

> Aren't only specific kernels affected? If we can detect the kernel
> version, the feature can be force-disabled on the problematic kernels.
>
>
> On Mon, 11 Dec 2023 at 20:45, Jon Haddad wrote:
>
>> [Jon's original message is quoted in full above]
>


Re: [Discuss] ​​CEP-35: Add PIP support for CQLSH

2023-12-11 Thread Brad
I'll add that the process for packaging CQLSH is similar, if not identical, to
how the Python driver will be packaged by Apache for pypi.org once the driver
donation in CEP-8 is completed.  The Python driver already has an official
PyPI distribution at https://pypi.org/project/cassandra-driver/.

On Wed, Dec 6, 2023 at 3:03 PM Jeff Widman  wrote:

> 👋 I'm the other current maintainer of https://github.com/jeffwidman/cqlsh.
>
> *> Knowing nothing about the pypi release/publish process, I'm curious how
> you would stage and then publish the signed convenience package.
> Background: what we publish post-release-vote needs to be signed and
> identical to what is staged when the release vote starts. See the two
> scripts prepare_release.sh and finish_release.sh
> in https://github.com/apache/cassandra-builds/tree/trunk/cassandra-release
>  ,
> where all the packaging is done in prepare_ and finish_ is just about
> pushing what's in staging to the correct public locations.  I am assuming
> that the CEP would be patching these two files, as well as adding files
> in-tree to the pylib/ directory*
>
> From a tactical implementation standpoint, there's a few ingredients to a
> release:
>
>1. Packaging code used by PyPI such as `pyproject.toml`:
>2. Code freeze of the functional code.
>3. Versioning.
>4. Secret management of the PyPI credentials
>5. The actual publishing to PyPI.
>6. Maintaining the above, i.e., fixing breakage and keeping it up to
>date.
>
> Looking at them in more detail:
>
>1. Packaging code used by PyPI such as `pyproject.toml` - This should
>be easy to add into the tree. Brad / myself would be happy to contribute
>and we should be able to pull most of it directly from
>https://github.com/jeffwidman/cqlsh.
>2. Code freeze of the functional code - This already happens today
>upon every Cassandra release.
>3. Versioning - Versioning is a pain since currently CQLSH versions
>are not aligned with Cassandra. Furthermore the internal CQLSH version
>number doesn't always increment when a new version of Cassandra / CQLSH is
>released. However, PyPI requires every release artifact to have a unique
>version number. So we work around this currently by saying "Here's PyPI
>version X, which contains the cqlsh version from Y extracted from Cassandra
>version Z".
>   1. If you want to keep CQLSH releases in-lockstep with Cassandra,
>   then life would be _much_ simpler if the CQLSH version directly pulled 
> from
>   the Cassandra version.
>   2. However, there's still the problem that sometimes the CQLSH
>   python packaging may have a bug, which forces a new release of CQLSH. 
> Seems
>   a bit heavyweight to require a new release of Cassandra just to fix a 
> bug
>   in the python packaging.
>   3. Another option is to have CQLSH release *not* tied at the hip to
>   Cassandra releases. Extract it to a separate project/repo and then pull 
> in
>   specific releases of CQLSH into the Cassandra final release. Probably 
> too
>   heavyweight right now given we are trying to simplify life, but wanted 
> to
>   mention it.
>   4. I don't feel strongly on the above, other than the current state
>   of affairs of requiring three different versions is worse than either of
>   the above options.
>   4. Secret management of the PyPI credentials
>   1. I'm not sure if Apache projects have a special "apache account"
>   that they typically use, or if they add multiple maintainers as admins 
> on
>   PyPI, and then add/remove them as they join/drop the core team. Either
>   works for me.
>   2. We'd probably want to keep Brad / myself as admins on PyPI as
>   we'll be more attentive to breakage / fixing things but that's really 
> up to
>   the discretion of the core team... I'm fine if you folks prefer to 
> remove
>   our access.
>   3. Making the secrets available to the publishing tool can be
>   managed using PyPI's trusted publishing:
>   
> https://packaging.python.org/en/latest/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/#configuring-trusted-publishing
>   .
>5. The actual publishing to PyPI.
>   1. The "staged" releases could be pushed to
>   https://test.pypi.org/project/cqlsh/ and then the final released
>   pushed to the normal https://pypi.org/project/cqlsh/
>   2. The commands to publish to PyPI could be added to the
>   prepare_release.sh and finish_release.sh.
>   3. Alternatively, could add a CI/CD action such as a github action
>   directly to the Cassandra repo that watches for new git tags on the repo
>   and pushes those versions out to PyPI.
>   4. I'm not particularly wedded to 
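For ingredient 1, the in-tree packaging metadata could look roughly like the following sketch (every field here is illustrative, adapted from the standalone cqlsh repo rather than anything already agreed):

```toml
# Illustrative pyproject.toml for publishing cqlsh to PyPI (all values are
# placeholders pending the versioning discussion above).
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "cqlsh"
version = "0.0.0"          # placeholder: see the versioning options above
description = "CQL shell for Apache Cassandra"
requires-python = ">=3.8"
dependencies = [
    "cassandra-driver",    # the Python driver from PyPI
]

[project.scripts]
cqlsh = "cqlsh:main"       # illustrative entry point
```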

Re: Ext4 data corruption in stable kernels

2023-12-11 Thread Jacek Lewandowski
Frankly, there are only two kernel versions mentioned there. I've created
https://issues.apache.org/jira/browse/CASSANDRA-19196 to follow up on
that.


On Mon, 11 Dec 2023 at 21:05, Jon Haddad wrote:

> Like I said, I didn't have time to verify the full scope and what's
> affected, just that some stable kernels are affected.  Adding to the
> problem is that it might be vendor specific as well.  For example, RH might
> backport an upstream patch in the kernel they ship that's non-standard.
>
> Hopefully someone compiles a list.
>
> Jon
>
> On Mon, Dec 11, 2023 at 11:51 AM Jacek Lewandowski <
> lewandowski.ja...@gmail.com> wrote:
>
>> Aren't only specific kernels affected? If we can detect the kernel
>> version, the feature can be force-disabled on the problematic kernels.
>>
>>
>> On Mon, 11 Dec 2023 at 20:45, Jon Haddad wrote:
>>
>>> [Jon's original message is quoted in full above]
>>