Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-11 Thread Venkata Hari Krishna Nukala
Thanks Jon & Scott for taking time to go through this CEP and providing
inputs.

I completely agree with what Scott mentioned earlier (I should have added
more details to the CEP). Adding a few more points to the same.

Having a solution with Sidecar can make the migration easy without
depending on rsync. At least in the cases I have seen, rsync is not enabled
by default, and most operators want to run OS images with as minimal a
footprint as possible. Installing rsync requires admin privileges, and
syncing data is a manual operation. If an API is provided with Sidecar,
then tooling can be built around it, reducing the scope for manual errors.

Performance-wise, at least in the cases I have seen, the File Streaming
API in Sidecar performs much better. To give an idea of the performance, I
would like to quote "up to 7 Gbps/instance writes (depending on hardware)"
from CEP-28, as this CEP proposes to leverage the same.

For:

>When enabled for LCS, single sstable uplevel will mutate only the level of
an SSTable in its stats metadata component, which wouldn't alter the
filename and may not alter the length of the stats metadata component. A
change to the level of an SSTable on the source via single sstable uplevel
may not be caught by a digest based only on filename and length.

In this case the file size may not change, but the last-modified timestamp
would change, right? This is addressed in section MIGRATING ONE INSTANCE,
point 2.b.ii, which says "If a file is present at the destination but did
not match (by size or timestamp) with the source file, then local file is
deleted and added to list of files to download.". After the final data copy
task downloads it, the file should match the source.
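The reconciliation rule in point 2.b.ii can be sketched as follows (a minimal illustration only; the manifest shape and function name are my own assumptions, not part of the CEP):

```python
def plan_sync(source, destination):
    """Given manifests mapping relative path -> (size, mtime), return
    (to_download, to_delete) per point 2.b.ii: a destination file that
    mismatches the source by size or timestamp is deleted locally and
    added to the list of files to download."""
    to_download, to_delete = [], []
    for path, meta in source.items():
        dest_meta = destination.get(path)
        if dest_meta is None:
            to_download.append(path)          # missing at destination
        elif dest_meta != meta:
            to_delete.append(path)            # mismatch: delete, then re-fetch
            to_download.append(path)
    for path in destination:
        if path not in source:
            to_delete.append(path)            # stale leftover at destination
    return to_download, to_delete
```

Note that a file whose size is unchanged but whose mtime differs (the single-sstable uplevel case) is still caught by the `dest_meta != meta` comparison.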

On Thu, Apr 11, 2024 at 7:30 AM C. Scott Andreas 
wrote:

> Oh, one note on this item:
>
> >  The operator can ensure that files in the destination matches with the
> source. In the first iteration of this feature, an API is introduced to
> calculate digest for the list of file names and their lengths to identify
> any mismatches. It does not validate the file contents at the binary level,
> but, such feature can be added at a later point of time.
>
> When enabled for LCS, single sstable uplevel will mutate only the level of
> an SSTable in its stats metadata component, which wouldn't alter the
> filename and may not alter the length of the stats metadata component. A
> change to the level of an SSTable on the source via single sstable uplevel
> may not be caught by a digest based only on filename and length.
>
> Including the file’s modification timestamp would address this without
> requiring a deep hash of the data. This would be good to include to ensure
> SSTables aren’t downleveled unexpectedly during migration.
>
> - Scott
>
> On Apr 8, 2024, at 2:15 PM, C. Scott Andreas  wrote:
>
> 
> Hi Jon,
>
> Thanks for taking the time to read and reply to this proposal. Would
> encourage you to approach it from an attitude of seeking understanding on
> the part of the first-time CEP author, as this reply casts it off pretty
> quickly as NIH.
>
> The proposal isn't mine, but I'll offer a few notes on where I see this as
> valuable:
>
> – It's valuable for Cassandra to have an ecosystem-native mechanism of
> migrating data between physical/virtual instances outside the standard
> streaming path. As Hari mentions, the current ecosystem-native approach of
> executing repairs, decommissions, and bootstraps is time-consuming and
> cumbersome.
>
> – An ecosystem-native solution is safer than a bunch of bash and rsync.
> Defining a safe protocol to migrate data between instances via rsync
> without downtime is surprisingly difficult - and even moreso to do safely
> and repeatedly at scale. Enabling this process to be orchestrated by a
> control plane mechanizing offical endpoints of the database and sidecar –
> rather than trying to move data around behind its back – is much safer than
> hoping one's cobbled together the right set of scripts to move data in a
> way that won't violate strong / transactional consistency guarantees. This
> complexity is kind of exemplified by the "Migrating One Instance" section
> of the doc and state machine diagram, which illustrates an approach to
> solving that problem.
>
> – An ecosystem-native approach poses fewer security concerns than rsync.
> mTLS-authenticated endpoints in the sidecar for data movement eliminate the
> requirement for orchestration to occur via (typically) high-privilege SSH,
> which often allows for code execution of some form or complex efforts to
> scope SSH privileges of particular users; and eliminates the need to manage
> and secure rsyncd processes on each instance if not via SSH.
>
> – An ecosystem-native approach is more instrumentable and measurable than
> rsync. Support for data migration endpoints in the sidecar would allow for
> metrics reporting, stats collection, and alerting via mature and modern
> mechanisms rather than monitoring the output of a shell script.

[VOTE] Release Apache Cassandra 3.0.30

2024-04-11 Thread Brandon Williams
Proposing the test build of Cassandra 3.0.30 for release.

sha1: 657e595b78227c28a6b8808ef9bf62f646029f3b
Git: https://github.com/apache/cassandra/tree/3.0.30-tentative
Maven Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-1326/org/apache/cassandra/cassandra-all/3.0.30/

The Source and Build Artifacts, and the Debian and RPM packages and
repositories, are available here:
https://dist.apache.org/repos/dist/dev/cassandra/3.0.30/

The vote will be open for 72 hours (longer if needed). Everyone who
has tested the build is invited to vote. Votes by PMC members are
considered binding. A vote passes if there are at least three binding
+1s and no -1's.

[1]: CHANGES.txt:
https://github.com/apache/cassandra/blob/3.0.30-tentative/CHANGES.txt
[2]: NEWS.txt: 
https://github.com/apache/cassandra/blob/3.0.30-tentative/NEWS.txt


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-11 Thread Dinesh Joshi
On Mon, Apr 8, 2024 at 10:23 AM Jon Haddad  wrote:

> This seems like a lot of work to create an rsync alternative.  I can't
> really say I see the point.  I noticed your "rejected alternatives"
> mentions it with this note:
>

I want to point out a few things before dismissing it as an 'rsync
alternative' -

1. rsync is dangerous for many reasons. Top reason is security. rsync
executed over ssh offers a much broader access than is necessary for this
use-case. Operators also have to maintain multiple sets of credentials for
AuthN/AuthZ - ssh being just one of them. Finally, ssh simply isn't
allowed in some environments.

2. rsync is an incomplete solution. You still need to wrap rsync in a
script that will ensure that it does the right thing for each version of
Cassandra, accounts for failures, retries, etc.
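To illustrate the second point, this is the kind of retry wrapper operators end up maintaining around rsync themselves (illustrative only; the flags and structure are my assumptions, not anything the CEP specifies):

```python
import subprocess
import time

def run_with_retries(cmd, attempts=3, backoff_s=0):
    """Run a command, retrying on a non-zero exit until it succeeds
    or attempts are exhausted. Returns True on success."""
    for attempt in range(1, attempts + 1):
        if subprocess.run(cmd).returncode == 0:
            return True
        if attempt < attempts:
            time.sleep(backoff_s * attempt)   # back off before retrying
    return False

# e.g. run_with_retries(["rsync", "-a", "--partial", src, dest])
# ...plus per-version handling of sstable layouts, credentials, etc.
```

Even this toy version ignores partial-transfer cleanup and version-specific file layouts, which is exactly the complexity the proposed API would absorb.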

The way I see it, if this solves a problem and adds value for even a
subset of our users, it is worth accepting.

Dinesh


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-11 Thread Jon Haddad
First off, let me apologize for my initial reply, it came off harsher than
I had intended.

I know I didn't say it initially, but I like the idea of making it easier
to replace a node.  I think it's probably not obvious to folks that you can
use rsync (with stunnel, or alternatively rclone), and for a lot of teams
it's intimidating to do so.  Whether it actually is easy or not to do with
rsync is irrelevant.  Having tooling that does it right is better than duct
taping things together.

So with that said, if you're looking to get feedback on how to make the CEP
more generally useful, I have a couple thoughts.

> Managing the Cassandra processes like bringing them up or down while
migrating the instances.

Maybe I missed this, but I thought we already had support for managing the
C* lifecycle with the sidecar?  Maybe I'm misremembering.  It seems to me
that adding the ability to make this entire workflow self-managed would be
the biggest win, because having a live-migrate *feature* instead of what's
essentially a runbook would be far more useful.

> To verify whether the desired file set matches with source, only file
path and size is considered at the moment. Strict binary level verification
is deferred for later.

Scott already mentioned this is a problem and I agree, we cannot simply
rely on file path and size.
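Scott's suggestion of folding the modification timestamp into the digest can be sketched like this (the entry format and function name are my own, for illustration only):

```python
import hashlib

def fileset_digest(entries):
    """Digest over (name, length, mtime) tuples. Including mtime catches
    an in-place mutation of a stats metadata component (e.g. an LCS
    single-sstable uplevel) that leaves both name and length unchanged.
    Entries are sorted so the digest is order-independent."""
    h = hashlib.sha256()
    for name, length, mtime in sorted(entries):
        h.update(f"{name}:{length}:{mtime}\n".encode())
    return h.hexdigest()
```

Two file sets that differ only in one file's mtime now produce different digests, while identical sets always agree.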

TL;DR: I like the intention of the CEP.  I think it would be better if it
managed the entire lifecycle of the migration, but you might not have an
appetite to implement all that.

Jon



Cassandra Code Coverage Reports

2024-04-11 Thread Abe Ratnofsky
Hey folks,

Recently I put together per-suite and consolidated code coverage reports in 
advance of our upcoming releases. I’ve uploaded them to a static site on GitHub 
Pages so you can view them without downloading anything: 
https://aber.io/cassandra-coverage-reports

Disclaimers: This is just for informational purposes. I’m not advocating for 
any changes to the contribution process. I wanted to share this data as we go 
into qualification for our next major release, and hopefully start a discussion 
on how we can continue to improve our testing. I recognize that coverage 
is not a sole indicator of code quality or defect rate.

Potential areas for discussion:

- How can we improve the coverage of our fuzz suite?
- Are the 5.0 upgrade risks adequately covered? In particular, things like 
Config compatibility, Schema compatibility, mixed-version operation, etc.

These coverage reports are based on a fork of trunk with merge base 
16b43e4d4bd4b49029c0fc360bae1e732a7d5aae. The branch used to produce these 
reports is available here: 
https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:coverage-reports

I’m looking to merge these changes into trunk since they include fixes for a 
few issues that prevented correct coverage collection in the past, particularly 
for jvm-dtest-upgrade and jvm-dtest-fuzz suites. Feedback welcome.

--
Abe