Re: Welcome Ekaterina Dimitrova as Cassandra PMC member

2025-03-07 Thread Ekaterina Dimitrova
Thank you all for the kind words here, and to those of you who reached out
in Slack: thank you for the warm welcome!

On Wed, 5 Mar 2025 at 22:43, Tolbert, Andy  wrote:

> Congratulations Ekaterina!!
>
> On Wed, Mar 5, 2025 at 8:52 PM Jordan West  wrote:
>
>> Congratulations!!!
>> On Wed, Mar 5, 2025 at 07:01 Abe Ratnofsky  wrote:
>>
>>> Congratulations Ekaterina! 🎉
>>>
>>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Jon Haddad
If that's not your intent, then you should be more careful with your
replies.  When you write something like this:

> While this might work, what I find tricky is that we are forcing this to
users. Not everybody is interested in putting everything to a bucket and
server traffic from that. They just don't want to do that. Because reasons.
They are just happy with what they have etc, it works fine for years and so
on. They just want to upload SSTables upon snapshotting and call it a day.

> I don't think we should force our worldview on them if they are not
interested in it.

It comes off *extremely* negative.  You use the word "force" here multiple
times.




On Fri, Mar 7, 2025 at 9:18 AM Štefan Miklošovič 
wrote:

> I was explaining multiple times (1) that I don't have anything against
> what is discussed here.
>
> Having questions about what that is going to look like does not mean I am
> dismissive.
>
> (1) https://lists.apache.org/thread/ofh2q52p92cr89wh2l3djsm5n9dmzzsg
>
> On Fri, Mar 7, 2025 at 5:44 PM Jon Haddad  wrote:
>
>> Nobody is saying you can't work with a mount, and this isn't a
>> conversation about snapshots.
>>
>> Nobody is forcing users to use object storage either.
>>
>> You're making a ton of negative assumptions here about both the
>> discussion, and the people you're having it with.  Try to be more open
>> minded.
>>
>>
>> On Fri, Mar 7, 2025 at 2:28 AM Štefan Miklošovič 
>> wrote:
>>
>>> The only way I see that working is that, if everything was in a bucket,
>>> if you take a snapshot, these SSTables would be "copied" from live data dir
>>> (living in a bucket) to snapshots dir (living in a bucket). Basically, we
>>> would need to say "and if you go to take a snapshot on this table, instead
>>> of hardlinking these SSTables, do a copy". But this "copying" would be
>>> internal to a bucket itself. We would not need to "upload" from node's
>>> machine to s3.
>>>
>>> While this might work, what I find tricky is that we are forcing this to
>>> users. Not everybody is interested in putting everything to a bucket and
>>> server traffic from that. They just don't want to do that. Because reasons.
>>> They are just happy with what they have etc, it works fine for years and so
>>> on. They just want to upload SSTables upon snapshotting and call it a day.
>>>
>>> I don't think we should force our worldview on them if they are not
>>> interested in it.
>>>
>>> On Fri, Mar 7, 2025 at 11:02 AM Štefan Miklošovič <
>>> smikloso...@apache.org> wrote:
>>>
>>>> BTW, snapshots are quite special because these are not "files", they
>>>> are just hard links. They "materialize" as regular files once underlying
>>>> SSTables are compacted away. How are you going to hardlink from local
>>>> storage to an object storage anyway? We will always need to "upload".
>>>>
>>>> On Fri, Mar 7, 2025 at 10:51 AM Štefan Miklošovič <
>>>> smikloso...@apache.org> wrote:
>>>>
>>>>> Jon,
>>>>>
>>>>> all "big three" support mounting a bucket locally. That being said, I
>>>>> do not think that completely ditching this possibility for Cassandra
>>>>> working with a mount, e.g. for just uploading snapshots there etc, is
>>>>> reasonable.
>>>>>
>>>>> GCP
>>>>>
>>>>>
>>>>> https://cloud.google.com/storage/docs/cloud-storage-fuse/quickstart-mount-bucket
>>>>>
>>>>> Azure (this one is quite sophisticated), lot of options ...
>>>>>
>>>>>
>>>>> https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-how-to-deploy?tabs=RHEL
>>>>>
>>>>> S3, lot of options how to mount that
>>>>>
>>>>> https://bluexp.netapp.com/blog/amazon-s3-as-a-file-system
>>>>>
>>>>> On Thu, Mar 6, 2025 at 4:17 PM Jon Haddad 
>>>>> wrote:
>>>>>
>>>>>> Assuming everything else is identical, might not matter for S3.
>>>>>> However, not every object store has a filesystem mount.
>>>>>>
>>>>>> Regarding sprawling dependencies, we can always make the provider
>>>>>> specific libraries available as a separate download and put them on their
>>>>>> own thread with a separate class path. I think in JVM dtest does this
>>>>>> already.  Someone just started asking about IAM for login, it sounds like a
>>>>>> similar problem.
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 6, 2025 at 12:53 AM Benedict  wrote:
>>>>>>
>>>>>>> I think another way of saying what Stefan may be getting at is what
>>>>>>> does a library give us that an appropriately configured mount dir doesn’t?
>>>>>>>
>>>>>>> We don’t want to treat S3 the same as local disk, but this can be
>>>>>>> achieved easily with config. Is there some other benefit of direct
>>>>>>> integration? Well defined exceptions if we need to distinguish cases is one
>>>>>>> that maybe springs to mind but perhaps there are others?
>>>>>>>
>>>>>>>
>>>>>>> On 6 Mar 2025, at 08:39, Štefan Miklošovič 
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> That is cool but this still does not show / explain how it would
>>>>>>> look like when it comes to dependencies needed for actually talking to
>>>>>>> storages like s3.
>>>>>

Re: Dropwizard/Codahale metrics deprecation in Cassandra server

2025-03-07 Thread Jon Haddad
As long as operators are able to use all the OTel tooling, I'm happy.  I'm
not looking to try to decide what the metrics API looks like, although I
think trying to plan for 15 years out is a bit unnecessary. A lot of the DB
will be replaced by then.  That said, I'm mostly hands off on code and you
guys are more than capable of making the smart decision here.

Regarding virtual tables, I'm looking at writing a custom OTel receiver [1]
to ingest them.  I was really impressed with the performance work you did
there and it got my wheels turning on how to best make use of it.  I am
planning on using it with easy-cass-lab to pull DB metrics and logs down to
my local machine along with kernel metrics via eBPF.

Jon

[1] https://opentelemetry.io/docs/collector/building/receiver/
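
The adapter idea from the quoted message below (a small in-house metrics API, with exporters written against it) could look roughly like this. It is a minimal sketch only: `Counter`, `MetricSink`, and `Registry` are hypothetical names invented for illustration and are not taken from Cassandra, CEP-32, Codahale, or OpenTelemetry.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical in-house metrics API; not Cassandra's actual interface. */
final class MetricsSketch {
    /** Minimal counter that database code would program against. */
    interface Counter {
        void inc(long n);
        long count();
    }

    /** Adapter boundary: one implementation per exporter (Codahale, OTel, virtual table). */
    interface MetricSink {
        void export(String name, long value);
    }

    static final class Registry {
        private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

        /** Returns a view over the named counter, creating it on first use. */
        Counter counter(String name) {
            LongAdder adder = counters.computeIfAbsent(name, k -> new LongAdder());
            return new Counter() {
                public void inc(long n) { adder.add(n); }
                public long count() { return adder.sum(); }
            };
        }

        /** Pushes the current value of every counter to the given sink. */
        void exportTo(MetricSink sink) {
            counters.forEach((name, adder) -> sink.export(name, adder.sum()));
        }
    }

    /** Stand-in exporter that simply collects values, sorted by name. */
    static Map<String, Long> snapshot(Registry registry) {
        Map<String, Long> out = new TreeMap<>();
        registry.exportTo(out::put);
        return out;
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.counter("client.requests").inc(3);
        registry.counter("client.requests").inc(2);
        System.out.println(snapshot(registry));
    }
}
```

The point of the sketch is that swapping or adding an exporter only touches a `MetricSink` implementation, not the call sites that increment counters.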



On Wed, Mar 5, 2025 at 1:06 PM Maxim Muzafarov  wrote:

> If we do swap, we may run into the same issues with third-party
> metrics libraries in the next 10-15 years that we are discussing now
> with the Codahale we added ~10-15 years ago, and given the fact that a
> proposed new API is quite small my personal feeling is that it would
> be our best choice for the metrics.
>
> Having our own API also doesn't prevent us from having all the
> integrations with new 3-rd party libraries the world will develop in
> future, just by writing custom adapters to our own -- this will be
> possible for the Codahale (with some suboptimal considerations), where
> we have to support backwards compatibility, and for the OpenTelemetry
> as well. We already have the CEP-32[1] proposal to instrument metrics;
> in this sense, it doesn't change much for us.
>
> Another point of having our own API is the virtual tables we have --
> it gives us enough flexibility and latitude to export the metrics
> efficiently via the virtual tables by implementing the access patterns
> we consider important.
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255071749#CEP32:(DRAFT)OpenTelemetryintegration-ExportingMetricsthroughOpenTelemetry
> [2] https://opentelemetry.io/docs/languages/java/instrumentation/
>
> On Wed, 5 Mar 2025 at 21:35, Jeff Jirsa  wrote:
> >
> > I think it's widely accepted that OTel in general has won this stage of
> observability, as most metrics systems allow it and most saas providers
> support it. So Jon’s point there is important.
> >
> > The promise of unifying logs/traces/metrics (aka wide events) is usually
> far more important on the tracing side of our observability than in the
> areas where we use Codahale/DropWizard.
> >
> > Scott: if we swap, we can (probably should) deprecate like everything
> else, and run both side by side for a release so people don’t lose metrics
> entirely on bounce? FF both, to control double cost during the transition.
> >
> >
> >
> >
> > On Mar 5, 2025, at 8:21 PM, C. Scott Andreas 
> wrote:
> >
> > No strong opinion on particular choice of metrics library.
> >
> > My primary feedback is that if we swap metrics implementations and the
> new values are *different*, we can anticipate broad user confusion/interest.
> >
> > In particular if latency stats are reported higher post-upgrade, we
> should expect users to interpret this as a performance regression,
> dedicating significant resources to investigating the change, and expending
> credibility with stakeholders in their systems.
> >
> > - Scott
> >
> > On Mar 5, 2025, at 11:57 AM, Benedict  wrote:
> >
> > 
> > I really like the idea of integrating tracing, metrics and logging
> frameworks.
> >
> > I would like to have the time to look closely at the API before we
> decide to adopt it though. I agree that a widely deployed API has inherent
> benefits, but any API we adopt also shapes future evolution of our
> capabilities. Hopefully this is also a good API that allows us plenty of
> evolutionary headroom.
> >
> >
> > On 5 Mar 2025, at 19:45, Josh McKenzie  wrote:
> >
> > 
> >
> > if the plan is to rip out something old and unmaintained and replace
> with something new, I think there's a huge win to be had by implementing
> the standard that everyone's using now.
> >
> > Strong +1 on anything that's an ecosystem integration inflection point.
> The added benefit here is that if we architect ourselves to gracefully
> integrate with whatever systems are ubiquitous today, we'll inherit the
> migration work that any new industry-wide replacement system would need to
> do to become the new de facto standard.
> >
> > On Wed, Mar 5, 2025, at 2:23 PM, Jon Haddad wrote:
> >
> > Thank you for the replies.
> >
> > Dmitry: Based on some other patches you've worked on and your
> explanation here, it looks like you're optimizing the front door portion of
> write path - very cool.  Testing it in isolation with those settings makes
> sense if your goal is to push write throughput as far as you can, something
> I'm very much on board with, and is a key component to pushing density and
> reducing cost.  I'm spinning up a 5.0 cluster now to run a test, so I'll
> run a 

Re: [UPDATE] CEP-37

2025-03-07 Thread Jordan West
Thank you for the update Jaydeep. Very excited to see the progress here.
I’m removed from the internal status of our deployment of it now, but from
the JIRAs, meetings, and other conversations, my impression is that this
feature has been heavily tested and is production grade.

Jordan

On Fri, Mar 7, 2025 at 11:46 Jaydeep Chovatia 
wrote:

> Hello Everyone,
>
> I wanted to update you on CEP-37 (Jira: CASSANDRA-19918) work.
> Over the last year, some of us (Andy Tolbert, Chris Lohfink, Francisco
> Guerrero, and Kristijonas Zalys) have been working closely on making
> CEP-37 rock solid, with support from Josh McKenzie, Dinesh Joshi, and David
> Capwell.
> First and foremost, a huge thank you to everyone, including the
> broader Apache Cassandra community, for their invaluable contributions in
> making CEP-37 robust and solid!
>
> Here is the current status:
>
> *Feature stability*
>
>- *Voted feature:* All the features mentioned in CEP-37 have worked as
>expected.
>- *Post-voted feature:* A few new minor improvements have been added
>post-voting, and they are also working as expected.
>- The functionality has been tested by multiple people over time.
>- Some other facts: it has already been validated at scale. Another big
>Cassandra use case is in the process of validating/adopting it in their
>environment.
>
> *Source Code*
>
>- It is an opt-in feature; nobody notices anything unless someone opts
>in.
>- By default, this feature is pretty isolated (in a separate package)
>from the source code point of view (94% of the source code lines are in the
>new files)
>- Thorough documentation has been added:
>   - overview.doc
>   - metrics.doc
>   - cassandra.yaml doc
>   - NEWS.txt overview
>- Five people (Andy Tolbert, Chris Lohfink, Francisco Guerrero, and
>Kristijonas Zalys) have contributed.
>- The source code has been reviewed multiple times by the same five
>people.
>
> *Test Coverage*
>
>- Comprehensive test coverage has been added to cover all aspects.
>- The entire test suite has been passing
>
>
> We are in the final review phase and nearly ready to merge. If anyone has
> any last-minute feedback, this is the final opportunity for review.
>
> Thank you!
> Andy Tolbert, Chris Lohfink, Francisco Guerrero, Kristijonas Zalys, and
> Jaydeep
>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Štefan Miklošovič
Jon,

all "big three" support mounting a bucket locally. That being said, I do
not think that completely ditching this possibility for Cassandra working
with a mount, e.g. for just uploading snapshots there etc, is reasonable.

GCP

https://cloud.google.com/storage/docs/cloud-storage-fuse/quickstart-mount-bucket

Azure (this one is quite sophisticated), lot of options ...

https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-how-to-deploy?tabs=RHEL

S3, lot of options how to mount that

https://bluexp.netapp.com/blog/amazon-s3-as-a-file-system
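
With a bucket mounted as a local directory, "uploading" a snapshot reduces to an ordinary recursive copy through `java.nio.file`. A minimal sketch, assuming the mount behaves like a regular directory (FUSE mounts can differ in rename, permission, and consistency semantics); the class name, method name, and file names are hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

final class MountUploadSketch {
    /** Recursively copies snapshotDir into mountDir; returns the number of regular files copied. */
    static int copySnapshot(Path snapshotDir, Path mountDir) throws IOException {
        int[] copied = {0};
        // Files.walk visits a directory before its entries, so parents exist before children.
        try (Stream<Path> paths = Files.walk(snapshotDir)) {
            paths.forEach(src -> {
                Path dst = mountDir.resolve(snapshotDir.relativize(src).toString());
                try {
                    if (Files.isDirectory(src)) {
                        Files.createDirectories(dst);
                    } else {
                        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                        copied[0]++;
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return copied[0];
    }

    public static void main(String[] args) throws IOException {
        Path snapshot = Files.createTempDirectory("snapshot");
        // Stand-in for a FUSE mount point such as a mounted bucket directory.
        Path mount = Files.createTempDirectory("mounted-bucket");
        Files.write(snapshot.resolve("nb-1-big-Data.db"), "sstable bytes".getBytes());
        System.out.println(copySnapshot(snapshot, mount) + " file(s) copied");
    }
}
```

No provider SDK appears on the classpath here; everything provider-specific lives in how the directory was mounted.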

On Thu, Mar 6, 2025 at 4:17 PM Jon Haddad  wrote:

> Assuming everything else is identical, might not matter for S3. However,
> not every object store has a filesystem mount.
>
> Regarding sprawling dependencies, we can always make the provider specific
> libraries available as a separate download and put them on their own thread
> with a separate class path. I think in JVM dtest does this already.
> Someone just started asking about IAM for login, it sounds like a similar
> problem.
>
>
> On Thu, Mar 6, 2025 at 12:53 AM Benedict  wrote:
>
>> I think another way of saying what Stefan may be getting at is what does
>> a library give us that an appropriately configured mount dir doesn’t?
>>
>> We don’t want to treat S3 the same as local disk, but this can be
>> achieved easily with config. Is there some other benefit of direct
>> integration? Well defined exceptions if we need to distinguish cases is one
>> that maybe springs to mind but perhaps there are others?
>>
>>
>> On 6 Mar 2025, at 08:39, Štefan Miklošovič 
>> wrote:
>>
>> 
>>
>> That is cool but this still does not show / explain how it would look
>> like when it comes to dependencies needed for actually talking to storages
>> like s3.
>>
>> Maybe I am missing something here and please explain when I am mistaken
>> but If I understand that correctly, for talking to s3 we would need to use
>> a library like this, right? (1). So that would be added among Cassandra
>> dependencies? Hence Cassandra starts to be biased against s3? Why s3? Every
>> time somebody comes up with a new remote storage support, that would be
>> added to classpath as well? How are these dependencies going to play with
>> each other and with Cassandra in general? Will all these storage
>> provider libraries for arbitrary clouds be even compatible with Cassandra
>> licence-wise?
>>
>> I am sorry I keep repeating these questions but this part of that I just
>> don't get at all.
>>
>> We can indeed add an API for this, sure sure, why not. But for people who
>> do not want to deal with this at all and just be OK with a FS mounted, why
>> would we block them doing that?
>>
>> (1)
>> https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/pom.xml
>>
>> On Wed, Mar 5, 2025 at 3:28 PM Mick Semb Wever  wrote:
>>
>>>.
>>>
>>>
>>>> It’s not an area where I can currently dedicate engineering effort. But
>>>> if others are interested in contributing a feature like this, I’d see it as
>>>> valuable for the project and would be happy to collaborate on
>>>> design/architecture/goals.

>>>
>>>
>>> Jake mentioned 17 months ago a custom FileSystemProvider we could offer.
>>>
>>> None of us at DataStax has gotten around to providing that, but to
>>> quickly throw something over the wall this is it:
>>>
>>> https://github.com/datastax/cassandra/blob/main/src/java/org/apache/cassandra/io/storage/StorageProvider.java
>>>
>>>   (with a few friend classes under o.a.c.io.util)
>>>
>>> We then have a RemoteStorageProvider, private in another repo, that
>>> implements that and also provides the RemoteFileSystemProvider that Jake
>>> refers to.
>>>
>>> Hopefully that's a start to get people thinking about CEP level details,
>>> while we get a cleaned abstract of RemoteStorageProvider and friends to
>>> offer.
>>>
>>>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Štefan Miklošovič
BTW, snapshots are quite special because these are not "files", they are
just hard links. They "materialize" as regular files once underlying
SSTables are compacted away. How are you going to hardlink from local
storage to an object storage anyway? We will always need to "upload".
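
The hard-link behaviour described above can be shown with plain `java.nio.file`. This is a minimal sketch, not Cassandra's snapshot code: `linkOrCopy` is a hypothetical helper that hard-links when the filesystem allows it and falls back to a copy (the "upload" case) when linking fails or is unsupported, for example across filesystems.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class SnapshotLinkSketch {
    /** Returns true if a hard link was created, false if the file had to be copied. */
    static boolean linkOrCopy(Path sstable, Path snapshotFile) throws IOException {
        try {
            Files.createLink(snapshotFile, sstable);
            return true;
        } catch (UnsupportedOperationException | IOException e) {
            // Linking is impossible (e.g. different filesystem): copy the bytes instead.
            Files.copy(sstable, snapshotFile);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("data");
        Path sstable = dir.resolve("nb-1-big-Data.db");
        Files.write(sstable, "rows".getBytes());

        Path snapshotDir = Files.createDirectories(dir.resolve("snapshots"));
        Path snapshotFile = snapshotDir.resolve("nb-1-big-Data.db");
        boolean linked = linkOrCopy(sstable, snapshotFile);

        // Simulate the live SSTable being compacted away; the snapshot entry keeps the data.
        Files.delete(sstable);
        System.out.println("linked=" + linked
                + ", snapshot readable=" + Files.isReadable(snapshotFile));
    }
}
```

Either way the snapshot file survives deletion of the live SSTable; the difference is that the link costs no extra space until then, while the copy pays for the bytes immediately.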

On Fri, Mar 7, 2025 at 10:51 AM Štefan Miklošovič 
wrote:

> Jon,
>
> all "big three" support mounting a bucket locally. That being said, I do
> not think that completely ditching this possibility for Cassandra working
> with a mount, e.g. for just uploading snapshots there etc, is reasonable.
>
> GCP
>
>
> https://cloud.google.com/storage/docs/cloud-storage-fuse/quickstart-mount-bucket
>
> Azure (this one is quite sophisticated), lot of options ...
>
>
> https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-how-to-deploy?tabs=RHEL
>
> S3, lot of options how to mount that
>
> https://bluexp.netapp.com/blog/amazon-s3-as-a-file-system
>
> On Thu, Mar 6, 2025 at 4:17 PM Jon Haddad  wrote:
>
>> Assuming everything else is identical, might not matter for S3. However,
>> not every object store has a filesystem mount.
>>
>> Regarding sprawling dependencies, we can always make the provider
>> specific libraries available as a separate download and put them on their
>> own thread with a separate class path. I think in JVM dtest does this
>> already.  Someone just started asking about IAM for login, it sounds like a
>> similar problem.
>>
>>
>> On Thu, Mar 6, 2025 at 12:53 AM Benedict  wrote:
>>
>>> I think another way of saying what Stefan may be getting at is what does
>>> a library give us that an appropriately configured mount dir doesn’t?
>>>
>>> We don’t want to treat S3 the same as local disk, but this can be
>>> achieved easily with config. Is there some other benefit of direct
>>> integration? Well defined exceptions if we need to distinguish cases is one
>>> that maybe springs to mind but perhaps there are others?
>>>
>>>
>>> On 6 Mar 2025, at 08:39, Štefan Miklošovič 
>>> wrote:
>>>
>>> 
>>>
>>> That is cool but this still does not show / explain how it would look
>>> like when it comes to dependencies needed for actually talking to storages
>>> like s3.
>>>
>>> Maybe I am missing something here and please explain when I am mistaken
>>> but If I understand that correctly, for talking to s3 we would need to use
>>> a library like this, right? (1). So that would be added among Cassandra
>>> dependencies? Hence Cassandra starts to be biased against s3? Why s3? Every
>>> time somebody comes up with a new remote storage support, that would be
>>> added to classpath as well? How are these dependencies going to play with
>>> each other and with Cassandra in general? Will all these storage
>>> provider libraries for arbitrary clouds be even compatible with Cassandra
>>> licence-wise?
>>>
>>> I am sorry I keep repeating these questions but this part of that I just
>>> don't get at all.
>>>
>>> We can indeed add an API for this, sure sure, why not. But for people
>>> who do not want to deal with this at all and just be OK with a FS mounted,
>>> why would we block them doing that?
>>>
>>> (1)
>>> https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/pom.xml
>>>
>>> On Wed, Mar 5, 2025 at 3:28 PM Mick Semb Wever  wrote:
>>>
>>>>.
>>>>
>>>>
>>>>> It’s not an area where I can currently dedicate engineering effort. But
>>>>> if others are interested in contributing a feature like this, I’d see it as
>>>>> valuable for the project and would be happy to collaborate on
>>>>> design/architecture/goals.
>>>>>
>>>>
>>>>
>>>> Jake mentioned 17 months ago a custom FileSystemProvider we could offer.
>>>>
>>>> None of us at DataStax has gotten around to providing that, but to
>>>> quickly throw something over the wall this is it:
>>>>
>>>> https://github.com/datastax/cassandra/blob/main/src/java/org/apache/cassandra/io/storage/StorageProvider.java
>>>>
>>>>   (with a few friend classes under o.a.c.io.util)
>>>>
>>>> We then have a RemoteStorageProvider, private in another repo, that
>>>> implements that and also provides the RemoteFileSystemProvider that Jake
>>>> refers to.
>>>>
>>>> Hopefully that's a start to get people thinking about CEP level
>>>> details, while we get a cleaned abstract of RemoteStorageProvider and
>>>> friends to offer.




Re: CEP-15 Update

2025-03-07 Thread Jordan West
I would love to have my questions answered and see some graphs. I don’t
think those are unreasonable asks, nor do they take away from the awesome
work done. I was suggesting 1-2 weeks for folks to have the opportunity to
produce that data if the original authors didn’t have time. I also don’t
think that’s unreasonable. But to be clear, I’m not blocking anything. If
folks want to merge, I am not objecting.

I do think we should hold features to a high standard, and personally “time
worked on a feature” is not a criterion for me when considering why we
should merge. It is absolutely worth recognizing and celebrating the
massive investment and effort made here. It’s just an orthogonal point to me.
As a contrived example: if 15452 had not been as impactful performance-wise
after a year of on-and-off work, I would’ve happily continued to address it
or taken a different approach. SASI took a year and a half or more, and I
still regret that we merged it into 3.x in the form we did using the same
early contribution model. That was an extreme example, and a case out of our
control, of an entire team disbanding right after merge.

Jordan

On Fri, Mar 7, 2025 at 06:28 Jon Haddad  wrote:

> I defer to the judgement of the folks that are most impacted by it - ones
> that are in the code, working on the next release.  If you all think it's
> good to merge, then I am 100% in support of it.  I suspect merging will
> help get it out faster, and I don't see any future in which we don't ship
> this in the next release.
>
> I will be happy to help answer the "how does it compare to paxos v2"
> question post-merge.
>
> Jon
>
>
>
> On Fri, Mar 7, 2025 at 5:52 AM Josh McKenzie  wrote:
>
>> 3.5 years is an incredible amount of time and work; it really is
>> significant and thanks to everyone involved for the investment of time and
>> energy.
>>
>> We have a rocky history with large, disruptive contributions in the past
>> that have either blocked forward progress post-merge (CASSANDRA-8099), or
>> lingered in the code-base increasing maintenance burden on other
>> contributors for minimal or no user benefit (early open post SSD
>> transition, witness replicas, materialized views). I'm sympathetic to where
>> Jordan's questions stem from, as our history of leaving things in the
>> codebase long after they've become vestigial or abandoned has slowed down
>> our collective momentum maintaining the project on actively used features.
>>
>> That said, I don't think Accord will run afoul of some of those same
>> patterns. Aside from the degree of investment already in it and sheer
>> number of pmc members and committers involved, I believe it's a feature
>> that's universally impactful and that if we had a metaphorical bus-factor
>> change (entire group of people working on it disappeared the day after
>> merge or decided to go on vacation for 5 years), others in the community
>> would be willing to pick things up and keep it moving given its proximity
>> to release readiness.
>>
>> The 2 questions Jordan asked resonate with me: 1) do we have line of
>> sight to a fix on the schema issues, and I'll take the liberty of reframing
>> 2) do we have line of sight to improvement on the performance front to be
>> usable for multi-key transactions? (subtle: I don't think "parity with
>> PaxosV2" is the right target, but rather "fast enough to be usable for
>> multi-key transactions" since it's a new query paradigm).
>>
>> Given the context on contributor backing and if the answer is yes to
>> those 2 questions (which I believe it is), I think we should generally be
>> comfortable with merging the feature as experimental at this time.
>>
>> On Fri, Mar 7, 2025, at 12:54 AM, Benedict wrote:
>>
>>
>> There are essentially three possible timelines to choose from here:
>>
>> 1) We agree in the next few days to merge to trunk. We will then
>> prioritise rebasing onto trunk and resolving any pre-merge items starting
>> next week.
>> 2) There’s some more debate and agreement to merge to trunk in a week or
>> two. In the meantime we will shift to internal-first development but we’ll
>> likely prioritise the above work as soon as we can, which may be in a few
>> weeks, so we can shift to trunk first development.
>> 3) We don’t agree to merge accord anytime soon, so we shift to
>> internal-first development for the time being. I’m not sure when we will
>> prioritise any of the above.
>>
>> Our resources are finite and we’ve exhausted them (literally), so it’s
>> pretty much pick one of the above. I don’t really mind which you pick, but
>> I won’t personally be prioritising merge after this third attempt.
>>
>>
>> On 6 Mar 2025, at 22:01, Jon Haddad  wrote:
>>
>> 
>> Hmm... I took a look at the cep-15-accord branch in GitHub, it looks like
>> it's several hundred commits behind trunk.  Since you'll need to rebase
>> again before merge *anyways*, would it make sense to do it once more, and I
>> can publish easy-cass-lab with the latest branch?  If folks have concerns,
>>

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Jon Haddad
Nobody is saying you can't work with a mount, and this isn't a conversation
about snapshots.

Nobody is forcing users to use object storage either.

You're making a ton of negative assumptions here about both the discussion,
and the people you're having it with.  Try to be more open minded.


On Fri, Mar 7, 2025 at 2:28 AM Štefan Miklošovič 
wrote:

> The only way I see that working is that, if everything was in a bucket, if
> you take a snapshot, these SSTables would be "copied" from live data dir
> (living in a bucket) to snapshots dir (living in a bucket). Basically, we
> would need to say "and if you go to take a snapshot on this table, instead
> of hardlinking these SSTables, do a copy". But this "copying" would be
> internal to a bucket itself. We would not need to "upload" from node's
> machine to s3.
>
> While this might work, what I find tricky is that we are forcing this to
> users. Not everybody is interested in putting everything to a bucket and
> server traffic from that. They just don't want to do that. Because reasons.
> They are just happy with what they have etc, it works fine for years and so
> on. They just want to upload SSTables upon snapshotting and call it a day.
>
> I don't think we should force our worldview on them if they are not
> interested in it.
>
> On Fri, Mar 7, 2025 at 11:02 AM Štefan Miklošovič 
> wrote:
>
>> BTW, snapshots are quite special because these are not "files", they are
>> just hard links. They "materialize" as regular files once underlying
>> SSTables are compacted away. How are you going to hardlink from local
>> storage to an object storage anyway? We will always need to "upload".
>>
>> On Fri, Mar 7, 2025 at 10:51 AM Štefan Miklošovič 
>> wrote:
>>
>>> Jon,
>>>
>>> all "big three" support mounting a bucket locally. That being said, I do
>>> not think that completely ditching this possibility for Cassandra working
>>> with a mount, e.g. for just uploading snapshots there etc, is reasonable.
>>>
>>> GCP
>>>
>>>
>>> https://cloud.google.com/storage/docs/cloud-storage-fuse/quickstart-mount-bucket
>>>
>>> Azure (this one is quite sophisticated), lot of options ...
>>>
>>>
>>> https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-how-to-deploy?tabs=RHEL
>>>
>>> S3, lot of options how to mount that
>>>
>>> https://bluexp.netapp.com/blog/amazon-s3-as-a-file-system
>>>
>>> On Thu, Mar 6, 2025 at 4:17 PM Jon Haddad 
>>> wrote:
>>>
>>>> Assuming everything else is identical, might not matter for S3.
>>>> However, not every object store has a filesystem mount.
>>>>
>>>> Regarding sprawling dependencies, we can always make the provider
>>>> specific libraries available as a separate download and put them on their
>>>> own thread with a separate class path. I think in JVM dtest does this
>>>> already.  Someone just started asking about IAM for login, it sounds like a
>>>> similar problem.
>>>>
>>>>
>>>> On Thu, Mar 6, 2025 at 12:53 AM Benedict  wrote:
>>>>
>>>>> I think another way of saying what Stefan may be getting at is what
>>>>> does a library give us that an appropriately configured mount dir doesn’t?
>>>>>
>>>>> We don’t want to treat S3 the same as local disk, but this can be
>>>>> achieved easily with config. Is there some other benefit of direct
>>>>> integration? Well defined exceptions if we need to distinguish cases is one
>>>>> that maybe springs to mind but perhaps there are others?
>>>>>
>>>>>
>>>>> On 6 Mar 2025, at 08:39, Štefan Miklošovič 
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> That is cool but this still does not show / explain how it would look
>>>>> like when it comes to dependencies needed for actually talking to storages
>>>>> like s3.
>>>>>
>>>>> Maybe I am missing something here and please explain when I am
>>>>> mistaken but If I understand that correctly, for talking to s3 we would
>>>>> need to use a library like this, right? (1). So that would be added among
>>>>> Cassandra dependencies? Hence Cassandra starts to be biased against s3? Why
>>>>> s3? Every time somebody comes up with a new remote storage support, that
>>>>> would be added to classpath as well? How are these dependencies going to
>>>>> play with each other and with Cassandra in general? Will all these storage
>>>>> provider libraries for arbitrary clouds be even compatible with Cassandra
>>>>> licence-wise?
>>>>>
>>>>> I am sorry I keep repeating these questions but this part of that I
>>>>> just don't get at all.
>>>>>
>>>>> We can indeed add an API for this, sure sure, why not. But for people
>>>>> who do not want to deal with this at all and just be OK with a FS mounted,
>>>>> why would we block them doing that?
>>>>>
>>>>> (1)
>>>>> https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/pom.xml
>>>>>
>>>>> On Wed, Mar 5, 2025 at 3:28 PM Mick Semb Wever  wrote:
>>>>>
>>>>>>.
>>>>>>
>>>>>>
>>>>>> It’s not an area where I can currently dedicate engineering effort.
>>>>>>> But if others are interested in contributing a feature like this, I’

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Štefan Miklošovič
The only way I see that working is that, if everything was in a bucket, if
you take a snapshot, these SSTables would be "copied" from live data dir
(living in a bucket) to snapshots dir (living in a bucket). Basically, we
would need to say "and if you go to take a snapshot on this table, instead
of hardlinking these SSTables, do a copy". But this "copying" would be
internal to a bucket itself. We would not need to "upload" from node's
machine to s3.

While this might work, what I find tricky is that we are forcing this on
users. Not everybody is interested in putting everything in a bucket and
serving traffic from that. They just don't want to do that. Because reasons.
They are just happy with what they have etc.; it has worked fine for years
and so on. They just want to upload SSTables upon snapshotting and call it
a day.

I don't think we should force our worldview on them if they are not
interested in it.
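
The "copy instead of hardlink" idea from the first paragraph can be sketched as a strategy picked per data directory, so local setups keep their hard links and only bucket-backed directories copy. Everything here is hypothetical: the `Strategy` enum and `strategyFor` are illustrative names, not Cassandra configuration, and `Files.copy` merely stands in for whatever bucket-internal copy a real object store would offer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class SnapshotStrategySketch {
    /** Hypothetical snapshot strategies; names are illustrative only. */
    enum Strategy {
        HARDLINK {
            void snapshot(Path sstable, Path target) throws IOException {
                Files.createLink(target, sstable);
            }
        },
        COPY {
            void snapshot(Path sstable, Path target) throws IOException {
                // Stand-in for a copy that stays inside the bucket.
                Files.copy(sstable, target);
            }
        };

        abstract void snapshot(Path sstable, Path target) throws IOException;
    }

    /** Local data directories keep hard links; bucket-backed ones copy instead. */
    static Strategy strategyFor(boolean dataDirIsInBucket) {
        return dataDirIsInBucket ? Strategy.COPY : Strategy.HARDLINK;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("bucket-data");
        Path sstable = dir.resolve("nb-1-big-Data.db");
        Files.write(sstable, "rows".getBytes());

        Path target = dir.resolve("snapshot-nb-1-big-Data.db");
        strategyFor(true).snapshot(sstable, target);
        System.out.println("snapshot exists: " + Files.exists(target));
    }
}
```

Because the choice is made per directory, users who are happy with local disks and hard links would see no change in behaviour.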

On Fri, Mar 7, 2025 at 11:02 AM Štefan Miklošovič 
wrote:

> BTW, snapshots are quite special because these are not "files", they are
> just hard links. They "materialize" as regular files once underlying
> SSTables are compacted away. How are you going to hardlink from local
> storage to an object storage anyway? We will always need to "upload".
>
> On Fri, Mar 7, 2025 at 10:51 AM Štefan Miklošovič 
> wrote:
>
>> Jon,
>>
>> all "big three" support mounting a bucket locally. That being said, I do
>> not think that completely ditching this possibility for Cassandra working
>> with a mount, e.g. for just uploading snapshots there etc, is reasonable.
>>
>> GCP
>>
>>
>> https://cloud.google.com/storage/docs/cloud-storage-fuse/quickstart-mount-bucket
>>
>> Azure (this one is quite sophisticated), lot of options ...
>>
>>
>> https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-how-to-deploy?tabs=RHEL
>>
>> S3, lot of options how to mount that
>>
>> https://bluexp.netapp.com/blog/amazon-s3-as-a-file-system
>>
>> On Thu, Mar 6, 2025 at 4:17 PM Jon Haddad 
>> wrote:
>>
>>> Assuming everything else is identical, might not matter for S3. However,
>>> not every object store has a filesystem mount.
>>>
>>> Regarding sprawling dependencies, we can always make the provider
>>> specific libraries available as a separate download and put them on their
>>> own thread with a separate class path. I think in JVM dtest does this
>>> already.  Someone just started asking about IAM for login, it sounds like a
>>> similar problem.
>>>
>>>
>>> On Thu, Mar 6, 2025 at 12:53 AM Benedict  wrote:
>>>
>>>> I think another way of saying what Stefan may be getting at is what
>>>> does a library give us that an appropriately configured mount dir doesn’t?
>>>>
>>>> We don’t want to treat S3 the same as local disk, but this can be
>>>> achieved easily with config. Is there some other benefit of direct
>>>> integration? Well defined exceptions if we need to distinguish cases is one
>>>> that maybe springs to mind but perhaps there are others?
>>>>
>>>>
>>>> On 6 Mar 2025, at 08:39, Štefan Miklošovič
>>>> wrote:
>>>>
>>>>
>>>> That is cool but this still does not show / explain how it would look
>>>> like when it comes to dependencies needed for actually talking to storages
>>>> like s3.
>>>>
>>>> Maybe I am missing something here and please explain when I am mistaken
>>>> but If I understand that correctly, for talking to s3 we would need to use
>>>> a library like this, right? (1). So that would be added among Cassandra
>>>> dependencies? Hence Cassandra starts to be biased against s3? Why s3? Every
>>>> time somebody comes up with a new remote storage support, that would be
>>>> added to classpath as well? How are these dependencies going to play with
>>>> each other and with Cassandra in general? Will all these storage
>>>> provider libraries for arbitrary clouds be even compatible with Cassandra
>>>> licence-wise?
>>>>
>>>> I am sorry I keep repeating these questions but this part of that I
>>>> just don't get at all.
>>>>
>>>> We can indeed add an API for this, sure sure, why not. But for people
>>>> who do not want to deal with this at all and just be OK with a FS mounted,
>>>> why would we block them doing that?
>>>>
>>>> (1)
>>>> https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/pom.xml
>>>>
>>>> On Wed, Mar 5, 2025 at 3:28 PM Mick Semb Wever  wrote:
>>>>
>>>>>.
>>>>>
>>>>>
>>>>> It’s not an area where I can currently dedicate engineering effort.
>>>>>> But if others are interested in contributing a feature like this, I’d see
>>>>>> it as valuable for the project and would be happy to collaborate on
>>>>>> design/architecture/goals.
>>>>>>
>>>>>
>>>>>
>>>>> Jake mentioned 17 months ago a custom FileSystemProvider we could
>>>>> offer.
>>>>>
>>>>> None of us at DataStax has gotten around to providing that, but to
>>>>> quickly throw something over the wall this is it:
>>>>>
>>>>> https://github.com/datastax/cassandra/blob/main/src/java/org/apache/cassandra/io/storage/StorageProvider.java
>>>>>
>>>>>

Re: [DISCUSS] AWS IAM-based client authentication

2025-03-07 Thread C. Scott Andreas

Joel, thanks for reaching out. This sounds interesting, I bet there are many
who would benefit from IAM-based authentication.

If you haven't yet, could you request a Jira account? Someone will be able to
approve it almost immediately if you don't have one yet.
https://selfserve.apache.org/jira-account.html

For discussing/reviewing the implementation, I'd make the repos public and
create a ticket under the database [1] and driver [2] projects with a
description and source link to start.

For new feature proposals, we'll usually open with a discuss thread as you've
started here. That discussion will gauge receptivity and whether to proceed
by acclamation; or whether the proposal is significant enough in scope to
warrant a CEP doc and vote thread [3].

Cheers,

– Scott

[1] http://issues.apache.org/jira/browse/CASSANDRA
[2] http://issues.apache.org/jira/browse/CASSJAVA
[3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652201

On Mar 4, 2025, at 12:48 PM, Joel Shepherd  wrote:

> Hi - I have a side project that provides client- and node-side Java
> plug-ins to enable client-to-node authentication based on AWS identities.
> This would, for example, enable clients to use EC2 instance roles to
> authenticate to Cassandra nodes, or use ordinary IAM keys/secret keys.
>
> The client needs to be able to obtain valid IAM credentials to sign a
> request, and the node needs to be able to connect to a public AWS Security
> Token Service (STS) endpoint. There are no other required AWS dependencies,
> and (I believe) no changes required to driver or node code: just minor
> configuration updates.
>
> I'm seeking help in reviewing the concept and code. I'm new to this
> community, so I'm looking for suggestions on how to best engage you on
> this. The code (which is not quite production-ready) is in two private
> GitHub repos which I'm happy to grant access to for early review. I can
> also provide documentation on the approach: not sure whether that's best
> shared via this thread, a CEP, repo documentation ... suggestions wanted.
>
> Thanks: I'd appreciate any and all help in making these plug-ins available
> to the community.
>
> -- Joel.

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread guo Maxwell
Thank you very much, I'm certainly interested. I'll start working on the
update for CEP-36 next week.


On Fri, Mar 7, 2025 at 7:07 PM Mick Semb Wever wrote:

>
>
> On Thu, 6 Mar 2025 at 09:40, Štefan Miklošovič 
> wrote:
>
>> That is cool but this still does not show / explain how it would look
>> like when it comes to dependencies needed for actually talking to storages
>> like s3.
>>
>
>
> As Benedict writes, dealing with optional dependencies is not hard (and as
> Jon writes, we should work on improving how we deal with it).
>
> More configurable paths would also be welcome (snapshots, backups, etc.),
> and you can already see that in the StorageProvider.DirectoryType enum.
> Given the overlap this has with other ongoing work, the CEP should be
> written to account for and coordinate with that.  Simply configuring
> different paths will certainly satisfy a lot of users' demands.
>
> Object storage is still a very valuable goal (as Jeff and Scott write).
> Maxwell, are you interested in updating the CEP to reflect all the new
> input? I believe that's the next step here, and I'd be happy to help if
> you took the lead.
>
>


Re: CEP-15 Update

2025-03-07 Thread Jon Haddad
I defer to the judgement of the folks that are most impacted by it - ones
that are in the code, working on the next release.  If you all think it's
good to merge, then I am 100% in support of it.  I suspect merging will
help get it out faster, and I don't see any future in which we don't ship
this in the next release.

I will be happy to help answer the "how does it compare to paxos v2"
question post-merge.

Jon



On Fri, Mar 7, 2025 at 5:52 AM Josh McKenzie  wrote:

> 3.5 years is an incredible amount of time and work; it really is
> significant and thanks to everyone involved for the investment of time and
> energy.
>
> We have a rocky history with large, disruptive contributions in the past
> that have either blocked forward progress post-merge (CASSANDRA-8099), or
> lingered in the code-base increasing maintenance burden on other
> contributors for minimal or no user benefit (early open post SSD
> transition, witness replicas, materialized views). I'm sympathetic to where
> Jordan's questions stem from, as our history of leaving things in the
> codebase long after they've become vestigial or abandoned has slowed down
> our collective momentum maintaining the project on actively used features.
>
> That said, I don't think Accord will run afoul of some of those same
> patterns. Aside from the degree of investment already in it and sheer
> number of pmc members and committers involved, I believe it's a feature
> that's universally impactful and that if we had a metaphorical bus-factor
> change (entire group of people working on it disappeared the day after
> merge or decided to go on vacation for 5 years), others in the community
> would be willing to pick things up and keep it moving given its proximity
> to release readiness.
>
> The 2 questions Jordan asked resonate with me: 1) do we have line of sight
> to a fix on the schema issues, and I'll take the liberty of reframing 2) do
> we have line of sight to improvement on the performance front to be usable
> for multi-key transactions? (subtle: I don't think "parity with PaxosV2" is
> the right target, but rather "fast enough to be usable for multi-key
> transactions" since it's a new query paradigm).
>
> Given the context on contributor backing and if the answer is yes to those
> 2 questions (which I believe it is), I think we should generally be
> comfortable with merging the feature as experimental at this time.
>
> On Fri, Mar 7, 2025, at 12:54 AM, Benedict wrote:
>
>
> There are essentially three possible timelines to choose from here:
>
> 1) We agree in the next few days to merge to trunk. We will then
> prioritise rebasing onto trunk and resolving any pre-merge items starting
> next week.
> 2) There’s some more debate and agreement to merge to trunk in a week or
> two. In the meantime we will shift to internal-first development but we’ll
> likely prioritise the above work as soon as we can, which may be in a few
> weeks, so we can shift to trunk first development.
> 3) We don’t agree to merge accord anytime soon, so we shift to
> internal-first development for the time being. I’m not sure when we will
> prioritise any of the above.
>
> Our resources are finite and we’ve exhausted them (literally), so it’s
> pretty much pick one of the above. I don’t really mind which you pick, but
> I won’t personally be prioritising merge after this third attempt.
>
>
> On 6 Mar 2025, at 22:01, Jon Haddad  wrote:
>
> 
> Hmm... I took a look at the cep-15-accord branch in GitHub, it looks like
> it's several hundred commits behind trunk.  Since you'll need to rebase
> again before merge *anyways*, would it make sense to do it once more, and I
> can publish easy-cass-lab with the latest branch?  If folks have concerns,
> it's easy to fire up a cluster (I do it constantly) and try it out.
>
> I think if we were to do this, out of consideration we should time box the
> amount of time for an evaluation and unless someone raises an objection,
> consider lazy consensus achieved.
>
> Jon
>
>
>
> On Thu, Mar 6, 2025 at 12:46 PM Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
> Because we want to validate against the latest code in trunk, else we are
> validating stale behaviours. The cost of rebasing is high, so we do not do
> it frequently. That means we will likely stop developing OSS-first, as the
> focus will have to move to our internal branch that satisfies these
> criteria.
>
> Exactly what this might be for upstreaming I cannot say. Personally, I aim
> to work exclusively on the branch we are stabilising. If that is not trunk,
> the latency for my contributions being made public might be high, as I have
> a huge imbalance of over-investment to recoup, and anything unnecessary
> will be deferred.
>
> Since the feature is disabled, and the code is almost entirely isolated, I
> cannot imagine the cost to the community to removing this work would be
> very high. But, I do not intend to argue Accord’s case here. I will let you
> all decide.
>
> Please decide

Re: CEP-15 Update

2025-03-07 Thread Josh McKenzie
3.5 years is an incredible amount of time and work; it really is significant 
and thanks to everyone involved for the investment of time and energy.

We have a rocky history with large, disruptive contributions in the past that 
have either blocked forward progress post-merge (CASSANDRA-8099), or lingered 
in the code-base increasing maintenance burden on other contributors for 
minimal or no user benefit (early open post SSD transition, witness replicas, 
materialized views). I'm sympathetic to where Jordan's questions stem from, as 
our history of leaving things in the codebase long after they've become 
vestigial or abandoned has slowed down our collective momentum maintaining the 
project on actively used features.

That said, I don't think Accord will run afoul of some of those same patterns. 
Aside from the degree of investment already in it and sheer number of pmc 
members and committers involved, I believe it's a feature that's universally 
impactful and that if we had a metaphorical bus-factor change (entire group of 
people working on it disappeared the day after merge or decided to go on 
vacation for 5 years), others in the community would be willing to pick things 
up and keep it moving given its proximity to release readiness.

The 2 questions Jordan asked resonate with me: 1) do we have line of sight to a 
fix on the schema issues, and I'll take the liberty of reframing 2) do we have 
line of sight to improvement on the performance front to be usable for 
multi-key transactions? (subtle: I don't think "parity with PaxosV2" is the 
right target, but rather "fast enough to be usable for multi-key transactions" 
since it's a new query paradigm).

Given the context on contributor backing and if the answer is yes to those 2 
questions (which I believe it is), I think we should generally be comfortable 
with merging the feature as experimental at this time.

On Fri, Mar 7, 2025, at 12:54 AM, Benedict wrote:
> 
> There are essentially three possible timelines to choose from here: 
> 
> 1) We agree in the next few days to merge to trunk. We will then prioritise 
> rebasing onto trunk and resolving any pre-merge items starting next week.
> 2) There’s some more debate and agreement to merge to trunk in a week or two. 
> In the meantime we will shift to internal-first development but we’ll likely 
> prioritise the above work as soon as we can, which may be in a few weeks, so 
> we can shift to trunk first development.
> 3) We don’t agree to merge accord anytime soon, so we shift to internal-first 
> development for the time being. I’m not sure when we will prioritise any of 
> the above.
> 
> Our resources are finite and we’ve exhausted them (literally), so it’s pretty 
> much pick one of the above. I don’t really mind which you pick, but I won’t 
> personally be prioritising merge after this third attempt.
> 
> 
>> On 6 Mar 2025, at 22:01, Jon Haddad  wrote:
>> 
>> Hmm... I took a look at the cep-15-accord branch in GitHub, it looks like 
>> it's several hundred commits behind trunk.  Since you'll need to rebase 
>> again before merge *anyways*, would it make sense to do it once more, and I 
>> can publish easy-cass-lab with the latest branch?  If folks have concerns, 
>> it's easy to fire up a cluster (I do it constantly) and try it out.
>> 
>> I think if we were to do this, out of consideration we should time box the 
>> amount of time for an evaluation and unless someone raises an objection, 
>> consider lazy consensus achieved.
>> 
>> Jon
>> 
>> 
>> 
>> On Thu, Mar 6, 2025 at 12:46 PM Benedict Elliott Smith  
>> wrote:
>>> Because we want to validate against the latest code in trunk, else we are 
>>> validating stale behaviours. The cost of rebasing is high, so we do not do 
>>> it frequently. That means we will likely stop developing OSS-first, as the 
>>> focus will have to move to our internal branch that satisfies these 
>>> criteria.
>>> 
>>> Exactly what this might be for upstreaming I cannot say. Personally, I aim 
>>> to work exclusively on the branch we are stabilising. If that is not trunk, 
>>> the latency for my contributions being made public might be high, as I have 
>>> a huge imbalance of over-investment to recoup, and anything unnecessary 
>>> will be deferred.
>>> 
>>> Since the feature is disabled, and the code is almost entirely isolated, I 
>>> cannot imagine the cost to the community to removing this work would be 
>>> very high. But, I do not intend to argue Accord’s case here. I will let you 
>>> all decide.
>>> 
>>> Please decide soon though, as it shapes our work planning. The positive 
>>> reception so far had led me to consider prioritising a move to trunk-first 
>>> development within the next week or two, and the associated work that 
>>> entails. However, if that was optimistic we will have to shift our plans.
>>> 
>>> 
>>> 
>>>> On 6 Mar 2025, at 20:16, Jordan West  wrote:
>>>>
>>>> The work and effort in accord has been amazing. And I’m sure it sets a new
>>>>

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Mick Semb Wever
On Thu, 6 Mar 2025 at 09:40, Štefan Miklošovič 
wrote:

> That is cool but this still does not show / explain how it would look like
> when it comes to dependencies needed for actually talking to storages like
> s3.
>


As Benedict writes, dealing with optional dependencies is not hard (and as
Jon writes, we should work on improving how we deal with it).

More configurable paths would also be welcome (snapshots, backups, etc.),
and you can already see that in the StorageProvider.DirectoryType enum.
Given the overlap this has with other ongoing work, the CEP should be
written to account for and coordinate with that.  Simply configuring
different paths will certainly satisfy a lot of users' demands.

Object storage is still a very valuable goal (as Jeff and Scott write).
Maxwell, are you interested in updating the CEP to reflect all the new
input? I believe that's the next step here, and I'd be happy to help if
you took the lead.
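
As a rough illustration of the "more configurable paths" idea, the sketch
below maps directory types to configured locations, with a fallback to a
default data directory. It is written in Python for brevity and is not the
StorageProvider API: the enum members, the `resolve` function, and all
paths are hypothetical.

```python
from enum import Enum
from pathlib import PurePosixPath


class DirectoryType(Enum):
    # Hypothetical members; the real enum may look different.
    DATA = "data"
    SNAPSHOTS = "snapshots"
    BACKUPS = "backups"


def resolve(kind, overrides, default_root):
    """Return the configured location for kind, else a default under default_root."""
    if kind in overrides:
        return PurePosixPath(overrides[kind])
    return PurePosixPath(default_root) / kind.value


# Only snapshots are redirected (here to a mounted bucket); the rest stays local.
overrides = {DirectoryType.SNAPSHOTS: "/mnt/bucket/snapshots"}
```

With this configuration, `resolve(DirectoryType.SNAPSHOTS, overrides,
"/var/lib/cassandra")` points at the mount while
`resolve(DirectoryType.DATA, ...)` stays under the default root, which is
the "everything except snapshots is treated as it is now" behaviour
discussed in the thread.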


March 2025 project status update

2025-03-07 Thread Josh McKenzie
What happened last month? *New PMC members and committers, that's what.*

We welcomed *three* new people to the PMC in Feb: Jeremiah Jordan, Caleb 
Rackliffe, and Ekaterina Dimitrova. We're the better for all three of them 
being here with us; thank you all for your hard work over the years and 
dedication to making this project a better place to collaborate.

We added 3 new committers as well! Busy month. Maxwell Guo, Dmitry 
Konstantinov, and Aaron Ploetz. It's really heartening to see this much 
movement with us deepening our bench of both long-time contributors and new 
entrants to the project.

Congratulations again to all three of you new PMC members and you three new 
committers!

*[Releases]*
The 4.1 and 5.0 releases that were in limbo last we emailed were released, 
along with the Java Driver 4.19.0 release, and the inaugural Sidecar 0.1.0 
release! Congratulations to everyone working hard on the sidecar to start 
gaining momentum and building inertia on the project.

*[Email Activity]*
- We had 38(!) topics on dev@, so I'll select a handful that I think everyone
  should be aware of:
  - CEP-15 Update: Benedict and the other contributors working on Accord are
    looking for a community consensus on when to merge and what prereqs might
    be reasonable:
    https://lists.apache.org/thread/j9qmj9rn1p0tlntc9k90bygh43y8sb57
  - Joel Shepherd is exploring run-time collisions of dependencies with an ask
    around whether we've considered factoring interfaces out of the core DB
    into a shared library:
    https://lists.apache.org/thread/7x9wsn265mq5jd5q236y0nkjsw6ycbct
  - Radim Vansa is looking for feedback on a patch for Spring Boot to make
    things work with the OpenJDK CRaC project:
    https://lists.apache.org/thread/9sms1sk8fd739mp7699wrbj0vnd0kzd1
    Seems the Spring folks felt their community wasn't the right place for the
    work, so Radim is wondering if ours is.
  - Dmitry's opened up the discussion about dropping Dropwizard/Codahale
    metrics in the Cassandra Server; some good discussion there:
    https://lists.apache.org/thread/xwxqhmzbtxo4job1r0t8j0b9838t7jvd
  - Cassandra Forward 2025 happens on March 11; see Patrick's announcement and
    reminder here:
    https://lists.apache.org/thread/thkksv42wkbksfhv986mdwzo1r4cxdo9
  - And last but definitely not least, David wins an award for the best email
    subject line in recent memory with "Meaningless emptiness and filtering":
    https://lists.apache.org/thread/h3mlccz4g6o9tfz9bgb9p4f395lpmswp

The user list has a couple of topics folks have engaged with already around
reattaching EBS replicas async and doing a PoC of Cassandra on AWS EC2
Graviton. Check those out if you're interested.

*[JIRA Activity]*
Closed: 106 issues closed in JIRA since Feb 3: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20and%20resolution%20!%3D%20unresolved%20and%20resolved%20%3E%3D%202025-02-03.
 Again: that's a lot. Color me impressed.

Created: 103 new issues: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20and%20created%20%3E%3D%202025-02-03.
 Of those, 81 are still unresolved.

*[New Contributors]*
Hello and welcome! We've flagged some tickets as good starter tickets and you 
can find them in our JIRA kanban board here: 
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2162&quickFilter=2160.
 All of these are unassigned so feel free to browse and find what interests you.

Join us on the ASF Slack: https://the-asf.slack.com, in #cassandra-dev for dev 
discussion and #cassandra for user discussion. If you need an invite to the 
Slack server, let me know and I'll get you set up.

*[Needs Committer]*
We have 9 tickets that qualify for needing a committer, up from 8 last month: 
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2170.
 For those that don't know, we require two +1s from committers on the project 
before merge.

*[Needs Reviewer]*
As a reminder here's our JQL: 
https://issues.apache.org/jira/issues/?filter=12353799#. We're up from 97 to 
108 tickets that are Patch Available and need a reviewer. I'll actually set a 
reminder to myself to follow up with pinging people on these old tickets this 
time. :)

*[CI]*
Butler: https://butler.cassandra.apache.org/#/
Nothing untoward seems to be happening here; holding steady.

And there we have it folks. Lots of talking, lots of JIRAs, lots of code. Good 
month for the project.

As always - thanks everyone for your time and commitment to the project!

~Josh

Re: [RELEASE] Apache Cassandra Sidecar 0.1.0 released

2025-03-07 Thread Patrick McFadin
We happen to have some very informed engineers discussing Sidecar and
the many things you can do with it at Cassandra Forward. Come check
out the talk and give them a shout-out!
https://www.datastax.com/events/cassandra-forward-march-2025

Congrats on the release!

On Fri, Mar 7, 2025 at 10:44 AM Bernardo Botella
 wrote:
>
> This is a huge milestone! It’s incredible to see this release happening. 
> Congrats to everyone involved!
>
> On Mar 7, 2025, at 9:48 AM, Francisco Guerrero  wrote:
>
> The Cassandra team is pleased to announce the release of Apache Cassandra 
> Sidecar version 0.1.0.
>
>
> Downloads of source and binary distributions are available here:
>
>  https://dlcdn.apache.org/cassandra/cassandra-sidecar/0.1.0/
>
>
> The Maven artifacts can be found at:
>
>  https://repo.maven.apache.org/maven2/org/apache/cassandra/
>
> These will be mirrored to other repositories.
>
>
> As always, please review the changes[1] and pay attention to the release 
> notes[2]. Let us know[3] if you were to encounter any problem.
>
>
> Enjoy!
>
> [1]: CHANGES.txt 
> https://github.com/apache/cassandra-sidecar/blob/cassandra-sidecar-0.1.0/CHANGES.txt
> [2]: NEWS.txt 
> https://github.com/apache/cassandra-sidecar/blob/cassandra-sidecar-0.1.0/NEWS.txt
> [3]: https://issues.apache.org/jira/browse/CASSSIDECAR
>
>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Štefan Miklošovič
Because an earlier reply hinted that mounting a bucket yields "terrible
results". That has moved the discussion, in my mind, practically to the
place of "we are not going to do this", to which I explained that in this
particular case I do not find the speed important, because the use cases
you want to use it for do not have anything in common with what I want to
use it for.

Since, in my eyes, this had then become a binary "either / or", I repeatedly
tried to get an opinion on being able to mount it regardless. As far as I
know, only you explicitly expressed the opinion that it is OK, though you are
not a fan of it:

"I personally can't see myself using something that treats an object store
as cold storage where SSTables are moved (implying they weren't there
before), and I've expressed my concerns with this, but other folks seem to
want it and that's OK."

So my assumption was that, apart from you being OK with it, mounting is not
considered viable, which is why it looks like we are forcing it.

To be super honest, if we made custom storage providers / proxies possible
and it was already in place, then my urge to do "something fast and
functioning" (e.g. mounting a bucket) would not exist. I would not use
mounted buckets if we had this already in place and configurable in such a
way that we could say that everything except (e.g) snapshots would be
treated as it is now.

But, I can see how this will take a very, very long time to implement. This
is a complex CEP to tackle. I remember this topic being discussed in the
past as well. I _think_ there were at least two occasions when this was
already discussed, that it might be ported / retrofitted from what Mick was
showing etc. Nothing happened. Maybe mounting a bucket is not perfect and
doing it the other way is a more fitting solution etc. but as the saying
goes "perfect is the enemy of good".

On Fri, Mar 7, 2025 at 6:32 PM Jon Haddad  wrote:

> If that's not your intent, then you should be more careful with your
> replies.  When you write something like this:
>
> > While this might work, what I find tricky is that we are forcing this to
> users. Not everybody is interested in putting everything to a bucket and
> server traffic from that. They just don't want to do that. Because reasons.
> They are just happy with what they have etc, it works fine for years and so
> on. They just want to upload SSTables upon snapshotting and call it a day.
>
> > I don't think we should force our worldview on them if they are not
> interested in it.
>
> It comes off *extremely* negative.  You use the word "force" here multiple
> times.
>
>
>
>
> On Fri, Mar 7, 2025 at 9:18 AM Štefan Miklošovič 
> wrote:
>
>> I was explaining multiple times (1) that I don't have anything against
>> what is discussed here.
>>
>> Having questions about what that is going to look like does not mean I am
>> dismissive.
>>
>> (1) https://lists.apache.org/thread/ofh2q52p92cr89wh2l3djsm5n9dmzzsg
>>
>> On Fri, Mar 7, 2025 at 5:44 PM Jon Haddad 
>> wrote:
>>
>>> Nobody is saying you can't work with a mount, and this isn't a
>>> conversation about snapshots.
>>>
>>> Nobody is forcing users to use object storage either.
>>>
>>> You're making a ton of negative assumptions here about both the
>>> discussion, and the people you're having it with.  Try to be more open
>>> minded.
>>>
>>>
>>> On Fri, Mar 7, 2025 at 2:28 AM Štefan Miklošovič 
>>> wrote:
>>>
>>>> The only way I see that working is that, if everything was in a bucket,
>>>> if you take a snapshot, these SSTables would be "copied" from live data dir
>>>> (living in a bucket) to snapshots dir (living in a bucket). Basically, we
>>>> would need to say "and if you go to take a snapshot on this table, instead
>>>> of hardlinking these SSTables, do a copy". But this "copying" would be
>>>> internal to a bucket itself. We would not need to "upload" from node's
>>>> machine to s3.
>>>>
>>>> While this might work, what I find tricky is that we are forcing this
>>>> to users. Not everybody is interested in putting everything to a bucket and
>>>> server traffic from that. They just don't want to do that. Because reasons.
>>>> They are just happy with what they have etc, it works fine for years and so
>>>> on. They just want to upload SSTables upon snapshotting and call it a day.
>>>>
>>>> I don't think we should force our worldview on them if they are not
>>>> interested in it.
>>>>
>>>> On Fri, Mar 7, 2025 at 11:02 AM Štefan Miklošovič <
>>>> smikloso...@apache.org> wrote:
>>>>
>>>>> BTW, snapshots are quite special because these are not "files", they
>>>>> are just hard links. They "materialize" as regular files once underlying
>>>>> SSTables are compacted away. How are you going to hardlink from local
>>>>> storage to an object storage anyway? We will always need to "upload".
>>>>>
>>>>> On Fri, Mar 7, 2025 at 10:51 AM Štefan Miklošovič <
>>>>> smikloso...@apache.org> wrote:
>>>>>
>>>>>> Jon,
>>>>>>
>>>>>> all "big three" support mounting a bucket locally. That being said, I
>>>>>> do

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Štefan Miklošovič
I was explaining multiple times (1) that I don't have anything against what
is discussed here.

Having questions about what that is going to look like does not mean I am
dismissive.

(1) https://lists.apache.org/thread/ofh2q52p92cr89wh2l3djsm5n9dmzzsg

On Fri, Mar 7, 2025 at 5:44 PM Jon Haddad  wrote:

> Nobody is saying you can't work with a mount, and this isn't a
> conversation about snapshots.
>
> Nobody is forcing users to use object storage either.
>
> You're making a ton of negative assumptions here about both the
> discussion, and the people you're having it with.  Try to be more open
> minded.
>
>
> On Fri, Mar 7, 2025 at 2:28 AM Štefan Miklošovič 
> wrote:
>
>> The only way I see that working is that, if everything was in a bucket,
>> if you take a snapshot, these SSTables would be "copied" from live data dir
>> (living in a bucket) to snapshots dir (living in a bucket). Basically, we
>> would need to say "and if you go to take a snapshot on this table, instead
>> of hardlinking these SSTables, do a copy". But this "copying" would be
>> internal to a bucket itself. We would not need to "upload" from node's
>> machine to s3.
>>
>> While this might work, what I find tricky is that we are forcing this to
>> users. Not everybody is interested in putting everything to a bucket and
>> server traffic from that. They just don't want to do that. Because reasons.
>> They are just happy with what they have etc, it works fine for years and so
>> on. They just want to upload SSTables upon snapshotting and call it a day.
>>
>> I don't think we should force our worldview on them if they are not
>> interested in it.
>>
>> On Fri, Mar 7, 2025 at 11:02 AM Štefan Miklošovič 
>> wrote:
>>
>>> BTW, snapshots are quite special because these are not "files", they are
>>> just hard links. They "materialize" as regular files once underlying
>>> SSTables are compacted away. How are you going to hardlink from local
>>> storage to an object storage anyway? We will always need to "upload".
>>>
>>> On Fri, Mar 7, 2025 at 10:51 AM Štefan Miklošovič <
>>> smikloso...@apache.org> wrote:
>>>
 Jon,

 all "big three" support mounting a bucket locally. That being said, I
 do not think that completely ditching this possibility for Cassandra
 working with a mount, e.g. for just uploading snapshots there etc, is
 reasonable.

 GCP


 https://cloud.google.com/storage/docs/cloud-storage-fuse/quickstart-mount-bucket

 Azure (this one is quite sophisticated), lot of options ...


 https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-how-to-deploy?tabs=RHEL

 S3, lot of options how to mount that

 https://bluexp.netapp.com/blog/amazon-s3-as-a-file-system

 On Thu, Mar 6, 2025 at 4:17 PM Jon Haddad 
 wrote:

> Assuming everything else is identical, might not matter for S3.
> However, not every object store has a filesystem mount.
>
> Regarding sprawling dependencies, we can always make the provider
> specific libraries available as a separate download and put them on their
> own thread with a separate class path. I think the in-JVM dtest does this
> already.  Someone just started asking about IAM for login, it sounds like a
> similar problem.
>
>
> On Thu, Mar 6, 2025 at 12:53 AM Benedict  wrote:
>
>> I think another way of saying what Stefan may be getting at is what
>> does a library give us that an appropriately configured mount dir doesn’t?
>>
>> We don’t want to treat S3 the same as local disk, but this can be
>> achieved easily with config. Is there some other benefit of direct
>> integration? Well defined exceptions if we need to distinguish cases is one
>> that maybe springs to mind but perhaps there are others?
>>
>>
>> On 6 Mar 2025, at 08:39, Štefan Miklošovič 
>> wrote:
>>
>> That is cool but this still does not show / explain what it would look
>> like when it comes to dependencies needed for actually talking to storages
>> like s3.
>>
>> Maybe I am missing something here, and please explain if I am
>> mistaken, but if I understand that correctly, for talking to s3 we would
>> need to use a library like this, right? (1). So that would be added among
>> Cassandra dependencies? Hence Cassandra starts to be biased towards s3? Why
>> s3? Every time somebody comes up with a new remote storage support, that
>> would be added to classpath as well? How are these dependencies going to
>> play with each other and with Cassandra in general? Will all these storage
>> provider libraries for arbitrary clouds be even compatible with Cassandra
>> licence-wise?
>>
>> I am sorry I keep repeating these questions but this is the part that I
>> just don't get at all.
>>
>> We can indeed add an API for this, sure sure, why not.

Re: March 2025 project status update

2025-03-07 Thread Josh McKenzie
Did you know I sometimes fail at email filtering?

4 new committers! FOUR. Welcome to Bernardo Botella Corbi as well!

That was 3 days ago too. /sigh

Congrats everyone!

On Fri, Mar 7, 2025, at 1:39 PM, Josh McKenzie wrote:
> What happened last month? *New PMC members and committers, that's what.*
> 
> We welcomed *three* new people to the PMC in Feb: Jeremiah Jordan, Caleb 
> Rackliffe, and Ekaterina Dimitrova. We're the better for all three of them 
> being here with us; thank you all for your hard work over the years and 
> dedication to making this project a better place to collaborate.
> 
> We added 3 new committers as well! Busy month. Maxwell Guo, Dmitry 
> Konstantinov, and Aaron Ploetz. It's really heartening to see this much 
> movement with us deepening our bench of both long-time contributors and new 
> entrants to the project.
> 
> Congratulations again to all three of you new PMC members and you three new 
> committers!
> 
> *[Releases]*
> The 4.1 and 5.0 releases that were in limbo last we emailed were released, 
> along with the Java Driver 4.19.0 release, and the inaugural Sidecar 0.1.0 
> release! Congratulations to everyone working hard on the sidecar to start 
> gaining momentum and building inertia on the project.
> 
> *[Email Activity]*
> - We had 38(!) topics on dev@, so I'll select a handful that I think everyone
>   should be aware of:
>   - CEP-15 Update: Benedict and the other contributors working on Accord
>     are looking for a community consensus on when to merge and what prereqs
>     might be reasonable:
>     https://lists.apache.org/thread/j9qmj9rn1p0tlntc9k90bygh43y8sb57
>   - Joel Shepherd is exploring run-time collisions of dependencies with an
>     ask around whether we've considered factoring interfaces out of the core
>     DB into a shared library:
>     https://lists.apache.org/thread/7x9wsn265mq5jd5q236y0nkjsw6ycbct
>   - Radim Vansa is looking for feedback on a patch for Spring Boot to make
>     things work with the OpenJDK CRaC project:
>     https://lists.apache.org/thread/9sms1sk8fd739mp7699wrbj0vnd0kzd1
>     Seems the Spring folks felt their community wasn't the right place for
>     the work so Radim is wondering if ours is.
>   - Dmitry's opened up the discussion about dropping Dropwizard/Codahale
>     metrics in the Cassandra Server; some good discussion there:
>     https://lists.apache.org/thread/xwxqhmzbtxo4job1r0t8j0b9838t7jvd
>   - Cassandra Forward 2025 happens on March 11; see Patric's announcement
>     and reminder here:
>     https://lists.apache.org/thread/thkksv42wkbksfhv986mdwzo1r4cxdo9
>   - And last but definitely not least, David wins an award for the best
>     email subject line in recent memory with "Meaningless emptiness and
>     filtering":
>     https://lists.apache.org/thread/h3mlccz4g6o9tfz9bgb9p4f395lpmswp
>
> The user list has a couple topics folks have engaged with already around
> reattaching EBS replicas async and doing a PoC of Cassandra on AWS EC2
> Graviton. Check those out if you're interested.
> 
> *[JIRA Activity]*
> Closed: 106 issues closed in JIRA since Feb 3: 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20and%20resolution%20!%3D%20unresolved%20and%20resolved%20%3E%3D%202025-02-03.
>  Again: that's a lot. Color me impressed.
> 
> Created: 103 new issues: 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20and%20created%20%3E%3D%202025-02-03.
>  Of those, 81 are still unresolved.
> 
> *[New Contributors]*
> Hello and welcome! We've flagged some tickets as good starter tickets and you 
> can find them in our JIRA kanban board here: 
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2162&quickFilter=2160.
>  All of these are unassigned so feel free to browse and find what interests 
> you.
> 
> Join us on the ASF slack: https://the-asf.slack.com, in #cassandra-dev for 
> dev discussion and #cassandra for user discussion. If you need an invite to 
> the slack server, let me know and I'll get you set up.
> 
> *[Needs Committer]*
> We have 9 tickets that qualify for needing a committer, up from 8 last month: 
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2170.
>  For those that don't know, we require two +1s from committers on the project
> before merge.
> 
> *[Needs Reviewer]*
> As a reminder here's our JQL: 
> https://issues.apache.org/jira/issues/?filter=12353799#. We're up from 97 to 
> 108 tickets that are Patch Available and need a reviewer. I'll actually set a 
> reminder to myself to follow up with pinging people on these old tickets this 
> time. :)
> 
> *[CI]*
> Butler: https://butler.cassandra.apache.org/#/
> Nothing untoward seems to be happening here; holding steady.
> 
> And there we have it folks. Lots of talking, lots of JIRAs, lots of code. 
> Good month for the project.
> 
> As always - thanks everyone for your time and commitment to the project!
> 
> ~Josh


Re: [DISCUSS] AWS IAM-based client authentication

2025-03-07 Thread Joel Shepherd

Related JIRA: https://issues.apache.org/jira/browse/CASSANDRA-20416

Includes links to the draft code and more complete documentation of the 
proposed approach.


Thanks -- Joel.

On 3/4/2025 12:48 PM, Joel Shepherd wrote:
Hi - I have a side project that provides client- and node-side Java 
plug-ins to enable client-to-node authentication based on AWS 
identities. This would, for example, enable clients to use EC2 
instance roles to authenticate to Cassandra nodes, or use ordinary IAM 
keys/secret keys. The client needs to be able to obtain valid IAM 
credentials to sign a request, and the node needs to be able to 
connect to a public AWS Security Token Service (STS) endpoint. There 
are no other required AWS dependencies, and (I believe) no changes 
required to driver or node code: just minor configuration updates.
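
To make the flow above concrete, here is a minimal sketch of the general sign-then-verify shape: the client signs a request with its secret, and the node forwards it to a verification service instead of holding any secrets itself. This is an illustration only. It is not AWS Signature Version 4, it does not call STS, and it is not taken from the draft code; `client_sign`, `token_service_verify`, `node_authenticate`, and the key and role values are all hypothetical stand-ins.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical stand-in for the identity service's view of the world.
# In the real proposal this role is played by a public AWS STS endpoint.
_SERVICE_SECRETS = {"AKIDEXAMPLE": b"client-secret"}
_SERVICE_IDENTITIES = {"AKIDEXAMPLE": "role/app"}


def client_sign(access_key_id: str, secret: bytes, payload: bytes) -> dict:
    """Client side: sign a payload with the secret key (plain HMAC, not SigV4)."""
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return {"access_key_id": access_key_id, "payload": payload, "signature": signature}


def token_service_verify(request: dict) -> Optional[str]:
    """Service side: recompute the signature and return the caller's identity."""
    secret = _SERVICE_SECRETS.get(request["access_key_id"])
    if secret is None:
        return None
    expected = hmac.new(secret, request["payload"], hashlib.sha256).hexdigest()
    if hmac.compare_digest(expected, request["signature"]):
        return _SERVICE_IDENTITIES[request["access_key_id"]]
    return None


def node_authenticate(request: dict, allowed_identities: set) -> bool:
    """Node side: never sees the secret, only asks the service who signed."""
    identity = token_service_verify(request)
    return identity is not None and identity in allowed_identities
```

The point of the shape is that the node needs only network access to the verifying service and a list of identities it accepts, which matches the "minor configuration updates" claim above.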


I'm seeking help in reviewing the concept and code. I'm new to this 
community,  so I'm looking for suggestions on how to best engage you 
on this.


The code (which is not quite production-ready) is in two private 
GitHub repos which I'm happy to grant access to for early review. I 
can also provide documentation on the approach: not sure whether 
that's best shared via this thread, a CEP, repo documentation ... 
suggestions wanted.


Thanks: I'd appreciate any and all help in making these plug-ins 
available to the community.


-- Joel.




Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Jon Haddad
Supporting a filesystem mount is perfectly reasonable. If you wanted to use
that with the S3 mount, there's nothing that should prevent you from doing
so, and the filesystem version is probably the default implementation that
we'd want to ship with since, to your point, it doesn't require additional
dependencies.

Allowing support for object stores is just that - allowing it.  It's just
more functionality, more flexibility.  I don't claim to know every object
store or remote filesystem, there are going to be folks that want to do
something custom I can't think of right now and there's no reason to box
ourselves in.  The abstraction to allow it is a small one.  If folks want
to put SSTables on HDFS, they should be able to.  Who am I to police them,
especially if we can do it in a way that doesn't carry any risk, making it
available as a separate plugin?
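
As a rough illustration of how small that abstraction could be, here is a minimal sketch assuming a hypothetical `StorageProvider` plugin interface with a filesystem-backed default. None of these names come from CEP-36 or the Cassandra codebase, and the real thing would be Java; this only shows the shape.

```python
import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageProvider(ABC):
    """Hypothetical plugin interface for a secondary SSTable location."""

    @abstractmethod
    def put(self, local_file: Path, key: str) -> None:
        """Copy a local file to the secondary location under `key`."""

    @abstractmethod
    def get(self, key: str, local_file: Path) -> None:
        """Fetch `key` from the secondary location into `local_file`."""


class FilesystemProvider(StorageProvider):
    """Default provider: any mounted directory (NAS, FUSE mount, local disk)."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def put(self, local_file: Path, key: str) -> None:
        target = self.root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_file, target)

    def get(self, key: str, local_file: Path) -> None:
        local_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(self.root / key, local_file)


def roundtrip_demo() -> bytes:
    """Write a fake SSTable component, push it through the provider, read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        sstable = base / "data" / "nb-1-big-Data.db"
        sstable.parent.mkdir(parents=True)
        sstable.write_bytes(b"fake sstable bytes")

        provider = FilesystemProvider(base / "mount")
        provider.put(sstable, "ks/tbl/nb-1-big-Data.db")

        restored = base / "restored" / "nb-1-big-Data.db"
        provider.get("ks/tbl/nb-1-big-Data.db", restored)
        return restored.read_bytes()
```

An object-store or HDFS provider would implement the same two methods and ship separately, so the core would carry no extra dependencies.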

My comment with regard to not wanting to treat object store as tiered
storage has everything to do with what I want to do with the data.  I want
100% of it on object store (with a copy of hot data on the local node) for
multiple reasons, some of which (maybe all) were made by Jeff and Scott:

* Analytics is easier, no transfer off C*
* Replace is dead simple, just pull the data off the mount when it boots
up.  Like using EBS, but way cheaper.  (thanks Scott for some numbers on
this)
* It makes backups easier, just copy the bucket.
* If you're using object store for backups anyway, then I don't see why you
wouldn't keep all your data there
* I hadn't even really thought about scale to zero before but I love this
too

Some folks want to treat the object store as a second tier, implying to me
that once the SSTables reach a certain age or aren't touched, they're
uploaded to the object store. Supporting this use case shouldn't be that
much different.  Maybe you don't care about the most recent data, and
you're OK with losing everything from the last few days, because you can
reload from Kafka.  You'd be treating C* as a temporary staging area for
whatever purpose, and you only want to do analytics on the cold data.   As
I've expressed already, it's just a difference in policy of when to
upload.  Yes I'm being a bit hand wavy about this part but it's an email
discussion.  I don't have this use case today, but it's certainly valid.
Or maybe it works in conjunction with witness replicas / transient
replication, allowing you to offload data from a node until there's a
failure, in which case C* can grab it.  I'm just throwing out ideas here.
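
To illustrate the "difference in policy" point, here is a small sketch in which both use cases share one upload path and differ only in a predicate. The `SSTableInfo` fields, the policy names, and the thresholds are assumptions made up for this example, not anything proposed in the CEP.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SSTableInfo:
    """Hypothetical metadata a policy might look at."""
    name: str
    age_seconds: float
    seconds_since_last_read: float


UploadPolicy = Callable[[SSTableInfo], bool]


def upload_everything(sstable: SSTableInfo) -> bool:
    """Policy for keeping 100% of the data in the secondary location."""
    return True


def make_cold_tier_policy(min_age_seconds: float, min_idle_seconds: float) -> UploadPolicy:
    """Policy for tiering: upload once an SSTable is old enough or untouched long enough."""
    def policy(sstable: SSTableInfo) -> bool:
        return (sstable.age_seconds >= min_age_seconds
                or sstable.seconds_since_last_read >= min_idle_seconds)
    return policy


def select_for_upload(sstables: List[SSTableInfo], policy: UploadPolicy) -> List[str]:
    """The shared code path: only the policy differs between the use cases."""
    return [s.name for s in sstables if policy(s)]
```

For example, with a seven-day age threshold and a one-day idle threshold, a freshly flushed and recently read SSTable stays local under the tiering policy, while `upload_everything` selects it immediately.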

Feel free to treat object store here as "secondary location".  It can be a
NAS, an S3 mount, a custom FUSE filesystem, or the S3 api, or whatever else
people come up with.

In the past, I've made the case that this functionality can be achieved
with LVM cache pools [1][2], and even provided some benchmarks showing it
can be used.  I've also argued that we can do node replacements with
rsync.  While these things are both technically true, others have convinced
me that having this functionality as first class in the database makes it
easier for our users and thus better for the project.  Should someone have
to *just* understand all of LVM, or *just* understand the nuance of
potential data loss due to rsync's default second resolution?  I keep
relearning that whenever I say "*just* do X", that X isn't as convenient or
easy for other people as it is for me, and I need to relax a bit.
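
For anyone who hasn't run into the rsync nuance, here is a simplified model of it: a "quick check" that treats two files as identical when their size and whole-second modification time match. This is a sketch of the failure mode as described above, not rsync's actual code, and `quick_check_same` is a made-up helper.

```python
import os
import tempfile
from pathlib import Path
from typing import Tuple

NS_PER_SECOND = 1_000_000_000


def quick_check_same(src: Path, dst: Path) -> bool:
    """Treat files as unchanged if size and whole-second mtime both match."""
    s, d = src.stat(), dst.stat()
    return (s.st_size == d.st_size
            and s.st_mtime_ns // NS_PER_SECOND == d.st_mtime_ns // NS_PER_SECOND)


def demo() -> Tuple[bool, bool]:
    """Return (looks unchanged to the quick check, contents actually equal)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src.db"
        dst = Path(tmp) / "dst.db"

        # Both copies start out identical, stamped 0.1 s into some second.
        first_write = 1_700_000_000 * NS_PER_SECOND + 100_000_000
        src.write_bytes(b"AAAA")
        dst.write_bytes(b"AAAA")
        os.utime(src, ns=(first_write, first_write))
        os.utime(dst, ns=(first_write, first_write))

        # The source is rewritten with same-sized data later in the same second.
        second_write = 1_700_000_000 * NS_PER_SECOND + 600_000_000
        src.write_bytes(b"BBBB")
        os.utime(src, ns=(second_write, second_write))

        return quick_check_same(src, dst), src.read_bytes() == dst.read_bytes()
```

Here the check reports the files as unchanged even though the contents differ, so a sync based on it would skip the newer data. Knowing to reach for a checksum-based comparison is exactly the kind of thing a non-expert shouldn't have to know.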

The view of the world where someone *just* needs to know dozens of
workarounds to use the database makes it harder for non-experts to use.
The bar for usability is constantly being raised, and whatever makes it
better for a first time user is better for the community.

Anyways, that's where I'm at now.  Easier = better, even if we're
reinventing some of the wheel to do so.  Sometimes you get a better wheel,
too.

Jon

[1] https://rustyrazorblade.com/post/2018/2018-04-24-intro-to-lvm/
[2] https://issues.apache.org/jira/browse/CASSANDRA-8460



On Fri, Mar 7, 2025 at 10:08 AM Štefan Miklošovič 
wrote:

> Because an earlier reply hinted that mounting a bucket yields "terrible
> results". That has moved the discussion, in my mind, practically to the
> place of "we are not going to do this", to which I explained that in this
> particular case I do not find the speed important, because the use cases
> you want to use it for do not have anything in common with what I want to
> use it for.
>
> Since then, in my eyes this was binary "either / or", I was repeatedly
> trying to get an opinion about being able to mount it regardless, to which,
> afaik, only you explicitly expressed an opinion that it is OK but you are
> not a fan of it:
>
> "I personally can't see myself using something that treats an object store
> as cold storage where SSTables are moved (implying they weren't there
> before), and I've expressed my concerns with this, but other folks seem to
> want it and that's OK."
>
> So my assumption was, except you being ok with it, that mounting is not
> vi

Re: [RELEASE] Apache Cassandra Sidecar 0.1.0 released

2025-03-07 Thread Bernardo Botella
This is a huge milestone! It’s incredible to see this release happening. 
Congrats to everyone involved!

> On Mar 7, 2025, at 9:48 AM, Francisco Guerrero  wrote:
> 
> The Cassandra team is pleased to announce the release of Apache Cassandra
> Sidecar version 0.1.0.
> 
> 
> Downloads of source and binary distributions are available here:
> 
>  https://dlcdn.apache.org/cassandra/cassandra-sidecar/0.1.0/
> 
> 
> The Maven artifacts can be found at:
> 
>  https://repo.maven.apache.org/maven2/org/apache/cassandra/
> 
> These will be mirrored to other repositories.
> 
> 
> As always, please review the changes[1] and pay attention to the release 
> notes[2]. Let us know[3] if you were to encounter any problem.
> 
> 
> Enjoy!
> 
> [1]: CHANGES.txt 
> https://github.com/apache/cassandra-sidecar/blob/cassandra-sidecar-0.1.0/CHANGES.txt
> [2]: NEWS.txt 
> https://github.com/apache/cassandra-sidecar/blob/cassandra-sidecar-0.1.0/NEWS.txt
> [3]: https://issues.apache.org/jira/browse/CASSSIDECAR



Re: March 2025 project status update

2025-03-07 Thread Blake Eggleston
Ahem, sorry, looks like we're still waiting on something from Abe on that one.
Never mind for now :)

On Fri, Mar 7, 2025, at 1:05 PM, Blake Eggleston wrote:
> Technically March, but Abe Ratnofsky was also added as a committer. Looks
> like the announcement for that was missed.

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Jordan West
I too initially felt we should just use mounts and was excited by e.g.
Single Zone Express mounting. As Cheng mentioned, we tried it…and the
results were disappointing (except for use cases that could sometimes
tolerate seconds of p99 latency). That brought me around to needing an
implementation we own that we can optimize properly, as others have
discussed.

Regarding what percent of data should be in the cold store, I would love to
see an implementation that allows what Jon is proposing with the full data
set and what the original proposal included with a partial dataset. I think
there are different reasons to use both.

Jordan



[RELEASE] Apache Cassandra Sidecar 0.1.0 released

2025-03-07 Thread Francisco Guerrero
The Cassandra team is pleased to announce the release of Apache Cassandra
Sidecar version 0.1.0.


Downloads of source and binary distributions are available here:

 https://dlcdn.apache.org/cassandra/cassandra-sidecar/0.1.0/


The Maven artifacts can be found at:

 https://repo.maven.apache.org/maven2/org/apache/cassandra/

These will be mirrored to other repositories.


As always, please review the changes[1] and pay attention to the release
notes[2]. Let us know[3] if you were to encounter any problem.


Enjoy!

[1]: CHANGES.txt
https://github.com/apache/cassandra-sidecar/blob/cassandra-sidecar-0.1.0/CHANGES.txt
[2]: NEWS.txt
https://github.com/apache/cassandra-sidecar/blob/cassandra-sidecar-0.1.0/NEWS.txt
[3]: https://issues.apache.org/jira/browse/CASSSIDECAR


Re: CEP-15 Update

2025-03-07 Thread Caleb Rackliffe
   1. Just a quick reminder that CASSANDRA-18196 is where we had been
   tracking this previously with respect to things like the feature flag
   (CASSANDRA-18195), etc. I'm not sure if we want to officially tie
   up/resolve the subtasks there for Jira hygiene...