Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Tolbert, Andy
> @Andy - you can set the default compaction strategy in C* yaml now. Oh, this is very cool and I'm happy to see it! Looks like that landed as part of the UCS contribution itself (CASSANDRA-18397, Unified Compaction Strategy), great idea.
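For readers following along, a minimal sketch of what that yaml default could look like, assuming the parameterized default_compaction option added alongside CASSANDRA-18397 (option names from memory; verify against your version's cassandra.yaml):

    # cassandra.yaml -- hedged sketch, not verbatim from any release
    default_compaction:
      class_name: LeveledCompactionStrategy   # or UnifiedCompactionStrategy, etc.
      parameters:
        sstable_size_in_mb: 160               # illustrative LCS parameter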

Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread cnlwsu
Same. I can’t think of a scenario beyond just writes outpacing compaction throughput. What’s the 20%? Chris. Sent from my iPhone. On Dec 6, 2024, at 10:58 PM, Dinesh Joshi wrote: > I’m genuinely curious: how is defaulting to LCS going to cause a nightmare? I am not sure what the concern is

Re: Planet Cassandra meetup organizer opportunity

2024-12-06 Thread Soheil Rahsaz
Hello Melissa, I am also eager to volunteer and excited to help. I have previously given talks about Cassandra. Regards, Soheil On Sat, Dec 7, 2024 at 1:11 AM Melissa Logan wrote: > Wonderful, thank you Rahul! I'll connect with you to discuss. > > > > On Fri, Dec 6, 2024 at 1:28 PM Rahul Singh

Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Dinesh Joshi
On Fri, Dec 6, 2024 at 9:23 PM Jon Haddad wrote: > For a very common example, a lot of clusters are now using the k8ssandra > operator in AWS, which needs EBS. It's incredibly easy to fall behind on > compaction there. It's why I'm so interested in seeing CASSANDRA-15452 get > merged in. I've

Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Jon Haddad
For a very common example, a lot of clusters are now using the k8ssandra operator in AWS, which needs EBS. It's incredibly easy to fall behind on compaction there. It's why I'm so interested in seeing CASSANDRA-15452 get merged in. I've dealt with quite a few of these clusters, in fact I just wo

Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Dinesh Joshi
I’m genuinely curious: how is defaulting to LCS going to cause a nightmare? I am not sure what the concern is here. On Fri, Dec 6, 2024 at 8:53 PM Jon Haddad wrote: > You're ignoring the other side here. For the folks who *can't* use LCS, > defaulting to it is a nightmare. > >

Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Tolbert, Andy
It's also quite easy for STCS to make clusters inoperable, and it can be quite difficult to dig yourself out of. It's not hard to find yourself in a state where you have old 100GB+ SSTables full of expired data that never get compacted sitting around for months. Write amplification is a thing, b
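One common mitigation for the stranded-tombstone case Andy describes is to enable single-SSTable tombstone compactions. A hedged CQL sketch (tombstone_threshold and unchecked_tombstone_compaction are long-standing compaction options; the table name and values here are illustrative):

    -- Compact an SSTable on its own once ~20% of it is droppable tombstones,
    -- even if it overlaps other SSTables (at the risk of some wasted work).
    ALTER TABLE ks.tbl WITH compaction = {
      'class': 'SizeTieredCompactionStrategy',
      'tombstone_threshold': '0.2',
      'unchecked_tombstone_compaction': 'true'
    };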

Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Dinesh Joshi
I would argue that the vast majority of real-world workloads are read-heavy. LCS would therefore be a net benefit for the average user. To mitigate the write amplification concern, I would make this change and make sure it is well documented for operators so they’re not caught off guard. On Fri, Dec 6

Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Jon Haddad
You're ignoring the other side here. For the folks who *can't* use LCS, defaulting to it is a nightmare. Sorry, but you can't screw over 20% of the community to make life a little better for the 80%. This is a terrible tradeoff. Jon On Fri, Dec 6, 2024 at 8:36 PM Dinesh Joshi wrote: > I woul

Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Dinesh Joshi
I have to agree with Chris here. In the vast majority of cases LCS hits the sweet spot and avoids the STCS pitfalls. UCS is too new and I would not make that a default OOTB. Philosophically, as a project, we should wait until critical features like these reach a certain level of maturity prior to

Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Jeff Jirsa
And it works for that most of the time, so what’s the concern? “You lose throughput because iops / write amplification go up, so the perf of the default install goes down”? (But the cost per byte goes way down, too.) > On Dec 6, 2024, at 8:01 PM, Brad wrote: > > > Could you elaborate what

Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Brad
> Could you elaborate what you mean by 'disk storage management'? I often see clusters use LCS as an easy fix to avoid the 50% disk free recommendation of STCS without considering the write amplification implications. On Fri, Dec 6, 2024 at 10:46 PM Dinesh Joshi wrote: > Could you elaborate wha
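For readers wondering where the 50% rule of thumb comes from: a compaction must finish writing its output before its input SSTables can be deleted, so compacting four similar-sized 100 GB SSTables can transiently need up to roughly an extra 400 GB (less whatever overwrites and tombstones merge away). Under STCS the largest tier can hold most of a table's data, so keeping about half the disk free is the usual guidance.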

Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Dinesh Joshi
Could you elaborate what you mean by 'disk storage management'? On Fri, Dec 6, 2024 at 7:30 PM Brad wrote: > I'm -1 on LCS being the default, seen far too many people use it for disk > storage management > > On Fri, Dec 6, 2024 at 10:08 PM Jon Haddad > wrote: > >> I'm -1 on LCS being the defaul

Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Jeff Jirsa
I’m probably closely aligned with Chris here, fwiw. - Jeff > On Dec 6, 2024, at 7:40 PM, Chris Lohfink wrote: > > While I am actually +1 on LCS being default, as it handles more use cases well > compared to STCS, I am -1 on UCS being default anywhere currently: the UX is > horrible, documenta

Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Chris Lohfink
While I am actually +1 on LCS being default, as it handles more use cases well compared to STCS, I am -1 on UCS being default anywhere currently: the UX is horrible, the documentation is unreadable, and it's only available on a release barely anyone uses yet (not adequately tested in production). Seems l

Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Brad
I'm -1 on LCS being the default; I've seen far too many people use it for disk storage management. On Fri, Dec 6, 2024 at 10:08 PM Jon Haddad wrote: > I'm -1 on LCS being the default, since using it in the wrong situations > renders clusters inoperable. > > > On Fri, Dec 6, 2024 at 7:03 PM Paulo Motta

Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Jon Haddad
I'm -1 on LCS being the default, since using it in the wrong situations renders clusters inoperable. On Fri, Dec 6, 2024 at 7:03 PM Paulo Motta wrote: > > I'd prefer to see the default go from STCS to UCS > > I’m proposing this for latest unstable (cassandra_latest.yaml) since it’s > a more rec

Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Paulo Motta
> I'd prefer to see the default go from STCS to UCS I’m proposing this for latest unstable (cassandra_latest.yaml) since it’s a more recent strategy still being adopted. For latest stable (cassandra.yaml) I’d prefer LCS since it does not need tuning to support mutable workloads (UPDATE/DELETE) and
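The per-table equivalent of the LCS default being proposed here, as a CQL sketch (160 MB is the commonly cited sstable_size_in_mb default; the table name is illustrative):

    ALTER TABLE ks.tbl WITH compaction = {
      'class': 'LeveledCompactionStrategy',
      'sstable_size_in_mb': '160'
    };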

Re: Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Jon Haddad
I'd prefer to see the default go from STCS to UCS, probably with scaling_parameters T4. That's essentially the same as STCS but without the ridiculous SSTable growth, allowing us to leverage the fast streaming path more often. I don't think there are any valid use cases for STCS anymore now that we
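Per table, Jon's suggestion would look roughly like this CQL sketch (UCS accepts scaling_parameters as a string; T4 means tiered with a fan-out of 4, i.e. STCS-like merge behavior):

    ALTER TABLE ks.tbl WITH compaction = {
      'class': 'UnifiedCompactionStrategy',
      'scaling_parameters': 'T4'
    };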

Re: [DISCUSS] Tooling to repair MV through a Spark job

2024-12-06 Thread Jaydeep Chovatia
Thanks for the information, Yifan and James! Given that, we can scope this email discussion to just this specific MV repair. Two points: 1. Can this MV repair job provide some value addition? 2. If yes, does it even make sense to merge this MV repair tooling, which uses Spark as its underlying technology

Re-evaluate compaction defaults in 5.1/trunk

2024-12-06 Thread Paulo Motta
Hi, It’s 2024 and users are still facing issues due to misconfigured compaction when using the default configuration. I would like to start a conversation around improving compaction defaults in 5.1/trunk, so users trying out CQL transactions don’t need to worry about tuning compaction. A few suggestions

Re: [DISCUSS] Tooling to repair MV through a Spark job

2024-12-06 Thread Yifan Cai
Oh, I just noticed that James already mentioned it. On Fri, Dec 6, 2024 at 3:51 PM Yifan Cai wrote: > I would like to highlight an existing tooling for "many things beyond the > MV work, such as counting rows, etc." > > The Apache Cassandra Analytics project ( > http://github.com/apache/cassandra-analytics/

Re: [DISCUSS] Tooling to repair MV through a Spark job

2024-12-06 Thread Yifan Cai
I would like to highlight an existing tooling for "many things beyond the MV work, such as counting rows, etc." The Apache Cassandra Analytics project ( http://github.com/apache/cassandra-analytics/) could be a great resource for this type of task. It reads directly from the SSTables in the Spark

Re: [DISCUSS] Tooling to repair MV through a Spark job

2024-12-06 Thread Jaydeep Chovatia
There are two approaches I have been thinking about for MV. 1. Short Term (Status Quo): Here, we do not change the Cassandra MV architecture to drastically reduce data inconsistencies; thus, we continue to mark MV as an experimental feature. In this case, we can have two suboptions

Re: Planet Cassandra meetup organizer opportunity

2024-12-06 Thread Melissa Logan
Wonderful, thank you Rahul! I'll connect with you to discuss. On Fri, Dec 6, 2024 at 1:28 PM Rahul Singh (ANANT) wrote: > Melissa, Bernardo, am happy to help and volunteer some of our resources > as well. We've done many Cassandra Lunches over the years and would love to > support and continue

Re: Planet Cassandra meetup organizer opportunity

2024-12-06 Thread Rahul Singh (ANANT)
Melissa, Bernardo, I am happy to help and volunteer some of our resources as well. We've done many Cassandra Lunches over the years and would love to support and continue the Planet Cassandra meetups. Sent via Superhuman. On Fri, Dec 06, 2024 at 2:56 PM,

Re: Planet Cassandra meetup organizer opportunity

2024-12-06 Thread Melissa Logan
Fantastic, thanks Bernardo! I'll reach out separately. On Fri, Dec 6, 2024 at 10:22 AM Bernardo Botella < conta...@bernardobotella.com> wrote: > Hi Melissa, > > I’ll be happy to jump in to keep this going. Let’s sync about this when > you have a chance. > > Regards, > Bernardo > > On Dec 6, 2024,

Re: Planet Cassandra meetup organizer opportunity

2024-12-06 Thread Melissa Logan
Bernardo, I would be delighted for you to do this. Let me know a good time to chat: https://cal.com/mklogan On Fri, Dec 6, 2024 at 10:22 AM Bernardo Botella < conta...@bernardobotella.com> wrote: > Hi Melissa, > > I’ll be happy to jump in to keep this going. Let’s sync about this when > you ha

Re: [DISCUSS] Tooling to repair MV through a Spark job

2024-12-06 Thread Jeff Jirsa
It feels uncomfortable asking users to rely on a third party that’s as heavy-weight as Spark to use a built-in feature. Can we really not do this internally? I get that the obvious way with Merkle trees is hard because of the range fanout of the MV using a different partitioner, but have we tried

Re: Planet Cassandra meetup organizer opportunity

2024-12-06 Thread Bernardo Botella
Hi Melissa, I’ll be happy to jump in to keep this going. Let’s sync about this when you have a chance. Regards, Bernardo > On Dec 6, 2024, at 10:17 AM, Melissa Logan wrote: > > Hi folks: > > My team and I created and managed the Planet Cassandra Meetup - a virtual > monthly meetup to share

Planet Cassandra meetup organizer opportunity

2024-12-06 Thread Melissa Logan
Hi folks: My team and I created and managed the Planet Cassandra Meetup - a virtual monthly meetup to share Cassandra use cases, best practices, case studies, community updates, and other similar topics: https://www.meetup.com/cassandra-global/ As we have stepped back from organizing this, we're

Re: [DISCUSS] Tooling to repair MV through a Spark job

2024-12-06 Thread James Berragan
I think this would be useful and - having never really used Materialized Views - I didn't know it was an issue for some users. I would say the Cassandra Analytics library (http://github.com/apache/cassandra-analytics/) could be utilized for much of this, with a specialized Spark job for this purpose

[DISCUSS] Tooling to repair MV through a Spark job

2024-12-06 Thread Jaydeep Chovatia
Hi, NOTE: This email does not promote using Cassandra's Materialized View (MV) but assists those stuck with it for various reasons. The primary issue with MV is that once it goes out of sync with the base table, no tooling is available to remediate it. This Spark job aims to fill this gap by lo
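A rough shape of the proposed job, sketched here in Scala with plain Spark DataFrame operations; the data-source format string, keyspace, and column names are illustrative assumptions, not the actual tooling:

    // Hypothetical sketch: detect drift between a base table and its MV.
    // Assumes both can be loaded as DataFrames (e.g. via the Cassandra
    // Analytics bulk reader); all names below are placeholders.
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().appName("mv-repair-sketch").getOrCreate()

    def load(table: String): DataFrame =
      spark.read
        .format("org.apache.cassandra.spark.sparksql.CassandraDataSource") // assumed
        .option("keyspace", "ks")
        .option("table", table)
        .load()

    val base = load("base_table")
    val mv   = load("mv_table")

    // Project the base table onto the MV's schema, then full-outer-join on the
    // MV primary key; rows present on only one side, or with differing values,
    // are the out-of-sync rows a repair step would write back or delete.
    val expected = base.select(col("mv_pk"), col("mv_ck"), col("v"))
    val drift = expected.as("e")
      .join(mv.as("m"), Seq("mv_pk", "mv_ck"), "full_outer")
      .filter(!(col("e.v") <=> col("m.v")))

    drift.write.mode("overwrite").parquet("/tmp/mv_drift") // hand off to repair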