Re: [Discuss] Repair inside C*

2024-10-28 Thread Jeff Jirsa


> On Oct 28, 2024, at 9:52 PM, Alexander Dejanovski 
>  wrote:
> 
> 
> 
>> If a repair session finishes gracefully, then this timeout is not 
>> applicable. Anyway, I do not have any strong opinion on the value. I am open 
>> to lowering it to 1h or something.
> True, it will only delay killing hanging repairs.
> One thing that we cannot solve in Reaper at the moment is that sequential and 
> dc aware repair sessions that get terminated due to the timeout leave 
> ephemeral snapshots behind. Since they're only reclaimed on restart, having a 
> lot of timeouts can end up filling the disks if the snapshots get 
> materialized.
> Since the auto repair is running from within Cassandra, we might have more 
> control over this and implement a proper cleanup of such snapshots.

Jira to auto-delete snapshots at X% disk full ? 



Re: [Discuss] Repair inside C*

2024-10-28 Thread Jaydeep Chovatia
Thanks, Mick, for the comments. Please find my responses below.

>(1)

I think I covered most of the points in my response to Alexander (except
one, which I am responding to below separately). Tl;dr: the MVP can be
easily extended to do a table-level schedule; it would just be another CQL
table property as opposed to a yaml config (as it is currently in the MVP).
I have already added this as a near-term feature, and noted that when we
add table-wise repair priority, we need to ensure that table-level
scheduling is also taken care of. Please see my latest few comments on the
ticket https://issues.apache.org/jira/browse/CASSANDRA-20013
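
To make that concrete, here is a rough sketch of how such a table-level
schedule could look once extended; the property name and keys below are
purely illustrative, not the MVP's actual syntax (today the schedule lives
in the node-level yaml config):

-- Illustrative only: 'auto_repair' and its keys are hypothetical, not CEP-37 syntax.
ALTER TABLE ks.payments WITH auto_repair = {
  'full_repair_interval': '7d',          -- full repair at most once a week
  'incremental_repair_interval': '24h'   -- incremental repair daily
};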

>You may also want to do repairs in different DCs differently.

Currently, the MVP defaults to all DCs but allows one to skip one or more
DCs if they wish to do so. This again points to the same theme of allowing
a schedule (or priority) at a table level, followed by a DC level. The MVP
can be easily extended to whatever granularity we want scheduling at,
without many architectural changes; we just have to finalize the
granularity we want. I've also noted in the ticket above that scheduling
support should come at a table-level, followed by DC-level, granularity.

>I'm curious as to how crashed repairs are handled and resumed

The MVP has a maximum allowed quota at a keyspace level and at a table
level. So, if repairing a table and/or keyspace takes much longer than the
timeout due to failures, more data to repair, etc., then the scheduler will
skip to the next table/keyspace.

>Without any per-table scheduling and history (IIUC)  a node would have to
restart the repairs for all keyspaces and tables.

The above-mentioned quota should work fine and will make sure the bad
tables/keyspaces are skipped, allowing the good keyspaces/tables to proceed
on a node, unless the Cassandra JVM itself keeps crashing. If the JVM keeps
crashing, then repair will restart all over again, but fixing the JVM
crashes is likely a more significant issue and does not happen regularly,
IMO.

>And without such per-table tracking, I'm also kinda curious as to how we
interact with manual repair invocations the user makes.  There are
operational requirements to do manual repairs, e.g. node replacement or if
a node has been down for too long, and consistency breakages until such
repair is complete.  Leaving such operational requirements to this CEP's
in-built scheduler is a limited approach, it may be many days before it
gets to doing it, and even with node priority will it appropriately switch
from primary-range to all-replica-ranges?

To alleviate some of this, the MVP has two options one can configure
dynamically through *nodetool*: 1) setting a priority for nodes, 2) telling
the scheduler to repair one or more nodes immediately.
If an admin puts some nodes on the priority queue, those nodes will be
repaired ahead of the scheduler's own list. If an admin tags some nodes on
the emergency list, then those nodes will be repaired immediately.
Basically, the admin tells the scheduler, "*Just do what I say instead of
using your list of nodes*".
Even with this, if an admin decides to trigger a repair manually through
*nodetool repair*, then the scheduler should not interfere with
that manually triggered operation - they can progress independently. The
MVP has options to disable the scheduler's repair dynamically without any
cluster restart, etc., so the admin can use some of the combinations and
decide what to do when they invoke any manual repair operation.

>What if the user accidentally invokes an incremental repair when the
in-built scheduler is expecting only to ever perform full repairs? Does it
know how to detect/remedy that?

The user invocation and the scheduler invocations go through two different
repair sessions. If the MVP scheduler has been configured only to perform
FR, then the scheduler will never fire IR, but that does not prohibit the
user from firing IR through *nodetool repair*. As an enhancement to the
MVP, in the future we should warn the user that it might not be safe to run
IR when the in-built scheduler has been configured not to do IR.

>Having read the design doc and PR, I am impressed how lightweight the
design of the tables are.

Thanks. To reiterate, the number of records in system_distributed will
be equivalent to the number of nodes in the cluster.
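
Purely as an illustration (this is not the actual MVP schema; the keyspace,
table, and column names below are made up), a status table of roughly this
shape, keyed by the node's host id, is what keeps the row count bounded by
the node count:

-- Hypothetical sketch only; the real CEP-37 tables live in system_distributed
-- and their columns differ. The point is the key: one row per node.
CREATE KEYSPACE IF NOT EXISTS repair_demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

CREATE TABLE IF NOT EXISTS repair_demo.auto_repair_status (
    host_id uuid PRIMARY KEY,      -- the node's host id, one row per node
    last_repair_started timestamp,
    last_repair_finished timestamp
);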

>But I do still think we deserve some numbers, and a further line of
questioning:  what consistency guarantees do we need, how does this work
cross-dc, during topology changes, does an event that introduces
data-at-rest inconsistencies in the cluster then become
confused/inefficient when the mechanism to repair it also now has its
metadata inconsistent.  For the most part this is a problem not unique to
any table in system_distributed and otherwise handled, but how does the
system_distributed keyspace handling of such failures impact repairs.

Keeping practicality in mind, the record count in the table should be as
small as three rows and 

Re: [Discuss] Repair inside C*

2024-10-28 Thread Alexander Dejanovski
>
> The scheduler repairs, by default, the primary ranges for all the nodes
> going through the repair. Since it uses the primary ranges, all the nodes
> repairing parallelly would not overlap in any form for the primary ranges.
> However, the replica set for the nodes going through repair may or may not
> overlap, but it totally depends on the cluster size and parallelism used.
> If a cluster is small, there is a possibility, but if it is large, the
> possibility reduces. Even if we go with a range-centric approach and if we
> repair N token ranges in parallel, there is no guarantee that their replica
> sets won't overlap for smaller clusters.

That's inaccurate: we can check the replica set for the subrange we're
about to run and see if it overlaps with the replica set of other ranges
which are being processed already.


> The only solution is to reduce the repair parallelism to one node at a
> time.

Yes, I agree.

> This is supported with the MVP, we can set "min_repair_interval: 7d" (the
> default is 24h) and the nodes will repair only once every 7 days.
>
> The MVP implementation allows running full and incremental repairs (and
> preview repair; the code changes are done and it is coming soon) independently
> and in parallel. One can set the above config for each repair type with
> their preferred schedule.

Nice, sorry I missed these in the CEP doc.


>  I have already created a ticket to add this as an enhancement
> https://issues.apache.org/jira/browse/CASSANDRA-20013

Thanks, table-level repair priority could be a very interesting
improvement; that's something Reaper lacks as well at the moment.

> If a repair session finishes gracefully, then this timeout is not
> applicable. Anyway, I do not have any strong opinion on the value. I am
> open to lowering it to *1h* or something.

True, it will only delay killing hanging repairs.
One thing that we cannot solve in Reaper at the moment is that sequential
and dc aware repair sessions that get terminated due to the timeout leave
ephemeral snapshots behind. Since they're only reclaimed on restart, having
a lot of timeouts can end up filling the disks if the snapshots get
materialized.
Since the auto repair is running from within Cassandra, we might have more
control over this and implement a proper cleanup of such snapshots.


Alexander Dejanovski

Astra Managed Clusters / Mission Control

w. www.datastax.com

 


On Mon, Oct 28, 2024 at 7:01 PM Jaydeep Chovatia 
wrote:

> Thanks a lot, Alexander, for the review! Please find my response below:
>
> >  making these replicas process 3 concurrent repairs while others could
> be left uninvolved in any repair at all...Taking a range centric approach
> (we're not repairing nodes, we're repairing the token ranges) allows to
> spread the load evenly without overlap in the replica sets.
> The scheduler repairs, by default, the primary ranges for all the nodes
> going through the repair. Since it uses the primary ranges, all the nodes
> repairing parallelly would not overlap in any form for the primary ranges.
> However, the replica set for the nodes going through repair may or may not
> overlap, but it totally depends on the cluster size and parallelism used.
> If a cluster is small, there is a possibility, but if it is large, the
> possibility reduces. Even if we go with a range-centric approach and if we
> repair N token ranges in parallel, there is no guarantee that their replica
> sets won't overlap for smaller clusters.
>
> > I'm more worried even with incremental repair here, because you might
> end up with some conflicts around sstables which would be in the pending
> repair pool but would be needed by a competing repair job.
> This can happen regardless of whether we go by "node-centric" vs.
> "range-centric" if we run multiple parallel repair sessions. The reason is
> that SSTables for all the nodes going through repair may not be physically
> isolated 1:1 as per the token ranges being repaired. We just had a detailed
> discussion about the SSTable overlap for incremental repair (IR) last week
> in Slack (#cassandra-repair-scheduling-cep37), and the general consensus
> was that there is no better way to address it than just to retry a few
> times. The only solution is to reduce the repair parallelism to one node
> at a time.
> The ideal and reliable way to repair IR is to calculate the token ranges
> based on the unrepaired data size and also apply the upper cap on the data
> size being repaired. The good news is that Andy Tolbert already extended
> the CEP-37 MVP for this, and he is working on making it perfect by adding
> necessary tests, etc., so it can be landed on top of this MVP. tl;dr Andy T
> and Chris L are already on top of this and soon it will be available on top
> of CEP-37 MVP.
>
> >I don't know if in the latest versions such sstables would be totally
> ignored or if the competing repair job would fail.
> The competing IR session would be aborted, and the scheduler would retry a
> few times.

Re: [Discuss] Repair inside C*

2024-10-28 Thread Jaydeep Chovatia
>
>
> > That's inaccurate, we can check the replica set for the subrange we're
about to run and see if it overlaps with the replica set of other ranges
which are being processed already.
We can definitely check the replicas for the subrange we plan to run and
see if they overlap with the ongoing ones. I am saying that for a smaller
cluster, if we want to repair multiple token ranges in parallel, it is tough
to guarantee that the replica sets won't overlap.

>Jira to auto-delete snapshots at X% disk full ?
Sure, just created a new JIRA
https://issues.apache.org/jira/browse/CASSANDRA-20035

Jaydeep


Re: [DISCUSS] Introduce CREATE TABLE LIKE grammar

2024-10-28 Thread guo Maxwell
Here is the latest updated CEP-43



guo Maxwell wrote on Thu, Oct 24, 2024 at 19:53:

> yes,you are right. I will add this
>
> Štefan Miklošovič wrote on Thu, Oct 24, 2024 at 4:42 PM:
>
>> The CEP should also mention that copying system tables or virtual tables
>> or materialized views and similar are not supported and an attempt of doing
>> so will error out.
>>
>> On Thu, Oct 24, 2024 at 7:16 AM Dave Herrington 
>> wrote:
>>
>>> Strong +1 to copy all options by default. This is intuitive to me.  Then
>>> I would like to explicitly override any options of my choosing.
>>>
>>> -Dave
>>>
>>> On Wed, Oct 23, 2024 at 9:57 PM guo Maxwell 
>>> wrote:
>>>
 OK, thank you for your suggestions. I will revise the CEP and copy table
 OPTIONS by default.

 Jon Haddad wrote on Wed, Oct 23, 2024 at 9:18 PM:

> Also strongly +1 to copying all the options.
>
>
> On Wed, Oct 23, 2024 at 5:52 AM Josh McKenzie 
> wrote:
>
>> I'm a very strong +1 to having the default functionality be to copy
>> *ALL* options.
>>
>> Intuitively, as a user, if I tell a software system to make a clone
>> of something I don't expect it to be shallow or a subset defined by some
>> external developer somewhere. I expect it to be a clone.
>>
>> Adding in some kind of "lean" mode or "column only" is fine if
>> someone can make a cogent argument around its inclusion. I don't 
>> personally
>> see a use-case for it right now but definitely open to being educated.
>>
>> On Wed, Oct 23, 2024, at 3:03 AM, Štefan Miklošovič wrote:
>>
>> options are inherently part of that table as well, same as schema. In
>> fact, _schema_ includes all options. Not just columns and its names. If 
>> you
>> change some option, you effectively have a different schema, schema 
>> version
>> changes by changing an option. So if we do not copy options too, we are
>> kind of faking it (when we do not specify WITH OPTIONS).
>>
>> Also, imagine a situation where Accord is merged to trunk. It
>> introduces a new schema option called "transactional = full" which is not
>> default. (I am sorry if I did the spelling wrong here). So, when you 
>> have a
>> table with transactional support and you do "create table ks.tb_copy like
>> ks.tb", when you _do not_ copy all options, this table will _not_ become
>> transactional.
>>
>> The next thing you go to do is to execute some transactions against
>> this table but well ... you can not do that, because your table is not
>> transactional, because you have forgotten to add "WITH OPTIONS". So you
>> need to go back to that and do "ALTER ks.tb_copy WITH transactional = 
>> full"
>> just to support that.
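>>
>> Spelled out as CQL, the pitfall looks roughly like this (the LIKE grammar
>> is still under discussion and the exact Accord option name may be off, so
>> treat both as illustrative):
>>
>> -- Assuming options are NOT copied by default (illustrative syntax only):
>> CREATE TABLE ks.tb_copy LIKE ks.tb;                  -- columns copied, options fall back to defaults
>> -- transactions against ks.tb_copy fail until the user notices and fixes it:
>> ALTER TABLE ks.tb_copy WITH transactional = 'full';  -- manual fix-up that copying options would avoid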
>>
>> I think that you see from this pattern that it is way better if we
>> copy all options by default instead of consciously opt-in into them.
>>
>> also:
>>
>> "but I think there are also some users want to do basic column
>> information copy"
>>
>> where is this coming from? Do you have this idea somehow empirically
>> tested? I just do not see why somebody would want to have Cassandra's
>> defaults instead of what a base table contains.
>>
>> On Wed, Oct 23, 2024 at 8:28 AM guo Maxwell 
>> wrote:
>>
>> The reason for using the OPTION keyword is that I want to provide users
>> with more choices.
>> The default behavior for copying a table is to copy the basic items of the
>> table (columns and their data types, masks, constraints); other things
>> belonging to the table, like options, views, and triggers, are optional in
>> my mind.
>> You are absolutely right that users may want to copy all the stuff, but I
>> think there are also some users who want to do a basic column-information
>> copy, so I just give them a choice. As we know, the number of table
>> parameters is not small: compression, compaction, gc_seconds, bf_chance,
>> speculative_retry, and so on.
>>
>> Besides, we can see that PostgreSQL also has the keywords COMMENT and
>> COMPRESSION, which have similar behavior to our OPTION keyword.
>>
>> So that is why I added the OPTION keyword.
>>
>>
>> Štefan Miklošovič wrote on Tue, Oct 22, 2024 at 11:40 PM:
>>
>> The problem is that when I do this minimal CQL which shows this
>> feature:
>>
>> CREATE TABLE ks.tb_copy LIKE ks.tb;
>>
>> then you are saying that when I _do not_ specify WITH OPTIONS, I get
>> Cassandra's defaults. Only after I specify WITH OPTIONS would it truly
>> be a copy.
>>
>> This is not a good design. Because to have an exact copy, I have to
>> make a conscious effort to include OPTIONS as well. That should not be 
>> the
>> case. I just want to have a copy, totally the same stuff, when I use the
>> min

Re: [Discuss] Repair inside C*

2024-10-28 Thread Mick Semb Wever
any name works for me, Jaydeep :-)

I've taken a run through the CEP, design doc, and current PR.  Below are
my four (rough categories of) questions.
I am keen to see an MVP land, so I'm more looking at what the CEP's design
might not be able to do, rather than what may or may not land in an initial
implementation.  There's a bit below, and some of it would really be better
in the PR; feel free to take it there if deemed more constructive.


1) The need for different schedules for different tables
2) Failure mode: repairs failing and thrashing repairs for all
keyspaces+tables
3) Small concerns on relying on system tables
4) Small concerns on tuning requirements


(1)
Alex also touched on this.  I'm aware of too many cases where this is a
must-have.  Many users cannot repair their clusters without tuning
per-table schedules.  Different gc_grace_seconds is the biggest reason.
But there's also running full repairs infrequently for disk rot (or a similar
reason) on a table that's otherwise frequently incrementally repaired (which
also means an incremental repair could be skipped if a full repair was
currently running).  Or TWCS tables, where you benefit from a higher frequency
of incremental repair (and/or want to minimise repairs older than the
current time_window).   You may also want to do repairs in different DCs
differently.
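
As a concrete example (the table names are made up; gc_grace_seconds is the
existing table option), two tables like these already impose different
minimum repair frequencies, which a single global schedule can't serve well:

-- Each table must complete a repair cycle within its own gc_grace_seconds window.
ALTER TABLE ks.events   WITH gc_grace_seconds = 259200;  -- 3 days: needs repair roughly every 2-3 days
ALTER TABLE ks.accounts WITH gc_grace_seconds = 864000;  -- 10 days (the default): weekly repair suffices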

(2)
I'm curious as to how crashed repairs are handled and resumed…
A problem a lot of users struggle with is where the repair on one table is
enigmatically problematic, crashing or timing out, and it takes a long time
to figure it out.
Without any per-table scheduling and history (IIUC)  a node would have to
restart the repairs for all keyspaces and tables.  This will lead to
over-repairing some tables and never repairing others.

And without such per-table tracking, I'm also kinda curious as to how we
interact with manual repair invocations the user makes.

There are operational requirements to do manual repairs, e.g. node
replacement or if a node has been down for too long, and consistency
breakages until such repair is complete.  Leaving such operational
requirements to this CEP's in-built scheduler is a limited approach, it may
be many days before it gets to doing it, and even with node priority will
it appropriately switch from primary-range to all-replica-ranges?

What if the user accidentally invokes an incremental repair when the in-built
scheduler is expecting only to ever perform full repairs, does it know how
to detect/remedy that?


(3)
Having stuff in system tables is brittle and adds write amplification; we
have plenty of experience of this from DSE NodeSync and Reaper.  Reaper's
ability to store its metadata out-of-cluster is a huge benefit.  Having
read the design doc and PR, I am impressed by how lightweight the design of
the tables is.  But I do still think we deserve some numbers, and a
further line of questioning:  what consistency guarantees do we need, how
does this work cross-dc and during topology changes, and does an event that
introduces data-at-rest inconsistencies in the cluster then become
confused/inefficient when the mechanism to repair it also now has its
metadata inconsistent?  For the most part this is a problem not unique to
any table in system_distributed and otherwise handled, but how does the
system_distributed keyspace's handling of such failures impact repairs?

Even with strong consistency, I would assume the design needs to be
pessimistic, e.g. multiple node repairs can be started at the same time.  Is
this true, and if so, how is it handled?

I am also curious as to how the impact of these tables changes as we
address (1) and (2).

(4)
I can see how the CEP's design works well for the biggest clusters, and
those with heterogeneous data-models (which often comes with larger
deployment sets).  But I don't think we can use this as the bar to quality
or acceptance.   Many smaller clusters that come with lots of keyspaces and
tables have real trouble trying to get repairs to run weekly. We can't
simply blame users for not having optimal data models and deployments.

Carefully tuning the schedules of tables, and the cluster itself, is often
a requirement – time-consuming and a real pain point.  The CEP as it stands
today, I can say with confidence, will simply not work for many users.
Worse than that, it will provide false hope, and take time and effort from
users until they realise it won't work, leaving them having to revert to
their previous solution.   No one expects the CEP to initially handle and
solve every situation, especially poor data-models and over-capacity
clusters.  The hope here is just that a bit of discussion can help us be
informative about our limitations, and possibly save some users from
thinking this is their silver bullet.

The biggest aspect to this I believe is (1), but operational stability and
tuning is also critical.  Alex mentions the range-centric approach, which
helps balance load, which in turn gives you more head room.  But there's
also stuff like p

Re: [Discuss] Repair inside C*

2024-10-28 Thread Jaydeep Chovatia
Thanks a lot, Alexander, for the review! Please find my response below:

>  making these replicas process 3 concurrent repairs while others could be
left uninvolved in any repair at all...Taking a range centric approach
(we're not repairing nodes, we're repairing the token ranges) allows to
spread the load evenly without overlap in the replica sets.
The scheduler repairs, by default, the primary ranges for all the nodes
going through the repair. Since it uses the primary ranges, all the nodes
repairing parallelly would not overlap in any form for the primary ranges.
However, the replica set for the nodes going through repair may or may not
overlap, but it totally depends on the cluster size and parallelism used.
If a cluster is small, there is a possibility, but if it is large, the
possibility reduces. Even if we go with a range-centric approach and if we
repair N token ranges in parallel, there is no guarantee that their replica
sets won't overlap for smaller clusters.

> I'm more worried even with incremental repair here, because you might end
up with some conflicts around sstables which would be in the pending repair
pool but would be needed by a competing repair job.
This can happen regardless of whether we go by "node-centric" vs.
"range-centric" if we run multiple parallel repair sessions. The reason is
that SSTables for all the nodes going through repair may not be physically
isolated 1:1 as per the token ranges being repaired. We just had a detailed
discussion about the SSTable overlap for incremental repair (IR) last week
in Slack (#cassandra-repair-scheduling-cep37), and the general consensus
was that there is no better way to address it than just to retry a few
times. The only solution is to reduce the repair parallelism to one node
at a time.
The ideal and reliable way to run IR is to calculate the token ranges
based on the unrepaired data size and also apply the upper cap on the data
size being repaired. The good news is that Andy Tolbert already extended
the CEP-37 MVP for this, and he is working on making it perfect by adding
necessary tests, etc., so it can be landed on top of this MVP. tl;dr Andy T
and Chris L are already on top of this and soon it will be available on top
of CEP-37 MVP.

>I don't know if in the latest versions such sstables would be totally
ignored or if the competing repair job would fail.
The competing IR session would be aborted, and the scheduler would retry a
few times.

>Continuous repair might create a lot of overhead for full repairs which
often don't require more than 1 run per week.
This is supported with the MVP, we can set "min_repair_interval: 7d"  (the
default is 24h) and the nodes will repair only once every 7 days.

>It also will not allow running a mix of scheduled full/incremental repairs
The MVP implementation allows running full and incremental repairs (and
preview repair; the code changes are done and it is coming soon) independently
and in parallel. One can set the above config for each repair type with
their preferred schedule.

>Here, nodes will be processed sequentially and each node will process the
keyspaces sequentially, tying the repair cycle of all keyspaces together.
The keyspaces and tables on each node will be randomly shuffled to avoid
multiple nodes working on the same table/keyspaces.

>There are many cases where one might have differentiated gc_grace_seconds
settings to optimize reclaiming tombstones when applicable. That requires
having some fine control over the repair cycle for a given keyspace/set of
tables.
As I mentioned, there is already a way to schedule the frequency of the repair
cycle, but the frequency is currently a global config on a node, hence
applicable to all the tables on that node. However, the MVP design is flexible
enough to be easily extended to add the schedule as a new CQL table-level
property, which will then honor the table-level schedule as opposed to a
global schedule. There was another suggestion from @masokol (from
Ecchronos) to maybe assign a repair priority on a table level to prioritize
one table over the other, and that can also solve this problem, which is
also feasible on top of the MVP. I have already created a ticket to add
this as an enhancement https://issues.apache.org/jira/browse/CASSANDRA-20013
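
If we go the priority route, it could likewise surface as just another
table option; the option name below is purely illustrative and not part of
the MVP:

-- Hypothetical option name, for illustration only.
ALTER TABLE ks.critical_ledger WITH auto_repair_priority = 1;  -- repaired ahead of lower-priority tables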

>I think the 3 hours timeout might be quite large and probably means a lot
of data is being repaired for each split. That usually involves some level
of overstreaming
This timeout is there to unstick stuck repair sessions caused by some bug in
the repair code path, e.g.
https://issues.apache.org/jira/browse/CASSANDRA-14674
If a repair session finishes gracefully, then this timeout is not
applicable. Anyway, I do not have any strong opinion on the value. I am
open to lowering it to *1h* or something.

Jaydeep

On Mon, Oct 28, 2024 at 4:45 AM Alexander DEJANOVSKI 
wrote:

> Hi Jaydeep,
>
> I've taken a look at the proposed design and have a few comments/questions.
> As one of the maintainers of Reaper, I'm looking th

Re: [Discuss] Repair inside C*

2024-10-28 Thread Alexander DEJANOVSKI
Hi Jaydeep,

I've taken a look at the proposed design and have a few comments/questions.
As one of the maintainers of Reaper, I'm looking at this through the lens of
how Reaper does things.


*The approach taken in the CEP-37 design is "node-centric" vs a "range
centric" approach (which is the one Reaper takes).*
I'm worried that this
will not allow spreading the repair load evenly across the cluster, since
nodes are the concurrency unit. You could allow running repair on 3 nodes
concurrently for example, but these 3 nodes could all involve the same
replicas, making these replicas process 3 concurrent repairs while others
could be left uninvolved in any repair at all.
Taking a range-centric approach (we're not repairing nodes, we're repairing
the token ranges) allows spreading the load evenly without overlap in the
replica sets.
I'm more worried even with incremental repair here, because you might end
up with some conflicts around sstables which would be in the pending repair
pool but would be needed by a competing repair job.
I don't know if in the latest versions such sstables would be totally
ignored or if the competing repair job would fail.

*Each repair command will repair all keyspaces (with the ability to fully
exclude some tables), and I haven't seen a notion of schedule, which seems
to suggest repairs are running continuously (unless I missed something?).*
There are many cases where one might have differentiated gc_grace_seconds
settings to optimize reclaiming tombstones when applicable. That requires
having some fine control over the repair cycle for a given keyspace/set of
tables.
Here, nodes will be processed sequentially and each node will process the
keyspaces sequentially, tying the repair cycle of all keyspaces together.
If one of the ranges for a specific keyspace cannot be repaired within the
3 hours timeout, it could block all the other keyspaces repairs.
Continuous repair might create a lot of overhead for full repairs which
often don't require more than 1 run per week.
It also will not allow running a mix of scheduled full/incremental repairs
(I'm unsure if that is still a recommendation, but it was still recommended
not so long ago)

*The timeout base duration is large*
I think the 3-hour timeout might be quite large and probably means a lot
of data is being repaired for each split. That usually involves some level
of overstreaming. I don't have numbers to support this; it's more about my
own experience sizing splits in production with Reaper to reduce the
impact on cluster performance as much as possible.
We use 30 minutes as default in Reaper with subsequent attempts growing the
timeout dynamically for challenging splits.

Finally, thanks for picking this up. I'm eager to see Reaper not being
needed anymore and having the database manage its own repairs!


On Tue, Oct 22, 2024 at 21:10, Benedict wrote:

> I realise it’s out of scope, but to counterbalance all of the
> pro-decomposition messages I wanted to chime in with a strong -1. But we
> can debate that in a suitable context later.
>
> On 22 Oct 2024, at 16:36, Jordan West  wrote:
>
> 
> Agreed with the sentiment that decomposition is a good target but out of
> scope here. I’m personally excited to see an in-tree repair scheduler and
> am supportive of the approach shared here.
>
> Jordan
>
> On Tue, Oct 22, 2024 at 08:12 Dinesh Joshi  wrote:
>
>> Decomposing Cassandra may be architecturally desirable but that is not
>> the goal of this CEP. This CEP brings value to operators today so it should
>> be considered on that merit. We definitely need to have a separate
>> conversation on Cassandra's architectural direction.
>>
>> On Tue, Oct 22, 2024 at 7:51 AM Joseph Lynch 
>> wrote:
>>
>>> Definitely like this in C* itself. We only changed our proposal to
>>> putting repair scheduling in the sidecar before because trunk was frozen
>>> for the foreseeable future at that time. With trunk unfrozen and
>>> development on the main process going at a fast pace I think it makes way
>>> more sense to integrate natively as table properties as this CEP proposes.
>>> Completely agree the scheduling overhead should be minimal.
>>>
>>> Moving the actual repair operation (comparing data and streaming
>>> mismatches) along with compaction operations to a separate process long
>>> term makes a lot of sense but imo only once we both have a release of
>>> sidecar and a contract figured out between them on communication. I'm
>>> watching CEP-38 there as I think CQL and virtual tables are looking much
>>> stronger than when we wrote CEP-1 and chose HTTP but that's for that
>>> discussion and not this one.
>>>
>>> -Joey
>>>
>>> On Mon, Oct 21, 2024 at 3:25 PM Francisco Guerrero 
>>> wrote:
>>>
 Like others have said, I was expecting the scheduling portion of repair to
 be negligible. I was mostly curious if you had something handy that you can
 quickly share.

 On 2024/10/21 18:59:41 Jaydeep Chovatia wrote:
 > >Jaydeep, do