Re: [Discuss] Repair inside C*

Francisco Guerrero Mon, 21 Oct 2024 10:01:39 -0700

Jaydeep, do you have any metrics on your clusters comparing them before
and after introducing repair scheduling into the Cassandra process?


On 2024/10/21 16:57:57 "J. D. Jordan" wrote:
> Sounds good. Just wanted to bring it up. I agree that the scheduling bit is
> pretty lightweight and the ideal would be to bring the whole of the repair
> external, which is a much bigger can of worms to open.
> 
>   
> 
> -Jeremiah
> 
>   
> 
> > On Oct 21, 2024, at 11:21 AM, Chris Lohfink <[email protected]> wrote:  
> >  
> >
> 
> > 
> >
> > > I actually think we should be looking at how we can move things out of the
> > database process.  
> >
> >
> >  
> >
> >
> > While worth pursuing, I think we would need a different CEP just to figure
> > out how to do that. Not only is there a lot of infrastructure difficulty in
> > running multi-process, the inter-app communication needs to be figured out
> > better than JMX. Even for the sidecar we don't have a solid story on how to
> > ensure both are running or anything yet. It's up to each app owner to figure
> > it out. Once we have a good thing in place I think we can start moving
> > compactions, repairs, etc out of the database. Even then it's the _repairs_
> > that are expensive, not the scheduling.
> >
> >  
> >
> >
> > On Mon, Oct 21, 2024 at 9:45 AM Jeremiah Jordan
> > <[[email protected]](mailto:[email protected])> wrote:  
> >
> >
> 
> >> I love the idea of a repair service being there by default for an install
> of C*.  My main concern here is that it is putting more services into the main
> database process.  I actually think we should be looking at how we can move
> things out of the database process.  The C* process being a giant monolith has
> always been a pain point.  Is there any way it makes sense for this to be an
> external process rather than a new thread pool inside the C* process?
> 
> >>
> 
> >>  
> >
> >>
> 
> >> -Jeremiah Jordan
> 
> >>
> 
> >>  
> >
> >>
> 
> >> On Oct 18, 2024 at 2:58:15 PM, Mick Semb Wever
> <[[email protected]](mailto:[email protected])> wrote:  
> >
> >>
> 
> >>>  
> >
> >>>
> 
> >>> This is looking strong, thanks Jaydeep.
> 
> >>>
> 
> >>>  
> >
> >>>
> 
> >>> I would suggest folk take a look at the design doc and the PR in the CEP.
> A lot is there (that I have completely missed).
> 
> >>>
> 
> >>>  
> >
> >>>
> 
> >>> I would especially ask all authors of prior art (Reaper, DSE nodesync,
> ecchronos)  to take a final review of the proposal  
> >
> >>>
> 
> >>>  
> >
> >>>
> 
> >>> Jaydeep, can we ask for a two week window while we reach out to these
> people ?  There's a lot of prior art in this space, and it feels like we're in
> a good place now where it's clear this has legs and we can use that to bring
> folk in and make sure there's no remaining blindspots.
> 
> >>>
> 
> >>>  
> >
> >>>
> 
> >>>  
> >
> >>>
> 
> >>> On Fri, 18 Oct 2024 at 01:40, Jaydeep Chovatia
> <[[email protected]](mailto:[email protected])> wrote:  
> >
> >>>
> 
> >>>> Sorry, there is a typo in the CEP-37 link; here is the correct
> [link](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution)
> 
> >>>>
> 
> >>>>  
> >
> >>>>
> 
> >>>>  
> >
> >>>>
> 
> >>>> On Thu, Oct 17, 2024 at 4:36 PM Jaydeep Chovatia
> <[[email protected]](mailto:[email protected])> wrote:  
> >
> >>>>
> 
> >>>>> First, thank you for your patience while we strengthened the CEP-37.
> 
> >>>>>
> 
> >>>>>  
> >
> >>>>>
> 
> >>>>> Over the last eight months, Chris Lohfink, Andy Tolbert, Josh McKenzie,
> Dinesh Joshi, Kristijonas Zalys, and I have done tons of work (online
> discussions/a dedicated Slack channel #cassandra-repair-scheduling-cep37) to
> come up with the best possible design that not only significantly simplifies
> repair operations but also includes the most common features that everyone
> will benefit from running at Scale.
> 
> >>>>>
> 
> >>>>> For example,
> 
> >>>>>
> 
> >>>>>   * Apache Cassandra must be capable of running multiple repair types,
> such as Full, Incremental, Paxos, and Preview - so the framework should be
> easily extendable with no additional overhead from the operator’s point of
> view.
> 
> >>>>>
> 
> >>>>>   * An easy way to extend the token-split calculation algorithm with a
> default implementation should exist.
> 
> >>>>>
> 
> >>>>>   * Running incremental repair reliably at Scale is pretty challenging,
> so we need to place safeguards, such as migration/rollback w/o restart and
> stopping incremental repair automatically if the disk is about to get full.
> 
> >>>>>
> 
> >>>>>
> 
> >>>>>
> 
> >>>>> We are glad to inform you that CEP-37 (i.e., Repair inside Cassandra) is
> now officially ready for review after multiple rounds of design, testing, code
> reviews, documentation reviews, and, more importantly, validation that it runs
> at Scale!
> 
> >>>>>
> 
> >>>>>  
> >
> >>>>>
> 
> >>>>> Some facts about CEP-37.
> 
> >>>>>
> 
> >>>>>   * Multiple members have verified all aspects of CEP-37 numerous times.
> 
> >>>>>
> 
> >>>>>   * The design proposed in CEP-37 has been thoroughly tried and tested
> on an immense scale (hundreds of unique Cassandra clusters, tens of thousands
> of Cassandra nodes, with tens of millions of QPS) on top of 4.1 open-source
> for more than five years; please see more details
> [here](https://www.uber.com/en-US/blog/how-uber-optimized-cassandra-operations-at-scale/).
> 
> >>>>>
> 
> >>>>>   * The following
> [presentation](https://docs.google.com/presentation/d/1Zilww9c7LihHULk_ckErI2s4XbObxjWknKqRtbvHyZc/edit#slide=id.g30a4fd4fcf7_0_13)
> highlights the rigor applied to CEP-37; it was given during last week’s
> Apache Cassandra Bay Area [Meetup](https://www.meetup.com/apache-cassandra-bay-area/events/303469006/).
> 
> >>>>>
> 
> >>>>>
> 
>   
> >
> >>>>>
> 
> >>>>> Since things are massively overhauled, we believe it is almost ready for
> a final pass pre-VOTE. We would like you to please review the
> [CEP-37](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution\))
> and the associated detailed design
> [doc](https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0).
> 
> >>>>>
> 
> >>>>>  
> >
> >>>>>
> 
> >>>>> Thank you everyone!
> 
> >>>>>
> 
> >>>>> Chris, Andy, Josh, Dinesh, Kristijonas, and Jaydeep
> 
> >>>>>
> 
> >>>>>  
> >  
> >
> >>>>>
> 
> >>>>>  
> >
> >>>>>
> 
> >>>>> On Thu, Sep 19, 2024 at 11:26 AM Josh McKenzie
> <[[email protected]](mailto:[email protected])> wrote:  
> >
> >>>>>
> 
> 
> >>>>>>
> 
> >>>>>> Not quite; finishing touches on the CEP and design doc are in flight
> (as of last / this week).  
> >
> >>>>>>
> 
> >>>>>>  
> >
> >>>>>>
> 
> >>>>>> Soon(tm).
> 
> >>>>>>
> 
> >>>>>>  
> >
> >>>>>>
> 
> >>>>>> On Thu, Sep 19, 2024, at 2:07 PM, Patrick McFadin wrote:  
> >
> >>>>>>
> 
> >>>>>>> Is this CEP ready for a VOTE thread?
> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Unified+Repair+Solution>
>   
> >
> >>>>>>>
> 
> >>>>>>>  
> >
> >>>>>>>
> 
> >>>>>>> On Sun, Feb 25, 2024 at 12:25 PM Jaydeep Chovatia
> <[[email protected]](mailto:[email protected])> wrote:  
> >
> >>>>>>>
> 
> >>>>>>>> Thanks, Josh. I've just updated the
> [CEP](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Official+Repair+Solution)
> and included all the solutions you mentioned below.  
> >
> >>>>>>>>
> 
> >>>>>>>>  
> >
> >>>>>>>>
> 
> >>>>>>>> Jaydeep  
> >
> >>>>>>>>
> 
> >>>>>>>>  
> >
> >>>>>>>>
> 
> >>>>>>>> On Thu, Feb 22, 2024 at 9:33 AM Josh McKenzie
> <[[email protected]](mailto:[email protected])> wrote:  
> >
> >>>>>>>>
> 
> >
> >>>>>>>>>
> 
> >>>>>>>>> Very late response from me here (basically necro'ing this thread).  
> >
> >>>>>>>>>
> 
> >>>>>>>>>  
> >
> >>>>>>>>>
> 
> >>>>>>>>> I think it'd be useful to get this condensed into a CEP that we can
> then discuss in that format. It's clearly something we all agree we need and
> having an implementation that works, even if it's not in your preferred
> execution domain, is vastly better than nothing IMO.  
> >
> >>>>>>>>>
> 
> >>>>>>>>>  
> >
> >>>>>>>>>
> 
> >>>>>>>>> I don't have cycles (nor background ;) ) to do that, but it sounds
> like you do Jaydeep given the implementation you have on a private fork +
> design.  
> >
> >>>>>>>>>
> 
> >>>>>>>>>  
> >
> >>>>>>>>>
> 
> >>>>>>>>> A non-exhaustive list of things that might be useful incorporating
> into or referencing from a CEP:  
> >
> >>>>>>>>>
> 
> >>>>>>>>> Slack thread: <https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619>  
> >
> >>>>>>>>>
> 
> >>>>>>>>> Joey's old C* ticket:
> <https://issues.apache.org/jira/browse/CASSANDRA-14346>  
> >
> >>>>>>>>>
> 
> >>>>>>>>> Even older automatic repair scheduling:
> <https://issues.apache.org/jira/browse/CASSANDRA-10070>  
> >
> >>>>>>>>>
> 
> >>>>>>>>> Your design gdoc: <https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0>  
> >
> >>>>>>>>>
> 
> >>>>>>>>> PR with automated repair:
> <https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c>
>   
> >
> >>>>>>>>>
> 
> >>>>>>>>>  
> >
> >>>>>>>>>
> 
> >>>>>>>>> My intuition is that we're all basically in agreement that this is
> something the DB needs, we're all willing to bikeshed for our personal
> preference on where it lives and how it's implemented, and at the end of the
> day, code talks. I don't think anyone's said they'll die on the hill of
> implementation details, so that feels like CEP time to me.  
> >
> >>>>>>>>>
> 
> >>>>>>>>>  
> >
> >>>>>>>>>
> 
> >>>>>>>>> If you were willing and able to get a CEP together for automated
> repair based on the above material, given you've done the work and have the
> proof points it's working at scale, I think this would be a  _huge
> contribution_ to the community.  
> >
> >>>>>>>>>
> 
> >>>>>>>>>  
> >
> >>>>>>>>>
> 
> >>>>>>>>> On Thu, Aug 24, 2023, at 7:26 PM, Jaydeep Chovatia wrote:  
> >
> >>>>>>>>>
> 
> >>>>>>>>>> Is anyone going to file an official CEP for this?  
> >
> >>>>>>>>>>
> 
> >>>>>>>>>> As mentioned in this email thread, here is one of the solution's
> [design doc](https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0) and source
> code on a private Apache Cassandra patch. Could you go through it and let me
> know what you think?  
> >
> >>>>>>>>>>
> 
> >>>>>>>>>>  
> >
> >>>>>>>>>>
> 
> >>>>>>>>>> Jaydeep  
> >
> >>>>>>>>>>
> 
> >>>>>>>>>>  
> >
> >>>>>>>>>>
> 
> >>>>>>>>>> On Wed, Aug 2, 2023 at 3:54 PM Jon Haddad
> <[[email protected]](mailto:[email protected])> wrote:  
> >
> >>>>>>>>>>
> 
> >>>>>>>>>>> > That said I would happily support an effort to bring repair
> scheduling to the sidecar immediately. This has nothing blocking it, and would
> potentially enable the sidecar to provide an official repair scheduling
> solution that is compatible with current or even previous versions of the
> database.  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>>  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> This is something I hadn't thought much about, and is a pretty
> good argument for using the sidecar initially.  There's a lot of deployments
> out there and having an official repair option would be a big win.  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>>  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>>  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > I agree that it would be ideal for Cassandra to have a repair
> scheduler in-DB.  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> >  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > That said I would happily support an effort to bring repair
> scheduling to the sidecar immediately. This has nothing blocking it, and would
> potentially enable the sidecar to provide an official repair scheduling
> solution that is compatible with current or even previous versions of the
> database.  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> >  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > Once TCM has landed, we’ll have much stronger primitives for
> repair orchestration in the database itself. But I don’t think that should
> block progress on a repair scheduling solution in the sidecar, and there is
> nothing that would prevent someone from continuing to use a sidecar-based
> solution in perpetuity if they preferred.  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> >  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > \- Scott  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> >  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > > On Jul 26, 2023, at 3:25 PM, Jon Haddad
> <[[email protected]](mailto:[email protected])> wrote:  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > >  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > > I'm 100% in favor of repair being part of the core DB, not
> the sidecar.  The current (and past) state of things where running the DB
> correctly *requires* running a separate process (either community maintained
> or official C* sidecar) is incredibly painful for folks.  The idea that your
> data integrity needs to be opt-in has never made sense to me from the
> perspective of either the product or the end user.  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > >  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > > I've worked with way too many teams that have either
> configured this incorrectly or not at all.  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > >  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > > Ideally Cassandra would ship with repair built in and on by
> default.  Power users can disable if they want to continue to maintain their
> own repair tooling for some reason.  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > >  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > > Jon  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > >  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > >> On 2023/07/24 20:44:14 German Eichberger via dev wrote:  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > >> All,  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > >> We had a brief discussion in [2] about the Uber article [1]
> where they talk about having integrated repair into Cassandra and how great
> that is. I expressed my disappointment that they didn't work with the
> community on that (Uber, if you are listening, time to make amends 🙂) and it
> turns out Joey already had the idea and wrote the code [3] - so I wanted to
> start a discussion to gauge interest and maybe how to revive that effort.  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > >> Thanks,  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > >> German  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > >> [1] <https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/>  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > >> [2] <https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619>  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> > >> [3] <https://issues.apache.org/jira/browse/CASSANDRA-14346>  
> >
> >>>>>>>>>>>
> 
> >>>>>>>>>>> >  
> >
> >>>>>>>>>
> 
> >>>>>>>>>  
> >
> >>>>>>
> 
> >>>>>>  
> >
> 
>
