+1 to Benedict's (and others') comments on pluggability and low overhead when disabled. The latter needs little justification. The reason I am big on the former is that, in my opinion, decisions on approach need to be settled with numbers, not anecdotes or past experience (including my own). So I would like to see us compare different approaches (what metrics to use, etc.).
Personally, I'm a bit skeptical that we will come up with a metric-based heuristic that works well in most scenarios and doesn't require significant knowledge and tuning; past implementations of the dynamic snitch are good evidence of that. However, I expressed the same concerns internally for a client-level project where we exposed metrics to induce back pressure, and early experiments are encouraging, contrary to my expectations. Different approaches can work better or worse at different layers, and the same goes for different workloads, so I don't think we should dismiss approaches outright in this thread without hard numbers.

In short, I think the testing and evaluation of this CEP is as important as its design and implementation. We will need to test a wide variety of workloads, and potentially of implementations, and that's where pluggability will be a huge benefit. I would go as far as to say the CEP should focus more on a framework for pluggable implementations, with low to zero cost when disabled, than on a specific set of metrics or a specific approach.
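To make that concrete, the seam could be as small as a single interface with a shared no-op for the disabled path. This is a purely illustrative sketch; every name in it is hypothetical, not a proposed CEP-41 API:

    // Hypothetical sketch only; none of these names exist in Cassandra today.
    public interface RequestThrottler
    {
        enum Decision { ACCEPT, DELAY, REJECT }

        /** Opaque description of the work being admitted (verb, size, tenant, ...). */
        interface RequestContext {}

        /** Hot-path call: admit, delay, or shed the request. */
        Decision admit(RequestContext context);

        /** Shared no-op for the disabled case: the hot path becomes a single
            field read plus a trivially inlineable call, i.e. ~zero overhead. */
        RequestThrottler NO_OP = context -> Decision.ACCEPT;
    }

Everything interesting (which metrics feed an implementation and how it reacts) would live behind that seam, which is also exactly what would let us benchmark competing strategies against each other.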
Jordan

On Thu, Sep 19, 2024 at 14:38 Benedict Elliott Smith <bened...@apache.org> wrote:

> I just want to flag here that this is a topic I have strong opinions on, but the CEP is not really specific or detailed enough to understand precisely how it will be implemented. So, if a patch is already being produced, most of my feedback is likely to be provided some time after a patch appears, through the normal review process. I want to flag this now to avoid any surprise.
>
> I will say upfront that, ideally, this system should be designed to have ~zero overhead when disabled, and with minimal coupling (between its own components and C* itself), so that entirely orthogonal approaches can be integrated in future without polluting the codebase.
>
> On 19 Sep 2024, at 19:14, Patrick McFadin <pmcfa...@gmail.com> wrote:
>
> The work has begun but we don't have a VOTE thread for this CEP. Can one get started?
>
> On Mon, May 6, 2024 at 9:24 PM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>
>> Sure, Caleb. I will include the work as part of CASSANDRA-19534 <https://issues.apache.org/jira/browse/CASSANDRA-19534> in CEP-41.
>>
>> Jaydeep
>>
>> On Fri, May 3, 2024 at 7:48 AM Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>>
>>> FYI, there is some ongoing sort-of-related work going on in CASSANDRA-19534 <https://issues.apache.org/jira/browse/CASSANDRA-19534>.
>>>
>>> On Wed, Apr 10, 2024 at 6:35 PM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>>
>>>> Just created an official CEP-41 <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-41+%28DRAFT%29+Apache+Cassandra+Unified+Rate+Limiter> incorporating the feedback from this discussion. Feel free to let me know if I may have missed some important feedback in this thread that is not captured in CEP-41.
>>>>
>>>> Jaydeep
>>>>
>>>> On Thu, Feb 22, 2024 at 11:36 AM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>>>
>>>>> Thanks, Josh. I will file an official CEP with all the details in a few days and update this thread with that CEP number. Thanks a lot, everyone, for providing valuable insights!
>>>>>
>>>>> Jaydeep
>>>>>
>>>>> On Thu, Feb 22, 2024 at 9:24 AM Josh McKenzie <jmcken...@apache.org> wrote:
>>>>>
>>>>>> Do folks think we should file an official CEP and take it there?
>>>>>>
>>>>>> +1 here.
>>>>>>
>>>>>> Synthesizing your gdoc, Caleb's work, and the feedback from this thread into a draft seems like a solid next step.
>>>>>>
>>>>>> On Wed, Feb 7, 2024, at 12:31 PM, Jaydeep Chovatia wrote:
>>>>>>
>>>>>> I see a lot of great ideas being discussed or proposed in the past to cover the most common rate limiter candidate use cases. Do folks think we should file an official CEP and take it there?
>>>>>>
>>>>>> Jaydeep
>>>>>>
>>>>>> On Fri, Feb 2, 2024 at 8:30 AM Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>>>>>>
>>>>>> I just remembered the other day that I had done a quick writeup on the state of compaction stress-related throttling in the project:
>>>>>>
>>>>>> https://docs.google.com/document/d/1dfTEcKVidRKC1EWu3SO1kE1iVLMdaJ9uY1WMpS3P_hs/edit?usp=sharing
>>>>>>
>>>>>> I'm sure most of it is old news to the people on this thread, but I figured I'd post it just in case :)
>>>>>>
>>>>>> On Tue, Jan 30, 2024 at 11:58 AM Josh McKenzie <jmcken...@apache.org> wrote:
>>>>>>
>>>>>> 2.) We should make sure the links between the "known" root causes of cascading failures and the mechanisms we introduce to avoid them remain very strong.
>>>>>>
>>>>>> Seems to me that our historical strategy was to address individual known cases one-by-one rather than looking for a more holistic load-balancing and load-shedding solution. While the engineer in me likes the elegance of a broad, more-inclusive *actual SEDA-like* approach, the pragmatist in me wonders how far we think we are today from a stable set-point.
>>>>>>
>>>>>> i.e. are we facing a handful of cases where nodes can still get pushed over and then cascade that we can surgically address, or are we facing a broader lack of back-pressure that rears its head in different domains (client -> coordinator, coordinator -> replica, internode with other operations, etc.) at surprising times and should be considered more holistically?
>>>>>>
>>>>>> On Tue, Jan 30, 2024, at 12:31 AM, Caleb Rackliffe wrote:
>>>>>>
>>>>>> I almost forgot CASSANDRA-15817, which introduced reject_repair_compaction_threshold, which provides a mechanism to stop repairs while compaction is underwater.
>>>>>>
>>>>>> On Jan 26, 2024, at 6:22 PM, Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> I'm a bit late to the discussion. I see that we've already discussed CASSANDRA-15013 <https://issues.apache.org/jira/browse/CASSANDRA-15013> and CASSANDRA-16663 <https://issues.apache.org/jira/browse/CASSANDRA-16663> at least in passing. Having written the latter, I'd be the first to admit it's a crude tool, although it's been useful here and there, and it provides a couple of primitives that may be useful for future work. As Scott mentions, while it is configurable at runtime, it is not adaptive, although we did make configuration easier in CASSANDRA-17423 <https://issues.apache.org/jira/browse/CASSANDRA-17423>. It is also global to the node, although we've lightly discussed some ideas around making it more granular. (For example, keyspace-based limiting, or limiting "domains" tagged by the client in requests, could be interesting.) It also does not deal with inter-node traffic, of course.
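>>>>>>
>>>>>> To sketch the granularity idea mentioned above purely hypothetically (none of this exists in CASSANDRA-16663; the names are invented, and a flat per-keyspace rate is just a stand-in for whatever policy we'd actually want):
>>>>>>
>>>>>> import java.util.Map;
>>>>>> import java.util.concurrent.ConcurrentHashMap;
>>>>>> import com.google.common.util.concurrent.RateLimiter;
>>>>>>
>>>>>> public class KeyspaceRateLimiters
>>>>>> {
>>>>>>     // One token bucket per keyspace instead of a single node-global pool.
>>>>>>     private final Map<String, RateLimiter> limiters = new ConcurrentHashMap<>();
>>>>>>     private final double permitsPerSecond;
>>>>>>
>>>>>>     public KeyspaceRateLimiters(double permitsPerSecond)
>>>>>>     {
>>>>>>         this.permitsPerSecond = permitsPerSecond;
>>>>>>     }
>>>>>>
>>>>>>     /** Returns true if a request against this keyspace may proceed now. */
>>>>>>     public boolean tryAcquire(String keyspace)
>>>>>>     {
>>>>>>         return limiters.computeIfAbsent(keyspace, k -> RateLimiter.create(permitsPerSecond))
>>>>>>                        .tryAcquire();
>>>>>>     }
>>>>>> }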
>>>>>>
>>>>>> Something we've not yet mentioned (that does address internode traffic) is CASSANDRA-17324 <https://issues.apache.org/jira/browse/CASSANDRA-17324>, which I proposed shortly after working on the native request limiter (and have just not had much time to return to). The basic idea is this:
>>>>>>
>>>>>> When a node is struggling under the weight of a compaction backlog and becomes a cause of increased read latency for clients, we have two safety valves:
>>>>>>
>>>>>> 1.) Disabling the native protocol server, which stops the node from coordinating reads and writes.
>>>>>> 2.) Jacking up the severity on the node, which tells the dynamic snitch to avoid the node for reads from other coordinators.
>>>>>>
>>>>>> These are useful, but we don't appear to have any mechanism that would allow us to temporarily reject internode hint, batch, and mutation messages that could further delay resolution of the compaction backlog.
>>>>>>
>>>>>> Whether it's done as part of a larger framework or on its own, it still feels like a good idea.
>>>>>>
>>>>>> Thinking in terms of opportunity costs here (i.e. where we spend our finite engineering time to holistically improve the experience of operating this database) is healthy, but we probably haven't reached the point of diminishing returns on nodes being able to protect themselves from clients and from other nodes. I would just keep in mind two things:
>>>>>>
>>>>>> 1.) The effectiveness of rate-limiting in the system (which includes the database and all clients) as a whole necessarily decreases as we move from the application to the lowest-level database internals. Limiting correctly at the client will save more resources than limiting at the native protocol server, and limiting correctly at the native protocol server will save more resources than limiting after we've dispatched requests to some thread pool for processing.
>>>>>> 2.) We should make sure the links between the "known" root causes of cascading failures and the mechanisms we introduce to avoid them remain very strong.
>>>>>>
>>>>>> In any case, I'd be happy to help out in any way I can as this moves forward (especially as it relates to our past/current attempts to address this problem space).
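To make Caleb's internode safety-valve idea concrete, here is a minimal sketch of what such an admission check might look like. All names and the threshold are hypothetical illustrations, not the actual CASSANDRA-17324 patch:

    import java.util.EnumSet;
    import java.util.Set;

    public class InternodeBackpressure
    {
        // Hypothetical stand-ins for the message types Caleb lists.
        enum Verb { HINT, BATCH, MUTATION, READ }

        // Only traffic that adds to the write/compaction load is sheddable;
        // reads are left alone since the dynamic snitch already steers them away.
        private static final Set<Verb> SHEDDABLE = EnumSet.of(Verb.HINT, Verb.BATCH, Verb.MUTATION);

        private final int pendingCompactionLimit;

        public InternodeBackpressure(int pendingCompactionLimit)
        {
            this.pendingCompactionLimit = pendingCompactionLimit;
        }

        /** Temporarily reject hint/batch/mutation messages while the
            compaction backlog is above the configured limit. */
        public boolean shouldReject(Verb verb, int pendingCompactions)
        {
            return SHEDDABLE.contains(verb) && pendingCompactions > pendingCompactionLimit;
        }
    }

The point is less the mechanism than where it sits: a check like this could be one more pluggable implementation behind the framework discussed above, rather than another one-off switch.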