+1 to Benedict's (and others') comments on pluggability and low overhead when disabled. The latter needs little justification. The reason I am big on the former is that, in my opinion, decisions on approach need to be settled with numbers, not anecdotes or past experience (including my own). So I would like to see us compare different approaches (what metrics to use, etc.).
Personally, I'm a bit skeptical that we will come up with a metric-based heuristic that works well in most scenarios and doesn't require significant knowledge and tuning; past implementations of the dynamic snitch are good evidence of that. However, I expressed the same concerns internally for a client-level project where we exposed metrics to induce back pressure, and early experiments are encouraging, contrary to my expectations. Different approaches can work better or worse at different layers, and the same goes for different workloads, so I don't think we should dismiss approaches outright in this thread without hard numbers.

In short, I think the testing and evaluation of this CEP is as important as its design and implementation. We will need to test a wide variety of workloads, and potentially of implementations, and that's where pluggability will be a huge benefit. I would go as far as to say the CEP should focus more on a framework for pluggable implementations, with low to zero cost when disabled, than on a specific set of metrics or a specific approach.
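To make that concrete, the seam could be as small as a single interface with a shared no-op for the disabled path. This is a purely illustrative sketch; every name in it is hypothetical, not a proposed CEP-41 API:

    // Hypothetical sketch only; none of these names exist in Cassandra today.
    public interface RequestThrottler
    {
        enum Decision { ACCEPT, DELAY, REJECT }

        /** Opaque description of the work being admitted (verb, size, tenant, ...). */
        interface RequestContext {}

        /** Hot-path call: admit, delay, or shed the request. */
        Decision admit(RequestContext context);

        /** Shared no-op for the disabled case: the hot path becomes a single
            field read plus a trivially inlineable call, i.e. ~zero overhead. */
        RequestThrottler NO_OP = context -> Decision.ACCEPT;
    }

Everything interesting (which metrics feed an implementation and how it reacts) would live behind that seam, which is also exactly what would let us benchmark competing strategies against each other.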
Jordan

On Thu, Sep 19, 2024 at 14:38 Benedict Elliott Smith <bened...@apache.org> wrote:

> I just want to flag here that this is a topic I have strong opinions on, but the CEP is not really specific or detailed enough to understand precisely how it will be implemented. So, if a patch is already being produced, most of my feedback is likely to be provided some time after a patch appears, through the normal review process. I want to flag this now to avoid any surprise.
>
> I will say upfront that, ideally, this system should be designed to have ~zero overhead when disabled, and with minimal coupling (between its own components and C* itself), so that entirely orthogonal approaches can be integrated in future without polluting the codebase.
>
> On 19 Sep 2024, at 19:14, Patrick McFadin <pmcfa...@gmail.com> wrote:
>
> The work has begun but we don't have a VOTE thread for this CEP. Can one get started?
>
> On Mon, May 6, 2024 at 9:24 PM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>
>> Sure, Caleb. I will include the work as part of CASSANDRA-19534 <https://issues.apache.org/jira/browse/CASSANDRA-19534> in CEP-41.
>>
>> Jaydeep
>>
>> On Fri, May 3, 2024 at 7:48 AM Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>>
>>> FYI, there is some ongoing sort-of-related work going on in CASSANDRA-19534 <https://issues.apache.org/jira/browse/CASSANDRA-19534>.
>>>
>>> On Wed, Apr 10, 2024 at 6:35 PM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>>
>>>> Just created an official CEP-41 <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-41+%28DRAFT%29+Apache+Cassandra+Unified+Rate+Limiter> incorporating the feedback from this discussion. Feel free to let me know if I may have missed some important feedback in this thread that is not captured in CEP-41.
>>>>
>>>> Jaydeep
>>>>
>>>> On Thu, Feb 22, 2024 at 11:36 AM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>>>
>>>>> Thanks, Josh. I will file an official CEP with all the details in a few days and update this thread with that CEP number. Thanks a lot, everyone, for providing valuable insights!
>>>>>
>>>>> Jaydeep
>>>>>
>>>>> On Thu, Feb 22, 2024 at 9:24 AM Josh McKenzie <jmcken...@apache.org> wrote:
>>>>>
>>>>>> Do folks think we should file an official CEP and take it there?
>>>>>>
>>>>>> +1 here.
>>>>>>
>>>>>> Synthesizing your gdoc, Caleb's work, and the feedback from this thread into a draft seems like a solid next step.
>>>>>>
>>>>>> On Wed, Feb 7, 2024, at 12:31 PM, Jaydeep Chovatia wrote:
>>>>>>
>>>>>> I see a lot of great ideas being discussed or proposed in the past to cover the most common rate limiter candidate use cases. Do folks think we should file an official CEP and take it there?
>>>>>>
>>>>>> Jaydeep
>>>>>>
>>>>>> On Fri, Feb 2, 2024 at 8:30 AM Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>>>>>>
>>>>>> I just remembered the other day that I had done a quick writeup on the state of compaction stress-related throttling in the project:
>>>>>>
>>>>>> https://docs.google.com/document/d/1dfTEcKVidRKC1EWu3SO1kE1iVLMdaJ9uY1WMpS3P_hs/edit?usp=sharing
>>>>>>
>>>>>> I'm sure most of it is old news to the people on this thread, but I figured I'd post it just in case :)
>>>>>>
>>>>>> On Tue, Jan 30, 2024 at 11:58 AM Josh McKenzie <jmcken...@apache.org> wrote:
>>>>>>
>>>>>> 2.) We should make sure the links between the "known" root causes of cascading failures and the mechanisms we introduce to avoid them remain very strong.
>>>>>>
>>>>>> Seems to me that our historical strategy was to address individual known cases one-by-one rather than looking for a more holistic load-balancing and load-shedding solution. While the engineer in me likes the elegance of a broad, more-inclusive *actual SEDA-like* approach, the pragmatist in me wonders how far we think we are today from a stable set-point.
>>>>>>
>>>>>> i.e. are we facing a handful of cases where nodes can still get pushed over and then cascade that we can surgically address, or are we facing a broader lack of back-pressure that rears its head in different domains (client -> coordinator, coordinator -> replica, internode with other operations, etc.) at surprising times and should be considered more holistically?
>>>>>>
>>>>>> On Tue, Jan 30, 2024, at 12:31 AM, Caleb Rackliffe wrote:
>>>>>>
>>>>>> I almost forgot CASSANDRA-15817, which introduced reject_repair_compaction_threshold, which provides a mechanism to stop repairs while compaction is underwater.
>>>>>>
>>>>>> On Jan 26, 2024, at 6:22 PM, Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> I'm a bit late to the discussion. I see that we've already discussed CASSANDRA-15013 <https://issues.apache.org/jira/browse/CASSANDRA-15013> and CASSANDRA-16663 <https://issues.apache.org/jira/browse/CASSANDRA-16663> at least in passing. Having written the latter, I'd be the first to admit it's a crude tool, although it's been useful here and there, and it provides a couple of primitives that may be useful for future work. As Scott mentions, while it is configurable at runtime, it is not adaptive, although we did make configuration easier in CASSANDRA-17423 <https://issues.apache.org/jira/browse/CASSANDRA-17423>. It is also global to the node, although we've lightly discussed some ideas around making it more granular. (For example, keyspace-based limiting, or limiting "domains" tagged by the client in requests, could be interesting.) It also does not deal with inter-node traffic, of course.
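>>>>>>
>>>>>> To sketch the granularity idea mentioned above purely hypothetically (none of this exists in CASSANDRA-16663; the names are invented, and a flat per-keyspace rate is just a stand-in for whatever policy we'd actually want):
>>>>>>
>>>>>> import java.util.Map;
>>>>>> import java.util.concurrent.ConcurrentHashMap;
>>>>>> import com.google.common.util.concurrent.RateLimiter;
>>>>>>
>>>>>> public class KeyspaceRateLimiters
>>>>>> {
>>>>>>     // One token bucket per keyspace instead of a single node-global pool.
>>>>>>     private final Map<String, RateLimiter> limiters = new ConcurrentHashMap<>();
>>>>>>     private final double permitsPerSecond;
>>>>>>
>>>>>>     public KeyspaceRateLimiters(double permitsPerSecond)
>>>>>>     {
>>>>>>         this.permitsPerSecond = permitsPerSecond;
>>>>>>     }
>>>>>>
>>>>>>     /** Returns true if a request against this keyspace may proceed now. */
>>>>>>     public boolean tryAcquire(String keyspace)
>>>>>>     {
>>>>>>         return limiters.computeIfAbsent(keyspace, k -> RateLimiter.create(permitsPerSecond))
>>>>>>                        .tryAcquire();
>>>>>>     }
>>>>>> }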
>>>>>>
>>>>>> Something we've not yet mentioned (that does address internode traffic) is CASSANDRA-17324 <https://issues.apache.org/jira/browse/CASSANDRA-17324>, which I proposed shortly after working on the native request limiter (and have just not had much time to return to). The basic idea is this:
>>>>>>
>>>>>> When a node is struggling under the weight of a compaction backlog and becomes a cause of increased read latency for clients, we have two safety valves:
>>>>>>
>>>>>> 1.) Disabling the native protocol server, which stops the node from coordinating reads and writes.
>>>>>> 2.) Jacking up the severity on the node, which tells the dynamic snitch to avoid the node for reads from other coordinators.
>>>>>>
>>>>>> These are useful, but we don't appear to have any mechanism that would allow us to temporarily reject internode hint, batch, and mutation messages that could further delay resolution of the compaction backlog.
>>>>>>
>>>>>> Whether it's done as part of a larger framework or on its own, it still feels like a good idea.
>>>>>>
>>>>>> Thinking in terms of opportunity costs here (i.e. where we spend our finite engineering time to holistically improve the experience of operating this database) is healthy, but we probably haven't reached the point of diminishing returns on nodes being able to protect themselves from clients and from other nodes. I would just keep in mind two things:
>>>>>>
>>>>>> 1.) The effectiveness of rate-limiting in the system (which includes the database and all clients) as a whole necessarily decreases as we move from the application to the lowest-level database internals. Limiting correctly at the client will save more resources than limiting at the native protocol server, and limiting correctly at the native protocol server will save more resources than limiting after we've dispatched requests to some thread pool for processing.
>>>>>> 2.) We should make sure the links between the "known" root causes of cascading failures and the mechanisms we introduce to avoid them remain very strong.
>>>>>>
>>>>>> In any case, I'd be happy to help out in any way I can as this moves forward (especially as it relates to our past/current attempts to address this problem space).
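To make Caleb's internode safety-valve idea concrete, here is a minimal sketch of what such an admission check might look like. All names and the threshold are hypothetical illustrations, not the actual CASSANDRA-17324 patch:

    import java.util.EnumSet;
    import java.util.Set;

    public class InternodeBackpressure
    {
        // Hypothetical stand-ins for the message types Caleb lists.
        enum Verb { HINT, BATCH, MUTATION, READ }

        // Only traffic that adds to the write/compaction load is sheddable;
        // reads are left alone since the dynamic snitch already steers them away.
        private static final Set<Verb> SHEDDABLE = EnumSet.of(Verb.HINT, Verb.BATCH, Verb.MUTATION);

        private final int pendingCompactionLimit;

        public InternodeBackpressure(int pendingCompactionLimit)
        {
            this.pendingCompactionLimit = pendingCompactionLimit;
        }

        /** Temporarily reject hint/batch/mutation messages while the
            compaction backlog is above the configured limit. */
        public boolean shouldReject(Verb verb, int pendingCompactions)
        {
            return SHEDDABLE.contains(verb) && pendingCompactions > pendingCompactionLimit;
        }
    }

The point is less the mechanism than where it sits: a check like this could be one more pluggable implementation behind the framework discussed above, rather than another one-off switch.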