Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-21 Thread Jordan West
I agree with Josh. We need to be able to protect against a sudden burst of
traffic. CASSANDRA-19534 went a long way in that regard — at least wrt
minimizing the effects. The challenge with latency and queue depths can be
that they trigger only after the bad has already occurred.

One other thing we are considering internally in that project I mentioned
is having separate thresholds. One that starts to slow traffic and one that
rejects it.
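
A minimal sketch of such a two-threshold scheme (purely illustrative; the
class name, thresholds, and delay policy are all hypothetical, not the
internal project's actual design):

    import java.util.concurrent.RejectedExecutionException;
    import java.util.concurrent.atomic.AtomicInteger;

    // Hypothetical two-threshold limiter: above softLimit, callers are asked
    // to delay (slowing traffic); above hardLimit, requests are rejected.
    final class TwoThresholdLimiter {
        private final int softLimit;
        private final int hardLimit;
        private final AtomicInteger inFlight = new AtomicInteger();

        TwoThresholdLimiter(int softLimit, int hardLimit) {
            this.softLimit = softLimit;
            this.hardLimit = hardLimit;
        }

        /** Returns a delay in ms to apply before admitting the request. */
        long admit() {
            int current = inFlight.incrementAndGet();
            if (current > hardLimit) {
                inFlight.decrementAndGet();
                throw new RejectedExecutionException("over hard limit");
            }
            // Between the thresholds, slow traffic in proportion to how far
            // past the soft threshold we are (up to 100 ms here).
            return current > softLimit
                 ? (current - softLimit) * 100L / (hardLimit - softLimit)
                 : 0L;
        }

        void release() { inFlight.decrementAndGet(); }
    }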

All this said, I think our intuition can guide us to initial
implementations, but I’m still pretty adamant that our decisions for this
CEP should be guided by testing and numbers.

Jordan

On Sat, Sep 21, 2024 at 04:36 Josh McKenzie wrote:

Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-21 Thread Josh McKenzie
Are those three sufficient to protect against a client that unexpectedly comes 
up with 100x a previously provisioned-for workload? Or 100 clients at 100x 
concurrently? And that 100x can be in terms of quantity (helped by queueing 
and shedding), but also in terms of *computational and disk implications*? 
We don't have control over what users do, and while in an ideal world client 
applications have fairly predictable workloads over time, in practice things 
spike around various events for different applications.

i.e. 1 thing in a queue could produce orders of magnitude more work than 1 
other thing in a queue. Especially as we move into a world with SAI and Accord.

> we need to solve load balancing (summarized by the above three points) before 
> we start working on the rate limiter
I see these 2 things as complementary, not as interdependent. Is there 
something I'm missing?

At least for me, mentally I frame this as "How can nodes protect themselves 
from variable and changing user behavior in a way that's minimally disruptive 
to the user and requires as little configuration as possible for operators?". 
Basically, how do we keep the limits of node performance from leaking into the 
scope of user awareness and responsibility, beyond simply pushing various 
exceptions to the client to indicate what's going on (OverloadedException, 
etc.)?

It seems to me both rate limiting and resource balancing are integral parts of 
this, but also parts that could be worked on independently. Were we to wave a 
magic wand tomorrow and have robust rate limiting, as we improved load 
balancing over time it would raise the ceiling at which rate limiting kicked in.

So concretely to the thread, I think I agree with Jon:
> * use the rate of timeouts to limit the depth of the queues for each of the 
> thread pools
> * reject requests when the queue is full with an OverloadedException.
followed by:
> If you want to follow this up with the ability to dynamically resize thread 
> pools that could be interesting.

Simple is very much a feature here.
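
For the dynamic-resize follow-up, the JDK already allows live resizing of
executor pools. A toy sketch under that assumption (the +/-1 policy is
invented; only the ThreadPoolExecutor accessors are real):

    import java.util.concurrent.ThreadPoolExecutor;

    final class PoolResizer {
        // Shrink the pool when its work is timing out; grow it while healthy.
        // setCorePoolSize takes effect on a live pool; the policy itself is
        // hypothetical, not Cassandra code.
        static void adjust(ThreadPoolExecutor pool, boolean sawTimeouts) {
            int core = pool.getCorePoolSize();
            int next = sawTimeouts
                     ? Math.max(core - 1, 1)
                     : Math.min(core + 1, pool.getMaximumPoolSize());
            pool.setCorePoolSize(next);
        }
    }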

On Sat, Sep 21, 2024, at 5:20 AM, Alex Petrov wrote:

Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-21 Thread Jon Haddad
Can you elaborate on what “the bad” is here? Maybe a scenario would help. I’m
trying to visualize what kind of workload would be running where you wouldn’t
have timeouts or a deep queue, yet a node is overloaded. What is “the bad” if
requests aren’t timing out? How is a node overloaded if there isn’t work
queued up?

The levers we have under our control are concurrent work and the queue of
work. Applying a rate limiter to the queue is a roundabout way of limiting
the queue depth, so I view it as unnecessary in a world where the queue has
a limit.

Dynamically sizing the depth of the queue based on timeouts allows us to
adjust it using a simple heuristic: how much work was completed successfully
in the past. If you know that the queue has seen timeouts when it gets deeper
than 100k requests, you stop accepting requests once 100k is reached. The
rate of rejection will be roughly correlated to the number of requests that
would otherwise time out. Basically we say: we can’t serve you, so don’t even
get in line, and we use the length of the line as our throttle.
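
A rough sketch of that mechanism (names and constants are hypothetical, not
the actual thread-pool code):

    import java.util.concurrent.atomic.AtomicInteger;

    // Learn a queue-depth ceiling from observed timeouts: completions let the
    // ceiling creep back up, timeouts cut it in half, and arrivals beyond the
    // ceiling never get in line (the caller rejects them, e.g. with an
    // OverloadedException).
    final class AdaptiveQueueBound {
        private volatile int maxDepth = 100_000;
        private final AtomicInteger depth = new AtomicInteger();

        boolean tryEnqueue() {
            if (depth.get() >= maxDepth)
                return false;        // don't queue work that is doomed to time out
            depth.incrementAndGet();
            return true;
        }

        void onCompleted() {
            depth.decrementAndGet();
            maxDepth = Math.min(maxDepth + 1, 1_000_000);  // slow additive recovery
        }

        void onTimeout() {
            depth.decrementAndGet();
            maxDepth = Math.max(maxDepth / 2, 1_000);      // back off hard on timeouts
        }
    }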

Fwiw, I fully acknowledge that I didn’t invent any of this; it’s how hardware
works already. I’m just applying the same concepts to our software queues
that CPUs, disks, and network cards already use successfully.

Jon

On Sat, Sep 21, 2024 at 12:05 PM Jordan West wrote:

Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-21 Thread Jon Haddad
Oh, one last thing. If the client drivers were to implement a rate limiter
based on each node's error rate, and had the ability to back off, paired with
CASSANDRA-19534, I think the majority of severe cluster outages that people
experience would simply disappear, making fast rejection of requests at the
server side more of an optimization than a necessity.

There was a nice Netflix post about it here [1], and it was the inspiration
behind my latency target mode in easy-cass-stress.

[1] https://netflixtechblog.medium.com/performance-under-load-3e6fa9a60581
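
A sketch of the driver-side idea in the spirit of that post (entirely
hypothetical; not an actual java-driver API): each node gets its own
concurrency window that grows while requests succeed and shrinks
multiplicatively on errors or timeouts.

    import java.util.concurrent.atomic.AtomicInteger;

    // Hypothetical per-node limiter for a client driver: AIMD-style window
    // driven by each node's observed errors.
    final class PerNodeAdaptiveLimit {
        private volatile double limit = 20;               // current window size
        private final AtomicInteger inFlight = new AtomicInteger();

        boolean tryAcquire() {
            if (inFlight.get() >= (int) limit)
                return false;                             // back off or pick another node
            inFlight.incrementAndGet();
            return true;
        }

        void onSuccess() {
            inFlight.decrementAndGet();
            limit = Math.min(limit + 1.0 / limit, 1024);  // gentle increase
        }

        void onErrorOrTimeout() {                         // e.g. OverloadedException
            inFlight.decrementAndGet();
            limit = Math.max(limit * 0.9, 1);             // multiplicative decrease
        }
    }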

Jon

On Sat, Sep 21, 2024 at 12:38 PM Jon Haddad wrote:

Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-21 Thread Alex Petrov
> Personally, I’m a bit skeptical that we will come up with a metric based 
> heuristic that works well in most scenarios and doesn’t require significant 
> knowledge and tuning. I think past implementations of the dynamic snitch are 
> good evidence of that.

I am more optimistic on that front. I think we can achieve a lot. However, in my 
opinion, we need to focus on balancing the load rather than rate limiting. Rate 
limiting is going to be important if/when we decide to implement workload 
isolation. Until then, I think we should focus on three things:

  * Node health (Nodes should produce useful work and should be stable and not 
overloaded)
  * Latency (we always need to find an optimal way to process requests and 
minimize overall queueing time)
  * Fairness (avoid workload and utilization imbalances)

All three points are achievable with very straightforward approaches that will 
not require much operator involvement.

I guess my main point is we need to solve load balancing (summarized by the 
above three points) before we start working on the rate limiter, but there's a 
good chance we may not need one apart from use cases that require workload 
isolation. 


On Fri, Sep 20, 2024, at 8:14 PM, Jordan West wrote:
> +1 to Benedict’s (and others’) comments on pluggability and low overhead when 
> disabled. The latter I think needs little justification. The reason I am big 
> on the former is, in my opinion: decisions on approach need to be settled 
> with numbers, not anecdotes or past experience (including my own). So I would 
> like to see us compare different approaches (what metrics to use, etc). 
> 
> Personally, I’m a bit skeptical that we will come up with a metric based 
> heuristic that works well in most scenarios and doesn’t require significant 
> knowledge and tuning. I think past implementations of the dynamic snitch are 
> good evidence of that. However, I expressed the same concerns internally for 
> a client level project where we exposed metrics to induce back pressure and 
> early experiments are encouraging / contrary to my expectations. At different 
> layers different approaches can work better or worse. Same with different 
> workloads. I don’t think we should dismiss approaches outright in this 
> thread without hard numbers. 
> 
> In short, I think the testing and evaluation of this CEP is as important as 
> its design and implementation. We will need to test a wide variety of 
> workloads and potentially implementations and that’s where pluggability will 
> be a huge benefit. I would go as far as saying the CEP should focus more on a 
> framework for pluggable implementations that has low to zero cost when 
> disabled than on a specific set of metrics or a specific approach. 
> 
> Jordan 
> 
> On Thu, Sep 19, 2024 at 14:38 Benedict Elliott Smith wrote:
>> I just want to flag here that this is a topic I have strong opinions on, but 
>> the CEP is not really specific or detailed enough to understand precisely 
>> how it will be implemented. So, if a patch is already being produced, most 
>> of my feedback is likely to be provided some time after a patch appears, 
>> through the normal review process. I want to flag this now to avoid any 
>> surprise.
>> 
>> I will say upfront that, ideally, this system should be designed to 
>> have ~zero overhead when disabled, and with minimal coupling (between its 
>> own components and C* itself), so that entirely orthogonal approaches can be 
>> integrated in future without polluting the codebase.
>> 
>> 
>>> On 19 Sep 2024, at 19:14, Patrick McFadin wrote:
>>> 
>>> The work has begun but we don't have a VOTE thread for this CEP. Can one 
>>> get started?
>>> 
>>> On Mon, May 6, 2024 at 9:24 PM Jaydeep Chovatia wrote:
 Sure, Caleb. I will include the work as part of CASSANDRA-19534 in CEP-41.
 
 Jaydeep
 
 On Fri, May 3, 2024 at 7:48 AM Caleb Rackliffe wrote:
> FYI, there is some sort-of-related work going on in CASSANDRA-19534.
> 
> On Wed, Apr 10, 2024 at 6:35 PM Jaydeep Chovatia wrote:
>> Just created an official CEP-41 incorporating the feedback from this 
>> discussion. Feel free to let me know if I may have missed some important 
>> feedback in this thread that is not captured in CEP-41.
>> 
>> Jaydeep
>> 
>> On Thu, Feb 22, 2024 at 11:36 AM Jaydeep Chovatia wrote:
>>> Thanks, Josh. I will file an official CEP with all the details in a few 
>>> days and update this thread with that CEP number.
>>> Thanks a lot everyone for providing valuable insights!
>>> 
>>> Jaydeep
>>> 
>>> On Thu, Feb 22, 2024 at 9:24 AM Josh McKenzie wrote: