[ANNOUNCE] Apache Cassandra 4.0.12 test artifact available

2024-01-17 Thread Štefan Miklošovič
The test build of Cassandra 4.0.12 is available.

sha1: af752fcd535ccdac69b9fed88047b2dd7625801e
Git: https://github.com/apache/cassandra/tree/4.0.12-tentative
Maven Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-1323/org/apache/cassandra/cassandra-all/4.0.12/

The Source and Build Artifacts, and the Debian and RPM packages and
repositories, are available here:
https://dist.apache.org/repos/dist/dev/cassandra/4.0.12/

A vote on this test build will be initiated within the next couple of days.

[1]: CHANGES.txt:
https://github.com/apache/cassandra/blob/4.0.12-tentative/CHANGES.txt
[2]: NEWS.txt:
https://github.com/apache/cassandra/blob/4.0.12-tentative/NEWS.txt


Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-17 Thread German Eichberger via dev
Jaydeep,

I concur with Stefan that extensibility of this framework should be a design goal:

  *   It should be easy to add additional metrics (e.g. write queue depth) and
decision logic
  *   There should be a way to interact with other systems to signal a resource
need, which could then kick off things like scaling

Super interested in this; we have been thinking about similar things
internally 😉

Thanks,
German

From: Jaydeep Chovatia 
Sent: Tuesday, January 16, 2024 1:16 PM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

Hi Stefan,

Please find my response below:
1) Currently, I am keeping the signals behind an interface, so one can override
them with a different implementation. Point noted, though, that even the
interface APIs could be made dynamic, so one can define both the APIs and their
implementation if they wish to override them.
2) I've not looked into that yet, but I will look into it and see whether it can
be easily integrated into the Guardrails framework.
3) On the server side, when the framework detects that a node is overloaded, it
will throw OverloadedException back to the client, because if a node that is
already busy continues to serve additional requests, it will slow down its peer
nodes as well, due to dependencies such as meeting QUORUM. This way we at least
prevent server nodes from melting down and give control back to the client via
OverloadedException. It is then up to the client policy: if the client retries
immediately on a different server node, that node might eventually be impacted
too, but if the client does exponential backoff or throws the exception back to
the application, then that server node will not be impacted.
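
To make the pluggable-signals idea in (1) concrete, a minimal sketch of such an
interface might look like the following. The names (OverloadSignal,
CompositeOverloadDetector) are hypothetical and not taken from the proposal or
the Cassandra codebase.

    // Hypothetical sketch of a pluggable overload-signal interface; names are
    // illustrative only, not from the proposal or the Cassandra codebase.
    interface OverloadSignal
    {
        /** Short name for logging/metrics, e.g. "cpu" or "native-tp-queue". */
        String name();

        /** Returns true when this signal considers the node under pressure. */
        boolean isOverloaded();
    }

    // Example composite detector: the node is considered overloaded only when
    // every configured signal agrees, mirroring the "combination of signals" idea.
    final class CompositeOverloadDetector
    {
        private final java.util.List<OverloadSignal> signals;

        CompositeOverloadDetector(java.util.List<OverloadSignal> signals)
        {
            this.signals = java.util.List.copyOf(signals);
        }

        boolean shouldShedTraffic()
        {
            return signals.stream().allMatch(OverloadSignal::isOverloaded);
        }
    }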


Jaydeep

On Tue, Jan 16, 2024 at 10:03 AM Štefan Miklošovič
<stefan.mikloso...@gmail.com> wrote:
Hi Jaydeep,

That seems quite interesting. A couple of points, though:

1) It would be nice if there were a way to "subscribe" to the decisions your
detection framework comes up with. Integration with e.g. the diagnostics
subsystem would be beneficial. This should be pluggable - just code up an
interface to dump / react to the decisions however I want. This might also act
as a notifier to other systems, e-mail, Slack channels ...

2) Have you tried to incorporate this with the Guardrails framework? I think
that if something is detected to be throttled or rejected (e.g. writing to a
table), there might be a guardrail which would be triggered dynamically at
runtime. Guardrails are useful as such, but here we might reuse them so we do
not need to code it twice.

3) I am curious how complex this detection framework would be; it could get
complicated pretty fast, I guess. What would be desirable is to act on it in
such a way that you do not put that node under even more pressure. In other
words, the detection system should avoid any "doom loop" whereby merely
throttling various parts of Cassandra makes things even worse for the other
nodes in the cluster. For example, if a particular node starts to be
overwhelmed, you detect this, and requests start to be rejected, is it not
possible that the Java driver would start to see this node as "erroneous" due
to delayed response times etc. and would start to prefer other nodes in the
cluster when deciding which node to contact for query coordination? You would
then put more load on the other nodes, making them more susceptible to being
throttled as well ...
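
As a rough illustration of the pluggable "subscribe to decisions" idea in point
1 above, a decision-listener hook might look like the sketch below. The names
(RateLimitDecision, DecisionListener) are purely hypothetical and not an
existing Cassandra API.

    // Hypothetical sketch of a pluggable "decision subscriber"; names are
    // illustrative only, not an existing Cassandra API.
    final class RateLimitDecision
    {
        final String reason;
        final double shedFraction;

        RateLimitDecision(String reason, double shedFraction)
        {
            this.reason = reason;
            this.shedFraction = shedFraction;
        }
    }

    interface DecisionListener
    {
        void onDecision(RateLimitDecision decision);
    }

    // One possible implementation: log every decision, from where it could be
    // picked up by diagnostics, e-mail, or Slack integrations.
    final class LoggingDecisionListener implements DecisionListener
    {
        private static final org.slf4j.Logger logger =
                org.slf4j.LoggerFactory.getLogger(LoggingDecisionListener.class);

        @Override
        public void onDecision(RateLimitDecision decision)
        {
            logger.info("Rate limiter decision: reason={}, shedFraction={}",
                        decision.reason, decision.shedFraction);
        }
    }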

Regards

Stefan Miklosovic

On Tue, Jan 16, 2024 at 6:41 PM Jaydeep Chovatia
<chovatia.jayd...@gmail.com> wrote:
Hi,

Happy New Year!

I would like to discuss the following idea:

Open-source Cassandra (CASSANDRA-15013) has an elementary built-in memory rate
limiter based on the incoming payload of user requests. This rate limiter
activates if any incoming user request's payload exceeds certain thresholds.
However, the existing rate limiter only solves limited-scope issues.
Cassandra's server-side meltdown due to overload is a known problem; often we
see a couple of busy nodes take down the entire Cassandra ring due to the
ripple effect. The following document proposes a generic-purpose, comprehensive
rate limiter that makes its decisions based on system signals, such as CPU, and
internal signals, such as thread pool health. The rate limiter will have knobs
to filter out internal traffic, system traffic, and replication traffic, and,
furthermore, to filter by query type.
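
As a purely illustrative sketch of the "knobs" idea (the names and structure
below are assumptions, not part of the proposal), per-category filtering could
look roughly like this:

    // Hypothetical sketch: rate limiting toggled per traffic category.
    enum TrafficCategory
    {
        USER_READ, USER_WRITE, INTERNAL, SYSTEM, REPLICATION
    }

    final class RateLimiterKnobs
    {
        private final java.util.Set<TrafficCategory> exemptCategories;

        RateLimiterKnobs(java.util.Set<TrafficCategory> exemptCategories)
        {
            this.exemptCategories = java.util.Set.copyOf(exemptCategories);
        }

        /** Traffic in an exempt category bypasses the rate limiter entirely. */
        boolean applies(TrafficCategory category)
        {
            return !exemptCategories.contains(category);
        }
    }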

More design details are in this doc: [OSS] Cassandra Generic Purpose Rate
Limiter - Google Docs

Please let me know your thoughts.

Jaydeep


Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-17 Thread Jaydeep Chovatia
Jon,

The major challenge with latency-based rate limiters is that latency is
subjective from one workload to another. As a result, the idea in the proposal
I have described is to make the decision based on a combination of the
following:

   1. System parameters (such as CPU usage, etc.)
   2. Cassandra thread pool health (are they dropping requests, etc.)

If both of these signal trouble, the server is considered under pressure. Once
it is under pressure, traffic is shed progressively, from less aggressive to
more aggressive, etc. The idea is to prevent the Cassandra server from melting
down (starting with the above two signals and adding more based on the
learnings).
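
A rough sketch of that decision flow, purely for illustration (the thresholds,
signal sources, and shedding tiers below are assumptions, not part of the
proposal):

    // Illustrative sketch of "detect pressure, then shed progressively".
    // Thresholds, signal sources, and tier ordering are assumptions only.
    final class PressureBasedShedder
    {
        enum Tier { NONE, SHED_BACKGROUND, SHED_LOW_PRIORITY, SHED_ALL_NON_CRITICAL }

        private final java.util.function.DoubleSupplier cpuUsage; // 0.0 - 1.0
        private final java.util.function.BooleanSupplier threadPoolsDroppingRequests;

        PressureBasedShedder(java.util.function.DoubleSupplier cpuUsage,
                             java.util.function.BooleanSupplier threadPoolsDroppingRequests)
        {
            this.cpuUsage = cpuUsage;
            this.threadPoolsDroppingRequests = threadPoolsDroppingRequests;
        }

        /** Under pressure only when BOTH signals agree, per the two signals above. */
        boolean underPressure()
        {
            return cpuUsage.getAsDouble() > 0.85 && threadPoolsDroppingRequests.getAsBoolean();
        }

        /** Escalate shedding the further CPU climbs while under pressure. */
        Tier currentTier()
        {
            if (!underPressure())
                return Tier.NONE;
            double cpu = cpuUsage.getAsDouble();
            if (cpu > 0.95)
                return Tier.SHED_ALL_NON_CRITICAL;
            if (cpu > 0.90)
                return Tier.SHED_LOW_PRIORITY;
            return Tier.SHED_BACKGROUND;
        }
    }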

Scott,

Yes, I did look at some of those implementations; they are all great systems
and help quite a lot, but they still do not rely on system health, etc., and
they are not in the generic coordinator/replication read/write path. The idea
here is along similar lines to the existing implementations, but made a bit
more generic, trying to cover as many paths as possible.

German,

Sure, let's first continue the discussion here. If it turns out that there is
no widespread interest in the idea, then we can talk 1:1 and see how we can
help each other on a private fork, etc.

Jaydeep


RE: [DISCUSS] CASSANDRASC-92: Adding S3 dependencies to Cassandra Sidecar

2024-01-17 Thread Saranya Krishnakumar
Hi,

Since there is no objection from the community, we are proceeding to merge
this patch.

Best,
Saranya Krishnakumar

On 2024/01/12 18:40:24 Saranya Krishnakumar wrote:
> Hi,
>
> We would like to add the following dependencies to the Sidecar:
>
> - software.amazon.awssdk:bom:2.20.43
> - software.amazon.awssdk:s3
> - software.amazon.awssdk:netty-nio-client
> - org.lz4:lz4-java:1.8.0
>
> Through this JIRA,
> https://issues.apache.org/jira/browse/CASSANDRASC-92?filter=-1, we would
> like to introduce a "restore SSTables from S3 into Cassandra" feature to the
> Sidecar. Briefly, users of the Sidecar can place a restore request; the
> Sidecar, while processing the request, will download SSTables from S3 and
> import them into Cassandra using *SSTableImporter*. As part of this feature,
> we have added endpoints for creating and updating restore jobs, as well as
> background tasks for completing requests. The above-mentioned dependencies
> are needed for the Sidecar's communication with S3.
>
> Best,
> Saranya Krishnakumar
>
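
To illustrate what these dependencies would be used for, a minimal download
sketch with the AWS SDK v2 S3 async client and the Netty NIO HTTP client could
look like the following. The bucket, key, region, and target path are
placeholders; this is not the Sidecar's actual implementation.

    import java.nio.file.Path;
    import java.util.concurrent.CompletableFuture;

    import software.amazon.awssdk.core.async.AsyncResponseTransformer;
    import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient;
    import software.amazon.awssdk.regions.Region;
    import software.amazon.awssdk.services.s3.S3AsyncClient;
    import software.amazon.awssdk.services.s3.model.GetObjectRequest;
    import software.amazon.awssdk.services.s3.model.GetObjectResponse;

    // Minimal sketch: download one SSTable component from S3 asynchronously.
    // Bucket, key, region, and paths are placeholders, not Sidecar code.
    public final class S3SSTableDownloadExample
    {
        public static void main(String[] args)
        {
            try (S3AsyncClient s3 = S3AsyncClient.builder()
                                                 .region(Region.US_WEST_2)
                                                 .httpClientBuilder(NettyNioAsyncHttpClient.builder())
                                                 .build())
            {
                GetObjectRequest request = GetObjectRequest.builder()
                                                           .bucket("my-backup-bucket")
                                                           .key("ks/tbl/nb-1-big-Data.db")
                                                           .build();

                CompletableFuture<GetObjectResponse> download =
                        s3.getObject(request, AsyncResponseTransformer.toFile(Path.of("/tmp/nb-1-big-Data.db")));

                // Block only for the example; the Sidecar would handle this asynchronously.
                download.join();
            }
        }
    }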