CASSANDRA-19776 - question about not returning expired SSTables on CANONICAL / not spinning on metrics

2025-05-26 Thread Štefan Miklošovič
I would like to discuss (1). The problem is that when a metric like
EstimatedPartitionCount is read while a compaction is in progress, the
call sometimes spins until that compaction finishes.

The reason it spins (summarized in (2)) is that when compaction evaluates
an SSTable as expired / to be dropped, that SSTable is not physically
removed until the very end of the compaction. Its SSTable "tidier" is set,
and it will eventually remove the files on disk after the transaction is
finished.

Once nobody references such an SSTable, a call to selectAndReference from
EstimatedPartitionCount will spin: it waits for a reference that can no
longer be acquired, because the SSTable was already "unreferenced", just
not deleted yet. It is in a kind of limbo.
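
To make the limbo state concrete, here is a minimal toy model of it. This
is only a sketch under stated assumptions: ToySSTable and SpinDemo are
hypothetical names, not Cassandra classes, and the retry loop is bounded by
maxAttempts so the example terminates, whereas the behaviour described
above has no such bound.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: hypothetical classes, not Cassandra's implementation.
final class ToySSTable
{
    final String name;
    // Starts at 1: the reference held on behalf of the live set.
    private final AtomicInteger refs = new AtomicInteger(1);

    ToySSTable(String name)
    {
        this.name = name;
    }

    // Succeeds only while at least one reference is still held.
    boolean tryRef()
    {
        while (true)
        {
            int n = refs.get();
            if (n <= 0)
                return false; // already fully released: can never be referenced again
            if (refs.compareAndSet(n, n + 1))
                return true;
        }
    }

    void release()
    {
        refs.decrementAndGet();
    }
}

final class SpinDemo
{
    // Returns the attempt on which every sstable in the view was referenced,
    // or -1 if maxAttempts was exhausted (assumption: bound added for the demo).
    static int selectAndReference(List<ToySSTable> view, int maxAttempts)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            List<ToySSTable> acquired = new ArrayList<>();
            boolean all = true;
            for (ToySSTable sstable : view)
            {
                if (sstable.tryRef())
                {
                    acquired.add(sstable);
                }
                else
                {
                    all = false;
                    break;
                }
            }
            if (all)
                return attempt;
            // Undo the partial acquisition and retry against the same view.
            for (ToySSTable sstable : acquired)
                sstable.release();
        }
        return -1;
    }
}
```

In this model, a view that still contains a fully released SSTable can
never be referenced as a whole, so every retry fails in the same way until
the view itself changes.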

Branimir Lambov suggested that it is probably not a good idea to reference
expired SSTables on CANONICAL (3).

My idea was to do this (4). isMarkedCompacted is:

    public boolean isMarkedCompacted()
    {
        return tidy.global.obsoletion != null;
    }

Here obsoletion is not null when the SSTable is going to be removed from
disk / nobody references it. So, we would filter such SSTables out.
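
As a rough illustration of that filtering idea (a sketch only, not the
actual change in (4)): ToyReader and CanonicalFilter are hypothetical
stand-ins, and the markedCompacted flag stands in for the
"tidy.global.obsoletion != null" check quoted above.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for an sstable reader; not a Cassandra class.
final class ToyReader
{
    final String name;
    // Assumption: models "tidy.global.obsoletion != null".
    private final boolean markedCompacted;

    ToyReader(String name, boolean markedCompacted)
    {
        this.name = name;
        this.markedCompacted = markedCompacted;
    }

    boolean isMarkedCompacted()
    {
        return markedCompacted;
    }
}

final class CanonicalFilter
{
    // Drops readers that are already marked compacted, so a caller never
    // tries to take a reference on an sstable that is only awaiting deletion.
    static List<ToyReader> withoutObsolete(List<ToyReader> canonical)
    {
        return canonical.stream()
                        .filter(reader -> !reader.isMarkedCompacted())
                        .collect(Collectors.toList());
    }
}
```

The trade-off of filtering this way is that a metric computed from the
filtered view no longer counts SSTables that are pending deletion.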

Jaydeepkumar Chovatia suggested that this approach might lead to "serious
repercussions" (5) and we should not touch it and we should do this instead
(6). However, that is not possible, because as Branimir mentioned:

"The selectAndReference call in estimatedPartitionCount was added recently
to fix a race that caused node failures when an sstable disappears while
it's being processed.".

It is worth noting that selectAndReference does not seem to be used
consistently across the metrics. That also raises the question of whether
we should approach this more holistically and cover all cases like this.

Do you also see (4) as risky? I built it for 4.0 and CI (7) seems to pass,
minus one test where we are testing this very CANONICAL functionality.

What are your takes here?

Regards

(1) https://issues.apache.org/jira/browse/CASSANDRA-19776
(2)
https://issues.apache.org/jira/browse/CASSANDRA-19776?focusedCommentId=17950873&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17950873
(3)
https://issues.apache.org/jira/browse/CASSANDRA-19776?focusedCommentId=17950979&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17950979
(4)
https://github.com/apache/cassandra/pull/4156/files#diff-92c8e689de9c33eb580a18eef6d7db02d1fb089183c32c8c8d99344d0964326c
(5)
https://issues.apache.org/jira/browse/CASSANDRA-19776?focusedCommentId=17952394&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17952394
(6)
https://issues.apache.org/jira/browse/CASSANDRA-19776?focusedCommentId=17952747&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17952747
(7)
https://app.circleci.com/pipelines/github/instaclustr/cassandra/5803/workflows/0935b05f-e246-463f-95fc-6dcc3822d611


[DISCUSS][CASSANDRA-20681] Mark JDK 17 as production ready for Cassandra 5.0

2025-05-26 Thread Dmitry Konstantinov
Hi all,

I've created a task to mark JDK 17 as production-ready for Cassandra 5.0 in
our documentation - CASSANDRA-20681


Reasons:

   - Cassandra 5.0.x has had four bugfix releases and is stable (5.0.0 was
   released in September 2024, so it's been out for eight months).
   - I'm not aware of any open Cassandra issues specific to JDK 17.
   - Our CI has been running tests with JDK 17 on every commit for over a
   year.
   - JDK 17 is a mature LTS version. We already have a newer LTS (JDK 21),
   and JDK 24 has already been released.
   - There might be a vicious cycle here: we’re waiting for more user
   feedback, while users are waiting for the feature to be marked as
   non-experimental before adopting it more widely.


Any objections to marking JDK 17 as production-ready for 5.0?

Related threads where the topic about JDK17 status has been raised:

   - https://the-asf.slack.com/archives/CK23JSY2K/p1744313849439569
   - https://the-asf.slack.com/archives/CJZLTM05A/p1746787244618429
   - https://lists.apache.org/thread/np70b8ck21k0ojsjnotg3j9p2rrj29dp
   - https://stackoverflow.com/questions/79563058/java-17-support-for-cassandra-5



-- 
Dmitry Konstantinov


Re: [DISCUSS][CASSANDRA-20681] Mark JDK 17 as production ready for Cassandra 5.0

2025-05-26 Thread Jon Haddad
I have had great results with it. Shenandoah plus off heap trie memtables
can improve throughput by more than 2x when you have lower latency
requirements.

Let's do it.


On Mon, May 26, 2025 at 12:00 PM Dmitry Konstantinov wrote:

> Hi all,
>
> I've created a task to mark JDK 17 as production-ready for Cassandra 5.0
> in our documentation - CASSANDRA-20681
> 
>
> Reasons:
>
>    - Cassandra 5.0.x has had four bugfix releases and is stable (5.0.0
>    was released in September 2024, so it's been out for eight months).
>    - I'm not aware of any open Cassandra issues specific to JDK 17.
>    - Our CI has been running tests with JDK 17 on every commit for over a
>    year.
>    - JDK 17 is a mature LTS version. We already have a newer LTS (JDK
>    21), and JDK 24 has already been released.
>    - There might be a vicious cycle here: we’re waiting for more user
>    feedback, while users are waiting for the feature to be marked as
>    non-experimental before adopting it more widely.
>
>
> Any objections to marking JDK 17 as production-ready for 5.0?
>
> Related threads where the topic about JDK17 status has been raised:
>
>    - https://the-asf.slack.com/archives/CK23JSY2K/p1744313849439569
>    - https://the-asf.slack.com/archives/CJZLTM05A/p1746787244618429
>    - https://lists.apache.org/thread/np70b8ck21k0ojsjnotg3j9p2rrj29dp
>    - https://stackoverflow.com/questions/79563058/java-17-support-for-cassandra-5
>
>
>
> --
> Dmitry Konstantinov
>