Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Miklosovic, Stefan
Hi,

we need to be on the same page here and this is crucial to get right.

We found that what Corretto provides is a subset of what is in the SunJCE 
provider (bundled in the JRE). It is not true that Corretto is just "a drop-in replacement". That 
means that if somebody is on 4.0 and they upgrade to 5.0, if they use some 
ciphers / protocols / algorithms which are not in Corretto, it might break 
their upgrade.

I asked the Corretto team here (1) and they confirmed that it is truly a subset 
of what is in JCE and that the diff is relatively large. There is also an 
enumeration of all services in Corretto and in the default provider, so we can 
see the difference.
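For anyone who wants to reproduce such an enumeration locally, here is a minimal sketch. It only uses the standard java.security API and lists whatever providers are installed in the running JVM; the class and method names (ProviderServices, servicesOf, missingFrom) are made up for illustration and are not part of Cassandra or ACCP.

```java
import java.security.Provider;
import java.security.Security;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Cassandra or ACCP.
final class ProviderServices {

    /** Returns "TYPE.ALGORITHM" for every service the given provider registers. */
    static Set<String> servicesOf(Provider provider) {
        Set<String> names = new TreeSet<>();
        for (Provider.Service service : provider.getServices()) {
            names.add(service.getType() + "." + service.getAlgorithm());
        }
        return names;
    }

    /** Returns the services present in 'base' but absent from 'candidate'. */
    static Set<String> missingFrom(Set<String> candidate, Set<String> base) {
        Set<String> diff = new TreeSet<>(base);
        diff.removeAll(candidate);
        return diff;
    }

    public static void main(String[] args) {
        // Prints one line per installed provider, in preference order.
        for (Provider provider : Security.getProviders()) {
            System.out.println(provider.getName() + ": "
                               + servicesOf(provider).size() + " services");
        }
    }
}
```

With ACCP installed, passing its service set as 'candidate' and the SunJCE set as 'base' to missingFrom would list what SunJCE offers that ACCP does not.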

On the other hand, they say that services which are considered "weak" are not 
there, so by moving to Corretto we are actually making Cassandra safer. But as I 
mentioned, the cost is that we will drop support for everything else and we 
might break things.

So, with all this information we have two choices:

1) to make Corretto default and make it opt-out
2) to not make Corretto default and make it opt-in

Jordan's opinion is added as the last comment in (2)

What is the preference of the community? We need to be sure we are aligned here.

(1) https://github.com/corretto/amazon-corretto-crypto-provider/issues/315
(2) https://issues.apache.org/jira/browse/CASSANDRA-18624


From: Miklosovic, Stefan 
Sent: Friday, July 21, 2023 18:17
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Using ACCP or tc-native by default

We gave it a second look and I came up with this (1).

In a nutshell, we download the libs for both architectures to libs/corretto, 
and cassandra.in.sh then dynamically resolves the architecture and OS. Based on 
that, it adds the respective jar to the classpath. If something goes wrong and 
the jar is not added to the classpath, we just skip the installation / 
healthchecks as if nothing happened (by default).
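The actual resolution happens in shell, inside cassandra.in.sh. Purely to illustrate the mapping logic, here is a sketch in Java. It assumes only the two Linux classifiers mentioned in this thread (linux-x86_64 and linux-aarch_64); the class name AccpClassifier and the list of accepted arch spellings are my assumptions, not what the PR does.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical illustration only; the PR resolves this in cassandra.in.sh.
final class AccpClassifier {

    /**
     * Maps OS name and architecture strings (as in the "os.name" and "os.arch"
     * system properties) to a jar classifier, or empty if no jar applies.
     */
    static Optional<String> classifierFor(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (!os.contains("linux")) {
            return Optional.empty(); // assumption: only Linux jars are bundled
        }
        switch (arch) {
            case "amd64":
            case "x86_64":
                return Optional.of("linux-x86_64");
            case "aarch64":
            case "arm64":
                return Optional.of("linux-aarch_64");
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<String> classifier = classifierFor(System.getProperty("os.name"),
                                                    System.getProperty("os.arch"));
        // An empty result corresponds to "skip the installation" above.
        System.out.println(classifier.orElse("no matching jar, skipping"));
    }
}
```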

We are also adding the dependency to Maven's pom.xml based on the architecture 
the build is invoked on, so it is possible to create an architecture-specific 
artifact. This is achieved by Maven profiles which are activated based on the 
architecture the build runs on.

Hence, we covered both aspects, Maven build / dependencies as well as runtime 
library resolution.

There is also a new flag, "fail_on_missing_provider", which is false by 
default. If set to true, startup fails when the provider is not on the 
classpath or when we installed the jar for a different architecture by mistake.
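A rough sketch of the behaviour this flag describes, under stated assumptions: the provider class is loaded reflectively and is assumed to have a public no-arg constructor, and the names ProviderInstaller and install are hypothetical. This is not the code from the PR, and it only covers the "class not loadable" case; a real check would also have to verify that the native part of the provider is functional.

```java
import java.security.Provider;
import java.security.Security;

// Hypothetical sketch of a "fail_on_missing_provider" style switch.
final class ProviderInstaller {

    /**
     * Tries to install the named provider as the most preferred one.
     * Returns true if the class could be loaded and instantiated, false if
     * it could not and failOnMissing is false; throws if failOnMissing is true.
     */
    static boolean install(String providerClassName, boolean failOnMissing) {
        try {
            Provider provider = (Provider) Class.forName(providerClassName)
                                                .getDeclaredConstructor()
                                                .newInstance();
            // Position 1 is the highest preference in the JCA provider list.
            Security.insertProviderAt(provider, 1);
            return true;
        } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
            if (failOnMissing) {
                throw new IllegalStateException(
                    "Crypto provider " + providerClassName + " could not be installed", e);
            }
            return false; // fall back silently to the JRE defaults
        }
    }

    public static void main(String[] args) {
        boolean installed = install(
            "com.amazon.corretto.crypto.provider.AmazonCorrettoCryptoProvider", false);
        System.out.println(installed ? "ACCP installed first"
                                     : "ACCP not available, using JRE defaults");
    }
}
```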

We could definitely use some review here, especially from people who run on ARM 
so we are sure that it works there as well as intended.

(1) https://github.com/apache/cassandra/pull/2505/files


From: Mick Semb Wever 
Sent: Friday, July 21, 2023 7:18
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Using ACCP or tc-native by default

As I am on x86 and I wanted to simulate what would happen to users on ARM, I 
just did it other way around - I introduced the dependency with classifier 
linux-aarch_64.

…
Surprisingly, the installation step succeeded on x86 even though the dependency 
was for aarch. However, the startup check went to the else branch (2) and I saw that 
the provider was not Corretto provider but the default - SunJCE. So that tells 
me that it basically falls back to the default which is what we want.


I raised concerns about this because we have no other dependencies that use the 
classifier in the pom file to bind us to a particular arch.  The loading of the 
native code isn't my concern.

I'm uneasy (without further investigation) with publishing cassandra pom files 
that classify us to "x86_64".  I would want to know, for example, how the jar 
files differ between classifiers for this project.

I'm also curious if there's a way to bundle the native files for all arch, like 
we do for other libraries, with runtime just loading what's correct.




Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Mick Semb Wever
> That means that if somebody is on 4.0 and they upgrade to 5.0, if they use
> some ciphers / protocols / algorithms which are not in Corretto, it might
> break their upgrade.
>



If there's any risk of breaking upgrades we have to go with (2).  We
support a variety of JCE configurations, and I don't see that we have the test
coverage in place to de-risk it other than going with (2).

Once the yaml configuration is in place we can then change the default in
the next major version 6.0.


Re: Status Update on CEP-7 Storage Attached Indexes (SAI)

2023-07-26 Thread Jeremy Hanna
Thanks Caleb and Mike and Zhao and Andres and Piotr and everyone else
involved with the SAI implementation!

> On Jul 25, 2023, at 3:01 PM, Caleb Rackliffe wrote:
>
> Just a quick update...
>
> With CASSANDRA-18670 complete, and all remaining items in the category of
> performance optimizations and further testing, the process of merging to
> trunk will likely start today, beginning with a final rebase on the
> current trunk and J11 and J17 test runs.
>
> On Tue, Jul 18, 2023 at 3:47 PM Caleb Rackliffe wrote:
>
>> Hello there!
>>
>> After much toil, the first phase of CEP-7 is nearing completion (see
>> CASSANDRA-16052). There are presently two issues to resolve before we'd
>> like to merge the cep-7-sai feature branch and all its goodness to trunk:
>>
>> CASSANDRA-18670 - Importer should build SSTable indexes successfully
>> before making new SSTables readable (in review)
>>
>> CASSANDRA-18673 - Reduce size of per-SSTable index components (in
>> progress)
>>
>> (We've been getting clean CircleCI runs for a while now, and have been
>> using the multiplexer to sniff out as much flakiness as possible up front.)
>>
>> Once merged to trunk, the next steps are:
>>
>> 1.) Finish a Harry model that we can use to further fuzz test SAI before
>> 5.0 releases (see CASSANDRA-18275). We've done a fair amount of
>> fuzz/randomized testing at the component level, but I'd still consider
>> Harry (at least around single-partition query use-cases) a critical item
>> for us to have confidence before release.
>>
>> 2.) Start pursuing Phase 2 items as time and our needs allow. (see
>> CASSANDRA-18473)
>>
>> A reminder, SAI is a secondary index, and therefore is by definition an
>> opt-in feature, and has no explicit "feature flag". However, its
>> availability to users is still subject to the secondary_indexes_enabled
>> guardrail, which currently defaults to allowing creation.
>>
>> Any thoughts, questions, or comments on the pre-merge plan here?



Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread J. D. Jordan
I thought the crypto providers were supposed to “ask the next one down the 
line” if something is not supported?  Have you tried some unsupported thing and 
seen it break?  My understanding of the providers being an ordered list was 
that this isn’t supposed to happen.

-Jeremiah



Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread C. Scott Andreas
Jeremiah, that’s my understanding as well. ACCP accelerates a subset of 
functions and delegates the rest.

In years of using ACCP with Cassandra, I have yet to see an issue - or any case 
in which adopting ACCP was anything other than a strict benefit.

- Scott



Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Peter George
PLEASE REMOVE ME FROM THIS EMAIL







Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Miklosovic, Stefan
Yes, you are right. I know the providers have their preference and we are 
installing Corretto as the first one.

So if a service is not there, the lookup just moves on to the next provider. I 
completely forgot this aspect of it ... Folks from Corretto forgot to mention 
this behavior as well, interesting. It is not as if we are going to use this 
_as the only provider_.

In that case I think we can set it as default.

We just need to be careful not to use e.g. Cipher.getInstance("algorithm", 
"provider") with the provider being "AmazonCorrettoCryptoProvider" or anything 
like that. In other words, as long as we are not specifying a concrete provider 
to get an instance from, we should be safe. I looked over the codebase and we 
are not doing that anywhere.





Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Mick Semb Wever
>
> So if a service is not there it will just search where it is next. I
> completely forgot this aspect of it ... Folks from Corretto forgot to
> mention this behavior as well, interesting. It is not as we are going to
> use this _as the only provider_.
>


I'm still uncomfortable assuming upgrades work without having the
appropriate tests in place.  That's the crux for me.  Existing JCE tests
(with and without accp) should cover this?


Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread C. Scott Andreas
Can you say more about the shape of your concern?

JCA/JCE conformance and correctness of the functions implemented are a
responsibility of the ACCP/Corretto test suite (link). These are thoroughly
exercised by Amazon and bundled into the Corretto JDK distribution Amazon
ships as well.

With regard to Cassandra, the hash and cryptographic functions utilized in
ACCP are also thoroughly exercised by Cassandra’s unit and in-JVM dtest
suite.

I wouldn’t propose fragmenting our build into a matrix of JDK x arch x
ACCP/no, in the same way that we wouldn’t for tcnative vs. not.

- Scott

> On Jul 26, 2023, at 6:48 AM, Mick Semb Wever wrote:
>
>> So if a service is not there it will just search where it is next. I
>> completely forgot this aspect of it ... Folks from Corretto forgot to
>> mention this behavior as well, interesting. It is not as we are going to
>> use this _as the only provider_.
>
> I'm still uncomfortable assuming upgrades work without having the
> appropriate tests in place.  That's the crux for me.  Existing JCE tests
> (with and without accp) should cover this?


Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread C. Scott Andreas

Peter, thanks for your message.

You are receiving these emails because your address is subscribed to the
Apache Cassandra "dev@" developer mailing list. You can unsubscribe from
this list by sending an email to dev-unsubscr...@cassandra.apache.org.
Subscribers to the mailing list are not able to take this action on others'
behalf.

More information on the project's mailing lists and how to join/leave them
is here: https://cassandra.apache.org/_/community.html

Cheers,

– Scott

> On Jul 26, 2023, at 7:11 AM, C. Scott Andreas wrote:
>
> Can you say more about the shape of your concern?
>
> JCA/JCE conformance and correctness of the functions implemented are a
> responsibility of the ACCP/Corretto test suite (link). These are
> thoroughly exercised by Amazon and bundled into the Corretto JDK
> distribution Amazon ships as well.
>
> With regard to Cassandra, the hash and cryptographic functions utilized in
> ACCP are also thoroughly exercised by Cassandra’s unit and in-JVM dtest
> suite.
>
> I wouldn’t propose fragmenting our build into a matrix of JDK x arch x
> ACCP/no, in the same way that we wouldn’t for tcnative vs. not.
>
> - Scott
>
>> On Jul 26, 2023, at 6:48 AM, Mick Semb Wever wrote:
>>
>>> So if a service is not there it will just search where it is next. I
>>> completely forgot this aspect of it ... Folks from Corretto forgot to
>>> mention this behavior as well, interesting. It is not as we are going to
>>> use this _as the only provider_.
>>
>> I'm still uncomfortable assuming upgrades work without having the
>> appropriate tests in place.  That's the crux for me.  Existing JCE tests
>> (with and without accp) should cover this?

Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Jordan West
I left my comments on the JIRA itself but generally they mirror Scott's and
Joey's thoughts.

Jordan



Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Mick Semb Wever
> Can you say more about the shape of your concern?
>


Integration testing where some nodes are running JCE and others accp, and
various configurations that are and are not accp compatible/native.

I'm not referring to (re-) unit testing accp or jce themselves, or matrix
testing over them, but our commitment to always-on upgrades against all
possible configurations that integrate.  We have a history of config changes
breaking upgrades, simple as they are.


Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Jordan West
We do and I’m sensitive to that 100% but there is no reason ACCP should
break upgrades afaik. The algorithms it implements are identical and for
the ones it doesn’t the JRE implementation is used — ACCP is the higher
priority implementation. Do we have any examples of it breaking anything?
Or that it’s problematic?

We recently did a 4.1 upgrade that was mixed JRE / ACCP and it worked fine.
It’s how we figured out ACCP was missing because 4.1 was noticeably slower
(graph in JIRA) and the JRE crypto library dominated the flamegraph (can
try to dig up a screenshot maybe).

Jordan



Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Miklosovic, Stefan
Am I understanding correctly that the tests you are talking about are only 
required if we make ACCP the default provider?

I can live with not making it the default and still delivering it if tests are 
not required. I do not think this kind of test was required a couple of mails 
ago when opt-in was on the table.

While I tend to agree with people here who consider testing this scenario an 
unnecessary exercise, I am afraid I will not be able to deliver that, as 
testing something like this is quite a complicated matter. There are a lot of 
aspects which could be tested, more than I can enumerate right now ... so I am 
trying to meet you somewhere in the middle.




Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread J. D. Jordan
Enabling ssl for the upgrade dtests would cover this use case. If those don’t 
currently exist I see no reason it won’t work so I would be fine for someone to 
figure it out post merge if there is a concern.  What JCE provider you use 
should have no upgrade concerns.

-Jeremiah



Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Mick Semb Wever
What comes to mind is how we brought down people's clusters and made sstables
unreadable with the introduction of the chunk_length configuration in 1.0.
It wasn't about how tested the compression libraries were, but about the
new configuration itself.  Introducing silent defaults has more surface
area for bugs than introducing explicit defaults that only apply to new
clusters and are therefore opt-in for existing clusters.





Re: Status Update on CEP-7 Storage Attached Indexes (SAI)

2023-07-26 Thread Caleb Rackliffe
Alright, the cep-7-sai branch is now merged to trunk!

Now we move to addressing the most urgent items from "Phase 2"
(CASSANDRA-18473) before (and in the case of some testing after) the 5.0
freeze...



Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Miklosovic, Stefan
We can make it opt-in, wait one major to see what bugs pop up, and eventually 
make it opt-out. We do not need to hurry with this. I understand everybody's 
expectations and excitement but it really boils down to a one-line change in 
yaml. People who are after performance will definitely be aware of this knob 
and turn it on to squeeze out even more perf ...

I will look at the dtests Jeremiah mentioned, but I would just move on and make 
it opt-in if we are not 100% persuaded about it _yet_.




Re: Status Update on CEP-7 Storage Attached Indexes (SAI)

2023-07-26 Thread J. D. Jordan
Thanks for all the work here!

> On Jul 26, 2023, at 1:57 PM, Caleb Rackliffe wrote:
>
> Alright, the cep-7-sai branch is now merged to trunk!
>
> Now we move to addressing the most urgent items from "Phase 2"
> (CASSANDRA-18473) before (and in the case of some testing after) the 5.0
> freeze...
>
> On Wed, Jul 26, 2023 at 6:07 AM Jeremy Hanna wrote:
>
>> Thanks Caleb and Mike and Zhao and Andres and Piotr and everyone else
>> involved with the SAI implementation!
>>
>> On Jul 25, 2023, at 3:01 PM, Caleb Rackliffe wrote:
>>
>> Just a quick update...
>>
>> With CASSANDRA-18670 complete, and all remaining items in the category of
>> performance optimizations and further testing, the process of merging to
>> trunk will likely start today, beginning with a final rebase on the
>> current trunk and J11 and J17 test runs.
>>
>> On Tue, Jul 18, 2023 at 3:47 PM Caleb Rackliffe wrote:
>>
>>> Hello there!
>>>
>>> After much toil, the first phase of CEP-7 is nearing completion (see
>>> CASSANDRA-16052). There are presently two issues to resolve before we'd
>>> like to merge the cep-7-sai feature branch and all its goodness to trunk:
>>>
>>> CASSANDRA-18670 - Importer should build SSTable indexes successfully
>>> before making new SSTables readable (in review)
>>>
>>> CASSANDRA-18673 - Reduce size of per-SSTable index components (in
>>> progress)
>>>
>>> (We've been getting clean CircleCI runs for a while now, and have been
>>> using the multiplexer to sniff out as much flakiness as possible up
>>> front.)
>>>
>>> Once merged to trunk, the next steps are:
>>>
>>> 1.) Finish a Harry model that we can use to further fuzz test SAI before
>>> 5.0 releases (see CASSANDRA-18275). We've done a fair amount of
>>> fuzz/randomized testing at the component level, but I'd still consider
>>> Harry (at least around single-partition query use-cases) a critical item
>>> for us to have confidence before release.
>>>
>>> 2.) Start pursuing Phase 2 items as time and our needs allow. (see
>>> CASSANDRA-18473)
>>>
>>> A reminder, SAI is a secondary index, and therefore is by definition an
>>> opt-in feature, and has no explicit "feature flag". However, its
>>> availability to users is still subject to the secondary_indexes_enabled
>>> guardrail, which currently defaults to allowing creation.
>>>
>>> Any thoughts, questions, or comments on the pre-merge plan here?




Re: Status Update on CEP-7 Storage Attached Indexes (SAI)

2023-07-26 Thread Ekaterina Dimitrova
Thanks Caleb!
Great job everyone! 🚀👏🏻

On Wed, 26 Jul 2023 at 15:07, J. D. Jordan 
wrote:

> Thanks for all the work here!
>
> On Jul 26, 2023, at 1:57 PM, Caleb Rackliffe 
> wrote:
>
> 
>
> Alright, the cep-7-sai branch is now merged to trunk!
>
> Now we move to addressing the most urgent items from "Phase 2"
> (CASSANDRA-18473) before (and in the case of some testing after) the 5.0
> freeze...
>
> On Wed, Jul 26, 2023 at 6:07 AM Jeremy Hanna 
> wrote:
>
>> Thanks Caleb and Mike and Zhao and Andres and Piotr and everyone else
>> involved with the SAI implementation!
>>
>> On Jul 25, 2023, at 3:01 PM, Caleb Rackliffe 
>> wrote:
>>
>> 
>> Just a quick update...
>>
>> With CASSANDRA-18670 complete, and all remaining items in the category of
>> performance optimizations and further testing, the process of merging to
>> trunk will likely start today, beginning with a final rebase on the current
>> trunk and J11 and J17 test runs.
>>
>> On Tue, Jul 18, 2023 at 3:47 PM Caleb Rackliffe 
>> wrote:
>>
>>> Hello there!
>>>
>>> After much toil, the first phase of CEP-7 is nearing completion (see
>>> CASSANDRA-16052). There are presently two issues to resolve before we'd
>>> like to merge the cep-7-sai feature branch and all its goodness to trunk:
>>>
>>> CASSANDRA-18670 - Importer should build SSTable indexes successfully before
>>> making new SSTables readable (in review)
>>>
>>> CASSANDRA-18673 - Reduce size of per-SSTable index components (in progress)
>>>
>>> (We've been getting clean CircleCI runs for a while now, and have been
>>> using the multiplexer to sniff out as much flakiness as possible up front.)
>>>
>>> Once merged to trunk, the next steps are:
>>>
>>> 1.) Finish a Harry model that we can use to further fuzz test SAI before
>>> 5.0 releases (see CASSANDRA-18275). We've done a fair amount of
>>> fuzz/randomized testing at the component level, but I'd still consider
>>> Harry (at least around single-partition query use-cases) a critical item
>>> for us to have confidence before release.
>>>
>>> 2.) Start pursuing Phase 2 items as time and our needs allow. (see
>>> CASSANDRA-18473)
>>>
>>> A reminder, SAI is a secondary index, and therefore is by definition an
>>> opt-in feature, and has no explicit "feature flag". However, its
>>> availability to users is still subject to the secondary_indexes_enabled
>>> guardrail, which currently defaults to allowing creation.
>>>
>>> Any thoughts, questions, or comments on the pre-merge plan here?
>>>
>>


Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread C. Scott Andreas

I think these concerns are well-intended, but they feel rooted in uncertainty
rather than in factual examples of areas where risk is present. I would
appreciate elaboration on the specific areas of risk that folks imagine.

I would encourage those who express skepticism to try the patch, and I endorse
Ayushi's proposal to enable it by default.

– Scott

On Jul 26, 2023, at 12:03 PM, "Miklosovic, Stefan" wrote:

> We can make it opt-in, wait one major to see what bugs pop up and we might
> do that opt-out eventually. We do not need to hurry up with this. I
> understand everybody's expectations and excitement but it really boils down
> to one line change in yaml. People who are so much after the performance
> will be definitely aware of this knob to turn on to squeeze even more perf
> ...
>
> I look around dtests Jeremiah mentioned but I would just moved on and make
> it opt-in if we are not 100% persuaded about it _yet_.
>
> From: Mick Semb Wever
> Sent: Wednesday, July 26, 2023 20:48
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] Using ACCP or tc-native by default
>
> What comes to mind is how we brought down people clusters and made
> sstables unreadable with the introduction of the chunk_length configuration
> in 1.0. It wasn't about how tested the compression libraries were, but
> about the new configuration itself. Introducing silent defaults has more
> surface area for bugs than introducing explicit defaults that only apply to
> new clusters and are so opt-in for existing clusters.
>
> On Wed, 26 Jul 2023 at 20:13, J. D. Jordan <jeremiah.jor...@gmail.com> wrote:
>
> Enabling ssl for the upgrade dtests would cover this use case. If those
> don’t currently exist I see no reason it won’t work so I would be fine for
> someone to figure it out post merge if there is a concern. What JCE
> provider you use should have no upgrade concerns.
>
> -Jeremiah
>
> On Jul 26, 2023, at 1:07 PM, Miklosovic, Stefan
> <stefan.mikloso...@netapp.com> wrote:
>
> Am I understanding it correctly that tests you are talking about are only
> required in case we make ACCP to be default provider?
>
> I can live with not making it default and still deliver it if tests are
> not required. I do not think that these kind of tests were required couple
> mails ago when opt-in was on the table.
>
> While I tend to agree with people here who seem to consider testing this
> scenario to be unnecessary exercise, I am afraid that I will not be able to
> deliver that as testing something like this is quite complicated matter.
> There is a lot of aspects which could be tested I can not even enumerate
> right now ... so I try to meet you somewhere in the middle.
>
> From: Mick Semb Wever <m...@apache.org>
> Sent: Wednesday, July 26, 2023 17:34
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] Using ACCP or tc-native by default
>
> Can you say more about the shape of your concern?
>
> Integration testing where some nodes are running JCE and others accp, and
> various configurations that are and are not accp compatible/native.
>
> I'm not referring to (re-) unit testing accp or jce themselves, or matrix
> testing over them, but our commitment to always-on upgrades against all
> possible configurations that integrate. We've history with config changes
> breaking upgrades, for as simple as they are.

Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Jordan West
+1 Scott. And agreed all involved are looking out for the best interests of
C* users. And I appreciate those with concerns contributing to addressing
them.

I’m all for making upgrades smooth bc I do them so often. A huge portion of
our 4.1 qualification is “will it break on upgrade”? Because of that I’m
confident in this patch and concerned about many other areas. I think it’s
commendable to want to reach a point where teams have the trust in the
community to have done that for them but that starts w better test coverage
and concrete evidence.

Given all that, I think we should move forward w Ayushi’s proposal to make
it on by default.

Jordan

On Wed, Jul 26, 2023 at 12:14 C. Scott Andreas  wrote:

> I think these concerns are well-intended, but they feel rooted in
> uncertainty rather than in factual examples of areas where risk is present.
> I would appreciate elaboration on the specific areas of risk that folks
> imagine.
>
> I would encourage those who express skepticism to try the patch, and I
> endorse Ayushi's proposal to enable it by default.
>
>
> – Scott
>
> On Jul 26, 2023, at 12:03 PM, "Miklosovic, Stefan" <
> stefan.mikloso...@netapp.com> wrote:
>
>
> We can make it opt-in, wait one major to see what bugs pop up and we might
> do that opt-out eventually. We do not need to hurry up with this. I
> understand everybody's expectations and excitement but it really boils down
> to one line change in yaml. People who are so much after the performance
> will be definitely aware of this knob to turn on to squeeze even more perf
> ...
>
> I look around dtests Jeremiah mentioned but I would just moved on and make
> it opt-in if we are not 100% persuaded about it _yet_.
>
> 
> From: Mick Semb Wever 
> Sent: Wednesday, July 26, 2023 20:48
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] Using ACCP or tc-native by default
>
>
>
>
> What comes to mind is how we brought down people clusters and made
> sstables unreadable with the introduction of the chunk_length configuration
> in 1.0. It wasn't about how tested the compression libraries were, but
> about the new configuration itself. Introducing silent defaults has more
> surface area for bugs than introducing explicit defaults that only apply to
> new clusters and are so opt-in for existing clusters.
>
>
>
> On Wed, 26 Jul 2023 at 20:13, J. D. Jordan wrote:
> Enabling ssl for the upgrade dtests would cover this use case. If those
> don’t currently exist I see no reason it won’t work so I would be fine for
> someone to figure it out post merge if there is a concern. What JCE
> provider you use should have no upgrade concerns.
>
> -Jeremiah
>
> On Jul 26, 2023, at 1:07 PM, Miklosovic, Stefan <
> stefan.mikloso...@netapp.com> wrote:
>
> Am I understanding it correctly that tests you are talking about are only
> required in case we make ACCP to be default provider?
>
> I can live with not making it default and still deliver it if tests are
> not required. I do not think that these kind of tests were required couple
> mails ago when opt-in was on the table.
>
> While I tend to agree with people here who seem to consider testing this
> scenario to be unnecessary exercise, I am afraid that I will not be able to
> deliver that as testing something like this is quite complicated matter.
> There is a lot of aspects which could be tested I can not even enumerate
> right now ... so I try to meet you somewhere in the middle.
>
> 
> From: Mick Semb Wever <m...@apache.org>
> Sent: Wednesday, July 26, 2023 17:34
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] Using ACCP or tc-native by default
>
>
>
>
>
> Can you say more about the shape of your concern?
>
>
> Integration testing where some nodes are running JCE and others accp, and
> various configurations that are and are not accp compatible/native.
>
> I'm not referring to (re-) unit testing accp or jce themselves, or matrix
> testing over them, but our commitment to always-on upgrades against all
> possible configurations that integrate. We've history with config changes
> breaking upgrades, for as simple as they are.
>
>
>
>


Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Jeremiah Jordan
I had a discussion with Mick on slack.  His concern is not with enabling
ACCP.  His concern is around the testing of the new C* yaml config code
which is included in the patch that is used to decide if ACCP should be
enabled or not, and if startup should fail if it can’t be enabled.

I agree.  We should make sure that the new C* yaml config code is solid
before we commit this patch, especially when it has the possibility of
causing node startup to fail on purpose.  But that should be a discussion for
the ticket I think, not for this thread.

So I think we are back to the original question.  Should ACCP be used by
default in trunk?  From what I have seen I do not see anyone who is against
that?

-Jeremiah


On Jul 26, 2023 at 2:53:02 PM, Jordan West  wrote:

> +1 Scott. And agreed all involved are looking out for the best interests
> of C* users. And I appreciate those with concerns contributing to
> addressing them.
>
> I’m all for making upgrades smooth bc I do them so often. A huge portion
> of our 4.1 qualification is “will it break on upgrade”? Because of that I’m
> confident in this patch and concerned about many other areas. I think it’s
> commendable to want to reach a point where teams have the trust in the
> community to have done that for them but that starts w better test coverage
> and concrete evidence.
>
> Given all that, I think we should move forward w Ayushi’s proposal to make
> it on by default.
>
> Jordan
>
> On Wed, Jul 26, 2023 at 12:14 C. Scott Andreas 
> wrote:
>
>> I think these concerns are well-intended, but they feel rooted in
>> uncertainty rather than in factual examples of areas where risk is present.
>> I would appreciate elaboration on the specific areas of risk that folks
>> imagine.
>>
>> I would encourage those who express skepticism to try the patch, and I
>> endorse Ayushi's proposal to enable it by default.
>>
>>
>> – Scott
>>
>> On Jul 26, 2023, at 12:03 PM, "Miklosovic, Stefan" <
>> stefan.mikloso...@netapp.com> wrote:
>>
>>
>> We can make it opt-in, wait one major to see what bugs pop up and we
>> might do that opt-out eventually. We do not need to hurry up with this. I
>> understand everybody's expectations and excitement but it really boils down
>> to one line change in yaml. People who are so much after the performance
>> will be definitely aware of this knob to turn on to squeeze even more perf
>> ...
>>
>> I look around dtests Jeremiah mentioned but I would just moved on and
>> make it opt-in if we are not 100% persuaded about it _yet_.
>>
>> 
>> From: Mick Semb Wever 
>> Sent: Wednesday, July 26, 2023 20:48
>> To: dev@cassandra.apache.org
>> Subject: Re: [DISCUSS] Using ACCP or tc-native by default
>>
>>
>>
>>
>> What comes to mind is how we brought down people clusters and made
>> sstables unreadable with the introduction of the chunk_length configuration
>> in 1.0. It wasn't about how tested the compression libraries were, but
>> about the new configuration itself. Introducing silent defaults has more
>> surface area for bugs than introducing explicit defaults that only apply to
>> new clusters and are so opt-in for existing clusters.
>>
>>
>>
>> On Wed, 26 Jul 2023 at 20:13, J. D. Jordan wrote:
>> Enabling ssl for the upgrade dtests would cover this use case. If those
>> don’t currently exist I see no reason it won’t work so I would be fine for
>> someone to figure it out post merge if there is a concern. What JCE
>> provider you use should have no upgrade concerns.
>>
>> -Jeremiah
>>
>> On Jul 26, 2023, at 1:07 PM, Miklosovic, Stefan <
>> stefan.mikloso...@netapp.com> wrote:
>>
>> Am I understanding it correctly that tests you are talking about are
>> only required in case we make ACCP to be default provider?
>>
>> I can live with not making it default and still deliver it if tests are
>> not required. I do not think that these kind of tests were required couple
>> mails ago when opt-in was on the table.
>>
>> While I tend to agree with people here who seem to consider testing this
>> scenario to be unnecessary exercise, I am afraid that I will not be able to
>> deliver that as testing something like this is quite complicated matter.
>> There is a lot of aspects which could be tested I can not even enumerate
>> right now ... so I try to meet you somewhere in the middle.
>>
>> 
>> From: Mick Semb Wever <m...@apache.org>
>> Sent: Wednesday, July 26, 2023 17:34
>> To: dev@cassandra.apache.org
>> Subject: Re: [DISCUSS] Using ACCP or tc-native by default
>>

Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Jordan West
It sounds like some of the concerns have shifted then. I would like to
better understand the YAML one. Like Jeremiah said it may be a better topic
for the ticket. Would appreciate an example exception or error people are
concerned about.

If the issue is the “fail fast” on start I’m sure we can find a solution
everyone accepts and move forward.

If we are agreed “on by default” is the way to go that’s awesome!

Jordan

On Wed, Jul 26, 2023 at 12:59 Jeremiah Jordan 
wrote:

> I had a discussion with Mick on slack.  His concern is not with enabling
> ACCP.  His concern is around the testing of the new C* yaml config code
> which is included in the patch that is used to decide if ACCP should be
> enabled or not, and if startup should fail if it can’t be enabled.
>
> I agree.  We should make sure that the new C* yaml config code is solid
> before we commit this patch, especially when it has the possibility of
> causing node startup to fail on purpose.  But that should be a discussion for
> the ticket I think, not for this thread.
>
> So I think we are back to the original question.  Should ACCP be used by
> default in trunk.  From what I have seen I do not see anyone who is against
> that?
>
> -Jeremiah
>
>
> On Jul 26, 2023 at 2:53:02 PM, Jordan West  wrote:
>
>> +1 Scott. And agreed all involved are looking out for the best interests
>> of C* users. And I appreciate those with concerns contributing to
>> addressing them.
>>
>> I’m all for making upgrades smooth bc I do them so often. A huge portion
>> of our 4.1 qualification is “will it break on upgrade”? Because of that I’m
>> confident in this patch and concerned about many other areas. I think it’s
>> commendable to want to reach a point where teams have the trust in the
>> community to have done that for them but that starts w better test coverage
>> and concrete evidence.
>>
>> Given all that, I think we should move forward w Ayushi’s proposal to
>> make it on by default.
>>
>> Jordan
>>
>> On Wed, Jul 26, 2023 at 12:14 C. Scott Andreas 
>> wrote:
>>
>>> I think these concerns are well-intended, but they feel rooted in
>>> uncertainty rather than in factual examples of areas where risk is present.
>>> I would appreciate elaboration on the specific areas of risk that folks
>>> imagine.
>>>
>>> I would encourage those who express skepticism to try the patch, and I
>>> endorse Ayushi's proposal to enable it by default.
>>>
>>>
>>> – Scott
>>>
>>> On Jul 26, 2023, at 12:03 PM, "Miklosovic, Stefan" <
>>> stefan.mikloso...@netapp.com> wrote:
>>>
>>>
>>> We can make it opt-in, wait one major to see what bugs pop up and we
>>> might do that opt-out eventually. We do not need to hurry up with this. I
>>> understand everybody's expectations and excitement but it really boils down
>>> to one line change in yaml. People who are so much after the performance
>>> will be definitely aware of this knob to turn on to squeeze even more perf
>>> ...
>>>
>>> I look around dtests Jeremiah mentioned but I would just moved on and
>>> make it opt-in if we are not 100% persuaded about it _yet_.
>>>
>>> 
>>> From: Mick Semb Wever 
>>> Sent: Wednesday, July 26, 2023 20:48
>>> To: dev@cassandra.apache.org
>>> Subject: Re: [DISCUSS] Using ACCP or tc-native by default
>>>
>>>
>>>
>>>
>>> What comes to mind is how we brought down people clusters and made
>>> sstables unreadable with the introduction of the chunk_length configuration
>>> in 1.0. It wasn't about how tested the compression libraries were, but
>>> about the new configuration itself. Introducing silent defaults has more
>>> surface area for bugs than introducing explicit defaults that only apply to
>>> new clusters and are so opt-in for existing clusters.
>>>
>>>
>>>
>>> On Wed, 26 Jul 2023 at 20:13, J. D. Jordan wrote:
>>> Enabling ssl for the upgrade dtests would cover this use case. If those
>>> don’t currently exist I see no reason it won’t work so I would be fine for
>>> someone to figure it out post merge if there is a concern. What JCE
>>> provider you use should have no upgrade concerns.
>>>
>>> -Jeremiah
>>>
>>> On Jul 26, 2023, at 1:07 PM, Miklosovic, Stefan <
>>> stefan.mikloso...@netapp.com>
>>> wrote:
>>>
>>> Am I understanding it correctly that tests you are talking about are
>>> only required in case we make ACCP to be default provider?
>>>
>>> I can live with not making it default and still deliver it if tests are
>>> not required. I do not think that these kind of tests were required couple
>>> mails ago when opt-in was on the table.
>>>
>>> While I tend to agree with people here who seem to consider testing this
>>> scenario to be unnecessary exercise, I am afraid that I will not be able to
>>> deliver that as testing somethi

Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Josh McKenzie
+1 to the "on by default" camp.

> What comes to mind is how we brought down people clusters and made sstables 
> unreadable with the introduction of the chunk_length configuration in 1.0
I think a key difference here is that changing chunk length is something that 
materially changes behavior and expectations w/a coupled system, whereas 
switching crypto providers has the much smaller failure mode of "the 
implementations aren't binary compatible even though they're supposed to be, 
and are very heavily tested TO be".

Totally agree that a "surprise! it didn't load so now your nodes won't start" 
approach would be a Very Bad Experience for users. Falling back from ACCP and 
squawking about the lack might actually be nice to help folks where it doesn't 
load / work / etc know to look into it. It really makes a material difference.
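
The two startup behaviours being weighed here ("fail fast" versus "fall back and squawk") can be sketched roughly as below. This is only an illustration of the decision logic under stated assumptions, not the code in the patch: the names `ProviderConfig`, `enabled`, `fail_on_missing` and `install_provider` are hypothetical, and the real yaml options are whatever the ticket defines.

```python
import logging
from dataclasses import dataclass
from typing import Callable

log = logging.getLogger("crypto_provider_sketch")


@dataclass
class ProviderConfig:
    # Hypothetical knobs for illustration only; not the actual yaml options.
    enabled: bool = True           # "on by default"
    fail_on_missing: bool = False  # fail fast at startup vs. fall back


def install_provider(config: ProviderConfig, try_install: Callable[[], None]) -> str:
    """Return which provider ends up active: "accelerated" or "default"."""
    if not config.enabled:
        return "default"
    try:
        try_install()
        return "accelerated"
    except Exception as exc:
        if config.fail_on_missing:
            # The operator explicitly asked for fail-fast behaviour.
            raise RuntimeError("crypto provider could not be installed") from exc
        # Otherwise keep the node starting, but make the fallback visible.
        log.warning("Crypto provider not installed (%s); using the default provider", exc)
        return "default"
```

With `fail_on_missing` left at `False`, a node whose native library does not load still starts and logs a warning, which is the "falling back and squawking" behaviour; setting it to `True` gives the fail-on-purpose behaviour that the config-testing concern is about.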

On Wed, Jul 26, 2023, at 4:02 PM, Jordan West wrote:
> It sounds like some of the concerns have shifted then. I would like to better 
> understand the YAML one. Like Jeremiah said it may be a better topic for the 
> ticket. Would appreciate an example exception or error people are concerned 
> about. 
> 
> If the issue is the “fail fast” on start I’m sure we can find a solution 
> everyone accepts and move forward. 
> 
> If we are agreed “on by default” is the way to go that’s awesome! 
> 
> Jordan 
> 
> On Wed, Jul 26, 2023 at 12:59 Jeremiah Jordan  
> wrote:
>> I had a discussion with Mick on slack.  His concern is not with enabling 
>> ACCP.  His concern is around the testing of the new C* yaml config code 
>> which is included in the patch that is used to decide if ACCP should be 
>> enabled or not, and if startup should fail if it can’t be enabled.
>> 
>> I agree.  We should make sure that the new C* yaml config code is solid 
>> before we commit this patch, especially when it has the possibility of causing 
>> node startup to fail on purpose.  But that should be a discussion for the 
>> ticket I think, not for this thread.
>> 
>> So I think we are back to the original question.  Should ACCP be used by 
>> default in trunk.  From what I have seen I do not see anyone who is against 
>> that?
>> 
>> -Jeremiah
>> 
>> 
>> On Jul 26, 2023 at 2:53:02 PM, Jordan West  wrote:
>>> +1 Scott. And agreed all involved are looking out for the best interests of 
>>> C* users. And I appreciate those with concerns contributing to addressing 
>>> them. 
>>> 
>>> I’m all for making upgrades smooth bc I do them so often. A huge portion of 
>>> our 4.1 qualification is “will it break on upgrade”? Because of that I’m 
>>> confident in this patch and concerned about many other areas. I think it’s 
>>> commendable to want to reach a point where teams have the trust in the 
>>> community to have done that for them but that starts w better test coverage 
>>> and concrete evidence. 
>>> 
>>> Given all that, I think we should move forward w Ayushi’s proposal to make 
>>> it on by default. 
>>> 
>>> Jordan 
>>> 
>>> On Wed, Jul 26, 2023 at 12:14 C. Scott Andreas  wrote:
>>>> I think these concerns are well-intended, but they feel rooted in 
>>>> uncertainty rather than in factual examples of areas where risk is 
>>>> present. I would appreciate elaboration on the specific areas of risk that 
>>>> folks imagine.
>>>> 
>>>> I would encourage those who express skepticism to try the patch, and I 
>>>> endorse Ayushi's proposal to enable it by default.
>>>> 
>>>> 
>>>> – Scott
>>>> 
>>>>> On Jul 26, 2023, at 12:03 PM, "Miklosovic, Stefan" wrote:
>>>>> 
>>>>> 
>>>>> We can make it opt-in, wait one major to see what bugs pop up and we 
>>>>> might do that opt-out eventually. We do not need to hurry up with this. I 
>>>>> understand everybody's expectations and excitement but it really boils 
>>>>> down to one line change in yaml. People who are so much after the 
>>>>> performance will be definitely aware of this knob to turn on to squeeze 
>>>>> even more perf ...
>>>>> 
>>>>> I look around dtests Jeremiah mentioned but I would just moved on and 
>>>>> make it opt-in if we are not 100% persuaded about it _yet_.
>>>>> 
>>>>> 
>>>>> From: Mick Semb Wever 
>>>>> Sent: Wednesday, July 26, 2023 20:48
>>>>> To: dev@cassandra.apache.org
>>>>> Subject: Re: [DISCUSS] Using ACCP or tc-native by default
>>>>> 
>>>>> 
>>>>> What comes to mind is how we brought down people clusters and made 
>>>>> sstables unreadable with the introduction of the chunk_length 
>>>>> configuration in 1.0. It wasn't about how tested the compression 
>>>>> libraries were, but about the new configuration itself. Introducing 
>>>>> silent defaults has more surface area for bugs than introducing explicit 
>>>>> defaults that only apply to new clusters and are so opt-in for existing 
>>>>> clusters.
>>>>> 

Re: [Discuss] Repair inside C*

2023-07-26 Thread David Capwell
+0 to sidecar, in order to make that work well we need to expose state that the 
node has so the sidecar can make good calls, if it runs in the node then 
nothing has to be exposed.  One thing to flesh out is where do the “smarts” 
live?  If the range has too many partitions, which system knows to subdivide 
the range and sequence the repairs (else you OOM)?  “Should” repair itself be 
better and take all input and make sure it works correctly, so the caller just 
worries about scheduling?  “Should” the scheduler understand limitations with 
repair and work around them?
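
To make the "who subdivides the range" question concrete, here is a minimal sketch of the subdivision step itself, wherever it ends up living (repair, an in-node scheduler, or the sidecar). It is not existing Cassandra or sidecar code: `split_range` and its parameters are hypothetical, it assumes partitions are spread evenly across the token range, and it ignores wrap-around ranges.

```python
import math
from typing import List, Tuple


def split_range(start: int, end: int, estimated_partitions: int,
                max_partitions_per_repair: int) -> List[Tuple[int, int]]:
    """Split the token range (start, end] into contiguous subranges so that
    each subrange is expected to hold at most max_partitions_per_repair
    partitions, assuming an even spread of partitions over tokens."""
    if end <= start:
        raise ValueError("expected start < end (wrap-around is not handled in this sketch)")
    if max_partitions_per_repair <= 0:
        raise ValueError("max_partitions_per_repair must be positive")
    width = end - start
    pieces = max(1, math.ceil(estimated_partitions / max_partitions_per_repair))
    pieces = min(pieces, width)  # cannot go below one token per subrange
    bounds = [start + (width * i) // pieces for i in range(pieces + 1)]
    # Adjacent bounds form the subranges, to be repaired one after another.
    return list(zip(bounds[:-1], bounds[1:]))
```

Whichever component owns this logic also has to own the partition estimate that feeds it, which is the state the node would need to expose if the caller lives in a sidecar.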

> On Jul 25, 2023, at 11:26 AM, Jeremiah Jordan  
> wrote:
> 
> +1 for the side car being the right location.
> 
> -Jeremiah
> 
> On Jul 25, 2023 at 1:16:14 PM, Chris Lohfink wrote:
>> I think a CEP is the next step. Considering the number of companies 
>> involved, this might necessitate several drafts and rounds of discussions. I 
>> appreciate your initiative in starting this process, and I'm eager to 
>> contribute to the ensuing discussions. Maybe in a google docs or something 
>> initially for more interactive feedback?
>> 
>> In regards to https://issues.apache.org/jira/browse/CASSANDRA-14346 we at 
>> Netflix are actually putting effort currently to move this into the sidecar 
>> as the idea was to start moving non-read/write path things into different 
>> process and jvms to not impact each other.
>> 
>> I think the sidecar/in process discussion might be a bit contentious as I 
>> know even things like compaction some feel should be moved out of process in 
>> future. On a personal note, my primary interest lies in seeing the 
>> implementation realized, so I am willing to support whatever consensus 
>> emerges. Whichever direction these go we will help with the implementation.
>> 
>> Chris
>> 
>> On Tue, Jul 25, 2023 at 1:09 PM Jaydeep Chovatia wrote:
>>> Sounds good, German. Feel free to let me know if you need my help in filing 
>>> CEP, adding supporting content to the CEP, etc.
>>> As I mentioned previously, I have already been working (going through an 
>>> internal review) on creating a one-pager doc, code, etc., that has been 
>>> working for us for the last six years at an immense scale, and I will share 
>>> it soon on a private fork.
>>> 
>>> Thanks,
>>> Jaydeep
>>> 
>>> On Tue, Jul 25, 2023 at 9:48 AM German Eichberger via dev 
>>> <dev@cassandra.apache.org> wrote:
>>>> In [2] we suggested that the next step should be a CEP.
>>>> 
>>>> I am happy to lend a hand to this effort as well.
>>>> 
>>>> Thanks Jaydeep and David - really appreciated.
>>>> 
>>>> German
>>>> 
>>>> From: David Capwell <dcapw...@apple.com>
>>>> Sent: Tuesday, July 25, 2023 8:32 AM
>>>> To: dev <dev@cassandra.apache.org>
>>>> Cc: German Eichberger
>>>> Subject: [EXTERNAL] Re: [Discuss] Repair inside C*
>>>> 
>>>> As someone who has done a lot of work trying to make repair stable, I 
>>>> approve of this message ^_^
>>>> 
>>>> More than glad to help mentor this work
>>>> 
>>>> On Jul 24, 2023, at 6:29 PM, Jaydeep Chovatia wrote:
>>>> 
>>>> To clarify the repair solution timing, the one we have listed in the 
>>>> article is not the recently developed one. We were hitting some 
>>>> high-priority production challenges back in early 2018, and to address 
>>>> that, we developed and rolled out the solution in production in just a few 
>>>> months. The timing-wise, the solution was developed and productized by Q3 
>>>> 2018, of course, continued to evolve thereafter. Usually, we explore the 
>>>> existing solutions we can leverage, but when we started our journey in 
>>>> early 2018, most of the solutions were based on sidecar solutions. There 
>>>> is nothing against the sidecar solution; it was just a pure business 
>>>> decision, and in that, we wanted to avoid the sidecar to avoid a 
>>>> dependency on the control plane. Every solution developed has its deep 
>>>> context, merits, and pros and cons; they are all great solutions! 
>>>> 
>>>> An appeal to the community members is to think one more time about having 
>>>> repairs in the Open Source Cassandra itself. As mentioned in my previous 
>>>> email, any solution getting adopted is fine; the important aspect is to 
>>>> have a repair solution in the OSS Cassandra itself!
>>>> 
>>>> Yours Faithfully,
>>>> Jaydeep
>>>> 
>>>> On Mon, Jul 24, 2023 at 3:46 PM Jaydeep Chovatia 
>>>> <chovatia.jayd...@gmail.com> wrote:
>>>> Hi German,
>>>> 
>>>> The goal is always to backport our learnings back to the community. For 
>>>> example, I have already successfully backported the following two 
>>>> enhancements/bug fixes back to the Open Source Cassandra, which are 
>>>> described in the article. I am already currently working on open-source a 
>>>> few more enhancements mentioned in the article back to the open-source.
>>>> h

Re: [Discuss] Repair inside C*

2023-07-26 Thread Jon Haddad
I'm 100% in favor of repair being part of the core DB, not the sidecar.  The 
current (and past) state of things where running the DB correctly *requires* 
running a separate process (either community maintained or official C* sidecar) 
is incredibly painful for folks.  The idea that your data integrity needs to be 
opt-in has never made sense to me from the perspective of either the product or 
the end user.

I've worked with way too many teams that have either configured this 
incorrectly or not at all.  

Ideally Cassandra would ship with repair built in and on by default.  Power 
users can disable if they want to continue to maintain their own repair tooling 
for some reason. 

Jon

On 2023/07/24 20:44:14 German Eichberger via dev wrote:
> All,
> 
> We had a brief discussion in [2] about the Uber article [1] where they talk 
> about having integrated repair into Cassandra and how great that is. I 
> expressed my disappointment that they didn't work with the community on that 
> (Uber, if you are listening time to make amends 🙂) and it turns out Joey 
> already had the idea and wrote the code [3] - so I wanted to start a 
> discussion to gauge interest and maybe how to revive that effort.
> 
> Thanks,
> German
> 
> [1] 
> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
> [2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346
> 


August 5.0 Freeze (with waivers…) and a 5.0-alpha1

2023-07-26 Thread Mick Semb Wever
The previous thread¹ on when to freeze 5.0 landed on freezing the first
week of August, with a waiver in place for TCM and Accord to land later
(but before October).

With JDK8 now dropped and SAI and UCS merged, the only expected 5.0 work
that hasn't landed is Vector search (CEP-30).

Are there any objections to a waiver on Vector search?  All the groundwork
(SAI and the vector type) has been merged, with all remaining work expected
to land in August.

I'm keen to freeze and see us shift gears – there's already SO MUCH in 5.0
and a long list of flakies.  It takes time and patience to triage and
identify the bugs that hit us before GA.  The freeze is about being "mostly
feature complete",  so we have room for things before our first beta
(precedent is to ask).   If we hope for a GA by December, and account for the
6 weeks turnaround time for cutting and voting on one alpha, one beta, and
one rc release, and the quiet period that August is, we really only have
September and October left.

I already feel this is asking a bit of a miracle from us given how 4.1 went
(and I'm hoping I will be proven wrong).

In addition, are there any objections to cutting a 5.0-alpha1 release as
soon as we freeze?

This is on the understanding vector, tcm and accord will become available
in later alphas.  Originally the discussion¹ was waiting for Accord for
alpha1, but a number of folk off-list have requested earlier alphas to help
with testing.


¹) https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3


Re: August 5.0 Freeze (with waivers…) and a 5.0-alpha1

2023-07-26 Thread J. D. Jordan
I think this plan seems reasonable to me. +1

-Jeremiah

> On Jul 26, 2023, at 5:28 PM, Mick Semb Wever  wrote:
> 
> 
> 
> The previous thread¹ on when to freeze 5.0 landed on freezing the first week 
> of August, with a waiver in place for TCM and Accord to land later (but 
> before October).
> 
> With JDK8 now dropped and SAI and UCS merged, the only expected 5.0 work that 
> hasn't landed is Vector search (CEP-30).  
> 
> Are there any objections to a waiver on Vector search?  All the groundwork 
> (SAI and the vector type) has been merged, with all remaining work expected to 
> land in August.
> 
> I'm keen to freeze and see us shift gears – there's already SO MUCH in 5.0 
> and a long list of flakies.  It takes time and patience to triage and 
> identify the bugs that hit us before GA.  The freeze is about being "mostly 
> feature complete",  so we have room for things before our first beta 
> (precedent is to ask).   If we hope for a GA by December, and account for the 6 
> weeks turnaround time for cutting and voting on one alpha, one beta, and one 
> rc release, and the quiet period that August is, we really only have 
> September and October left.  
> 
> I already feel this is asking a bit of a miracle from us given how 4.1 went 
> (and I'm hoping I will be proven wrong). 
> 
> In addition, are there any objections to cutting a 5.0-alpha1 release as 
> soon as we freeze?  
> 
> This is on the understanding vector, tcm and accord will become available in 
> later alphas.  Originally the discussion¹ was waiting for Accord for alpha1, 
> but a number of folk off-list have requested earlier alphas to help with 
> testing.
> 
> 
> ¹) https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3


Re: [Discuss] Repair inside C*

2023-07-26 Thread Dinesh Joshi
I concur, repair is an intrinsic part of the database and belongs inside it. We 
can certainly expose a REST control plane API via the sidecar for triggering it 
on demand, scheduling, etc.

That said, there are various implementations of repair scheduling and 
orchestration that a lot of organizations maintain in their proprietary 
sidecars. It would be beneficial in the interim to consolidate on a common 
solution in the sidecar. Eventually we need a version of repair in the database 
that just works without the need of any operator intervention.


> On Jul 26, 2023, at 3:25 PM, Jon Haddad  wrote:
> 
> I'm 100% in favor of repair being part of the core DB, not the sidecar.  The 
> current (and past) state of things where running the DB correctly *requires* 
> running a separate process (either community maintained or official C* 
> sidecar) is incredibly painful for folks.  The idea that your data integrity 
> needs to be opt-in has never made sense to me from the perspective of either 
> the product or the end user.
> 
> I've worked with way too many teams that have either configured this 
> incorrectly or not at all.  
> 
> Ideally Cassandra would ship with repair built in and on by default.  Power 
> users can disable if they want to continue to maintain their own repair 
> tooling for some reason. 
> 
> Jon
> 
> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
>> All,
>> 
>> We had a brief discussion in [2] about the Uber article [1] where they talk 
>> about having integrated repair into Cassandra and how great that is. I 
>> expressed my disappointment that they didn't work with the community on that 
>> (Uber, if you are listening, time to make amends 🙂) and it turns out Joey 
>> already had the idea and wrote the code [3] - so I wanted to start a 
>> discussion to gauge interest and maybe how to revive that effort.
>> 
>> Thanks,
>> German
>> 
>> [1] 
>> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
>> [2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
>> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346
>> 



Re: [Discuss] Repair inside C*

2023-07-26 Thread C. Scott Andreas
I agree that it would be ideal for Cassandra to have a repair scheduler in-DB.

That said I would happily support an effort to bring repair scheduling to the 
sidecar immediately. This has nothing blocking it, and would potentially enable 
the sidecar to provide an official repair scheduling solution that is 
compatible with current or even previous versions of the database.

Once TCM has landed, we’ll have much stronger primitives for repair 
orchestration in the database itself. But I don’t think that should block 
progress on a repair scheduling solution in the sidecar, and there is nothing 
that would prevent someone from continuing to use a sidecar-based solution in 
perpetuity if they preferred.

- Scott

> On Jul 26, 2023, at 3:25 PM, Jon Haddad  wrote:
> 
> I'm 100% in favor of repair being part of the core DB, not the sidecar.  The 
> current (and past) state of things where running the DB correctly *requires* 
> running a separate process (either community maintained or official C* 
> sidecar) is incredibly painful for folks.  The idea that your data integrity 
> needs to be opt-in has never made sense to me from the perspective of either 
> the product or the end user.
> 
> I've worked with way too many teams that have either configured this 
> incorrectly or not at all.  
> 
> Ideally Cassandra would ship with repair built in and on by default.  Power 
> users can disable if they want to continue to maintain their own repair 
> tooling for some reason.
> 
> Jon
> 
>> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
>> All,
>> We had a brief discussion in [2] about the Uber article [1] where they talk 
>> about having integrated repair into Cassandra and how great that is. I 
>> expressed my disappointment that they didn't work with the community on that 
>> (Uber, if you are listening, time to make amends 🙂) and it turns out Joey 
>> already had the idea and wrote the code [3] - so I wanted to start a 
>> discussion to gauge interest and maybe how to revive that effort.
>> Thanks,
>> German
>> [1] 
>> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
>> [2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
>> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346


Re: August 5.0 Freeze (with waivers…) and a 5.0-alpha1

2023-07-26 Thread Dinesh Joshi
Mick,

This sounds like a good plan. CEP-33 and 34 are ready to go. We're running into 
CI related issues but once they clear up we'll merge them. I anticipate we'll 
be done in a week's time.

Thanks,

Dinesh

> On Jul 26, 2023, at 3:27 PM, Mick Semb Wever  wrote:
> 
> 
> The previous thread¹ on when to freeze 5.0 landed on freezing the first week 
> of August, with a waiver in place for TCM and Accord to land later (but 
> before October).
> 
> With JDK8 now dropped and SAI and UCS merged, the only expected 5.0 work that 
> hasn't landed is Vector search (CEP-30).  
> 
> Are there any objections to a waiver on Vector search?  All the groundwork 
> (SAI and the vector type) has been merged, with all remaining work expected to 
> land in August.
> 
> I'm keen to freeze and see us shift gears – there's already SO MUCH in 5.0 
> and a long list of flakies.  It takes time and patience to triage and 
> identify the bugs that hit us before GA.  The freeze is about being "mostly 
> feature complete",  so we have room for things before our first beta 
> (precedent is to ask).   If we hope for a GA by December, and account for the 6 
> weeks turnaround time for cutting and voting on one alpha, one beta, and one 
> rc release, and the quiet period that August is, we really only have 
> September and October left.  
> 
> I already feel this is asking a bit of a miracle from us given how 4.1 went 
> (and I'm hoping I will be proven wrong). 
> 
> In addition, are there any objections to cutting a 5.0-alpha1 release as 
> soon as we freeze?  
> 
> This is on the understanding vector, tcm and accord will become available in 
> later alphas.  Originally the discussion¹ was waiting for Accord for alpha1, 
> but a number of folk off-list have requested earlier alphas to help with 
> testing.
> 
> 
> ¹) https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3



Re: Status Update on CEP-7 Storage Attached Indexes (SAI)

2023-07-26 Thread Berenguer Blasi

Nice one!

On 26/7/23 21:11, Ekaterina Dimitrova wrote:

Thanks Caleb!
Great  job everyone! 🚀👏🏻

On Wed, 26 Jul 2023 at 15:07, J. D. Jordan wrote:


Thanks for all the work here!


On Jul 26, 2023, at 1:57 PM, Caleb Rackliffe wrote:


Alright, the cep-7-sai branch is now merged to trunk!

Now we move to addressing the most urgent items from "Phase 2"
(CASSANDRA-18473) before (and in the case of some testing after) the
5.0 freeze...

On Wed, Jul 26, 2023 at 6:07 AM Jeremy Hanna wrote:

Thanks Caleb and Mike and Zhao and Andres and Piotr and
everyone else involved with the SAI implementation!


On Jul 25, 2023, at 3:01 PM, Caleb Rackliffe wrote:


Just a quick update...

With CASSANDRA-18670 complete,
and all remaining items in the category of performance
optimizations and further testing, the process of merging to
trunk will likely start today, beginning with a final rebase
on the current trunk and J11 and J17 test runs.

On Tue, Jul 18, 2023 at 3:47 PM Caleb Rackliffe wrote:

Hello there!

After much toil, the first phase of CEP-7 is nearing
completion (see CASSANDRA-16052).
There are presently two issues to resolve before we'd
like to merge the cep-7-sai feature branch and all its
goodness to trunk:

CASSANDRA-18670 - Importer should build SSTable indexes successfully
before making new SSTables readable (in review)

CASSANDRA-18673 - Reduce size of per-SSTable index components (in progress)

(We've been getting clean CircleCI runs for a while now,
and have been using the multiplexer to sniff out as much
flakiness as possible up front.)

Once merged to trunk, the next steps are:

1.) Finish a Harry model that we can use to further fuzz
test SAI before 5.0 releases (see CASSANDRA-18275).
We've done a fair amount of fuzz/randomized testing at
the component level, but I'd still consider Harry (at
least around single-partition query use-cases) a
critical item for us to have confidence before release.

2.) Start pursuing Phase 2 items as time and our needs
allow. (see CASSANDRA-18473)

A reminder, SAI is a secondary index, and therefore is
by definition an opt-in feature, and has no explicit
"feature flag". However, its availability to users is
still subject to the secondary_indexes_enabled
guardrail, which currently defaults to allowing creation.

Any thoughts, questions, or comments on the pre-merge
plan here?


Re: August 5.0 Freeze (with waivers…) and a 5.0-alpha1

2023-07-26 Thread Berenguer Blasi

SGTM +1

On 27/7/23 6:39, Dinesh Joshi wrote:

Mick,

This sounds like a good plan. CEP-33 and 34 are ready to go. We're 
running into CI related issues but once they clear up we'll merge 
them. I anticipate we'll be done in a week's time.


Thanks,

Dinesh


On Jul 26, 2023, at 3:27 PM, Mick Semb Wever  wrote:


The previous thread¹ on when to freeze 5.0 landed on freezing the 
first week of August, with a waiver in place for TCM and Accord to 
land later (but before October).


With JDK8 now dropped and SAI and UCS merged, the only expected 5.0 
work that hasn't landed is Vector search (CEP-30).


Are there any objections to a waiver on Vector search?  All the 
groundwork (SAI and the vector type) has been merged, with all 
remaining work expected to land in August.


I'm keen to freeze and see us shift gears – there's already SO MUCH 
in 5.0 and a long list of flakies.  It takes time and patience to 
triage and identify the bugs that hit us before GA.  The freeze is 
about being "mostly feature complete",  so we have room for things 
before our first beta (precedent is to ask).   If we hope for a GA 
by December, and account for the 6 weeks turnaround time for cutting and 
voting on one alpha, one beta, and one rc release, and the quiet 
period that August is, we really only have September and October left.


I already feel this is asking a bit of a miracle from us given how 
4.1 went (and I'm hoping I will be proven wrong).


In addition, are there any objections to cutting a 5.0-alpha1 
release as soon as we freeze?


This is on the understanding vector, tcm and accord will become 
available in later alphas.  Originally the discussion¹ was waiting 
for Accord for alpha1, but a number of folk off-list have requested 
earlier alphas to help with testing.



¹) https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3