Re: [DISCUSSION] New dependencies for SAI CEP-7

Benedict Wed, 14 Dec 2022 02:34:39 -0800

I don’t believe we are ready to be prescriptive about how our randomised tests are written.

1) We want as many people to write randomised tests as possible, so do not want to create impediments.

2) We don’t, I expect, all agree on what a good randomised test looks like.

I think Mike should include carrot search’s utils, and we can see how they look. I certainly do not think Mike should use QuickTheories or CassandraGenerators, as I would not like us to settle on these tools, so it might be a wasted effort.

If we want to start discussions about how we might like to coalesce towards a best practice, with tools that support that, we could form a working group - but I think it is not a simple matter right now.

On 14 Dec 2022, at 09:47, Mike Adamson <madam...@datastax.com> wrote:

Thanks for your detailed response to this. I am definitely not fixed on using carrot for this so am happy to look at a replacement. I wasn't aware of the addition of QuickTheories or CassandraGenerators. A combination of these could easily supply the functionality we need for the SAI testing. The Generators could definitely replace the functionality in SAIRandomizedTest.

I will take a look at these and see if we can work without the carrot generators and will report back in a couple of days on this thread if I can do this easily.

As an aside, Caleb and I have already spoken about adding support to Harry for SAI and using this for more large-scale randomized testing of SAI.

On Tue, 13 Dec 2022 at 18:24, Josh McKenzie <jmcken...@apache.org> wrote:
Whatever we decide on, let's make sure we document it so newcomers to the project (or really anyone new to property-based testing) can better discover those things.

https://cassandra.apache.org/_/development/testing.html

On Tue, Dec 13, 2022, at 1:08 PM, David Capwell wrote:
I have been speaking to Caleb in Slack, so I am putting my main comments from there here…

I am not -1 on this new dependency, but am more asking what we should use for random testing moving forward… ATM we have the following:

1) QuickTheories - I feel like I am the only user at this point…
2) One-off - many tests reinvent random testing for a specific class, using Random, ThreadLocalRandom, UUID.randomUUID(), and lang3 classes (such as org.apache.commons.lang3.RandomUtils)
3) Harry - even though the main API is for cluster testing, it is built on top of random generation, so it could be used for low-level random testing (it is just less fleshed out for this use case)
4) Simulator - same as Harry, built on top of a random generator and not fleshed out for low level random testing
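
As an illustration of the one-off style in item 2, a minimal sketch of a test built directly on java.util.Random might look like the following. It is not taken from the Cassandra tree: the class name, the round-trip property, and the seed handling are all assumptions made for illustration. The only point is that logging the seed keeps a random failure reproducible.

```java
import java.nio.ByteBuffer;
import java.util.Random;

// Hypothetical one-off randomized test; none of these names exist in Cassandra.
final class SeededRoundTripSketch {
    // Encode a long as 8 big-endian bytes.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Decode 8 big-endian bytes back into a long.
    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    // Check that decode(encode(x)) == x for random x drawn from the given seed.
    static void run(long seed, int iterations) {
        Random random = new Random(seed);
        for (int i = 0; i < iterations; i++) {
            long value = random.nextLong();
            long decoded = decode(encode(value));
            if (decoded != value) {
                // The seed in the message is what makes the failure replayable.
                throw new AssertionError("seed=" + seed + " iteration=" + i + " value=" + value);
            }
        }
    }

    public static void main(String[] args) {
        // Pass a seed as the first argument to replay an earlier run.
        long seed = args.length > 0 ? Long.parseLong(args[0]) : System.nanoTime();
        System.out.println("seed=" + seed);
        run(seed, 1000);
    }
}
```

Each such test ends up carrying its own seed handling and its own value generation, which is the duplication a shared generator library would remove.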

Another reason I ask is that I have developed a fuzz test for Accord that generates random valid CQL statements to make sure we “do the right thing”, and I have been struggling with the questions “where do I put this?” and “what random do I use?”. I built this on QuickTheories because I have a lot of utilities for building all supported Tables and Types, so it is really quick to bootstrap, and every other random testing tool we have is less fleshed out… So if we add yet another random testing library, what “should” we be using? Do we build on top of it to get to the same level as QuickTheories (see org.apache.cassandra.utils.Generators, org.apache.cassandra.utils.CassandraGenerators, and org.apache.cassandra.utils.AbstractTypeGenerators)?

On Dec 13, 2022, at 9:21 AM, Caleb Rackliffe <calebrackli...@gmail.com> wrote:

We need random generators no matter what for these tests, so I think what we need to decide is whether to continue to use Carrot or migrate those to QuickTheories, along the lines of what we have now in org.apache.cassandra.utils.Generators.

When it comes to a library like this, the thing I would optimize for is how much it already provides (and therefore how much we need to write and maintain ourselves). If you look at something like NumericTypeSortingTest in the 18058 branch, it's pretty compact w/ Carrot's RandomizedTest in use, but I suppose it could also use IntegersDSL from QT...

(Not that it matters, but just for reference, we do use com.carrotsearch.hppc already.)

On Tue, Dec 13, 2022 at 10:14 AM Mike Adamson <madam...@datastax.com> wrote:
> Can you talk more about why? There are several ways to do random testing in-tree ATM, so wondering why we need another one

I can see one mechanism for random testing in-tree. That is the Simulator but that seems primarily involved in the random orchestration of operations. My apologies if I have simplified its significance. Apart from that, I can only see different usages of Random in unit tests. I admit I have not looked beyond this at dtests.

The random testing in SAI is more focussed on the behaviour of the low-level index structures and the flow of data to and from these. Using randomly generated values in tests has proved invaluable in highlighting edge conditions in the code. The library above was only added to provide us with a rich set of random generators. I am happy to look at removing this library if its inclusion is contentious.
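
To make the point about edge conditions concrete, here is a minimal sketch of the kind of check that random values make cheap. It is not SAI code: the sign-flipping encoding, the class name, and the method names are assumptions chosen for illustration. It compares a hypothetical order-preserving byte encoding of ints against Integer.compare as a reference, and requires Java 9 or later for Arrays.compareUnsigned.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch; this is not the encoding or the test used by SAI.
final class OrderPreservingEncodingSketch {
    // Flip the sign bit and write big-endian, so that unsigned byte-wise
    // comparison of the output matches signed int comparison of the input.
    static byte[] encode(int value) {
        int flipped = value ^ Integer.MIN_VALUE;
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    // Compare the encoded order with the numeric order for random pairs.
    static void check(long seed, int iterations) {
        Random random = new Random(seed);
        for (int i = 0; i < iterations; i++) {
            int a = random.nextInt(); // full int range, so negatives are common
            int b = random.nextInt();
            int expected = Integer.signum(Integer.compare(a, b));
            int actual = Integer.signum(Arrays.compareUnsigned(encode(a), encode(b)));
            if (expected != actual) {
                throw new AssertionError("seed=" + seed + " a=" + a + " b=" + b);
            }
        }
    }

    public static void main(String[] args) {
        long seed = System.nanoTime();
        System.out.println("seed=" + seed);
        check(seed, 10_000);
    }
}
```

Because nextInt() draws from the whole int range, roughly half of the random pairs mix a negative and a non-negative value, so an encoding that omitted the sign-bit flip would be expected to fail within a few iterations. A hand-written test using a few small positive values could easily miss that.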

On Mon, 12 Dec 2022 at 19:41, David Capwell <dcapw...@apple.com> wrote:
> com.carrotsearch.randomizedtesting.randomizedtesting-runner 2.1.2 - test dependency

Can you talk more about why? There are several ways to do random testing in-tree ATM, so wondering why we need another one

On Dec 8, 2022, at 6:51 AM, Mike Adamson <madam...@datastax.com> wrote:

Hi,

I wanted to discuss the addition of the following dependencies for CEP-7. The dependencies are:

org.apache.lucene.lucene-core 7.5.0
org.apache.lucene.lucene-analyzers-common 7.5.0
com.carrotsearch.randomizedtesting.randomizedtesting-runner 2.1.2 - test dependency

Lucene is an Apache project, so it is licensed under the Apache License 2.0 (APL2). Carrotsearch is not an Apache project but is also licensed under APL2.

We are also removing the dependency on com.github.rholder.snowball-stemmer. This library is used by SASI stemming filters but a later version of the same library is available in the lucene libraries.

Does anyone have any concerns about these changes?

Mike Adamson

--

Mike Adamson
Engineering
+1 650 389 6000 | datastax.com
