Re: [Discuss] cleaning up build temp files

2023-08-13 Thread Mick Semb Wever
> While doing some local testing, I noticed that my /tmp drive completely
> filled with test artifact files (e.g. data directories, logs, commit logs,
> etc). Mick pointed out that we do attempt to do some "find" based cleanup
> in CI (
> https://github.com/apache/cassandra-builds/blob/trunk/jenkins-dsl/cassandra_job_dsl_seed.groovy#L437-439),
> but I was wondering if it might be better to do the following for direct
> ant builds:
>
> 1. If TMPDIR is set, use it. It does not appear to be honored, currently,
> so I need to do some analysis of what would need to be done here
> 2. If TMPDIR is not set, use "mktemp" to create a temp directory and set
> TMPDIR with that directory
> 3. Update the "ant clean" task to delete TMPDIR when we've generated it,
> or attempt the find-based cleanup if TMPDIR was provided
>
> Does anyone know if there are any hard-coded assumptions that test files
> will live directly under /tmp?
>
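
For illustration, steps 1 to 3 of the proposal quoted above could be sketched as follows. This is a minimal Python sketch under stated assumptions, not anything that exists in build.xml: `resolve_tmp_dir`, `clean_tmp_dir` and the `cassandra-build-` prefix are hypothetical names, and the find-based cleanup for a caller-provided TMPDIR is left out.

```python
import os
import shutil
import tempfile


def resolve_tmp_dir(env=None):
    """Return (path, generated): honour TMPDIR if set, else create a fresh directory."""
    env = os.environ if env is None else env
    provided = env.get("TMPDIR")
    if provided:
        # Step 1: the caller chose the location, so we did not generate it.
        return provided, False
    # Step 2: the equivalent of "mktemp -d"; the prefix is a hypothetical choice.
    return tempfile.mkdtemp(prefix="cassandra-build-"), True


def clean_tmp_dir(path, generated):
    """Step 3: delete the directory only when this build generated it."""
    if generated:
        shutil.rmtree(path, ignore_errors=True)
        return True
    return False
```

Tracking the `generated` flag is what lets a clean step tell a directory it owns apart from one the caller supplied.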


This will need testing with in-tree scripts, ci-cassandra, and circleci  :(

What comes to mind:
 - TMPDIR works best today with the python and scripting stuff
 - setting TMPDIR can break tests, hence the unit test script instead sets
$TMP_DIR, which is passed to `-Dtmp.dir=…`
 - /tmp is often set up to be a more appropriate fs (and volume size)
 - it is hard to customise everything
 - it needs to work locally on your machine as well as in docker
containers, as well as CI

If we want something that is wiped by `ant clean` I would suggest using the
build/tmp directory by default.
In-tree scripts do this for unit tests:
https://github.com/apache/cassandra/blob/trunk/.build/run-tests.sh#L160
 but are not yet doing it for the dtests:
https://github.com/apache/cassandra/blob/trunk/.build/run-python-dtests.sh#L58


So I don't think we need (3). If the caller has specified TMPDIR it is then
their responsibility to clean it.

We can probably avoid trying to set TMPDIR, instead defaulting the
`tmp.dir` property to the build/tmp directory.
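
As a rough illustration of that default (a Python sketch, not the ant property handling itself): an explicit override wins, otherwise the temp directory is build/tmp under the project. The function names are hypothetical, and the sketch assumes that `ant clean` removes everything under the build directory.

```python
from pathlib import Path


def default_tmp_dir(project_dir, override=None):
    """Return the temp directory: an explicit override wins, else <project>/build/tmp."""
    if override:
        return Path(override)
    return Path(project_dir) / "build" / "tmp"


def wiped_by_clean(tmp_dir, project_dir):
    """True when tmp_dir sits under <project>/build, so deleting build/ removes it too."""
    build_dir = (Path(project_dir) / "build").resolve()
    return build_dir in Path(tmp_dir).resolve().parents
```

With this split, a directory supplied by the caller falls outside build/ and stays the caller's responsibility to clean.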

The goal of any changes in build.xml should be, in addition to providing
the best dev exp, to simplify the testing and CI layers above it.


Re: [Discuss] cleaning up build temp files

2023-08-13 Thread Josh McKenzie
Why not use "/${CASS_BUILD_TMP}/cassandra." on a given run and then on
subsequent runs "rm -rf /${CASS_BUILD_TMP}/cassandra.*"? If CASS_BUILD_TMP is
not defined, default to /tmp.

"ant clean" can also wipe it.

If it's a safe assumption that we only ever need 1 instance of data in that 
space (i.e. we won't have 2 builds / tests running in a single container 
concurrently) it seems the above would solve the problem. Different 
environments (circle, ASF, etc) could define CASS_BUILD_TMP differently if 
needed for their env and problem is solved.
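
A minimal Python sketch of that pattern, assuming (as stated above) that only one build or test run uses the space at a time. `start_run` is a hypothetical name, and the random suffix that `mkdtemp` appends after "cassandra." is an assumption of the sketch, not something specified here.

```python
import glob
import os
import shutil
import tempfile


def start_run(env=None):
    """Remove leftovers from earlier runs, then create this run's directory."""
    env = os.environ if env is None else env
    base = env.get("CASS_BUILD_TMP", "/tmp")
    # Equivalent of removing ${CASS_BUILD_TMP}/cassandra.* from previous runs.
    for stale in glob.glob(os.path.join(base, "cassandra.*")):
        if os.path.isdir(stale) and not os.path.islink(stale):
            shutil.rmtree(stale, ignore_errors=True)
        else:
            os.remove(stale)
    # Create a fresh cassandra.<random suffix> directory for this run.
    return tempfile.mkdtemp(prefix="cassandra.", dir=base)
```

If two runs shared the same CASS_BUILD_TMP concurrently, the second would delete the first run's directory, which is why the single-instance assumption matters.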



Re: [Discuss] cleaning up build temp files

2023-08-13 Thread Mick Semb Wever
On Sun, 13 Aug 2023 at 16:48, Josh McKenzie  wrote:

> Why not use "/${CASS_BUILD_TMP}/cassandra." on a given run and then
> on subsequent runs "rm -rf /${CASS_BUILD_TMP}/cassandra.*"? If
> CASS_BUILD_TMP is not defined, default to /tmp.
>


I think we want/need relative paths, e.g. "build/tmp", and if the path is
in a mounted volume there can be another container still running.

No objections in theory, and this isn't difficult stuff, just a few
variations to deal with (that we don't have automated CI over).


Re: [Discuss] cleaning up build temp files

2023-08-13 Thread Derek Chen-Becker
Cool,

I'm a little confused. Is "tmp.dir" a custom Java property that we expose?
I thought that the standard property was "java.io.tmpdir". Let me take a
stab at setting tmp.dir to build/tmp and see if I run into any issues (or
still see any files in /tmp).

Cheers,

Derek


-- 
+---+
| Derek Chen-Becker |
| GPG Key available at https://keybase.io/dchenbecker and   |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---+


Re: [Discuss] cleaning up build temp files

2023-08-13 Thread Derek Chen-Becker
Never mind, I found "tmp.dir"



Re: [Discuss] cleaning up build temp files

2023-08-13 Thread Josh McKenzie
> I think we want/need relative paths, e.g. "build/tmp", and if the path is in 
> a mounted volume there can be another container still running.

Sure. The specifics of *what* path aren't interesting to me. What is
interesting is the pattern of:
1. Let env declare where TEMP lives
2. Write things to TEMP
3. Delete things from TEMP every time we run a new suite or do "ant clean"

Could also take it a step further and let env declare RESULTS_PATH for things 
they want to be durable and add an "ant clean-results" target.



Re: Tokenization and SAI query syntax

2023-08-13 Thread Jon Haddad
Functions make sense to me too.  In addition to the reasons listed, if we
acknowledge that functions in predicates are inevitable, then it makes total
sense to use them here.  I think this is the most forward-thinking approach.

Assuming this happens, one thing that would be great down the line would be if 
the CQL parser was broken out into a subproject with an artifact published so 
the soon to be additional complexity of parsing CQL didn't have to be pushed to 
every single end user like it does today.  I'm not trying to expand the scope 
right now, just laying an idea down for the future.  

Jon

On 2023/08/07 21:26:40 Josh McKenzie wrote:
> Been chatting a bit w/Caleb about this offline and poking around to better 
> educate myself.
> 
> > using functions (ignoring the implementation complexity) at least removes 
> > ambiguity. 
> This, plus using functions lets us kick the can down the road a bit in terms 
> of landing on an integrated grammar we agree on. It seems to me there's a 
> tension between:
>  1. "SQL-like" (i.e. postgres-like)
>  2. "Indexing and Search domain-specific-like" (i.e. lucene syntax which, as 
> Benedict points out, doesn't really jell w/what we have in CQL at this 
> point), and
>  3. ??? Some other YOLO CQL / C* specific thing where we go our own road
> I don't think we're really going to know what our feature-set in terms of 
> indexing is going to look like or the shape it's going to take for awhile, so 
> backing ourselves into any of the 3 corners above right now feels very 
> premature to me.
> 
> So I'm coming around to the expr / method call approach to preserve that 
> flexibility. It's maximally explicit and preserves optionality at the expense 
> of being clunky. For now.
> 
> On Mon, Aug 7, 2023, at 4:00 PM, Caleb Rackliffe wrote:
> > > I do not think we should start using lucene syntax for it, it will make 
> > > people think they can do everything else lucene allows.
> > 
> > I'm sure we won't be supporting everything Lucene allows, but this is going 
> > to evolve. Right off the bat, if you introduce support for tokenization and 
> > filtering, someone is, for example, going to ask for phrase queries. ("John 
> > Smith landed in Virginia" is tokenized, but someone wants to match exactly 
> > on "John Smith".) The whole point of the Vector project is to do relevance, 
> > right? Are we going to do term boosting? Do we need queries like "field: 
> > quick brown +fox -news" where fox must be present, news cannot be present, 
> > and quick and brown increase relevance?
> > 
> > SASI uses "=" and "LIKE" in a way that assumes the user understands the 
> > tokenization scheme in use on the target field. I understand that's a bit 
> > ambiguous.
> > 
> > If we object to allowing expr embedding of a subset of the Lucene syntax, I 
> > can't imagine we're okay w/ then jamming a subset of that syntax into the 
> > main CQL grammar.
> > 
> > If we want to do this in non-expr CQL space, I think using functions 
> > (ignoring the implementation complexity) at least removes ambiguity. 
> > "token_match", "phrase_match", "token_like", "=", and "LIKE" would all be 
> > pretty clear, although there may be other problems. For instance, what 
> > happens when I try to use "token_match" on an indexed field whose analyzer 
> > does not tokenize? We obviously can't use the index, so we'd be reduced to 
> > requiring a filtering query, but maybe that's fine. My point is that, if 
> > we're going to make write and read analyzers symmetrical, there's really no 
> > way to make the semantics of our queries totally independent of analysis. 
> > (ex. "field : foo bar" behaves differently w/ read tokenization than it 
> > does without. It could even be an OR or AND query w/ tokenization, 
> > depending on our defaults.)
> > 
> > On Mon, Aug 7, 2023 at 12:55 PM Atri Sharma  wrote:
> >> Why not start with SQLish operators supported by many databases (LIKE and 
> >> CONTAINS)?
> >> 
> >> On Mon, Aug 7, 2023 at 10:01 PM J. D. Jordan  
> >> wrote:
> >>> 
> >>> I am also -1 on directly exposing lucene like syntax here. Besides being 
> >>> ugly, SAI is not lucene, I do not think we should start using lucene 
> >>> syntax for it, it will make people think they can do everything else 
> >>> lucene allows.
> >>> 
>  On Aug 7, 2023, at 5:13 AM, Benedict  wrote:
>  
>  
>  I’m strongly opposed to : 
>  
>  It is very dissimilar to our current operators. CQL is already not the 
>  prettiest language, but let’s not make it a total mish mash.
>  
>  
>  
>  
> > On 7 Aug 2023, at 10:59, Mike Adamson  wrote:
> > 
> > I am also in agreement with 'column : token' in that 'I don't hate it' 
> > but I'd like to offer an alternative to this in 'column HAS token'. HAS 
> > is currently not a keyword that we use so wouldn't cause any brain 
> > conflicts.
> > 
> > While I don't hate ':' I have a particular dislike of the lucene sear

Re: Tokenization and SAI query syntax

2023-08-13 Thread Caleb Rackliffe
We’ve already started down the path of using a git sub-module for the Accord 
library. That could be an option at some point.


Re: [VOTE] Release Apache Cassandra 3.11.16

2023-08-13 Thread Brandon Williams
+1

Kind Regards,
Brandon

On Thu, Aug 10, 2023 at 1:43 AM Miklosovic, Stefan
 wrote:
>
> Proposing the test build of Cassandra 3.11.16 for release.
>
> sha1: f86929eae086aa108cf58ee0164c3d12a59ad4af
> Git: https://github.com/apache/cassandra/tree/3.11.16-tentative
> Maven Artifacts: 
> https://repository.apache.org/content/repositories/orgapachecassandra-1305/org/apache/cassandra/cassandra-all/3.11.16/
>
> The Source and Build Artifacts, and the Debian and RPM packages and 
> repositories, are available here: 
> https://dist.apache.org/repos/dist/dev/cassandra/3.11.16/
>
> The vote will be open for 72 hours (longer if needed). Everyone who has 
> tested the build is invited to vote. Votes by PMC members are considered 
> binding. A vote passes if there are at least three binding +1s and no -1's.
>
> [1]: CHANGES.txt: 
> https://github.com/apache/cassandra/blob/3.11.16-tentative/CHANGES.txt
> [2]: NEWS.txt: 
> https://github.com/apache/cassandra/blob/3.11.16-tentative/NEWS.txt


Re: [Discuss] cleaning up build temp files

2023-08-13 Thread Derek Chen-Becker
OK, I already found some places where "/tmp" is hard-coded:

https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/db/DirectoriesTest.java#L717-L719
https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/db/DirectoriesTest.java#L757-L759

Can I open a ticket to track fixes for these and any other issues I run
into while moving to using "build/tmp"?
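
One way to look for further cases is a simple scan for string literals that start with /tmp. This is a hedged Python sketch, not an existing project tool: `find_hard_coded_tmp` is a hypothetical name, and the regular expression only catches double-quoted literals such as "/tmp" or "/tmp/foo", so paths assembled at runtime would be missed.

```python
import re
from pathlib import Path

# Matches a double-quoted literal that is exactly /tmp or starts with /tmp/.
HARD_CODED_TMP = re.compile(r'"/tmp(/[^"]*)?"')


def find_hard_coded_tmp(root):
    """Yield (path, line_number, line) for each Java source line with a /tmp literal."""
    for path in sorted(Path(root).rglob("*.java")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if HARD_CODED_TMP.search(line):
                yield path, number, line.strip()
```

Running it over the test source tree would give a starting list for the ticket; each hit still needs a manual check.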

Thanks,

Derek



Re: [Discuss] cleaning up build temp files

2023-08-13 Thread Mick Semb Wever
>
>
>
> https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/db/DirectoriesTest.java#L717-L719
>
> https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/db/DirectoriesTest.java#L757-L759
>
> Can I open a ticket to track fixes for these and any other issues I run
> into while moving to using "build/tmp"?
>


Go for it. :-)
There are also tests that hardcode other paths, which breaks the use of
`build.dir`.


Re: [Discuss] cleaning up build temp files

2023-08-13 Thread Josh McKenzie
> There's also tests that hardcode
I started mentally twitching when I hit that point in the sentence.

**Kill them with fire.**
