Re: [Discuss] cleaning up build temp files
> While doing some local testing, I noticed that my /tmp drive completely
> filled with test artifact files (e.g. data directories, logs, commit logs,
> etc). Mick pointed out that we do attempt to do some "find" based cleanup
> in CI
> (https://github.com/apache/cassandra-builds/blob/trunk/jenkins-dsl/cassandra_job_dsl_seed.groovy#L437-439),
> but I was wondering if it might be better to do the following for direct
> ant builds:
>
> 1. If TMPDIR is set, use it. It does not appear to be honored, currently,
>    so I need to do some analysis of what would need to be done here
> 2. If TMPDIR is not set, use "mktemp" to create a temp directory and set
>    TMPDIR with that directory
> 3. Update the "ant clean" task to delete TMPDIR when we've generated it,
>    or attempt the find-based cleanup if TMPDIR was provided
>
> Does anyone know if there are any hard-coded assumptions that test files
> will live directly under /tmp?

This will need testing with in-tree scripts, ci-cassandra, and circleci :(

What comes to mind:
- TMPDIR works best today with the python and scripting stuff
- setting TMPDIR can break tests, hence the unit test script instead sets
  $TMP_DIR, which is passed to `-Dtmp.dir=…`
- /tmp is often set up to be a more appropriate fs (and volume size)
- it is hard to customise everything
- it needs to work locally on your machine as well as in docker containers,
  as well as CI

If we want something that is wiped by `ant clean` I would suggest using the
build/tmp directory by default.
In-tree scripts do this for unit tests:
https://github.com/apache/cassandra/blob/trunk/.build/run-tests.sh#L160
but are not yet doing it for the dtests:
https://github.com/apache/cassandra/blob/trunk/.build/run-python-dtests.sh#L58

So I don't think we need (3). If the caller has specified TMPDIR it is then
their responsibility to clean it.

We can probably avoid trying to set TMPDIR, instead defaulting the `tmp.dir`
property to the build/tmp directory.

The goal of any changes in build.xml should be, in addition to providing the
best dev exp, to simplify the testing and CI layers above it.
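One possible reading of the suggestion above, sketched in Python purely for illustration (the real change would live in build.xml and the in-tree scripts). The function names are hypothetical, and honoring a caller-supplied TMPDIR is an assumption carried over from point 1 of the quoted proposal; only `TMPDIR`, `build/tmp` and `-Dtmp.dir=` come from the thread.

```python
import os
from pathlib import Path


def resolve_tmp_dir(env, build_dir):
    """Return (tmp_dir, owned_by_build). Hypothetical helper.

    If the caller set TMPDIR, use it and leave cleanup to the caller.
    Otherwise default to <build_dir>/tmp, which disappears whenever the
    build directory itself is cleaned.
    """
    caller_tmp = env.get("TMPDIR")
    if caller_tmp:
        return Path(caller_tmp), False
    return Path(build_dir) / "tmp", True


def ant_tmp_arg(tmp_dir):
    """Format the tmp.dir property override mentioned in the thread."""
    return f"-Dtmp.dir={Path(tmp_dir).as_posix()}"


if __name__ == "__main__":
    tmp_dir, owned = resolve_tmp_dir(os.environ, "build")
    note = "removed with the build dir" if owned else "caller cleans up"
    print(ant_tmp_arg(tmp_dir), f"({note})")
```

The `owned_by_build` flag captures the ownership rule: a directory the build chose is the build's to delete, while a directory the caller chose is not.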
Re: [Discuss] cleaning up build temp files
Why not use "/${CASS_BUILD_TMP}/cassandra." on a given run and then on
subsequent runs "rm -rf /${CASS_BUILD_TMP}/cassandra.*"? If CASS_BUILD_TMP is
not defined, default to /tmp.

"ant clean" can also wipe it.

If it's a safe assumption that we only ever need 1 instance of data in that
space (i.e. we won't have 2 builds / tests running in a single container
concurrently) it seems the above would solve the problem. Different
environments (circle, ASF, etc) could define CASS_BUILD_TMP differently if
needed for their env and problem is solved.

On Sun, Aug 13, 2023, at 10:23 AM, Mick Semb Wever wrote:
> If we want something that is wiped by `ant clean` I would suggest using the
> build/tmp directory by default.
>
> We can probably avoid trying to set TMPDIR, instead defaulting the `tmp.dir`
> property to the build/tmp directory.
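The CASS_BUILD_TMP idea can be sketched roughly as follows. This is a Python illustration only, not existing build code: `fresh_run_dir` is a hypothetical name, the random suffix after "cassandra." is an assumption, and the sketch relies on the same single-run-at-a-time assumption stated in the message.

```python
import glob
import os
import shutil
import tempfile


def fresh_run_dir(env):
    """Wipe leftovers from earlier runs, then create this run's directory.

    Assumes only one build or test run uses the base directory at a time.
    """
    base = env.get("CASS_BUILD_TMP", "/tmp")  # default taken from the thread
    # Rough equivalent of: rm -rf ${CASS_BUILD_TMP}/cassandra.*
    for stale in glob.glob(os.path.join(base, "cassandra.*")):
        if os.path.isdir(stale) and not os.path.islink(stale):
            shutil.rmtree(stale, ignore_errors=True)
        else:
            os.remove(stale)
    # Create ${CASS_BUILD_TMP}/cassandra.<random suffix> for this run
    return tempfile.mkdtemp(prefix="cassandra.", dir=base)
```

Because cleanup happens at the start of the next run rather than at the end of the current one, artifacts from a failed run stay available for inspection until someone starts another run.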
Re: [Discuss] cleaning up build temp files
On Sun, 13 Aug 2023 at 16:48, Josh McKenzie wrote:
> Why not use "/${CASS_BUILD_TMP}/cassandra." on a given run and then
> on subsequent runs "rm -rf /${CASS_BUILD_TMP}/cassandra.*"? If
> CASS_BUILD_TMP is not defined, default to /tmp.

I think we want/need relative paths, e.g. "build/tmp", and if the path is in
a mounted volume there can be another container still running.

No objections in theory, and this isn't difficult stuff, just a few
variations to deal with (that we don't have automated CI over).
Re: [Discuss] cleaning up build temp files
Cool,

I'm a little confused. Is "tmp.dir" a custom Java property that we expose?
I thought that the standard property was "java.io.tmpdir". Let me take a
stab at setting tmp.dir to build/tmp and see if I run into any issues (or
still see any files in /tmp).

Cheers,

Derek

On Sun, Aug 13, 2023 at 8:24 AM Mick Semb Wever wrote:
> We can probably avoid trying to set TMPDIR, instead defaulting the
> `tmp.dir` property to the build/tmp directory.

--
+---+
| Derek Chen-Becker |
| GPG Key available at https://keybase.io/dchenbecker and |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC |
+---+
Re: [Discuss] cleaning up build temp files
Nevermind, I found "tmp.dir"

On Sun, Aug 13, 2023 at 9:29 AM Derek Chen-Becker wrote:
> I'm a little confused. Is "tmp.dir" a custom Java property that we expose?
> I thought that the standard property was "java.io.tmpdir".
Re: [Discuss] cleaning up build temp files
> I think we want/need relative paths, e.g. "build/tmp", and if the path is in
> a mounted volume there can be another container still running.

Sure. The specifics of *what* path aren't interesting to me.

The pattern of:
1. Let env declare where TEMP lives
2. Write things to TEMP
3. Delete things from TEMP every time we run a new suite or do "ant clean"

Is.

Could also take it a step further and let env declare RESULTS_PATH for things
they want to be durable and add an "ant clean-results" target.

On Sun, Aug 13, 2023, at 11:33 AM, Derek Chen-Becker wrote:
> Nevermind, I found "tmp.dir"
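The three-step pattern, plus the optional durable results location, might look something like this. It is a Python sketch for illustration only: the function names are hypothetical, and the directory paths are passed in by the caller rather than read from any real build property.

```python
import os
import shutil


def start_suite(temp_dir):
    """Step 3 of the pattern: every new suite starts from an empty TEMP."""
    shutil.rmtree(temp_dir, ignore_errors=True)
    os.makedirs(temp_dir, exist_ok=True)


def clean(temp_dir):
    """What "ant clean" would do in this scheme: wipe TEMP, keep results."""
    shutil.rmtree(temp_dir, ignore_errors=True)


def clean_results(results_dir):
    """What the proposed "ant clean-results" would do: wipe durable output."""
    shutil.rmtree(results_dir, ignore_errors=True)
```

Keeping the two lifecycles in separate functions mirrors the suggestion of separate targets: scratch data is disposable by default, while results only go away when someone asks for it explicitly.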
Re: Tokenization and SAI query syntax
Functions make sense to me too. In addition to the reasons listed, if we
acknowledge that functions in predicates are inevitable, then it makes total
sense to use them here. I think this is the most forward thinking approach.

Assuming this happens, one thing that would be great down the line would be
if the CQL parser was broken out into a subproject with an artifact published
so the soon to be additional complexity of parsing CQL didn't have to be
pushed to every single end user like it does today. I'm not trying to expand
the scope right now, just laying an idea down for the future.

Jon

On 2023/08/07 21:26:40 Josh McKenzie wrote:
> Been chatting a bit w/Caleb about this offline and poking around to better
> educate myself.
>
> > using functions (ignoring the implementation complexity) at least removes
> > ambiguity.
> This, plus using functions lets us kick the can down the road a bit in terms
> of landing on an integrated grammar we agree on. It seems to me there's a
> tension between:
> 1. "SQL-like" (i.e. postgres-like)
> 2. "Indexing and Search domain-specific-like" (i.e. lucene syntax which, as
>    Benedict points out, doesn't really jell w/what we have in CQL at this
>    point), and
> 3. ??? Some other YOLO CQL / C* specific thing where we go our own road
> I don't think we're really going to know what our feature-set in terms of
> indexing is going to look like or the shape it's going to take for awhile, so
> backing ourselves into any of the 3 corners above right now feels very
> premature to me.
>
> So I'm coming around to the expr / method call approach to preserve that
> flexibility. It's maximally explicit and preserves optionality at the expense
> of being clunky. For now.
>
> On Mon, Aug 7, 2023, at 4:00 PM, Caleb Rackliffe wrote:
> > > I do not think we should start using lucene syntax for it, it will make
> > > people think they can do everything else lucene allows.
> >
> > I'm sure we won't be supporting everything Lucene allows, but this is going
> > to evolve. Right off the bat, if you introduce support for tokenization and
> > filtering, someone is, for example, going to ask for phrase queries. ("John
> > Smith landed in Virginia" is tokenized, but someone wants to match exactly
> > on "John Smith".) The whole point of the Vector project is to do relevance,
> > right? Are we going to do term boosting? Do we need queries like "field:
> > quick brown +fox -news" where fox must be present, news cannot be present,
> > and quick and brown increase relevance?
> >
> > SASI uses "=" and "LIKE" in a way that assumes the user understands the
> > tokenization scheme in use on the target field. I understand that's a bit
> > ambiguous.
> >
> > If we object to allowing expr embedding of a subset of the Lucene syntax, I
> > can't imagine we're okay w/ then jamming a subset of that syntax into the
> > main CQL grammar.
> >
> > If we want to do this in non-expr CQL space, I think using functions
> > (ignoring the implementation complexity) at least removes ambiguity.
> > "token_match", "phrase_match", "token_like", "=", and "LIKE" would all be
> > pretty clear, although there may be other problems. For instance, what
> > happens when I try to use "token_match" on an indexed field whose analyzer
> > does not tokenize? We obviously can't use the index, so we'd be reduced to
> > requiring a filtering query, but maybe that's fine. My point is that, if
> > we're going to make write and read analyzers symmetrical, there's really no
> > way to make the semantics of our queries totally independent of analysis.
> > (ex. "field : foo bar" behaves differently w/ read tokenization than it
> > does without. It could even be an OR or AND query w/ tokenization,
> > depending on our defaults.)
> >
> > On Mon, Aug 7, 2023 at 12:55 PM Atri Sharma wrote:
> >> Why not start with SQLish operators supported by many databases (LIKE and
> >> CONTAINS)?
> >>
> >> On Mon, Aug 7, 2023 at 10:01 PM J. D. Jordan wrote:
> >>> I am also -1 on directly exposing lucene like syntax here. Besides being
> >>> ugly, SAI is not lucene, I do not think we should start using lucene
> >>> syntax for it, it will make people think they can do everything else
> >>> lucene allows.
> >>>
> >>>> On Aug 7, 2023, at 5:13 AM, Benedict wrote:
> >>>>
> >>>> I’m strongly opposed to :
> >>>>
> >>>> It is very dissimilar to our current operators. CQL is already not the
> >>>> prettiest language, but let’s not make it a total mish mash.
> >>>>
> >>>>> On 7 Aug 2023, at 10:59, Mike Adamson wrote:
> >>>>>
> >>>>> I am also in agreement with 'column : token' in that 'I don't hate it'
> >>>>> but I'd like to offer an alternative to this in 'column HAS token'. HAS
> >>>>> is currently not a keyword that we use so wouldn't cause any brain
> >>>>> conflicts.
> >>>>>
> >>>>> While I don't hate ':' I have a particular dislike of the lucene sear
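To make the distinctions in the quoted discussion concrete, here is a toy Python sketch of token matching, phrase matching, and the "+fox -news" style of query. It is not SAI or Lucene code: the whitespace/lowercase analyzer and the `boolean_score` helper are assumptions for illustration, and the names `token_match` and `phrase_match` are borrowed from the thread only as labels.

```python
def tokenize(text):
    """Toy analyzer (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


def token_match(text, term):
    """True if the single term appears as a whole token."""
    return term.lower() in tokenize(text)


def phrase_match(text, phrase):
    """True if the phrase's tokens appear adjacent and in order."""
    tokens = tokenize(text)
    wanted = tokenize(phrase)
    n = len(wanted)
    return n > 0 and any(
        tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)
    )


def boolean_score(text, optional=(), required=(), excluded=()):
    """None if the row is filtered out, else a count of optional hits.

    Mirrors "quick brown +fox -news": required terms must be present,
    excluded terms must be absent, optional terms only raise the score.
    """
    tokens = set(tokenize(text))
    if any(t.lower() not in tokens for t in required):
        return None
    if any(t.lower() in tokens for t in excluded):
        return None
    return sum(1 for t in optional if t.lower() in tokens)
```

With this analyzer, `token_match("John Smith landed in Virginia", "smith")` holds, while `phrase_match` on the same text succeeds for "John Smith" and fails for "Smith John". Swapping in an analyzer that does not tokenize changes every one of these answers, which is the coupling between query semantics and analysis that the message points out.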
Re: Tokenization and SAI query syntax
We’ve already started down the path of using a git sub-module for the Accord
library. That could be an option at some point.

> On Aug 13, 2023, at 12:53 PM, Jon Haddad wrote:
>
> Assuming this happens, one thing that would be great down the line would be
> if the CQL parser was broken out into a subproject with an artifact published
> so the soon to be additional complexity of parsing CQL didn't have to be
> pushed to every single end user like it does today. I'm not trying to expand
> the scope right now, just laying an idea down for the future.
Re: [VOTE] Release Apache Cassandra 3.11.16
+1

Kind Regards,
Brandon

On Thu, Aug 10, 2023 at 1:43 AM Miklosovic, Stefan wrote:
>
> Proposing the test build of Cassandra 3.11.16 for release.
>
> sha1: f86929eae086aa108cf58ee0164c3d12a59ad4af
> Git: https://github.com/apache/cassandra/tree/3.11.16-tentative
> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1305/org/apache/cassandra/cassandra-all/3.11.16/
>
> The Source and Build Artifacts, and the Debian and RPM packages and
> repositories, are available here:
> https://dist.apache.org/repos/dist/dev/cassandra/3.11.16/
>
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
>
> [1]: CHANGES.txt:
> https://github.com/apache/cassandra/blob/3.11.16-tentative/CHANGES.txt
> [2]: NEWS.txt:
> https://github.com/apache/cassandra/blob/3.11.16-tentative/NEWS.txt
Re: [Discuss] cleaning up build temp files
OK, I already found some places where "/tmp" is hard-coded:

https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/db/DirectoriesTest.java#L717-L719
https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/db/DirectoriesTest.java#L757-L759

Can I open a ticket to track fixes for these and any other issues I run into
while moving to using "build/tmp"?

Thanks,

Derek

On Sun, Aug 13, 2023 at 10:13 AM Josh McKenzie wrote:
> The pattern of:
> 1. Let env declare where TEMP lives
> 2. Write things to TEMP
> 3. Delete things from TEMP every time we run a new suite or do "ant clean"
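The general fix for a hard-coded "/tmp" is to derive paths from whatever temp root is configured. The linked tests are Java, so the following Python sketch only illustrates the principle; `scratch_path` is a hypothetical helper and says nothing about how DirectoriesTest should actually be changed.

```python
import os
import tempfile


def scratch_path(name, env=None):
    """Build a path under the configured temp root, not a literal "/tmp"."""
    env = os.environ if env is None else env
    # Prefer an explicit TMPDIR; otherwise fall back to the platform default.
    root = env.get("TMPDIR") or tempfile.gettempdir()
    return os.path.join(root, name)
```

A test written this way follows the temp root wherever the build points it, which is what moving to "build/tmp" would require.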
Re: [Discuss] cleaning up build temp files
> https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/db/DirectoriesTest.java#L717-L719
> https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/db/DirectoriesTest.java#L757-L759
>
> Can I open a ticket to track fixes for these and any other issues I run
> into while moving to using "build/tmp"?

Go for it. :-)

There's also tests that hardcode other paths, which breaks the use of
`build.dir`.
Re: [Discuss] cleaning up build temp files
> There's also tests that hardcode

I started mentally twitching when I hit that point in the sentence.

**Kill them with fire.**

On Sun, Aug 13, 2023, at 4:51 PM, Mick Semb Wever wrote:
> Go for it. :-)
> There's also tests that hardcode other paths that breaks the use of
> `build.dir`