[ 
https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434267#comment-17434267
 ] 

Dawid Weiss edited comment on LUCENE-10195 at 10/26/21, 11:02 AM:
------------------------------------------------------------------

I did take a look at the patch. Thank you - some of the things look like nice 
improvements. This command is not what should be the benchmark though:
{code}
./gradlew clean build -Ptests.seed=deadbeef -Ptests.nightly=false 
-Ptests.neverUpToDate=false --scan
{code}
instead, the second invocation of this (full incremental build) should be 
measured instead (tests are a separate story):
{code}
./gradlew check -x test
{code}

I understand you're advocating for gradle cache and I think it's great but I 
don't think it should be the
default setting - sorry, this is my honest opinion. Unless you have a bunch of 
corporate CI servers it'll only pollute
your home with a gazillion of megabytes of data that will simply not be reused 
much. And we do want folks
to run stuff in their own environments because this is a good regression test 
(different VMs, operating systems).

If somebody wants a local cache, they can enable it but it shouldn't be forced 
down their throats.

As for the recomendations, here are my thoughts.

bq. Fixing tests.seed obviously help to benefit from up-to-date check, I get 
the point about randomization, but this is a trade-off with expensive cost of 
resources

Yes, it is a tradeoff we're willing to take. Again - if somebody wants a 
locally fixed seed, they can do it. You'd
be surprised how frequently those tests fail on boundary conditions in only 
certain environment combinations.

bq. Use the standard Gradle wrapper

There is a reason why non-standard wrapper is used - please look up the 
relevant issue in Jira (source release shouldn't ship
a binary artifact).

bq. Setup local gradle.properties in Gradle home folder rather than having an 
automatic generation from gradle/generation/local-settings.gradle

There is a reason why gradle.properties is generated (it adjusts the defaults 
to the local machine). I wish gradle had
a mechanism of tuning, say, max-workers dynamically, but I don't think it does.

bq. Do not override Gradle daemon TTL to 15mn, this is way to short

If you run the build regularily switching VMs then background gradle daemons 
eat up all your memory.
So no, I don't think it's too short.

bq. Do not create / commit generated test files to src directory 
(frenchArticles.txt, Top50KWiki.utf8, 
bq. CambridgeMA.utf8, Latin-dont-break-on-hyphens.rbbi)

You don't understand why they're there. One of these generated files requires 
16GB memory and over 15 
minutes on a decent server to generate. Even if you use the gradle cache, the 
first run on your 
old-ish laptop will kill your build with an OOM. Some of these resources 
require specific 
environments (like Linux toolchain). I don't think there is a mechanism in 
gradle which would allow
only regenerating these resources if their source input triggers actually 
change.

I'll go through the changes you suggested and will cherry-pick some of the 
improvements, thank you.


was (Author: dweiss):
I did take a look at the patch. Thank you - some of the things look like nice 
improvements. This command is not what should be the benchmark though:
{code}
./gradlew clean build -Ptests.seed=deadbeef -Ptests.nightly=false 
-Ptests.neverUpToDate=false --scan
{code}
instead, the second invocation of this (full incremental build) should be 
measured instead (tests are a separate story):
{code}
./gradlew check -x test
{code}

I understand you're advocating for gradle cache and I think it's great but I 
don't think it should be the
default setting - sorry, this is my honest opinion. Unless you have a corporate 
CI servers it'll only pollute
your home with a gazillion of megabytes of data that will simply not be reused 
much. And we do want folks
to run stuff in their own environments because this is a good regression test 
(different VMs, operating systems).

If somebody wants a local cache, they can enable it but it shouldn't be forced 
down their throats.

As for the recomendations, here are my thoughts.

> Fixing tests.seed obviously help to benefit from up-to-date check, I get the 
> point about randomization, but this is a trade-off with expensive cost of 
> resources

Yes, it is a tradeoff we're willing to take. Again - if somebody wants a 
locally fixed seed, they can do it. You'd
be surprised how frequently those tests fail on boundary conditions in only 
certain environment combinations.

> Use the standard Gradle wrapper

There is a reason why non-standard wrapper is used - please look up the 
relevant issue in Jira (source release shouldn't ship
a binary artifact).

> Setup local gradle.properties in Gradle home folder rather than having an 
> automatic generation from gradle/generation/local-settings.gradle

There is a reason why gradle.properties is generated (it adjusts the defaults 
to the local machine). I wish gradle had
a mechanism of tuning, say, max-workers dynamically, but I don't think it does.

> Do not override Gradle daemon TTL to 15mn, this is way to short

If you run the build regularily switching VMs then background gradle daemons 
eat up all your memory.
So no, I don't think it's too short.

> Do not create / commit generated test files to src directory 
> (frenchArticles.txt, Top50KWiki.utf8, 
> CambridgeMA.utf8, Latin-dont-break-on-hyphens.rbbi)

You don't understand why they're there. One of these generated files requires 
16GB memory and over 15 
minutes on a decent server to generate. Even if you use the gradle cache, the 
first run on your 
old-ish laptop will kill your build with an OOM. Some of these resources 
require specific 
environments (like Linux toolchain). I don't think there is a mechanism in 
gradle which would allow
only regenerating these resources if their source input triggers actually 
change.

I'll go through the changes you suggested and will cherry-pick some of the 
improvements, thank you.

> Gradle build speed improvement
> ------------------------------
>
>                 Key: LUCENE-10195
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10195
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Jerome Prinet
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Increase Gradle build speed with help of Gradle built-in features, mostly 
> cache and up-to-date checks
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to