dweiss commented on PR #14824: URL: https://github.com/apache/lucene/pull/14824#issuecomment-2993560243
Here are some benchmarks from my Linux system. This runs the patch with a varying number of workers and the default `-XX:ActiveProcessorCount=1` in gradle.properties. This is also the worst-case scenario of reformatting all files from scratch, with no incremental information. My system is Ubuntu on an AMD Ryzen Threadripper 3970X (32 cores).

```
echo "With bg daemon: './gradlew clean checkGoogleJavaFormat'."
./gradlew -q clean
./gradlew -q --stop
for workers in 1 2 4 8 16 32; do
  echo "max-workers: $workers"
  (for i in `seq 1 3`; do time ./gradlew clean checkGoogleJavaFormat --max-workers $workers ; done ) 2>&1 | grep "real"
done
```

results:

```
max-workers: 1
real 0m51.099s
real 0m42.128s
real 0m41.845s
max-workers: 2
real 0m23.382s
real 0m23.619s
real 0m23.563s
max-workers: 4
real 0m15.160s
real 0m15.877s
real 0m15.169s
max-workers: 8
real 0m13.651s
real 0m13.534s
real 0m13.468s
max-workers: 16
real 0m18.783s
real 0m18.273s
real 0m18.813s
max-workers: 32
real 0m26.930s
real 0m26.884s
real 0m27.259s
```

The CPU is mostly idle during most of these runs. Weird. If I remove `-XX:ActiveProcessorCount=1` from gradle.properties, I get these results:

```
max-workers: 1
real 0m43.419s
real 0m41.613s
real 0m41.829s
max-workers: 2
real 0m22.878s
real 0m22.745s
real 0m22.604s
max-workers: 4
real 0m13.625s
real 0m13.452s
real 0m13.608s
max-workers: 8
real 0m8.392s
real 0m8.490s
real 0m8.319s
max-workers: 16
real 0m6.225s
real 0m6.929s
real 0m6.356s
max-workers: 32
real 0m6.052s
real 0m6.863s
real 0m6.204s
```

so it clearly is a benefit if you have higher core counts. It's also close to the lower limit of manually running google-java-format on all source files (they do have concurrent processing inside).

For the incremental case... it's fast enough (well, it doesn't do anything), even from a "cold" start, without any daemon in the background (the first call will show configuration time):

```
echo "With bg daemon, from cold-start, incremental: './gradlew checkGoogleJavaFormat'."
./gradlew checkGoogleJavaFormat
for workers in 2 4 8; do
  echo "max-workers: $workers"
  ./gradlew --stop -q
  (for i in `seq 1 3`; do time ./gradlew checkGoogleJavaFormat --max-workers $workers ; done ) 2>&1 | grep "real"
done
```

results:

```
max-workers: 2
real 0m8.953s
real 0m2.105s
real 0m2.002s
max-workers: 4
real 0m8.746s
real 0m2.061s
real 0m2.037s
max-workers: 8
real 0m8.856s
real 0m2.054s
real 0m1.978s
```

This is a rather internal detail, but the following shows different batch sizes of input files for a constant number of workers (the default is 5):

```
for batchSize in 1 2 4 8 16 32 64; do
  echo "batch size: $batchSize"
  (for i in `seq 1 3`; do time ./gradlew clean checkGoogleJavaFormat -Plucene.gjf.batchSize=$batchSize --max-workers 8 ; done ) 2>&1 | grep "real"
done
```

results:

```
batch size: 1
real 0m8.958s
real 0m8.760s
real 0m8.585s
batch size: 2
real 0m8.542s
real 0m8.401s
real 0m8.481s
batch size: 4
real 0m8.341s
real 0m8.378s
real 0m8.469s
batch size: 8
real 0m8.671s
real 0m8.494s
real 0m8.458s
batch size: 16
real 0m8.363s
real 0m8.408s
real 0m8.395s
batch size: 32
real 0m8.384s
real 0m8.320s
real 0m8.423s
batch size: 64
real 0m8.578s
real 0m8.622s
real 0m8.699s
```

Finally, the same check for the previous, spotless-based implementation (main branch).

```
./gradlew --stop
./gradlew clean
for workers in 1 2 4 8 16 32; do
  echo "max-workers: $workers"
  (for i in `seq 1 3`; do time ./gradlew clean spotlessJavaCheck --max-workers $workers ; done ) 2>&1 | grep "real"
done
```

results:

```
max-workers: 1
real 0m49.843s
real 0m47.934s
real 0m48.256s
max-workers: 2
real 0m28.170s
real 0m27.980s
real 0m27.851s
max-workers: 4
real 0m21.620s
real 0m21.475s
real 0m21.503s
max-workers: 8
real 0m21.192s
real 0m20.962s
real 0m20.895s
max-workers: 16
real 0m20.984s
real 0m20.783s
real 0m20.773s
max-workers: 32
real 0m21.290s
real 0m21.077s
real 0m21.037s
```

So the patch is faster. Note I didn't do anything here - all the heavy lifting is done by the same implementation in google-java-format.
The difference is in the long tail of the longest operation (formatting lucene/core), which is now parallel.

I also toyed with removing `-XX:TieredStopAtLevel=1` from gradle.properties, then re-ran the benchmark:

```
./gradlew -q clean
./gradlew -q --stop
for workers in 8 16 32; do
  echo "max-workers: $workers"
  (for i in `seq 1 3`; do time ./gradlew clean checkGoogleJavaFormat --max-workers $workers ; done ) 2>&1 | grep "real"
done
```

results:

```
max-workers: 8
real 0m16.147s
real 0m6.014s
real 0m5.939s
max-workers: 16
real 0m4.830s
real 0m4.700s
real 0m4.716s
max-workers: 32
real 0m5.271s
real 0m5.228s
real 0m5.243s
```

So you get that initial "hit" when hotspot compiles all that code in the daemon, then it's a bit faster compared to what we currently have as the default. I don't know what's better.
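For anyone repeating these runs, comparing three raw `real` lines per setting by eye gets tedious. A minimal sketch of a helper that reduces them to a median in seconds could look like the following. The function name `median_real_seconds` is hypothetical and not part of the patch, and it assumes the `real XmY.YYYs` format that bash's `time` prints in the results above.

```shell
# Hypothetical helper (not part of the patch): reads `time` output on stdin,
# keeps the "real XmY.YYYs" lines, and prints their median in seconds.
median_real_seconds() {
  grep '^real' \
    | awk '{ split($2, t, /[ms]/); print t[1] * 60 + t[2] }' \
    | LC_ALL=C sort -n \
    | awk '{ v[NR] = $1 }
           END {
             if (NR == 0) exit 1                      # no timings seen
             if (NR % 2) print v[(NR + 1) / 2]        # odd count: middle value
             else print (v[NR / 2] + v[NR / 2 + 1]) / 2  # even count: mean of the two middle values
           }'
}
```

It would replace the `grep "real"` at the end of the inner loop, for example `(for i in ...; do time ./gradlew ... ; done ) 2>&1 | median_real_seconds`. The median is used rather than the mean so that a slow first run (daemon startup, JIT warm-up) does not skew the number; for the first `max-workers: 1` block above it gives 42.128.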