jfboeuf commented on PR #13989: URL: https://github.com/apache/lucene/pull/13989#issuecomment-2470657083
Thank you for your feedback. Perhaps I misunderstood your point, but the implementation I propose only calls `Checksum.update(byte[])`. The change resides in how the buffer is fed, to avoid byte-by-byte reading from the underlying `IndexInput` and byte-by-byte copying into the buffer.

I created and pushed [into a different branch a JMH benchmark](https://github.com/apache/lucene/commit/91482728ea1633213bc064ed5362be041101d1d5) comparing the current implementation to the one I propose. The results show a noticeable improvement:

```
Benchmark                                                  (size)   Mode  Cnt     Score     Error  Units
BufferedChecksumIndexInputBenchmark.decodeLongArray            10  thrpt   15  7521.843 ± 167.756  ops/ms
BufferedChecksumIndexInputBenchmark.decodeLongArray          1000  thrpt   15  1004.213 ±   3.587  ops/ms
BufferedChecksumIndexInputBenchmark.decodeLongArray        100000  thrpt   15    10.993 ±   0.102  ops/ms
BufferedChecksumIndexInputBenchmark.decodeLongArrayOrig        10  thrpt   15  3865.018 ±  38.850  ops/ms
BufferedChecksumIndexInputBenchmark.decodeLongArrayOrig      1000  thrpt   15    46.381 ±   0.293  ops/ms
BufferedChecksumIndexInputBenchmark.decodeLongArrayOrig    100000  thrpt   15     0.475 ±   0.022  ops/ms
BufferedChecksumIndexInputBenchmark.decodeSingleLongs          10  thrpt   15  8355.638 ±  52.745  ops/ms
BufferedChecksumIndexInputBenchmark.decodeSingleLongs        1000  thrpt   15   212.133 ±   4.296  ops/ms
BufferedChecksumIndexInputBenchmark.decodeSingleLongs      100000  thrpt   15     2.744 ±   0.021  ops/ms
BufferedChecksumIndexInputBenchmark.decodeSingleLongsOrig      10  thrpt   15  3938.857 ±  42.751  ops/ms
BufferedChecksumIndexInputBenchmark.decodeSingleLongsOrig    1000  thrpt   15    47.246 ±   0.444  ops/ms
BufferedChecksumIndexInputBenchmark.decodeSingleLongsOrig  100000  thrpt   15     0.460 ±   0.020  ops/ms
```

For large arrays (size 100000), throughput improves by about 23 times when reading long arrays (10.993 vs. 0.475 ops/ms) and by about 6 times when reading single long values (2.744 vs. 0.460 ops/ms).
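To make the idea concrete, here is a simplified, self-contained sketch of the two ways of feeding the buffer. It is not the code from the PR: the class name `ChecksumBufferSketch`, the method names other than `updateLongs`, the 256-byte buffer size used in `main`, and the little-endian byte order are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Illustrative sketch only; a hypothetical stand-in, not the class touched by the PR. */
final class ChecksumBufferSketch {
  private final Checksum checksum;
  private final byte[] buffer;
  private final LongBuffer longView; // view over the same bytes as 'buffer'
  private int upto; // number of bytes currently buffered

  ChecksumBufferSketch(Checksum checksum, int bufferSize) {
    if (bufferSize < Long.BYTES) {
      throw new IllegalArgumentException("buffer must hold at least one long");
    }
    this.checksum = checksum;
    this.buffer = new byte[bufferSize];
    // Assumption: longs are serialized in little-endian order.
    this.longView = ByteBuffer.wrap(buffer).order(ByteOrder.LITTLE_ENDIAN).asLongBuffer();
  }

  /** Baseline: each long is split into 8 bytes that are appended one at a time. */
  void updateLongBytewise(long v) {
    for (int shift = 0; shift < Long.SIZE; shift += Byte.SIZE) {
      if (upto == buffer.length) {
        flush();
      }
      buffer[upto++] = (byte) (v >>> shift);
    }
  }

  /** Bulk variant: copy whole chunks of longs into the buffer, then checksum each chunk. */
  void updateLongs(long[] vals, int offset, int len) {
    flush(); // the long view starts at byte 0, so pending bytes must go first
    while (len > 0) {
      int n = Math.min(len, longView.capacity());
      longView.clear();
      longView.put(vals, offset, n); // bulk copy into the backing byte[]
      checksum.update(buffer, 0, n * Long.BYTES);
      offset += n;
      len -= n;
    }
  }

  void flush() {
    if (upto > 0) {
      checksum.update(buffer, 0, upto);
      upto = 0;
    }
  }

  long getValue() {
    flush();
    return checksum.getValue();
  }

  public static void main(String[] args) {
    long[] vals = new long[1000];
    for (int i = 0; i < vals.length; i++) {
      vals[i] = i * 31L + 7;
    }
    ChecksumBufferSketch bulk = new ChecksumBufferSketch(new CRC32(), 256);
    bulk.updateLongs(vals, 0, vals.length);
    ChecksumBufferSketch bytewise = new ChecksumBufferSketch(new CRC32(), 256);
    for (long v : vals) {
      bytewise.updateLongBytewise(v);
    }
    System.out.println("checksums match: " + (bulk.getValue() == bytewise.getValue()));
  }
}
```

Both paths hand the same bytes to `Checksum.update(byte[], int, int)`; only the way the bytes reach the buffer differs, which is where the measured difference would come from.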
Transitioning from reading single long values to reading long arrays for live documents and Bloom filters (bitsets are commonly large in both scenarios) results in even greater performance gains.

The benchmark also shows that the single-long approach performs better on small arrays (8355.638 vs. 7521.843 ops/ms at size 10). This is likely because the cost of wrapping the `byte[]` in a `ByteBuffer` and a `LongBuffer` is not paid off when only a few bytes are copied. It could be improved by making `updateLongs(long[], int, int)` switch to a loop over `updateLong(long)` when the length to checksum fits in the buffer. What do you think?