jfboeuf commented on PR #13989:
URL: https://github.com/apache/lucene/pull/13989#issuecomment-2470657083

   Thank you for your feedback. Perhaps I misunderstood your point, but the implementation I propose only calls `Checksum.update(byte[])`. The change is in how the buffer is fed: it avoids reading byte by byte from the underlying `IndexInput` and copying byte by byte into the buffer. I created and pushed [a JMH benchmark to a separate branch](https://github.com/apache/lucene/commit/91482728ea1633213bc064ed5362be041101d1d5) comparing the current implementation to the one I propose. The results show a noticeable improvement:
   ```
   Benchmark                                                   (size)   Mode  Cnt     Score     Error   Units
   BufferedChecksumIndexInputBenchmark.decodeLongArray             10  thrpt   15  7521.843 ± 167.756  ops/ms
   BufferedChecksumIndexInputBenchmark.decodeLongArray           1000  thrpt   15  1004.213 ±   3.587  ops/ms
   BufferedChecksumIndexInputBenchmark.decodeLongArray         100000  thrpt   15    10.993 ±   0.102  ops/ms
   BufferedChecksumIndexInputBenchmark.decodeLongArrayOrig         10  thrpt   15  3865.018 ±  38.850  ops/ms
   BufferedChecksumIndexInputBenchmark.decodeLongArrayOrig       1000  thrpt   15    46.381 ±   0.293  ops/ms
   BufferedChecksumIndexInputBenchmark.decodeLongArrayOrig     100000  thrpt   15     0.475 ±   0.022  ops/ms
   BufferedChecksumIndexInputBenchmark.decodeSingleLongs           10  thrpt   15  8355.638 ±  52.745  ops/ms
   BufferedChecksumIndexInputBenchmark.decodeSingleLongs         1000  thrpt   15   212.133 ±   4.296  ops/ms
   BufferedChecksumIndexInputBenchmark.decodeSingleLongs       100000  thrpt   15     2.744 ±   0.021  ops/ms
   BufferedChecksumIndexInputBenchmark.decodeSingleLongsOrig       10  thrpt   15  3938.857 ±  42.751  ops/ms
   BufferedChecksumIndexInputBenchmark.decodeSingleLongsOrig     1000  thrpt   15    47.246 ±   0.444  ops/ms
   BufferedChecksumIndexInputBenchmark.decodeSingleLongsOrig   100000  thrpt   15     0.460 ±   0.020  ops/ms
   ```
   For large arrays, the improvement reaches roughly 23x when reading long arrays and about 6x when reading single long values. Switching from single-long reads to long-array reads for live documents and Bloom filters, where the bitsets are commonly large, yields even greater gains.
   
   The benchmark also shows that the single-long approach performs better on small arrays. This is likely the cost of wrapping the `byte[]` in a `ByteBuffer` and `LongBuffer`, which does not pay off when only a few bytes are copied. It could be improved by making `updateLongs(long[], int, int)` fall back to a loop over `updateLong(long)` when the length to checksum fits in the buffer (a rough sketch of that fallback follows). What do you think?
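   A hedged sketch of that fallback, assuming a fixed cutoff and a plain `CRC32`; the class name, threshold, and endianness are illustrative assumptions, not the proposed patch:
   ```java
   import java.nio.ByteBuffer;
   import java.nio.ByteOrder;
   import java.util.zip.CRC32;
   import java.util.zip.Checksum;

   // Hypothetical sketch: bulk-update the checksum for large long[] slices,
   // but fall back to per-long updates when the slice is small.
   final class ChecksumLongUpdater {
     private static final int BUFFER_SIZE = 8192;
     private static final int SMALL_THRESHOLD = BUFFER_SIZE / Long.BYTES; // assumed cutoff

     private final Checksum digest = new CRC32();
     private final byte[] scratch = new byte[Long.BYTES];

     void updateLong(long value) {
       // Serialize one long by hand (little-endian); no ByteBuffer/LongBuffer wrapping.
       for (int i = 0; i < Long.BYTES; i++) {
         scratch[i] = (byte) (value >>> (i * 8));
       }
       digest.update(scratch, 0, Long.BYTES);
     }

     void updateLongs(long[] values, int offset, int len) {
       if (len <= SMALL_THRESHOLD) {
         // Small slices: the wrapping cost would not pay off, so loop over updateLong.
         for (int i = offset; i < offset + len; i++) {
           updateLong(values[i]);
         }
         return;
       }
       // Large slices: serialize in one pass and feed the checksum in bulk.
       byte[] block = new byte[len * Long.BYTES];
       ByteBuffer.wrap(block).order(ByteOrder.LITTLE_ENDIAN).asLongBuffer().put(values, offset, len);
       digest.update(block, 0, block.length);
     }
   }
   ```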
   

