DataInput [lucene]

via GitHub Thu, 07 Dec 2023 10:12:08 -0800


jpountz commented on PR #12841:
URL: https://github.com/apache/lucene/pull/12841#issuecomment-1845870621


   @uschindler I pushed a quick and ugly implementation of reading group-vints via a 
`ByteBuffer` at 
https://github.com/apache/lucene/commit/9f5d9f7ab6777b6331c7e0456b5f7660cb64d55b.
 `DataInput` gets a new `readNBytes(int)` method that returns a `ByteBuffer`. 
`MMapDirectory` overrides it to return a slice of the current memory segment 
when the requested range does not cross a segment boundary. A utility method 
then reads sequences of variable-length integers by calling `readNBytes(17)` 
for each group of 4 integers and rewinding over any bytes it did not consume.
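
To make the read-17-then-rewind idea concrete, here is a minimal, self-contained sketch. It is not the code from the linked commit. It assumes a group layout of one flag byte holding four 2-bit "length minus one" fields, followed by each integer in 1 to 4 little-endian bytes, which gives the 17-byte upper bound (1 + 4 * 4). The class `GroupVIntSketch`, the `writeGroup`/`readGroup` helpers, and the static `readNBytes` stand-in operating on a plain `ByteBuffer` are all hypothetical names for illustration, and clamping the read at the end of the buffer is an assumption of this sketch.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class GroupVIntSketch {
  // Upper bound for one group: 1 flag byte + 4 ints * 4 bytes.
  static final int MAX_GROUP_BYTES = 17;

  // Hypothetical stand-in for the readNBytes(int) described above:
  // returns a view of the next n bytes and advances the source past them.
  static ByteBuffer readNBytes(ByteBuffer in, int n) {
    ByteBuffer slice = in.slice(); // view from position to limit
    slice.limit(n);
    in.position(in.position() + n);
    return slice;
  }

  // Encodes 4 ints (assumed layout): flag byte, then 1..4 little-endian bytes each.
  static void writeGroup(ByteBuffer out, int[] values, int offset) {
    int flagPos = out.position();
    out.put((byte) 0); // placeholder, patched once the lengths are known
    int flag = 0;
    for (int i = 0; i < 4; i++) {
      int v = values[offset + i];
      int numBytes = Math.max(1, (32 - Integer.numberOfLeadingZeros(v) + 7) / 8);
      flag |= (numBytes - 1) << (6 - 2 * i);
      for (int b = 0; b < numBytes; b++) {
        out.put((byte) (v >>> (8 * b)));
      }
    }
    out.put(flagPos, (byte) flag);
  }

  // Decodes one group: grab up to 17 bytes, decode, rewind what was not used.
  static void readGroup(ByteBuffer in, int[] dst, int offset) {
    // Assumption of this sketch: clamp at the end of the buffer.
    int n = Math.min(MAX_GROUP_BYTES, in.remaining());
    ByteBuffer group = readNBytes(in, n);
    int flag = group.get() & 0xFF;
    for (int i = 0; i < 4; i++) {
      int numBytes = ((flag >>> (6 - 2 * i)) & 0x03) + 1;
      int v = 0;
      for (int b = 0; b < numBytes; b++) {
        v |= (group.get() & 0xFF) << (8 * b);
      }
      dst[offset + i] = v;
    }
    // Rewind the source over the bytes this group did not consume.
    in.position(in.position() - group.remaining());
  }

  public static void main(String[] args) {
    int[] values = {1, 300, 70000, 1 << 30, 0, 255, 256, -1};
    ByteBuffer buf = ByteBuffer.allocate(2 * MAX_GROUP_BYTES);
    writeGroup(buf, values, 0);
    writeGroup(buf, values, 4);
    buf.flip();
    int[] decoded = new int[values.length];
    readGroup(buf, decoded, 0);
    readGroup(buf, decoded, 4);
    System.out.println(Arrays.toString(decoded));
  }
}
```

Every group in this sketch pays for a slice allocation plus a position rewind, which may or may not be related to the slowdown reported below.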
   
   Unfortunately, the benchmark reports terrible performance (about 45x slower 
than the baseline in the numbers below). I'll try to look more into why, but I 
thought you might have ideas. FWIW I verified that I'm actually using the 
specialized `MemorySegmentIndexInput#readNBytes` impl, and not the default 
impl, which is slow as well.
    - `mmap_byteBufferReadGroupVIntBaseline` is the current implementation
    - `mmap_byteBufferReadGroupVInt` is the optimized impl in this PR
    - `mmap_byteBufferReadGroupVInt_viaByteBuffer` is my new tentative impl on 
`ByteBuffer`s
   
   ```
   Benchmark                                                      (size)   Mode  Cnt   Score   Error   Units
   GroupVIntBenchmark.mmap_byteBufferReadGroupVInt                    64  thrpt    5  11.198 ± 0.160  ops/us
   GroupVIntBenchmark.mmap_byteBufferReadGroupVIntBaseline            64  thrpt    5   6.908 ± 0.385  ops/us
   GroupVIntBenchmark.mmap_byteBufferReadGroupVInt_viaByteBuffer      64  thrpt    5   0.155 ± 0.001  ops/us
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]
