jpountz commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1845870621
@uschindler I pushed a quick, ugly impl of reading group vints via a `ByteBuffer` at https://github.com/apache/lucene/commit/9f5d9f7ab6777b6331c7e0456b5f7660cb64d55b. `DataInput` gets a new `readNBytes(int)` method that returns a `ByteBuffer`. `MMapDirectory` overrides it to return a slice of the current memory segment when the buffer would not cross segment boundaries. A utility method then reads sequences of variable-length integers by calling `readNBytes(17)` for each group of 4 integers and rewinding the unused bytes.

Unfortunately, the benchmark reports terrible performance. I'll try to look more into why, but I thought you might have ideas. FWIW I verified that I'm actually using the specialized `MemorySegmentIndexInput#readNBytes` impl, and not the default impl, which is slow as well.

- `mmap_byteBufferReadGroupVIntBaseline` is how things are currently
- `mmap_byteBufferReadGroupVInt` is the optimized impl in this PR
- `mmap_byteBufferReadGroupVInt_viaByteBuffer` is my new tentative impl on `ByteBuffer`s.

```
Benchmark                                                      (size)   Mode  Cnt   Score   Error   Units
GroupVIntBenchmark.mmap_byteBufferReadGroupVInt                    64  thrpt    5  11.198 ± 0.160  ops/us
GroupVIntBenchmark.mmap_byteBufferReadGroupVIntBaseline            64  thrpt    5   6.908 ± 0.385  ops/us
GroupVIntBenchmark.mmap_byteBufferReadGroupVInt_viaByteBuffer      64  thrpt    5   0.155 ± 0.001  ops/us
```
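For anyone following along, here is a minimal, standalone sketch of the "grab up to 17 bytes, decode 4 ints, give back what was not used" idea on a plain `ByteBuffer`. It is not the code from the commit above. The class and method names (`GroupVIntSketch`, `writeGroup`, `readGroup`) are hypothetical, and the layout is an assumption: one flag byte holding four 2-bit "length minus 1" fields (first int in the top bits), followed by the ints in little-endian order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration only; names and byte layout are assumptions.
class GroupVIntSketch {
  // Masks that keep the low 1, 2, 3 or 4 bytes of a little-endian int.
  private static final int[] MASKS = {0xFF, 0xFFFF, 0xFFFFFF, 0xFFFFFFFF};

  // Worst case for one group: 1 flag byte + 4 ints of 4 bytes each.
  static final int MAX_GROUP_BYTES = 17;

  // Encodes src[offset..offset+3] as one group (demo helper).
  static void writeGroup(ByteBuffer out, int[] src, int offset) {
    int flagPos = out.position();
    out.put((byte) 0); // placeholder, patched once the lengths are known
    int flag = 0;
    for (int i = 0; i < 4; i++) {
      int v = src[offset + i];
      // Number of bytes needed for v, treating it as unsigned (1..4).
      int numBytes = Math.max(1, (32 - Integer.numberOfLeadingZeros(v) + 7) >>> 3);
      flag |= (numBytes - 1) << (6 - 2 * i);
      for (int b = 0; b < numBytes; b++) {
        out.put((byte) (v >>> (8 * b))); // little-endian byte order
      }
    }
    out.put(flagPos, (byte) flag);
  }

  // Decodes one group. Requires a LITTLE_ENDIAN buffer with at least
  // MAX_GROUP_BYTES readable bytes from the current position, so every
  // 4-byte absolute read stays in bounds.
  static void readGroup(ByteBuffer in, int[] dst, int offset) {
    int start = in.position();
    int flag = in.get(start) & 0xFF;
    int pos = start + 1;
    for (int i = 0; i < 4; i++) {
      int lenMinus1 = (flag >>> (6 - 2 * i)) & 0x03;
      // Always read 4 bytes, then mask off the bytes of the next value.
      dst[offset + i] = in.getInt(pos) & MASKS[lenMinus1];
      pos += lenMinus1 + 1;
    }
    // Advance only past the consumed bytes: the "rewind unused bytes" step.
    in.position(pos);
  }

  public static void main(String[] args) {
    int[] values = {1, 300, 70000, 16777216, 0, 255, 256, -1};
    // Over-allocate so the last group still has 17 readable bytes.
    ByteBuffer buf = ByteBuffer.allocate(2 * MAX_GROUP_BYTES).order(ByteOrder.LITTLE_ENDIAN);
    writeGroup(buf, values, 0);
    writeGroup(buf, values, 4);
    buf.rewind(); // position = 0, limit stays at capacity

    int[] decoded = new int[values.length];
    readGroup(buf, decoded, 0);
    readGroup(buf, decoded, 4);
    System.out.println(Arrays.toString(decoded));
  }
}
```

Note that `ByteBuffer#slice()` returns a big-endian buffer regardless of the parent's order, so a slice-based variant would have to call `order(ByteOrder.LITTLE_ENDIAN)` on every slice before decoding.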