[PR] Rewrite of the GroupVInt optimization without lamdas, varhandles and no code in subclasses [lucene]

via GitHub Fri, 22 Aug 2025 12:18:33 -0700


uschindler opened a new pull request, #15116:
URL: https://github.com/apache/lucene/pull/15116


   Hi,
   this is a rewrite of the GroupVInt optimization. The results are great for 
MMapDirectory (more than twice as fast than the baseline implementation. The 
code has no risk of varhandles or lambda optimization and no callsite pollution 
issue, because it basically requires no optimized implementation in subclasses.
   
   The idea is the following: Basically for GroupVIntReading in the branchless 
optimization we need random access to the IndexInput. The workaround in the 
currently existing code is to have a lambda or VarHandle to do random access as 
a callback.
   
   But actually we have the correct interface available out of box: 
`RandomAcessInput`.
   
   I simply changed the `GroupVIntUtil.readGroupVInt()` method to check if the 
`DataInput` passed in also implements `RandomAccessInput` and allows seeking 
(`IndexInput`). If theres enough free space till end of file _and_ it 
implements both interfaces the optimized impl is used:
   - it casts the DataInput to RandomAccessInput for random access
   - it casts the DataInput to IndexInput to get access to filesize and file 
pointer and seeking to cleanup position after read.
   
   If any condition does not apply it falls back to the sequential impl with 
branches.
   
   The good thing with this patch;
   - No need to implement anything in custom IndexInputs.
   - Out of box huge speedup without any custom impl in IndexInput if it 
implements random access
   
   Small downsides:
   - NIOFSDir no longer has the optimization (I don't care) because its 
IndexInput does not support random access. You see this in the benchmark - same 
numbers!
   - ByteBuffersDirectory implement RandomAccess, but the code of 
ByteBuffersIndexInput has too many bounds checks in the positional accesses. 
IMHO, we should for now ignore that and rewrite ByteBuffersDirectory in main 
branch to reuse MemorySegmentIndexInput and the try/catch logic. This allows us 
also to have off-heap directory with easy release of memory (the current 
ByteBuffersDirectory has the downside that offheap needs GC to clean up offheap 
memeory as we cannot free it explicit. So when we reimplement 
ByteBuffersDirectory using the same code like MemorySegmentIndexInput just with 
MemorySegments from on-heap or shared off heap arenas we have the great 
performance of MMapDir also for NRTCcachingDirectory - I will open a PR for 
main branch to get this as a new Directoryimpl into core.
   
   Now the numbers:
   ```
   # JMH version: 1.37
   # VM version: JDK 24, OpenJDK 64-Bit Server VM, 24+36-3646
   # VM invoker: C:\Program Files\Java\jdk-24\bin\java.exe
   # VM options: --add-modules=jdk.unsupported
   # Blackhole mode: compiler (auto-detected, use 
-Djmh.blackhole.autoDetect=false to disable)
   # Warmup: 3 iterations, 5 s each
   # Measurement: 5 iterations, 20 s each
   # Timeout: 10 min per iteration
   # Threads: 1 thread, will synchronize iterations
   # Benchmark mode: Throughput, ops/time
   # Parameters: (size = 64)
   
   Benchmark                                                            (size)  
 Mode  Cnt  Score   Error   Units
   GroupVIntBenchmark.benchByteArrayDataInput_readGroupVInt                 64  
thrpt    5  2,154 ± 0,356  ops/us
   GroupVIntBenchmark.benchByteArrayDataInput_readVInt                      64  
thrpt    5  1,943 ± 0,170  ops/us
   GroupVIntBenchmark.benchByteBuffersIndexInput_readGroupVInt              64  
thrpt    5  0,612 ± 0,024  ops/us
   GroupVIntBenchmark.benchByteBuffersIndexInput_readGroupVIntBaseline      64  
thrpt    5  0,763 ± 0,044  ops/us
   GroupVIntBenchmark.benchMMapDirectoryInputs_readGroupVInt                64  
thrpt    5  4,983 ± 0,362  ops/us
   GroupVIntBenchmark.benchMMapDirectoryInputs_readGroupVIntBaseline        64  
thrpt    5  1,993 ± 0,455  ops/us
   GroupVIntBenchmark.benchMMapDirectoryInputs_readVInt                     64  
thrpt    5  2,544 ± 0,115  ops/us
   GroupVIntBenchmark.benchNIOFSDirectoryInputs_readGroupVInt               64  
thrpt    5  2,461 ± 0,317  ops/us
   GroupVIntBenchmark.benchNIOFSDirectoryInputs_readGroupVIntBaseline       64  
thrpt    5  2,027 ± 0,422  ops/us
   GroupVIntBenchmark.bench_writeGroupVInt                                  64  
thrpt    5  1,347 ± 0,138  ops/us
   ```
   
   Some final TODOs: 
   - DataInput is now free of any GroupVint specialization or method siganures. 
When backporting to 10.x we must keep the old methods as noop for people who 
have subclassed and implemented (OpenSearch does this for their encryption 
directory).
   - DataOutput still has write methods for both GroupVInt variants (long/int). 
We should reallay really remove that and implement it only in GroupVIntUtil!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[PR] Rewrite of the GroupVInt optimization without lamdas, varhandles and no code in subclasses [lucene]

Reply via email to