uschindler opened a new pull request, #15116: URL: https://github.com/apache/lucene/pull/15116
Hi, this is a rewrite of the GroupVInt optimization. The results are great for MMapDirectory (more than twice as fast than the baseline implementation. The code has no risk of varhandles or lambda optimization and no callsite pollution issue, because it basically requires no optimized implementation in subclasses. The idea is the following: Basically for GroupVIntReading in the branchless optimization we need random access to the IndexInput. The workaround in the currently existing code is to have a lambda or VarHandle to do random access as a callback. But actually we have the correct interface available out of box: `RandomAcessInput`. I simply changed the `GroupVIntUtil.readGroupVInt()` method to check if the `DataInput` passed in also implements `RandomAccessInput` and allows seeking (`IndexInput`). If theres enough free space till end of file _and_ it implements both interfaces the optimized impl is used: - it casts the DataInput to RandomAccessInput for random access - it casts the DataInput to IndexInput to get access to filesize and file pointer and seeking to cleanup position after read. If any condition does not apply it falls back to the sequential impl with branches. The good thing with this patch; - No need to implement anything in custom IndexInputs. - Out of box huge speedup without any custom impl in IndexInput if it implements random access Small downsides: - NIOFSDir no longer has the optimization (I don't care) because its IndexInput does not support random access. You see this in the benchmark - same numbers! - ByteBuffersDirectory implement RandomAccess, but the code of ByteBuffersIndexInput has too many bounds checks in the positional accesses. IMHO, we should for now ignore that and rewrite ByteBuffersDirectory in main branch to reuse MemorySegmentIndexInput and the try/catch logic. This allows us also to have off-heap directory with easy release of memory (the current ByteBuffersDirectory has the downside that offheap needs GC to clean up offheap memeory as we cannot free it explicit. So when we reimplement ByteBuffersDirectory using the same code like MemorySegmentIndexInput just with MemorySegments from on-heap or shared off heap arenas we have the great performance of MMapDir also for NRTCcachingDirectory - I will open a PR for main branch to get this as a new Directoryimpl into core. Now the numbers: ``` # JMH version: 1.37 # VM version: JDK 24, OpenJDK 64-Bit Server VM, 24+36-3646 # VM invoker: C:\Program Files\Java\jdk-24\bin\java.exe # VM options: --add-modules=jdk.unsupported # Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable) # Warmup: 3 iterations, 5 s each # Measurement: 5 iterations, 20 s each # Timeout: 10 min per iteration # Threads: 1 thread, will synchronize iterations # Benchmark mode: Throughput, ops/time # Parameters: (size = 64) Benchmark (size) Mode Cnt Score Error Units GroupVIntBenchmark.benchByteArrayDataInput_readGroupVInt 64 thrpt 5 2,154 ± 0,356 ops/us GroupVIntBenchmark.benchByteArrayDataInput_readVInt 64 thrpt 5 1,943 ± 0,170 ops/us GroupVIntBenchmark.benchByteBuffersIndexInput_readGroupVInt 64 thrpt 5 0,612 ± 0,024 ops/us GroupVIntBenchmark.benchByteBuffersIndexInput_readGroupVIntBaseline 64 thrpt 5 0,763 ± 0,044 ops/us GroupVIntBenchmark.benchMMapDirectoryInputs_readGroupVInt 64 thrpt 5 4,983 ± 0,362 ops/us GroupVIntBenchmark.benchMMapDirectoryInputs_readGroupVIntBaseline 64 thrpt 5 1,993 ± 0,455 ops/us GroupVIntBenchmark.benchMMapDirectoryInputs_readVInt 64 thrpt 5 2,544 ± 0,115 ops/us GroupVIntBenchmark.benchNIOFSDirectoryInputs_readGroupVInt 64 thrpt 5 2,461 ± 0,317 ops/us GroupVIntBenchmark.benchNIOFSDirectoryInputs_readGroupVIntBaseline 64 thrpt 5 2,027 ± 0,422 ops/us GroupVIntBenchmark.bench_writeGroupVInt 64 thrpt 5 1,347 ± 0,138 ops/us ``` Some final TODOs: - DataInput is now free of any GroupVint specialization or method siganures. When backporting to 10.x we must keep the old methods as noop for people who have subclassed and implemented (OpenSearch does this for their encryption directory). - DataOutput still has write methods for both GroupVInt variants (long/int). We should reallay really remove that and implement it only in GroupVIntUtil! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org