uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1421414875
########## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/GroupVIntBenchmark.java: ########## @@ -140,10 +155,12 @@ public void init() throws Exception { } initByteBufferInput(docs); initArrayInput(docs); + initNioInput(docs); + initByteBuffersInput(docs); } @Benchmark - public void byteBufferReadVInt(Blackhole bh) throws IOException { + public void mmap_byteBufferReadVInt(Blackhole bh) throws IOException { Review Comment: I would rename all those benchmark methods to have better names including the exact implementation's class name. Remove internal details like ByteBuffer vs. something else, it should just like: `benchMMapDirectoryInputs`, `benchBufferedIndexInput`, `benchByteBuffersIndexInput`,... ########## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ########## @@ -324,24 +324,9 @@ private void readGroupVInt(long[] dst, int offset) throws IOException { try { final int flag = curSegment.get(LAYOUT_BYTE, curPosition++) & 0xFF; - - final int n1Minus1 = flag >> 6; - final int n2Minus1 = (flag >> 4) & 0x03; - final int n3Minus1 = (flag >> 2) & 0x03; - final int n4Minus1 = flag & 0x03; - - dst[offset] = - curSegment.get(LAYOUT_LE_INT, curPosition) & GroupVIntUtil.GROUP_VINT_MASKS[n1Minus1]; - curPosition += 1 + n1Minus1; - dst[offset + 1] = - curSegment.get(LAYOUT_LE_INT, curPosition) & GroupVIntUtil.GROUP_VINT_MASKS[n2Minus1]; - curPosition += 1 + n2Minus1; - dst[offset + 2] = - curSegment.get(LAYOUT_LE_INT, curPosition) & GroupVIntUtil.GROUP_VINT_MASKS[n3Minus1]; - curPosition += 1 + n3Minus1; - dst[offset + 3] = - curSegment.get(LAYOUT_LE_INT, curPosition) & GroupVIntUtil.GROUP_VINT_MASKS[n4Minus1]; - curPosition += 1 + n4Minus1; + curPosition += + GroupVIntUtil.readGroupVInt( + flag, p -> curSegment.get(LAYOUT_LE_INT, p), curPosition, dst, offset); Review Comment: Sorry, I missed the `curPosition++`. It also have some problems because it does not wait till after the method has executed until it incremenets, so in the case of Exceptions the curPosition is wrong afterwards (as it won't be rolled back). If you look at `readByte()`, it increments after the method call, which is different (this is tricky, i know, but there's a reason why it is done like that). But to make it simple: Please simply call this.readByte() here, no need to inline the call. ########## lucene/core/src/java/org/apache/lucene/store/BufferedIndexInput.java: ########## @@ -149,6 +150,29 @@ public final int readInt() throws IOException { } } + @Override Review Comment: Hi, would it not be a good idea now to remove the private method and put all of the logic into the static utility method? Basically we need to pass "this" to the method, so it is able to do `readVInt()` and also `readByte()` to get the flag. It is a bit complicated to describe, if you like I can create a patch (and possbly also push it to your repository). At end, `readGroupVInts(long[] dst, int limit)` should more or less only call the static helper utility method and pass the `IntReader` functional interface. The method should return the number of bytes read using direct access (using the `IntReader`) not the ones read by the normal `read*` methods invoked on `this`. The caller could then increment the file position after it returns. MMapDircetory inputs would wrap the whole method using `try...catch`. This would make it easy to implement it for all directories: - ByteBufferIndexInput (without "s"), used by MMapDirectory in Java<19 - MemorySegmentIndexInput - ByteBuffersIndexInput - BufferedIndexInput Then I would agree to this patch: Directory implementations can choose how to implement it but without needing to know what group vints are. This would also simplify testing (we won't need too many tests as all implementations just can the same method and give (optionally) the lambda to allow random access. Of course the should be a helper static method giving the MAX size of encoded groupVInt based on limit, so code can check before, if remaing size of buffer/segment/array is large enough to work on, so it could fallback to default impl (call super `readGroupVInts`) if needed. I would not care about the fact that for larger limits the blocks must be larger, but for `MMapDirectory` it is very unlikely in production that you hit the boundary (the boundaries are at 16 GiB for Java 19+ and 1 GiB for Java<19). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org