uschindler commented on code in PR #12841:
URL: https://github.com/apache/lucene/pull/12841#discussion_r1421414875

##########
lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/GroupVIntBenchmark.java:
##########
@@ -140,10 +155,12 @@ public void init() throws Exception {
     }
     initByteBufferInput(docs);
     initArrayInput(docs);
+    initNioInput(docs);
+    initByteBuffersInput(docs);
   }
 
   @Benchmark
-  public void byteBufferReadVInt(Blackhole bh) throws IOException {
+  public void mmap_byteBufferReadVInt(Blackhole bh) throws IOException {

Review Comment:
   I would rename all those benchmark methods to have better names including 
the exact implementation's class name. Remove internal details like ByteBuffer 
vs. something else, it should just like: `benchMMapDirectoryInputs`, 
`benchBufferedIndexInput`, `benchByteBuffersIndexInput`,...



##########
lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java:
##########
@@ -324,24 +324,9 @@ private void readGroupVInt(long[] dst, int offset) throws 
IOException {
 
     try {
       final int flag = curSegment.get(LAYOUT_BYTE, curPosition++) & 0xFF;
-
-      final int n1Minus1 = flag >> 6;
-      final int n2Minus1 = (flag >> 4) & 0x03;
-      final int n3Minus1 = (flag >> 2) & 0x03;
-      final int n4Minus1 = flag & 0x03;
-
-      dst[offset] =
-          curSegment.get(LAYOUT_LE_INT, curPosition) & 
GroupVIntUtil.GROUP_VINT_MASKS[n1Minus1];
-      curPosition += 1 + n1Minus1;
-      dst[offset + 1] =
-          curSegment.get(LAYOUT_LE_INT, curPosition) & 
GroupVIntUtil.GROUP_VINT_MASKS[n2Minus1];
-      curPosition += 1 + n2Minus1;
-      dst[offset + 2] =
-          curSegment.get(LAYOUT_LE_INT, curPosition) & 
GroupVIntUtil.GROUP_VINT_MASKS[n3Minus1];
-      curPosition += 1 + n3Minus1;
-      dst[offset + 3] =
-          curSegment.get(LAYOUT_LE_INT, curPosition) & 
GroupVIntUtil.GROUP_VINT_MASKS[n4Minus1];
-      curPosition += 1 + n4Minus1;
+      curPosition +=
+          GroupVIntUtil.readGroupVInt(
+              flag, p -> curSegment.get(LAYOUT_LE_INT, p), curPosition, dst, 
offset);

Review Comment:
   Sorry, I missed the `curPosition++`. It also have some problems because it 
does not wait till after the method has executed until it incremenets, so in 
the case of Exceptions the curPosition is wrong afterwards (as it won't be 
rolled back).
   If you look at `readByte()`, it increments after the method call, which is 
different (this is tricky, i know, but there's a reason why it is done like 
that).
   
   But to make it simple: Please simply call this.readByte() here, no need to 
inline the call.



##########
lucene/core/src/java/org/apache/lucene/store/BufferedIndexInput.java:
##########
@@ -149,6 +150,29 @@ public final int readInt() throws IOException {
     }
   }
 
+  @Override

Review Comment:
   Hi, would it not be a good idea now to remove the private method and put all 
of the logic into the static utility method? Basically we need to pass "this" 
to the method, so it is able to do `readVInt()` and also `readByte()` to get 
the flag.
   
   It is a bit complicated to describe, if you like I can create a patch (and 
possbly also push it to your repository). At end, `readGroupVInts(long[] dst, 
int limit)` should more or less only call the static helper utility method and 
pass the `IntReader` functional interface. The method should return the number 
of bytes read using direct access (using the `IntReader`) not the ones read by 
the normal `read*` methods invoked on `this`. The caller could then increment 
the file position after it returns.
   
   MMapDircetory inputs would wrap the whole method using `try...catch`. This 
would make it easy to implement it for all directories:
   - ByteBufferIndexInput (without "s"), used by MMapDirectory in Java<19
   - MemorySegmentIndexInput
   - ByteBuffersIndexInput
   - BufferedIndexInput
   
   Then I would agree to this patch: Directory implementations can choose how 
to implement it but without needing to know what group vints are. This would 
also simplify testing (we won't need too many tests as all implementations just 
can the same method and give (optionally) the lambda to allow random access.
   
   Of course the should be a helper static method giving the MAX size of 
encoded groupVInt based on limit, so code can check before, if remaing size of 
buffer/segment/array is large enough to work on, so it could fallback to 
default impl (call super `readGroupVInts`) if needed. I would not care about 
the fact that for larger limits the blocks must be larger, but for 
`MMapDirectory` it is very unlikely in production that you hit the boundary 
(the boundaries are at 16 GiB for Java 19+ and 1 GiB for Java<19).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to