[GitHub] [lucene] tang-hi commented on a diff in pull request #966: LUCENE-10619: Optimize the writeBytes in TermsHashPerField

GitBox Sat, 18 Jun 2022 19:56:34 -0700


tang-hi commented on code in PR #966:
URL: https://github.com/apache/lucene/pull/966#discussion_r901039371



##########
lucene/core/src/java/org/apache/lucene/index/TermsHashPerField.java:
##########
@@ -230,9 +230,29 @@ final void writeByte(int stream, byte b) {
   }
 
   final void writeBytes(int stream, byte[] b, int offset, int len) {
-    // TODO: optimize
     final int end = offset + len;
-    for (int i = offset; i < end; i++) writeByte(stream, b[i]);
+    int streamAddress = streamAddressOffset + stream;
+    int upto = termStreamAddressBuffer[streamAddress];
+    byte[] slice = bytePool.buffers[upto >> ByteBlockPool.BYTE_BLOCK_SHIFT];
+    assert slice != null;
+    int sliceOffset = upto & ByteBlockPool.BYTE_BLOCK_MASK;
+
+    while (slice[sliceOffset] == 0 && offset < end) {
+      slice[sliceOffset++] = b[offset++];
+      (termStreamAddressBuffer[streamAddress])++;
+    }
+
+    while (offset < end) {
+      int offsetAndLength = bytePool.allocKnownSizeSlice(slice, sliceOffset);
+      sliceOffset = offsetAndLength >> 8;
+      int sliceLength = offsetAndLength & 0xff;
+      slice = bytePool.buffer;
+      int writeLength = Math.min(sliceLength - 1, end - offset);
+      System.arraycopy(b, offset, slice, sliceOffset, writeLength);
+      sliceOffset += writeLength;
+      offset += writeLength;
+      termStreamAddressBuffer[streamAddress] = sliceOffset + 
bytePool.byteOffset;
+    }

Review Comment:
   > Wow, this is an interesting change! Can you help me better understand this 
optimization?
   > 
   > Is `System.arraycopy()` much faster than manually copying like in your 
while loop above. I would guess it is similar since there are no extra checks 
for primitive or object type [[ref: fast array copy 
comparison](https://www.javaspecialists.eu/archive/Issue124-Copying-Arrays-Fast.html)],
 but I'm not sure. This is probably better than calling `writeByte()` in a loop 
because we don't need to fetch all the pointers every time.
   > 
   > A follow up idea: I think `allocSlice()` (or `allocKnownSizeSlice()` 
here), always returns the next higher sized buffer slice.. Maybe 
`allocKnownSizeSlice` could instead return a slice appropriate for the length 
of pending items we need to write `(end - offset)`? We could then avoid having 
to call alloc multiple times for larger writes.
   
   thanks for your comment
   1. Is System.arraycopy() much faster than manually copying like in your 
while loop above?
       I think it's faster than  a manual for loop  to copy. Maybe you are 
interested in taking a look at this answer[Is it better to use 
System.arraycopy(...) than a for loop for copying 
arrays?](https://stackoverflow.com/a/49269866)
   2. Maybe allocKnownSizeSlice could instead return a slice appropriate for 
the length of pending items we need to write (end - offset)?
       In the begining, I also want to implement in this way.But I found the 
method `nextSlice` in `ByteSliceReader`  always read slice with size 5, 14, 20, 
30 ..., 200,200 which is same with `LEVEL_SIZE_ARRAY` in `ByteBlockPool`. So I 
have to allocate slice with next-level-size instead of `end - offset`,



##########
lucene/core/src/java/org/apache/lucene/index/TermsHashPerField.java:
##########
@@ -230,9 +230,29 @@ final void writeByte(int stream, byte b) {
   }
 
   final void writeBytes(int stream, byte[] b, int offset, int len) {
-    // TODO: optimize
     final int end = offset + len;
-    for (int i = offset; i < end; i++) writeByte(stream, b[i]);
+    int streamAddress = streamAddressOffset + stream;
+    int upto = termStreamAddressBuffer[streamAddress];
+    byte[] slice = bytePool.buffers[upto >> ByteBlockPool.BYTE_BLOCK_SHIFT];
+    assert slice != null;
+    int sliceOffset = upto & ByteBlockPool.BYTE_BLOCK_MASK;
+
+    while (slice[sliceOffset] == 0 && offset < end) {
+      slice[sliceOffset++] = b[offset++];
+      (termStreamAddressBuffer[streamAddress])++;
+    }
+
+    while (offset < end) {
+      int offsetAndLength = bytePool.allocKnownSizeSlice(slice, sliceOffset);
+      sliceOffset = offsetAndLength >> 8;
+      int sliceLength = offsetAndLength & 0xff;
+      slice = bytePool.buffer;
+      int writeLength = Math.min(sliceLength - 1, end - offset);
+      System.arraycopy(b, offset, slice, sliceOffset, writeLength);
+      sliceOffset += writeLength;
+      offset += writeLength;
+      termStreamAddressBuffer[streamAddress] = sliceOffset + 
bytePool.byteOffset;
+    }

Review Comment:
   thanks for your comment
   1. Is System.arraycopy() much faster than manually copying like in your 
while loop above?
       I think it's faster than  a manual for loop  to copy. Maybe you are 
interested in taking a look at this answer[Is it better to use 
System.arraycopy(...) than a for loop for copying 
arrays?](https://stackoverflow.com/a/49269866)
   2. Maybe allocKnownSizeSlice could instead return a slice appropriate for 
the length of pending items we need to write (end - offset)?
       In the begining, I also want to implement in this way.But I found the 
method `nextSlice` in `ByteSliceReader`  always read slice with size 5, 14, 20, 
30 ..., 200,200 which is same with `LEVEL_SIZE_ARRAY` in `ByteBlockPool`. So I 
have to allocate slice with next-level-size instead of `end - offset`,



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [lucene] tang-hi commented on a diff in pull request #966: LUCENE-10619: Optimize the writeBytes in TermsHashPerField

Reply via email to