luyuncheng commented on code in PR #987:
URL: https://github.com/apache/lucene/pull/987#discussion_r923100057


##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/DeflateWithPresetDictCompressionMode.java:
##########
@@ -163,12 +165,16 @@ private static class DeflateWithPresetDictCompressor extends Compressor {
     final Deflater compressor;
     final BugfixDeflater_JDK8252739 deflaterBugfix;
     byte[] compressed;
+    byte[] bufferDict;
+    byte[] bufferBlock;
     boolean closed;
 
     DeflateWithPresetDictCompressor(int level) {
       compressor = new Deflater(level, true);
       deflaterBugfix = BugfixDeflater_JDK8252739.createBugfix(compressor);
       compressed = new byte[64];
+      bufferDict = BytesRef.EMPTY_BYTES;
+      bufferBlock = BytesRef.EMPTY_BYTES;
     }
 
     private void doCompress(byte[] bytes, int off, int len, DataOutput out) throws IOException {

Review Comment:
   ++ ++, at [commits(ebcbd)](https://github.com/apache/lucene/blob/ebcbdd05fbf114ca15d04cd4a9172eb51fab9a1a/lucene/core/src/java/org/apache/lucene/codecs/lucene90/DeflateWithPresetDictCompressionMode.java) I removed it.



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/DeflateWithPresetDictCompressionMode.java:
##########
@@ -199,23 +205,65 @@ private void doCompress(byte[] bytes, int off, int len, DataOutput out) throws I
       out.writeBytes(compressed, totalCount);
     }
 
+    private void doCompress(ByteBuffer bytes, int len, DataOutput out) throws IOException {
+      if (len == 0) {
+        out.writeVInt(0);
+        return;
+      }
+      compressor.setInput(bytes);
+      compressor.finish();
+      if (compressor.needsInput()) {
+        throw new IllegalStateException();
+      }
+
+      int totalCount = 0;
+      for (; ; ) {
+        final int count =
+            compressor.deflate(compressed, totalCount, compressed.length - totalCount);
+        totalCount += count;
+        assert totalCount <= compressed.length;
+        if (compressor.finished()) {
+          break;
+        } else {
+          compressed = ArrayUtil.grow(compressed);
+        }
+      }
+
+      out.writeVInt(totalCount);
+      out.writeBytes(compressed, totalCount);
+    }
+
     @Override
-    public void compress(byte[] bytes, int off, int len, DataOutput out) throws IOException {
+    public void compress(ByteBuffersDataInput buffersInput, DataOutput out) throws IOException {
+      final int len = (int) (buffersInput.size() - buffersInput.position());
+      final int end = (int) buffersInput.size();
       final int dictLength = len / (NUM_SUB_BLOCKS * DICT_SIZE_FACTOR);
       final int blockLength = (len - dictLength + NUM_SUB_BLOCKS - 1) / NUM_SUB_BLOCKS;
       out.writeVInt(dictLength);
       out.writeVInt(blockLength);
-      final int end = off + len;
 
       // Compress the dictionary first
       compressor.reset();
-      doCompress(bytes, off, dictLength, out);
+      bufferDict = ArrayUtil.growNoCopy(bufferDict, dictLength);
+      buffersInput.readBytes(bufferDict, 0, dictLength);
+      doCompress(bufferDict, 0, dictLength, out);
 
       // And then sub blocks
-      for (int start = off + dictLength; start < end; start += blockLength) {
+      for (int start = dictLength; start < end; start += blockLength) {
         compressor.reset();
-        deflaterBugfix.setDictionary(bytes, off, dictLength);
-        doCompress(bytes, start, Math.min(blockLength, off + len - start), out);
+        deflaterBugfix.setDictionary(bufferDict, 0, dictLength);
+        int l = Math.min(blockLength, len - start);
+        // if [start, start + len] stays in one ByteBuffer, we can avoid a memory copy
+        // otherwise we need to copy bytes into one contiguous byte array
+        ByteBuffer bb = buffersInput.sliceOne(start, l);

Review Comment:
   Nice suggestion. At [commits(ebcbd...)](https://github.com/apache/lucene/blob/ebcbdd05fbf114ca15d04cd4a9172eb51fab9a1a/lucene/core/src/java/org/apache/lucene/store/ByteBuffersDataInput.java#L175) I moved this logic into `ByteBuffersDataInput#readBytes`; the `readBytes` method determines whether to use the internal buffers directly or to copy into a new buffer.
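   A minimal sketch of that idea, assuming a fixed internal block size and a caller-supplied scratch array. The class, fields, and method signature below are illustrative only, not the actual `ByteBuffersDataInput` internals: the single-block case returns a slice of the existing buffer, and only the boundary-crossing case copies.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch only: block layout, names, and signature are assumptions
// for illustration, not the actual ByteBuffersDataInput implementation.
class SingleBlockReadSketch {
  private final ByteBuffer[] blocks; // internal buffers, each with limit == blockSize
  private final int blockSize;

  SingleBlockReadSketch(ByteBuffer[] blocks, int blockSize) {
    this.blocks = blocks;
    this.blockSize = blockSize;
  }

  /**
   * Returns a view of [pos, pos + len) without copying when the range stays inside
   * one internal block; otherwise copies into {@code scratch} (assumed >= len bytes)
   * and wraps it.
   */
  ByteBuffer readBytes(long pos, int len, byte[] scratch) {
    int block = (int) (pos / blockSize);
    int offset = (int) (pos % blockSize);
    if (offset + len <= blockSize) {
      // fast path: the range lies in a single block, expose it as a slice (no copy)
      ByteBuffer dup = blocks[block].duplicate();
      dup.position(offset);
      dup.limit(offset + len);
      return dup.slice();
    }
    // slow path: the range crosses a block boundary, copy into one contiguous array
    int copied = 0;
    while (copied < len) {
      ByteBuffer src = blocks[block].duplicate();
      src.position(offset);
      int chunk = Math.min(len - copied, src.remaining());
      src.get(scratch, copied, chunk);
      copied += chunk;
      block++;
      offset = 0;
    }
    return ByteBuffer.wrap(scratch, 0, len);
  }
}
```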



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsWriter.java:
##########
@@ -519,7 +518,13 @@ private void copyOneDoc(Lucene90CompressingStoredFieldsReader reader, int docID)
     assert reader.getVersion() == VERSION_CURRENT;
     SerializedDocument doc = reader.document(docID);
     startDocument();
-    bufferedDocs.copyBytes(doc.in, doc.length);
+
+    if (doc.in instanceof ByteArrayDataInput) {
+      // reuse ByteArrayDataInput to reduce memory copy
+      bufferedDocs.copyBytes((ByteArrayDataInput) doc.in, doc.length);
+    } else {
+      bufferedDocs.copyBytes(doc.in, doc.length);
+    }

Review Comment:
   LGTM. I deleted this code in this PR, and I will open a new issue to reduce the memory copy in `copyOneDoc`.



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsWriter.java:
##########
@@ -247,21 +249,18 @@ private void flush(boolean force) throws IOException {
     writeHeader(docBase, numBufferedDocs, numStoredFields, lengths, sliced, dirtyChunk);
 
     // compress stored fields to fieldsStream.
-    //
-    // TODO: do we need to slice it since we already have the slices in the buffer? Perhaps
-    // we should use max-block-bits restriction on the buffer itself, then we won't have to check it
-    // here.
-    byte[] content = bufferedDocs.toArrayCopy();
-    bufferedDocs.reset();
-
     if (sliced) {
-      // big chunk, slice it
-      for (int compressed = 0; compressed < content.length; compressed += chunkSize) {
-        compressor.compress(
-            content, compressed, Math.min(chunkSize, content.length - compressed), fieldsStream);
+      // big chunk, slice it, using ByteBuffersDataInput to avoid a memory copy
+      ByteBuffersDataInput bytebuffers = bufferedDocs.toDataInput();
+      final int capacity = (int) bytebuffers.size();
+      for (int compressed = 0; compressed < capacity; compressed += chunkSize) {
+        int l = Math.min(chunkSize, capacity - compressed);
+        ByteBuffersDataInput bbdi = bytebuffers.slice(compressed, l);
+        compressor.compress(bbdi, fieldsStream);
       }
     } else {
-      compressor.compress(content, 0, content.length, fieldsStream);
+      ByteBuffersDataInput bytebuffers = bufferedDocs.toDataInput();

Review Comment:
   ++, at [commits(ebcbd...)](https://github.com/apache/lucene/blob/ebcbdd05fbf114ca15d04cd4a9172eb51fab9a1a/lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsWriter.java#L249) I fixed it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

