Re: [PR] cache preset dict for LZ4WithPresetDictDecompressor [lucene]

via GitHub Tue, 01 Apr 2025 23:16:33 -0700


jainankitk commented on code in PR #14397:
URL: https://github.com/apache/lucene/pull/14397#discussion_r2021750116



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/LZ4WithPresetDictCompressionMode.java:
##########
@@ -98,12 +98,17 @@ public void decompress(DataInput in, int originalLength, int offset, int length,
       final int blockLength = in.readVInt();
 
       final int numBlocks = readCompressedLengths(in, originalLength, dictLength, blockLength);
-
-      buffer = ArrayUtil.growNoCopy(buffer, dictLength + blockLength);
       bytes.length = 0;
-      // Read the dictionary
-      if (LZ4.decompress(in, dictLength, buffer, 0) != dictLength) {
-        throw new CorruptIndexException("Illegal dict length", in);
+      if (reused) {
+        assert buffer.length >= dictLength + blockLength;
+        in.skipBytes(compressedLengths[0]);
+      } else {
+        // Read the dictionary
+        buffer = ArrayUtil.growNoCopy(buffer, dictLength + blockLength);
+        if (LZ4.decompress(in, dictLength, buffer, 0) != dictLength) {
+          throw new CorruptIndexException("Illegal dict length", in);
+        }
+        reused = true;

Review Comment:
   I wonder whether we should expose a metric that counts how many times the 
   cached dictionary was reused versus how many times it had to be read and 
   decompressed from disk. That would give useful insight into how effective 
   this change is.
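
   One way such a metric could look, as a minimal sketch only: the class 
   `DictReuseStats` and its method names are hypothetical and are not part of 
   Lucene or of this PR. It keeps two counters, one for dictionary reuses and 
   one for dictionary reads, and derives a reuse ratio from them.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper (not a Lucene API): counts how often the preset
// dictionary was reused versus read and decompressed again.
final class DictReuseStats {
  private final LongAdder reuses = new LongAdder();
  private final LongAdder reads = new LongAdder();

  // Call when the cached dictionary is reused and its bytes are skipped.
  void recordReuse() {
    reuses.increment();
  }

  // Call when the dictionary has to be decompressed from the input.
  void recordRead() {
    reads.increment();
  }

  long reuseCount() {
    return reuses.sum();
  }

  long readCount() {
    return reads.sum();
  }

  // Fraction of decompress calls that reused the dictionary; 0.0 if none yet.
  double reuseRatio() {
    long hits = reuses.sum();
    long total = hits + reads.sum();
    return total == 0 ? 0.0 : (double) hits / total;
  }
}
```

   Under these assumptions, `recordReuse()` would be called in the 
   `if (reused)` branch of the hunk above and `recordRead()` in the `else` 
   branch. `LongAdder` is used so that a shared instance stays cheap to update 
   from multiple threads.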



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


