Re: [PR] cache preset dict for LZ4WithPresetDictDecompressor [lucene]

via GitHub Thu, 17 Apr 2025 06:49:32 -0700


kkewwei commented on code in PR #14397:
URL: https://github.com/apache/lucene/pull/14397#discussion_r2048995276



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsReader.java:
##########
@@ -512,6 +512,7 @@ private void doReset(int docID) throws IOException {
           bytes.offset = bytes.length = 0;
           for (int decompressed = 0; decompressed < totalLength; ) {
             final int toDecompress = Math.min(totalLength - decompressed, chunkSize);
+            decompressor.reset();
             decompressor.decompress(fieldsStream, toDecompress, 0, toDecompress, spare);

Review Comment:
   I tried relying only on the outer `reuseIfPossible` flag to decide whether to cache the preset dict, but that does not work. In the following case, the caller must call `reset` to clear the cache. We have two chunks:
   1. chunk0 [doc0(length>0)]
   2. chunk1 [doc0(length=0), doc1(length=1)]
   
   The steps are as follows:
   1. Reading chunk0/doc0: `reuseIfPossible`=false.
   2. Reading chunk1/doc0: `reuseIfPossible`=false. As the length is 0, Lucene does not read the preset dict, so the preset dict is not cached.
   3. Reading chunk1/doc1: doc1 is in chunk1, so `reuseIfPossible`=true, but the preset dict is not cached, and Lucene throws an exception.
   
   In this case, we should call `reset` in step 1.
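
To make the failure mode concrete, here is a minimal toy model of the scenario above. It is only a sketch and does not use Lucene's classes: `ToyDecompressor`, `decompressTrustingFlag`, `decompressAfterReset`, and the byte-array "dicts" are all hypothetical names invented for illustration. It also assumes that passing `reuseIfPossible`=false invalidates any previously cached dict before the length check, which is one way to end up with "nothing cached" at the third read.

```java
import java.util.Arrays;

public class PresetDictCacheSketch {

    /** Hypothetical toy decompressor; not Lucene's LZ4WithPresetDictDecompressor. */
    static final class ToyDecompressor {
        private byte[] cachedDict; // null means no preset dict is cached

        /** Drops the cached dict so the next non-empty read reloads it. */
        void reset() {
            cachedDict = null;
        }

        /** Variant A: only the caller's flag decides whether the cache is reused. */
        byte[] decompressTrustingFlag(byte[] chunkDict, int length, boolean reuseIfPossible) {
            if (!reuseIfPossible) {
                cachedDict = null; // assumption: a new chunk invalidates the old dict
            }
            if (length == 0) {
                return new byte[0]; // nothing to decode, so the dict is never read or cached
            }
            if (cachedDict == null) {
                if (reuseIfPossible) {
                    throw new IllegalStateException("preset dict is not cached");
                }
                cachedDict = chunkDict.clone();
            }
            return Arrays.copyOf(cachedDict, length); // toy "output": a prefix of the dict
        }

        /** Variant B: reloads the dict whenever the cache is empty; the caller calls reset(). */
        byte[] decompressAfterReset(byte[] chunkDict, int length) {
            if (length == 0) {
                return new byte[0];
            }
            if (cachedDict == null) {
                cachedDict = chunkDict.clone();
            }
            return Arrays.copyOf(cachedDict, length);
        }
    }

    static final byte[] CHUNK0_DICT = {1, 1, 1};
    static final byte[] CHUNK1_DICT = {2, 2, 2};

    /** Replays the three reads with the flag-only variant; true if the third read throws. */
    static boolean flagOnlyScenarioThrows() {
        ToyDecompressor d = new ToyDecompressor();
        d.decompressTrustingFlag(CHUNK0_DICT, 3, false); // chunk0/doc0
        d.decompressTrustingFlag(CHUNK1_DICT, 0, false); // chunk1/doc0, length 0
        try {
            d.decompressTrustingFlag(CHUNK1_DICT, 1, true); // chunk1/doc1
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    /** Replays the three reads with an explicit reset() on each new chunk. */
    static byte[] resetScenario() {
        ToyDecompressor d = new ToyDecompressor();
        d.reset();
        d.decompressAfterReset(CHUNK0_DICT, 3); // chunk0/doc0
        d.reset();                              // entering chunk1
        d.decompressAfterReset(CHUNK1_DICT, 0); // chunk1/doc0, length 0
        return d.decompressAfterReset(CHUNK1_DICT, 1); // chunk1/doc1, same chunk
    }

    public static void main(String[] args) {
        System.out.println("flag-only variant throws: " + flagOnlyScenarioThrows());
        System.out.println("reset variant output: " + Arrays.toString(resetScenario()));
    }
}
```

In this sketch the explicit `reset()` moves cache invalidation to the caller, so a zero-length read in between can no longer leave the decompressor in a state where the flag says "reuse" but nothing is cached.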
   
   



