zhaih commented on PR #12555:
URL: https://github.com/apache/lucene/pull/12555#issuecomment-1720704189

   Actually I just tried it myself and this will always reproduce the error:
   ```
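   // Position on ord 0, which decompresses the first block.
   actual.seekExact(0);
   // Seek with the empty term, which sorts before every other term.
   actual.seekCeil(new BytesRef(""));
   // Advance until the next block has to be decompressed.
   for (int i = 0; i < TERMS_DICT_BLOCK_LZ4_SIZE; i++) {
     actual.next();
   }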
   ```
   What happens is subtle:
   First, the bug can only occur when `ord` is 0; otherwise, as I said
previously, everything is reset correctly by `seekExact(0)`. So only when
`ord == 0` and we call `seekCeil` with some small value do `ord` and
`bytesRef` get out of sync.
   At that point it is still fine to call `next()`, because the next ord is 1
(so we don't need to `decompressBlock`) and `blockInput` still caches the bytes
decompressed by the last `decompressBlock` call (remember that `ord` must be 0,
so before the `seekCeil` someone must already have called `decompressBlock` in
some way, via either `next` or `seekXX`).
   The disaster happens when we keep calling `next()` until we have to
decompress the next block. Up to then we were always reading from the cached
`blockInput`, so `bytes` was left untouched; as @epotyom described, we then
read the block length, treated it as the term length, and overfilled the term
buffer.
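
   To make the sequence above easier to follow, here is a toy model of the
failure. It is not Lucene's code: the class `ToyTermsDict`, the `int[]` layout,
`MAX_TERM_LEN`, and the method `seekCeilBeforeFirstTerm` are all hypothetical,
and "decompression" is just an array copy. It assumes the small-value `seekCeil`
leaves the underlying cursor just after the first term of block 0 while
`seekExact(0)` does nothing because `ord` is already 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only, NOT Lucene's TermsDict. Layout and names are assumptions. */
class ToyTermsDict {
  static final int BLOCK_SIZE = 4;   // stand-in for TERMS_DICT_BLOCK_LZ4_SIZE
  static final int MAX_TERM_LEN = 2; // fixed-size term buffer

  private final int[] data;           // per block: [firstLen, first..., payloadLen, payload...]
  private final int[] blockAddresses; // start offset of each block in data
  private final int size;
  private int pos;                    // cursor into data (plays the role of `bytes`)
  private int[] blockInput = new int[0]; // cached payload of the current block
  private int blockPos;
  private long ord = -1;
  private final char[] termBuf = new char[MAX_TERM_LEN];
  private int termLen;

  ToyTermsDict(List<String> terms) {
    List<Integer> out = new ArrayList<>();
    List<Integer> addrs = new ArrayList<>();
    for (int i = 0; i < terms.size(); i += BLOCK_SIZE) {
      addrs.add(out.size());
      append(out, terms.get(i)); // first term of the block is stored outside the payload
      List<Integer> payload = new ArrayList<>();
      for (int j = i + 1; j < Math.min(i + BLOCK_SIZE, terms.size()); j++) {
        append(payload, terms.get(j));
      }
      out.add(payload.size());
      out.addAll(payload);
    }
    data = out.stream().mapToInt(Integer::intValue).toArray();
    blockAddresses = addrs.stream().mapToInt(Integer::intValue).toArray();
    size = terms.size();
  }

  private static void append(List<Integer> out, String term) {
    out.add(term.length());
    for (char c : term.toCharArray()) out.add((int) c);
  }

  private void readTerm(int[] src, int from, int len) {
    // Throws ArrayIndexOutOfBoundsException if len exceeds the term buffer.
    for (int i = 0; i < len; i++) termBuf[i] = (char) src[from + i];
    termLen = len;
  }

  private void decompressBlock() {
    int len = data[pos++]; // trusts that pos is at the start of a block
    readTerm(data, pos, len);
    pos += len;
    int payloadLen = data[pos++];
    blockInput = Arrays.copyOfRange(data, pos, pos + payloadLen);
    pos += payloadLen;
    blockPos = 0;
  }

  String next() {
    if (++ord >= size) return null;
    if (ord % BLOCK_SIZE == 0) {
      decompressBlock();
    } else {
      int len = blockInput[blockPos++]; // served from the cache, pos untouched
      readTerm(blockInput, blockPos, len);
      blockPos += len;
    }
    return term();
  }

  void seekExact(long target) {
    long currentBlock = Math.floorDiv(ord, (long) BLOCK_SIZE);
    long block = target / BLOCK_SIZE;
    if (target < ord || block != currentBlock) {
      pos = blockAddresses[(int) block];
      ord = block * BLOCK_SIZE - 1;
    }
    while (ord < target) next(); // no-op when ord == target
  }

  /** Hypothetical stand-in for a seekCeil whose target sorts before the first term. */
  void seekCeilBeforeFirstTerm() {
    pos = blockAddresses[0];
    int len = data[pos++];
    readTerm(data, pos, len);
    pos += len;   // cursor now sits on block 0's payload length
    seekExact(0); // resets everything unless ord is already 0
  }

  String term() { return new String(termBuf, 0, termLen); }

  public static void main(String[] args) {
    ToyTermsDict dict = new ToyTermsDict(List.of("a", "b", "c", "d", "e", "f", "g", "h"));
    dict.seekExact(0);
    dict.seekCeilBeforeFirstTerm();
    for (int i = 0; i < BLOCK_SIZE; i++) {
      System.out.println(dict.next()); // the last call fails in this model
    }
  }
}
```

   In this model the first three `next()` calls are served from the cached
`blockInput`; the fourth calls `decompressBlock` with the cursor on the payload
length, reads it as a term length, and runs past the fixed-size term buffer.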


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


