zhaih commented on PR #12555: URL: https://github.com/apache/lucene/pull/12555#issuecomment-1720704189
Actually, I just tried it myself, and this reproduces the error every time:

```
actual.seekExact(0);
actual.seekCeil(new BytesRef(""));
for (int i = 0; i < TERMS_DICT_BLOCK_LZ4_SIZE; i++) {
  actual.next();
}
```

What happened is really tricky.

First, when the bug happens, `ord` must be 0. Otherwise, as I said previously, everything is nicely reset by `seekExact(0)`. Only when `ord == 0` does a `seekCeil` call with some small value (like the empty `BytesRef` above) leave `ord` and `bytesRef` out of sync.

Even at that point, it's totally OK to call `next()`. The next ord is 1, so we don't need to `decompressBlock`, and `blockInput` still caches the bytes decompressed the last time we called `decompressBlock`. (Remember that `ord` must be 0, so before the `seekCeil`, someone must already have called `decompressBlock` in some way, via either `next` or `seekXX`.)

The disaster happens when we keep calling `next()` until we have to decompress the next block. Because we had been serving every term from the cached `blockInput`, `bytes` stayed where it was. Then, as @epotyom described, we read the block length, took it for the term length, and overfilled the term buffer.
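To make the sequence above concrete, here is a toy model in Java. It is a hypothetical sketch under stated assumptions, not Lucene's `TermsDict`. The class `ToyTermsDict`, its int-list encoding, `BLOCK_SIZE = 4` (standing in for `TERMS_DICT_BLOCK_LZ4_SIZE`), the 8-char term buffer, and the `fixed` flag are all made up for illustration. The model assumes two things: each block stores its first term raw, followed by the block length of the remaining terms, and `seekExact` skips its reset when the target equals the current `ord`. Under those assumptions, a `seekCeil` that scans block headers moves `pos` (the stand-in for `bytes`) without touching `ord`. The no-op `seekExact(0)` then leaves `pos` sitting on block 0's block length. The `fixed = true` variant (also resetting when `target == ord`) is only one possible remedy chosen for the toy. The actual change in this PR may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a block-compressed terms dictionary; NOT Lucene's TermsDict.
// Assumed layout per block: [firstLen][first chars][restLen][len][chars]... for the other terms.
final class ToyTermsDict {
  static final int BLOCK_SHIFT = 2;
  static final int BLOCK_SIZE = 1 << BLOCK_SHIFT; // stand-in for TERMS_DICT_BLOCK_LZ4_SIZE
  static final int MAX_TERM_LENGTH = 8;           // hypothetical fixed term-buffer capacity
  private final List<Integer> data = new ArrayList<>();
  private final List<Integer> blockAddresses = new ArrayList<>();
  private final List<String> blockInput = new ArrayList<>(); // cached rest of the current block
  private final char[] termBuf = new char[MAX_TERM_LENGTH];
  private final int numTerms;
  private final boolean fixed; // true: seekExact also resets when target == ord
  private int pos, termLen;    // pos stands in for the `bytes` file pointer
  long ord = -1;

  ToyTermsDict(List<String> sortedTerms, boolean fixed) {
    this.numTerms = sortedTerms.size();
    this.fixed = fixed;
    for (int start = 0; start < numTerms; start += BLOCK_SIZE) {
      blockAddresses.add(data.size());
      write(data, sortedTerms.get(start)); // first term of each block is stored raw
      List<Integer> rest = new ArrayList<>();
      int end = Math.min(start + BLOCK_SIZE, numTerms);
      for (int i = start + 1; i < end; i++) write(rest, sortedTerms.get(i));
      data.add(rest.size()); // block length of the remaining terms
      data.addAll(rest);
    }
  }

  private static void write(List<Integer> out, String term) {
    out.add(term.length());
    for (char c : term.toCharArray()) out.add((int) c);
  }
  String term() { return new String(termBuf, 0, termLen); }

  // Reads a length-prefixed term at `pos`; a bogus length overfills termBuf.
  private void readFirstTerm() {
    int len = data.get(pos++);
    for (int i = 0; i < len; i++) termBuf[i] = (char) data.get(pos++).intValue();
    termLen = len;
  }
  // Reads the block length at `pos`, then caches the remaining terms of the block.
  private void decompressBlock() {
    int restLen = data.get(pos++);
    int end = pos + restLen;
    blockInput.clear();
    while (pos < end) {
      StringBuilder sb = new StringBuilder();
      for (int len = data.get(pos++); len > 0; len--) sb.append((char) data.get(pos++).intValue());
      blockInput.add(sb.toString());
    }
  }

  String next() {
    if (ord + 1 >= numTerms) return null;
    int inBlock = (int) (++ord & (BLOCK_SIZE - 1));
    if (inBlock == 0) { // block boundary: must read from `pos`
      readFirstTerm();
      decompressBlock();
    } else {            // otherwise served from the cached blockInput
      String t = blockInput.get(inBlock - 1);
      t.getChars(0, t.length(), termBuf, 0);
      termLen = t.length();
    }
    return term();
  }
  void seekExact(long target) {
    long currentBlock = ord >> BLOCK_SHIFT; // -1 while unpositioned
    long targetBlock = target >> BLOCK_SHIFT;
    boolean reset = fixed ? target <= ord : target < ord;
    if (reset || targetBlock != currentBlock) {
      pos = blockAddresses.get((int) targetBlock);
      ord = (targetBlock << BLOCK_SHIFT) - 1;
    }
    while (ord < target) next(); // buggy variant: a no-op when target == ord
  }
  // Returns true on an exact match. The block scan moves `pos` and the term, but not `ord`.
  boolean seekCeil(String text) {
    int block = -1;
    for (int b = 0; b < blockAddresses.size(); b++) {
      pos = blockAddresses.get(b);
      readFirstTerm();
      if (term().compareTo(text) > 0) break;
      block = b;
    }
    if (block == -1) { // text sorts before every term
      seekExact(0);
      return false;
    }
    pos = blockAddresses.get(block);
    ord = ((long) block << BLOCK_SHIFT) - 1;
    while (next() != null) if (term().compareTo(text) >= 0) return term().equals(text);
    return false; // text sorts after every term
  }
}
```

With the eight terms `aaa` through `hhh`, running the reproduction sequence (`seekExact(0)`, `seekCeil("")`, then `BLOCK_SIZE` calls to `next()`) against `fixed = false` throws `ArrayIndexOutOfBoundsException` on the fourth `next()`. The stale `pos` points at block 0's block length (12), which is read as a term length. With `fixed = true`, the same sequence ends on `"eee"`.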