Adrien Grand created LUCENE-10616:
-------------------------------------

             Summary: Moving to dictionaries has made stored fields slower at skipping
                 Key: LUCENE-10616
                 URL: https://issues.apache.org/jira/browse/LUCENE-10616
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Adrien Grand


[~ywelsch] has been digging into a regression in stored fields retrieval that
is caused by LUCENE-9486.

Say your documents have two stored fields: a 100B field that is stored first
and a 100kB field, and you are only interested in the first one. The idea
behind blocks of stored fields is to store multiple documents in the same block
to leverage redundancy across documents, but sometimes a single document is
larger than the block size. As soon as a document is larger than 2x the block
size, our stored fields format splits it into multiple blocks, so that you
don't need to decompress everything just to retrieve a couple of small fields.

Before LUCENE-9486, BEST_SPEED had a block size of 16kB, so retrieving only the
first field value required decompressing just 16kB of data. With the move to
preset dictionaries in LUCENE-9486 and then LUCENE-9917, we now have blocks of
80kB, so the same retrieval now needs to decompress 80kB of data, 5x more than
before.

With dictionaries, our blocks are now split into 10 sub-blocks. We happen to
eagerly decompress all sub-blocks that intersect with the stored document,
which is why we decompress 80kB of data, but this is an implementation
detail. Could we decompress these sub-blocks lazily instead, so that we only
decompress those that intersect with one of the field values that the user is
interested in retrieving?
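
To make the idea concrete, here is a minimal sketch of lazy sub-block
decompression. It is not the actual Lucene code: the class LazySubBlocks, its
methods, and the decompressor callback are all hypothetical names, and the
fixed sub-block size is an assumption for illustration (80kB / 10 = 8kB in the
usage note). It also ignores the preset dictionary itself and only shows the
bookkeeping: a read over a byte range of the uncompressed block decompresses
only the sub-blocks that range touches.

```java
import java.util.function.IntFunction;

/** Hypothetical sketch: a block whose fixed-size sub-blocks are decompressed on demand. */
final class LazySubBlocks {
  private final int totalLength;   // uncompressed length of the whole block
  private final int subBlockSize;  // assumed fixed uncompressed size of each sub-block
  private final IntFunction<byte[]> decompressor; // stand-in for the real decompression call
  private final byte[][] cache;    // decompressed sub-blocks, null until first needed
  private int decompressedCount = 0;

  LazySubBlocks(int totalLength, int subBlockSize, IntFunction<byte[]> decompressor) {
    if (totalLength < 0 || subBlockSize <= 0) {
      throw new IllegalArgumentException("invalid sizes");
    }
    this.totalLength = totalLength;
    this.subBlockSize = subBlockSize;
    this.decompressor = decompressor;
    // Round up so that a trailing partial sub-block gets its own slot.
    this.cache = new byte[(totalLength + subBlockSize - 1) / subBlockSize][];
  }

  /** Returns bytes [offset, offset + length), decompressing only the sub-blocks they intersect. */
  byte[] read(int offset, int length) {
    if (offset < 0 || length < 0 || length > totalLength - offset) {
      throw new IllegalArgumentException("range out of bounds");
    }
    byte[] out = new byte[length];
    int copied = 0;
    while (copied < length) {
      int pos = offset + copied;
      int idx = pos / subBlockSize;    // sub-block containing this position
      int within = pos % subBlockSize; // offset inside that sub-block
      if (cache[idx] == null) {
        cache[idx] = decompressor.apply(idx); // decompress on first access only
        decompressedCount++;
      }
      int n = Math.min(length - copied, cache[idx].length - within);
      if (n <= 0) {
        throw new IllegalStateException("sub-block " + idx + " is shorter than expected");
      }
      System.arraycopy(cache[idx], within, out, copied, n);
      copied += n;
    }
    return out;
  }

  /** Number of sub-blocks decompressed so far. */
  int decompressedCount() {
    return decompressedCount;
  }
}
```

With 10 sub-blocks of 8kB, read(0, 100) for the small first field touches a
single sub-block, so 8kB would be decompressed instead of 80kB. A field value
that straddles a sub-block boundary triggers decompression of both neighbours
and nothing else.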



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

