[ https://issues.apache.org/jira/browse/LUCENE-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand resolved LUCENE-10536.
-----------------------------------
    Fix Version/s: 9.2
       Resolution: Fixed

> Doc values terms dicts should use the first term of each block as a dictionary
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-10536
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10536
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: 9.2
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Doc values terms dictionaries split data into blocks of 64 terms. The first
> term of each block is written uncompressed (which is useful for binary
> searches), and the 63 other terms are encoded as the difference with the
> previous term, with all suffixes compressed together with LZ4.
> With this format, the suffix of the second term is also unlikely to benefit
> from any compression, since LZ4 has no data other than the suffix itself in
> which to search for duplicate bytes. A minor improvement would be to use the
> first term as a dictionary for the suffixes of terms 2..64.
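
The idea can be illustrated with a minimal Python sketch. This is not Lucene's code: the names encode_block, decode_block and shared_prefix_len are hypothetical, and zlib with a preset dictionary (zdict) stands in for LZ4, since the Python standard library has no LZ4 binding. It only shows the shape of the scheme: the first term is kept uncompressed, later terms are prefix-delta encoded, and the concatenated suffixes are compressed with the first term optionally primed as a dictionary.

```python
import zlib


def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b have in common."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def encode_block(terms, use_dictionary=True):
    """Encode one block of sorted, non-empty terms (hypothetical layout).

    The first term stays uncompressed. Every other term is reduced to the
    length of the prefix it shares with the previous term plus its suffix,
    and all suffixes are compressed together.
    """
    first = terms[0]
    prefix_lens, suffix_lens, suffixes = [], [], bytearray()
    prev = first
    for term in terms[1:]:
        p = shared_prefix_len(prev, term)
        prefix_lens.append(p)
        suffix_lens.append(len(term) - p)
        suffixes += term[p:]
        prev = term
    # Assumption: zlib's preset dictionary plays the role that the first
    # term would play as an LZ4 dictionary.
    comp = zlib.compressobj(zdict=first) if use_dictionary else zlib.compressobj()
    payload = comp.compress(bytes(suffixes)) + comp.flush()
    return first, prefix_lens, suffix_lens, payload


def decode_block(first, prefix_lens, suffix_lens, payload, use_dictionary=True):
    """Rebuild the terms of a block produced by encode_block."""
    decomp = zlib.decompressobj(zdict=first) if use_dictionary else zlib.decompressobj()
    suffixes = decomp.decompress(payload) + decomp.flush()
    terms = [first]
    pos = 0
    for p, s in zip(prefix_lens, suffix_lens):
        # Each term is the shared prefix of the previous term plus its suffix.
        terms.append(terms[-1][:p] + suffixes[pos:pos + s])
        pos += s
    return terms


if __name__ == "__main__":
    block = sorted(
        b"https://example.org/docs/%d/index.html?ref=example.org" % i
        for i in range(64)
    )
    for use_dict in (False, True):
        first, plens, slens, payload = encode_block(block, use_dict)
        assert decode_block(first, plens, slens, payload, use_dict) == block
        print("dictionary" if use_dict else "no dictionary", len(payload), "bytes")
```

Whether the dictionary actually saves bytes depends on the terms and on the compressor, so the sketch prints both sizes instead of assuming a result.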



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
