[ https://issues.apache.org/jira/browse/LUCENE-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand resolved LUCENE-10536.
-----------------------------------
    Fix Version/s: 9.2
       Resolution: Fixed

> Doc values terms dicts should use the first term of each block as a dictionary
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-10536
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10536
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: 9.2
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Doc values terms dictionaries split data into blocks of 64 terms, where the
> first term is written uncompressed (which is useful for binary searches), and
> the 63 other terms are encoded by taking the difference with the previous
> term and compressing all suffixes together with LZ4.
> With this format, the suffix of the second term is unlikely to benefit from
> any compression, since it has no data other than itself in which to search
> for duplicate bytes. A minor improvement we could make would consist of
> using the first term as a dictionary for suffixes of terms 2..64.
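
The idea described above can be illustrated with a small sketch. This is not Lucene's code or its on-disk format: it uses Python's zlib with a preset dictionary as a stand-in for LZ4, and the names encode_block, decode_block and common_prefix_len are hypothetical helpers chosen for illustration. It only shows the shape of the scheme: the first term of a block is kept uncompressed, each later term is stored as a shared-prefix length plus a suffix, and all suffixes are compressed together with the first term supplied as the dictionary.

```python
import zlib


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes shared by a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def encode_block(terms):
    """Encode one block of sorted, non-empty terms (bytes).

    Assumption: zlib's preset dictionary (zdict) stands in for LZ4
    with a dictionary; the real format differs.
    """
    first = terms[0]                      # kept uncompressed for binary search
    prefix_lens, suffix_lens = [], []
    suffixes = bytearray()
    prev = first
    for term in terms[1:]:
        p = common_prefix_len(prev, term)  # delta against the previous term
        prefix_lens.append(p)
        suffix_lens.append(len(term) - p)
        suffixes += term[p:]
        prev = term
    # The first term seeds the compressor, so even the earliest suffixes
    # can reference bytes that appear in it.
    comp = zlib.compressobj(zdict=first)
    compressed = comp.compress(bytes(suffixes)) + comp.flush()
    return first, prefix_lens, suffix_lens, compressed


def decode_block(first, prefix_lens, suffix_lens, compressed):
    """Rebuild the terms of a block produced by encode_block."""
    decomp = zlib.decompressobj(zdict=first)
    suffixes = decomp.decompress(compressed) + decomp.flush()
    terms = [first]
    pos = 0
    for p, s in zip(prefix_lens, suffix_lens):
        terms.append(terms[-1][:p] + suffixes[pos:pos + s])
        pos += s
    return terms
```

A round trip such as decode_block(*encode_block(terms)) should return the original list of terms. Whether the dictionary actually shrinks a given block depends on how much the suffixes overlap with the first term.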