[ https://issues.apache.org/jira/browse/LUCENE-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403818#comment-17403818 ]
Adrien Grand commented on LUCENE-9917:
--------------------------------------

There might be a question of whether we should just revert to the previous approach. One reason why I like keeping shared dictionaries is that they work extremely well on highly redundant data. For instance, here are the results on 1M documents that store a mix of Nginx logs with verbose metadata about the host that produced these logs:

|| Codec || Index size (MB) || Index time (s) || Avg retrieval time (µs) ||
| Lucene90 (main) | 86 | 10 | 19 |
| Lucene86 | 304 | 9 | 9 |
| Lucene90 (patch) | 184 | 10 | 7 |

> Reduce block size for BEST_SPEED
> --------------------------------
>
>                 Key: LUCENE-9917
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9917
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> As benchmarks suggested major savings and minor slowdowns with larger block
> sizes, I had increased them in LUCENE-9486. However, it looks like this
> slowdown is still problematic for some users, so I plan to go back to a
> smaller block size, something like 10*16kB, to get closer to the amount of
> data we had to decompress per document when we had 16kB blocks without
> shared dictionaries.
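To illustrate why a shared dictionary pays off on this kind of redundant data, here is a minimal, self-contained sketch using java.util.zip's preset-dictionary support. This is not Lucene's stored-fields code path (which has its own block and dictionary scheme), and the log strings are made up for the example; it only shows that substrings already present in a preset dictionary compress down to back-references, while decompressing a single small document stays cheap because only that document's bytes plus the dictionary are needed.

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class SharedDictionaryDemo {
  public static void main(String[] args) throws Exception {
    // Hypothetical dictionary built from metadata that repeats across documents.
    byte[] dictionary = "host=web-01 region=us-east rack=r42 nginx/1.18.0 GET /index.html HTTP/1.1 200"
        .getBytes(StandardCharsets.UTF_8);
    // A single stored document that shares most of its content with the dictionary.
    byte[] doc = "host=web-01 region=us-east rack=r42 nginx/1.18.0 GET /index.html HTTP/1.1 200 1532"
        .getBytes(StandardCharsets.UTF_8);

    // Compress with the dictionary preset: repeated substrings are encoded as
    // references into the dictionary instead of being stored as literals.
    Deflater deflater = new Deflater();
    deflater.setDictionary(dictionary);
    deflater.setInput(doc);
    deflater.finish();
    byte[] compressed = new byte[1024];
    int compressedLen = deflater.deflate(compressed);
    deflater.end();
    System.out.println("original: " + doc.length + " bytes, compressed: " + compressedLen + " bytes");

    // Retrieval only needs this document's compressed bytes plus the dictionary,
    // not a whole large block of neighboring documents.
    Inflater inflater = new Inflater();
    inflater.setInput(compressed, 0, compressedLen);
    byte[] restored = new byte[doc.length];
    int n = inflater.inflate(restored);
    if (n == 0 && inflater.needsDictionary()) {
      // The stream was written with a preset dictionary, so supply it before inflating.
      inflater.setDictionary(dictionary);
      n = inflater.inflate(restored);
    }
    inflater.end();
    System.out.println("restored " + n + " bytes");
  }
}
{code}

The trade-off in the table above is the one the description targets: shared dictionaries keep the index much smaller than Lucene86 on redundant data, while shrinking the block size to something like 10*16kB bounds how much data has to be decompressed to retrieve a single document, which is what brings the average retrieval time back down.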