[ 
https://issues.apache.org/jira/browse/LUCENE-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403818#comment-17403818
 ] 

Adrien Grand commented on LUCENE-9917:
--------------------------------------

There might be a question of whether we should just revert to the previous 
approach. One reason I would prefer to keep shared dictionaries is that they 
work extremely well on highly redundant data. For instance, here are the 
results on 1M documents that store a mix of Nginx logs with verbose metadata 
about the host that produced these logs:

|| Codec || Index size (MB) || Index time (s) || Avg retrieval time (µs) ||
| Lucene90 (main) | 86 | 10 | 19 |
| Lucene86 | 304 | 9 | 9 |
| Lucene90 (patch) | 184 | 10 | 7 |
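
To illustrate why a shared dictionary helps on this kind of redundant data, 
here is a minimal sketch. It is not Lucene's stored fields code: it uses the 
JDK's java.util.zip DEFLATE classes as a stand-in, and the class name, host 
metadata and log line are made up for the example. The idea is that content 
repeated across documents (here, the host metadata) goes into a dictionary 
that primes the compressor, so each document only pays for what is unique to 
it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class, not part of Lucene.
class SharedDictionarySketch {

  // Compresses one document; a non-null dictionary primes the compressor.
  static byte[] compress(byte[] doc, byte[] dictionary) {
    Deflater deflater = new Deflater();
    try {
      if (dictionary != null) {
        deflater.setDictionary(dictionary);
      }
      deflater.setInput(doc);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[1024];
      while (!deflater.finished()) {
        int n = deflater.deflate(buffer);
        out.write(buffer, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end();
    }
  }

  // Decompresses one document, supplying the dictionary when the stream asks for it.
  static byte[] decompress(byte[] compressed, byte[] dictionary, int originalLength) {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      byte[] result = new byte[originalLength];
      int offset = 0;
      while (offset < originalLength) {
        int n = inflater.inflate(result, offset, originalLength - offset);
        if (n > 0) {
          offset += n;
        } else if (inflater.needsDictionary() && dictionary != null) {
          inflater.setDictionary(dictionary);
        } else {
          throw new IllegalStateException("unexpected end of compressed data");
        }
      }
      return result;
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
  }

  public static void main(String[] args) {
    // Made-up data: metadata shared by every document, plus one unique log line.
    String hostMetadata = "{\"host\":\"web-01.example.com\",\"os\":\"linux\","
        + "\"kernel\":\"5.10.0\",\"region\":\"eu-west-1\",\"agent\":\"shipper-1.0\"} ";
    String logLine = "127.0.0.1 - - [23/Aug/2021:10:00:00 +0000] "
        + "\"GET /index.html HTTP/1.1\" 200 612";
    byte[] dictionary = hostMetadata.getBytes(StandardCharsets.UTF_8);
    byte[] doc = (hostMetadata + logLine).getBytes(StandardCharsets.UTF_8);

    byte[] plain = compress(doc, null);
    byte[] primed = compress(doc, dictionary);
    System.out.println("raw bytes:          " + doc.length);
    System.out.println("without dictionary: " + plain.length);
    System.out.println("with dictionary:    " + primed.length);

    byte[] restored = decompress(primed, dictionary, doc.length);
    System.out.println("round trip ok:      " + java.util.Arrays.equals(doc, restored));
  }
}
```

The trade-off this sketch hints at is the one discussed in this issue: the 
reader needs the dictionary before it can decode a document, so the more data 
sits behind a shared dictionary, the more work a single retrieval may require.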

> Reduce block size for BEST_SPEED
> --------------------------------
>
>                 Key: LUCENE-9917
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9917
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> As benchmarks suggested major savings and minor slowdowns with larger block 
> sizes, I had increased them on LUCENE-9486. However, it looks like this 
> slowdown is still problematic for some users, so I plan to go back to a 
> smaller block size, something like 10*16kB to get closer to the amount of 
> data we had to decompress per document when we had 16kB blocks without shared 
> dictionaries.


