[ 
https://issues.apache.org/jira/browse/LUCENE-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293141#comment-17293141
 ] 

Michael McCandless commented on LUCENE-9816:
--------------------------------------------

{quote}[~mikemccand] This is due to how the algorithm looks for duplicates: it 
stores a large hash table that maps 4-byte sequences to offsets in the input.
{quote}
+1, thanks for the explanation and musings about how we might further optimize 
it [~jpountz]!

> lazy-init LZ4-HC hashtable in blocktreewriter
> ---------------------------------------------
>
>                 Key: LUCENE-9816
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9816
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Priority: Major
>             Fix For: master (9.0)
>
>         Attachments: LUCENE-9816.patch
>
>
> Based upon the data for a field, blocktree may compress with LZ4-HC (or with 
> simple lowercase compression, or with none at all).
> But we currently eagerly initialize the HC hash table (132 KB) for each field, 
> regardless of whether it will even be "tried". This shows up as a top CPU and 
> heap hotspot when profiling tests, and it creates unnecessary overhead for 
> small flushes.
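The fix described above amounts to deferring the hash-table allocation until HC compression is actually attempted. A minimal sketch of that pattern (this is an illustration, not the actual LUCENE-9816 patch; the class name, table size, and methods are hypothetical):

```java
// Hypothetical sketch of lazy-initializing a compressor's hash table.
// In real LZ4-HC the tables total roughly 132 KB per instance; a smaller
// placeholder size is used here.
final class LazyHCCompressor {

  private int[] hashTable; // null until compression is first attempted

  // Allocate the table only on first use, so fields that never reach
  // HC compression pay no allocation cost.
  private int[] hashTable() {
    if (hashTable == null) {
      hashTable = new int[1 << 15];
    }
    return hashTable;
  }

  // Stand-in for the compress path: the first call triggers allocation.
  int compress(byte[] input) {
    int[] table = hashTable();
    // ... a real implementation would use `table` to map 4-byte
    // sequences to input offsets when searching for duplicates ...
    return input.length;
  }

  boolean isInitialized() {
    return hashTable != null;
  }
}
```

With this shape, constructing the writer per field is cheap; the allocation happens at most once, and only for fields whose data actually routes through HC compression.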



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
