[ 
https://issues.apache.org/jira/browse/LUCENE-9996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362960#comment-17362960
 ] 

Adrien Grand commented on LUCENE-9996:
--------------------------------------

Actually, I didn't know (or didn't remember) this, but IndexingChain is smart 
enough to reuse data structures across fields for the inverted index, so a new 
indexed field doesn't actually increase memory usage by 32kB. Maybe we should 
do the same with doc values terms dictionaries?

As for data structures that can't be shared across fields, such as doc-value 
ordinals and points bytes, maybe we could reduce their page sizes a little to 
be on the safer side. While playing with some data for this issue, I noticed 
that it would take only a couple hundred fields for a single document to use 
more than the default IW buffer size (16MB), in which case Lucene would flush 
a new segment on every document.
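
As a rough back-of-the-envelope check of these numbers, here is a small 
standalone sketch. It is not Lucene code: the class and method names are 
hypothetical, and the inputs are just the figures quoted in the issue 
description (two 32kB pages plus a long[1024] per field, rounded up to about 
80kB). It ignores the sharing that IndexingChain already does for the inverted 
index, so it is an upper bound on that part.

```java
public class DwptFootprintEstimate {
    // Per-field allocations listed in the issue description.
    static final long INVERTED_INDEX_PAGE_BYTES = 32 * 1024;    // BytesRefHash page, inverted index
    static final long DOC_VALUES_DICT_PAGE_BYTES = 32 * 1024;   // BytesRefHash page, doc values terms dict
    static final long PENDING_BUFFER_BYTES = 1024 * Long.BYTES; // long[1024] pending buffer

    /** Sum of the three allocations above (72kB; the issue rounds this to about 80kB). */
    static long perFieldBytes() {
        return INVERTED_INDEX_PAGE_BYTES + DOC_VALUES_DICT_PAGE_BYTES + PENDING_BUFFER_BYTES;
    }

    /** Smallest number of fields whose combined footprint is strictly larger than the buffer. */
    static long fieldsToExceed(long bufferBytes, long perFieldBytes) {
        return bufferBytes / perFieldBytes + 1;
    }

    /** Initial footprint if every thread of every index touches every field once. */
    static long totalBytes(long indices, long fieldsPerIndex, long threads, long perFieldBytes) {
        return indices * fieldsPerIndex * threads * perFieldBytes;
    }

    public static void main(String[] args) {
        long assumedPerField = 80L * 1024;       // the "about 80kB" figure from the issue
        long defaultBuffer = 16L * 1024 * 1024;  // default IW buffer size (16MB)

        System.out.println("listed allocations per field: " + perFieldBytes() + " bytes");
        System.out.println("fields needed to exceed the buffer: "
            + fieldsToExceed(defaultBuffer, assumedPerField));
        System.out.println("10 indices * 100 fields * 24 threads: "
            + totalBytes(10, 100, 24, assumedPerField) + " bytes");
    }
}
```

With the 80kB estimate, this puts the threshold at roughly two hundred fields 
per document, which is in line with what I observed.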

> Can we improve DWPT's initial memory footprint?
> -----------------------------------------------
>
>                 Key: LUCENE-9996
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9996
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> Say you are indexing only keyword fields that are both indexed and have doc 
> values. The first document that gets added to a DWPT will increase memory 
> usage by about 80kB per field. This is due mostly to:
>  - the {{BytesRefHash}} for the inverted index, which allocates a 32kB page
>  - the {{BytesRefHash}} for the doc values terms dict, which allocates 
> another 32kB page
>  - the {{SortedDocValuesWriter#pending}} buffer that allocates a long[1024]: 
> 8kB
> So if you have 10 actively indexing indices that have 100 fields each and 24 
> indexing threads, this gives a total of 10*100*24*80kB = 1.8GB. If you 
> happened to give less than 1.8GB for your indexing buffers overall, Lucene 
> will likely do very small flushes that have only a few documents, which 
> in turn will make indexing rather slow.
> Could we improve DWPT so that it more progressively reserves memory as more 
> documents get added?



