[jira] [Updated] (LUCENE-10676) FieldInfo#name contributes significantly to heap usage at scale

Armin Braun (Jira) Mon, 08 Aug 2022 04:24:08 -0700


     [ https://issues.apache.org/jira/browse/LUCENE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Armin Braun updated LUCENE-10676:
---------------------------------
    Attachment: image-2022-08-08-13-23-37-050.png

> FieldInfo#name contributes significantly to heap usage at scale
> ---------------------------------------------------------------
>
>                 Key: LUCENE-10676
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10676
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/codecs
>    Affects Versions: 9.3
>         Environment: Seen in Lucene 9.3.0 running on Linux using JDK 18 but seems independent of environment.
>            Reporter: David Turner
>            Priority: Minor
>              Labels: heap, scalability
>         Attachments: image-2022-08-08-13-23-37-050.png
>
>
> We encountered an Elasticsearch user with high heap usage, a significant 
> proportion of which was down to the contents of `FieldInfo#name`.
> This user was certainly pushing some scalability boundaries: this single 
> process had thousands of active Lucene indices, many with 10k+ fields, and 
> many indices had hundreds of segments due to an excess of flushes, so in 
> total they had an enormous number of `FieldInfo` instances. Still, the bulk 
> of the heap usage was just field names, and the total number of distinct 
> field names was fairly small. That's pretty common, especially for time-based 
> data like logs. Some kind of interning or deduplication of these strings 
> would have reduced their heap usage by many GBs.
> Is there a way we could deduplicate these strings? Deduplicating them across 
> segments within each index would already have helped, but ideally we'd like 
> to deduplicate them across indices too.
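
For illustration only, the kind of deduplication asked about above could be sketched as a small canonicalizing pool. `FieldNamePool` and `dedup` are hypothetical names, not part of Lucene, and nothing here reflects how Lucene actually constructs `FieldInfo`. The sketch assumes field names are plain `java.lang.String` values and that one pool is shared by every reader that should deduplicate against the others (per index, or process-wide for cross-index sharing).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical helper (not a Lucene class): maps equal strings to a single
 * canonical instance so that many holders can share one copy on the heap.
 */
final class FieldNamePool {
    // Key and value are the same object: the canonical instance for that name.
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    /** Returns the canonical instance equal to {@code name}, registering it if unseen. */
    String dedup(String name) {
        String existing = pool.putIfAbsent(name, name);
        return existing != null ? existing : name;
    }

    /** Number of distinct names currently held. */
    int size() {
        return pool.size();
    }
}
```

With this in place, two segments that each decode their own copy of the same field name would end up holding the same `String` reference after passing it through `dedup`, so the duplicate copy becomes garbage. Note that this sketch holds strong references and never evicts, so names from deleted indices would stay on the heap; a real change would likely need weak references or reference counting. `String#intern()` is a JVM-level alternative with different trade-offs.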



