[
https://issues.apache.org/jira/browse/LUCENE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577440#comment-17577440
]
Armin Braun commented on LUCENE-10677:
--------------------------------------
[~dweiss] this happened on an Elasticsearch node that had ~150 indices that
were being actively indexed into. Each of those had about 2k fields, and many of
them ended up with ~100 segments, which works out to about the number we're
seeing, since the `FieldInfo` objects seem to be duplicated across segments.
Even though we're admittedly dealing with a somewhat excessive number of fields
here, it seems off that the strings from the attributes map are what's causing
the biggest issue performance-wise, doesn't it?
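For a rough sense of scale, the figures above can be multiplied out. This is only a back-of-the-envelope sketch: it assumes every index has the full ~2k fields and ~100 segments, which overstates things since only "many" of them do, and the class and method names are made up for illustration.

```java
// Hypothetical helper, not part of Lucene or Elasticsearch.
final class FieldInfoEstimate {
    // Upper bound on live FieldInfo instances, assuming one FieldInfo per
    // field per segment and identically shaped indices.
    static long fieldInfoCount(long indices, long fieldsPerIndex, long segmentsPerIndex) {
        return indices * fieldsPerIndex * segmentsPerIndex;
    }

    public static void main(String[] args) {
        // ~150 indices, ~2k fields each, ~100 segments each
        System.out.println(fieldInfoCount(150, 2_000, 100)); // prints 30000000
    }
}
```

Each of those instances carries its own attributes map, so even a few short duplicated strings per map add up quickly at this count.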
> Duplicate strings in FieldInfo#attributes contribute significantly to heap
> usage at scale
> -----------------------------------------------------------------------------------------
>
> Key: LUCENE-10677
> URL: https://issues.apache.org/jira/browse/LUCENE-10677
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/codecs
> Affects Versions: 9.3
> Reporter: Armin Braun
> Priority: Minor
> Labels: heap, scalability
> Attachments: lucene_duplicate_fields.png
>
>
> This has the same origin as issue LUCENE-10676. Running a single process
> with thousands of fields across many indexes leads to a lot of duplicate
> strings being retained as keys and values in the `attributes` map. This can
> amount to GBs of heap for thousands of fields across a few thousand segments.
> The strings in the attached heap dump analysis account for more than half of
> the duplicate strings from `FieldInfo` instances (roughly 2/3, although the
> field names are somewhat unusually long in this example).
> If we could deduplicate these obvious, known strings when reading `FieldInfo`,
> we could save GBs of heap for use cases like this.
>
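To illustrate the kind of deduplication proposed in the description, here is a minimal sketch, not the actual Lucene change. `AttributeDeduplicator` is a hypothetical class: it keeps one canonical instance per distinct string and rebuilds an attributes map from those canonical instances. Where such a pool would live, and how it would be bounded or cleared, is left open here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Lucene.
final class AttributeDeduplicator {
    // Maps each distinct string to its canonical instance. Unbounded in this sketch.
    private final ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the canonical instance equal to s, registering s if it is new.
    String dedup(String s) {
        if (s == null) {
            return null; // ConcurrentHashMap rejects null keys
        }
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    // Returns a copy of the map whose keys and values are canonical instances.
    Map<String, String> dedup(Map<String, String> attributes) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            result.put(dedup(e.getKey()), dedup(e.getValue()));
        }
        return result;
    }

    // Number of distinct strings currently pooled.
    int size() {
        return pool.size();
    }
}
```

With this, two maps read for the same field in different segments would share their key and value strings instead of each holding private copies. `String.intern()` would be a simpler alternative, at the cost of putting the strings into the JVM-wide intern table.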