[jira] [Commented] (LUCENE-10677) Duplicate strings in FieldInfo#attributes contribute significantly to heap usage at scale

Armin Braun (Jira) Tue, 09 Aug 2022 08:30:07 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577484#comment-17577484
 ]


Armin Braun commented on LUCENE-10677:
--------------------------------------

[~dweiss] maybe an alternative solution would be to promote known/common 
attributes to concrete fields in `FieldInfo` instead of storing them as generic 
strings in a map? That would save memory on the attribute keys and would not 
run the risk of becoming stale.
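
To make the suggestion concrete, here is a minimal sketch of what promoting common attributes to fields could look like. `FieldInfoSketch` is a hypothetical stand-in, not Lucene's actual `FieldInfo`, and the two key names are only assumed examples of "known/common" attributes. Uncommon keys still fall back to a lazily allocated map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FieldInfo; this is not Lucene's actual class.
class FieldInfoSketch {
    // Assumed examples of "known/common" attribute keys, for illustration only.
    static final String FORMAT_KEY = "PerFieldPostingsFormat.format";
    static final String SUFFIX_KEY = "PerFieldPostingsFormat.suffix";

    // Promoted attributes: no key string is retained per instance.
    private String postingsFormat;
    private String postingsSuffix;

    // Generic fallback, allocated only when an uncommon key is stored.
    private Map<String, String> otherAttributes;

    /** Stores the value and returns the previous value for the key, or null. */
    String putAttribute(String key, String value) {
        String old;
        switch (key) {
            case FORMAT_KEY:
                old = postingsFormat;
                postingsFormat = value;
                return old;
            case SUFFIX_KEY:
                old = postingsSuffix;
                postingsSuffix = value;
                return old;
            default:
                if (otherAttributes == null) {
                    otherAttributes = new HashMap<>();
                }
                return otherAttributes.put(key, value);
        }
    }

    /** Returns the value for the key, or null if it was never set. */
    String getAttribute(String key) {
        switch (key) {
            case FORMAT_KEY:
                return postingsFormat;
            case SUFFIX_KEY:
                return postingsSuffix;
            default:
                return otherAttributes == null ? null : otherAttributes.get(key);
        }
    }

    /** True once at least one uncommon key has forced the fallback map. */
    boolean hasGenericMap() {
        return otherAttributes != null;
    }
}
```

In this sketch only the keys stop being retained per instance; the values are still ordinary strings and may still be duplicated across instances.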

> Duplicate strings in FieldInfo#attributes contribute significantly to heap 
> usage at scale
> -----------------------------------------------------------------------------------------
>
>                 Key: LUCENE-10677
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10677
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/codecs
>    Affects Versions: 9.3
>            Reporter: Armin Braun
>            Priority: Minor
>              Labels: heap, scalability
>         Attachments: lucene_duplicate_fields.png
>
>
> This has the same origin as issue LUCENE-10676. Running a single process 
> with thousands of fields across many indexes leads to a lot of duplicate 
> strings retained as keys and values in the `attributes` map. This can amount 
> to GBs of heap for thousands of fields across a few thousand segments. The 
> strings in the heap dump analysis below account for more than half of the 
> duplicate strings from `FieldInfo` instances (roughly 2/3; the field names 
> are somewhat unusually long in this example).
> If we could deduplicate these obvious known strings when reading `FieldInfo`, 
> we could save GBs of heap for use cases like this.
>  
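
For reference, the deduplication described in the issue could be sketched roughly as follows. This is only an illustration under assumptions: `KnownStringDedupSketch`, its method names, and the list of known strings are all hypothetical and are not taken from Lucene. The idea is to map each attribute key and value through a fixed table of known strings at read time, so that equal strings from different segments share one instance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class KnownStringDedupSketch {
    // Fixed table of canonical instances. The entries are assumed examples.
    private static final Map<String, String> KNOWN = new HashMap<>();

    static {
        String[] known = {
            "PerFieldPostingsFormat.format",
            "PerFieldPostingsFormat.suffix",
            "Lucene90",
            "0"
        };
        for (String s : known) {
            KNOWN.put(s, s);
        }
    }

    /** Returns the shared instance for a known string, else the input itself. */
    static String canonical(String s) {
        return KNOWN.getOrDefault(s, s);
    }

    /** Copies the attributes, replacing known keys and values with shared instances. */
    static Map<String, String> dedup(Map<String, String> attributes) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            out.put(canonical(e.getKey()), canonical(e.getValue()));
        }
        return out;
    }
}
```

Because the table is fixed, it cannot grow without bound, but it has to be updated by hand whenever new common strings appear. Strings that are not in the table pass through unchanged and stay duplicated.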



