[ https://issues.apache.org/jira/browse/LUCENE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577468#comment-17577468 ]
Dawid Weiss commented on LUCENE-10677:
--------------------------------------

String.intern is evil for many reasons and your use case is indeed, ahem, atypical. I don't think adding "a few known strings" is an elegant solution since hacks like this one tend to become stale quickly... You could try the JVM's UseStringDeduplication option - an ugly but easy workaround - but I think you'll run into other problems soon enough with this number of concurrent indices/segments/fields. If you have to live with this then it's likely that you'll have to follow Rob's advice sooner or later.

> Duplicate strings in FieldInfo#attributes contribute significantly to heap usage at scale
> -----------------------------------------------------------------------------------------
>
>                 Key: LUCENE-10677
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10677
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/codecs
>    Affects Versions: 9.3
>            Reporter: Armin Braun
>            Priority: Minor
>              Labels: heap, scalability
>         Attachments: lucene_duplicate_fields.png
>
>
> This has the same origin as issue LUCENE-10676. Running a single process
> with thousands of fields across many indexes will lead to a lot of duplicate
> strings retained as keys and values in the `attributes` map. This can amount
> to GBs of heap for thousands of fields across a few thousand segments. The
> strings in the below heap dump analysis account for more than half (roughly
> 2/3; the field names are somewhat unusually long in this example) of the
> duplicate strings from `FieldInfo` instances.
> If we could deduplicate these obvious known strings when reading `FieldInfo`
> we could save GBs of heap for use cases like this.
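
For readers weighing the options discussed in this thread: the JVM option is enabled with `-XX:+UseStringDeduplication` and needs no code changes. The other idea, deduplicating strings when the attributes are read, can be sketched without `String.intern` by keeping an application-level pool. The following is only an illustration of that technique, not what Lucene does or what anyone on the issue proposed as a patch. The class `AttributeStringPool` and its methods are hypothetical names, and the key/value in `main` is just example data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not part of Lucene): maps equal strings to one shared instance. */
final class AttributeStringPool {
  // Assumption: the pool is unbounded and never evicts, so it only pays off
  // when the set of distinct strings is small compared to the number of copies.
  private final ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<>();

  /** Returns the first instance seen that equals s; null passes through unchanged. */
  String dedup(String s) {
    if (s == null) {
      return null; // ConcurrentHashMap rejects null keys
    }
    String previous = pool.putIfAbsent(s, s);
    return previous != null ? previous : s;
  }

  /** Copies an attributes-style map, replacing keys and values with pooled instances. */
  Map<String, String> dedup(Map<String, String> attributes) {
    Map<String, String> out = new HashMap<>();
    for (Map.Entry<String, String> e : attributes.entrySet()) {
      out.put(dedup(e.getKey()), dedup(e.getValue()));
    }
    return out;
  }

  /** Number of distinct strings currently held by the pool. */
  int size() {
    return pool.size();
  }

  public static void main(String[] args) {
    AttributeStringPool pool = new AttributeStringPool();
    // Two equal but distinct String objects, as you would get from reading
    // the same attribute key out of two different segments.
    String a = new String("PerFieldPostingsFormat.format");
    String b = new String("PerFieldPostingsFormat.format");
    System.out.println(pool.dedup(a) == pool.dedup(b)); // prints "true"
  }
}
```

Note the trade-off that the comment above hints at: a pool like this keeps every distinct string alive for as long as the pool itself lives, so it would need a clear lifecycle (or weak references) in a long-running process, and it does nothing about the per-segment `FieldInfo` and map overhead.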