Armin Braun created LUCENE-10677:
------------------------------------

Summary: Duplicate strings in FieldInfo#attributes contribute significantly to heap usage at scale
Key: LUCENE-10677
URL: https://issues.apache.org/jira/browse/LUCENE-10677
Project: Lucene - Core
Issue Type: Bug
Components: core/codecs
Affects Versions: 9.3
Reporter: Armin Braun
Attachments: lucene_duplicate_fields.png

This has the same origin as issue LUCENE-10676. Running a single process with thousands of fields across many indexes leads to a large number of duplicate strings being retained as keys and values in the `attributes` map. This can amount to GBs of heap for thousands of fields across a few thousand segments. In the heap dump analysis below, these strings account for more than half of the duplicate strings held by `FieldInfo` instances (roughly 2/3; note that the field names are somewhat unusually long in this example). If we could deduplicate these obvious, known strings when reading `FieldInfo`, we could save GBs of heap for use cases like this.
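
To make the proposed deduplication concrete, here is a minimal sketch of pooling attribute keys and values so that equal strings share a single instance. It is an illustration only, not the change made in Lucene: `AttributeStringPool`, `canonical` and `dedup` are hypothetical names, and the attribute key and value used in `main` are sample strings chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not part of Lucene): pools strings so equal values share one instance. */
final class AttributeStringPool {
  // Process-wide pool mapping each distinct string to its canonical instance.
  // Assumption: the set of distinct attribute strings is small, so the pool stays bounded.
  private static final Map<String, String> POOL = new ConcurrentHashMap<>();

  private AttributeStringPool() {}

  /** Returns the canonical instance equal to s, registering s if it has not been seen before. */
  static String canonical(String s) {
    if (s == null) {
      return null; // ConcurrentHashMap rejects null keys, so pass nulls through unchanged
    }
    String previous = POOL.putIfAbsent(s, s);
    return previous != null ? previous : s;
  }

  /** Copies an attributes map, replacing every key and value with its canonical instance. */
  static Map<String, String> dedup(Map<String, String> attributes) {
    Map<String, String> out = new HashMap<>(Math.max(16, attributes.size() * 2));
    for (Map.Entry<String, String> e : attributes.entrySet()) {
      out.put(canonical(e.getKey()), canonical(e.getValue()));
    }
    return out;
  }

  public static void main(String[] args) {
    // Simulate two segments whose attributes were decoded into distinct String objects.
    Map<String, String> segment1 = new HashMap<>();
    segment1.put(new String("PerFieldPostingsFormat.format"), new String("Lucene90"));
    Map<String, String> segment2 = new HashMap<>();
    segment2.put(new String("PerFieldPostingsFormat.format"), new String("Lucene90"));

    Map<String, String> d1 = dedup(segment1);
    Map<String, String> d2 = dedup(segment2);

    // Reference comparison on purpose: checks that both maps now share one value instance.
    String key = "PerFieldPostingsFormat.format";
    System.out.println("shared value instance: " + (d1.get(key) == d2.get(key)));
  }
}
```

A pool like this only pays off when the number of distinct strings is small compared with the number of copies, which is the situation described above. Because the pool never evicts entries, an actual fix would probably want to restrict it to a known set of strings or use a bounded or weakly referenced structure.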