stevenzwu commented on PR #12224: URL: https://github.com/apache/iceberg/pull/12224#issuecomment-2651667891
> Given partition summary field names and other snapshot properties are often not reused across different metadata, the interning causes more harm than good. @bryanck I didn't quite get the partition summary field names. were you referring to `PartitionFieldSummaryParser`? it seems to have just 4 field names. String.intern can be helpful for some use cases while harmful for some (like the one you encountered). Disabling interning seems to be a safer option considering diverse scenarios that the code can be used (like REST catalog server). > hash collision check I definitely understand the situation you described. maybe reach out to the Jackson authors too according to the doc? https://github.com/fasterxml/jackson-core/wiki/JsonFactory-Features ``` In unlikely event that the exception is triggered for valid data, it may make sense to either disable this feature, or to disable canonicalization. However, Jackson authors would also like to be notified for such usage as it may point to an issue with hashing scheme -- so please file an issue if you encounter this problem. ``` The doc also mentioned that hash collision check is `Only relevant if canonicalization is enabled`. wondering if `CANONICALIZE_FIELD_NAMES` should be disabled too. I imagined it can cause similar memory footprint issue as String interning. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
