stevenzwu commented on PR #12224:
URL: https://github.com/apache/iceberg/pull/12224#issuecomment-2651667891

   >  Given partition summary field names and other snapshot properties are 
often not reused across different metadata, the interning causes more harm than 
good.
   
   @bryanck I didn't quite get the partition summary field names. were you 
referring to `PartitionFieldSummaryParser`? it seems to have just 4 field names.
   
   String.intern can be helpful for some use cases while harmful for some (like 
the one you encountered). Disabling interning seems to be a safer option 
considering diverse scenarios that the code can be used (like REST catalog 
server).
   
   >  hash collision check 
   
   I definitely understand the situation you described. maybe reach out to the 
Jackson authors too according to the doc?
   https://github.com/fasterxml/jackson-core/wiki/JsonFactory-Features
   ```
   In unlikely event that the exception is triggered for valid data, it may 
make sense to either disable this feature, or to disable canonicalization. 
However, Jackson authors would also like to be notified for such usage as it 
may point to an issue with hashing scheme -- so please file an issue if you 
encounter this problem.
   ```
   
   The doc also mentioned that hash collision check is `Only relevant if 
canonicalization is enabled`. wondering if `CANONICALIZE_FIELD_NAMES` should be 
disabled too. I imagined it can cause similar memory footprint issue as String 
interning.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to