bryanck commented on PR #12224:
URL: https://github.com/apache/iceberg/pull/12224#issuecomment-2651697343

   > @bryanck I didn't quite get the partition summary field names. Were you
   > referring to `PartitionFieldSummaryParser`? It seems to have just 4 field names.
   >
   > `String.intern` can be helpful for some use cases and harmful for others
   > (like the one you encountered). Disabling interning seems to be the safer option
   > considering the diverse scenarios in which the code can be used (like a REST catalog
   > server).
   
   The information for each partition key has a field name unique to that
   partition (prefixed with `partition.`), so interning those names adds entries
   to the JVM string pool with little chance of reuse. There is some discussion
   of interning in [this jackson-core issue](https://github.com/FasterXML/jackson-core/issues/332),
   which links to more. TL;DR: interning is disabled by default in Jackson 3
   (whenever that is released).
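
For reference, a minimal sketch of turning interning off on a Jackson 2.x factory. The class and method names (`NonInterningMapper`, `create`) are made up for illustration and are not taken from the PR; only `JsonFactory.Feature.INTERN_FIELD_NAMES` and the builder calls are Jackson API (builder style assumes Jackson 2.10 or later).

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical helper for illustration; not the code from the PR.
final class NonInterningMapper {

  // Builds an ObjectMapper whose parsers do not call String.intern() on field names.
  static ObjectMapper create() {
    JsonFactory factory =
        JsonFactory.builder()
            .disable(JsonFactory.Feature.INTERN_FIELD_NAMES)
            .build();
    return new ObjectMapper(factory);
  }

  public static void main(String[] args) {
    JsonFactory factory = create().getFactory();
    // Interning is off; canonicalization keeps its default setting.
    System.out.println(factory.isEnabled(JsonFactory.Feature.INTERN_FIELD_NAMES));
    System.out.println(factory.isEnabled(JsonFactory.Feature.CANONICALIZE_FIELD_NAMES));
  }
}
```

Only the interning flag is touched here, so every other factory feature stays at whatever default the Jackson version in use provides.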
   
   > I definitely understand the situation you described. Maybe reach out to
   > the Jackson authors too, as the doc suggests?
   > https://github.com/fasterxml/jackson-core/wiki/JsonFactory-Features
   
   Sure, sounds good. I'll reach out.
   
   > The doc also mentions that the hash collision check is `Only relevant if
   > canonicalization is enabled`. I wonder if `CANONICALIZE_FIELD_NAMES` should be
   > disabled too. I imagine it can cause a memory footprint issue similar to String
   > interning.
   
   Canonicalization still helps when field names are reused within a single
   metadata file, so it seemed worth keeping enabled.
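
To illustrate the distinction, here is a conceptual sketch of name canonicalization that does not rely on `String.intern`: a local table hands back one shared `String` instance per distinct field name, and its entries can be garbage collected along with the table. This is a simplified model, not Jackson's actual symbol-table implementation; `FieldNameTable` and the sample field name are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual model only; FieldNameTable is a hypothetical name, not a Jackson class.
final class FieldNameTable {
  private final Map<String, String> names = new HashMap<>();

  // Returns one shared instance per distinct name, without using the JVM-wide intern pool.
  String canonicalize(String name) {
    String existing = names.putIfAbsent(name, name);
    return existing != null ? existing : name;
  }

  // Number of distinct names seen so far.
  int size() {
    return names.size();
  }

  public static void main(String[] args) {
    FieldNameTable table = new FieldNameTable();
    // Two equal but distinct String objects, as a parser would produce for a repeated field.
    String first = table.canonicalize(new String("snapshot-id"));
    String second = table.canonicalize(new String("snapshot-id"));
    System.out.println(first == second); // true: both resolve to the same instance
    System.out.println(table.size());    // 1
  }
}
```

Repeated names share one instance, which is the saving within a single metadata file. Names that occur only once, such as the per-partition `partition.` fields, each add an entry to the table, but that entry lives in the table and not in the global string pool.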
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

