[GitHub] [iceberg] haydenflinner commented on pull request #5933: [1.0.x] Core: Increase inferred column metrics limit to 100.

via GitHub Wed, 25 Jan 2023 12:05:58 -0800


haydenflinner commented on PR #5933:
URL: https://github.com/apache/iceberg/pull/5933#issuecomment-1404162903


   If I have a table with more than 100 columns, what are the downsides of 
being above this parameter's value? I don't see it documented here: 
https://iceberg.apache.org/docs/latest/configuration/
   
   I only ask because I have a table that is basically a collection of events. 
Upstream, each event carries some metadata in a dict. Using one column per key 
in that dict felt like it would compress better than storing a dict such as 
{"key1": 123} in every row, since the key names are relatively static and the 
values would benefit from columnar compression. Most of these columns are 
empty for any particular partition, which I assume costs close to zero storage 
and runtime overhead. For example, file 1's rows will have the metadata dict 
{"abc": 1234} repeated across virtually the whole GB of data, while most rows 
in file 2 may have {"def": "foo"} instead. 
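
   A minimal sketch of the column-per-key layout described above, in plain 
Python. The function name `flatten_events` and the shape of the event records 
are hypothetical, made up for illustration; nothing here is Iceberg API, and 
it says nothing about how column metrics are collected.

```python
from typing import Any


def flatten_events(events: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Turn per-row metadata dicts into one sparse column per metadata key."""
    # Collect every key seen, in first-seen order, so the column set is stable.
    keys: list[str] = []
    for event in events:
        for key in event["metadata"]:
            if key not in keys:
                keys.append(key)
    # A key missing from a row becomes None (a null in the column).
    return {key: [event["metadata"].get(key) for event in events] for key in keys}


# Hypothetical sample: most rows share one key, a few use another.
events = [
    {"id": 1, "metadata": {"abc": 1234}},
    {"id": 2, "metadata": {"abc": 1234}},
    {"id": 3, "metadata": {"def": "foo"}},
]
columns = flatten_events(events)
print(columns)
```

   Each distinct key becomes its own column, so a table built this way can 
easily grow past 100 columns even though most of them are null in any one 
file.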


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


