haydenflinner commented on PR #5933: URL: https://github.com/apache/iceberg/pull/5933#issuecomment-1404162903
If I have a table with more than 100 columns, what are the downsides of exceeding this parameter's value? I don't see it documented at https://iceberg.apache.org/docs/latest/configuration/

I only ask because I have a table that is basically a collection of events. Upstream, each event carries some metadata in a dict. Using one column per key in that metadata dict felt like it would compress better than storing a {"key1": 123} dict in each row, since the key names are relatively static and the values would benefit from columnar compression. Most of these columns are empty for any particular partition, which I assume adds close to zero storage and runtime overhead. For example, the rows in file 1 will carry the metadata dict {"abc": 1234} across virtually the whole GB of data, while most rows in file 2 may carry {"def": "foo"} instead.
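
To make the layout in question concrete, here is a rough Python sketch of the column-per-key idea, using the example values from the comment. It is plain Python, not Iceberg code: the names `flatten_events`, `null_fraction`, and the `meta_` prefix are hypothetical and chosen only for illustration.

```python
def flatten_events(events, prefix="meta_"):
    """Turn each event's metadata dict into one sparse column per key.

    Hypothetical helper for illustration; not part of any Iceberg API.
    """
    # Collect every metadata key seen, sorted so the column order is stable.
    keys = sorted({k for e in events for k in e.get("metadata", {})})
    columns = {prefix + k: [] for k in keys}
    for e in events:
        md = e.get("metadata", {})
        for k in keys:
            # Rows that lack the key get None, i.e. a null in that column.
            columns[prefix + k].append(md.get(k))
    return columns


def null_fraction(column):
    """Fraction of entries in a column that are null."""
    return sum(v is None for v in column) / len(column) if column else 0.0


# Events shaped like the ones described above.
events = [
    {"id": 1, "metadata": {"abc": 1234}},
    {"id": 2, "metadata": {"abc": 1234}},
    {"id": 3, "metadata": {"def": "foo"}},
]

columns = flatten_events(events)
for name, values in columns.items():
    print(name, values, f"null fraction: {null_fraction(values):.2f}")
```

Each distinct key becomes its own column, so a table with many rarely used keys ends up with many mostly-null columns. That is the situation where the column count can pass 100.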