rustyconover commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1732750057
@fokko The 1 MB was really just a guess. I think that `configureBlockSize()` sets the largest block of a map or array that will be buffered in memory before being written; an array or map can of course consist of multiple blocks.

Here is how I arrived at 1 MB as a reasonable size for my use cases. The largest maps I commonly encounter are the maps from `field_id` to the highest or lowest value for a column in a particular file. The highest or lowest value is a byte array of variable length; let's bound those values at 256 bytes. The `field_id` also won't be longer than 8 bytes (commonly it will be shorter due to zigzag encoding). So for a table of 200 columns, let's try: 8 bytes (field_id) * 256 bytes (value length) * 200 (column count) = 409,600 bytes.

I'm happy to hear your thoughts on this, but 1 MB seems like a reasonable first guess until we make it a table property. Do we want to make it a table property?
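To make the sizing argument easier to check, here is a minimal sketch of the arithmetic. The class and method names are hypothetical and not part of Iceberg or Avro. `productEstimate` reproduces the figure exactly as written in the comment; `perEntryEstimate` is an alternative reading, assumed here, in which each map entry costs one key plus one value. Both ignore any per-block or per-entry framing overhead.

```java
// Hypothetical helper, not part of Iceberg or Avro: back-of-the-envelope sizing
// for a bounds map keyed by field_id.
final class BlockSizeEstimate {
    // The proposed default block size, taking 1 MB as 1024 * 1024 bytes.
    static final long ONE_MB = 1024L * 1024L;

    // The figure as written in the comment: the three numbers multiplied together.
    static long productEstimate(long keyBytes, long valueBytes, long columns) {
        return keyBytes * valueBytes * columns;
    }

    // Alternative reading (an assumption): each entry is one key plus one value,
    // so the map holds `columns` entries of (keyBytes + valueBytes) each.
    static long perEntryEstimate(long keyBytes, long valueBytes, long columns) {
        return (keyBytes + valueBytes) * columns;
    }

    public static void main(String[] args) {
        long product = productEstimate(8, 256, 200);   // 409600
        long perEntry = perEntryEstimate(8, 256, 200); // 52800
        System.out.println("product estimate:   " + product + " bytes");
        System.out.println("per-entry estimate: " + perEntry + " bytes");
        System.out.println("fits in 1 MB:       " + (product <= ONE_MB && perEntry <= ONE_MB));
    }
}
```

Under either reading, a 200-column bounds map with 256-byte values stays below 1 MB, which is consistent with treating 1 MB as a first guess.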