rustyconover commented on PR #8625:
URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1732750057

   @fokko The 1MB was really just a guess.
   
   I think that `configureBlockSize()` represents the largest block of the 
map/array that will be buffered in memory before being written. Of course, an 
array or map can consist of multiple blocks.  
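   For context, here is a minimal Python sketch of how Avro's binary format splits an array or map into blocks. Each block starts with an item count; in the size-prefixed form the count is written as a negative number and is followed by the block's length in bytes, and a zero count ends the sequence. The `block_limit` parameter and the helper names are made up for illustration, and this is not Avro's or Iceberg's actual writer code.

```python
def encode_long(n):
    """Avro-style zigzag varint encoding of a signed 64-bit integer."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes map to small unsigned values
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_bytes(value):
    """A bytes value is its length (as a long) followed by the raw bytes."""
    return encode_long(len(value)) + value


def encode_blocked(items, block_limit):
    """Write pre-encoded items as size-prefixed blocks.

    block_limit is a hypothetical stand-in for the configured block size:
    a block is flushed once adding another item would exceed it.
    """
    out = bytearray()
    block = []
    size = 0
    for item in items:
        if block and size + len(item) > block_limit:
            out += encode_long(-len(block))  # negative count: byte size follows
            out += encode_long(size)
            out += b"".join(block)
            block, size = [], 0
        block.append(item)
        size += len(item)
    if block:
        out += encode_long(-len(block))
        out += encode_long(size)
        out += b"".join(block)
    out += encode_long(0)  # zero count terminates the array/map
    return bytes(out)


# Illustrative entries: (field_id, bound value) pairs encoded back to back.
entries = [encode_long(fid) + encode_bytes(b"\x00" * 16) for fid in range(1, 6)]
encoded = encode_blocked(entries, block_limit=40)
```

   With a limit of 40 bytes, the five 18-byte entries above end up in three blocks, so only one block at a time has to be held in memory.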
   
   Here is how I arrived at 1 MB as a reasonable size, based on my own use 
cases.
   
   The largest maps I commonly encounter are the maps from the `field_id` to the 
highest or lowest value for a column in a particular file.  The highest or 
lowest value is a byte array, which can be variable length, so let's bound those 
values at 256 bytes.  The `field_id` also won't be greater than 8 bytes in 
length (commonly it will be shorter due to zigzag encoding).  So for a table of 
200 columns, let's try:
   
   (8 bytes (field_id) + 256 bytes (value length)) * 200 (column count) = 52,800 
bytes, comfortably under 1 MB.
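   The estimate can be checked with a few lines of Python. This is a rough sketch that assumes each map entry costs its key size plus its value size and ignores the small length prefix on each value; the constants are the guesses from this comment, not measured values, and the function name is hypothetical.

```python
FIELD_ID_MAX_BYTES = 8         # assumed upper bound on an encoded field_id
VALUE_MAX_BYTES = 256          # assumed cap on a lower/upper bound value
COLUMN_COUNT = 200             # example table width
BLOCK_SIZE_BYTES = 1024 * 1024 # the proposed 1 MB block size


def estimate_bounds_map_bytes(columns, key_bytes, value_bytes):
    """Upper bound on one bounds map: one (key, value) entry per column."""
    return columns * (key_bytes + value_bytes)


estimate = estimate_bounds_map_bytes(
    COLUMN_COUNT, FIELD_ID_MAX_BYTES, VALUE_MAX_BYTES
)
fits_in_one_block = estimate <= BLOCK_SIZE_BYTES
print(estimate, fits_in_one_block)
```

   Under these assumptions a single bounds map for a 200-column table fits in one 1 MB block with plenty of headroom.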
   
   I'm happy to hear your thoughts on this, but 1 MB seems like a reasonable 
first guess, until we make it a table property.  Do we want to make it a table 
property?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

