ifxchris commented on issue #1994:
URL: https://github.com/apache/iceberg-python/issues/1994#issuecomment-2897769050

   Hi all,
   
   we are experiencing the same issue, but in a more severe form:
   
   ~~~
        # average in-memory bytes per row
        avg_row_size_bytes = tbl.nbytes / tbl.num_rows
        # floor division: yields 0 when a single row is larger than target_file_size
        target_rows_per_file = target_file_size // avg_row_size_bytes
        batches = tbl.to_batches(max_chunksize=target_rows_per_file)
   ~~~
   
   Our data is loaded from a parquet file with the following metadata:
   `Row group 0:  count: 1  4.163 MB records  start: 4  total(compressed): 4.163 MB total(uncompressed):40.739 MB`
   So we only have one record in the table.
   
   According to `tbl.nbytes`, this table takes around 600 MB in memory.
   Since this single record is larger than the 512 MB target file size, the floor division yields zero for `target_rows_per_file`.
   As a result, `max_chunksize` is set to 0 and pyiceberg crashes because of that.
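   
   For illustration, here is a minimal sketch of a guard that would avoid the zero chunk size, assuming the chunking works as in the snippet above. The helper name `chunk_for_target_size` is hypothetical and the variable names are taken from that snippet; this is not pyiceberg's actual code or fix, just the shape of a possible workaround:
   
   ~~~
   import pyarrow as pa
   
   def chunk_for_target_size(tbl: pa.Table, target_file_size: int) -> list[pa.RecordBatch]:
       # hypothetical helper, not part of the pyiceberg API
       avg_row_size_bytes = tbl.nbytes / tbl.num_rows
       # clamp to at least one row so max_chunksize can never become 0,
       # even when a single row is larger than target_file_size
       target_rows_per_file = max(1, int(target_file_size // avg_row_size_bytes))
       return tbl.to_batches(max_chunksize=target_rows_per_file)
   ~~~
   
   With such a clamp, a table whose single row exceeds the target file size would simply be emitted as one batch instead of triggering a `max_chunksize` of 0.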

