ifxchris commented on issue #1994: URL: https://github.com/apache/iceberg-python/issues/1994#issuecomment-2897769050
Hi all, we are experiencing the same issue, but a bit more severe:

~~~
avg_row_size_bytes = tbl.nbytes / tbl.num_rows
target_rows_per_file = target_file_size // avg_row_size_bytes
batches = tbl.to_batches(max_chunksize=target_rows_per_file)
~~~

Our data is loaded from a Parquet file with the following metadata:

`Row group 0: count: 1  4.163 MB records  start: 4  total(compressed): 4.163 MB  total(uncompressed): 40.739 MB`

So the table contains only a single record, which according to `tbl.nbytes` takes around 600 MB in memory. Since this one record is larger than 512 MB in memory, `target_rows_per_file` is calculated as zero. As a result, `max_chunksize` is set to 0 and pyiceberg crashes because of it.
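For illustration, here is a minimal sketch of the failure mode and one possible guard: clamping the computed row count to at least 1 before calling `to_batches`. The 512 MB target value and the tiny stand-in table are assumptions for this example, not the actual pyiceberg code path:

~~~
import pyarrow as pa

# Stand-in table with a single row; in our case the one row is ~600 MB in memory.
tbl = pa.table({"payload": ["x" * 64]})

target_file_size = 512 * 1024 * 1024  # assumed 512 MB target, in bytes

avg_row_size_bytes = tbl.nbytes / tbl.num_rows

# Without the max(..., 1) clamp, a single row larger than target_file_size
# makes the division result 0, and to_batches(max_chunksize=0) fails.
target_rows_per_file = max(1, int(target_file_size // avg_row_size_bytes))

batches = tbl.to_batches(max_chunksize=target_rows_per_file)
~~~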