kevinjqliu commented on issue #428: URL: https://github.com/apache/iceberg-python/issues/428#issuecomment-1951395305
It seems like there's an upper bound on the size of the RecordBatches produced by `to_batches`. I tried setting `max_chunksize` to values from `16 MB` to `256 MB`, and all the batches produced are around 45 MB in size. I guess this is what you mean by bin-packing: we can bin-pack these batches into 512 MB parquet files.
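
To make the bin-packing idea concrete, here is a minimal sketch that groups consecutive batches until the next one would push a group past the 512 MB target. It works on plain byte counts so it runs on its own; with PyArrow those counts could come from `RecordBatch.nbytes`. The names `bin_pack` and `TARGET_FILE_SIZE` are hypothetical, and this is a simple sequential strategy, not necessarily how PyIceberg does or will do it. The sizes are also in-memory sizes, which generally differ from the size of the written Parquet file.

```python
from typing import Iterable, List

MB = 1024 * 1024
TARGET_FILE_SIZE = 512 * MB  # hypothetical constant: the 512 MB target from the comment


def bin_pack(sizes: Iterable[int], target: int = TARGET_FILE_SIZE) -> List[List[int]]:
    """Group consecutive batch sizes into bins that stay at or under `target` bytes.

    A batch larger than `target` still gets a bin of its own.
    """
    bins: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in sizes:
        # Close the current bin if adding this batch would exceed the target.
        if current and current_size + size > target:
            bins.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    return bins


# Example: 30 batches of roughly 45 MB each, as observed above.
batch_sizes = [45 * MB] * 30
bins = bin_pack(batch_sizes)
batches_per_bin = [len(b) for b in bins]
```

With ~45 MB batches, each bin holds 11 batches (495 MB), since a twelfth would bring it to 540 MB.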