Fokko commented on PR #444: URL: https://github.com/apache/iceberg-python/pull/444#issuecomment-2025890254
@kevinjqliu Thanks for adding the examples. I think in general we want to have slightly bigger files. A simple heuristic I can think of is that we put an upper bound on the number of files, equal to the number of threads. This way we still get decent parallelization, but avoid creating many small files (and avoid the overhead of opening new files). We can do this in a separate PR.
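
To make the heuristic concrete, here is a rough sketch of what capping the file count at the thread count could look like. This is not the code from this PR or from PyIceberg; `plan_file_count`, `split_rows`, and their parameters are hypothetical names used only for illustration, and the sizes are assumed to be known up front.

```python
import math
from typing import List, Tuple


def plan_file_count(total_bytes: int, target_file_bytes: int, max_workers: int) -> int:
    """Hypothetical helper: how many files to write.

    Start from the count the target file size asks for, then cap it at the
    number of worker threads so that small targets do not produce many tiny files.
    """
    wanted = math.ceil(total_bytes / target_file_bytes)
    return max(1, min(wanted, max_workers))


def split_rows(num_rows: int, num_files: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: (offset, length) slices spreading rows evenly over the files."""
    base, extra = divmod(num_rows, num_files)
    slices = []
    offset = 0
    for i in range(num_files):
        # The first `extra` files take one additional row each.
        length = base + (1 if i < extra else 0)
        slices.append((offset, length))
        offset += length
    return slices


# Assumed numbers: 1 GiB of data, 16 MiB target files, 8 writer threads.
n_files = plan_file_count(1024**3, 16 * 1024**2, max_workers=8)
row_slices = split_rows(1_000_000, n_files)
```

With these assumed numbers the size target alone would ask for 64 files, while the cap brings it down to 8, one per thread. Each thread still has work to do, and each file is opened only once.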