Fokko commented on PR #444:
URL: https://github.com/apache/iceberg-python/pull/444#issuecomment-2025890254

@kevinjqliu Thanks for adding the examples. I think in general we want to have slightly bigger files.

A simple heuristic I can think of is to put an upper bound on the number of files, equal to the number of threads. This way we still get decent parallelization, but avoid creating many small files (and the overhead of opening new files). We can do this in a separate PR.
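
A minimal sketch of the heuristic, assuming rows are split by count. `plan_file_batches`, `rows_per_file` and `num_threads` are hypothetical names for illustration, not part of the PyIceberg API. It computes how many files a fixed per-file target would produce, caps that at the thread count, then spreads the rows evenly across the resulting files:

```python
import math
import os
from typing import List, Optional, Tuple


def plan_file_batches(
    total_rows: int,
    rows_per_file: int,
    num_threads: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Return (offset, length) slices, one per output file.

    Hypothetical helper: the file count is what `rows_per_file` alone
    would give, capped at `num_threads` so we never create more files
    than there are workers to write them in parallel.
    """
    if total_rows <= 0:
        return []
    if rows_per_file <= 0:
        raise ValueError("rows_per_file must be positive")
    if num_threads is None:
        # Assumption: default to the CPU count when no thread count is given.
        num_threads = os.cpu_count() or 1
    uncapped = math.ceil(total_rows / rows_per_file)
    num_files = max(1, min(uncapped, num_threads))
    # Spread rows evenly: the first `extra` files each get one extra row.
    base, extra = divmod(total_rows, num_files)
    slices = []
    offset = 0
    for i in range(num_files):
        length = base + (1 if i < extra else 0)
        slices.append((offset, length))
        offset += length
    return slices
```

For example, `plan_file_batches(1_000_000, 100_000, num_threads=4)` yields four slices of 250,000 rows instead of ten of 100,000: parallelism still matches the thread count, but each file is larger. Each `(offset, length)` pair could then be passed to pyarrow's `Table.slice(offset, length)` to produce the data for one file.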


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
