aokolnychyi opened a new issue, #7465: URL: https://github.com/apache/iceberg/issues/7465
### Feature Request / Improvement

I wonder whether we can pick the split size automatically instead of relying on users to supply a correct value. For instance, if the scheduler is FIFO, can we use the default cluster parallelism and the size of the data to be processed to come up with an optimal split size? We first find matching files and then plan splits, so the split size can be dynamic; we just need a good way to estimate it correctly. It would be great if someone could investigate this a bit more.

### Query engine

Spark
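
To make the question concrete, here is a rough sketch of one possible heuristic, not a proposal for the actual implementation. It assumes the total size of the matching files and the cluster parallelism are both known before splits are planned, and it aims for roughly one split per task slot. The function name `estimate_split_size` and the 16 MB lower bound are hypothetical; the 128 MB upper bound is chosen here to match the default of `read.split.target-size`.

```python
MB = 1024 * 1024


def estimate_split_size(total_bytes: int, parallelism: int,
                        min_split_bytes: int = 16 * MB,
                        max_split_bytes: int = 128 * MB) -> int:
    """Pick a split size so the scan yields about one split per task slot.

    Hypothetical heuristic: the bounds are illustrative assumptions,
    not values taken from Iceberg or Spark.
    """
    if parallelism <= 0:
        raise ValueError("parallelism must be positive")
    if total_bytes <= 0:
        # Nothing to scan; any valid size works, so return the lower bound.
        return min_split_bytes
    # Integer ceiling division: bytes each slot would read if work were even.
    per_slot = (total_bytes + parallelism - 1) // parallelism
    # Clamp so tiny scans do not create tiny splits and huge scans
    # do not create oversized ones.
    return max(min_split_bytes, min(per_slot, max_split_bytes))


if __name__ == "__main__":
    # 3200 MB of matching data on 100 slots gives 32 MB splits.
    print(estimate_split_size(3200 * MB, 100) // MB)
```

With these assumed bounds, a small scan falls back to the 16 MB floor and a very large scan is capped at 128 MB, so the estimate only changes behavior in the range between them. A real estimator would likely also need to account for schedulers other than FIFO, where the full cluster parallelism may not be available to a single job.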
