[GitHub] [iceberg] aokolnychyi opened a new issue, #7465: Pick split size automatically

via GitHub Fri, 28 Apr 2023 12:39:22 -0700


aokolnychyi opened a new issue, #7465:
URL: https://github.com/apache/iceberg/issues/7465


   ### Feature Request / Improvement
   
   I wonder whether we can pick the split size automatically instead of relying 
on users to supply a correct value. For instance, if the scheduler is FIFO, can 
we use the default cluster parallelism and the size of the data to be processed 
to come up with an optimal split size? We first find matching files and then 
plan splits, so the split size can be dynamic; we just need a good way to 
estimate it. It would be great if someone could investigate this further.
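
One possible heuristic, sketched below in Python purely for illustration: divide the total size of the matching files by the cluster parallelism so that each task slot gets roughly one split, then clamp the result to a sane range. The function name `estimate_split_size` and the 16 MiB / 512 MiB bounds are hypothetical and are not existing Iceberg or Spark settings; a real implementation would also need to account for the scheduler mode and file boundaries.

```python
MIB = 1024 * 1024

def estimate_split_size(total_bytes, parallelism,
                        min_split=16 * MIB, max_split=512 * MIB):
    """Return a split size in bytes that yields about one split per task slot.

    The bounds are illustrative assumptions, not Iceberg defaults.
    """
    if parallelism <= 0:
        raise ValueError("parallelism must be positive")
    if total_bytes <= 0:
        # Nothing to scan; fall back to the smallest allowed split size.
        return min_split
    # Ceiling division so the splits cover all of the data.
    target = -(-total_bytes // parallelism)
    # Clamp to avoid tiny splits (task overhead) and huge ones (stragglers).
    return max(min_split, min(max_split, target))
```

For example, 10 GiB of matching data on a cluster with 200 task slots gives a split size of about 51 MiB, while a very small scan falls back to the lower bound.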
   
   ### Query engine
   
   Spark

