rdblue opened a new pull request, #7731:
URL: https://github.com/apache/iceberg/pull/7731

   This adds utility methods for adaptive split planning to `TableScanUtil` in 
core.
   
   The adaptive split size is determined by eagerly loading file tasks until 
their combined size reaches the target parallelism * the requested split size. 
If that size is reached, the requested split size is used. Otherwise, all files 
in the scan have been loaded, and the split size is computed by dividing the 
total size of the scan by the parallelism. If the result is below 16MB, a 
minimum split size of 16MB is used instead.
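   The sizing rule above can be sketched as follows. This is a minimal, 
hypothetical illustration, not the `TableScanUtil` methods added by this PR: 
the class `AdaptiveSplitSketch`, the method `adaptiveSplitSize`, and its 
parameters are assumptions, and file tasks are reduced to their sizes in bytes.

```java
import java.util.Iterator;

/** Hypothetical sketch of adaptive split sizing; not Iceberg's actual API. */
final class AdaptiveSplitSketch {
  // 16MB floor, taken from the minimum described in the PR text.
  static final long MIN_SPLIT_SIZE = 16L * 1024 * 1024;

  private AdaptiveSplitSketch() {}

  /**
   * @param fileSizes sizes in bytes of the scan's file tasks, in planning order (assumed input)
   * @param parallelism target number of parallel tasks, must be positive
   * @param requestedSplitSize the configured split size in bytes
   * @return the split size to use for this scan
   */
  static long adaptiveSplitSize(Iterator<Long> fileSizes, int parallelism, long requestedSplitSize) {
    if (parallelism <= 0) {
      throw new IllegalArgumentException("parallelism must be positive: " + parallelism);
    }

    // Enough data to fill every task with one requested-size split.
    long threshold = parallelism * requestedSplitSize;
    long totalSize = 0L;

    // Eagerly consume file sizes until the threshold is reached or the scan is exhausted.
    while (fileSizes.hasNext()) {
      totalSize += fileSizes.next();
      if (totalSize >= threshold) {
        // The scan is large enough: keep the requested split size and stop loading.
        return requestedSplitSize;
      }
    }

    // Every file was loaded without reaching the threshold: spread the scan
    // evenly across the parallelism, but never go below the 16MB floor.
    return Math.max(totalSize / parallelism, MIN_SPLIT_SIZE);
  }
}
```

   In a real planner the eagerly loaded tasks would presumably be buffered 
and reused rather than read a second time; the sketch returns only the size. 
The floor keeps small scans from being split into pieces so tiny that 
per-task overhead dominates.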
   
   This differs from #7688 because that PR uses the total table size rather 
than a size specific to the scan. It also differs from #7714 because this is in 
core and is not specific to Spark. Any engine could set the scan parallelism 
and use adaptive split planning.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
