rdblue opened a new pull request, #7731: URL: https://github.com/apache/iceberg/pull/7731
This adds utility methods for adaptive split planning to `TableScanUtil` in core.

The adaptive split size is determined by eagerly loading file tasks until their combined size reaches the target parallelism multiplied by the requested split size. If that threshold is reached, the requested split size is used. Otherwise, every file in the scan has been loaded, and the split size is computed by dividing the total size of the scan by the parallelism. If the resulting split size is too small, a minimum of 16 MB is applied.

This differs from #7688, which uses the total table size rather than a size specific to the scan. It also differs from #7714 because it lives in core and is not specific to Spark: any engine can set the scan parallelism and use adaptive split planning.
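The sizing rule described in this PR can be sketched roughly as follows. This is a language-agnostic illustration in Python, not the Java code added to `TableScanUtil`; the function name `adaptive_split_size`, its parameters, and the constant `MIN_SPLIT_SIZE` are hypothetical names chosen for this example, and file tasks are reduced to plain byte sizes.

```python
from typing import Iterable

# 16 MB floor mentioned in the PR description (constant name is hypothetical).
MIN_SPLIT_SIZE = 16 * 1024 * 1024


def adaptive_split_size(
    file_sizes: Iterable[int], parallelism: int, requested_split_size: int
) -> int:
    """Return a split size in bytes following the rule described in the PR.

    file_sizes stands in for the scan's file tasks and is consumed lazily,
    so iteration stops as soon as the threshold is reached.
    """
    if parallelism <= 0 or requested_split_size <= 0:
        raise ValueError("parallelism and requested_split_size must be positive")

    # Size at which the scan is large enough to fill every slot
    # with splits of the requested size.
    threshold = parallelism * requested_split_size

    total = 0
    for size in file_sizes:
        total += size
        if total >= threshold:
            # Large scan: keep the requested split size, stop loading tasks.
            return requested_split_size

    # Small scan: every file was loaded, so spread the total across the
    # available parallelism, but never go below the 16 MB minimum.
    return max(MIN_SPLIT_SIZE, total // parallelism)
```

For example, with a parallelism of 8 and a requested split size of 128 MB, a scan of four 64 MB files (256 MB total) stays below the 1024 MB threshold and yields 32 MB splits, while a single 10 MB file falls back to the 16 MB minimum.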