[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7731: Core: Implement adaptive split planning in core.

via GitHub Sun, 04 Jun 2023 20:16:16 -0700


aokolnychyi commented on code in PR #7731:
URL: https://github.com/apache/iceberg/pull/7731#discussion_r1217404778



##########
core/src/main/java/org/apache/iceberg/util/TableScanUtil.java:
##########
@@ -35,16 +38,22 @@
 import org.apache.iceberg.ScanTaskGroup;
 import org.apache.iceberg.SplittableScanTask;
 import org.apache.iceberg.StructLike;
+import org.apache.iceberg.io.CloseableGroup;
 import org.apache.iceberg.io.CloseableIterable;
+import org.apache.iceberg.io.CloseableIterator;
 import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
 import org.apache.iceberg.relocated.com.google.common.collect.FluentIterable;
 import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
 import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 import org.apache.iceberg.relocated.com.google.common.collect.Maps;
 import org.apache.iceberg.types.Types;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 public class TableScanUtil {
+  private static final Logger LOG = LoggerFactory.getLogger(TableScanUtil.class);
+  private static final long MIN_SPLIT_SIZE = 16 * 1024 * 1024; // 16 MB

Review Comment:
   I would say 8 MB is already small enough that such tasks should finish 
fairly quickly. I would be a bit concerned about going smaller than that. The 
cost of opening files is non-trivial. Also, splits that are too small may 
overload the underlying storage with a large number of requests. I even find 
16 MB reasonable, to be honest. I don't want our queries to fail because of 
rate limits.
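
   To make the trade-off concrete, here is a rough sketch of how a floor on the 
split size bounds the number of splits, and therefore the number of file opens 
and storage requests. The class `SplitSizeSketch` and both method names are 
hypothetical and are not the code in this PR. The sketch assumes the adaptive 
split size is the scan size divided by the available parallelism, clamped 
between the floor and the configured target.

```java
// Hypothetical illustration only; not the implementation in TableScanUtil.
final class SplitSizeSketch {

  private SplitSizeSketch() {}

  // Assumed policy: aim for one split per parallel slot, but never go below
  // minSplitSize and never above targetSplitSize.
  static long adjustSplitSize(
      long scanSize, int parallelism, long targetSplitSize, long minSplitSize) {
    long idealSplitSize = scanSize / Math.max(parallelism, 1);
    long cappedSplitSize = Math.min(idealSplitSize, targetSplitSize);
    return Math.max(minSplitSize, cappedSplitSize);
  }

  // Ceiling division: the number of splits needed to cover scanSize bytes.
  // Each split implies at least one file open and one or more read requests.
  static long splitCount(long scanSize, long splitSize) {
    return (scanSize + splitSize - 1) / splitSize;
  }
}
```

   Under these assumptions, a 1 GB scan with 1000 available slots would want 
roughly 1 MB splits. A 16 MB floor yields 64 splits, while an 8 MB floor yields 
128, so halving the floor doubles the number of requests sent to storage.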



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

