aokolnychyi commented on code in PR #7731:
URL: https://github.com/apache/iceberg/pull/7731#discussion_r1217404778
##########
core/src/main/java/org/apache/iceberg/util/TableScanUtil.java:
##########
@@ -35,16 +38,22 @@
import org.apache.iceberg.ScanTaskGroup;
import org.apache.iceberg.SplittableScanTask;
import org.apache.iceberg.StructLike;
+import org.apache.iceberg.io.CloseableGroup;
import org.apache.iceberg.io.CloseableIterable;
+import org.apache.iceberg.io.CloseableIterator;
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
import org.apache.iceberg.relocated.com.google.common.collect.FluentIterable;
import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
import org.apache.iceberg.relocated.com.google.common.collect.Lists;
import org.apache.iceberg.relocated.com.google.common.collect.Maps;
import org.apache.iceberg.types.Types;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
public class TableScanUtil {
+ private static final Logger LOG = LoggerFactory.getLogger(TableScanUtil.class);
+ private static final long MIN_SPLIT_SIZE = 16 * 1024 * 1024; // 16 MB
Review Comment:
I would say 8 MB is already small enough that such tasks should finish fairly
quickly. I would be a bit concerned about going smaller than that. The cost of
opening files is non-trivial. Also, splits that are too small may overload the
underlying storage with a large number of requests. To be honest, I even find
16 MB reasonable. I don't want our queries to fail because of rate limits.
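
To put rough numbers on the request-count concern, here is a minimal sketch,
not the code in this PR. `SplitSizeSketch`, `clampSplitSize`, and `splitCount`
are hypothetical names introduced only for illustration, and the sketch
assumes each split costs roughly one file open or read request against
storage. Only the 16 MB constant comes from the diff above.

```java
// Illustrative only: a hypothetical helper class, not part of TableScanUtil.
final class SplitSizeSketch {
  // Same value as the MIN_SPLIT_SIZE constant in the diff above.
  static final long MIN_SPLIT_SIZE = 16L * 1024 * 1024; // 16 MB

  private SplitSizeSketch() {}

  // Hypothetical clamp: never use a split size below the floor.
  static long clampSplitSize(long requestedSplitSize) {
    return Math.max(MIN_SPLIT_SIZE, requestedSplitSize);
  }

  // Ceiling division: how many splits are needed to cover totalBytes.
  // Assumption: each split maps to about one open/read request.
  static long splitCount(long totalBytes, long splitSize) {
    return (totalBytes + splitSize - 1) / splitSize;
  }

  public static void main(String[] args) {
    long oneTb = 1L << 40; // 1 TB of data to scan
    long[] requestedSizes = {1L << 20, 8L << 20, 16L << 20, 128L << 20};
    for (long requested : requestedSizes) {
      long effective = clampSplitSize(requested);
      System.out.printf(
          "requested=%d MB, splits without floor=%d, splits with floor=%d%n",
          requested >> 20, splitCount(oneTb, requested), splitCount(oneTb, effective));
    }
  }
}
```

Under these assumptions, halving the split size doubles the number of splits
for the same amount of data, which is why a floor in the 8 MB to 16 MB range
keeps the request count bounded.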
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]