danielcweeks commented on code in PR #7731:
URL: https://github.com/apache/iceberg/pull/7731#discussion_r1210583681


##########
core/src/main/java/org/apache/iceberg/util/TableScanUtil.java:
##########
@@ -35,16 +38,22 @@
 import org.apache.iceberg.ScanTaskGroup;
 import org.apache.iceberg.SplittableScanTask;
 import org.apache.iceberg.StructLike;
+import org.apache.iceberg.io.CloseableGroup;
 import org.apache.iceberg.io.CloseableIterable;
+import org.apache.iceberg.io.CloseableIterator;
 import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
 import org.apache.iceberg.relocated.com.google.common.collect.FluentIterable;
 import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
 import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 import org.apache.iceberg.relocated.com.google.common.collect.Maps;
 import org.apache.iceberg.types.Types;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 public class TableScanUtil {
+  private static final Logger LOG = LoggerFactory.getLogger(TableScanUtil.class);
+  private static final long MIN_SPLIT_SIZE = 16 * 1024 * 1024; // 16 MB

Review Comment:
   I don't think we want to focus too much on the optimization/cost of
individual tasks here. The goal is better overall wall-clock performance
through parallelism, sacrificing some individual task efficiency for overall
throughput. I would go as low as conceivably possible, which means either a
value like this or, if configured, the row group size, since that's the lowest
splittable unit of work.
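
   To make the suggestion concrete, here is a rough sketch of how the floor
could be chosen and applied. It is not the code in this PR: the class and
method names are hypothetical, the fallback simply mirrors the 16 MB constant
in the diff, and reading `write.parquet.row-group-size-bytes` from the table
properties is an assumption about where the row group size would come from.

```java
import java.util.Map;

// Hypothetical sketch, not the PR's implementation.
final class MinSplitSizeSketch {
  // Fallback floor, mirroring the 16 MB constant shown in the diff.
  static final long DEFAULT_MIN_SPLIT_SIZE = 16L * 1024 * 1024;

  // Assumption: the Parquet row group size is taken from this table property.
  static final String ROW_GROUP_SIZE_PROP = "write.parquet.row-group-size-bytes";

  private MinSplitSizeSketch() {}

  // Use the configured row group size as the floor, else the fallback.
  static long minSplitSize(Map<String, String> tableProps) {
    String configured = tableProps.get(ROW_GROUP_SIZE_PROP);
    return configured == null ? DEFAULT_MIN_SPLIT_SIZE : Long.parseLong(configured);
  }

  // Shrink the split size so small scans still fan out across the available
  // parallelism, but never go below the floor or above the target split size.
  static long adjustSplitSize(
      long scanSizeBytes, int parallelism, long targetSplitSize, long minSplitSize) {
    // Ceiling division: bytes each slot would get if the scan were spread evenly.
    long perSlot = (scanSizeBytes + parallelism - 1) / parallelism;
    return Math.max(minSplitSize, Math.min(perSlot, targetSplitSize));
  }
}
```

   With a 128 MB target and 16 slots, a 1 GB scan would get 64 MB splits,
while a 64 MB scan would be clamped to the floor rather than cut into 4 MB
pieces.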



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

