hantangwangd commented on code in PR #12201:
URL: https://github.com/apache/iceberg/pull/12201#discussion_r1947986708


##########
core/src/main/java/org/apache/iceberg/util/TableScanUtil.java:
##########
@@ -236,6 +236,9 @@ public static long adjustSplitSize(long scanSize, int parallelism, long splitSiz
     // use the configured split size if it produces at least one split per slot
     // otherwise, adjust the split size to target parallelism with a reasonable minimum
     // increasing the split size may cause expensive spills and is not done automatically
+    if (splitSize <= 0) {

Review Comment:
   When querying in Spark, `adjustSplitSize(...)` is invoked before `TableScanUtil.planTaskGroups(...)`, as follows, so an illegal split size is handled in `adjustSplitSize(...)` first and hits this problem there. I believe the same holds for any other engine that has logic to adjust the split size.
   
   ```
           TableScanUtil.planTaskGroups(
                   CloseableIterable.withNoopClose(tasks()),
                   adjustSplitSize(tasks(), scan.targetSplitSize()),
                   scan.splitLookback(),
                   scan.splitOpenFileCost());
   ```
   
   If I understand what you meant correctly, we should simply add a `checkArgument` for `splitSize` in `adjustSplitSize(...)` that throws an error for an illegal value, is that right?
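   
   For illustration, a minimal sketch of such a check might look like the following. The class name `SplitSizeCheckSketch`, the local `checkArgument` helper (a stand-in for a Preconditions-style utility), the error message, the `MIN_SPLIT_SIZE` value, and the simplified method body are all assumptions made for this example, not the actual code in `TableScanUtil`.
   
   ```java
   // Hypothetical, self-contained sketch; not the real TableScanUtil implementation.
   final class SplitSizeCheckSketch {
     // Assumed minimum split size for this sketch (16 MB).
     private static final long MIN_SPLIT_SIZE = 16L * 1024 * 1024;

     private SplitSizeCheckSketch() {}

     // Stand-in for a Preconditions-style checkArgument helper.
     static void checkArgument(boolean condition, String template, Object arg) {
       if (!condition) {
         throw new IllegalArgumentException(String.format(template, arg));
       }
     }

     static long adjustSplitSize(long scanSize, int parallelism, long splitSize) {
       // Reject an illegal split size up front instead of dividing by it below.
       checkArgument(splitSize > 0, "Split size must be > 0: %s", splitSize);

       // Ceiling division: number of splits produced by the configured size.
       long splitCount = (scanSize + splitSize - 1) / splitSize;
       // Size that targets one split per slot, with a reasonable minimum.
       long adjusted = Math.max(scanSize / parallelism, Math.min(MIN_SPLIT_SIZE, splitSize));
       return splitCount < parallelism ? adjusted : splitSize;
     }
   }
   ```
   
   With a check like this, a call such as `adjustSplitSize(100, 4, 0)` would fail fast with an `IllegalArgumentException` rather than reach the division.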



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

