hantangwangd commented on code in PR #12201:
URL: https://github.com/apache/iceberg/pull/12201#discussion_r1947986708
##########
core/src/main/java/org/apache/iceberg/util/TableScanUtil.java:
##########
@@ -236,6 +236,9 @@ public static long adjustSplitSize(long scanSize, int parallelism, long splitSiz
     // use the configured split size if it produces at least one split per slot
     // otherwise, adjust the split size to target parallelism with a reasonable minimum
     // increasing the split size may cause expensive spills and is not done automatically
+    if (splitSize <= 0) {

Review Comment:
When querying in Spark, `adjustSplitSize(...)` is invoked before `TableScanUtil.planTaskGroups(...)`, as shown below, so an illegal split size is processed by `adjustSplitSize(...)` first and hits this problem there. I believe the same applies to other engines that have their own logic to adjust the split size.

```java
TableScanUtil.planTaskGroups(
    CloseableIterable.withNoopClose(tasks()),
    adjustSplitSize(tasks(), scan.targetSplitSize()),
    scan.splitLookback(),
    scan.splitOpenFileCost());
```

If I understand you correctly, we should simply add a `checkArgument` on `splitSize` in `adjustSplitSize(...)` that throws an error for an illegal value. Is that right?
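To make the proposal concrete, here is a minimal, dependency-free Java sketch of what a `checkArgument` guard at the top of `adjustSplitSize(...)` could look like. Only the method signature comes from the diff hunk header; the method body, the 16 MB `MIN_SPLIT_SIZE`, the local `checkArgument` helper (a stand-in for Guava's `Preconditions.checkArgument`), and the `planTaskGroups(long)` stub are simplified, hypothetical assumptions for illustration, not Iceberg's actual code.

```java
// Hypothetical, simplified sketch; NOT Iceberg's actual TableScanUtil or Spark scan code.
final class SplitSizeCheckSketch {

  // Minimum split size assumed for this sketch only (16 MB).
  static final long MIN_SPLIT_SIZE = 16L * 1024 * 1024;

  // Dependency-free stand-in for Guava's Preconditions.checkArgument.
  static void checkArgument(boolean condition, String template, Object arg) {
    if (!condition) {
      throw new IllegalArgumentException(String.format(template, arg));
    }
  }

  // Simplified adjustSplitSize with the proposed up-front validation.
  static long adjustSplitSize(long scanSize, int parallelism, long splitSize) {
    checkArgument(splitSize > 0, "Split size must be > 0: %s", splitSize);

    // Ceiling division: how many splits the configured size would produce.
    long splitCount = scanSize / splitSize + (scanSize % splitSize == 0 ? 0 : 1);
    long adjustedSplitSize =
        Math.max(scanSize / parallelism, Math.min(MIN_SPLIT_SIZE, splitSize));
    // Keep the configured size if every slot gets at least one split.
    return splitCount < parallelism ? adjustedSplitSize : splitSize;
  }

  // Hypothetical stand-in for the later planning step, which also validates.
  static long planTaskGroups(long splitSize) {
    checkArgument(splitSize > 0, "Planning got invalid split size: %s", splitSize);
    return splitSize;
  }

  public static void main(String[] args) {
    long scanSize = 1L << 30; // 1 GB
    // 8 splits of 128 MB for 4 slots: the configured size is kept.
    System.out.println(planTaskGroups(adjustSplitSize(scanSize, 4, 128L << 20)));
    try {
      // Arguments are evaluated first, so adjustSplitSize rejects 0
      // before planTaskGroups ever runs, mirroring the Spark call order.
      planTaskGroups(adjustSplitSize(scanSize, 4, 0L));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Because Java evaluates `adjustSplitSize(...)` as an argument before the planning call runs, the guard in `adjustSplitSize(...)` fires first, matching the call order in the Spark snippet. Running `main` prints `134217728` followed by `Split size must be > 0: 0`.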