huaxingao commented on code in PR #10943: URL: https://github.com/apache/iceberg/pull/10943#discussion_r1779324357
########## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkPartitioningAwareScan.java: ########## @@ -78,7 +78,19 @@ abstract class SparkPartitioningAwareScan<T extends PartitionScanTask> extends S Schema expectedSchema, List<Expression> filters, Supplier<ScanReport> scanReportSupplier) { - super(spark, table, readConf, expectedSchema, filters, scanReportSupplier); + this(spark, table, scan, readConf, expectedSchema, filters, scanReportSupplier, null); + } + + SparkPartitioningAwareScan( + SparkSession spark, + Table table, + Scan<?, ? extends ScanTask, ? extends ScanTaskGroup<?>> scan, + SparkReadConf readConf, + Schema expectedSchema, + List<Expression> filters, + Supplier<ScanReport> scanReportSupplier, + Integer pushedLimit) { Review Comment: Thanks @RussellSpitzer for sharing your concern and suggestion! I think we can borrow @aokolnychyi 's idea of adding [`ParquetBatchReadConf` and `OrcBatchReadConf`](https://github.com/apache/iceberg/pull/9841#discussion_r1594829408). We can have something like ``` @Value.Immutable public interface ParquetBatchReadConf extends Serializable { ParquetReaderType readerType(); // this is for comet, we don't need this for now int batchSize(); @Nullable Integer limit(); } @Value.Immutable public interface OrcBatchReadConf extends Serializable { int batchSize(); } ``` Similarly, we can also have `ParquetRowReadConf` and `OrcRowReadConf`. I have changed the code to add `ParquetBatchReadConf` and `OrcBatchReadConf`. We still need to pass `pushedLimit` to the `SparkPartitioningAwareScan`, `SparkScan` and `SparkBatch` constructors, so `pushedLimit` can be passed from `SparkScanBuilder` to `SparkBatch`, this is because `pushedLimit` is not available in `SparkReadConf`, we have to call `SparkScanBuilder`.`pushLimit` to get `pushedLimit`. Please let me know if this approach looks OK to you. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org