rdblue commented on code in PR #6622:
URL: https://github.com/apache/iceberg/pull/6622#discussion_r1103898014
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -193,6 +337,19 @@ private Schema schemaWithMetadataColumns() {
   @Override
   public Scan build() {
+    // if aggregates are pushed down, instead of constructing a SparkBatchQueryScan, creating file
+    // read tasks and sending over the tasks to Spark executors, a SparkLocalScan will be created
+    // and the scan is done locally on the Spark driver instead of the executors. The statistics
+    // info will be retrieved from manifest file and used to build a Spark internal row, which
+    // contains the pushed down aggregate values.
+    if (pushedAggregateRows != null) {

Review Comment:
   I think it would be slightly better to create the scan in the aggregation methods. Then this could be `if (localScan != null) { return localScan; }`, which is a bit more generic.
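A minimal sketch of the suggested refactoring, under loud assumptions: `Scan`, `LocalScan`, `BatchQueryScan`, the `localScan` field and the `pushAggregation(Long)` hook below are hypothetical, simplified stand-ins, not the real Spark or Iceberg classes (Spark's actual push-down hook takes an `Aggregation`). It only models the control flow: the aggregation method builds the local scan up front, so `build()` needs nothing more than a generic null check.

```java
// Hypothetical, simplified stand-ins: Scan, LocalScan and BatchQueryScan below are
// NOT the real Spark or Iceberg classes; they only model the control flow.
public class ScanBuilderSketch {

  interface Scan {
    String description();
  }

  // Models a scan answered on the driver from precomputed rows.
  static final class LocalScan implements Scan {
    private final Object[][] rows;

    LocalScan(Object[][] rows) {
      this.rows = rows;
    }

    Object[][] rows() {
      return rows;
    }

    @Override
    public String description() {
      return "local scan with " + rows.length + " row(s)";
    }
  }

  // Models the regular distributed scan that plans file read tasks.
  static final class BatchQueryScan implements Scan {
    @Override
    public String description() {
      return "batch query scan";
    }
  }

  // Prepared by the push-down method; null means "no local scan available".
  private Scan localScan = null;

  // Hypothetical push-down hook: a non-null value stands in for an aggregate
  // result that could be computed from manifest statistics.
  boolean pushAggregation(Long countFromMetadata) {
    if (countFromMetadata == null) {
      return false; // not answerable from metadata; build() falls back
    }
    this.localScan = new LocalScan(new Object[][] {{countFromMetadata}});
    return true;
  }

  Scan build() {
    // The generic check from the review: build() only asks whether a
    // local scan was prepared, not why.
    if (localScan != null) {
      return localScan;
    }
    return new BatchQueryScan();
  }

  public static void main(String[] args) {
    ScanBuilderSketch pushed = new ScanBuilderSketch();
    pushed.pushAggregation(42L);
    System.out.println(pushed.build().description()); // prints "local scan with 1 row(s)"

    ScanBuilderSketch notPushed = new ScanBuilderSketch();
    notPushed.pushAggregation(null);
    System.out.println(notPushed.build().description()); // prints "batch query scan"
  }
}
```

The appeal of this shape is that `build()` no longer depends on aggregate-specific state such as `pushedAggregateRows`; any future push-down that can be answered on the driver only has to populate the same field.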