[GitHub] [iceberg] rdblue commented on a diff in pull request #6622: push down min/max/count to iceberg

via GitHub Sun, 12 Feb 2023 15:45:07 -0800


rdblue commented on code in PR #6622:
URL: https://github.com/apache/iceberg/pull/6622#discussion_r1103898014



##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -193,6 +337,19 @@ private Schema schemaWithMetadataColumns() {
 
   @Override
   public Scan build() {
+    // if aggregates are pushed down, instead of constructing a SparkBatchQueryScan, creating file
+    // read tasks and sending over the tasks to Spark executors, a SparkLocalScan will be created
+    // and the scan is done locally on the Spark driver instead of the executors. The statistics
+    // info will be retrieved from manifest file and used to build a Spark internal row, which
+    // contains the pushed down aggregate values.
+    if (pushedAggregateRows != null) {

Review Comment:
   I think it would be slightly better to create the scan in the aggregation
   methods. Then this could be `if (localScan != null) { return localScan }`,
   which is a bit more generic.
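
   The shape of that suggestion can be sketched roughly as follows. This is a
   minimal, self-contained illustration and not the PR's code: `Scan`,
   `LocalScan`, `BatchScan`, the `pushAggregation` signature, and the sample
   values are all hypothetical stand-ins, used only to show the scan being
   created when the aggregate is pushed down so that `build()` needs nothing
   more than a null check.

```java
import java.util.Arrays;

// Hypothetical stand-ins only; these are NOT the real Spark or Iceberg types.
final class ScanBuilderSketch {
  interface Scan {
    String description();
  }

  // Stand-in for a driver-side scan that serves precomputed aggregate values.
  static final class LocalScan implements Scan {
    private final Object[] row;

    LocalScan(Object[] row) {
      this.row = row;
    }

    @Override
    public String description() {
      return "local scan over " + Arrays.toString(row);
    }
  }

  // Stand-in for the regular scan that plans tasks for executors.
  static final class BatchScan implements Scan {
    @Override
    public String description() {
      return "batch scan on executors";
    }
  }

  // Set only when an aggregate was pushed down successfully.
  private Scan localScan = null;

  // Assumed shape: the aggregation method builds the local scan itself
  // instead of only stashing the aggregate row for build() to wrap later.
  boolean pushAggregation(Object[] aggregateValues) {
    if (aggregateValues == null || aggregateValues.length == 0) {
      return false; // nothing pushed down; build() falls back to a batch scan
    }
    this.localScan = new LocalScan(aggregateValues);
    return true;
  }

  Scan build() {
    // The generic check from the review comment: build() does not need to
    // know why a local scan exists, only that one was prepared.
    if (localScan != null) {
      return localScan;
    }
    return new BatchScan();
  }

  public static void main(String[] args) {
    ScanBuilderSketch plain = new ScanBuilderSketch();
    System.out.println(plain.build().description());

    ScanBuilderSketch pushed = new ScanBuilderSketch();
    pushed.pushAggregation(new Object[] {1L, 9L, 3L}); // made-up min, max, count
    System.out.println(pushed.build().description());
  }
}
```

   With this arrangement, any other feature that can answer a query on the
   driver could set the same field, and `build()` would stay unchanged.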



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



[GitHub] [iceberg] rdblue commented on a diff in pull request #6622: push down min/max/count to iceberg
