advancedxy commented on PR #9233: URL: https://github.com/apache/iceberg/pull/9233#issuecomment-1848427549

> Spark 3.4 (same as my app) with the condition on the partitions will actually prune the unaffected partitions, while 3.5 will not.

I did some quick debugging. The reason Spark 3.4 succeeds is that Iceberg's `org.apache.spark.sql.execution.datasources.v2.RowLevelCommandScanRelationPushDown` pushes the join conditions down into the target source. In Spark 3.5, this rule was removed in favor of upstream Spark's `GroupBasedRowLevelOperationScanPlanning`, which pushes down the command's condition instead of the rewritten plan's filter (https://github.com/apache/spark/commit/5a92eccd514b7bc0513feaecb041aee2f8cd5a24#diff-635af3d82f2675b4bb3fd07673916477844a2a7b76d65b23b9cda9a63228ec6dR40). So to make system function pushdown work for the MERGE statement, you may have to pattern match `ReplaceData`, `MergeRows`, etc. Also cc @aokolnychyi.

> I'm not sure why Filter would be pushed down to the data source for a full outer join

This question is answered: it's covered by `RowLevelCommandScanRelationPushDown` or `GroupBasedRowLevelOperationScanPlanning`.
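To illustrate the "pattern match `ReplaceData`, `MergeRows`, etc." suggestion, here is a minimal, self-contained Scala sketch. It is a toy model, not the real Spark/Iceberg classes: `Plan`, `Scan`, `Filter`, and `ReplaceData` below are simplified stand-ins, and the rule only shows the shape of the matching a real optimizer rule would need, i.e. it must recognize the row-level-operation wrapper node explicitly before it can push a condition into the target scan.

```scala
// Toy plan nodes, standing in for Catalyst's logical plan classes.
sealed trait Plan
case class Scan(table: String, pushedFilter: Option[String] = None) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
// Simplified stand-in for the node a MERGE rewrite wraps around the target scan.
case class ReplaceData(child: Plan) extends Plan

object PushFilterIntoScan {
  // Push the filter into the scan only when it appears under ReplaceData,
  // mirroring how a pushdown rule must pattern match the command node.
  def apply(plan: Plan): Plan = plan match {
    case ReplaceData(Filter(cond, Scan(table, None))) =>
      ReplaceData(Scan(table, Some(cond)))
    case other => other
  }
}
```

For example, `PushFilterIntoScan(ReplaceData(Filter("part = 5", Scan("t"))))` yields `ReplaceData(Scan("t", Some("part = 5")))`, while a plan without the `ReplaceData` wrapper is left untouched.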
> Spark 3.4 (same as my app) with the condition on the partitions will actually prune the unaffected partitions, while 3.5 will not. I did some quick debug. The reason why spark 3.4 succeeded is that `org.apache.spark.sql.execution.datasources.v2.RowLevelCommandScanRelationPushDown` in Iceberg pushes down join conditions into the target source. In spark 3.5, this rule is removed in favor of upstream spark's `GroupBasedRowLevelOperationScanPlanning`, which push down commands' condition instead of rewrite plan's filter(https://github.com/apache/spark/commit/5a92eccd514b7bc0513feaecb041aee2f8cd5a24#diff-635af3d82f2675b4bb3fd07673916477844a2a7b76d65b23b9cda9a63228ec6dR40). So to make system function push down work for merge statement, you may have to pattern match ReplaceData and MergeRows, etc. Also cc @aokolnychyi . > I'm not sure why Filter would be pushed down to the data source for a full outer join This question is answered, it's covered by `RowLevelCommandScanRelationPushDown` or `GroupBasedRowLevelOperationScanPlanning` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org