advancedxy commented on PR #9233: URL: https://github.com/apache/iceberg/pull/9233#issuecomment-1848427549

> Spark 3.4 (same as my app) with the condition on the partitions will actually prune the unaffected partitions, while 3.5 will not.

I did some quick debugging. The reason Spark 3.4 succeeds is that Iceberg's `org.apache.spark.sql.execution.datasources.v2.RowLevelCommandScanRelationPushDown` pushes the join conditions down into the target source. In Spark 3.5, this rule was removed in favor of upstream Spark's `GroupBasedRowLevelOperationScanPlanning`, which pushes down the command's condition instead of the rewritten plan's filter (https://github.com/apache/spark/commit/5a92eccd514b7bc0513feaecb041aee2f8cd5a24#diff-635af3d82f2675b4bb3fd07673916477844a2a7b76d65b23b9cda9a63228ec6dR40). So to make system function pushdown work for the MERGE statement, you may have to pattern match `ReplaceData`, `MergeRows`, etc. Also cc @aokolnychyi.

> I'm not sure why Filter would be pushed down to the data source for a full outer join

This question is answered: it's covered by `RowLevelCommandScanRelationPushDown` or `GroupBasedRowLevelOperationScanPlanning`.
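To illustrate the "pattern match `ReplaceData`, `MergeRows`, etc." suggestion, here is a minimal, self-contained Scala sketch. It is a toy model, not the real Spark/Iceberg classes: `Plan`, `Scan`, `Filter`, and `ReplaceData` below are simplified stand-ins, and the rule only shows the shape of the matching a real optimizer rule would need, i.e. it must recognize the row-level-operation wrapper node explicitly before it can push a condition into the target scan.

```scala
// Toy plan nodes, standing in for Catalyst's logical plan classes.
sealed trait Plan
case class Scan(table: String, pushedFilter: Option[String] = None) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
// Simplified stand-in for the node a MERGE rewrite wraps around the target scan.
case class ReplaceData(child: Plan) extends Plan

object PushFilterIntoScan {
  // Push the filter into the scan only when it appears under ReplaceData,
  // mirroring how a pushdown rule must pattern match the command node.
  def apply(plan: Plan): Plan = plan match {
    case ReplaceData(Filter(cond, Scan(table, None))) =>
      ReplaceData(Scan(table, Some(cond)))
    case other => other
  }
}
```

For example, `PushFilterIntoScan(ReplaceData(Filter("part = 5", Scan("t"))))` yields `ReplaceData(Scan("t", Some("part = 5")))`, while a plan without the `ReplaceData` wrapper is left untouched.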
> Spark 3.4 (same as my app) with the condition on the partitions will actually prune the unaffected partitions, while 3.5 will not. I did some quick debug. The reason why spark 3.4 succeeded is that `org.apache.spark.sql.execution.datasources.v2.RowLevelCommandScanRelationPushDown` in Iceberg pushes down join conditions into the target source. In spark 3.5, this rule is removed in favor of upstream spark's `GroupBasedRowLevelOperationScanPlanning`, which push down commands' condition instead of rewrite plan's filter(https://github.com/apache/spark/commit/5a92eccd514b7bc0513feaecb041aee2f8cd5a24#diff-635af3d82f2675b4bb3fd07673916477844a2a7b76d65b23b9cda9a63228ec6dR40). So to make system function push down work for merge statement, you may have to pattern match ReplaceData and MergeRows, etc. Also cc @aokolnychyi . > I'm not sure why Filter would be pushed down to the data source for a full outer join This question is answered, it's covered by `RowLevelCommandScanRelationPushDown` or `GroupBasedRowLevelOperationScanPlanning` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org