advancedxy commented on PR #9233: URL: https://github.com/apache/iceberg/pull/9233#issuecomment-1847450030
> ```
> == Physical Plan ==
> ReplaceData (13)
> +- * Sort (12)
>    +- * Project (11)
>       +- MergeRows (10)
>          +- SortMergeJoin FullOuter (9)   <---- Full Outer here
> ```

If the join type is full outer, it means that there are NoMatchedActions, so your merge into command should have a `when not matched` clause. Is that correct?

> ```
> (1) BatchScan target
> Output [60]: [..., _file#2279]
> target (branch=null) [filters=((((MEAS_YM = '202306' AND ((MEAS_DD = '02' AND bucket[4](POD) IN (0, 2, 3)) OR MEAS_DD = '01')) OR ((MEAS_YM = '202307' AND MEAS_DD = '02') AND bucket[4](POD) IN (1, 3))) OR ((MEAS_YM = '202306' AND MEAS_DD = '03') OR ((MEAS_YM = '202308' AND MEAS_DD = '01') AND bucket[4](POD) IN (0, 1, 2)))) OR ((MEAS_DD = '03' AND ((MEAS_YM = '202307' AND bucket[4](POD) IN (0, 1, 2)) OR (MEAS_YM = '202308' AND bucket[4](POD) IN (0, 3)))) OR (((MEAS_YM = '202307' AND MEAS_DD = '01') AND bucket[4](POD) IN (0, 1, 2)) OR ((MEAS_YM = '202308' AND MEAS_DD = '02') AND bucket[4](POD) = 3)))), groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
>
> (5) BatchScan source
> Output [60]: [...]
> source (branch=null) [filters=, groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
>
> (14) BatchScan target
> Output [8]: [..., _file#2590]
> target (branch=null) [filters=((((MEAS_YM = '202306' AND ((MEAS_DD = '02' AND bucket[4](POD) IN (0, 2, 3)) OR MEAS_DD = '01')) OR ((MEAS_YM = '202307' AND MEAS_DD = '02') AND bucket[4](POD) IN (1, 3))) OR ((MEAS_YM = '202306' AND MEAS_DD = '03') OR ((MEAS_YM = '202308' AND MEAS_DD = '01') AND bucket[4](POD) IN (0, 1, 2)))) OR ((MEAS_DD = '03' AND ((MEAS_YM = '202307' AND bucket[4](POD) IN (0, 1, 2)) OR (MEAS_YM = '202308' AND bucket[4](POD) IN (0, 3)))) OR (((MEAS_YM = '202307' AND MEAS_DD = '01') AND bucket[4](POD) IN (0, 1, 2)) OR ((MEAS_YM = '202308' AND MEAS_DD = '02') AND bucket[4](POD) = 3)))), POD IS NOT NULL, MEAS_YM IS NOT NULL, MEAS_DD IS NOT NULL, MAGNITUDE IS NOT NULL, METER_KEY IS NOT NULL, REC_ID IS NOT NULL, COLLECT_ID IS NOT NULL, groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
>
> (18) BatchScan source
> Output [7]: [...]
> source (branch=null) [filters=POD IS NOT NULL, MEAS_YM IS NOT NULL, MEAS_DD IS NOT NULL, MAGNITUDE IS NOT NULL, METER_KEY IS NOT NULL, REC_ID IS NOT NULL, COLLECT_ID IS NOT NULL, groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
> ```

Could you give the full plan tree or DAG for this changed plan? Is the join type still full outer? This is quite strange: I'm not sure why a Filter would be pushed down to the data source for a full outer join.

You may set `spark.sql.planChangeLog.level` to `INFO` to see which rule changes the plan, and post the related plan changes in a gist. That would help clarify the problem.

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org