advancedxy commented on PR #9233:
URL: https://github.com/apache/iceberg/pull/9233#issuecomment-1847450030

   >
   ```
   == Physical Plan ==
   ReplaceData (13)
   +- * Sort (12)
      +- * Project (11)
         +- MergeRows (10)
            +- SortMergeJoin FullOuter (9)  <---- Full Outer here
   ```
   
   If the join type is full outer, that means there are NotMatchedActions, so your MERGE INTO command should have a `WHEN NOT MATCHED` clause. Is that correct?
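
A minimal sketch of the reasoning above follows. It assumes the copy-on-write (`ReplaceData`) rewrite in Spark's `RewriteMergeIntoTable`, which, as we understand it, joins the target (left) with the source (right). It uses a left outer join when there are no NOT MATCHED actions and a full outer join otherwise. `JoinType` and `replace_data_join_type` are hypothetical names for illustration, not the actual Spark code.

```python
from enum import Enum


class JoinType(Enum):
    LEFT_OUTER = "LeftOuter"
    FULL_OUTER = "FullOuter"


def replace_data_join_type(not_matched_actions: list[str]) -> JoinType:
    """Hypothetical model of the copy-on-write MERGE join-type choice.

    Target is the left side, source is the right side. Unmatched target
    rows must always be kept (they are copied into the rewritten files),
    so the join is at least LEFT OUTER. Unmatched source rows are only
    needed when a WHEN NOT MATCHED action may insert them, which requires
    a FULL OUTER join.
    """
    if not not_matched_actions:
        return JoinType.LEFT_OUTER
    return JoinType.FULL_OUTER


# "SortMergeJoin FullOuter" in the plan therefore implies at least one
# WHEN NOT MATCHED clause in the MERGE INTO statement.
print(replace_data_join_type([]).value)  # LeftOuter
print(replace_data_join_type(["INSERT *"]).value)  # FullOuter
```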
   
   >
   ```
   (1) BatchScan target
   Output [60]: [..., _file#2279]
   target (branch=null) [filters=((((MEAS_YM = '202306' AND ((MEAS_DD = '02' AND bucket[4](POD) IN (0, 2, 3)) OR MEAS_DD = '01')) OR ((MEAS_YM = '202307' AND MEAS_DD = '02') AND bucket[4](POD) IN (1, 3))) OR ((MEAS_YM = '202306' AND MEAS_DD = '03') OR ((MEAS_YM = '202308' AND MEAS_DD = '01') AND bucket[4](POD) IN (0, 1, 2)))) OR ((MEAS_DD = '03' AND ((MEAS_YM = '202307' AND bucket[4](POD) IN (0, 1, 2)) OR (MEAS_YM = '202308' AND bucket[4](POD) IN (0, 3)))) OR (((MEAS_YM = '202307' AND MEAS_DD = '01') AND bucket[4](POD) IN (0, 1, 2)) OR ((MEAS_YM = '202308' AND MEAS_DD = '02') AND bucket[4](POD) = 3)))), groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
   
   (5) BatchScan source
   Output [60]: [...]
   source (branch=null) [filters=, groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
   
   (14) BatchScan target
   Output [8]: [..., _file#2590]
   target (branch=null) [filters=((((MEAS_YM = '202306' AND ((MEAS_DD = '02' AND bucket[4](POD) IN (0, 2, 3)) OR MEAS_DD = '01')) OR ((MEAS_YM = '202307' AND MEAS_DD = '02') AND bucket[4](POD) IN (1, 3))) OR ((MEAS_YM = '202306' AND MEAS_DD = '03') OR ((MEAS_YM = '202308' AND MEAS_DD = '01') AND bucket[4](POD) IN (0, 1, 2)))) OR ((MEAS_DD = '03' AND ((MEAS_YM = '202307' AND bucket[4](POD) IN (0, 1, 2)) OR (MEAS_YM = '202308' AND bucket[4](POD) IN (0, 3)))) OR (((MEAS_YM = '202307' AND MEAS_DD = '01') AND bucket[4](POD) IN (0, 1, 2)) OR ((MEAS_YM = '202308' AND MEAS_DD = '02') AND bucket[4](POD) = 3)))), POD IS NOT NULL, MEAS_YM IS NOT NULL, MEAS_DD IS NOT NULL, MAGNITUDE IS NOT NULL, METER_KEY IS NOT NULL, REC_ID IS NOT NULL, COLLECT_ID IS NOT NULL, groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
   
   (18) BatchScan source
   Output [7]: [...]
   source (branch=null) [filters=POD IS NOT NULL, MEAS_YM IS NOT NULL, MEAS_DD IS NOT NULL, MAGNITUDE IS NOT NULL, METER_KEY IS NOT NULL, REC_ID IS NOT NULL, COLLECT_ID IS NOT NULL, groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
   ```
   Could you share the full plan tree or DAG for this changed plan? Is the join type still full outer? This is quite strange: I'm not sure why a Filter would be pushed down to the data source for a full outer join. You could set `spark.sql.planChangeLog.level` to `INFO` to see which rule changes the plan, and post the related plan changes in a gist. That would help clarify the problem.
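
The following is a minimal pure-Python sketch of why such a pushdown is surprising. It is not Spark code, and `full_outer_join` and `not_null` are hypothetical helpers. A full outer join must still emit NULL-keyed rows from both sides, padded with NULLs. Dropping those rows before the join therefore changes the result. Filtering out NULL join keys is only safe when such rows cannot reach the output anyway, for example on both sides of an inner join.

```python
from typing import Optional

Row = tuple[Optional[str], str]  # (join key, payload); None models SQL NULL


def full_outer_join(left: list[Row], right: list[Row]) -> list[tuple]:
    """Naive full outer join on key equality; NULL keys never match."""
    out = []
    matched_right = set()
    for lk, lv in left:
        hit = False
        for i, (rk, rv) in enumerate(right):
            if lk is not None and lk == rk:
                out.append((lk, lv, rk, rv))
                matched_right.add(i)
                hit = True
        if not hit:
            out.append((lk, lv, None, None))  # unmatched left row, NULL-padded
    for i, (rk, rv) in enumerate(right):
        if i not in matched_right:
            out.append((None, None, rk, rv))  # unmatched right row, NULL-padded
    return out


def not_null(rows: list[Row]) -> list[Row]:
    """Simulates pushing 'key IS NOT NULL' down into a scan."""
    return [r for r in rows if r[0] is not None]


target = [("a", "t1"), (None, "t2")]
source = [("a", "s1"), (None, "s2")]

original = full_outer_join(target, source)
pushed = full_outer_join(not_null(target), not_null(source))
print(len(original), len(pushed))  # 3 1
```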


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

