rdblue commented on code in PR #9556: URL: https://github.com/apache/iceberg/pull/9556#discussion_r1468928409
##########
spark/v3.4/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteMergeIntoTable.scala:
##########
@@ -214,6 +214,8 @@ object RewriteMergeIntoTable extends RewriteRowLevelIcebergCommand with Predicat
     val rowFromSourceAttr = resolveAttrRef(ROW_FROM_SOURCE_REF, joinPlan)
     val rowFromTargetAttr = resolveAttrRef(ROW_FROM_TARGET_REF, joinPlan)
+    // The output expression should retain read attributes for correctly determining nullability
+    val matchedOutputsWithAttrs = matchedActions.map(matchedActionOutput(_, metadataAttrs) :+ readAttrs)

Review Comment:
   Okay, I think I understand why the inputs would be aligned with the outputs. This is the copy-on-write case, where each matched clause should produce an output for every column in order. If that's the case, then the indexing on each output should work. That is, each output should correspond to an input attr because both are based on the target table.

   If that's the case, then I guess I can see what is happening here. The input and output correspond, so the input type and name should be used. But the output does depend on the nullability of the output expression.

   If that's right, then the short-term fix is what I pasted above. If the table column is optional, then this should produce an optional output in case there is a null value. That will work, but it doesn't explain why the output nullability is incorrect.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
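
The nullability reasoning in the review comment can be illustrated with a small, self-contained sketch. It is only one possible reading of the comment, not the fix the reviewer refers to (that snippet is not quoted in this message), and it does not use Spark's classes: `Attr`, `Expr`, and `buildOutput` are hypothetical stand-ins invented for illustration. The assumption is that each matched clause yields one expression per target column, in column order, so outputs can be paired with input attributes by index.

```scala
// Hypothetical stand-ins for Catalyst's attribute and expression types;
// these are NOT the real Spark classes.
final case class Attr(name: String, dataType: String, nullable: Boolean)
final case class Expr(sql: String, nullable: Boolean)

object OutputNullability {
  // Keep the input attribute's name and type for each target column, and mark
  // the output as nullable if the column is optional or if any clause's
  // expression at that position may produce null.
  def buildOutput(inputs: Seq[Attr], clauseOutputs: Seq[Seq[Expr]]): Seq[Attr] = {
    require(
      clauseOutputs.forall(_.size == inputs.size),
      "each clause must produce one expression per target column")
    inputs.zipWithIndex.map { case (attr, index) =>
      val anyNullableExpr = clauseOutputs.exists(_(index).nullable)
      attr.copy(nullable = attr.nullable || anyNullableExpr)
    }
  }

  // Example data: a required `id` column and an optional `data` column.
  val inputs: Seq[Attr] = Seq(
    Attr("id", "int", nullable = false),
    Attr("data", "string", nullable = true))

  // A single matched clause whose `id` output expression may be null.
  val updateClause: Seq[Expr] = Seq(
    Expr("s.id", nullable = true),
    Expr("s.data", nullable = false))

  val output: Seq[Attr] = buildOutput(inputs, Seq(updateClause))
}
```

In this sketch the optional `data` column stays optional even though its expression is non-null, and the required `id` column becomes nullable because its output expression may be null, which matches the comment's point that the output depends on the nullability of the output expression.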