Re: [PR] Spark 3.4: Fix writing of default values in CoW for rows with NULL columns which are unmatched [iceberg]

via GitHub Sun, 28 Jan 2024 11:58:20 -0800


rdblue commented on code in PR #9556:
URL: https://github.com/apache/iceberg/pull/9556#discussion_r1468928409



##########
spark/v3.4/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteMergeIntoTable.scala:
##########
@@ -214,6 +214,8 @@ object RewriteMergeIntoTable extends RewriteRowLevelIcebergCommand with Predicat
 
     val rowFromSourceAttr = resolveAttrRef(ROW_FROM_SOURCE_REF, joinPlan)
     val rowFromTargetAttr = resolveAttrRef(ROW_FROM_TARGET_REF, joinPlan)
+    // The output expression should retain read attributes for correctly determining nullability
+    val matchedOutputsWithAttrs = matchedActions.map(matchedActionOutput(_, metadataAttrs) :+ readAttrs)

Review Comment:
   Okay, I think I understand why the inputs would be aligned with the outputs. 
This is the copy-on-write case, where each matched clause should produce an 
output for every column, in order. If so, then indexing into each output 
should work: each output should correspond to an input attr because both are 
based on the target table.
   
   Given that, I guess I can see what is happening here. The input and output 
correspond, so the input type and name should be used. But the output's 
nullability does depend on the nullability of the output expression.
   
   If that's right, then the short-term fix is what I pasted above. If the 
table column is optional, then this should produce an optional output in case 
there is a null value.
   
   That will work, but it doesn't explain why the output nullability is 
incorrect.
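
   To make the nullability reasoning above concrete, here is a minimal sketch. 
It is not the snippet referred to as "pasted above" and it is not Iceberg or 
Spark code: `Attr`, `Expr`, and `NullabilitySketch` are hypothetical stand-ins 
for Catalyst attributes and expressions. It assumes each output takes its name 
and type from the positionally matching input attr, and is marked nullable if 
either the table column or the output expression is nullable.

```scala
// Hypothetical stand-ins for Catalyst attributes and expressions (assumption:
// these are NOT Spark's real classes, only enough structure to show the idea).
final case class Attr(name: String, dataType: String, nullable: Boolean)
final case class Expr(sql: String, dataType: String, nullable: Boolean)

object NullabilitySketch {
  // Name and type come from the target-table attribute. The result is nullable
  // if the table column is optional or the output expression can produce null.
  def outputAttr(input: Attr, output: Expr): Attr =
    input.copy(nullable = input.nullable || output.nullable)

  // Pair inputs and outputs by position, relying on the copy-on-write
  // alignment: every matched clause emits one output per column, in order.
  def outputAttrs(inputs: Seq[Attr], outputs: Seq[Expr]): Seq[Attr] = {
    require(inputs.length == outputs.length,
      "each output must correspond to an input attr")
    inputs.zip(outputs).map { case (in, out) => outputAttr(in, out) }
  }
}
```

   With this rule, an optional table column always yields an optional output, 
even when the matched clause assigns a non-null expression to it, which is the 
behavior the short-term fix is meant to guarantee.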



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

