amogh-jahagirdar commented on PR #9556:
URL: https://github.com/apache/iceberg/pull/9556#issuecomment-1915239004

   > Just to be sure, can you set spark.sql.planChangeLog.level to info and 
execute the failing test to see what rule adds the default value behavior? It 
is still not clear how a wrong nullability info in MergeRows led to inserting 
default values.
   
   Sure, I've set that and captured logs both before and after the fix 
(specifically the fix @rdblue mentioned here 
https://github.com/apache/iceberg/pull/9556#discussion_r1468932165). 
   
   Logs Before: 
https://gist.github.com/amogh-jahagirdar/98ffeaa21e203423c7da0c98e132e9bd
   Logs After: 
https://gist.github.com/amogh-jahagirdar/6209e373b37e2d7a134df447ec49f8b5
   
   I don't see any difference between the two in terms of the rules and their 
outputs. I can still explain the default value behavior, though, based on what 
I saw in the debugger. 
   
   The Iceberg schema used for performing the write gets built here, in 
SparkWriteBuilder#validateOrMergeWriteSchema: 
https://github.com/apache/iceberg/blob/54756b6f5c653be2ab271bfbe262e77109bf9608/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkWriteBuilder.java#L120
   
   That schema makes the fields *required* instead of optional, due to the 
nullability passed through in the Spark Writer API being incorrect. Then when 
the write is actually performed, we end up writing default values.
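
As a rough illustration of how a wrong "required" flag can turn a null into a 
default value, here is a toy model. It is not the actual Iceberg or Spark code: 
`ToyRow`, `write`, and the idea that a null int slot holds 0 are all made-up 
assumptions for the sketch.

```java
/** Toy model only: none of these names exist in Iceberg or Spark. */
final class NullabilityDemo {

    /** Hypothetical row: int slots plus a separate null flag per slot. */
    static final class ToyRow {
        private final int[] values;
        private final boolean[] nulls;

        ToyRow(Integer... fields) {
            values = new int[fields.length];
            nulls = new boolean[fields.length];
            for (int i = 0; i < fields.length; i++) {
                nulls[i] = fields[i] == null;
                // Assumption: a null slot physically stores the type's default (0).
                values[i] = nulls[i] ? 0 : fields[i];
            }
        }

        boolean isNullAt(int pos) {
            return nulls[pos];
        }

        int getInt(int pos) {
            return values[pos];
        }
    }

    /** Returns what a toy writer would emit for the field at pos. */
    static Integer write(ToyRow row, int pos, boolean required) {
        if (required) {
            // A required field is assumed non-null, so the null flag is never checked.
            return row.getInt(pos);
        }
        // An optional field checks the null flag before reading the slot.
        return row.isNullAt(pos) ? null : row.getInt(pos);
    }

    public static void main(String[] args) {
        ToyRow row = new ToyRow(7, null);
        System.out.println(write(row, 1, false)); // field treated as optional
        System.out.println(write(row, 1, true));  // field wrongly treated as required
    }
}
```

In this sketch the same null field comes out as null when the schema says 
optional, and as the slot's default when the schema wrongly says required, 
which matches the symptom described above.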
   
   I've updated this PR to follow the approach @rdblue suggested in 
https://github.com/apache/iceberg/pull/9556#discussion_r1468932165, which, 
after going through the thread, I believe does make sense. It is a better 
solution than the previous approach, which just forced the matched output list 
to have the attributes so that the nullability check behaved as expected.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

