amogh-jahagirdar commented on PR #9556: URL: https://github.com/apache/iceberg/pull/9556#issuecomment-1915239004
> Just to be sure, can you set spark.sql.planChangeLog.level to info and execute the failing test to see what rule adds the default value behavior? It is still not clear how a wrong nullability info in MergeRows led to inserting default values.

Sure, I've set that before and after the fix (specifically the fix @rdblue mentioned here: https://github.com/apache/iceberg/pull/9556#discussion_r1468932165).

Logs Before: https://gist.github.com/amogh-jahagirdar/98ffeaa21e203423c7da0c98e132e9bd
Logs After: https://gist.github.com/amogh-jahagirdar/6209e373b37e2d7a134df447ec49f8b5

I don't see anything that differs between the two runs in terms of the rules and their outputs. I can still explain why there's a default value behavior, though, based on what I saw in the debugger. The Iceberg schema used for performing the write gets built in SparkWriteBuilder#validateOrMergeWriteSchema: https://github.com/apache/iceberg/blob/54756b6f5c653be2ab271bfbe262e77109bf9608/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkWriteBuilder.java#L120

That schema makes the fields *required* instead of optional, because the nullability passed through the Spark writer API is incorrect. Then, when the write is actually performed, we end up writing default values.

I've updated this PR to follow the approach @rdblue suggested in https://github.com/apache/iceberg/pull/9556#discussion_r1468932165. After going through the thread, I believe it makes sense and is a better solution than the previous approach, which just forced the matched output list to have the attributes so that the nullability check behaved as expected.
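To make the mechanism easier to follow, here is a toy model in Python. It is not Iceberg or Spark code: `SparkField`, `WriteField`, `to_write_schema`, `write_row`, and `TYPE_DEFAULTS` are all hypothetical names, and the rule that a required field turns a null into the type's zero value is an assumption chosen to mirror the behavior described above, not something read out of the real writer.

```python
from dataclasses import dataclass

# Assumption: zero values a writer might emit when it cannot encode a null.
TYPE_DEFAULTS = {"int": 0, "string": ""}


@dataclass(frozen=True)
class SparkField:
    """Hypothetical stand-in for a field as reported by the engine."""
    name: str
    type: str
    nullable: bool


@dataclass(frozen=True)
class WriteField:
    """Hypothetical stand-in for a field in the schema used for the write."""
    name: str
    type: str
    required: bool


def to_write_schema(spark_fields):
    """Derive the write schema: a non-nullable field becomes required."""
    return [WriteField(f.name, f.type, required=not f.nullable) for f in spark_fields]


def write_row(schema, row):
    """Toy writer: optional fields keep None, required fields cannot,
    so a None in a required field is replaced by the type default (assumption)."""
    out = {}
    for field in schema:
        value = row.get(field.name)
        if value is None and field.required:
            value = TYPE_DEFAULTS[field.type]
        out[field.name] = value
    return out


row = {"id": 1, "dep": None}

# "dep" is wrongly reported as non-nullable, so the write schema marks it required.
wrong = to_write_schema([SparkField("id", "int", False), SparkField("dep", "string", False)])
# With the correct nullability, "dep" stays optional.
correct = to_write_schema([SparkField("id", "int", False), SparkField("dep", "string", True)])

print(write_row(wrong, row))
print(write_row(correct, row))
```

In this sketch the only difference between the two runs is the `nullable` flag on the incoming field, which is why no plan rule needs to change for the written value to change.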
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org