amogh-jahagirdar opened a new issue, #9555: URL: https://github.com/apache/iceberg/issues/9555
### Apache Iceberg version

1.4.3 (latest release)

### Query engine

Spark

### Please describe the bug 🐞

Reproduction: here is a simple unit test (it can be copied and pasted into `TestMerge`):

```
@Test
public void testMergeIntoTsIssue() {
  createAndInitTable(
      "id INT, ts TIMESTAMP",
      "{ \"id\": 1, \"ts\": \"2000-01-01 00:00:00\" }\n" + "{ \"id\": 6, \"ts\": null }");

  createOrReplaceView(
      "source",
      "id INT NOT NULL, dep STRING",
      "{ \"id\": 1, \"ts\": \"2000-01-01 00:00:00\" }\n");

  sql(
      "MERGE INTO %s t USING source s "
          + "ON t.id == s.id "
          + "WHEN MATCHED THEN "
          + " UPDATE SET id=123, ts=current_timestamp()",
      commitTarget());

  sql("SELECT * FROM %s", commitTarget());
}
```

In short:

1.) Create a table.
2.) Insert some records where at least one of the records has a NULL column value.
3.) MERGE into the table with an update on matched records that sets the column holding the null value.

Expected:

Record 1: id=123, ts=current_timestamp()
Record 2: id=6, ts=null

However, in Spark 3.4 we get:

Record 1: id=123, ts=current_timestamp()
Record 2: id=6, ts=01-01-1970 00:00:000 (basically the Unix epoch; in practice it's a timestamp with time zone, so it will appear shifted to your local time zone)

I've done some debugging. The schema for the `SparkWrite` in 3.4 treats all the fields as required, so the null in the unmatched record is written as a default value (the epoch, for a timestamp) instead of null. The fields are treated as required because the nullability of the Spark attributes for the fields https://github.com/apache/iceberg/blob/main/spark/v3.4/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteMergeIntoTable.scala#L182 isn't being passed to the merge output schema during planning. This nullability needs to be passed through correctly so that the null values in the non-matched cases get written correctly.

I'm currently looking into this, but I'm creating this issue for tracking and awareness.

Important note: based on my testing, Spark 3.3 and Spark 3.5 do not have this bug.
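
To make the suspected mechanism concrete, here is a toy Java model. It is not Iceberg or Spark code, and `NullabilitySketch`, `storedMicros`, and `render` are hypothetical names made up for this illustration. It assumes a writer that, for a field marked required, has no way to record a null and so falls back to the primitive default `0`, which for a timestamp stored as microseconds is the Unix epoch.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Toy model of the symptom only; hypothetical names, not Iceberg or Spark APIs. */
public class NullabilitySketch {

  /**
   * Returns what a (hypothetical) writer would store for a timestamp value,
   * expressed as microseconds since the epoch.
   * Assumption: a required field has no null slot, so a null input
   * degrades to the primitive default 0. An optional field keeps the null.
   */
  public static Long storedMicros(Long value, boolean required) {
    if (required) {
      return value == null ? 0L : value;
    }
    return value;
  }

  /** Renders a stored value the way a reader would see it. */
  public static String render(Long micros) {
    return micros == null
        ? "null"
        : Instant.EPOCH.plus(micros, ChronoUnit.MICROS).toString();
  }

  public static void main(String[] args) {
    // Field wrongly treated as required: the null surfaces as the epoch.
    System.out.println(render(storedMicros(null, true)));  // 1970-01-01T00:00:00Z
    // Field correctly treated as optional: the null survives.
    System.out.println(render(storedMicros(null, false))); // null
  }
}
```

If this model matches what happens in 3.4, it would explain why only the unmatched record (id=6) is affected: the matched record always receives a non-null `current_timestamp()`, so the lost nullability never shows up there.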