amogh-jahagirdar commented on PR #12736:
URL: https://github.com/apache/iceberg/pull/12736#issuecomment-2817463379

   Thanks @RussellSpitzer for taking a look!
   
   >One thing I want to make sure we test is "WHEN NOT MATCHED" with no 
matching clauses. Spark has some logic which transforms these from row level 
operations into Appends and I am a little worried some of the logic here for 
manipulating the output and input table will not work in that case.
   
https://github.com/apache/spark/blob/597cafc05d9a4ecd5a111d1dc9c92fb37c77ce3f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteMergeIntoTable.scala#L47-L75
   https://github.com/apache/iceberg/issues/12653
   I think we may be fine there because we essentially do not care about 
writing row_lineage or last_updated sequence values in this case, so writing 
null or not writing are both valid.
   
   
   Exactly, the implementation as it is currently does not have an explicit 
rule case for projecting row IDs or sequence values in the Append case because 
that is a case where we know we would only be adding new rows where all the row 
lDs or sequence values will be null. Which means they don't need to explicitly 
be persisted as per the spec. 
   
   This is what I was trying to say in this comment 
https://github.com/apache/iceberg/pull/12736/files#diff-cc38222cfec9d6a4142d6c102e3654a23ea9ec9cc1d9226ba19fbf12ae7c0329R120-R124
 but let me know if I could make it more clear. Essentially when building the 
SparkWrite, it could either be for an an
   1. overwrite files (CoW)
   2. an append
   3. overwrite dynamic partitions/by filter.
   
   So in both 2,3 we know it will only be addition of new rows or an addition + 
removal of existing rows. In these cases the row lineage fields will be null 
for the entire write. 1 is the only case where we need to make sure the write 
schema has row lineage. 
   
   Let me know if I'm missing anything! I used to have an explicit handling for 
the merge cases which translated into appends but I realized it's not really 
needed so I removed it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to