RussellSpitzer commented on issue #11765: URL: https://github.com/apache/iceberg/issues/11765#issuecomment-2542309392
> However, sir, I might have discovered some issues. When executing the COW-MERGE-INTO command, Spark needs to use the ods_table twice. The first time is to match data files based on incremental records, and the second time is to perform the actual data merge. If the data in the ods_table changes between the first and second usage, I would like to know if this could lead to abnormal execution results? What would happen if the data in the ods_table suddenly increases? What about if the data in the ods_table suddenly decreases?
>
> Example: ods_table is a partitioned table, during the execution of the merge-into statement, someone adds or deletes partitions.

Yes, this is true: the relation created from the source data must remain constant across the two passes over the source. The target (the Iceberg table) can change. If the source query would return different results between the two passes, I believe you could see odd behavior. I think we have some hooks to prevent this when the source is an Iceberg table, but I don't believe we have any for non-Iceberg sources. I may be forgetting something, but @aokolnychyi probably knows.

I believe the workaround here is that, for a non-idempotent subquery, you should cache or persist it prior to merging.
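To make the two-pass concern above concrete, here is a rough toy model in plain Python. It is not Iceberg's or Spark's actual implementation: `cow_merge`, `make_unstable_source`, and the file names are all hypothetical, and the "files" are just in-memory lists. It only sketches why a source that changes between the passes could produce odd results, and why materializing the source once first avoids that in this model.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, str]            # (key, value)
Files = Dict[str, List[Row]]     # hypothetical file name -> rows in that file


def cow_merge(target: Files, read_source: Callable[[], List[Row]]) -> Files:
    """Toy two-pass copy-on-write merge; read_source is called once per pass."""
    # Pass 1: find the target files that contain a key present in the source.
    keys_pass1 = {k for k, _ in read_source()}
    touched = {name for name, rows in target.items()
               if any(k in keys_pass1 for k, _ in rows)}

    # Pass 2: re-read the source and rewrite only the files chosen in pass 1.
    updates = dict(read_source())
    result: Files = {n: rows for n, rows in target.items() if n not in touched}
    seen = set()
    for name in sorted(touched):
        rewritten = []
        for k, v in target[name]:
            rewritten.append((k, updates.get(k, v)))
            seen.add(k)
        result[name + "-rewritten"] = rewritten

    # Source rows that matched nothing in the rewritten files become inserts.
    inserts = [(k, v) for k, v in updates.items() if k not in seen]
    if inserts:
        result["new-file"] = inserts
    return result


def make_unstable_source() -> Callable[[], List[Row]]:
    """Source whose contents grow after the first read (e.g. a new partition)."""
    calls = {"n": 0}

    def read() -> List[Row]:
        calls["n"] += 1
        rows = [(1, "a2")]
        if calls["n"] > 1:       # extra row shows up between the two passes
            rows.append((3, "c2"))
        return rows

    return read


def all_keys(files: Files) -> List[int]:
    return sorted(k for rows in files.values() for k, _ in rows)


target = {"f1": [(1, "a"), (2, "b")], "f2": [(3, "c")]}

# Source changes between passes: key 3 is inserted although it already exists.
unstable = cow_merge(target, make_unstable_source())

# Workaround in this model: read the source once, then merge from the snapshot.
snapshot = make_unstable_source()()
stable = cow_merge(target, lambda: snapshot)
```

In this toy model the unstable source leaves key 3 in the table twice, because the file holding it was never selected in the first pass, while the snapshot run does not. In Spark, the rough analogue of taking the snapshot would be persisting the source DataFrame or writing it to a staging table before running the MERGE; which of those is appropriate depends on your setup.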