fengguangyuan commented on issue #9741:
URL: https://github.com/apache/iceberg/issues/9741#issuecomment-1951823635

   Hi there.
   I believe this is a safeguard that protects the correctness of the existing 
data, not a bug.
   
   > Basic logic of parallel writes: writers may read the same data, but they 
never commit metadata based on the same snapshot.
   
   From the line 
`at org.apache.iceberg.BaseOverwriteFiles.apply(BaseOverwriteFiles.java:31) ~[iceberg-spark3-runtime.jar:?]` 
in the stack trace, we can tell that the thread was overwriting files but 
failed because delete files were missing.
   
   Consider Compact and Overwrite tasks running in parallel (the Expire task 
does nothing here). They may hold the same snapshot, including the same view of 
the delete files, while doing their work. If a Compact task commits before an 
Overwrite task calls the internal method to commit its metadata, the Overwrite 
task will fail with a `ValidationException` during commit, because validation 
runs against the latest snapshot, where the old delete files are no longer 
visible.
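
   A toy sketch of that race may make it easier to follow. This is not 
Iceberg's code: `ToyTable`, its `commit` method, and the file name 
`delete-1.parquet` are hypothetical, and the `ValidationException` class here is 
only a stand-in for the real one. It assumes the simplest possible model, where 
a snapshot is an id plus the set of delete files visible at that id.

```python
from dataclasses import dataclass


class ValidationException(Exception):
    """Toy stand-in for the exception named above (not the Iceberg class)."""


@dataclass
class ToyTable:
    """Hypothetical in-memory table: a snapshot id plus its visible delete files."""

    snapshot_id: int = 1
    delete_files: frozenset = frozenset({"delete-1.parquet"})

    def commit(self, base_snapshot_id, new_delete_files,
               required_delete_files=frozenset()):
        # If another writer committed since this writer planned its work,
        # re-check the writer's assumptions against the latest snapshot.
        if base_snapshot_id != self.snapshot_id:
            missing = required_delete_files - self.delete_files
            if missing:
                raise ValidationException(
                    f"Missing required delete files: {sorted(missing)}")
        self.delete_files = frozenset(new_delete_files)
        self.snapshot_id += 1
        return self.snapshot_id


def run_scenario():
    table = ToyTable()
    # Both tasks plan against the same snapshot and the same delete files.
    base_id, seen_deletes = table.snapshot_id, table.delete_files

    # The compaction commits first and drops the old delete file.
    table.commit(base_id, new_delete_files=frozenset())

    # The overwrite still depends on the delete file it saw while planning.
    try:
        table.commit(base_id, new_delete_files=seen_deletes,
                     required_delete_files=seen_deletes)
    except ValidationException as exc:
        return f"overwrite rejected: {exc}"
    return "overwrite committed"


print(run_scenario())
```

   In this model the second commit is rejected instead of silently writing 
metadata that points at delete files the compaction already removed, which is 
the protection I mean.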
   
   I hope this explains the key points and helps you understand how commits 
work when tasks run in parallel. :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

