wypoon commented on PR #10784: URL: https://github.com/apache/iceberg/pull/10784#issuecomment-2623353054
For my edification, can someone please explain how duplicate file entries in manifests can arise? Can two entries for the same file occur in a single manifest? Can two manifests even be in the same manifest list if they overlap (that is, have an entry for the same file in common)? I'd have thought both of these situations would be bugs. Or are there actual sequences of operations that lead to such outcomes, similar to how dangling deletes can occur?

Basically, I'm trying to understand which repairs might be needed due to operations in Iceberg (and which of these are bugs), and which are needed only because users did something outside of Iceberg. Deleting a file from storage falls into the latter category.

Also, I understand that there was an old bug where the data file size was written incorrectly and this actually caused reads to fail, and that this is the motivation for correcting the statistics in metadata. However, that bug was fixed long ago, so I wonder if there are still known situations where these statistics need to be corrected.