wypoon commented on PR #10784:
URL: https://github.com/apache/iceberg/pull/10784#issuecomment-2623353054

   For my edification, can someone please explain how duplicate file entries in
manifests can arise? Can two entries for the same file occur in a single
manifest? Can two manifests in the same manifest list even overlap (that is,
both contain an entry for the same file)? I'd have thought both of these
situations would be bugs. Or are there actual sequences of operations that lead
to such outcomes, similar to how dangling deletes can occur?
   Basically, I'm trying to understand which repairs might be needed because of
operations performed by Iceberg itself (and which of those indicate bugs), and
which are needed only because users did something outside of Iceberg. Deleting
a file from storage falls into the latter category.
   Also, I understand that an old bug caused data file sizes to be written
incorrectly, which actually made reads fail, and that this is the motivation
for correcting the statistics in metadata. However, that bug was fixed long
ago, so I wonder whether there are still known situations where these
statistics need to be corrected.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org