wypoon commented on PR #10784: URL: https://github.com/apache/iceberg/pull/10784#issuecomment-2623572958
Another question I have about duplicate entries (in the manifests) is: does their presence make the table unreadable? Or is the table still readable, in a state that is valid although undesirable (as is the case with dangling deletes, for example)?

An observation is that the existing subinterfaces of `org.apache.iceberg.actions.SnapshotUpdate` are all actions that do not change the state of the table (the data remains the same). This is not the case with removing data (or delete) files from the metadata because those files are missing from storage. In that case, the state of the table is changed: data is deleted, or added (due to "undeletion" from removing delete files), or both. For this reason, I think that removing files from metadata is logically different from "repair" operations such as deduplication of entries or correcting statistics for entries.