sshivampeta commented on PR #16289: URL: https://github.com/apache/iceberg/pull/16289#issuecomment-4504322499
> The contract of REPLACE TABLE is that the table's data, schema, and partitioning are replaced. There are no assumptions to validate, so checking for conflicts between concurrent changes and those changes is not required. If this is what you're trying to change, then I'm -1 for this commit because it is changing behavior that is already considered and chosen. It looks like this is the intent of this PR, but please reply if I'm wrong. > > That said, REPLACE TABLE is also intended to keep table history (it is not a drop/re-create). So there are, potentially, valid problems with concurrency. For example, if a table's schema is updated from schema-id 1 to schema-id 2 and then the REPLACE TABLE operation removes the new schema-id 2 -- as though the concurrent commit never happened -- then that's a bug that we should fix. Thanks for the detailed feedback, @rdblue. You're right that REPLACE TABLE by contract replaces the table's data, schema, and partitioning — and that's intentional. This PR is not trying to change that contract. The concern is specifically about table history preservation. As you noted, REPLACE TABLE should keep table history (it is not a drop/re-create). The bug is that commitReplaceTransaction refreshes base to detect version conflicts but then commits the original stale current metadata wholesale — silently erasing concurrently-committed schemas, expired snapshots, and other history changes. For example: A concurrent expireSnapshots successfully removes snapshot-1, but the replace brings it back because its stale metadata still references it — this is table corruption (the snapshot files may have been physically deleted). A concurrent updateSchema adds schema-id 2, but the replace drops it — as you described, "as though the concurrent commit never happened." The current approach (fail-fast on any concurrent metadata change) is the simplest safe starting point to prevent this class of bugs. It's conservative — it will reject even non-conflicting concurrent changes — but it ensures no committed work is silently lost. A more refined approach that allows non-conflicting changes to survive (e.g., rebasing replace metadata on top of the refreshed base) could follow as a separate PR once the immediate safety issue is addressed. Would this framing address your concern? Happy to discuss further or adjust the approach. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
