rdblue commented on code in PR #14004:
URL: https://github.com/apache/iceberg/pull/14004#discussion_r2462220053
##########
format/spec.md:
##########
@@ -1875,6 +1875,25 @@ Some implementations require that GZIP compressed files have the suffix `.gz.met
 
 Although the spec allows for including the deleted row itself (in addition to the path and position of the row in the data file) in v2 position delete files, writing the row is optional and no implementation currently writes it. The ability to write and read the row is supported in the Java implementation but is deprecated in version 1.11.0.
 
+### Schema Evolution/Type Promotion
+
+Column projection rules are designed so that the table will remain readable even if writers use an outdated schema. At the beginning of a transaction Writers should load the latest schema (the schema referenced by `current-schema-id` from the latest table metadata) and use it for reading and writing data. Note, that in the common cases of schema evolution (adding nullable columns, adding required columns with an `initial-default`, renaming a column, dropping a column, or doing type promotion), appending data with outdated schemas presents no issues under either SNAPSHOT or SERIALIZABLE isolation levels
+
+However, the less common case of updating default values may need to be handled depending on isolation level. Consider two concurrent transactions:
+
+* **T1** modifies the `write-default` on the column.
+* **T2** writes data that makes use of `write-default` from the changed column in the first transaction.
+
+If the **T1** commits before **T2** then handling **T2** depends on isolation level.

Review Comment:
   I see what you're saying here, but it assumes a definition of these isolation levels. Serializable as it is commonly understood -- that in order to commit the second modification it must produce the same result in the commit order -- certainly requires failing. But it isn't clear to me that T2 "may be committed" under snapshot isolation in all definitions.
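To make the serializable case concrete, here is a minimal sketch of the check implied by "it must produce the same result in the commit order". It is not Iceberg code: `PendingWrite`, `CommitConflict`, and `validate_serializable` are hypothetical names, and the schema is reduced to a plain mapping from column name to `write-default`. Snapshot isolation is left out on purpose, since whether T2 may commit there depends on which definition is used.

```python
from dataclasses import dataclass
from typing import Any, Mapping


class CommitConflict(Exception):
    """Hypothetical error: the write cannot be replayed in commit order."""


@dataclass(frozen=True)
class PendingWrite:
    """Hypothetical stand-in for T2: a write that filled a column from its write-default."""
    column: str
    default_used: Any  # the write-default T2 saw when its transaction began


def validate_serializable(write: PendingWrite,
                          current_defaults: Mapping[str, Any]) -> None:
    """Reject the commit if redoing the write against the latest schema would differ.

    current_defaults maps column name -> write-default in the most recently
    committed schema (that is, after T1 has committed).
    """
    latest = current_defaults.get(write.column)
    if latest != write.default_used:
        raise CommitConflict(
            f"write-default for {write.column!r} changed from "
            f"{write.default_used!r} to {latest!r}; retry with the latest schema"
        )
```

With `t2 = PendingWrite("status", "new")`, calling `validate_serializable(t2, {"status": "new"})` returns without error, while `validate_serializable(t2, {"status": "pending"})` raises `CommitConflict`, which is the "requires failing" outcome described above.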
