RussellSpitzer commented on code in PR #10835: URL: https://github.com/apache/iceberg/pull/10835#discussion_r1700381957
##########
format/spec.md:
##########
@@ -399,6 +401,9 @@ Sorting floating-point numbers should produce the following behavior: `-NaN` < `

A data or delete file is associated with a sort order by the sort order's id within [a manifest](#manifests). Therefore, the table must declare all the sort orders for lookup. A table could also be configured with a default sort order id, indicating how the new data should be sorted by default. Writers should use this default sort order to sort the data on write, but are not required to if the default order is prohibitively expensive, as it would be for streaming writes.

+#### Writing with Identity transform
+
+When writing data files, all columns including those with an identity transforms should be written to data files. This provides redundancy in case of corruption or bugs in the metadata layer. Due to [column projection rules](#column-projection) readers can still properly scan the table if columns that have an identity partition transforms applied are omitted. This is not the case for any other transform type.

Review Comment:
I'm not sure the "Due to" sentence is helpful here. I'm generally in favor of keeping the spec as sparse as possible.

"When writing data files," can probably be removed from the first sentence as well without changing its meaning.

I'm also not sure "those with an identity transform" is the right description. Perhaps: "All columns, including those whose values are also present in partition metadata, should be written to data files"?
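To make the "Due to" sentence in the diff concrete, here is a minimal Python sketch of how a reader might resolve a column that is missing from a data file. It is a simplification under stated assumptions: the `PartitionField` and `DataFile` types and the `project_column` helper are hypothetical, partition values are keyed by source field ID rather than by partition field ID as real manifests are, and only the identity-partition and null fallbacks are modeled (other column projection rules in the spec, such as default values, are left out).

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class PartitionField:
    # Hypothetical: field ID of the source column and its transform name.
    source_id: int
    transform: str  # e.g. "identity", "day", "bucket[16]"


@dataclass
class DataFile:
    # Hypothetical layout: field ID -> column values written to the file.
    columns: Dict[int, List[Any]]
    # Simplification: partition values keyed by *source* field ID.
    partition: Dict[int, Any]
    record_count: int


def project_column(field_id: int, data_file: DataFile,
                   spec: List[PartitionField]) -> List[Any]:
    """Resolve one projected column for a single data file."""
    # 1. Column was written to the file: read it directly.
    if field_id in data_file.columns:
        return data_file.columns[field_id]
    # 2. Column is missing but is the source of an identity partition:
    #    every row in the file shares the same partition value.
    for pf in spec:
        if (pf.source_id == field_id and pf.transform == "identity"
                and field_id in data_file.partition):
            return [data_file.partition[field_id]] * data_file.record_count
    # 3. Otherwise (no partition field, or a lossy transform such as day
    #    or bucket) the original values cannot be reconstructed: use null.
    return [None] * data_file.record_count


if __name__ == "__main__":
    spec = [PartitionField(source_id=2, transform="identity"),
            PartitionField(source_id=3, transform="day")]
    # Field 1 was written; fields 2 and 3 were omitted from the file.
    f = DataFile(columns={1: [10, 11]},
                 partition={2: "us-east", 3: 19000},
                 record_count=2)
    print(project_column(1, f, spec))  # [10, 11]
    print(project_column(2, f, spec))  # ['us-east', 'us-east']
    print(project_column(3, f, spec))  # [None, None]
```

In this sketch the `day`-partitioned column comes back as nulls: a day ordinal cannot be turned back into the original timestamps, which matches the diff's point that omitting the column is only recoverable for the identity transform.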