RussellSpitzer commented on code in PR #10835:
URL: https://github.com/apache/iceberg/pull/10835#discussion_r1700381957


##########
format/spec.md:
##########
@@ -399,6 +401,9 @@ Sorting floating-point numbers should produce the following 
behavior: `-NaN` < `
 
 A data or delete file is associated with a sort order by the sort order's id 
within [a manifest](#manifests). Therefore, the table must declare all the sort 
orders for lookup. A table could also be configured with a default sort order 
id, indicating how the new data should be sorted by default. Writers should use 
this default sort order to sort the data on write, but are not required to if 
the default order is prohibitively expensive, as it would be for streaming 
writes.
 
+#### Writing with Identity transform
+
+When writing data files, all columns including those with an identity 
transforms should be written to data files. This provides redundancy in case of 
corruption or bugs in the metadata layer. Due to [column projection 
rules](#column-projection) readers can still properly scan the table if columns 
that have an identity partition transforms applied are omitted. This is not the 
case for any other transform type.

Review Comment:
   I'm not sure the "Due to" sentence is helpful here. I'm generally in favor 
of trying to keep the spec as sparse as possible.
   
   
   "When writing data files," can probably be removed from the first sentence 
as well without changing its meaning.
   
   I'm not sure "those with an identity transform" is the right description. 
Perhaps: "All columns, including those whose values are also present in 
partition metadata, should be written to data files"?
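
For context on the behavior the "Due to" sentence describes, here is a minimal sketch of how a reader could fill in an identity-partitioned column that is missing from a data file. It is an illustration only, not the Iceberg implementation: `project_row` and its parameters are hypothetical names, and it matches columns by name where the spec resolves them by field id.

```python
from typing import Any, Dict, List, Optional


def project_row(
    file_row: Dict[str, Any],
    partition_values: Dict[str, Any],
    identity_partition_columns: List[str],
    requested_columns: List[str],
) -> Dict[str, Optional[Any]]:
    """Hypothetical helper: resolve requested columns for one row.

    Assumption: columns are matched by name for brevity.
    """
    projected: Dict[str, Optional[Any]] = {}
    for column in requested_columns:
        if column in file_row:
            # The value was written to the data file, so read it directly.
            projected[column] = file_row[column]
        elif column in identity_partition_columns and column in partition_values:
            # An identity transform stores the source value unchanged in the
            # partition metadata, so it can stand in for the missing column.
            projected[column] = partition_values[column]
        else:
            # Other transforms (bucket, truncate, ...) do not preserve the
            # source value, so nothing can be recovered here.
            projected[column] = None
    return projected


# A data file written without the identity-partitioned "event_date" column.
file_row = {"id": 7, "payload": "click"}
partition_values = {"event_date": "2024-08-01"}
row = project_row(
    file_row,
    partition_values,
    identity_partition_columns=["event_date"],
    requested_columns=["id", "event_date", "payload"],
)
```

In this sketch the row still comes back complete, but only because the partition metadata is intact, which is why writing the column to the data file as well gives the redundancy the proposed text mentions.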
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

