gaborkaszab commented on code in PR #8981:
URL: https://github.com/apache/iceberg/pull/8981#discussion_r1382965493


##########
format/spec.md:
##########
@@ -842,7 +842,8 @@ The rows in the delete file must be sorted by `file_path` 
then `pos` to optimize
 
 Equality delete files identify deleted rows in a collection of data files by 
one or more column values, and may optionally contain additional columns of the 
deleted row.
 
-Equality delete files store any subset of a table's columns and use the 
table's field ids. The _delete columns_ are the columns of the delete file used 
to match data rows. Delete columns are identified by id in the delete file 
[metadata column `equality_ids`](#manifests). Float and double columns cannot 
be used as delete columns in equality delete files.
+Equality delete files store any subset of a table's columns and use the 
table's field ids. The _delete columns_ are the columns of the delete file used 
to match data rows. Delete columns are identified by id in the delete file 
[metadata column `equality_ids`](#manifests). The column restrictions for 
columns used in equality delete files are the same as those for [identifier 
fields](#identifier-field-ids) with the exception that optional columns and 
columns nested under optional structs are allowed (if a 
+parent struct column is null it implies the leaf column is null).

Review Comment:
   What would a null in the list of equality IDs mean? If I'm not 
mistaken, equality IDs identify which columns are used for the 
equality checks, so a 'null column' doesn't make sense to me.
   
   If you mean null values in the equality delete files, that's a good question. 
I believe in the SQL world NULL doesn't equal any value, and even NULL doesn't 
equal NULL, so I wonder what the desired semantics would be when we find a NULL 
value in the equality delete file. Should we apply "IS NULL" on that particular 
column? But then it would be an IS NULL check rather than an equality check, 
contrary to what the name "equality delete" suggests. I'd simply not allow NULL 
values in the delete files TBH.
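
   To make the SQL behavior concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `data` table and its rows are hypothetical, and this only illustrates how one SQL engine treats NULL comparisons; it says nothing about how Iceberg readers apply equality deletes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical data table; row 2 has a NULL in the would-be delete column.
conn.execute("CREATE TABLE data (id INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(1, "a"), (2, None), (3, "b")],
)

# NULL = NULL evaluates to NULL (unknown), not true.
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# NULL IS NULL evaluates to true (SQLite returns the integer 1).
null_is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]

# An equality predicate against NULL matches no rows at all.
eq_matches = conn.execute(
    "SELECT id FROM data WHERE category = NULL"
).fetchall()

# An IS NULL predicate matches the row whose column is NULL.
is_null_matches = conn.execute(
    "SELECT id FROM data WHERE category IS NULL"
).fetchall()

print(null_eq_null, null_is_null, eq_matches, is_null_matches)
```

   So a NULL in a delete file can only match data rows if it is treated as an IS NULL (null-safe) comparison; under plain `=` semantics it would match nothing.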



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

