gaborkaszab commented on code in PR #8981:
URL: https://github.com/apache/iceberg/pull/8981#discussion_r1386184942


##########
format/spec.md:
##########
@@ -842,7 +842,8 @@ The rows in the delete file must be sorted by `file_path` then `pos` to optimize
 
 Equality delete files identify deleted rows in a collection of data files by one or more column values, and may optionally contain additional columns of the deleted row.
 
-Equality delete files store any subset of a table's columns and use the table's field ids. The _delete columns_ are the columns of the delete file used to match data rows. Delete columns are identified by id in the delete file [metadata column `equality_ids`](#manifests). Float and double columns cannot be used as delete columns in equality delete files.
+Equality delete files store any subset of a table's columns and use the table's field ids. The _delete columns_ are the columns of the delete file used to match data rows. Delete columns are identified by id in the delete file [metadata column `equality_ids`](#manifests). The column restrictions for columns used in equality delete files are the same as those for [identifier fields](#identifier-field-ids) with the exception that optional columns and columns nested under optional structs are allowed (if a 
+parent struct column is null it implies the leaf column is null).

Review Comment:
   I have one implementation-related objection to treating `NULL`s in the 
delete files as `IS NULL`:
   This may differ between query engines, but in general I think the predicates 
for a query are created at planning time, whereas the fact that a row in the 
delete file contains `NULL` is known only during the execution phase, when the 
actual rows are read. At that point you won't be replacing an equality 
predicate with an `IS NULL` predicate, right?
   Do you know whether any query engines are already able to get around this 
problem and treat `NULL` values in the delete files as `IS NULL`?
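
   To make the concern concrete, here is a minimal Python sketch. It is only an 
illustration under stated assumptions: every name in it (`sql_equals`, 
`null_safe_equals`, `apply_equality_deletes`) is hypothetical and not part of 
Iceberg or any query engine, rows are modelled as plain dicts, and `None` 
stands in for `NULL`. It contrasts an ordinary SQL-style equality, which never 
matches `NULL`, with a null-safe comparison that can be evaluated at execution 
time without rewriting the predicate.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]  # hypothetical row model: column name -> value, None == NULL


def sql_equals(a: Any, b: Any) -> Optional[bool]:
    """SQL three-valued '=': comparing with NULL yields unknown (None)."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_equals(a: Any, b: Any) -> bool:
    """Null-safe equality (like 'IS NOT DISTINCT FROM'): NULL matches NULL."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def apply_equality_deletes(
    data_rows: List[Row],
    delete_rows: List[Row],
    delete_columns: List[str],
    eq: Callable[[Any, Any], Optional[bool]],
) -> List[Row]:
    """Keep the data rows that match no delete row on all delete columns."""

    def is_deleted(row: Row) -> bool:
        # A row is deleted only if every delete column compares as definitely true.
        return any(
            all(eq(row[c], d[c]) is True for c in delete_columns)
            for d in delete_rows
        )

    return [r for r in data_rows if not is_deleted(r)]


data = [{"id": 1, "name": "a"}, {"id": 2, "name": None}, {"id": 3, "name": "c"}]
deletes = [{"name": None}, {"name": "a"}]

# Plain equality: the NULL delete row removes nothing, so ids 2 and 3 survive.
kept_plain = [r["id"] for r in apply_equality_deletes(data, deletes, ["name"], sql_equals)]
# Null-safe equality: the NULL delete row removes id 2, so only id 3 survives.
kept_null_safe = [r["id"] for r in apply_equality_deletes(data, deletes, ["name"], null_safe_equals)]
```

   In this sketch the comparison operator is fixed up front and only the values 
are read at execution time, so no predicate has to be swapped once a `NULL` 
shows up in the delete file. Whether a given engine can express its delete 
matching this way is exactly the open question above.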



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

