rdblue commented on code in PR #11240:
URL: https://github.com/apache/iceberg/pull/11240#discussion_r1796135851


##########
format/spec.md:
##########
@@ -619,19 +627,25 @@ Data files that match the query filter must be read by the scan.
 Note that for any snapshot, all file paths marked with "ADDED" or "EXISTING" may appear at most once across all manifest files in the snapshot. If a file path appears more than once, the results of the scan are undefined. Reader implementations may raise an error in this case, but are not required to do so.
 
 
-Delete files that match the query filter must be applied to data files at read time, limited by the scope of the delete file using the following rules.
+Delete files and deletion vector metadata that match the filters must be applied to data files at read time, limited by the following scope rules.
 
+* A deletion vector must be applied to a data file when all of the following are true:
+    - The data file's `file_path` is equal to the deletion vector's `referenced_data_file`
+    - The data file's data sequence number is _less than or equal to_ the deletion vector's data sequence number
+    - The data file's partition (both spec and partition values) is equal [4] to the deletion vector's partition

Review Comment:
   Yeah, I debated whether to keep this and ended up deciding that it is
helpful. The second requirement should also be impossible to violate when the
first requirement is true.
   
   I kept both of these because I want people reading the spec to understand
that they can be relied on in the scan planning algorithm. If this only said
that the `file_path` must match, implementers might think that they need to
consider _all_ deletion vectors without first filtering by partition and data
sequence number.
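   
   To illustrate the planning shortcut, here is a rough sketch, not the
reference implementation. The `DataFile` and `DeletionVector` classes, the
`spec_id` and `partition` fields, and the helper names are all hypothetical
stand-ins for whatever metadata a reader already holds. The idea is to bucket
deletion vectors by partition first, then check the data sequence number and
`file_path` only within the matching bucket.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


# Hypothetical, simplified metadata records; real readers carry more fields.
@dataclass(frozen=True)
class DataFile:
    file_path: str
    data_sequence_number: int
    spec_id: int
    partition: Tuple  # partition values, as a tuple


@dataclass(frozen=True)
class DeletionVector:
    referenced_data_file: str
    data_sequence_number: int
    spec_id: int
    partition: Tuple


PartitionKey = Tuple[int, Tuple]


def dv_applies(data_file: DataFile, dv: DeletionVector) -> bool:
    """Check the three conditions listed in the quoted spec text."""
    return (
        data_file.file_path == dv.referenced_data_file
        and data_file.data_sequence_number <= dv.data_sequence_number
        and data_file.spec_id == dv.spec_id
        and data_file.partition == dv.partition
    )


def index_by_partition(
    dvs: Iterable[DeletionVector],
) -> Dict[PartitionKey, List[DeletionVector]]:
    """Group deletion vectors by (spec, partition values) before matching."""
    index: Dict[PartitionKey, List[DeletionVector]] = defaultdict(list)
    for dv in dvs:
        index[(dv.spec_id, dv.partition)].append(dv)
    return index


def dvs_for(
    data_file: DataFile, index: Dict[PartitionKey, List[DeletionVector]]
) -> List[DeletionVector]:
    """Only look at deletion vectors in the data file's own partition."""
    candidates = index.get((data_file.spec_id, data_file.partition), [])
    return [dv for dv in candidates if dv_applies(data_file, dv)]
```

   Because the partition and sequence number conditions are guaranteed
whenever `file_path` matches, pruning on them first cannot drop a deletion
vector that should have been applied.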



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

