Fokko commented on code in PR #79:
URL: https://github.com/apache/iceberg-rust/pull/79#discussion_r1423526180


##########
crates/iceberg/src/spec/manifest_list.rs:
##########
@@ -30,6 +30,9 @@ use self::{
 
 use super::{FormatVersion, StructType};
 
+/// The seq number when no added files are present.
+pub const UNASSIGNED_SEQ_NUMBER: i64 = -1;

Review Comment:
   In PyIceberg we set it to `0` when it is not set (v2) or unknown (v1). It is
used to effectively prune delete files that are not relevant to the data
that is being read.
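A minimal sketch of the suggestion, assuming the constant is renamed to mirror PyIceberg's `INITIAL_SEQUENCE_NUMBER`. The helper `effective_sequence_number` is hypothetical and not part of iceberg-rust. It maps a missing sequence number, whether unknown (v1) or not yet assigned (v2), to `0` instead of `-1`:

```rust
/// Sequence number used when none is available.
/// Assumption: mirrors PyIceberg's `INITIAL_SEQUENCE_NUMBER` (0), as suggested
/// in this review, instead of the `-1` proposed in the PR.
pub const INITIAL_SEQUENCE_NUMBER: i64 = 0;

/// Hypothetical helper: resolve the sequence number used for pruning.
/// `None` covers both a v1 manifest (sequence numbers are unknown) and a v2
/// manifest whose sequence number has not been assigned yet.
pub fn effective_sequence_number(seq: Option<i64>) -> i64 {
    seq.unwrap_or(INITIAL_SEQUENCE_NUMBER)
}
```

With this, `effective_sequence_number(None)` yields `0`, while an assigned number such as `Some(7)` is passed through unchanged.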
   
   In PyIceberg we first do the normal query planning by applying partition
filtering and metrics-based filtering. We then end up with a list of files,
from which we compute the minimal data file sequence number:
   
   
https://github.com/apache/iceberg-python/blob/8c8abb5c4c258e32941110a9ce0938e1328290b3/pyiceberg/table/__init__.py#L1028-L1037
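A rough Rust sketch of that computation, not a transcription of the linked PyIceberg code. `PlannedDataFile` and `min_data_sequence_number` are hypothetical names standing in for whatever the planner produces after partition and metrics filtering. Unknown sequence numbers, and an empty file list, fall back to `INITIAL_SEQUENCE_NUMBER`:

```rust
/// Assumption: same value as PyIceberg's `INITIAL_SEQUENCE_NUMBER`.
pub const INITIAL_SEQUENCE_NUMBER: i64 = 0;

/// Hypothetical, minimal stand-in for a data file that survived planning.
pub struct PlannedDataFile {
    /// `None` when the sequence number is unknown (v1) or unassigned.
    pub sequence_number: Option<i64>,
}

/// Smallest data sequence number among the planned files.
/// Missing numbers count as `INITIAL_SEQUENCE_NUMBER`, and an empty list
/// also falls back to it, so the result is never below zero.
pub fn min_data_sequence_number(files: &[PlannedDataFile]) -> i64 {
    files
        .iter()
        .map(|f| f.sequence_number.unwrap_or(INITIAL_SEQUENCE_NUMBER))
        .min()
        .unwrap_or(INITIAL_SEQUENCE_NUMBER)
}
```

Note that a single file with an unknown sequence number pulls the minimum down to `0`, which disables pruning for the whole scan rather than risking a missed delete.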
   
   There is an obvious fallback to `INITIAL_SEQUENCE_NUMBER`, which is `0`. If
this happens, we know that we can't use this number to prune, and all the
delete files that are present will be included (because their sequence numbers
are also greater than or equal to zero).
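The pruning step itself can be sketched as a filter on the delete files. This is an illustrative sketch under assumptions: `DeleteFileEntry` and `prune_delete_files` are hypothetical names, and the `>=` check is a conservative simplification that keeps every delete file that could apply to at least one planned data file. When the minimum falls back to `0`, every delete file passes the check, so nothing is pruned:

```rust
/// Hypothetical, minimal stand-in for a delete file entry.
#[derive(Debug, Clone, PartialEq)]
pub struct DeleteFileEntry {
    pub path: String,
    pub sequence_number: i64,
}

/// Keep only delete files that may still apply to the planned data files.
/// A delete file older than the smallest data sequence number cannot affect
/// any of them and is dropped.
pub fn prune_delete_files(deletes: &[DeleteFileEntry], min_data_seq: i64) -> Vec<DeleteFileEntry> {
    deletes
        .iter()
        .filter(|d| d.sequence_number >= min_data_seq)
        .cloned()
        .collect()
}
```

For example, with delete files at sequence numbers `0` and `4`, a minimum of `3` keeps only the second one, while the fallback minimum of `0` keeps both.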
   
   I would suggest setting this to zero.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
