ZENOTME opened a new issue, #777:
URL: https://github.com/apache/iceberg-rust/issues/777

   After #349, we support appending DataFile now. But I found that some 
checks may be missing: when we append a DataFile, schema evolution or partition 
evolution may have happened in the table after the DataFile was generated, which 
makes the information in the DataFile invalid. E.g. the partition value in a 
DataFile becomes invalid when partition evolution happens, and lower_bound 
(upper_bound) becomes invalid when schema evolution happens. So we need to detect 
the case where a DataFile is incompatible with the table. 
   
   
   For partition evolution, there are two ways to detect it:
   1. Ensure that the partition value schema matches the existing partition 
spec in terms of type ([this is what we do 
now](https://github.com/apache/iceberg-rust/blob/42aff04658a00b390122260dbbeaf512d11af61f/crates/iceberg/src/transaction.rs#L313)).
 However, some cases cannot be detected this way, e.g. a partition spec of type 
<p1: int, p2: int> reordered to <p2: int, p1: int>.
   2. **Ensure that the partition value schema matches the existing partition 
spec in terms of field name or field id.**
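   A minimal sketch of the difference between the two checks, assuming a simplified, hypothetical model in which both the partition spec and the DataFile's partition value schema are a list of `(field id, name, type)` entries. `PartitionField`, `matches_by_type` and `matches_by_id_and_name` are illustrative names only, not the iceberg-rust API. The reordered spec passes the type-only check but is rejected once field ids and names are compared:

```rust
/// Tiny, hypothetical subset of Iceberg primitive types (illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrimitiveType {
    Int,
    String,
}

/// Hypothetical, simplified partition field: used both for the table's partition
/// spec and for the schema of the partition value carried by a DataFile.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PartitionField {
    pub field_id: i32,
    pub name: String,
    pub ty: PrimitiveType,
}

impl PartitionField {
    pub fn new(field_id: i32, name: &str, ty: PrimitiveType) -> Self {
        Self { field_id, name: name.to_string(), ty }
    }
}

/// Way 1: compare types position by position. Reordering fields that share the
/// same type (e.g. <p1: int, p2: int> -> <p2: int, p1: int>) slips through.
pub fn matches_by_type(value_schema: &[PartitionField], spec: &[PartitionField]) -> bool {
    value_schema.len() == spec.len()
        && value_schema.iter().zip(spec).all(|(v, s)| v.ty == s.ty)
}

/// Way 2: additionally require the field id and name at each position to match,
/// so a reordered spec is detected as incompatible.
pub fn matches_by_id_and_name(value_schema: &[PartitionField], spec: &[PartitionField]) -> bool {
    value_schema.len() == spec.len()
        && value_schema
            .iter()
            .zip(spec)
            .all(|(v, s)| v.field_id == s.field_id && v.name == s.name && v.ty == s.ty)
}
```

   With hypothetical partition field ids 1000 (p1) and 1001 (p2), comparing <p1, p2> against <p2, p1> returns `true` from `matches_by_type` but `false` from `matches_by_id_and_name`.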
   
   For schema evolution:
   1. It may still lead to partition evolution, and the detection method for 
partition values is the same as mentioned above.
   2. Check whether lower_bound/upper_bound match the table schema using the field ID.
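   A sketch of the field-ID based bounds check, under the same caveats: `Schema`, `Datum`, `BoundError` and `validate_bounds` are hypothetical simplified types, not the iceberg-rust API. The check is deliberately strict: it rejects any bound whose field id is absent from the current schema or whose type differs, whereas a real implementation would likely need to decide how to treat dropped columns and allowed type promotions such as int to long.

```rust
use std::collections::HashMap;

/// Tiny, hypothetical subset of Iceberg primitive types (illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrimitiveType {
    Int,
    String,
}

/// Hypothetical, simplified table schema: field id -> field type.
pub struct Schema {
    pub fields: HashMap<i32, PrimitiveType>,
}

/// Hypothetical, simplified bound value as recorded in a DataFile.
#[derive(Debug, Clone, PartialEq)]
pub enum Datum {
    Int(i32),
    String(String),
}

impl Datum {
    fn ty(&self) -> PrimitiveType {
        match self {
            Datum::Int(_) => PrimitiveType::Int,
            Datum::String(_) => PrimitiveType::String,
        }
    }
}

#[derive(Debug, PartialEq)]
pub enum BoundError {
    /// The bound refers to a field id the current schema does not contain.
    UnknownFieldId(i32),
    /// The bound's type differs from the field's type in the current schema.
    TypeMismatch { field_id: i32, expected: PrimitiveType, found: PrimitiveType },
}

/// Check a bounds map (field id -> value) against the current schema, matching
/// purely by field id. Entries are visited in ascending field id order so the
/// reported error is deterministic.
pub fn validate_bounds(schema: &Schema, bounds: &HashMap<i32, Datum>) -> Result<(), BoundError> {
    let mut entries: Vec<(&i32, &Datum)> = bounds.iter().collect();
    entries.sort_by_key(|(id, _)| **id);
    for (&field_id, datum) in entries {
        match schema.fields.get(&field_id) {
            None => return Err(BoundError::UnknownFieldId(field_id)),
            Some(&expected) if expected != datum.ty() => {
                return Err(BoundError::TypeMismatch { field_id, expected, found: datum.ty() })
            }
            Some(_) => {}
        }
    }
    Ok(())
}
```

   Since lower bounds and upper bounds are both keyed by field id, the same function can be applied to each map before the append is committed.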
   
   Based on the above analysis, we need to make the following fixes:
   - [ ] The partition in DataFile should include type information to facilitate 
validation, e.g. the field name and field id.
   - [ ] Append operations need validation checks for schema evolution: 
lower_bound and upper_bound.
   
   I'm not sure whether my understanding is correct; please correct me if 
something is wrong. cc @Fokko @liurenjie1024 @Xuanwo 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

