ZENOTME opened a new issue, #777: URL: https://github.com/apache/iceberg-rust/issues/777
After #349, we support appending `DataFile`s. However, I found that some checks may be missing. Schema evolution or partition evolution may happen in the table after we generate a `DataFile` but before we append it, which makes some of the information in the `DataFile` invalid. For example:

- The partition value in the `DataFile` becomes invalid when partition evolution happens.
- `lower_bound`/`upper_bound` become invalid when schema evolution happens.

So we need to detect the case where a `DataFile` is incompatible with the table.

For partition evolution, there are two ways to detect this:

1. Ensure that the partition value schema matches the existing partition spec in terms of type. [This is what we do now](https://github.com/apache/iceberg-rust/blob/42aff04658a00b390122260dbbeaf512d11af61f/crates/iceberg/src/transaction.rs#L313). But there are cases this check cannot detect, e.g. a partition spec type `<p1: int, p2: int>` reordered to `<p2: int, p1: int>`.
2. **Ensure that the partition value schema matches the existing partition spec in terms of field name or field id.**

For schema evolution:

1. It may also lead to partition evolution, and the detection method for partition values is the same as described above.
2. Check whether `lower_bound`/`upper_bound` match the current schema, using the field ID.

Based on the above analysis, we need to make the following fixes:

- [ ] The partition in `DataFile` should include type information to facilitate validation, e.g. the field name and field id.
- [ ] Append operations need to add validation checks for schema evolution: `lower_bound`, `upper_bound`.

I'm not sure whether my understanding is correct; please correct me if something is wrong.

cc @Fokko @liurenjie1024 @Xuanwo
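To make the two proposed checks concrete, here is a minimal sketch in plain Rust. It does not use the iceberg-rust API: `PrimType`, `PartField`, `Incompatible`, `types_match`, `check_partition`, and `check_bounds` are all hypothetical names invented for illustration, and bounds are reduced to "field id to type" maps. It only shows why a type-only comparison misses a reorder while a comparison by field id and name catches it, and how bounds could be validated against the current schema by field id. The type comparison is deliberately strict; a real implementation would probably need to allow legal type promotions.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified primitive type (a stand-in for the real type system).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PrimType {
    Int,
    Long,
}

/// Hypothetical typed partition field: what the issue proposes a data file should carry.
#[derive(Debug, Clone, PartialEq, Eq)]
struct PartField {
    field_id: i32,
    name: String,
    ty: PrimType,
}

impl PartField {
    fn new(field_id: i32, name: &str, ty: PrimType) -> Self {
        Self { field_id, name: name.to_string(), ty }
    }
}

/// Hypothetical reasons a data file may be rejected on append.
#[derive(Debug, PartialEq, Eq)]
enum Incompatible {
    PartitionArity { expected: usize, found: usize },
    PartitionField { position: usize },
    UnknownBoundField { field_id: i32 },
    BoundType { field_id: i32 },
}

/// Type-only comparison, shown for contrast: it accepts a reordered spec
/// whenever the reordered fields happen to share the same types.
fn types_match(spec: &[PartField], file: &[PartField]) -> bool {
    spec.len() == file.len() && spec.iter().zip(file).all(|(s, f)| s.ty == f.ty)
}

/// Comparison by field id, name, and type, position by position.
fn check_partition(spec: &[PartField], file: &[PartField]) -> Result<(), Incompatible> {
    if spec.len() != file.len() {
        return Err(Incompatible::PartitionArity { expected: spec.len(), found: file.len() });
    }
    for (position, (s, f)) in spec.iter().zip(file).enumerate() {
        if s != f {
            return Err(Incompatible::PartitionField { position });
        }
    }
    Ok(())
}

/// Every bound must refer to a field id that exists in the current schema
/// and must have exactly that field's type (strict: no promotions allowed).
fn check_bounds(
    schema: &BTreeMap<i32, PrimType>,
    bounds: &BTreeMap<i32, PrimType>,
) -> Result<(), Incompatible> {
    for (&field_id, &bound_ty) in bounds {
        match schema.get(&field_id) {
            None => return Err(Incompatible::UnknownBoundField { field_id }),
            Some(&ty) if ty != bound_ty => return Err(Incompatible::BoundType { field_id }),
            Some(_) => {}
        }
    }
    Ok(())
}
```

With a spec of `p1: int, p2: int` and a file whose partition fields are in the order `p2, p1`, `types_match` returns `true` while `check_partition` reports a mismatch at position 0. The same `check_bounds` function would be applied to both the lower and the upper bounds of a file.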