ZENOTME commented on code in PR #1079:
URL: https://github.com/apache/iceberg-rust/pull/1079#discussion_r2040013598
########## crates/iceberg/src/writer/file_writer/parquet_writer.rs: ##########
@@ -458,6 +459,54 @@ impl ParquetWriter {
         Ok(builder)
     }
+
+    #[allow(dead_code)]
+    fn partition_value_from_statistics(
+        table_spec: Arc<PartitionSpec>,
+        lower_bounds: &HashMap<i32, Datum>,
+        upper_bounds: &HashMap<i32, Datum>,
+    ) -> Result<Struct> {
+        let mut partition_literals: Vec<Option<Literal>> = Vec::new();
+
+        for field in table_spec.fields() {
+            if let (Some(lower), Some(upper)) = (
+                lower_bounds.get(&field.field_id),
+                upper_bounds.get(&field.field_id),
+            ) {
+                if !field.transform.preserves_order() {
+                    return Err(Error::new(
+                        ErrorKind::DataInvalid,
+                        format!(
+                            "cannot infer partition value for non linear partition field (needs to preserve order): {} with transform {}",
+                            field.name, field.transform
+                        ),
+                    ));
+                }
+
+                if lower != upper {

Review Comment:
> I don't think so, transform(lower) == transform(upper) doesn't mean the transformed result of each row are all same.

This is interesting. The check here requires every row of the appended data file to have the same value in the partition source column. But the spec only requires that the partition value, i.e. the transformed value, is the same for all rows within a single data file. For example, with `year(ts)`, I think rows with `2015-10-13` and `2015-11-13` can live in the same data file, since both fall in 2015. Under this restriction, though, we could not append a data file containing these two rows, right?

I'm not sure whether it's worth it, but I think there are two ways to lift this restriction:
1. Scan the whole data file, compute the partition value of each row, and make sure they are all the same.
2. For partition transforms that preserve the order of the original values (I'm not sure this description is accurate; e.g. `year`, `month`), doesn't `transform(lower) == transform(upper)` mean that the transformed result of every row is the same? Every row value `v` satisfies `lower <= v <= upper`, so an order-preserving transform gives `transform(lower) <= transform(v) <= transform(upper)`.
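A minimal, self-contained sketch of the checks discussed above, assuming dates are modelled as days since 1970-01-01 and using a simplified `year` transform that returns years since 1970. All names here (`year`, `days_from_ymd`, `strict_check`, `bounds_check`, `scan_check`) are hypothetical helpers for illustration, not iceberg-rust APIs; the PR's code works on `Datum` bounds and `field.transform` instead.

```rust
// Hypothetical helpers for illustration only; these are NOT iceberg-rust APIs.
// Dates are modelled as days since 1970-01-01; `year` returns years since 1970.

fn is_leap(y: i32) -> bool {
    (y % 4 == 0 && y % 100 != 0) || y % 400 == 0
}

fn days_in_year(y: i32) -> i32 {
    if is_leap(y) { 366 } else { 365 }
}

/// Days since 1970-01-01 for a calendar date (simple loop-based version).
fn days_from_ymd(y: i32, m: u32, d: u32) -> i32 {
    let mut days = 0;
    if y >= 1970 {
        for yy in 1970..y { days += days_in_year(yy); }
    } else {
        for yy in y..1970 { days -= days_in_year(yy); }
    }
    let feb = if is_leap(y) { 29 } else { 28 };
    let month_len = [31, feb, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
    days += month_len[..(m - 1) as usize].iter().sum::<i32>();
    days + d as i32 - 1
}

/// Simplified `year` transform: days since epoch -> years since 1970.
fn year(mut days: i32) -> i32 {
    let mut y = 1970;
    while days < 0 { y -= 1; days += days_in_year(y); }
    while days >= days_in_year(y) { days -= days_in_year(y); y += 1; }
    y - 1970
}

/// Mirrors the check in the diff: only accept when the raw bounds are equal.
fn strict_check(lower: i32, upper: i32, t: impl Fn(i32) -> i32) -> Option<i32> {
    (lower == upper).then(|| t(lower))
}

/// Option 2: for an order-preserving transform, t(lower) == t(upper)
/// pins t(v) for every lower <= v <= upper.
fn bounds_check(lower: i32, upper: i32, t: impl Fn(i32) -> i32) -> Option<i32> {
    let (tl, tu) = (t(lower), t(upper));
    (tl == tu).then_some(tl)
}

/// Option 1: scan every row and require a single transformed value.
fn scan_check(values: &[i32], t: impl Fn(i32) -> i32) -> Option<i32> {
    let first = t(*values.first()?);
    values.iter().all(|&v| t(v) == first).then_some(first)
}

fn main() {
    let rows = [days_from_ymd(2015, 10, 13), days_from_ymd(2015, 11, 13)];
    let (lower, upper) = (rows[0], rows[1]);

    // Rejected by the `lower != upper` style check...
    assert_eq!(strict_check(lower, upper, year), None);
    // ...although both rows share the partition value year = 45 (i.e. 2015).
    assert_eq!(bounds_check(lower, upper, year), Some(45));
    assert_eq!(scan_check(&rows, year), Some(45));

    // Why the order-preservation guard matters: a non-monotonic transform
    // (parity, standing in for something like bucket) can agree on the
    // bounds while a row in between maps to a different value.
    let parity = |v: i32| v.rem_euclid(2);
    assert_eq!(bounds_check(10, 12, parity), Some(0));
    assert_eq!(scan_check(&[10, 11, 12], parity), None);
    println!("all checks passed");
}
```

Option 1 needs access to the row data, while option 2 only needs the column statistics the function already receives; in this sketch it is only sound for order-preserving transforms, which is what the `preserves_order()` guard in the diff is there to enforce.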