ZENOTME commented on code in PR #1079:
URL: https://github.com/apache/iceberg-rust/pull/1079#discussion_r2040013598


##########
crates/iceberg/src/writer/file_writer/parquet_writer.rs:
##########
@@ -458,6 +459,54 @@ impl ParquetWriter {
 
         Ok(builder)
     }
+
+    #[allow(dead_code)]
+    fn partition_value_from_statistics(
+        table_spec: Arc<PartitionSpec>,
+        lower_bounds: &HashMap<i32, Datum>,
+        upper_bounds: &HashMap<i32, Datum>,
+    ) -> Result<Struct> {
+        let mut partition_literals: Vec<Option<Literal>> = Vec::new();
+
+        for field in table_spec.fields() {
+            if let (Some(lower), Some(upper)) = (
+                lower_bounds.get(&field.field_id),
+                upper_bounds.get(&field.field_id),
+            ) {
+                if !field.transform.preserves_order() {
+                    return Err(Error::new(
+                        ErrorKind::DataInvalid,
+                        format!(
+                            "cannot infer partition value for non linear partition field (needs to preserve order): {} with transform {}",
+                            field.name, field.transform
+                        ),
+                    ));
+                }
+
+                if lower != upper {

Review Comment:
   > I don't think so, transform(lower) == transform(upper) doesn't mean the 
transformed results of all rows are the same.
   
   This is interesting. The check here requires every row in the appended data 
file to have the same value in the partition source column. But per the spec, a 
data file only needs to guarantee that the transformed partition value is the 
same for all rows within that file. E.g. for `year(ts)`, I think it is fine for 
`2015-10-13` and `2015-11-13` to exist in the same data file. But under this 
restriction, we could not append a data file containing these two rows, right? 
   I'm not sure whether it is worth it, but I think there are two ways to avoid 
this restriction:
   1. Scan the whole data file to compute the partition value of every row and 
make sure they are all the same.
   2. For partition transforms that preserve the original ordering (I'm not 
sure whether this description is accurate; e.g. `year`, `month`), does 
`transform(lower) == transform(upper)` mean the transformed results of all rows 
are the same? 
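
To make the two options concrete, here is a minimal standalone sketch. It does not use the iceberg-rust API: `Date`, `year`, `toy_bucket`, `infer_from_bounds`, and `infer_by_scan` are all hypothetical names made up for illustration, and `toy_bucket` is only a stand-in for a transform that does not preserve order. The idea it tries to show is that for a monotonic transform, equal transformed bounds pin every row in between to the same value, while for a non-monotonic one they do not, so a full scan would be needed.

```rust
// Hypothetical stand-in for a date column value: (year, month, day).
// Tuples compare lexicographically, which matches calendar order.
type Date = (i32, u32, u32);

// Order-preserving transform, modelled loosely on Iceberg's `year`.
fn year(d: Date) -> i32 {
    d.0
}

// Toy transform that does NOT preserve order (assumption: a stand-in for
// something like `bucket[4]`, not the real hash-based bucket transform).
fn toy_bucket(d: Date) -> u32 {
    d.2 % 4
}

// Option 2: infer the partition value from the column bounds only.
// This is only sound when `transform` is monotonic over the column order.
fn infer_from_bounds<T: PartialEq>(
    lower: Date,
    upper: Date,
    transform: impl Fn(Date) -> T,
) -> Option<T> {
    let (lo, hi) = (transform(lower), transform(upper));
    if lo == hi {
        Some(lo)
    } else {
        None
    }
}

// Option 1: scan every row and require one shared transformed value.
// Sound for any transform, but costs a full pass over the data.
fn infer_by_scan<T: PartialEq>(rows: &[Date], transform: impl Fn(Date) -> T) -> Option<T> {
    let mut values = rows.iter().map(|&r| transform(r));
    let first = values.next()?;
    if values.all(|v| v == first) {
        Some(first)
    } else {
        None
    }
}

fn example() {
    let rows: [Date; 3] = [(2015, 10, 13), (2015, 10, 20), (2015, 11, 13)];
    let lower = *rows.iter().min().unwrap();
    let upper = *rows.iter().max().unwrap();

    // Monotonic transform: the bounds check and the full scan agree.
    assert_eq!(infer_from_bounds(lower, upper, year), Some(2015));
    assert_eq!(infer_by_scan(&rows, year), Some(2015));

    // Non-monotonic transform: the bounds agree, but the middle row differs,
    // so only the full scan detects the mismatch.
    assert_eq!(infer_from_bounds(lower, upper, toy_bucket), Some(1));
    assert_eq!(infer_by_scan(&rows, toy_bucket), None);
}
```

If this reasoning holds, keeping the `preserves_order()` guard and comparing the transformed bounds instead of the raw bounds would accept the `year(ts)` case above without scanning the file.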
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

