liurenjie1024 commented on code in PR #1849:
URL: https://github.com/apache/iceberg-rust/pull/1849#discussion_r2522618102


##########
crates/iceberg/src/spec/manifest/_serde.rs:
##########
@@ -258,6 +258,10 @@ fn parse_bytes_entry(v: Vec<BytesEntry>, schema: &Schema) -> Result<HashMap<i32,
                 })?
                 .clone();
             m.insert(entry.key, Datum::try_from_bytes(&entry.value, data_type)?);
+        } else {
+            // Field is not in current schema (e.g., dropped field due to schema evolution).
+            // Store the statistic as binary data to preserve it even though we don't know its type.
+            m.insert(entry.key, Datum::binary(entry.value.to_vec()));

Review Comment:
   This fix may lead to silent errors in user applications: users assume that the statistics returned in the Manifest are all parsed, so they would unexpectedly see a type mismatch. I think there are two ways to do this fix:
   1. We add a new enum:
   ```rust
   enum Statistic {
      Parsed(Datum),
      Raw(Vec<u8>),
   }
   ```
   And change `DataFile`'s lower/upper bounds to `HashMap<i32, Statistic>`. This would be a breaking API change, but users will be aware of it and will not see silent breakage of their applications.
   
   2. We pass `TableMetadata` into this parsing function and search for the missing field id in all schemas in `TableMetadata`, e.g. something like the sketch below. This approach may slow down deserialization a little when we hit field ids dropped by schema evolution, but it will not lead to any API change.
   
   Personally I prefer approach 2, WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

