ZENOTME opened a new issue, #774: URL: https://github.com/apache/iceberg-rust/issues/774
## Context

Making `DataFile` serializable and deserializable is useful. For example, a distributed compute engine creates multiple writers on multiple machines, writes the data in parallel, and gets `DataFile`s as the results. These `DataFile`s are then sent to a coordinator and appended using a transaction. In this case, `DataFile` should be serializable and deserializable.

## Solution

For now, we support serializing `DataFile` in the `_serde` module, and we have to convert the `DataFile` to `_serde::DataFile` first. The interface looks like: `pub fn try_from(value: super::DataFile, partition_type: &StructType, is_version_1: bool) -> _serde::DataFile`. More detail: https://github.com/apache/iceberg-rust/blob/98cd34dc03cd87b330c7bff8fe9f3241746062ac/crates/iceberg/src/spec/manifest.rs#L1361.

There are two things we need to resolve to make `DataFile` serializable and deserializable:

1. The related interface needs to be exposed publicly.
2. The interface is not friendly. If `DataFile` were self-contained, things would be easier: `DataFile` itself could be `Serialize` && `Deserialize`, and the user would not need to convert it using an interface like `pub fn try_from(value: super::DataFile, partition_type: &StructType, is_version_1: bool) -> _serde::DataFile`.

To solve the above, I think there are two solutions:

1. Make `DataFile` self-contained: store the partition type and version in `DataFile` directly, so that it can be converted into `_serde::DataFile` without extra arguments and can be `Serialize` && `Deserialize`.
2. Provide something like

```
struct SerializableDataFile {
    version: i32,
    partition_type: StructType,
    data_file: DataFile,
}
```

I prefer solution 1 because it looks more natural. Different opinions and solutions are welcome.

cc @liurenjie1024 @Fokko @Xuanwo @c-thiel
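For illustration only, solution 2 could be sketched roughly as below. This is a minimal, standard-library-only sketch: the `StructType` and `DataFile` definitions here are simplified, hypothetical stand-ins and not the real iceberg-rust types, and the `new` / `into_parts` helpers and the partition-arity check are assumptions added for the example. In practice the wrapper would presumably also derive the serde traits, which is omitted here to keep the sketch dependency-free.

```rust
// Hypothetical, simplified stand-ins; NOT the real iceberg-rust types.
#[derive(Debug, Clone, PartialEq)]
struct StructType {
    field_names: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
struct DataFile {
    file_path: String,
    record_count: u64,
    // One value per partition field, kept as strings for simplicity.
    partition: Vec<String>,
}

// Solution 2: bundle the data file with the context needed to
// serialize and deserialize it (format version and partition type).
#[derive(Debug, Clone, PartialEq)]
struct SerializableDataFile {
    version: i32,
    partition_type: StructType,
    data_file: DataFile,
}

impl SerializableDataFile {
    // Hypothetical constructor: rejects a data file whose partition
    // values do not line up with the partition type's fields.
    fn new(
        data_file: DataFile,
        partition_type: StructType,
        version: i32,
    ) -> Result<Self, String> {
        if data_file.partition.len() != partition_type.field_names.len() {
            return Err(format!(
                "partition has {} values but partition type has {} fields",
                data_file.partition.len(),
                partition_type.field_names.len()
            ));
        }
        Ok(Self { version, partition_type, data_file })
    }

    // Hypothetical accessor for the coordinator side: recover the data
    // file together with the context that travelled with it.
    fn into_parts(self) -> (DataFile, StructType, i32) {
        (self.data_file, self.partition_type, self.version)
    }
}
```

With this shape, a writer would wrap each `DataFile` before sending it, and the coordinator would call `into_parts` to get back everything it needs without a separate side channel for the partition type and version. Solution 1 would instead move the `version` and `partition_type` fields into `DataFile` itself.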
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org