ZENOTME opened a new issue, #774:
URL: https://github.com/apache/iceberg-rust/issues/774

   ## Context 
   
   Making `DataFile` serializable and deserializable is useful. For example, a 
distributed compute engine creates writers on multiple machines, writes the 
data in parallel, and gets `DataFile`s back as the results. These `DataFile`s 
are then sent to a coordinator, which appends them using a transaction. For 
this to work, `DataFile` must be serializable and deserializable.
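
To make the flow above concrete, here is a minimal sketch using only the 
standard library. Everything in it is an assumption for illustration: 
`DataFileStub` is a hypothetical stand-in for the real `DataFile`, the 
`path|count` wire format is a toy replacement for a real serde format, and 
threads plus a channel stand in for machines and the network.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for iceberg's `DataFile`; the real struct has many more fields.
#[derive(Debug, Clone, PartialEq)]
struct DataFileStub {
    file_path: String,
    record_count: u64,
}

impl DataFileStub {
    // Toy wire format "path|count"; a real engine would use serde with JSON, Avro, etc.
    fn to_wire(&self) -> String {
        format!("{}|{}", self.file_path, self.record_count)
    }

    // Parse the toy wire format; returns None on malformed input.
    fn from_wire(s: &str) -> Option<Self> {
        let (path, count) = s.rsplit_once('|')?;
        Some(Self {
            file_path: path.to_string(),
            record_count: count.parse().ok()?,
        })
    }
}

// Spawn `n` workers; each "writes" one file and sends the serialized result
// to the coordinator, which deserializes and collects them.
fn run_workers(n: u64) -> Vec<DataFileStub> {
    let (tx, rx) = mpsc::channel::<String>();
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let tx = tx.clone();
            thread::spawn(move || {
                let file = DataFileStub {
                    file_path: format!("data/part-{i}.parquet"),
                    record_count: 100 * (i + 1),
                };
                tx.send(file.to_wire()).expect("coordinator hung up");
            })
        })
        .collect();
    drop(tx); // the channel closes once every worker's clone is dropped too
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    // Coordinator side: in iceberg-rust these files would now be appended in one transaction.
    let mut files: Vec<_> = rx
        .iter()
        .filter_map(|s| DataFileStub::from_wire(&s))
        .collect();
    files.sort_by(|a, b| a.file_path.cmp(&b.file_path));
    files
}
```

Calling `run_workers(3)` yields three deserialized stubs on the coordinator 
side. The point of the sketch is only that the result type has to cross a 
process boundary, which is what the real `DataFile` cannot do on its own today.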
   
   ## Solution 
   
   For now, we support serializing `DataFile` through the `_serde` module, so 
a `DataFile` must first be converted to `_serde::DataFile`. The interface looks 
like: `pub fn try_from(value: super::DataFile, partition_type: &StructType, 
is_version_1: bool) -> _serde::DataFile`. More detail: 
https://github.com/apache/iceberg-rust/blob/98cd34dc03cd87b330c7bff8fe9f3241746062ac/crates/iceberg/src/spec/manifest.rs#L1361.
   
   Two things need to be resolved to make `DataFile` serializable and 
deserializable:
   1. The related interface needs to be exposed publicly.
   2. The interface is not user-friendly. Things would be easier if `DataFile` 
were self-contained, i.e. if `DataFile` itself implemented `Serialize` and 
`Deserialize`, so that the user would not need to convert it through an 
interface like `pub fn try_from(value: super::DataFile, partition_type: 
&StructType, is_version_1: bool) -> _serde::DataFile`.
   
   To solve the above, I think there are two solutions:
   1. Make `DataFile` self-contained: store the partition type and version in 
`DataFile` directly, so that it can convert itself into `_serde::DataFile` and 
can implement `Serialize` and `Deserialize`.
   2. Provide a wrapper type, something like:
   ```
   struct SerializableDataFile {
     version: i32,
     partition_type: StructType,
     data_file: DataFile,
   }
   ```
   I prefer solution 1 because it looks more natural. Different opinions and 
solutions are welcome. cc @liurenjie1024 @Fokko @Xuanwo @c-thiel 
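
As a rough illustration of solution 2, the sketch below shows how a wrapper 
can carry the extra context so that callers no longer pass `partition_type` 
and `is_version_1` by hand. All types here are hypothetical, heavily 
simplified stand-ins, not the real iceberg-rust definitions: `SerdeDataFile` 
plays the role of `_serde::DataFile`, `to_serde` mimics the shape of the 
existing `try_from`, and returning a `Result` is an assumption of this sketch.

```rust
// Hypothetical stand-in for the partition `StructType`: just the field names.
#[derive(Debug, Clone, PartialEq)]
struct StructType {
    field_names: Vec<String>,
}

// Hypothetical stand-in for `DataFile`; partition values carry no names or types.
#[derive(Debug, Clone, PartialEq)]
struct DataFile {
    file_path: String,
    partition: Vec<i64>, // one value per partition field
}

// Stand-in for `_serde::DataFile`: partition values paired with their field names.
#[derive(Debug, Clone, PartialEq)]
struct SerdeDataFile {
    file_path: String,
    partition: Vec<(String, i64)>,
    is_version_1: bool,
}

// Mimics the current conversion: the caller must supply the extra context.
fn to_serde(
    value: DataFile,
    partition_type: &StructType,
    is_version_1: bool,
) -> Result<SerdeDataFile, String> {
    if value.partition.len() != partition_type.field_names.len() {
        return Err(format!(
            "expected {} partition values, got {}",
            partition_type.field_names.len(),
            value.partition.len()
        ));
    }
    let partition = partition_type
        .field_names
        .iter()
        .cloned()
        .zip(value.partition)
        .collect();
    Ok(SerdeDataFile {
        file_path: value.file_path,
        partition,
        is_version_1,
    })
}

// Solution 2: the wrapper bundles the context with the data file.
struct SerializableDataFile {
    version: i32,
    partition_type: StructType,
    data_file: DataFile,
}

impl SerializableDataFile {
    // No extra arguments needed: everything the conversion requires travels with `self`.
    fn into_serde(self) -> Result<SerdeDataFile, String> {
        to_serde(self.data_file, &self.partition_type, self.version == 1)
    }
}
```

Solution 1 would amount to moving `version` and `partition_type` into 
`DataFile` itself, so the same conversion could run without a separate 
wrapper type.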


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
