Sl1mb0 opened a new issue, #778: URL: https://github.com/apache/iceberg-rust/issues/778
At the moment, the building and serialization of Iceberg metadata are coupled together. For example, let's say I want to build a `ManifestFile` that I then add to a `ManifestList` (some code has been omitted for brevity):

```rust
// Imports assumed from the `iceberg` and `tempfile` crates.
use iceberg::io::FileIOBuilder;
use iceberg::spec::{ManifestListWriter, ManifestWriter};
use tempfile::NamedTempFile;

// Building a `ManifestFile` requires an output location up front.
let manifest_file_path = NamedTempFile::new().unwrap();
let manifest_file_output = FileIOBuilder::new_fs_io()
    .build()
    .unwrap()
    .new_output(manifest_file_path.path().to_str().unwrap())
    .unwrap();

// Writing the `Manifest` (construction omitted) is what yields the `ManifestFile`.
let manifest_writer = ManifestWriter::new(manifest_file_output, 0, Vec::new());
let manifest_file = manifest_writer
    .write(manifest)
    .await
    .unwrap();

// The manifest list needs its own output location as well.
let manifest_list_path = NamedTempFile::new().unwrap();
let manifest_list_output = FileIOBuilder::new_fs_io()
    .build()
    .unwrap()
    .new_output(manifest_list_path.path().to_str().unwrap())
    .unwrap();

let mut writer = ManifestListWriter::v2(manifest_list_output, 0, 0, 0);
writer.add_manifests(vec![manifest_file]);
writer.close().await.unwrap();
```

- There is an abstract coupling of building and serialization: in order to 'build' a `ManifestFile` you have to 'write' a `Manifest`.
- There is another abstract coupling of building/serde: the _where this metadata gets written to_ is included in the _what metadata is written_.
  - When you specify a location to write a `ManifestFile` to, that location is where the `ManifestFile` gets written _and is [included in the metadata](https://github.com/apache/iceberg-rust/blob/42aff04658a00b390122260dbbeaf512d11af61f/crates/iceberg/src/spec/manifest.rs#L305) of that `ManifestFile`_.
  - This means that when the built `ManifestFile` is added to a `ManifestList`, the location of the `ManifestFile` is what's used to 'point' the `ManifestList` to that `ManifestFile`.
  - This coupling forces the user to use the `FileIO`/`OutputFile`/`InputFile` types to write to their preferred storage layer, instead of allowing the user to build/use their own abstractions for "where the bytes get written to".
- We would really like to separate the building and serialization layers, as that will allow us to use our own storage layer abstractions.
- To provide an example: if the user wants to use their own storage layer for storing metadata bytes:
  - They must build/write all the necessary metadata types using `FileIO`.
  - They would then need to 'copy' all these bytes to their preferred storage layer.
  - :warning: **problem** :warning:
    - Because the metadata itself contains "where" the metadata is, once that metadata is "moved" somewhere else it is no longer valid. This is because the 'metadata hierarchy' (i.e. which metadata points to which snapshot, which points to which manifest list, etc.) is only valid for the location where it was built/serialized.

To illustrate this:

[diagram]

In the above example the `ManifestList` and `ManifestFile` were built/serialized on `Node B` and then copied over to `Node A`. Because the building/serialization was performed on `Node B`, the `ManifestList` on `Node A` still points to the `ManifestFile` on `Node B`.
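To make the requested separation concrete, here is a minimal sketch of what a decoupled flow could look like. It is not the iceberg-rust API: the `Storage` trait, `MemoryStorage`, the simplified `Manifest` and `ManifestFile` structs, and `write_manifest` are all hypothetical names invented for illustration, and the newline-joined byte format merely stands in for the real Avro encoding. The point is only that serialization yields bytes without knowing a location, and the path is recorded after the caller's own storage layer has decided where those bytes live.

```rust
use std::collections::HashMap;

/// Hypothetical storage abstraction: the caller decides where bytes live.
trait Storage {
    /// Stores `bytes` under `key` and returns the final location.
    fn put(&mut self, key: &str, bytes: Vec<u8>) -> String;
}

/// In-memory stand-in for a user-defined storage layer.
struct MemoryStorage {
    prefix: String,
    objects: HashMap<String, Vec<u8>>,
}

impl Storage for MemoryStorage {
    fn put(&mut self, key: &str, bytes: Vec<u8>) -> String {
        let location = format!("{}/{}", self.prefix, key);
        self.objects.insert(location.clone(), bytes);
        location
    }
}

/// Hypothetical manifest content, built without any knowledge of its location.
struct Manifest {
    entries: Vec<String>,
}

impl Manifest {
    /// Serialization depends only on the content (newline-joined here purely
    /// for illustration; real manifests are Avro files).
    fn to_bytes(&self) -> Vec<u8> {
        self.entries.join("\n").into_bytes()
    }
}

/// Hypothetical pointer that a manifest list would keep for each manifest.
#[derive(Debug, PartialEq)]
struct ManifestFile {
    path: String,
    length: usize,
}

/// Serialize first, store second, and only then record the location.
fn write_manifest(storage: &mut dyn Storage, key: &str, manifest: &Manifest) -> ManifestFile {
    let bytes = manifest.to_bytes();
    let length = bytes.len();
    let path = storage.put(key, bytes);
    ManifestFile { path, length }
}

fn main() {
    let manifest = Manifest {
        entries: vec!["data-1.parquet".to_string(), "data-2.parquet".to_string()],
    };
    let mut node_a = MemoryStorage {
        prefix: "node-a://warehouse".to_string(),
        objects: HashMap::new(),
    };
    // The pointer refers to Node A because Node A's storage chose the location.
    let manifest_file = write_manifest(&mut node_a, "manifest-0.avro", &manifest);
    println!("{} ({} bytes)", manifest_file.path, manifest_file.length);
}
```

With this shape, the `Node A`/`Node B` mismatch described above would not arise in the sketch, since the `ManifestFile` path always comes from whichever storage layer actually received the bytes.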