liurenjie1024 commented on PR #1328: URL: https://github.com/apache/iceberg-rust/pull/1328#issuecomment-2900132681
> Thanks everyone for chiming in here. Let me summarize the discussion. I think there is consensus that the callback is not ideal.
>
> 1. Supply the required information to construct the summaries:
>    1. Instead of having the `Fn(i32) -> Result<Option<StructType>>` provider, we could pass in a `HashMap<i32, StructType>`. We would bind all the `PartitionSpec`s in PyIceberg. This is relatively straightforward, but comes at a cost when there are many `PartitionSpec`s (which should be okay for the majority of tables).
>    2. What @kevinjqliu suggested in [Expose Avro reader to PyIceberg #1328 (comment)](https://github.com/apache/iceberg-rust/pull/1328#discussion_r2094174778): pass in the current `Schema` and `PartitionSpec`s to Iceberg-Rust, where we can do the lazy binding on the Iceberg-Rust side.
>    3. Go all the way and convert the `TableMetadata` to Iceberg-Rust. This is probably where we end up at some point, but it requires a lot of scaffolding.
> 2. Deserialize into `Vec<u8>` instead of a `Datum`, and convert them later into the actual type. This removes the dependency on the `Schema` and the `PartitionSpec`s.
>
> I'm leaning towards 2 since that aligns best with PyIceberg, where we can deserialize the manifest-list without having to know about the schema. I would like to make sure that we have consensus before moving in a certain direction, and I'm happy to follow up on that.

+1
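For illustration, option 2 from the quoted summary (keep the bounds as raw bytes, convert them later) could look roughly like the sketch below. This is not the iceberg-rust API: `RawFieldSummary`, `PrimitiveKind`, `Value`, and `decode_bound` are hypothetical names made up for this example, and only three primitive types are covered. The one thing it takes from Iceberg itself is the single-value binary encoding, where `int` and `long` are little-endian and `string` is UTF-8.

```rust
/// Hypothetical, simplified stand-in for a manifest-list field summary.
/// The bounds stay as raw bytes, so no Schema or PartitionSpec is needed
/// at deserialization time.
#[derive(Debug, Clone, PartialEq)]
struct RawFieldSummary {
    contains_null: bool,
    lower_bound: Option<Vec<u8>>,
    upper_bound: Option<Vec<u8>>,
}

/// Hypothetical subset of primitive types, used only for this sketch.
#[derive(Debug, Clone, Copy)]
enum PrimitiveKind {
    Int,
    Long,
    String,
}

/// Hypothetical typed value produced once the partition type is known.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i32),
    Long(i64),
    String(String),
}

/// Copy a slice into a fixed-size array, failing on a length mismatch.
fn le_array<const N: usize>(bytes: &[u8]) -> Result<[u8; N], String> {
    if bytes.len() != N {
        return Err(format!("expected {} bytes, got {}", N, bytes.len()));
    }
    let mut out = [0u8; N];
    out.copy_from_slice(bytes);
    Ok(out)
}

/// Late binding: interpret the raw bytes only when the caller supplies
/// the type of the partition field.
fn decode_bound(kind: PrimitiveKind, bytes: &[u8]) -> Result<Value, String> {
    match kind {
        PrimitiveKind::Int => Ok(Value::Int(i32::from_le_bytes(le_array::<4>(bytes)?))),
        PrimitiveKind::Long => Ok(Value::Long(i64::from_le_bytes(le_array::<8>(bytes)?))),
        PrimitiveKind::String => String::from_utf8(bytes.to_vec())
            .map(Value::String)
            .map_err(|e| e.to_string()),
    }
}
```

With this shape, the reader can hand a `RawFieldSummary` to PyIceberg as-is, and whichever side knows the partition type calls something like `decode_bound(PrimitiveKind::Int, bytes)` when it actually needs the typed bound.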