ZENOTME commented on issue #891: URL: https://github.com/apache/iceberg-rust/issues/891#issuecomment-2592111636
> Thank you, [@ZENOTME](https://github.com/ZENOTME), for the response.
>
> I work with a startup building a distributed query engine for large-scale Iceberg tables (>1PB). The `partition-specific` hierarchical directory becomes crucial from a query engine's perspective!
>
> I'm working on a high-performance stream writer using Iceberg-Rust as the core library, I need to soon justify the ROI in picking Iceberg-Rust over the Java library, and this feature is a deal breaker. Hence, I'm happy to contribute if you could help point me in the right direction.

Hi, I have recently been thinking about a design to support this, and I will send it as a draft PR as a starting point so that we can discuss it quickly. However, it involves the interface design, so I'm not sure it will reach consensus and be merged upstream soon.

Actually, we already support a custom LocationGenerator in the parquet writer here: https://github.com/ZENOTME/iceberg-rust/blob/cde35ab0eefffae88c521d4e897ba86ee754861c/crates/iceberg/src/writer/file_writer/parquet_writer.rs#L66, so maybe you can try using this to write a custom LocationGenerator that produces paths in a partition subdirectory, e.g.

```
// Pseudocode: types and the table location lookup are left out.
struct PartitionLocationGenerator;

impl PartitionLocationGenerator {
    pub fn new(partition_value) -> Self;

    pub fn generate() -> String {
        let table_location = ...;
        format!("{}/{}/", table_location, partition_value.to_string())
    }
}
```

However, this requires you to create a new parquet writer builder with a new LocationGenerator for every new partition value. That is why we may need some redesign here.
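For illustration, the workaround described above could be sketched roughly as follows. This is a minimal, self-contained sketch and not the iceberg-rust API: the `LocationGenerator` trait here is a local stand-in (the real trait's method name and signature may differ), `PartitionedLocations` is a hypothetical helper, and the `table_location/partition/file` layout simply mirrors the pseudocode in the comment. It shows the cost mentioned in the comment: one generator has to be created and kept per partition value.

```rust
use std::collections::HashMap;

/// Local stand-in for a location-generator abstraction.
/// Assumption: the real iceberg-rust trait may have a different signature.
trait LocationGenerator {
    fn generate_location(&self, file_name: &str) -> String;
}

/// Produces file paths inside one partition's subdirectory.
#[derive(Clone, Debug)]
struct PartitionLocationGenerator {
    dir: String,
}

impl PartitionLocationGenerator {
    /// `partition_path` is an already-formatted segment such as "day=2025-01-15".
    fn new(table_location: &str, partition_path: &str) -> Self {
        let dir = format!(
            "{}/{}",
            table_location.trim_end_matches('/'),
            partition_path.trim_matches('/')
        );
        Self { dir }
    }
}

impl LocationGenerator for PartitionLocationGenerator {
    fn generate_location(&self, file_name: &str) -> String {
        format!("{}/{}", self.dir, file_name)
    }
}

/// Hypothetical helper: caches one generator per partition value, because each
/// partition needs its own generator (and, in practice, its own writer builder).
struct PartitionedLocations {
    table_location: String,
    generators: HashMap<String, PartitionLocationGenerator>,
}

impl PartitionedLocations {
    fn new(table_location: &str) -> Self {
        Self {
            table_location: table_location.to_string(),
            generators: HashMap::new(),
        }
    }

    /// Returns the full path for `file_name`, creating the partition's
    /// generator the first time that partition is seen.
    fn location_for(&mut self, partition_path: &str, file_name: &str) -> String {
        let table_location = &self.table_location;
        self.generators
            .entry(partition_path.to_string())
            .or_insert_with(|| PartitionLocationGenerator::new(table_location, partition_path))
            .generate_location(file_name)
    }
}
```

In a real writer, the cached value would presumably be a whole writer builder rather than only the generator, which is the overhead that motivates the redesign mentioned above.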
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org