ZENOTME opened a new pull request, #741: URL: https://github.com/apache/iceberg-rust/pull/741
Hi, I've been working on the writers recently and found some weaknesses in our original writer interface design:

1. Our `IcebergWriterBuilder` interface looks like the following:

```
trait IcebergWriterBuilder {
    type R; // the writer produced by `build`
    type C; // writer-specific config passed at build time
    async fn build(self, config: Self::C) -> Result<Self::R>;
}
```

I realized that this custom config parameter prevents us from composing writers flexibly. E.g. in a partition writer:

```
struct PartitionWriter<B: IcebergWriterBuilder> {
    inner_writer_builder: B,
}

impl<B: IcebergWriterBuilder> PartitionWriter<B> {
    pub async fn write(..) {
        // We can't build here because we don't know which param to pass.
        self.inner_writer_builder.build(...)
    }
}
```

To avoid this problem, we should pass the custom param when creating the builder, and the build interface should look like `fn build() -> Self`.

2. The schema of a FileWriter can be determined by the base writer (data file writer, position delete writer, equality delete writer). In our original design, the user has to pass the schema to the file writer builder when creating it, like

```
let file_builder = ParquetWriterBuilder(schema)
```

However, sometimes the schema is hard to determine at that point. E.g. for the equality delete writer, we only know what the schema looks like once we pass the equality ids and create the equality delete writer. To avoid this problem, we change `fn build() -> Self` of FileWriterBuilder to `fn build(schema: SchemaRef) -> Self`. This way, the schema of the FileWriter is determined by the base writer.

I'm sending this discussion as a PR to make it easier to express my idea. BTW, I applied this change and completed the partition writer and delta writer in https://github.com/ZENOTME/iceberg-rust/tree/partition_writer, and it looks good. Any suggestions are welcome.

cc @liurenjie1024 @Xuanwo @Fokko @c-thiel
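To make the two proposed changes concrete, here is a minimal, self-contained sketch of how they might fit together. It is not the iceberg-rust API: the traits are synchronous and infallible for brevity, `Schema` is a stand-in that only holds column names, `InMemoryFileWriter` and `demo` are hypothetical helpers, and `equality_ids` are modelled as column positions rather than real field ids.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Stand-in for an Iceberg schema: just a list of column names.
#[derive(Debug, PartialEq)]
pub struct Schema { pub columns: Vec<String> }
pub type SchemaRef = Arc<Schema>;

// File-level writer (hypothetical; synchronous for brevity).
pub trait FileWriter {
    fn write(&mut self, row: Vec<String>);
}

// Point 2: the base writer supplies the schema at build time.
pub trait FileWriterBuilder: Clone {
    type R: FileWriter;
    fn build(self, schema: SchemaRef) -> Self::R;
}

pub trait IcebergWriter {
    fn write(&mut self, row: Vec<String>);
}

// Point 1: configuration lives in the builder, so `build` takes no arguments.
pub trait IcebergWriterBuilder: Clone {
    type R: IcebergWriter;
    fn build(self) -> Self::R;
}

// Toy file writer that keeps rows in memory.
#[derive(Clone)]
pub struct InMemoryFileWriterBuilder;
pub struct InMemoryFileWriter { pub schema: SchemaRef, pub rows: Vec<Vec<String>> }
impl FileWriterBuilder for InMemoryFileWriterBuilder {
    type R = InMemoryFileWriter;
    fn build(self, schema: SchemaRef) -> Self::R {
        InMemoryFileWriter { schema, rows: Vec::new() }
    }
}
impl FileWriter for InMemoryFileWriter {
    fn write(&mut self, row: Vec<String>) { self.rows.push(row); }
}

// Base writer builder, configured up front with the equality columns.
#[derive(Clone)]
pub struct EqualityDeleteWriterBuilder<F: FileWriterBuilder> {
    pub file_builder: F,
    pub table_schema: SchemaRef,
    pub equality_ids: Vec<usize>, // column positions here, not real field ids
}
pub struct EqualityDeleteWriter<W: FileWriter> { pub inner: W, equality_ids: Vec<usize> }
impl<F: FileWriterBuilder> IcebergWriterBuilder for EqualityDeleteWriterBuilder<F> {
    type R = EqualityDeleteWriter<F::R>;
    fn build(self) -> Self::R {
        // The delete schema is derived here, once the equality ids are known.
        let columns: Vec<String> =
            self.equality_ids.iter().map(|&i| self.table_schema.columns[i].clone()).collect();
        let inner = self.file_builder.build(Arc::new(Schema { columns }));
        EqualityDeleteWriter { inner, equality_ids: self.equality_ids }
    }
}
impl<W: FileWriter> IcebergWriter for EqualityDeleteWriter<W> {
    fn write(&mut self, row: Vec<String>) {
        // Keep only the equality columns of the incoming row.
        let projected = self.equality_ids.iter().map(|&i| row[i].clone()).collect();
        self.inner.write(projected);
    }
}

// Composing writer: it can create inner writers on demand because `build()` needs no argument.
pub struct PartitionWriter<B: IcebergWriterBuilder> {
    pub inner_writer_builder: B,
    pub writers: HashMap<String, B::R>,
}
impl<B: IcebergWriterBuilder> PartitionWriter<B> {
    pub fn write(&mut self, partition: &str, row: Vec<String>) {
        let builder = &self.inner_writer_builder;
        self.writers
            .entry(partition.to_string())
            .or_insert_with(|| builder.clone().build())
            .write(row);
    }
}

// Small usage example: two partitions, equality deletes keyed on column 0.
pub fn demo() -> PartitionWriter<EqualityDeleteWriterBuilder<InMemoryFileWriterBuilder>> {
    let row = |v: [&str; 3]| v.iter().map(|s| s.to_string()).collect::<Vec<String>>();
    let builder = EqualityDeleteWriterBuilder {
        file_builder: InMemoryFileWriterBuilder,
        table_schema: Arc::new(Schema { columns: row(["id", "name", "day"]) }),
        equality_ids: vec![0],
    };
    let mut writer = PartitionWriter { inner_writer_builder: builder, writers: HashMap::new() };
    writer.write("day=1", row(["1", "a", "1"]));
    writer.write("day=1", row(["2", "b", "1"]));
    writer.write("day=2", row(["3", "c", "2"]));
    writer
}
```

In this sketch `PartitionWriter` never needs to know what configuration the inner writer requires, and the file writer's schema is chosen by the equality delete writer rather than by the caller. Calling `demo()` from a `main` function is enough to try it out.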
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org