ZENOTME opened a new pull request, #741:
URL: https://github.com/apache/iceberg-rust/pull/741

   Hi, I've been working on the writers recently and found some weaknesses in 
our original writer interface design:
   1. Our IcebergWriterBuilder interface looks like the following:
   ```
   trait IcebergWriterBuilder {
     type R; // the writer type produced by `build`
     type C; // the custom config type passed to `build`
     async fn build(self, config: Self::C) -> Result<Self::R>;
   }
   ```
   I realized that this custom config param means we can't compose writers 
flexibly. E.g. in a partition writer:
   ```
   struct PartitionWriter<B: IcebergWriterBuilder> {
     inner_writer_builder: B
   }
   
   impl<B: IcebergWriterBuilder> PartitionWriter<B> {
     pub async fn write(..) {
         // We can't build because we don't know which param to pass.
         self.inner_writer_builder.build(...)
     }
   }
   ```
   To avoid this problem, we should pass the custom param when creating the 
builder, and the build interface should look like `fn build() -> Self`.
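
As a rough illustration of point 1, here is a minimal, synchronous sketch of how a partition writer could create inner writers once `build` takes no config. It is not the iceberg-rust API: the real traits are async and return `Result`, and `DataFileWriterBuilder`, its `prefix` config, and the `Clone` bound on the builder are all assumptions made for the example.

```rust
use std::collections::BTreeMap;

// Simplified stand-in for the writer trait (sync, no Result).
trait IcebergWriter {
    fn write(&mut self, row: &str);
    fn close(self) -> Vec<String>;
}

// `build` takes no config: everything the writer needs already lives in `self`.
// The `Clone` bound is an assumption so one builder can produce many writers.
trait IcebergWriterBuilder: Clone {
    type R: IcebergWriter;
    fn build(self) -> Self::R;
}

// Hypothetical leaf builder; its config (`prefix`) is supplied at creation time.
#[derive(Clone)]
struct DataFileWriterBuilder {
    prefix: String,
}

struct DataFileWriter {
    prefix: String,
    rows: Vec<String>,
}

impl IcebergWriter for DataFileWriter {
    fn write(&mut self, row: &str) {
        self.rows.push(format!("{}{}", self.prefix, row));
    }
    fn close(self) -> Vec<String> {
        self.rows
    }
}

impl IcebergWriterBuilder for DataFileWriterBuilder {
    type R = DataFileWriter;
    fn build(self) -> DataFileWriter {
        DataFileWriter { prefix: self.prefix, rows: Vec::new() }
    }
}

// The partition writer no longer needs to know the inner builder's config type.
struct PartitionWriter<B: IcebergWriterBuilder> {
    inner_writer_builder: B,
    writers: BTreeMap<String, B::R>,
}

impl<B: IcebergWriterBuilder> PartitionWriter<B> {
    fn new(inner_writer_builder: B) -> Self {
        Self { inner_writer_builder, writers: BTreeMap::new() }
    }

    // Lazily builds one inner writer per partition key, then forwards the row.
    fn write(&mut self, partition: &str, row: &str) {
        let builder = &self.inner_writer_builder;
        self.writers
            .entry(partition.to_string())
            .or_insert_with(|| builder.clone().build())
            .write(row);
    }

    // Closes every inner writer and returns the rows written per partition.
    fn close(self) -> BTreeMap<String, Vec<String>> {
        self.writers.into_iter().map(|(k, w)| (k, w.close())).collect()
    }
}
```

With this shape, `PartitionWriter::new(DataFileWriterBuilder { prefix: "row:".to_string() })` can be wrapped around any builder, because building a new inner writer never requires a parameter the outer writer does not know about.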
   
   2. The schema of a FileWriter can be determined by the base writer, e.g. the 
data file writer, position delete writer, or equality delete writer.
   In our original design, the user has to pass the schema to the file writer 
builder when creating it, like 
   ```
   let file_builder = ParquetWriterBuilder(schema)
   ``` 
   However, sometimes the schema is hard to determine at that point. E.g. for 
the equality delete writer, we only know what the schema looks like once we 
pass the equality ids and create the equality delete writer. To avoid this 
problem, we change `fn build() -> Self` of FileWriterBuilder to 
`fn build(schema: SchemaRef) -> Self`. This way, the schema of the FileWriter 
is determined by the base writer. 
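
To make point 2 concrete, here is a minimal sketch of a base writer deriving the schema and handing it to the file writer builder. The `Schema` struct, the unit-struct `ParquetWriterBuilder`, and the field-filtering logic are simplified stand-ins assumed for illustration, not the actual iceberg-rust types.

```rust
use std::sync::Arc;

// Stand-in for the real schema type: a list of (field id, field name).
#[derive(Debug, PartialEq)]
struct Schema {
    fields: Vec<(i32, String)>,
}
type SchemaRef = Arc<Schema>;

// The file writer builder receives the schema at build time, not at creation.
trait FileWriterBuilder {
    type R;
    fn build(self, schema: SchemaRef) -> Self::R;
}

// Hypothetical file writer builder with no config of its own.
struct ParquetWriterBuilder;

struct ParquetWriter {
    schema: SchemaRef,
}

impl FileWriterBuilder for ParquetWriterBuilder {
    type R = ParquetWriter;
    fn build(self, schema: SchemaRef) -> ParquetWriter {
        ParquetWriter { schema }
    }
}

// The base writer builder knows the equality ids, so it decides the schema.
struct EqualityDeleteWriterBuilder<B: FileWriterBuilder> {
    inner: B,
    table_schema: SchemaRef,
    equality_ids: Vec<i32>,
}

struct EqualityDeleteWriter<W> {
    inner: W,
}

impl<B: FileWriterBuilder> EqualityDeleteWriterBuilder<B> {
    fn build(self) -> EqualityDeleteWriter<B::R> {
        // Project the table schema down to the equality-id fields only.
        let fields = self
            .table_schema
            .fields
            .iter()
            .filter(|(id, _)| self.equality_ids.contains(id))
            .cloned()
            .collect();
        let delete_schema = Arc::new(Schema { fields });
        EqualityDeleteWriter { inner: self.inner.build(delete_schema) }
    }
}
```

The caller creates `ParquetWriterBuilder` without knowing the delete schema; the projected schema only comes into existence inside `EqualityDeleteWriterBuilder::build`.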
   
   I'm sending this discussion as a PR to make it easier to express my idea. 
BTW, I applied this change and completed the partition writer and delta writer 
in https://github.com/ZENOTME/iceberg-rust/tree/partition_writer and it works 
well.
   
   Any suggestions are welcome. cc @liurenjie1024 @Xuanwo @Fokko @c-thiel 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

