ZENOTME opened a new pull request, #135:
URL: https://github.com/apache/iceberg-rust/pull/135

   Related issue: #34
   I drafted a writer framework that has already been implemented in icelake (https://github.com/icelake-io/icelake/issues/243), where it has proved to be extensible and flexible. The following is an introduction to this API:
   
   ## Target
   
   The writer API is designed to be extensible and flexible. Each writer is decoupled and can be created and configured independently. Users can:
   1. Combine different writer builders to build a writer with complex write logic, such as FanoutPartition + DataFileWrite or FanoutPartition + PositionDeleteFileWrite.
   2. Customize a writer and combine it with the existing writer builders to build a writer that processes the data in a specific way.
   
   ## How it works
   
   There are two kinds of writers, each with a related builder:
   1. `IcebergWriter` and `IcebergWriterBuilder` focus on the data processing logic.
       If you want to support new data processing logic, you need to implement a new `IcebergWriter` and `IcebergWriterBuilder`.
   2. `FileWriter` and `FileWriterBuilder` focus on the physical file write.
       If you want to support a new physical file format, you need to implement a new `FileWriter` and `FileWriterBuilder`.
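
To make the split between the two layers concrete, here is a minimal sketch. It is not the PR's actual code: the traits are simplified to be synchronous and to take string rows instead of record batches, and `LineFileWriter`, `LineFileWriterBuilder`, and `write_all` are hypothetical names invented for illustration.

```rust
// Hypothetical, synchronous simplification of the two writer layers.
// The traits proposed in the PR are async; these only mirror their shape.

/// Physical layer: knows how to encode rows into one file format.
trait FileWriter {
    fn write(&mut self, row: &str);
    /// Finish the file and return its contents (stand-in for file metadata).
    fn close(self) -> String;
}

trait FileWriterBuilder: Clone {
    type W: FileWriter;
    fn build(&self) -> Self::W;
}

/// Logical layer: decides what is written and when files are finished.
trait IcebergWriter {
    fn write(&mut self, row: &str);
    fn flush(&mut self) -> Vec<String>;
}

trait IcebergWriterBuilder: Clone {
    type W: IcebergWriter;
    fn build(&self) -> Self::W;
}

/// Toy file format (hypothetical): one row per line, kept in memory.
#[derive(Clone)]
struct LineFileWriterBuilder;
struct LineFileWriter {
    buf: String,
}

impl FileWriterBuilder for LineFileWriterBuilder {
    type W = LineFileWriter;
    fn build(&self) -> LineFileWriter {
        LineFileWriter { buf: String::new() }
    }
}

impl FileWriter for LineFileWriter {
    fn write(&mut self, row: &str) {
        self.buf.push_str(row);
        self.buf.push('\n');
    }
    fn close(self) -> String {
        self.buf
    }
}

/// Data-file logic, generic over any file format.
#[derive(Clone)]
struct DataFileWriterBuilder<B: FileWriterBuilder> {
    inner: B,
}
struct DataFileWriter<B: FileWriterBuilder> {
    builder: B,
    current: Option<B::W>,
}

impl<B: FileWriterBuilder> IcebergWriterBuilder for DataFileWriterBuilder<B> {
    type W = DataFileWriter<B>;
    fn build(&self) -> Self::W {
        DataFileWriter { builder: self.inner.clone(), current: None }
    }
}

impl<B: FileWriterBuilder> IcebergWriter for DataFileWriter<B> {
    fn write(&mut self, row: &str) {
        // Open the underlying file lazily on the first write.
        let builder = &self.builder;
        self.current.get_or_insert_with(|| builder.build()).write(row);
    }
    fn flush(&mut self) -> Vec<String> {
        // Close the open file, if any, and report it.
        self.current.take().map(|w| w.close()).into_iter().collect()
    }
}

/// Helper: build a writer, write all rows, and return the finished files.
fn write_all<B: IcebergWriterBuilder>(builder: &B, rows: &[&str]) -> Vec<String> {
    let mut writer = builder.build();
    for row in rows {
        writer.write(row);
    }
    writer.flush()
}
```

In this sketch, supporting another file format would only require a new `FileWriter`/`FileWriterBuilder` pair; `DataFileWriterBuilder` stays unchanged because it is generic over the builder it wraps.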
   
   The process of creating an Iceberg writer is:
   1. Create a `FileWriterBuilder`.
       1a. Optionally combine it with other `FileWriterBuilder`s to get a new `FileWriterBuilder`.
   2. Use the `FileWriterBuilder` to create an `IcebergWriterBuilder`.
       2a. Optionally combine it with other `IcebergWriterBuilder`s to get a new `IcebergWriterBuilder`.
   3. Call the `build` function of the `IcebergWriterBuilder` to create an `IcebergWriter`.
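
Step 2a is where a custom writer plugs in. The following is a rough sketch of one way to wrap an existing `IcebergWriterBuilder` in another one. The traits are again simplified, synchronous stand-ins rather than the PR's definitions, and `BufferWriterBuilder`, `SkipBlankWriterBuilder`, and `skip_blank_demo` are hypothetical names used only for this example.

```rust
// Hypothetical, synchronous stand-ins for the PR's async traits.
trait IcebergWriter {
    fn write(&mut self, row: &str);
    fn flush(&mut self) -> Vec<String>;
}

trait IcebergWriterBuilder: Clone {
    type W: IcebergWriter;
    fn build(&self) -> Self::W;
}

/// Base writer (hypothetical): buffers rows and returns them on flush.
#[derive(Clone)]
struct BufferWriterBuilder;
struct BufferWriter {
    rows: Vec<String>,
}

impl IcebergWriterBuilder for BufferWriterBuilder {
    type W = BufferWriter;
    fn build(&self) -> BufferWriter {
        BufferWriter { rows: Vec::new() }
    }
}

impl IcebergWriter for BufferWriter {
    fn write(&mut self, row: &str) {
        self.rows.push(row.to_string());
    }
    fn flush(&mut self) -> Vec<String> {
        std::mem::take(&mut self.rows)
    }
}

/// Custom writer (hypothetical): drops blank rows, then delegates to any
/// inner writer. It never needs to know what the inner writer does.
#[derive(Clone)]
struct SkipBlankWriterBuilder<B: IcebergWriterBuilder> {
    inner: B,
}
struct SkipBlankWriter<W: IcebergWriter> {
    inner: W,
}

impl<B: IcebergWriterBuilder> SkipBlankWriterBuilder<B> {
    fn new(inner: B) -> Self {
        Self { inner }
    }
}

impl<B: IcebergWriterBuilder> IcebergWriterBuilder for SkipBlankWriterBuilder<B> {
    type W = SkipBlankWriter<B::W>;
    fn build(&self) -> Self::W {
        SkipBlankWriter { inner: self.inner.build() }
    }
}

impl<W: IcebergWriter> IcebergWriter for SkipBlankWriter<W> {
    fn write(&mut self, row: &str) {
        if !row.trim().is_empty() {
            self.inner.write(row);
        }
    }
    fn flush(&mut self) -> Vec<String> {
        self.inner.flush()
    }
}

/// Builds the combined writer and runs three rows through it.
fn skip_blank_demo() -> Vec<String> {
    let builder = SkipBlankWriterBuilder::new(BufferWriterBuilder);
    let mut writer = builder.build();
    for row in ["a", "  ", "b"] {
        writer.write(row);
    }
    writer.flush()
}
```

Because the wrapper is itself an `IcebergWriterBuilder`, it could be wrapped again by another builder in the same way.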
   
   ## Simple Case 1: Create a data file writer using the Parquet file format.
   ```
   // 1. Create a Parquet file writer builder.
   let parquet_writer_builder = ParquetFileWriterBuilder::new(parquet_file_writer_config);
   // 2. Create a data file writer builder.
   let data_file_writer_builder = DataFileWriterBuilder::new(parquet_writer_builder, data_file_writer_config);
   // 3. Create an Iceberg writer (`mut` assumes `write` and `flush` take `&mut self`).
   let mut iceberg_writer = data_file_writer_builder.build(schema).await?;

   // Write the input data.
   iceberg_writer.write(input).await?;

   // Flush and collect the write result.
   let write_result = iceberg_writer.flush().await?;
   ```
   
   ## Complex Case 2: Create a fanout partition data file writer using the Parquet file format.
   ```
   // 1. Create a Parquet file writer builder.
   let parquet_writer_builder = ParquetFileWriterBuilder::new(parquet_file_writer_config);
   // 2. Create a data file writer builder.
   let data_file_writer_builder = DataFileWriterBuilder::new(parquet_writer_builder, data_file_writer_config);
   // 3. Create a fanout partition writer builder that wraps the data file writer builder.
   let fanout_partition_writer_builder = FanoutPartitionWriterBuilder::new(data_file_writer_builder, partition_config);
   // 4. Create an Iceberg writer (`mut` assumes `write` and `flush` take `&mut self`).
   let mut iceberg_writer = fanout_partition_writer_builder.build(schema).await?;

   // Write the input data.
   iceberg_writer.write(input).await?;

   // Flush and collect the write result.
   let write_result = iceberg_writer.flush().await?;
   ```
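
For readers unfamiliar with the fanout pattern, the sketch below shows one way such a writer could route rows: it keeps one inner writer open per partition value and creates each lazily from the wrapped builder. This is an assumption about how a fanout partition writer might look internally, not the implementation in this PR or in icelake. The traits are synchronous stand-ins, and `Row`, `ListWriterBuilder`, and `fanout_demo` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Hypothetical row type: a partition value plus one data value.
#[derive(Clone, Copy)]
struct Row<'a> {
    partition: &'a str,
    value: i64,
}

// Hypothetical, synchronous stand-ins for the PR's async traits.
trait IcebergWriter {
    fn write(&mut self, row: Row<'_>);
    fn flush(&mut self) -> Vec<String>;
}

trait IcebergWriterBuilder: Clone {
    type W: IcebergWriter;
    fn build(&self) -> Self::W;
}

/// Toy inner writer: each flush yields one "file" listing the values written.
#[derive(Clone)]
struct ListWriterBuilder;
struct ListWriter {
    values: Vec<i64>,
}

impl IcebergWriterBuilder for ListWriterBuilder {
    type W = ListWriter;
    fn build(&self) -> ListWriter {
        ListWriter { values: Vec::new() }
    }
}

impl IcebergWriter for ListWriter {
    fn write(&mut self, row: Row<'_>) {
        self.values.push(row.value);
    }
    fn flush(&mut self) -> Vec<String> {
        if self.values.is_empty() {
            return Vec::new();
        }
        let vals: Vec<String> = self.values.drain(..).map(|v| v.to_string()).collect();
        vec![vals.join(",")]
    }
}

/// Fanout sketch: one inner writer per partition value, all open at once.
#[derive(Clone)]
struct FanoutPartitionWriterBuilder<B: IcebergWriterBuilder> {
    inner: B,
}
struct FanoutPartitionWriter<B: IcebergWriterBuilder> {
    inner: B,
    writers: BTreeMap<String, B::W>,
}

impl<B: IcebergWriterBuilder> IcebergWriterBuilder for FanoutPartitionWriterBuilder<B> {
    type W = FanoutPartitionWriter<B>;
    fn build(&self) -> Self::W {
        FanoutPartitionWriter { inner: self.inner.clone(), writers: BTreeMap::new() }
    }
}

impl<B: IcebergWriterBuilder> IcebergWriter for FanoutPartitionWriter<B> {
    fn write(&mut self, row: Row<'_>) {
        // Look up the writer for this partition, creating it on first use.
        let inner = &self.inner;
        self.writers
            .entry(row.partition.to_string())
            .or_insert_with(|| inner.build())
            .write(row);
    }
    fn flush(&mut self) -> Vec<String> {
        // Flush every partition's writer and tag its files with the partition.
        let mut files = Vec::new();
        for (partition, mut writer) in std::mem::take(&mut self.writers) {
            for file in writer.flush() {
                files.push(format!("{}: {}", partition, file));
            }
        }
        files
    }
}

/// Routes three rows into two partitions and returns the flushed files.
fn fanout_demo() -> Vec<String> {
    let builder = FanoutPartitionWriterBuilder { inner: ListWriterBuilder };
    let mut writer = builder.build();
    for (partition, value) in [("us", 1), ("eu", 2), ("us", 3)] {
        writer.write(Row { partition, value });
    }
    writer.flush()
}
```

The fanout layer only depends on the builder trait, so in this sketch it would compose equally well with a data file writer or a position delete writer.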
   
   ## More cases: see the example in icelake
   
https://github.com/icelake-io/icelake/blob/949dda79d2ebdfa7ad07e2f88a67d01d7040c181/icelake/tests/insert_tests_v2.rs#L164
   
   Suggestions are welcome, and I'm glad to make changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

