ZENOTME opened a new pull request, #135: URL: https://github.com/apache/iceberg-rust/pull/135
related issue: #34

I drafted a writer framework, which has been implemented in icelake (https://github.com/icelake-io/icelake/issues/243) and has proved to be extensible and flexible. The following introduces this API.

## Target

The writer API is designed to be extensible and flexible. Each writer is decoupled and can be created and configured independently. Users can:

1. Combine different writer builders to build a writer with complex write logic, such as FanoutPartition + DataFileWriter or FanoutPartition + PositionDeleteFileWriter.
2. Customize a writer and combine it with the original writer builders to build a writer that processes the data in a specific way.

## How it works

There are two kinds of writers, each with a related builder:

1. `IcebergWriter` and `IcebergWriterBuilder` focus on the data processing logic. To support new data processing logic, implement a new `IcebergWriter` and `IcebergWriterBuilder`.
2. `FileWriter` and `FileWriterBuilder` focus on the physical file write. To support a new physical file format, implement a new `FileWriter` and `FileWriterBuilder`.

The creation process of an iceberg writer is:

1. Create a `FileWriterBuilder`.
   1a. (Optional) Combine it with other `FileWriterBuilder`s to get a new `FileWriterBuilder`.
2. Use the `FileWriterBuilder` to create an `IcebergWriterBuilder`.
   2a. (Optional) Combine it with other `IcebergWriterBuilder`s to get a new `IcebergWriterBuilder`.
3. Call `build` on the `IcebergWriterBuilder` to create an `IcebergWriter`.

## Simple Case 1: Create a data file writer using the parquet file format

```rust
// 1. Create a parquet file writer builder.
let parquet_writer_builder = ParquetFileWriterBuilder::new(parquet_file_writer_config);
// 2. Create a data file writer builder that wraps the parquet file writer builder.
let data_file_writer_builder = DataFileWriterBuilder::new(parquet_writer_builder, data_file_writer_config);
// 3. Create an iceberg writer, write the input, and flush to get the write result.
let mut iceberg_writer = data_file_writer_builder.build(schema).await?;
iceberg_writer.write(input).await?;
let write_result = iceberg_writer.flush().await?;
```

## Complex Case 2: Create a fanout partition data file writer using the parquet file format

```rust
// 1. Create a parquet file writer builder.
let parquet_writer_builder = ParquetFileWriterBuilder::new(parquet_file_writer_config);
// 2. Create a data file writer builder that wraps the parquet file writer builder.
let data_file_writer_builder = DataFileWriterBuilder::new(parquet_writer_builder, data_file_writer_config);
// 3. Create a fanout partition writer builder that wraps the data file writer builder.
let fanout_partition_writer_builder = FanoutPartitionWriterBuilder::new(data_file_writer_builder, partition_config);
// 4. Create an iceberg writer, write the input, and flush to get the write result.
let mut iceberg_writer = fanout_partition_writer_builder.build(schema).await?;
iceberg_writer.write(input).await?;
let write_result = iceberg_writer.flush().await?;
```

## More cases

For more cases, see the example in icelake: https://github.com/icelake-io/icelake/blob/949dda79d2ebdfa7ad07e2f88a67d01d7040c181/icelake/tests/insert_tests_v2.rs#L164

Feel free to make any suggestions; I'm glad to modify the design.
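To make the two-layer design under *How it works* concrete, here is a minimal, synchronous sketch in plain Rust. It is not the proposed API: the traits are simplified, hypothetical stand-ins (no async, no schema or config, and a batch is just `Vec<String>`), and `MemFileWriterBuilder` is a hypothetical in-memory substitute for `ParquetFileWriterBuilder`. It mirrors Simple Case 1: a `FileWriterBuilder` is wrapped by a `DataFileWriterBuilder`, whose `build` produces an `IcebergWriter`.

```rust
// Hypothetical, synchronous sketch of the two-layer writer design (not the real API).
#[derive(Debug, PartialEq)]
struct DataFile {
    path: String,
    record_count: usize,
}

// Physical layer: writes rows into one file of some format.
trait FileWriter {
    fn write(&mut self, rows: &[String]);
    fn close(self) -> DataFile; // finish the file and return its metadata
}
trait FileWriterBuilder {
    type W: FileWriter;
    fn build(&self, path: String) -> Self::W;
}

// Logical layer: decides what is written and reports finished files.
trait IcebergWriter {
    fn write(&mut self, rows: Vec<String>);
    fn flush(&mut self) -> Vec<DataFile>;
}
trait IcebergWriterBuilder {
    type W: IcebergWriter;
    fn build(&self) -> Self::W;
}

// Hypothetical in-memory stand-in for a Parquet file writer.
struct MemFileWriter {
    path: String,
    rows: Vec<String>,
}
impl FileWriter for MemFileWriter {
    fn write(&mut self, rows: &[String]) {
        self.rows.extend_from_slice(rows);
    }
    fn close(self) -> DataFile {
        DataFile { path: self.path, record_count: self.rows.len() }
    }
}

#[derive(Clone)]
struct MemFileWriterBuilder;
impl FileWriterBuilder for MemFileWriterBuilder {
    type W = MemFileWriter;
    fn build(&self, path: String) -> MemFileWriter {
        MemFileWriter { path, rows: Vec::new() }
    }
}

// Data file writer: generic over whichever FileWriterBuilder it wraps.
struct DataFileWriterBuilder<B> {
    file_builder: B,
    prefix: String,
}
struct DataFileWriter<B: FileWriterBuilder> {
    file_builder: B,
    prefix: String,
    next_file: usize,
    current: Option<B::W>,
}

impl<B: FileWriterBuilder + Clone> IcebergWriterBuilder for DataFileWriterBuilder<B> {
    type W = DataFileWriter<B>;
    fn build(&self) -> DataFileWriter<B> {
        let (file_builder, prefix) = (self.file_builder.clone(), self.prefix.clone());
        DataFileWriter { file_builder, prefix, next_file: 0, current: None }
    }
}

impl<B: FileWriterBuilder> IcebergWriter for DataFileWriter<B> {
    fn write(&mut self, rows: Vec<String>) {
        if self.current.is_none() {
            // Open a new physical file on the first write after a flush.
            let path = format!("{}-{}", self.prefix, self.next_file);
            self.next_file += 1;
            self.current = Some(self.file_builder.build(path));
        }
        self.current.as_mut().unwrap().write(&rows);
    }
    fn flush(&mut self) -> Vec<DataFile> {
        // Close the open file, if any, and return its metadata.
        self.current.take().map(|w| w.close()).into_iter().collect()
    }
}

fn main() {
    let file_writer_builder = MemFileWriterBuilder; // 1. file-format layer
    let data_file_writer_builder = DataFileWriterBuilder {
        file_builder: file_writer_builder, // 2. wrap it in the data-processing layer
        prefix: "data".to_string(),
    };
    let mut writer = data_file_writer_builder.build(); // 3. build the writer
    writer.write(vec!["a".to_string(), "b".to_string()]);
    writer.write(vec!["c".to_string()]);
    println!("{:?}", writer.flush()); // [DataFile { path: "data-0", record_count: 3 }]
}
```

Because `DataFileWriterBuilder` is generic over any `FileWriterBuilder`, swapping the file format in this sketch only means passing a different builder in step 1; the data-processing layer stays untouched.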
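The fanout composition from Complex Case 2 can be sketched the same way, again as a hypothetical, synchronous simplification rather than the proposed API. Here `FanoutPartitionWriterBuilder` wraps any `IcebergWriterBuilder` and is itself an `IcebergWriterBuilder`; its writer lazily builds one inner writer per partition value and routes each row to it. `BufferWriterBuilder` is a hypothetical stand-in for a data file writer builder, and `partition_of` (here, the first letter of a row) is a hypothetical substitute for a real partition spec.

```rust
// Hypothetical, synchronous sketch of a fanout partition writer (not the real API).
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
struct DataFile {
    partition: Option<String>,
    record_count: usize,
}

trait IcebergWriter {
    fn write(&mut self, rows: Vec<String>);
    fn flush(&mut self) -> Vec<DataFile>;
}
trait IcebergWriterBuilder {
    type W: IcebergWriter;
    fn build(&self) -> Self::W;
}

// Stand-in for a data file writer: buffers rows and reports one file per flush.
struct BufferWriter {
    rows: Vec<String>,
}
impl IcebergWriter for BufferWriter {
    fn write(&mut self, rows: Vec<String>) {
        self.rows.extend(rows);
    }
    fn flush(&mut self) -> Vec<DataFile> {
        if self.rows.is_empty() {
            return Vec::new();
        }
        let record_count = self.rows.len();
        self.rows.clear();
        vec![DataFile { partition: None, record_count }]
    }
}
#[derive(Clone)]
struct BufferWriterBuilder;
impl IcebergWriterBuilder for BufferWriterBuilder {
    type W = BufferWriter;
    fn build(&self) -> BufferWriter {
        BufferWriter { rows: Vec::new() }
    }
}

// Fanout: wraps any IcebergWriterBuilder and is itself one, so it composes.
#[derive(Clone)]
struct FanoutPartitionWriterBuilder<B> {
    inner: B,
    partition_of: fn(&str) -> String, // hypothetical partition function
}
struct FanoutPartitionWriter<B: IcebergWriterBuilder> {
    inner: B,
    partition_of: fn(&str) -> String,
    writers: BTreeMap<String, B::W>, // one inner writer per partition value
}
impl<B: IcebergWriterBuilder + Clone> IcebergWriterBuilder for FanoutPartitionWriterBuilder<B> {
    type W = FanoutPartitionWriter<B>;
    fn build(&self) -> Self::W {
        let (inner, partition_of) = (self.inner.clone(), self.partition_of);
        FanoutPartitionWriter { inner, partition_of, writers: BTreeMap::new() }
    }
}
impl<B: IcebergWriterBuilder> IcebergWriter for FanoutPartitionWriter<B> {
    fn write(&mut self, rows: Vec<String>) {
        for row in rows {
            let key = (self.partition_of)(row.as_str());
            let inner = &self.inner;
            // Lazily build the inner writer the first time a partition value appears.
            self.writers.entry(key).or_insert_with(|| inner.build()).write(vec![row]);
        }
    }
    fn flush(&mut self) -> Vec<DataFile> {
        let mut files = Vec::new();
        for (key, writer) in self.writers.iter_mut() {
            for mut file in writer.flush() {
                file.partition = Some(key.clone()); // tag each file with its partition
                files.push(file);
            }
        }
        files
    }
}

fn first_letter(row: &str) -> String {
    row.chars().next().map(|c| c.to_string()).unwrap_or_default()
}

fn main() {
    let builder = FanoutPartitionWriterBuilder { inner: BufferWriterBuilder, partition_of: first_letter };
    let mut writer = builder.build();
    writer.write(vec!["apple".to_string(), "banana".to_string(), "avocado".to_string()]);
    for file in writer.flush() {
        println!("{:?}", file);
    }
}
```

Since the fanout builder in this sketch relies only on the `IcebergWriterBuilder` trait, the same wrapper would work unchanged over a different inner builder, which is the kind of combination (for example FanoutPartition + PositionDeleteFileWriter) listed under *Target*.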