ZENOTME commented on code in PR #275:
URL: https://github.com/apache/iceberg-rust/pull/275#discussion_r1560477918


##########
crates/iceberg/src/writer/mod.rs:
##########
@@ -15,13 +15,69 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! The iceberg writer module.
-
-use crate::spec::DataFileBuilder;
+//! Iceberg writer module.
+//!
+//! The writer API is designed to be extensible and flexible. Each writer is 
decoupled and can be create and config independently. User can:
+//! 1.Customize the writer using the writer trait.
+//! 2.Combine different writer to build a writer which have complex write 
logic.
+//!
+//! There are two kinds of writer:
+//! 1. FileWriter: Focus on writing record batch to different physical file 
format.(Such as parquet. orc)
+//! 2. IcebergWriter: Focus on the logical format of iceberg table. It will 
write the data using the FileWriter finally.
+//!
+//! # Simple example for data file writer:
+//! ```ignore
+//! // Create a parquet file writer builder. The parameter can get from table.
+//! let file_writer_builder = ParquetWriterBuilder::new(
+//!    0,
+//!    WriterProperties::builder().build(),
+//!    schema,
+//!    file_io.clone(),
+//!    loccation_gen,
+//!    file_name_gen,
+//! )
+//! // Create a data file writer using parquet file writer builder.
+//! let data_file_builder = DataFileBuilder::new(file_writer_builder);
+//! // Build the data file writer.
+//! let data_file_writer = data_file_builder.build().await.unwrap();
+//!
+//! data_file_writer.write(&record_batch).await.unwrap();
+//! let data_files = data_file_writer.flush().await.unwrap();
+//! ```
 
+pub mod base_writer;
 pub mod file_writer;
 
-type DefaultOutput = Vec<DataFileBuilder>;
+use crate::{spec::DataFile, Result};
+use arrow_array::RecordBatch;
+
+type DefaultInput = RecordBatch;
+type DefaultOutput = Vec<DataFile>;
+
+/// The builder for iceberg writer.
+#[async_trait::async_trait]
+pub trait IcebergWriterBuilder<I = DefaultInput, O = DefaultOutput>:
+    Send + Clone + 'static
+{
+    /// The associated writer type.
+    type R: IcebergWriter<I, O>;
+    /// The associated writer config type used to build the writer.
+    type C;
+    /// Build the iceberg writer.
+    async fn build(self, config: Self::C) -> Result<Self::R>;
+}
+
+/// The iceberg writer used to write data to iceberg table.
+#[async_trait::async_trait]
+pub trait IcebergWriter<I = DefaultInput, O = DefaultOutput>: Send + 'static {
+    /// Write data to iceberg table.
+    async fn write(&mut self, input: I) -> Result<()>;
+    /// Close the writer and return the written data files.
+    /// If close failed, the data written before maybe be lost. User may need 
to recreate the writer and rewrite the data again.
+    /// # NOTE
+    /// After close, no matter successfully or fail,the writer should never be 
used again, otherwise the writer will panic.
+    async fn close(&mut self) -> Result<O>;

Review Comment:
   I find that flush is hard to handle the error case. To make the semantics 
easier to maintain. I use `close(&mut)` for `IcebergWriter` instead. The user 
should not use the writer again after the writer closes it, no matter 
successful or fail. This behavior should be guaranteed at compile time so I use 
panic for it rather than return an error. 
   
   Why not use `close(self)?` to guarantee this by compilor.
   
   Because we need the writer can be used as a trait object, like `Box<dyn 
IcebergWriter>`. So we can't have a interface like `close(self)`.
   



##########
crates/iceberg/src/writer/mod.rs:
##########
@@ -15,13 +15,69 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! The iceberg writer module.
-
-use crate::spec::DataFileBuilder;
+//! Iceberg writer module.
+//!
+//! The writer API is designed to be extensible and flexible. Each writer is 
decoupled and can be create and config independently. User can:
+//! 1.Customize the writer using the writer trait.
+//! 2.Combine different writer to build a writer which have complex write 
logic.
+//!
+//! There are two kinds of writer:
+//! 1. FileWriter: Focus on writing record batch to different physical file 
format.(Such as parquet. orc)
+//! 2. IcebergWriter: Focus on the logical format of iceberg table. It will 
write the data using the FileWriter finally.
+//!
+//! # Simple example for data file writer:
+//! ```ignore
+//! // Create a parquet file writer builder. The parameter can get from table.
+//! let file_writer_builder = ParquetWriterBuilder::new(
+//!    0,
+//!    WriterProperties::builder().build(),
+//!    schema,
+//!    file_io.clone(),
+//!    loccation_gen,
+//!    file_name_gen,
+//! )
+//! // Create a data file writer using parquet file writer builder.
+//! let data_file_builder = DataFileBuilder::new(file_writer_builder);
+//! // Build the data file writer.
+//! let data_file_writer = data_file_builder.build().await.unwrap();
+//!
+//! data_file_writer.write(&record_batch).await.unwrap();
+//! let data_files = data_file_writer.flush().await.unwrap();
+//! ```
 
+pub mod base_writer;
 pub mod file_writer;
 
-type DefaultOutput = Vec<DataFileBuilder>;
+use crate::{spec::DataFile, Result};
+use arrow_array::RecordBatch;
+
+type DefaultInput = RecordBatch;
+type DefaultOutput = Vec<DataFile>;
+
+/// The builder for iceberg writer.
+#[async_trait::async_trait]
+pub trait IcebergWriterBuilder<I = DefaultInput, O = DefaultOutput>:
+    Send + Clone + 'static
+{
+    /// The associated writer type.
+    type R: IcebergWriter<I, O>;
+    /// The associated writer config type used to build the writer.
+    type C;
+    /// Build the iceberg writer.
+    async fn build(self, config: Self::C) -> Result<Self::R>;
+}
+
+/// The iceberg writer used to write data to iceberg table.
+#[async_trait::async_trait]
+pub trait IcebergWriter<I = DefaultInput, O = DefaultOutput>: Send + 'static {
+    /// Write data to iceberg table.
+    async fn write(&mut self, input: I) -> Result<()>;
+    /// Close the writer and return the written data files.
+    /// If close failed, the data written before maybe be lost. User may need 
to recreate the writer and rewrite the data again.
+    /// # NOTE
+    /// After close, no matter successfully or fail,the writer should never be 
used again, otherwise the writer will panic.
+    async fn close(&mut self) -> Result<O>;

Review Comment:
   I find that flush is hard to handle the error case. To make the semantics 
easier to maintain. I use `close(&mut)` for `IcebergWriter` instead. The user 
should not use the writer again after the writer closes it, no matter 
successful or fail. This behavior should be guaranteed at compile time so I use 
panic for it rather than return an error. 
   
   Why not use `close(self)?` to guarantee this by compiler?
   Because we need the writer can be used as a trait object, like `Box<dyn 
IcebergWriter>`. So we can't have a interface like `close(self)`.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to