liurenjie1024 commented on code in PR #738:
URL: https://github.com/apache/iceberg-rust/pull/738#discussion_r1891236836


##########
crates/iceberg/src/spec/manifest.rs:
##########
@@ -203,12 +206,95 @@ impl ManifestWriter {
         partition_summary
     }
 
-    /// Write a manifest.
-    pub async fn write(mut self, manifest: Manifest) -> Result<ManifestFile> {
+    /// Add a new manifest entry. This method will update following status of 
the entry:
+    /// - Update the entry status to `Added`
+    /// - Set the snapshot id to the current snapshot id
+    /// - Set the sequence number to `None` if it is invalid(smaller than 0)
+    /// - Set the file sequence number to `None`
+    pub(crate) fn add(&mut self, mut entry: ManifestEntry) -> Result<()> {
+        if entry.sequence_number().is_some_and(|n| n >= 0) {
+            entry.status = ManifestStatus::Added;
+            entry.snapshot_id = Some(self.snapshot_id);
+            entry.file_sequence_number = None;
+        } else {
+            entry.status = ManifestStatus::Added;
+            entry.snapshot_id = Some(self.snapshot_id);
+            entry.sequence_number = None;
+            entry.file_sequence_number = None;
+        };
+        self.add_entry(entry)?;
+        Ok(())
+    }
+
+    /// Add a delete manifest entry. This method will update following status 
of the entry:
+    /// - Update the entry status to `Deleted`
+    /// - Set the snapshot id to the current snapshot id
+    ///
+    /// # TODO
+    /// Remove this check after merge append.
+    #[allow(dead_code)]
+    pub(crate) fn delete(&mut self, mut entry: ManifestEntry) -> Result<()> {
+        entry.status = ManifestStatus::Deleted;
+        entry.snapshot_id = Some(self.snapshot_id);
+        self.add_entry(entry)?;
+        Ok(())
+    }
+
+    /// Add an existing manifest entry. This method will update following 
status of the entry:
+    /// - Update the entry status to `Existing`
+    ///
+    /// # TODO
+    /// Remove this check after merge append.
+    #[allow(dead_code)]
+    pub(crate) fn existing(&mut self, mut entry: ManifestEntry) -> Result<()> {
+        entry.status = ManifestStatus::Existing;
+        self.add_entry(entry)?;
+        Ok(())
+    }
+
+    fn add_entry(&mut self, entry: ManifestEntry) -> Result<()> {
+        // Check if the entry has sequence number
+        if (entry.status == ManifestStatus::Deleted || entry.status == 
ManifestStatus::Existing)
+            && (entry.sequence_number.is_none() || 
entry.file_sequence_number.is_none())
+        {
+            return Err(Error::new(
+                ErrorKind::DataInvalid,
+                "Manifest entry with status Existing or Deleted should have 
sequence number",
+            ));
+        }
+
+        // Update the statistics
+        match entry.status {
+            ManifestStatus::Added => {
+                self.added_files += 1;
+                self.added_rows += entry.data_file.record_count;
+            }
+            ManifestStatus::Deleted => {
+                self.deleted_files += 1;
+                self.deleted_rows += entry.data_file.record_count;
+            }
+            ManifestStatus::Existing => {
+                self.existing_files += 1;
+                self.existing_rows += entry.data_file.record_count;
+            }
+        }
+        if entry.is_alive() {
+            if let Some(seq_num) = entry.sequence_number {
+                self.min_seq_num = Some(self.min_seq_num.map_or(seq_num, |v| 
min(v, seq_num)));
+            }
+        }
+        self.update_field_summary(&entry);
+
+        self.manifset_entries.push(entry);
+        Ok(())
+    }
+
+    /// Write manifest file and return it.
+    pub async fn to_manifest_file(mut self, metadata: ManifestMetadata) -> 
Result<ManifestFile> {

Review Comment:
   I have concerns with this api, since it's error prone. According to 
iceberg's spec, each manifest file should contains one type of data file: data 
or deletes. It's quite possible that the user pass different kinds entries in 
previouse method, then the metadata is different. My suggestion is to follow 
java/python's approach:
   1. A factory method like 
   ```
   pub fn new_v1_writer(...) {}
   pub fn new_v2_writer(...) {}
   pub fn new_v2_delete_writer(...) {}
   ``` 
   2. We could use things like trait or enum to abstract out common parts of 
different writers.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to