liurenjie1024 commented on code in PR #615:
URL: https://github.com/apache/iceberg-rust/pull/615#discussion_r1774372777


##########
crates/iceberg/src/spec/schema.rs:
##########
@@ -943,6 +965,122 @@ impl SchemaVisitor for PruneColumn {
     }
 }
 
+struct ReassignFieldIds {
+    next_field_id: i32,
+    old_to_new_id: HashMap<i32, i32>,
+}
+
+// We are not using the visitor here, as post order traversal is not desired.
+// Instead we want to re-assign all fields on one level first before diving 
deeper.
+impl ReassignFieldIds {
+    fn new(start_from: i32) -> Self {
+        Self {
+            next_field_id: start_from,
+            old_to_new_id: HashMap::new(),
+        }
+    }
+
+    fn reassign_field_ids(&mut self, fields: Vec<NestedFieldRef>) -> 
Vec<NestedFieldRef> {
+        // Visit fields on the same level first
+        let outer_fields = fields
+            .into_iter()
+            .map(|field| {
+                self.old_to_new_id.insert(field.id, self.next_field_id);

Review Comment:
   We should check exisistence of `field.id` here since we don't trust user 
provide field ids, and it may duplicate.



##########
crates/iceberg/src/spec/schema.rs:
##########
@@ -105,7 +116,16 @@ impl SchemaBuilder {
     }
 
     /// Builds the schema.
-    pub fn build(self) -> Result<Schema> {
+    pub fn build(mut self) -> Result<Schema> {
+        if let Some(start_from) = self.reassign_field_ids_from {

Review Comment:
   I'm thinking that maybe we should put this part in the end of this method? 
The build process has some check, for example identifier field type, uniqueness 
of field name. Though current approach is correct, the error reporting maybe 
confusing. For example when some field doesn't pass identifier type check, 
current approach reports reassigned id, rather original id.



##########
crates/iceberg/src/spec/schema.rs:
##########
@@ -86,6 +87,16 @@ impl SchemaBuilder {
         self
     }
 
+    /// Reassign all field-ids (nested) on build.
+    /// If `start_from` is provided, it will start reassigning from that id 
(inclusive).
+    /// If not provided, it will start from 0.
+    ///
+    /// All specified aliases and identifier fields will be updated to the new 
field-ids.
+    pub fn with_reassigned_field_ids(mut self, start_from: Option<i32>) -> 
Self {
+        self.reassign_field_ids_from = Some(start_from.unwrap_or(0));

Review Comment:
   I also think in this case we shoud enforce user to pass an `u32` rather an 
`Option<i32>`, since if this method is called, we always reassign field ids.



##########
crates/iceberg/src/spec/schema.rs:
##########
@@ -2237,4 +2377,168 @@ table {
         let schema = table_schema_simple().0;
         assert_eq!(3, schema.highest_field_id());
     }
+
+    #[test]

Review Comment:
   Thanks for the tests!



##########
crates/iceberg/src/spec/schema.rs:
##########
@@ -86,6 +87,16 @@ impl SchemaBuilder {
         self
     }
 
+    /// Reassign all field-ids (nested) on build.
+    /// If `start_from` is provided, it will start reassigning from that id 
(inclusive).
+    /// If not provided, it will start from 0.
+    ///
+    /// All specified aliases and identifier fields will be updated to the new 
field-ids.
+    pub fn with_reassigned_field_ids(mut self, start_from: Option<i32>) -> 
Self {
+        self.reassign_field_ids_from = Some(start_from.unwrap_or(0));

Review Comment:
   Also, I think this method is currently only used by `TableMetadata` builder? 
Could we make this method `pub(crate)` for now to avoid exposing to user?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to