liurenjie1024 commented on code in PR #1928:
URL: https://github.com/apache/iceberg-rust/pull/1928#discussion_r2646407963


##########
crates/iceberg/src/spec/schema/mod.rs:
##########
@@ -127,6 +127,22 @@ impl SchemaBuilder {
 
     /// Builds the schema.
     pub fn build(self) -> Result<Schema> {
+        // If field IDs need to be reassigned, do it first before validation
+        if let Some(start_from) = self.reassign_field_ids_from {

Review Comment:
   What's the motivation of this change? The problem with this change is that 
it may make the error more difficult to read. For example, if user passed a non 
exists identifier id, originally the error message would be quite clear and 
easy to understand. But this change will confuse user.



##########
crates/iceberg/src/arrow/schema.rs:
##########
@@ -253,19 +266,54 @@ fn get_field_doc(field: &Field) -> Option<String> {
     None
 }
 
-struct ArrowSchemaConverter;
+struct ArrowSchemaConverter {
+    /// When set, the schema builder will reassign field IDs starting from 
this value
+    /// using level-order traversal (breadth-first).
+    reassign_field_ids_from: Option<i32>,
+    /// Generates unique placeholder IDs for fields before reassignment.
+    /// Required because `ReassignFieldIds` builds an old-to-new ID mapping
+    /// that expects unique input IDs.
+    temp_field_id_counter: i32,

Review Comment:
   ```suggestion
       next_field_id: i32,
   ```
   
   This is more clear.



##########
crates/iceberg/src/arrow/schema.rs:
##########
@@ -253,19 +266,54 @@ fn get_field_doc(field: &Field) -> Option<String> {
     None
 }
 
-struct ArrowSchemaConverter;
+struct ArrowSchemaConverter {
+    /// When set, the schema builder will reassign field IDs starting from 
this value
+    /// using level-order traversal (breadth-first).
+    reassign_field_ids_from: Option<i32>,
+    /// Generates unique placeholder IDs for fields before reassignment.
+    /// Required because `ReassignFieldIds` builds an old-to-new ID mapping
+    /// that expects unique input IDs.
+    temp_field_id_counter: i32,
+}
 
 impl ArrowSchemaConverter {
     fn new() -> Self {
-        Self {}
+        Self {
+            reassign_field_ids_from: None,
+            temp_field_id_counter: 0,
+        }
+    }
+
+    fn new_with_field_ids_from(start_from: i32) -> Self {
+        Self {
+            reassign_field_ids_from: Some(start_from),
+            temp_field_id_counter: 0,
+        }
+    }
+
+    fn get_field_id(&mut self, field: &Field) -> Result<i32> {
+        if self.reassign_field_ids_from.is_some() {
+            // Field IDs will be reassigned by the schema builder.
+            // We need unique temporary IDs because ReassignFieldIds builds an
+            // old->new ID mapping that requires unique input IDs.
+            let temp_id = self.temp_field_id_counter;
+            self.temp_field_id_counter += 1;
+            Ok(temp_id)
+        } else {
+            get_field_id_from_metadata(field)

Review Comment:
   This is a significant hehavior change, please add some doc to highlight it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to