Re: [PR] Core: Implement source-ids to deal with multi arguments transforms [iceberg]

via GitHub Fri, 25 Apr 2025 05:47:40 -0700


ajantha-bhat commented on code in PR #12897:
URL: https://github.com/apache/iceberg/pull/12897#discussion_r2060142013



##########
api/src/main/java/org/apache/iceberg/PartitionField.java:
##########
@@ -70,14 +100,14 @@ public boolean equals(Object other) {
     }
 
     PartitionField that = (PartitionField) other;
-    return sourceId == that.sourceId
+    return sourceIds.toString().equals(that.sourceIds.toString())

Review Comment:
   Just wondering why not `Objects.equals(sourceIds, that.sourceIds);` do we 
need the extra string conversion? 



##########
.palantir/revapi.yml:
##########
@@ -1177,6 +1177,10 @@ acceptedBreaks:
       old: "class org.apache.iceberg.Metrics"
       new: "class org.apache.iceberg.Metrics"
       justification: "Java serialization across versions is not guaranteed"
+    - code: "java.class.defaultSerializationChanged"

Review Comment:
   Doesn't this breaks backward compatibility? and this is not allowed for 
minor versions.
   
   We need to implement custom `writeObject` and `readObject`  to ensure we can 
serialize the field if it has only `sourceId`?
   
   ```
   private void readObject(ObjectInputStream in) throws IOException, 
ClassNotFoundException {
       in.defaultReadObject(); // read all default fields
   
       // Ensure backward compatibility
       if (sourceIds == null && sourceId != null) {
           sourceIds = Collections.singletonList(sourceId);
       }
   }
   
   private void writeObject(ObjectOutputStream out) throws IOException {
       out.defaultWriteObject(); // only write sourceIds
   }
   
   ```
   
   Note: I know we have handled the JSON serialization properly in 
`PartitionSpecParser`. Just wondering about default java serializer here.
   



##########
core/src/main/java/org/apache/iceberg/PartitionSpecParser.java:
##########
@@ -98,7 +101,16 @@ static void toJsonFields(UnboundPartitionSpec spec, 
JsonGenerator generator) thr
       generator.writeStartObject();
       generator.writeStringField(NAME, field.name());
       generator.writeStringField(TRANSFORM, field.transformAsString());
-      generator.writeNumberField(SOURCE_ID, field.sourceId());
+      if (field.sourceIds().size() == 1) {
+        generator.writeNumberField(SOURCE_ID, field.sourceId());
+      } else {
+        generator.writeFieldName(SOURCE_IDS);
+        generator.writeStartArray();
+        for (Integer value : field.sourceIds()) {
+          generator.writeNumber(value);
+        }
+        generator.writeEndArray();
+      }

Review Comment:
   nit: Iceberg follows the new line spacing for code blocks 
   https://iceberg.apache.org/contribute/?h=contribut#block-spacing



##########
api/src/main/java/org/apache/iceberg/PartitionSpec.java:
##########
@@ -601,17 +601,26 @@ public Builder alwaysNull(String sourceName) {
 
     // add a partition field with an auto-increment partition field id 
starting from
     // PARTITION_DATA_ID_START
+    Builder add(List<Integer> sourceIds, String name, Transform<?, ?> 
transform) {
+      return add(sourceIds, nextFieldId(), name, transform);
+    }
+
     Builder add(int sourceId, String name, Transform<?, ?> transform) {
-      return add(sourceId, nextFieldId(), name, transform);
+      return add(List.of(sourceId), name, transform);
     }
 
-    Builder add(int sourceId, int fieldId, String name, Transform<?, ?> 
transform) {
-      checkAndAddPartitionName(name, sourceId);
-      fields.add(new PartitionField(sourceId, fieldId, name, transform));
+    Builder add(List<Integer> sourceIds, int fieldId, String name, 
Transform<?, ?> transform) {
+      // we use the first entry in the source-ids list here
+      checkAndAddPartitionName(name, sourceIds.get(0));

Review Comment:
   How does this work for multi arg partition field? 
   Do we need a logic that accepts the `sourceIds` and resolve the name of 
multi arg transform? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Core: Implement source-ids to deal with multi arguments transforms [iceberg]

Reply via email to