Re: [PR] Core: Update variant class visibility [iceberg]

via GitHub Mon, 27 Jan 2025 14:45:07 -0800


rdblue commented on code in PR #12105:
URL: https://github.com/apache/iceberg/pull/12105#discussion_r1931255603



##########
core/src/main/java/org/apache/iceberg/variants/ShreddedObject.java:
##########
@@ -35,22 +39,55 @@
  * fields. This also does not allow updating or replacing the metadata for the 
unshredded object,
  * which could require recursively rewriting field IDs.
  */
-class ShreddedObject implements VariantObject {
-  private final SerializedMetadata metadata;
-  private final SerializedObject unshredded;
+public class ShreddedObject implements VariantObject {
+  private final VariantMetadata metadata;
+  private final VariantObject unshredded;
   private final Map<String, VariantValue> shreddedFields = Maps.newHashMap();
+  private final Set<String> removedFields = Sets.newHashSet();
   private SerializationState serializationState = null;
 
-  ShreddedObject(SerializedMetadata metadata) {
+  ShreddedObject(VariantMetadata metadata) {
     this.metadata = metadata;
     this.unshredded = null;
   }
 
-  ShreddedObject(SerializedObject unshredded) {
-    this.metadata = unshredded.metadata();
+  ShreddedObject(VariantMetadata metadata, VariantObject unshredded) {
+    this.metadata = metadata;
     this.unshredded = unshredded;
   }
 
+  @VisibleForTesting
+  VariantMetadata metadata() {
+    return metadata;
+  }
+
+  private Set<String> nameSet() {
+    Set<String> names = Sets.newHashSet(shreddedFields.keySet());
+
+    if (unshredded != null) {
+      Iterables.addAll(names, unshredded.fieldNames());
+    }
+
+    names.removeAll(removedFields);
+
+    return names;
+  }
+
+  @Override
+  public Iterable<String> fieldNames() {
+    return nameSet();
+  }
+
+  @Override
+  public int numFields() {
+    return nameSet().size();
+  }
+
+  public void remove(String field) {

Review Comment:
   The purpose of this class is to create objects from an unshredded, 
serialized variant in `value` and the fields in its corresponding 
`typed_value`. The serialized object is used to construct the `ShreddedObject` 
instance and then the shredded fields are set through `put`.
   
   This is intended to handle fields that are "missing" because the field's 
`value` and `typed_value` are null. In those cases, we need to basically add a 
`null` value to the `shreddedFields` map. We could do that, but the map 
implementations that we use (from Guava) don't allow `null` values. Even if we 
used a map that could handle `null`, we would have to handle those nulls in 
places like `nameSet` and in serialization. That way we correctly store that 
the field was missing according to the shredding spec, rather than defined and 
equal to a Variant null.
   
   I thought it was cleaner to handle missing fields by calling `remove` for 
the field name to show that it is not present in the shredded fields. I also 
think that using a separate set of field names makes the most sense for 
handling these instead of using `null` as a sentinel value in the 
`shreddedFields` map.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Core: Update variant class visibility [iceberg]

Reply via email to