rdblue commented on code in PR #12105: URL: https://github.com/apache/iceberg/pull/12105#discussion_r1931255603
########## core/src/main/java/org/apache/iceberg/variants/ShreddedObject.java: ########## @@ -35,22 +39,55 @@ * fields. This also does not allow updating or replacing the metadata for the unshredded object, * which could require recursively rewriting field IDs. */ -class ShreddedObject implements VariantObject { - private final SerializedMetadata metadata; - private final SerializedObject unshredded; +public class ShreddedObject implements VariantObject { + private final VariantMetadata metadata; + private final VariantObject unshredded; private final Map<String, VariantValue> shreddedFields = Maps.newHashMap(); + private final Set<String> removedFields = Sets.newHashSet(); private SerializationState serializationState = null; - ShreddedObject(SerializedMetadata metadata) { + ShreddedObject(VariantMetadata metadata) { this.metadata = metadata; this.unshredded = null; } - ShreddedObject(SerializedObject unshredded) { - this.metadata = unshredded.metadata(); + ShreddedObject(VariantMetadata metadata, VariantObject unshredded) { + this.metadata = metadata; this.unshredded = unshredded; } + @VisibleForTesting + VariantMetadata metadata() { + return metadata; + } + + private Set<String> nameSet() { + Set<String> names = Sets.newHashSet(shreddedFields.keySet()); + + if (unshredded != null) { + Iterables.addAll(names, unshredded.fieldNames()); + } + + names.removeAll(removedFields); + + return names; + } + + @Override + public Iterable<String> fieldNames() { + return nameSet(); + } + + @Override + public int numFields() { + return nameSet().size(); + } + + public void remove(String field) { Review Comment: The purpose of this class is to create objects from an unshredded, serialized variant in `value` and the fields in its corresponding `typed_value`. The serialized object is used to construct the `ShreddedObject` instance and then the shredded fields are set through `put`. This is intended to handle fields that are "missing" because the field's `value` and `typed_value` are null. In those cases, we need to basically add a `null` value to the `shreddedFields` map. We could do that, but the map implementations that we use (from Guava) don't allow `null` values. Even if we used a map that could handle `null`, we would have to handle those nulls in places like `nameSet` and in serialization. That way we correctly store that the field was missing according to the shredding spec, rather than defined and equal to a Variant null. I thought it was cleaner to handle missing fields by calling `remove` for the field name to show that it is not present in the shredded fields. I also think that using a separate set of field names makes the most sense for handling these instead of using `null` as a sentinel value in the `shreddedFields` map. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org