rdblue commented on code in PR #6525:
URL: https://github.com/apache/iceberg/pull/6525#discussion_r1066366701
##########
python/pyiceberg/avro/resolver.py:
##########
@@ -109,38 +109,46 @@ def resolve(
 class SchemaResolver(PrimitiveWithPartnerVisitor[IcebergType, Reader]):
-    read_types: Optional[Dict[int, Callable[[Schema], StructProtocol]]]
+    read_types: Dict[int, Type[StructProtocol]]
+    context: List[int]
-    def __init__(self, read_types: Optional[Dict[int, Callable[[Schema], StructProtocol]]]):
+    def __init__(self, read_types: Dict[int, Type[StructProtocol]] = EMPTY_DICT) -> None:
         self.read_types = read_types
+        self.context = []
     def schema(self, schema: Schema, expected_schema: Optional[IcebergType], result: Reader) -> Reader:
         return result
+    def before_field(self, field: NestedField, field_partner: Optional[NestedField]) -> None:
+        self.context.append(field.field_id)
+
+    def after_field(self, field: NestedField, field_partner: Optional[NestedField]) -> None:
+        self.context.pop()
+
     def struct(self, struct: StructType, expected_struct: Optional[IcebergType], field_readers: List[Reader]) -> Reader:
+        # -1 indicates the struct root
+        read_struct_id = self.context[-1] if len(self.context) > 0 else -1
+        struct_callable = self.read_types.get(read_struct_id, Record)
Review Comment:
I think the implementation is correct for the case you describe, and we do
want to support that. What we need to support for the metadata structures is
different, though, because three schemas are involved: the write schema, the
read schema, and the record schema. When the read schema and the record schema
don't match, we let the record handle it.

We don't want to get rid of the read schema because that would remove the
ability to project columns of the record schema. We can't get rid of the
record schema because the record implicitly has one. We just need to account
for cases where they differ.

It's also better to pass the read schema because it lets us make generic
records more friendly.
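
To make the read schema versus record schema distinction concrete, here is a
minimal sketch of a record that is handed the read schema and resolves the
mismatch itself. Everything in it is hypothetical and for illustration only:
ManifestEntryLike, RECORD_SCHEMA, the (field_id, name) tuples, and the set and
as_dict methods are assumptions, not the pyiceberg API or the code in this PR.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical stand-in for a schema field: (field_id, name).
Field = Tuple[int, str]


class ManifestEntryLike:
    """A record with an implicit schema: it always stores the same three fields."""

    # Record schema: the fields this class can hold, in slot order.
    RECORD_SCHEMA: List[Field] = [(0, "status"), (1, "snapshot_id"), (2, "data_file")]

    def __init__(self, read_schema: List[Field]) -> None:
        slot_by_id = {field_id: slot for slot, (field_id, _) in enumerate(self.RECORD_SCHEMA)}
        # Map each position in the read schema to a slot in this record.
        # None means the record has no slot for that field, so its value is dropped.
        self._slot_for_pos: List[Optional[int]] = [
            slot_by_id.get(field_id) for field_id, _ in read_schema
        ]
        self._values: List[Any] = [None] * len(self.RECORD_SCHEMA)

    def set(self, pos: int, value: Any) -> None:
        """Store a value given its position in the read schema."""
        slot = self._slot_for_pos[pos]
        if slot is not None:
            self._values[slot] = value

    def as_dict(self) -> Dict[str, Any]:
        return {name: self._values[slot] for slot, (_, name) in enumerate(self.RECORD_SCHEMA)}


# The read schema projects two of the record's columns in a different order,
# plus one field (id 99) that the record does not know about.
read_schema = [(1, "snapshot_id"), (99, "unknown"), (0, "status")]
entry = ManifestEntryLike(read_schema)
for pos, value in enumerate([1234, "ignored", 1]):
    entry.set(pos, value)

result = entry.as_dict()
```

In this sketch the reader only ever deals in read-schema positions. The record
keeps its own layout, leaves the unprojected data_file slot empty, and ignores
the field it has no slot for, which is one way to "let the record handle it".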
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]