rdblue commented on code in PR #40: URL: https://github.com/apache/iceberg-python/pull/40#discussion_r1349762539
##########
pyiceberg/avro/resolver.py:
##########
@@ -233,7 +255,107 @@ def skip(self, decoder: BinaryDecoder) -> None:
         pass
 
 
-class SchemaResolver(PrimitiveWithPartnerVisitor[IcebergType, Reader]):
+class WriteSchemaResolver(PrimitiveWithPartnerVisitor[IcebergType, Writer]):
+    def schema(self, schema: Schema, expected_schema: Optional[IcebergType], result: Writer) -> Writer:
+        return result
+
+    def struct(self, struct: StructType, provided_struct: Optional[IcebergType], field_writers: List[Writer]) -> Writer:
+        if not isinstance(provided_struct, StructType):
+            raise ResolveError(f"File/write schema are not aligned for struct, got {provided_struct}")
+
+        provided_struct_positions: Dict[int, int] = {field.field_id: pos for pos, field in enumerate(provided_struct.fields)}
+
+        results: List[Tuple[Optional[int], Writer]] = []
+        iter(field_writers)
+
+        for pos, write_field in enumerate(struct.fields):
+            if write_field.field_id in provided_struct_positions:
+                results.append((provided_struct_positions[write_field.field_id], field_writers[pos]))
+            else:
+                # There is a default value
+                if isinstance(write_field, NestedField) and write_field.write_default is not None:
+                    # The field is not in the record, but there is a write default value
+                    default_writer = DefaultWriter(
+                        writer=visit(write_field.field_type, CONSTRUCT_WRITER_VISITOR), value=write_field.write_default

Review Comment:
   I'm a little concerned about adding support for the write default. The problem is that we need to resolve the write default with child fields. For example, if you have a field `s` with type `struct<a int, b int>` and default `Struct(0, 0)`, there are changes that can make the default invalid, such as `ADD COLUMN s.c INT NOT NULL DEFAULT 0`. That would probably add a new `NestedField(name='c', field_type=IntType(), initial_default=0, write_default=0)`, but it doesn't necessarily update the write or initial default for `s`. Those defaults need to be resolved somehow.
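
A rough sketch of the concern above, not of how pyiceberg does or should handle it. The `Field` class and the `is_valid_default` and `resolve_default` helpers are hypothetical stand-ins (they are not pyiceberg's `NestedField` or any existing API), and struct defaults are modeled as plain dicts keyed by field name. It shows a struct default going stale after a required child is added, and one possible way to resolve it by filling in the children's own write defaults.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Field:
    # Hypothetical stand-in for a nested field; not pyiceberg's NestedField.
    name: str
    required: bool = True
    write_default: Optional[Any] = None
    children: Optional[List["Field"]] = None  # set only for struct fields


def is_valid_default(field: Field, value: Any) -> bool:
    """Return True if the default supplies a value for every required child."""
    if field.children is None:
        return value is not None or not field.required
    if not isinstance(value, dict):
        return False
    return all(is_valid_default(child, value.get(child.name)) for child in field.children)


def resolve_default(field: Field, value: Any) -> Any:
    """Fill children missing from a struct default using each child's write default."""
    if field.children is None:
        return field.write_default if value is None else value
    value = value or {}
    return {child.name: resolve_default(child, value.get(child.name)) for child in field.children}


# s struct<a int, b int> with default Struct(0, 0)
s = Field("s", write_default={"a": 0, "b": 0}, children=[Field("a"), Field("b")])
assert is_valid_default(s, s.write_default)

# ADD COLUMN s.c INT NOT NULL DEFAULT 0: the child gets a default,
# but the default stored on the parent `s` is left untouched.
s.children.append(Field("c", write_default=0))
stale = s.write_default
assert not is_valid_default(s, stale)  # the parent default no longer covers `c`

# One possible resolution: merge the child defaults into the parent default.
resolved = resolve_default(s, stale)
assert is_valid_default(s, resolved)
```

Whether this merge should happen when the schema is updated or when the writer is constructed is left open here; the sketch only illustrates that the parent default cannot be passed to a writer unchanged.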