Re: [PR] Construct a writer tree [iceberg-python]

via GitHub Tue, 10 Oct 2023 05:06:51 -0700


Fokko commented on code in PR #40:
URL: https://github.com/apache/iceberg-python/pull/40#discussion_r1352317115



##########
pyiceberg/avro/resolver.py:
##########
@@ -233,7 +256,93 @@ def skip(self, decoder: BinaryDecoder) -> None:
         pass
 
 
-class SchemaResolver(PrimitiveWithPartnerVisitor[IcebergType, Reader]):
+class WriteSchemaResolver(PrimitiveWithPartnerVisitor[IcebergType, Writer]):
+    def schema(self, write_schema: Schema, data_schema: Optional[IcebergType], result: Writer) -> Writer:
+        return result
+
+    def struct(self, write_schema: StructType, data_struct: Optional[IcebergType], field_writers: List[Writer]) -> Writer:
+        if not isinstance(data_struct, StructType):
+            raise ResolveError(f"File/write schema are not aligned for struct, got {data_struct}")
+
+        data_positions: Dict[int, int] = {field.field_id: pos for pos, field in enumerate(data_struct.fields)}
+        results: List[Tuple[Optional[int], Writer]] = []
+
+        for writer, write_field in zip(field_writers, write_schema.fields):
+            if write_field.field_id in data_positions:
+                results.append((data_positions[write_field.field_id], writer))
+            else:
+                # There is a default value
+                if write_field.write_default is not None:
+                    # The field is not in the record, but there is a write default value
+                    results.append((None, DefaultWriter(writer=writer, value=write_field.write_default)))  # type: ignore
+                elif write_field.required:
+                    raise ValueError(f"Field is required, and there is no write default: {write_field}")

Review Comment:
   I think this is correct. Let me illustrate it with an example:
   
   
![image](https://github.com/apache/iceberg-python/assets/1134248/df2e5350-dbdc-493c-b6f2-4e409464d339)
   
   Here are all three branches:
   
   - `if`: The field is in the `record_schema` and is part of the write schema. It produces `(0, IntegerWriter())` for `0: status`.
   - `elif`: The field is not in the `record_schema`, but has a write default (we use this to write `block_size_in_bytes`, since it is required):
   
![image](https://github.com/apache/iceberg-python/assets/1134248/0491fcce-43da-4ec5-b747-50aac3908f85)
   - `else`: The `else` is not there anymore. This case covers `sequence_number` and `file_sequence_number`, which are part of the `record_schema` but not of the `file_schema`, so we don't need to write any null bytes for them. The read case is different: there we would need a reader, because we have to skip over the data in the file. For the write case, we can simply ignore fields that are not part of the `file_schema`.
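To make the branches concrete, here is a minimal, self-contained sketch of the same resolution idea. It is not PyIceberg's API: `Field`, `Writer`, `resolve_struct` and `write_struct` are hypothetical, simplified stand-ins, `DefaultWriter` is only modeled on the `DefaultWriter(writer=..., value=...)` call in the diff, writers collect `(name, value)` pairs instead of encoding Avro bytes, and the schemas are flattened into one struct with illustrative field IDs and an illustrative 64 MiB default.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Field:
    """Hypothetical, simplified stand-in for an Iceberg schema field."""
    field_id: int
    name: str
    required: bool = False
    write_default: Any = None


class Writer:
    """Hypothetical writer: records (name, value) pairs instead of encoding Avro bytes."""

    def __init__(self, name: str) -> None:
        self.name = name

    def write(self, out: List[Tuple[str, Any]], value: Any) -> None:
        out.append((self.name, value))


class DefaultWriter(Writer):
    """Ignores the incoming value and always writes the field's write default."""

    def __init__(self, writer: Writer, value: Any) -> None:
        super().__init__(writer.name)
        self.writer = writer
        self.value = value

    def write(self, out: List[Tuple[str, Any]], value: Any) -> None:
        self.writer.write(out, self.value)


def resolve_struct(
    write_fields: List[Field], record_fields: List[Field]
) -> List[Tuple[Optional[int], Writer]]:
    # Field ID -> position of that field in the in-memory record
    record_positions: Dict[int, int] = {f.field_id: pos for pos, f in enumerate(record_fields)}
    results: List[Tuple[Optional[int], Writer]] = []
    # Only the write (file) schema is walked, so record-only fields never get a writer
    for field in write_fields:
        writer = Writer(field.name)
        if field.field_id in record_positions:
            # `if`: take the value from the record at this position
            results.append((record_positions[field.field_id], writer))
        elif field.write_default is not None:
            # `elif`: not in the record, so always write the default
            results.append((None, DefaultWriter(writer=writer, value=field.write_default)))
        elif field.required:
            raise ValueError(f"Field is required, and there is no write default: {field}")
        # An optional field with neither a record value nor a default falls through here;
        # the diff excerpt does not show how the PR handles it, so this sketch skips it.
    return results


def write_struct(
    resolved: List[Tuple[Optional[int], Writer]], record: Tuple[Any, ...]
) -> List[Tuple[str, Any]]:
    out: List[Tuple[str, Any]] = []
    for pos, writer in resolved:
        # A None position means the value does not come from the record
        writer.write(out, record[pos] if pos is not None else None)
    return out


# Illustrative, flattened schemas (field IDs and the 64 MiB default are examples)
write_schema = [
    Field(0, "status", required=True),
    Field(105, "block_size_in_bytes", required=True, write_default=64 * 1024 * 1024),
]
record_schema = [
    Field(0, "status", required=True),
    Field(3, "sequence_number"),
    Field(4, "file_sequence_number"),
]

resolved = resolve_struct(write_schema, record_schema)
print([(pos, type(w).__name__) for pos, w in resolved])  # [(0, 'Writer'), (None, 'DefaultWriter')]
print(write_struct(resolved, (1, 10, 11)))  # [('status', 1), ('block_size_in_bytes', 67108864)]
```

Because the loop walks only the write schema, `sequence_number` and `file_sequence_number` never receive a writer and their record values are simply not emitted, which is why the write path needs no `else` branch; a reader, by contrast, would still have to skip their bytes in the file.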
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
