joshuarobinson opened a new issue, #6434:
URL: https://github.com/apache/iceberg/issues/6434
### Feature Request / Improvement

Currently, pyiceberg 0.2.0 fails when creating a table scan for any table with UUID columns (at least for every such table I have). The root problem seems to be (thanks @Fokko for the explanation) that the Avro spec defines the `uuid` logical type on a `string`, whereas Iceberg encodes UUIDs in Avro as a 16-byte `fixed` with `logicalType: uuid`, and pyiceberg does not recognize that combination. The failure looks like this:

```
ValueError: Unknown logical/physical type combination: {'type': 'fixed', 'name': 'uuid_fixed', 'size': 16, 'logicalType': 'uuid'}
```

The full stack trace:

```
Traceback (most recent call last):
  File "/read.py", line 16, in <module>
    print(tbl.scan(selected_fields=("path","device")).to_arrow().to_pandas().head())
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/table/__init__.py", line 350, in to_arrow
    for task in self.plan_files():
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/table/__init__.py", line 335, in plan_files
    yield from (FileScanTask(file) for file in matching_partition_files)
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/table/__init__.py", line 335, in <genexpr>
    yield from (FileScanTask(file) for file in matching_partition_files)
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/manifest.py", line 149, in <genexpr>
    return (entry.data_file for entry in live_entries(input_file))
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/manifest.py", line 145, in <genexpr>
    return (entry for entry in read_manifest_entry(input_file) if entry.status != ManifestEntryStatus.DELETED)
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/manifest.py", line 137, in read_manifest_entry
    with AvroFile(input_file) as reader:
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/file.py", line 136, in __enter__
    self.schema = self.header.get_schema()
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/file.py", line 85, in get_schema
    return AvroSchemaConversion().avro_to_iceberg(avro_schema)
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 119, in avro_to_iceberg
    return Schema(*[self._convert_field(field) for field in avro_schema["fields"]], schema_id=1)
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 119, in <listcomp>
    return Schema(*[self._convert_field(field) for field in avro_schema["fields"]], schema_id=1)
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 227, in _convert_field
    field_type=self._convert_schema(plain_type),
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 196, in _convert_schema
    return self._convert_record_type(avro_type)
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 284, in _convert_record_type
    return StructType(*[self._convert_field(field) for field in record_type["fields"]])
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 284, in <listcomp>
    return StructType(*[self._convert_field(field) for field in record_type["fields"]])
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 227, in _convert_field
    field_type=self._convert_schema(plain_type),
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 196, in _convert_schema
    return self._convert_record_type(avro_type)
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 284, in _convert_record_type
    return StructType(*[self._convert_field(field) for field in record_type["fields"]])
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 284, in <listcomp>
    return StructType(*[self._convert_field(field) for field in record_type["fields"]])
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 227, in _convert_field
    field_type=self._convert_schema(plain_type),
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 189, in _convert_schema
    return self._convert_logical_type(avro_type)
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 368, in _convert_logical_type
    raise ValueError(f"Unknown logical/physical type combination: {avro_logical_type}")
ValueError: Unknown logical/physical type combination: {'type': 'fixed', 'name': 'uuid_fixed', 'size': 16, 'logicalType': 'uuid'}
```

The problem appears even if I'm not selecting the UUID columns as part of the scan. As the trace shows, the failure happens in `plan_files()` while the manifest file's Avro schema is being converted, before any column projection is applied.

### Query engine

Other
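For illustration, here is a minimal sketch of an Avro-to-Iceberg converter that resolves logical types by looking up the `(logicalType, type)` pair, assuming a table-driven design like the one the error message suggests. This is not pyiceberg's actual `_convert_logical_type`: `LOGICAL_TYPE_MAPPING` and `convert_logical_type` are hypothetical names, and the function returns plain type-name strings instead of pyiceberg type objects. With only the Avro-spec pairing `("uuid", "string")` registered, the manifest's `fixed(16)` UUID field hits the same kind of `ValueError` as above; registering `("uuid", "fixed")` with a size check lets it convert.

```python
from typing import Any, Dict, Tuple

# Hypothetical lookup table, for illustration only (not pyiceberg's code):
# maps (Avro logicalType, Avro physical type) to an Iceberg primitive type name.
LOGICAL_TYPE_MAPPING: Dict[Tuple[str, str], str] = {
    ("date", "int"): "date",
    ("time-micros", "long"): "time",
    ("uuid", "string"): "uuid",  # the Avro spec's uuid logical type annotates a string
    ("uuid", "fixed"): "uuid",   # Iceberg writes uuid as fixed(16) with logicalType uuid
}


def convert_logical_type(avro_type: Dict[str, Any]) -> str:
    """Convert an Avro schema node that carries a logicalType to an Iceberg type name."""
    logical_type = avro_type["logicalType"]
    physical_type = avro_type["type"]

    # A uuid backed by fixed must be exactly 16 bytes wide.
    if logical_type == "uuid" and physical_type == "fixed" and avro_type.get("size") != 16:
        raise ValueError(f"uuid must be backed by fixed(16), got: {avro_type}")

    key = (logical_type, physical_type)
    if key not in LOGICAL_TYPE_MAPPING:
        # Without the ("uuid", "fixed") entry, the schema from the issue ends up here.
        raise ValueError(f"Unknown logical/physical type combination: {avro_type}")
    return LOGICAL_TYPE_MAPPING[key]


# The Avro schema node from the error message above.
uuid_fixed = {"type": "fixed", "name": "uuid_fixed", "size": 16, "logicalType": "uuid"}
print(convert_logical_type(uuid_fixed))  # uuid
```

Because the whole manifest schema is converted when the file is opened, a mapping like this has to cover every logical type that can appear in any manifest field, not just the columns selected for the scan.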