joshuarobinson opened a new issue, #6434:
URL: https://github.com/apache/iceberg/issues/6434
### Feature Request / Improvement
Currently, pyiceberg 0.2.0 fails when creating a table scan for any table
with UUID columns (at least for every such table I have).
The root problem seems to be (thanks @Fokko for the explanation) that a UUID
is encoded as a string in Avro but as a 16-byte fixed-width type in Iceberg.
The failure looks like this:
```
ValueError: Unknown logical/physical type combination: {'type': 'fixed', 'name': 'uuid_fixed', 'size': 16, 'logicalType': 'uuid'}
```
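To make the mismatch concrete, here is a minimal sketch of a logical-type lookup that rejects the `fixed`/`uuid` pair unless it is explicitly allowed. This is not pyiceberg's actual code: `LOGICAL_TYPES`, `convert_logical_type`, and the `accept_fixed_uuid` flag are hypothetical names used only for illustration, and the returned strings stand in for real Iceberg type objects.

```python
from typing import Any, Dict

# Hypothetical lookup table (an assumption, not pyiceberg's implementation):
# maps (logicalType, physical Avro type) pairs to an Iceberg type name.
LOGICAL_TYPES = {
    ("date", "int"): "date",
    ("uuid", "string"): "uuid",  # UUID annotated on an Avro string
}


def convert_logical_type(avro_type: Dict[str, Any], accept_fixed_uuid: bool = False) -> str:
    """Return an Iceberg type name for an Avro logical type definition."""
    logical = avro_type["logicalType"]
    physical = avro_type["type"]

    # Optionally accept the 16-byte fixed encoding that shows up in the error.
    if accept_fixed_uuid and (logical, physical) == ("uuid", "fixed"):
        if avro_type.get("size") != 16:
            raise ValueError(f"UUID stored as fixed must have size 16: {avro_type}")
        return "uuid"

    try:
        return LOGICAL_TYPES[(logical, physical)]
    except KeyError:
        raise ValueError(f"Unknown logical/physical type combination: {avro_type}") from None


# The schema fragment from the error message.
uuid_fixed = {"type": "fixed", "name": "uuid_fixed", "size": 16, "logicalType": "uuid"}

try:
    convert_logical_type(uuid_fixed)
except ValueError as exc:
    print(exc)

print(convert_logical_type(uuid_fixed, accept_fixed_uuid=True))
```

With the flag off, the lookup only knows the string-based UUID and raises the same kind of error as above; with it on, the 16-byte fixed form is accepted as well.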
The full stacktrace:
```
Traceback (most recent call last):
  File "/read.py", line 16, in <module>
    print(tbl.scan(selected_fields=("path","device")).to_arrow().to_pandas().head())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/table/__init__.py", line 350, in to_arrow
    for task in self.plan_files():
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/table/__init__.py", line 335, in plan_files
    yield from (FileScanTask(file) for file in matching_partition_files)
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/table/__init__.py", line 335, in <genexpr>
    yield from (FileScanTask(file) for file in matching_partition_files)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/manifest.py", line 149, in <genexpr>
    return (entry.data_file for entry in live_entries(input_file))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/manifest.py", line 145, in <genexpr>
    return (entry for entry in read_manifest_entry(input_file) if entry.status != ManifestEntryStatus.DELETED)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/manifest.py", line 137, in read_manifest_entry
    with AvroFile(input_file) as reader:
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/file.py", line 136, in __enter__
    self.schema = self.header.get_schema()
                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/file.py", line 85, in get_schema
    return AvroSchemaConversion().avro_to_iceberg(avro_schema)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 119, in avro_to_iceberg
    return Schema(*[self._convert_field(field) for field in avro_schema["fields"]], schema_id=1)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 119, in <listcomp>
    return Schema(*[self._convert_field(field) for field in avro_schema["fields"]], schema_id=1)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 227, in _convert_field
    field_type=self._convert_schema(plain_type),
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 196, in _convert_schema
    return self._convert_record_type(avro_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 284, in _convert_record_type
    return StructType(*[self._convert_field(field) for field in record_type["fields"]])
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 284, in <listcomp>
    return StructType(*[self._convert_field(field) for field in record_type["fields"]])
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 227, in _convert_field
    field_type=self._convert_schema(plain_type),
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 196, in _convert_schema
    return self._convert_record_type(avro_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 284, in _convert_record_type
    return StructType(*[self._convert_field(field) for field in record_type["fields"]])
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 284, in <listcomp>
    return StructType(*[self._convert_field(field) for field in record_type["fields"]])
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 227, in _convert_field
    field_type=self._convert_schema(plain_type),
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 189, in _convert_schema
    return self._convert_logical_type(avro_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 368, in _convert_logical_type
    raise ValueError(f"Unknown logical/physical type combination: {avro_logical_type}")
ValueError: Unknown logical/physical type combination: {'type': 'fixed', 'name': 'uuid_fixed', 'size': 16, 'logicalType': 'uuid'}
```
The problem appears even if I'm not selecting the UUID columns as part of
the scan.
### Query engine
Other