joshuarobinson opened a new issue, #6434:
URL: https://github.com/apache/iceberg/issues/6434

   ### Feature Request / Improvement
   
   Currently, pyiceberg 0.2.0 fails when creating a table scan on any table 
with UUID columns (at least, on every such table that I have).
   
   The root problem seems to be (thanks @Fokko for the explanation) that the 
Avro spec defines UUID as a logical type on a string, but Iceberg writes it as 
a 16-byte fixed-width type.
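
To make the difference between the two encodings concrete, here is a small sketch using only Python's standard `uuid` module. The UUID value is arbitrary and the variable names are just for illustration; this is not pyiceberg code.

```python
import uuid

# An arbitrary example value, chosen only for illustration.
u = uuid.UUID("f79c3e09-677c-4bbd-a479-3f349cb785e7")

# Text form: what a 'uuid' logical type on an Avro 'string' would carry.
as_string = str(u)

# Binary form: 16 raw bytes, matching {'type': 'fixed', 'size': 16}.
as_fixed = u.bytes

print(len(as_string), len(as_fixed))

# Both forms describe the same UUID, so a reader can accept either one.
assert uuid.UUID(as_string) == uuid.UUID(bytes=as_fixed)
```

Since both forms round-trip to the same value, it seems reasonable for a reader to accept either physical type for the `uuid` logical type.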
   
   The failure looks like this:
   ```
   ValueError: Unknown logical/physical type combination: {'type': 'fixed', 'name': 'uuid_fixed', 'size': 16, 'logicalType': 'uuid'}
   ```
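
As a rough idea of the handling that seems to be missing, here is a minimal sketch of a converter that accepts both physical encodings. `convert_uuid_logical_type` is a hypothetical helper name, and the returned string `"uuid"` is a stand-in for whatever Iceberg type object the real converter returns; I have not checked how pyiceberg structures this internally.

```python
from typing import Any, Dict


def convert_uuid_logical_type(avro_type: Dict[str, Any]) -> str:
    """Hypothetical helper, not pyiceberg's actual implementation.

    Accepts the 'uuid' logical type on either an Avro string or a
    16-byte fixed, and rejects every other combination.
    """
    logical = avro_type.get("logicalType")
    physical = avro_type.get("type")

    if logical == "uuid":
        if physical == "string":
            return "uuid"
        # The combination reported in the error message above.
        if physical == "fixed" and avro_type.get("size") == 16:
            return "uuid"

    raise ValueError(f"Unknown logical/physical type combination: {avro_type}")


# The schema fragment from the error message now converts cleanly.
failing = {"type": "fixed", "name": "uuid_fixed", "size": 16, "logicalType": "uuid"}
print(convert_uuid_logical_type(failing))
```

A fixed type of any other size still raises, so the check stays strict about what counts as a UUID.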
   
   The full stacktrace:
   ```
   Traceback (most recent call last):
     File "/read.py", line 16, in <module>
       print(tbl.scan(selected_fields=("path","device")).to_arrow().to_pandas().head())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/table/__init__.py", line 350, in to_arrow
       for task in self.plan_files():
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/table/__init__.py", line 335, in plan_files
       yield from (FileScanTask(file) for file in matching_partition_files)
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/table/__init__.py", line 335, in <genexpr>
       yield from (FileScanTask(file) for file in matching_partition_files)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/manifest.py", line 149, in <genexpr>
       return (entry.data_file for entry in live_entries(input_file))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/manifest.py", line 145, in <genexpr>
       return (entry for entry in read_manifest_entry(input_file) if entry.status != ManifestEntryStatus.DELETED)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/manifest.py", line 137, in read_manifest_entry
       with AvroFile(input_file) as reader:
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/file.py", line 136, in __enter__
       self.schema = self.header.get_schema()
                     ^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/avro/file.py", line 85, in get_schema
       return AvroSchemaConversion().avro_to_iceberg(avro_schema)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 119, in avro_to_iceberg
       return Schema(*[self._convert_field(field) for field in avro_schema["fields"]], schema_id=1)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 119, in <listcomp>
       return Schema(*[self._convert_field(field) for field in avro_schema["fields"]], schema_id=1)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 227, in _convert_field
       field_type=self._convert_schema(plain_type),
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 196, in _convert_schema
       return self._convert_record_type(avro_type)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 284, in _convert_record_type
       return StructType(*[self._convert_field(field) for field in record_type["fields"]])
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 284, in <listcomp>
       return StructType(*[self._convert_field(field) for field in record_type["fields"]])
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 227, in _convert_field
       field_type=self._convert_schema(plain_type),
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 196, in _convert_schema
       return self._convert_record_type(avro_type)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 284, in _convert_record_type
       return StructType(*[self._convert_field(field) for field in record_type["fields"]])
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 284, in <listcomp>
       return StructType(*[self._convert_field(field) for field in record_type["fields"]])
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 227, in _convert_field
       field_type=self._convert_schema(plain_type),
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 189, in _convert_schema
       return self._convert_logical_type(avro_type)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/pyiceberg/utils/schema_conversion.py", line 368, in _convert_logical_type
       raise ValueError(f"Unknown logical/physical type combination: {avro_logical_type}")
   ValueError: Unknown logical/physical type combination: {'type': 'fixed', 'name': 'uuid_fixed', 'size': 16, 'logicalType': 'uuid'}
   ```
   
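   The problem appears even if I'm not selecting the UUID columns as part of 
the scan. That seems consistent with the stack trace: the error is raised 
while the manifest file's Avro schema is being converted in 
read_manifest_entry, during file planning.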
   
   ### Query engine
   
   Other

