xsfa opened a new issue, #1477:
URL: https://github.com/apache/iceberg-python/issues/1477

   ### Apache Iceberg version
   
   0.8.1 (latest release)
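The traceback ends inside PyArrow rather than PyIceberg, so the installed PyArrow version is worth reporting next to the PyIceberg version. Below is a minimal sketch that prints both using only the standard library. The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist: str) -> str:
    """Return the installed version of a distribution, or a placeholder if it is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return "not installed"


# The traceback involves both packages, so report both versions.
for dist in ("pyiceberg", "pyarrow"):
    print(f"{dist}: {installed_version(dist)}")
```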
   
   ### Please describe the bug 🐞
   
   I think PyArrow is receiving malformed data from the file metadata, which 
   prevents me from calling any of the `inspect` file functions, such as 
   `inspect.files`. Could this be caused by my Iceberg table's format, or is it a 
   genuine bug? I have confirmed that the table is a valid, readable Iceberg V2 
   table.
   
   Code:
   
   ```python
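   from pyiceberg.catalog import load_catalog

   catalog = load_catalog("default")  # hypothetical catalog name; use your configured catalog
   test_table = catalog.load_table("test.table")
   current_snapshot_id = test_table.metadata.current_snapshot_id
   test_table.inspect.files(current_snapshot_id)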
   ```
   
   
   Full Stack Trace:
   ```python
   ---------------------------------------------------------------------------
   ArrowTypeError                            Traceback (most recent call last)
   Input In [32], in <cell line: 17>()
        14 current_snapshot_id = test_table.metadata.current_snapshot_id
        15 print(current_snapshot_id)
   ---> 17 test_table.inspect.files(current_snapshot_id)
   
   File ~/opt/miniconda3/lib/python3.9/site-packages/pyiceberg/table/inspect.py:582, in InspectTable.files(self, snapshot_id)
       581 def files(self, snapshot_id: Optional[int] = None) -> "pa.Table":
   --> 582     return self._files(snapshot_id)
   
   File ~/opt/miniconda3/lib/python3.9/site-packages/pyiceberg/table/inspect.py:576, in InspectTable._files(self, snapshot_id, data_file_filter)
       541         readable_metrics = {
       542             schema.find_column_name(field.field_id): {
       543                 "column_size": column_sizes.get(field.field_id),
      (...)
       554             for field in self.tbl.metadata.schema().fields
       555         }
       556         files.append({
       557             "content": data_file.content,
       558             "file_path": data_file.file_path,
      (...)
       573             "readable_metrics": readable_metrics,
       574         })
   --> 576 return pa.Table.from_pylist(
       577     files,
       578     schema=files_schema,
       579 )
   
   File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi:3700, in pyarrow.lib.Table.from_pylist()
   
   File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi:5228, in pyarrow.lib._from_pylist()
   
   File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi:3575, in pyarrow.lib.Table.from_arrays()
   
   File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi:1398, in pyarrow.lib._sanitize_arrays()
   
   File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi:350, in pyarrow.lib.asarray()
   
   File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi:320, in pyarrow.lib.array()
   
   File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi:39, in pyarrow.lib._sequence_to_array()
   
   File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()
   
   File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/error.pxi:123, in pyarrow.lib.check_status()
   
   ArrowTypeError: Could not convert {1: 145, 2: 545, 3: 132, 4: 91, 5: 92, 6: 80, 7: 42, 8: 118, 9: 146, 10: 108, 11: 188, 12: 112, 13: 169, 14: 42, 15: 166, 16: 1248, 17: 57, 18: 38, 19: 81, 20: 120, 21: 42, 22: 129, 23: 90, 24: 38, 25: 38, 26: 80, 27: 544, 28: 112, 29: 79, 30: 131, 31: 71, 32: 70, 33: 70} with type dict: was not a sequence or recognized null for conversion to list type
   ```
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [X] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

