xsfa opened a new issue, #1477:
URL: https://github.com/apache/iceberg-python/issues/1477
### Apache Iceberg version
0.8.1 (latest release)
### Please describe the bug 🐞
I think PyArrow is receiving malformed data from the file metadata, which
prevents me from calling any of the file inspection functions. Could this be
caused by my Iceberg table format, or is it a genuine bug? I have confirmed that
my table is a valid Iceberg V2 table and is readable.
Code:
```python
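from pyiceberg.catalog import load_catalog
# The catalog name "default" is a placeholder; the report does not show how `catalog` was created.
catalog = load_catalog("default")
test_table = catalog.load_table("test.table")
current_snapshot_id = test_table.metadata.current_snapshot_id
test_table.inspect.files(current_snapshot_id)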
```
Full Stack Trace:
```python
---------------------------------------------------------------------------
ArrowTypeError                            Traceback (most recent call last)
Input In [32], in <cell line: 17>()
     14 current_snapshot_id = test_table.metadata.current_snapshot_id
     15 print(current_snapshot_id)
---> 17 test_table.inspect.files(current_snapshot_id)

File ~/opt/miniconda3/lib/python3.9/site-packages/pyiceberg/table/inspect.py:582, in InspectTable.files(self, snapshot_id)
    581 def files(self, snapshot_id: Optional[int] = None) -> "pa.Table":
--> 582     return self._files(snapshot_id)

File ~/opt/miniconda3/lib/python3.9/site-packages/pyiceberg/table/inspect.py:576, in InspectTable._files(self, snapshot_id, data_file_filter)
    541 readable_metrics = {
    542     schema.find_column_name(field.field_id): {
    543         "column_size": column_sizes.get(field.field_id),
   (...)
    554     for field in self.tbl.metadata.schema().fields
    555 }
    556 files.append({
    557     "content": data_file.content,
    558     "file_path": data_file.file_path,
   (...)
    573     "readable_metrics": readable_metrics,
    574 })
--> 576 return pa.Table.from_pylist(
    577     files,
    578     schema=files_schema,
    579 )

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi:3700, in pyarrow.lib.Table.from_pylist()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi:5228, in pyarrow.lib._from_pylist()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi:3575, in pyarrow.lib.Table.from_arrays()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi:1398, in pyarrow.lib._sanitize_arrays()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi:350, in pyarrow.lib.asarray()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi:320, in pyarrow.lib.array()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi:39, in pyarrow.lib._sequence_to_array()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/error.pxi:123, in pyarrow.lib.check_status()

ArrowTypeError: Could not convert {1: 145, 2: 545, 3: 132, 4: 91, 5: 92, 6: 80, 7: 42, 8: 118, 9: 146, 10: 108, 11: 188, 12: 112, 13: 169, 14: 42, 15: 166, 16: 1248, 17: 57, 18: 38, 19: 81, 20: 120, 21: 42, 22: 129, 23: 90, 24: 38, 25: 38, 26: 80, 27: 544, 28: 112, 29: 79, 30: 131, 31: 71, 32: 70, 33: 70} with type dict: was not a sequence or recognized null for conversion to list type
```
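The error message says a Python `dict` keyed by field ID reached a converter that expected a sequence. One possible workaround, sketched below, is to turn such dicts into lists of `(key, value)` pairs before the rows are handed to `pa.Table.from_pylist`. This is only an illustration of that idea, not the PyIceberg implementation or a confirmed fix: the helper `dicts_to_pairs`, the column name `column_sizes` as used here, and the file path are hypothetical, and whether a given PyArrow version accepts a plain dict for this column is something to check against the installed version.

```python
from typing import Any, Dict, Iterable, List


def dicts_to_pairs(
    rows: List[Dict[str, Any]], map_columns: Iterable[str]
) -> List[Dict[str, Any]]:
    """Return copies of `rows` in which dict values in `map_columns`
    are replaced by lists of (key, value) tuples.

    Hypothetical helper for illustration; not part of PyIceberg or PyArrow.
    """
    columns = set(map_columns)
    converted = []
    for row in rows:
        new_row = dict(row)  # shallow copy so the input rows are left unchanged
        for name in columns:
            value = new_row.get(name)
            if isinstance(value, dict):
                new_row[name] = list(value.items())
        converted.append(new_row)
    return converted


# Sample row: the first three entries come from the error message above;
# the file path is made up for the example.
rows = [
    {
        "file_path": "s3://bucket/data/a.parquet",
        "column_sizes": {1: 145, 2: 545, 3: 132},
    }
]

normalized = dicts_to_pairs(rows, ["column_sizes"])
print(normalized[0]["column_sizes"])  # [(1, 145), (2, 545), (3, 132)]
```

Values that are `None` or already sequences pass through untouched, so the helper can be applied to every row regardless of whether the metric is present.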
### Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [X] I would be willing to contribute a fix for this bug with guidance from
the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]