xsfa opened a new issue, #1477: URL: https://github.com/apache/iceberg-python/issues/1477
### Apache Iceberg version

0.8.1 (latest release)

### Please describe the bug 🐞

I think PyArrow is receiving misformatted data from the file metadata, causing me to be unable to call any of the file functions. Could this be caused by my Iceberg table format or is it a genuine bug? I have confirmed that my table is a valid Iceberg V2 table and readable.

Code:

```python
test_table = catalog.load_table("test.table")
current_snapshot_id = test_table.metadata.current_snapshot_id
test_table.inspect.files(current_snapshot_id)
```

Full Stack Trace:

```python
---------------------------------------------------------------------------
ArrowTypeError                            Traceback (most recent call last)
Input In [32], in <cell line: 17>()
     14 current_snapshot_id = test_table.metadata.current_snapshot_id
     15 print(current_snapshot_id)
---> 17 test_table.inspect.files(current_snapshot_id)

File ~/opt/miniconda3/lib/python3.9/site-packages/pyiceberg/table/inspect.py:582, in InspectTable.files(self, snapshot_id)
    581 def files(self, snapshot_id: Optional[int] = None) -> "pa.Table":
--> 582     return self._files(snapshot_id)

File ~/opt/miniconda3/lib/python3.9/site-packages/pyiceberg/table/inspect.py:576, in InspectTable._files(self, snapshot_id, data_file_filter)
    541 readable_metrics = {
    542     schema.find_column_name(field.field_id): {
    543         "column_size": column_sizes.get(field.field_id),
   (...)
    554     for field in self.tbl.metadata.schema().fields
    555 }
    556 files.append({
    557     "content": data_file.content,
    558     "file_path": data_file.file_path,
   (...)
    573     "readable_metrics": readable_metrics,
    574 })
--> 576 return pa.Table.from_pylist(
    577     files,
    578     schema=files_schema,
    579 )

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi:3700, in pyarrow.lib.Table.from_pylist()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi:5228, in pyarrow.lib._from_pylist()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi:3575, in pyarrow.lib.Table.from_arrays()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi:1398, in pyarrow.lib._sanitize_arrays()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi:350, in pyarrow.lib.asarray()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi:320, in pyarrow.lib.array()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi:39, in pyarrow.lib._sequence_to_array()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()

File ~/opt/miniconda3/lib/python3.9/site-packages/pyarrow/error.pxi:123, in pyarrow.lib.check_status()

ArrowTypeError: Could not convert {1: 145, 2: 545, 3: 132, 4: 91, 5: 92, 6: 80, 7: 42, 8: 118, 9: 146, 10: 108, 11: 188, 12: 112, 13: 169, 14: 42, 15: 166, 16: 1248, 17: 57, 18: 38, 19: 81, 20: 120, 21: 42, 22: 129, 23: 90, 24: 38, 25: 38, 26: 80, 27: 544, 28: 112, 29: 79, 30: 131, 31: 71, 32: 70, 33: 70} with type dict: was not a sequence or recognized null for conversion to list type
```

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [X] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time