agkphysics opened a new issue, #46328:
URL: https://github.com/apache/arrow/issues/46328
### Describe the bug, including details regarding any error messages, version, and platform.
This follows on from a comment I left on #39914. Reading with `dtype_backend="pyarrow"` fails if `columns=` is specified and the file contains a list-type column that is not included in `columns=`:
```python
import pandas as pd
import pyarrow as pa
a = pd.Series(pa.array([[1, 2, 3]]),
              dtype=pd.ArrowDtype(pa.list_(pa.int64())))
b = pd.Series(pa.array([1]), dtype=pd.ArrowDtype(pa.int64()))
df = pd.DataFrame({"a": a, "b": b})
df.to_parquet("test.parquet", index=False)

pd.read_parquet("test.parquet", dtype_backend="pyarrow")  # Works
pd.read_parquet("test.parquet", dtype_backend="pyarrow", columns=["a"])  # Works
pd.read_parquet("test.parquet", dtype_backend="pyarrow", columns=["b"])  # Fails
```
The last call results in:
```
Traceback (most recent call last):
  File "/.../test.py", line 10, in <module>
    pd.read_parquet("test.parquet", dtype_backend="pyarrow", columns=["b"])  # Fails
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/pandas/io/parquet.py", line 667, in read_parquet
    return impl.read(
           ^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/pandas/io/parquet.py", line 281, in read
    result = pa_table.to_pandas(**to_pandas_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/array.pxi", line 889, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 5132, in pyarrow.lib.Table._to_pandas
  File "/.../.venv/lib/python3.12/site-packages/pyarrow/pandas_compat.py", line 796, in table_to_dataframe
    ext_columns_dtypes = _get_extension_dtypes(
                         ^^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/pyarrow/pandas_compat.py", line 899, in _get_extension_dtypes
    pandas_dtype = _pandas_api.pandas_dtype(dtype)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/pandas-shim.pxi", line 150, in pyarrow.lib._PandasAPIShim.pandas_dtype
  File "pyarrow/pandas-shim.pxi", line 153, in pyarrow.lib._PandasAPIShim.pandas_dtype
  File "/.../.venv/lib/python3.12/site-packages/pandas/core/dtypes/common.py", line 1645, in pandas_dtype
    npdtype = np.dtype(dtype)
              ^^^^^^^^^^^^^^^
TypeError: data type 'list<item: int64>[pyarrow]' not understood
```
### Component(s)
Python, Parquet