agkphysics opened a new issue, #46328: URL: https://github.com/apache/arrow/issues/46328
### Describe the bug, including details regarding any error messages, version, and platform.

This follows on from a comment I left on #39914. Loading fails when `columns=` is specified and the file contains a list-type column that is not included in `columns=`:

```python
import pandas as pd
import pyarrow as pa

a = pd.Series(pa.array([[1, 2, 3]]), dtype=pd.ArrowDtype(pa.list_(pa.int64())))
b = pd.Series(pa.array([1]), dtype=pd.ArrowDtype(pa.int64()))
df = pd.DataFrame({"a": a, "b": b})
df.to_parquet("test.parquet", index=False)
pd.read_parquet("test.parquet", dtype_backend="pyarrow")  # Works
pd.read_parquet("test.parquet", dtype_backend="pyarrow", columns=["a"])  # Works
pd.read_parquet("test.parquet", dtype_backend="pyarrow", columns=["b"])  # Fails
```

Results in

```
Traceback (most recent call last):
  File "/.../test.py", line 10, in <module>
    pd.read_parquet("test.parquet", dtype_backend="pyarrow", columns=["b"])  # Fails
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/pandas/io/parquet.py", line 667, in read_parquet
    return impl.read(
           ^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/pandas/io/parquet.py", line 281, in read
    result = pa_table.to_pandas(**to_pandas_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/array.pxi", line 889, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 5132, in pyarrow.lib.Table._to_pandas
  File "/.../.venv/lib/python3.12/site-packages/pyarrow/pandas_compat.py", line 796, in table_to_dataframe
    ext_columns_dtypes = _get_extension_dtypes(
                         ^^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/pyarrow/pandas_compat.py", line 899, in _get_extension_dtypes
    pandas_dtype = _pandas_api.pandas_dtype(dtype)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/pandas-shim.pxi", line 150, in pyarrow.lib._PandasAPIShim.pandas_dtype
  File "pyarrow/pandas-shim.pxi", line 153, in pyarrow.lib._PandasAPIShim.pandas_dtype
  File "/.../.venv/lib/python3.12/site-packages/pandas/core/dtypes/common.py", line 1645, in pandas_dtype
    npdtype = np.dtype(dtype)
              ^^^^^^^^^^^^^^^
TypeError: data type 'list<item: int64>[pyarrow]' not understood
```

### Component(s)

Python, Parquet