AlenkaF opened a new issue, #34165: URL: https://github.com/apache/arrow/issues/34165
### Describe the bug, including details regarding any error messages, version, and platform.

When working on the extension type for tensors in PyArrow I came across a behaviour of the conversion to pandas that could be improved.

Creating an extension array (a fixed shape tensor in this case) and converting it to pandas works well:

```python
>>> import pyarrow as pa
>>> # tensor_type is the fixed shape tensor extension type (definition not shown)
>>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
>>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
>>> tensor = pa.ExtensionArray.from_storage(tensor_type, storage)
>>> tensor.to_pandas()
0            [1, 2, 3, 4]
1        [10, 20, 30, 40]
2    [100, 200, 300, 400]
dtype: object
```

But creating a table with an extension array and then converting it to pandas fails:

```python
>>> data = [
...     pa.array([1, 2, 3]),
...     pa.array(['foo', 'bar', None]),
...     pa.array([True, None, True]),
...     tensor
... ]
>>> my_schema = pa.schema([('f0', pa.int8()),
...                        ('f1', pa.string()),
...                        ('f2', pa.bool_()),
...                        ('tensors_int', tensor_type)])
>>> table = pa.Table.from_arrays(data, schema=my_schema)
>>> table.to_pandas()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/array.pxi", line 830, in pyarrow.lib._PandasConvertible.to_pandas
    return self._to_pandas(options, categories=categories,
  File "pyarrow/table.pxi", line 4004, in pyarrow.lib.Table._to_pandas
    mgr = table_to_blockmanager(
  File "/Users/alenkafrim/repos/arrow/python/pyarrow/pandas_compat.py", line 820, in table_to_blockmanager
    blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
  File "/Users/alenkafrim/repos/arrow/python/pyarrow/pandas_compat.py", line 1171, in _table_to_blocks
    return [_reconstruct_block(item, columns, extension_columns)
  File "/Users/alenkafrim/repos/arrow/python/pyarrow/pandas_compat.py", line 1171, in <listcomp>
    return [_reconstruct_block(item, columns, extension_columns)
  File "/Users/alenkafrim/repos/arrow/python/pyarrow/pandas_compat.py", line 776, in _reconstruct_block
    pandas_dtype = extension_columns[name]
KeyError: 'tensors_int'
```

The failure occurs because the extension type in this example does not implement the `to_pandas_dtype` method. As a result, `_get_extension_dtypes` does not add the name of the extension-typed column to `ext_columns`:

https://github.com/apache/arrow/blob/0368e410be4dac30eada13d307b415165aedc6a7/python/pyarrow/pandas_compat.py#L870-L879

It would be good if, when `to_pandas_dtype` is not implemented, the lookup here fell back to converting the storage array:

https://github.com/apache/arrow/blob/0368e410be4dac30eada13d307b415165aedc6a7/python/pyarrow/pandas_compat.py#L776

similar to:

https://github.com/apache/arrow/blob/925cbd81427ae02ce897c406a264d53c8813b920/python/pyarrow/array.pxi#L2888-L2889

### Component(s)

Python
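
To make the requested behaviour concrete, here is a minimal pure-Python sketch of the lookup and of a possible fallback. It is a toy model, not pyarrow's code: the classes `StorageOnlyType` and `PandasAwareType` and the functions `get_extension_dtypes`, `reconstruct_strict`, and `reconstruct_with_fallback` are hypothetical stand-ins, and the sketch assumes that a type without a pandas dtype signals this by raising `NotImplementedError`.

```python
# Toy model of the lookup described in this issue. Every class and function
# here is a hypothetical stand-in, not pyarrow's actual implementation.


class StorageOnlyType:
    """Extension type without a pandas dtype (to_pandas_dtype not implemented)."""

    def to_pandas_dtype(self):
        raise NotImplementedError


class PandasAwareType(StorageOnlyType):
    """Extension type that does provide a pandas dtype."""

    def to_pandas_dtype(self):
        return "my_pandas_dtype"


def get_extension_dtypes(schema):
    """Collect a pandas dtype for each extension column that offers one."""
    ext_columns = {}
    for name, typ in schema.items():
        try:
            ext_columns[name] = typ.to_pandas_dtype()
        except NotImplementedError:
            pass  # the column is silently left out of the mapping
    return ext_columns


def reconstruct_strict(name, ext_columns):
    """Models the failing lookup: raises for a column missing from the mapping."""
    return ("extension", ext_columns[name])


def reconstruct_with_fallback(name, ext_columns):
    """Models the proposal: use the storage values when no dtype is registered."""
    if name in ext_columns:
        return ("extension", ext_columns[name])
    return ("storage", None)


schema = {"tensors_int": StorageOnlyType(), "other_ext": PandasAwareType()}
ext_columns = get_extension_dtypes(schema)

try:
    reconstruct_strict("tensors_int", ext_columns)
    strict_outcome = "ok"
except KeyError as exc:
    strict_outcome = f"KeyError: {exc}"

fallback_outcome = reconstruct_with_fallback("tensors_int", ext_columns)
print(strict_outcome)    # KeyError: 'tensors_int'
print(fallback_outcome)  # ('storage', None)
```

In this model the strict lookup reproduces the `KeyError` for the column whose type has no pandas dtype, while the fallback variant routes that column to its storage values and leaves columns with a registered dtype unchanged.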