AlenkaF opened a new issue, #34165:
URL: https://github.com/apache/arrow/issues/34165

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   While working on the extension type for tensors in PyArrow, I came across 
behaviour in the conversion to pandas that could be improved. 
   
   Creating an extension array (a fixed shape tensor in this case) and 
converting it to pandas works well:
   
   ```python
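   >>> import pyarrow as pa
   >>> # tensor_type is the fixed shape tensor extension type (definition not shown)
   >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]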
   >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
   >>> tensor = pa.ExtensionArray.from_storage(tensor_type, storage)
   >>> tensor.to_pandas()
   0            [1, 2, 3, 4]
   1        [10, 20, 30, 40]
   2    [100, 200, 300, 400]
   dtype: object
   ```
   But creating a table with an extension array and then converting it to 
pandas fails:
   
   ```python
   >>> data = [
   ...     pa.array([1, 2, 3]),
   ...     pa.array(['foo', 'bar', None]),
   ...     pa.array([True, None, True]),
   ...     tensor
   ... ]
   >>> my_schema = pa.schema([('f0', pa.int8()),
   ...                        ('f1', pa.string()),
   ...                        ('f2', pa.bool_()),
   ...                        ('tensors_int', tensor_type)])
   >>> table = pa.Table.from_arrays(data, schema=my_schema)
   >>> table.to_pandas()
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "pyarrow/array.pxi", line 830, in pyarrow.lib._PandasConvertible.to_pandas
       return self._to_pandas(options, categories=categories,
     File "pyarrow/table.pxi", line 4004, in pyarrow.lib.Table._to_pandas
       mgr = table_to_blockmanager(
     File "/Users/alenkafrim/repos/arrow/python/pyarrow/pandas_compat.py", line 820, in table_to_blockmanager
       blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
     File "/Users/alenkafrim/repos/arrow/python/pyarrow/pandas_compat.py", line 1171, in _table_to_blocks
       return [_reconstruct_block(item, columns, extension_columns)
     File "/Users/alenkafrim/repos/arrow/python/pyarrow/pandas_compat.py", line 1171, in <listcomp>
       return [_reconstruct_block(item, columns, extension_columns)
     File "/Users/alenkafrim/repos/arrow/python/pyarrow/pandas_compat.py", line 776, in _reconstruct_block
       pandas_dtype = extension_columns[name]
   KeyError: 'tensors_int'
   ```
   
   The failure occurs because the extension type of the array in this example 
does not implement the `to_pandas_dtype` method. As a result, the 
`_get_extension_dtypes` function does not add the name of the extension-typed 
column to `ext_columns`:
   
   
https://github.com/apache/arrow/blob/0368e410be4dac30eada13d307b415165aedc6a7/python/pyarrow/pandas_compat.py#L870-L879
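
The mechanism can be illustrated with a simplified, pure-Python model. This is only a sketch: the classes and the `collect_ext_dtypes` function are hypothetical stand-ins rather than pyarrow code, and it assumes that a type without a pandas dtype raises `NotImplementedError` and is then skipped when the mapping is built.

```python
# Simplified model of the failure. All names here are hypothetical
# stand-ins for illustration, not pyarrow's actual implementation.

class ExtTypeWithoutPandasDtype:
    """Stands in for an extension type that lacks to_pandas_dtype."""

    def to_pandas_dtype(self):
        raise NotImplementedError


class ExtTypeWithPandasDtype:
    """Stands in for an extension type that maps to a pandas dtype."""

    def to_pandas_dtype(self):
        return "some-pandas-dtype"


def collect_ext_dtypes(schema):
    """Map column name -> pandas dtype, skipping types that provide none."""
    ext_columns = {}
    for name, typ in schema:
        try:
            ext_columns[name] = typ.to_pandas_dtype()
        except NotImplementedError:
            # Assumption: the column is silently left out of the mapping.
            pass
    return ext_columns


schema = [("tensors_int", ExtTypeWithoutPandasDtype()),
          ("other_ext", ExtTypeWithPandasDtype())]
ext_columns = collect_ext_dtypes(schema)

try:
    ext_columns["tensors_int"]  # unconditional lookup, as in the traceback
    lookup_error = None
except KeyError as exc:
    lookup_error = exc

print(ext_columns)
print(repr(lookup_error))
```

In this model the column is known to be an extension column, yet it has no entry in the mapping, so the unconditional lookup raises `KeyError`.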
   
   It would be better if, when `to_pandas_dtype` is not implemented, the table 
conversion fell back to converting the storage array at
https://github.com/apache/arrow/blob/0368e410be4dac30eada13d307b415165aedc6a7/python/pyarrow/pandas_compat.py#L776
 similar to what is done in
   
https://github.com/apache/arrow/blob/925cbd81427ae02ce897c406a264d53c8813b920/python/pyarrow/array.pxi#L2888-L2889
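
A minimal sketch of what such a fallback might look like, written in pure Python so it runs without pyarrow. The function `reconstruct_column` and its parameters are hypothetical and only model the idea; the real change would have to live in `_reconstruct_block` and operate on Arrow arrays.

```python
# Hypothetical sketch of the proposed fallback; names are illustrative only
# and do not correspond to pyarrow's internal API.

def reconstruct_column(name, storage_values, ext_columns):
    """Return (kind, values) for one extension column.

    ext_columns maps column name -> pandas dtype for the extension types
    that provide one.
    """
    pandas_dtype = ext_columns.get(name)  # .get avoids the KeyError
    if pandas_dtype is None:
        # No pandas dtype is known: fall back to the plain storage values.
        return ("storage", list(storage_values))
    return (pandas_dtype, list(storage_values))


storage = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
result = reconstruct_column("tensors_int", storage, ext_columns={})
print(result)
```

The design choice in this sketch is to treat a missing pandas dtype as "use the storage representation" instead of as an error, which would match what `tensor.to_pandas()` already produces for a standalone array.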
   
   ### Component(s)
   
   Python

