Warzazatsy opened a new issue, #46019:
URL: https://github.com/apache/arrow/issues/46019
### Describe the enhancement requested
The current `read_table` function doesn't check whether the `columns` argument
is a sequence rather than a one-shot iterable. If the iterable is a generator,
the result is silently wrong, and no error is raised.
```python
import pyarrow as pa
import pyarrow.feather as feather
# Build an example table
n_legs = pa.array([2, 4, 5, 100])
animals = pa.array(["Flamingo", "Horse", "Brittle stars", "Centipede"])
names = ["n_legs", "animals"]
df_example = pa.Table.from_arrays([n_legs, animals], names=names)
# and save it
feather.write_feather(df_example, "df_basic.feather")
```
Then, try to read it back.
This works:
```python
columns_as_list = [field.name for field in
                   pa.ipc.open_file("df_basic.feather").schema]
feather.read_table("df_basic.feather", columns=columns_as_list)
```
Result:
```
pyarrow.Table
n_legs: int64
animals: string
----
n_legs: [[2,4,5,100]]
animals: [["Flamingo","Horse","Brittle stars","Centipede"]]
```
This does not work, because the generator is exhausted during the type check,
_but no error is raised_:
```python
columns_as_generator = (field.name for field in
                        pa.ipc.open_file("df_basic.feather").schema)
feather.read_table("df_basic.feather", columns=columns_as_generator)
```
Result:
```
pyarrow.Table
----
```
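To illustrate the mechanism without pyarrow, here is a minimal pure-Python sketch. `pick_columns` is a hypothetical stand-in, not the actual pyarrow source: it only assumes that the argument is iterated once for a type check and then iterated again to select columns.

```python
def pick_columns(columns):
    # Hypothetical stand-in: a type check followed by a second pass.
    # First pass: building this list consumes a generator completely.
    column_types = [type(c) for c in columns]
    all_str = all(t is str for t in column_types)
    # Second pass: a list can be iterated again, an exhausted generator cannot.
    selected = list(columns)
    return all_str, selected


names = ["n_legs", "animals"]
from_list = pick_columns(names)
from_generator = pick_columns(name for name in names)

print(from_list)       # (True, ['n_legs', 'animals'])
print(from_generator)  # (True, [])
```

The type check passes in both cases, but the second pass over the generator sees nothing, which would explain an empty selection without any error.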
This behaviour is well hidden, as an empty list `[]` for the `columns`
argument returns all the columns. Then, when this function is called from
pandas, it returns a DataFrame with only an index, which is not what one would
expect.
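Until such a check exists, one possible caller-side workaround is to materialize the iterable once before passing it on. The helper below is only a sketch: `as_column_list` is a hypothetical name, not part of pyarrow, and rejecting a bare string is an assumption about the desired behaviour.

```python
def as_column_list(columns):
    """Return `columns` as a list so it can safely be iterated more than once."""
    if columns is None:
        # Keep None as-is so the caller's "no selection" default is preserved.
        return None
    if isinstance(columns, (str, bytes)):
        # A bare string is iterable too, but it is almost certainly a mistake here.
        raise TypeError("columns must be a collection of names or indices, "
                        "not a single string")
    # Consumes a generator exactly once; copies a list or tuple.
    return list(columns)


# Assumed usage with the example above (requires pyarrow, so left as a comment):
# feather.read_table("df_basic.feather",
#                    columns=as_column_list(columns_as_generator))
```

The same normalization could be applied inside `read_table` itself, or replaced there with a `TypeError` for non-sequence input; which of the two is preferable is a design choice for the maintainers.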
### Component(s)
Python