[I] Feather read_table function should raise a TypeError if "columns" argument is not a sequence [arrow]

via GitHub Thu, 03 Apr 2025 07:13:32 -0700


Warzazatsy opened a new issue, #46019:
URL: https://github.com/apache/arrow/issues/46019


   ### Describe the enhancement requested
   
   The current `read_table` function does not check whether the `columns` argument is a sequence rather than an arbitrary iterable. If a generator is passed, this leads to unexpected behaviour: no error is raised, and the returned table silently contains no columns.
   
   
   ```python
   import pyarrow as pa
   import pyarrow.feather as feather
   
   
   # Build an example table
   n_legs = pa.array([2, 4, 5, 100])
   animals = pa.array(["Flamingo", "Horse", "Brittle stars", "Centipede"])
   names = ["n_legs", "animals"]
   
   df_example = pa.Table.from_arrays([n_legs, animals], names=names)
   
   # and save it
   feather.write_feather(df_example, "df_example.feather")
   
   ```
   
   Then, reading it back:
   
   This works:
   ```python
   columns_as_list = [field.name for field in pa.ipc.open_file("df_example.feather").schema]
   feather.read_table("df_example.feather", columns=columns_as_list)
   
   ```
   Result:
   
   ```
   pyarrow.Table
   n_legs: int64
   animals: string
   ----
   n_legs: [[2,4,5,100]]
   animals: [["Flamingo","Horse","Brittle stars","Centipede"]]
   ```
   
   This does not work (because the generator is exhausted during the type check), _but it doesn't raise any error_:
   ```python
   columns_as_generator = (field.name for field in pa.ipc.open_file("df_example.feather").schema)
   feather.read_table("df_example.feather", columns=columns_as_generator)
   
   ```
   Result:
   
   ```
   pyarrow.Table
   
   ----
   ```
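The issue attributes the empty result to the generator being consumed by the type check before the columns are selected. The pure-Python sketch below illustrates that effect. `select_columns` is a hypothetical stand-in that only mimics two passes over `columns`; it is not pyarrow's actual implementation.

```python
def select_columns(columns):
    """Hypothetical stand-in for how read_table handles ``columns``.

    It iterates over ``columns`` twice: once to inspect the element
    types and once to pick the columns to return.
    """
    # First pass: inspect element types (this consumes a generator).
    column_types = [type(column) for column in columns]
    # Second pass: a list yields its items again, but an exhausted
    # generator yields nothing, so the selection comes back empty.
    selected = [column for column in columns]
    return column_types, selected


names = ["n_legs", "animals"]
print(select_columns(names))                   # both types, both names
print(select_columns(name for name in names))  # both types, but no names
```

The type inspection itself succeeds in both cases, which is why no error surfaces: only the later selection step sees an empty iterator.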
   
   This behaviour is easy to miss, especially since an empty list `[]` for the `columns` argument returns all the columns. When the function is called from pandas, it returns a DataFrame with only an index, which is not what one would expect.
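A minimal sketch of the requested check, assuming it would run at the top of `read_table` before `columns` is iterated. `check_columns` is a hypothetical helper name, and the error message is illustrative rather than pyarrow's.

```python
from collections.abc import Sequence


def check_columns(columns):
    """Hypothetical validation for the ``columns`` argument.

    ``None`` keeps its "read all columns" meaning; anything else must be
    a Sequence, so one-shot iterators such as generators are rejected
    before they can be silently consumed.
    """
    if columns is None:
        return None
    if not isinstance(columns, Sequence):
        raise TypeError(
            "columns must be a sequence (e.g. a list) of column names or "
            f"indices, got {type(columns).__name__}"
        )
    return columns


# A list passes through unchanged.
print(check_columns(["n_legs", "animals"]))

# A generator is rejected instead of producing an empty table.
try:
    check_columns(name for name in ["n_legs", "animals"])
except TypeError as exc:
    print(exc)
```

One design caveat: a strict `collections.abc.Sequence` check also rejects objects that are not registered as sequences, such as NumPy arrays, so a real fix may need to accept those explicitly or convert them with `list()` first.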
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.