[I] [Python] Table.from_pandas creates duplicate column names if the dataframe already contains __index_level_i__ columns [arrow]

via GitHub Fri, 18 Apr 2025 02:19:09 -0700


jorisvandenbossche opened a new issue, #46179:
URL: https://github.com/apache/arrow/issues/46179


   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   The pandas -> Arrow conversion adds an `__index_level_i__` column if the 
dataframe has an unnamed index it wants to preserve (i.e. if it is not just a 
pandas RangeIndex). But if your dataframe already has such a column, you end up 
with a duplicate field:
   
   ```
   In [40]: df = pd.DataFrame({"col": [1, 2, 3], "__index_level_0__": [1, 2, 3]}, index=[2, 3, 4])
   
   In [41]: df
   Out[41]: 
      col  __index_level_0__
   2    1                  1
   3    2                  2
   4    3                  3
   
   In [42]: pa.table(df)
   Out[42]: 
   pyarrow.Table
   col: int64
   __index_level_0__: int64
   __index_level_0__: int64
   ----
   col: [[1,2,3]]
   __index_level_0__: [[1,2,3]]
   __index_level_0__: [[2,3,4]]
   ```
   
   We could have it bump the integer in the generated column name? (Although 
we would then have to check how that works in the full roundtrip.)
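
A rough sketch of what "bumping the integer" could look like, in plain Python. The helper name `pick_index_level_names` is hypothetical and is not part of pyarrow; it only assumes the `__index_level_<i>__` naming shown above and skips any name that is already taken by a data column:

```python
def pick_index_level_names(n_levels, existing_columns):
    """Pick one generated name per unnamed index level, avoiding collisions.

    Hypothetical helper for illustration only (not pyarrow API): it bumps
    the integer suffix until the generated name is not already in use.
    """
    taken = set(existing_columns)
    names = []
    candidate = 0
    for _ in range(n_levels):
        # Skip every integer whose generated name is already in use.
        while f"__index_level_{candidate}__" in taken:
            candidate += 1
        name = f"__index_level_{candidate}__"
        taken.add(name)  # reserve it so later levels cannot reuse it
        names.append(name)
    return names


# The dataframe from the report: one unnamed index level, and a data
# column that already uses the first generated name.
print(pick_index_level_names(1, ["col", "__index_level_0__"]))
```

With a scheme like this the integer in the name no longer has to match the position of the index level, so the roundtrip would presumably need to rely on the index columns recorded in the pandas metadata rather than on parsing the name, which is the part that would need checking.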
   
   ### Component(s)
   
   Python


