dweih opened a new issue, #47022:
URL: https://github.com/apache/arrow/issues/47022

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Our code primarily uses polars, but external tools use pandas. When we use 
those tools to import parquet files with categorical columns that have unsigned 
integer index types (uint16 and uint32), we get the error 
   
   `ArrowTypeError: Converting unsigned dictionary indices to pandas not yet 
supported, index type: uint32`
   
   Simple repro below.
   
   ```
   import polars as pl
   import pyarrow as pa
   
   n = 100
   cat_values = [f"cat_{i}" for i in range(n)]
   df = pl.DataFrame({
       "cat": cat_values,
       "val": list(range(n))
   })
   arrow_table = df.to_arrow()
   
   dict_type = pa.dictionary(index_type=pa.uint16(), value_type=pa.string())
   arrow_table = arrow_table.set_column(
       arrow_table.schema.get_field_index("cat"),
       "cat",
       arrow_table.column("cat").cast(dict_type)
   )
   
   print("Arrow schema:", arrow_table.schema)
   
   
   try:
       pdf = arrow_table.to_pandas()
       print("Loaded into pandas successfully.")
   except Exception as e:
       print("Failed to load into pandas:")
       print(e)
   
   try:
       pol_df = pl.from_arrow(arrow_table)
       print("Loaded into Polars successfully.")
   except Exception as e:
       print("Failed to load into Polars:")
       print(e)
   ```
   
   Finally, I wasn't sure whether to file this as a feature request or an 
issue, since the behavior is missing rather than incorrect.
   
   ### Component(s)
   
   Python

