aiudirog opened a new issue, #45086:
URL: https://github.com/apache/arrow/issues/45086

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I've encountered a consistent segfault when calling Pandas `.ffill()` on a `boolean[pyarrow]` Series (backed by a ChunkedArray) on Windows. The same code runs without problems on Linux:
   
   ```python
   import pandas as pd
   import pyarrow as pa
   print("Pandas:", pd.__version__)
   print("PyArrow:", pa.__version__)
   
   for i in range(1, 32):
       mx = 2 ** i
       print(i, mx)
       x = pd.Series(pd.NA, index=range(mx), dtype='boolean[pyarrow]')
       x[mx // 2] = True
       x.ffill()
   ```
   
   This always crashes on the 12th iteration (`i = 12`, i.e. 4096 values), and it does so on multiple computers.
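Because the segfault kills the interpreter, the loop above stops at the first failing size. A minimal sketch, assuming a stdlib `subprocess` harness that is not part of the original report, runs each size in a fresh child process so a crash at one size does not hide the others. The names `CHILD_CODE`, `is_crash`, and `probe` are hypothetical. It treats a negative return code (a POSIX process killed by a signal such as SIGSEGV) or `0xC0000005` (the Windows access-violation status) as a crash; any other nonzero status is treated as an ordinary Python error, such as a missing package.

```python
import subprocess
import sys

# Repro body run in a fresh interpreter for each size; mirrors the loop above.
CHILD_CODE = """
import sys
import pandas as pd
mx = int(sys.argv[1])
x = pd.Series(pd.NA, index=range(mx), dtype='boolean[pyarrow]')
x[mx // 2] = True
x.ffill()
"""

# Exit status Windows reports for an access violation (segfault equivalent).
WINDOWS_ACCESS_VIOLATION = 0xC0000005


def is_crash(returncode: int) -> bool:
    """True if an exit status looks like a native crash, not a Python exception."""
    # On POSIX, a child killed by a signal has a negative return code (-signum).
    return returncode < 0 or returncode == WINDOWS_ACCESS_VIOLATION


def probe(mx: int) -> bool:
    """Run the repro for a Series of length mx in a child process; True if it crashed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, str(mx)],
        capture_output=True,
    )
    return is_crash(result.returncode)
```

For example, `[i for i in range(1, 16) if probe(2 ** i)]` lists the exponents whose sizes crash, without the scan itself dying at `i = 12`.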
   
   I dug under the hood to see what Pandas was doing and was able to reproduce the crash with this pure PyArrow example (though I'm not sure I'm using the API correctly):
   
   ```python
   import pyarrow as pa
   import pyarrow.compute as pc
   
   # Three chunks: 4000 nulls, a single True, then 4000 more nulls
   pad = [None] * 4000
   a = pa.chunked_array([pad, [True], pad], type=pa.bool_())
   pc.fill_null_forward(a)  # segfaults on Windows
   ```
   
   I also tried to reproduce this with a plain (non-chunked) `pa.array()`, but that worked fine.
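Since the non-chunked case works, one possible stopgap (an assumption, not verified against this crash) is to merge the chunks with `ChunkedArray.combine_chunks()` before calling `pc.fill_null_forward`. The sketch pairs that with a hypothetical pure-Python `reference_ffill` that spells out the expected result: the last non-null value is carried across chunk boundaries, so every slot after the single `True` should become `True`. This assumes `fill_null_forward` follows the same semantics as Pandas' `ffill`. `ffill_combined` is likewise a hypothetical helper name.

```python
def reference_ffill(chunks):
    """Pure-Python forward fill over a list of chunks.

    Carries the last non-null value across chunk boundaries and returns
    a flat list; leading nulls stay None because nothing precedes them.
    """
    out = []
    last = None
    for chunk in chunks:
        for value in chunk:
            if value is not None:
                last = value
            out.append(last)
    return out


def ffill_combined(chunked):
    """Hypothetical workaround: flatten a pyarrow ChunkedArray into a single
    Array before forward-filling. Not verified to avoid the Windows crash."""
    # Imported lazily so reference_ffill is usable without pyarrow installed.
    import pyarrow.compute as pc

    return pc.fill_null_forward(chunked.combine_chunks())
```

With `pad` and `a` from the example above, `ffill_combined(a).to_pylist() == reference_ffill([pad, [True], pad])` should hold if the workaround behaves. Note that `combine_chunks()` copies the data into one contiguous buffer, so it costs extra memory for large arrays.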
   
   -----------------------
   
   Operating System: Windows 10 (10.0.19045 Build 19045)
   Processor: Intel Core i7-12850HX
   Python Version: 3.12.7 (additionally confirmed by co-workers on 3.11 & 3.10)
   PyArrow Versions Tested: 16.1, 17.0, 18.0, 18.1
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.