vyasr opened a new issue, #44342:
URL: https://github.com/apache/arrow/issues/44342

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Under a very specific set of circumstances, importing pyarrow 17.0.0 from 
an arm wheel triggers a segmentation fault. The fault occurs in the jemalloc 
function `background_thread_entry`, which is statically linked into 
libarrow.so. I can see libarrow.so being opened via strace, and when I run 
under gdb I see the following backtrace:
   ```
   [Detaching after vfork from child process 895]
   [New Thread 0xfffe18fff1d0 (LWP 960)]
   --Type <RET> for more, q to quit, c to continue without paging--c
   
   Thread 128 "jemalloc_bg_thd" received signal SIGSEGV, Segmentation fault.
   [Switching to Thread 0xfffe18fff1d0 (LWP 960)]
   0x0000fffe1b2d2844 in background_thread_entry () from 
/pyenv/versions/3.12.6/lib/python3.12/site-packages/pyarrow/libarrow.so.1700
   
   (gdb) backtrace
   #0  0x0000fffe122f1844 in background_thread_entry () from 
/pyenv/versions/3.12.6/lib/python3.12/site-packages/pyarrow/libarrow.so.1700
   #1  0x0000ffff94a3a624 in start_thread (arg=0xfffe122f17e0 
<background_thread_entry>) at pthread_create.c:477
   #2  0x0000ffff94b3562c in thread_start () at 
../sysdeps/unix/sysv/linux/aarch64/clone.S:78
   (gdb) bt full
   #0  0x0000fffe122f1844 in background_thread_entry () from 
/pyenv/versions/3.12.6/lib/python3.12/site-packages/pyarrow/libarrow.so.1700
   No symbol table info available.
   #1  0x0000ffff94a3a624 in start_thread (arg=0xfffe122f17e0 
<background_thread_entry>) at pthread_create.c:477
           ret = <optimized out>
           pd = <optimized out>
           unwind_buf = {cancel_jmp_buf = {{jmp_buf = {281466655208940, 
281474517841168, 281474517841166, 281473175642112, 281474517841167, 
281466691852256,
                   281466655209680, 281466655207888, 281473175646208, 
281466655207888, 281466655205808, 118832585594287181, 0, 118832583903213793, 0, 
0, 0,
                   0, 0, 0, 0, 0}, mask_was_saved = 0}}, priv = {pad = {0x0, 
0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
           not_first_call = <optimized out>
   #2  0x0000ffff94b3562c in thread_start () at 
../sysdeps/unix/sysv/linux/aarch64/clone.S:78
   No locals.
   ```
   
   This error is quite difficult to reproduce. I have only observed this 
particular issue with the pyarrow 17.0.0 release, and only when testing on 
arm architectures. It is also highly sensitive to the exact order of 
prior operations. In my application I load multiple Python extension modules 
before importing pyarrow, and the order of those imports affects whether or not 
this issue manifests. The cases where the issue arises do manifest reliably, so 
it is not a flaky error, but simply adding an unrelated extra import or 
reordering unrelated imports is often sufficient to make the problem vanish. I 
attempted to rebuild libarrow.so using the same flags used to build the wheel 
(I can't be sure that I got them all right though, I based my compilation on 
the flags in 
https://github.com/apache/arrow/blob/main/ci/scripts/python_wheel_manylinux_build.sh).
 and then preload the library, but that too caused the segmentation fault to 
disappear, so it's also unlikely that I can get d
 ebug symbols into the build in any useful way. I am attempting to reduce this 
to an MWE in https://github.com/rapidsai/cudf/pull/17022, but I am not very 
hopeful in it being reduced all that far.
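
The import-order sensitivity described above could be explored systematically with a small harness along the following lines. This is only a sketch and not part of the original investigation: the helper names (`run_imports`, `crashed_with_segv`, `find_crashing_orders`) are hypothetical, and it assumes a POSIX system, where `subprocess` reports a child killed by signal N as return code -N.

```python
import itertools
import signal
import subprocess
import sys


def run_imports(modules, python=sys.executable, timeout=120):
    """Import `modules` in order in a fresh interpreter and return its exit code."""
    code = "\n".join(f"import {name}" for name in modules)
    proc = subprocess.run([python, "-c", code], capture_output=True, timeout=timeout)
    return proc.returncode


def crashed_with_segv(returncode):
    """Return True if the exit code means the child was killed by SIGSEGV."""
    # On POSIX, death by signal N is reported as returncode == -N.
    return returncode == -signal.SIGSEGV


def find_crashing_orders(prior_modules, target="pyarrow", limit=None):
    """Yield each ordering of `prior_modules` that segfaults when `target` is imported last."""
    for i, order in enumerate(itertools.permutations(prior_modules)):
        if limit is not None and i >= limit:
            break
        if crashed_with_segv(run_imports([*order, target])):
            yield order
```

A call such as `list(find_crashing_orders(["module_a", "module_b"]))`, where the module names are placeholders for the extension modules loaded before pyarrow, would list the orderings that crash. The number of permutations grows factorially, so the `limit` argument caps how many orderings are tried.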
   
   ### Component(s)
   
   Python

