vyasr opened a new issue, #44342:
URL: https://github.com/apache/arrow/issues/44342
### Describe the bug, including details regarding any error messages, version, and platform.
Under a very specific set of circumstances, importing pyarrow 17.0.0 from
an arm wheel triggers a segmentation fault. The error comes from the jemalloc
function `background_thread_entry`, which is statically linked into libarrow.so.
I can see libarrow.so being opened via strace, and when I run under gdb I see
the following backtrace:
```
[Detaching after vfork from child process 895]
[New Thread 0xfffe18fff1d0 (LWP 960)]
--Type <RET> for more, q to quit, c to continue without paging--c
Thread 128 "jemalloc_bg_thd" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xfffe18fff1d0 (LWP 960)]
0x0000fffe1b2d2844 in background_thread_entry () from
/pyenv/versions/3.12.6/lib/python3.12/site-packages/pyarrow/libarrow.so.1700
(gdb) backtrace
#0 0x0000fffe122f1844 in background_thread_entry () from
/pyenv/versions/3.12.6/lib/python3.12/site-packages/pyarrow/libarrow.so.1700
#1 0x0000ffff94a3a624 in start_thread (arg=0xfffe122f17e0
<background_thread_entry>) at pthread_create.c:477
#2 0x0000ffff94b3562c in thread_start () at
../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) bt full
#0 0x0000fffe122f1844 in background_thread_entry () from
/pyenv/versions/3.12.6/lib/python3.12/site-packages/pyarrow/libarrow.so.1700
No symbol table info available.
#1 0x0000ffff94a3a624 in start_thread (arg=0xfffe122f17e0
<background_thread_entry>) at pthread_create.c:477
ret = <optimized out>
pd = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {281466655208940,
281474517841168, 281474517841166, 281473175642112, 281474517841167,
281466691852256,
281466655209680, 281466655207888, 281473175646208,
281466655207888, 281466655205808, 118832585594287181, 0, 118832583903213793, 0,
0, 0,
0, 0, 0, 0, 0}, mask_was_saved = 0}}, priv = {pad = {0x0,
0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#2 0x0000ffff94b3562c in thread_start () at
../sysdeps/unix/sysv/linux/aarch64/clone.S:78
No locals.
```
This error is quite difficult to reproduce. I have only observed it with the
pyarrow 17.0.0 release and only when testing on arm architectures, and it is
also highly sensitive to the exact order of prior operations. In my application
I load multiple Python extension modules before importing pyarrow, and the
order of those imports affects whether or not this issue manifests. The cases
where the issue arises do manifest reliably, so it is not a flaky error, but
simply adding an unrelated extra import or reordering unrelated imports is
often sufficient to make the problem vanish. I attempted to rebuild libarrow.so
using the same flags used to build the wheel and then preload the library (I
can't be sure that I got all the flags right; I based my compilation on those in
https://github.com/apache/arrow/blob/main/ci/scripts/python_wheel_manylinux_build.sh),
but that too caused the segmentation fault to disappear, so it's also unlikely
that I can get debug symbols into the build in any useful way. I am attempting
to reduce this to an MWE in https://github.com/rapidsai/cudf/pull/17022, but I
am not very hopeful that it can be reduced all that far.
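
Since the crash depends on import order, one way to search for a smaller
reproducer is to try each ordering in a fresh interpreter and record which
ones die with SIGSEGV. The following is only a rough sketch of that idea, not
the script used in the linked PR; the helper names `import_order_segfaults`
and `find_crashing_orders` are hypothetical, and it assumes a POSIX system,
where a negative subprocess return code is the number of the signal that
killed the child.

```python
import itertools
import signal
import subprocess
import sys


def import_order_segfaults(modules):
    """Import `modules` in order in a fresh interpreter.

    Returns True if that interpreter was killed by SIGSEGV.
    """
    code = "\n".join(f"import {name}" for name in modules)
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, returncode == -N means the child was terminated by signal N.
    return result.returncode == -signal.SIGSEGV


def find_crashing_orders(prior_modules, target="pyarrow"):
    """Try every ordering of `prior_modules`, each followed by `target`."""
    return [
        order
        for order in itertools.permutations(prior_modules)
        if import_order_segfaults([*order, target])
    ]
```

Calling `find_crashing_orders` with the list of extension modules the
application imports before pyarrow would return the orderings that crash. The
number of permutations grows factorially, so this is only practical for a
handful of modules.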
### Component(s)
Python