mgorny opened a new issue, #47252:
URL: https://github.com/apache/arrow/issues/47252
### Describe the bug, including details regarding any error messages, version, and platform.
Consider the following reduced example:
```python
import pyarrow as pa
from datetime import datetime

# Two single-row tables with identical schemas.
tables = [
    pa.Table.from_pydict(
        {
            "date": [datetime(2020, 1, 1)],
            "variable": ["aapl"],
            "price": [110],
        }
    ),
    pa.Table.from_pydict(
        {
            "date": [datetime(2020, 1, 1)],
            "variable": ["tlsa"],
            "price": [220],
        }
    ),
]

unpivoted = pa.concat_tables(tables)
# The segmentation fault occurs in this call.
unpivoted.sort_by([("date", "ascending"), ("variable", "ascending")])
```
Upon running it on PyArrow 21.0.0 built from source, I'm getting a
segmentation fault:
```
(gdb) bt
#0 0x00007fca89f77834 in arrow::NumericArray<arrow::Int64Type>::GetView
(this=0x0, i=0)
at
/usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/array/array_primitive.h:121
#1 arrow::compute::internal::ResolvedChunk::Value<arrow::Int64Type,
arrow::compute::internal::GetViewType<arrow::Int64Type, void> > (
this=this@entry=0x7ffc548ca930)
at
/usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/chunked_internal.h:56
#2 0x00007fca8a0bd3f5 in operator()
(__closure=__closure@entry=0x7ffc548ca9c0, left=..., right=...)
at
/usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort.cc:849
#3 0x00007fca8a0bd5b8 in
__gnu_cxx::__ops::_Iter_comp_iter<arrow::compute::internal::(anonymous
namespace)::TableSorter::MergeNonNulls<arrow::Int64Type>(arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*)::<lambda(arrow::compute::internal::CompressedChunkLocation,
arrow::compute::internal::CompressedChunkLocation)>
>::operator()<arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*> (this=0x7ffc548ca9c0,
__it1=0x55dac2821588, __it2=0x55dac2821580)
at
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/predefined_ops.h:158
#4 std::__merge<arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
__gnu_cxx::__ops::_Iter_comp_iter<arrow::compute::internal::(anonymous
namespace)::TableSorter::MergeNonNulls<arrow::Int64Type>(arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*)::<lambda(arrow::compute::internal::CompressedChunkLocation,
arrow::compute::internal::CompressedChunkLocation)> >
>(arrow::compute::internal::CompressedChunkLocation *,
arrow::compute::internal::CompressedChunkLocation *,
arrow::compute::internal::CompressedChunkLocation *,
arrow::compute::internal::CompressedChunkLocation *,
arrow::compute::internal::CompressedChunkLocation *,
__gnu_cxx::__ops::_Iter_comp_iter<arrow::compute::internal::(anonymous
namespace)::TableSorter::MergeNonNulls<arrow::Int64Type>(arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*)::<lambda(arrow::compute::internal::CompressedChunkLocation,
arrow::compute::internal::CompressedChunkLocation)> >) (
__first1=__first1@entry=0x55dac2821580, __last1=0x55dac2821588,
__first2=0x55dac2821588, __last2=__last2@entry=0x55dac2821590,
__result=__result@entry=0x55dac2a6b4c0, __comp=...) at
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_algo.h:4887
#5 0x00007fca8a0bd5ff in
std::merge<arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::(anonymous
namespace)::TableSorter::MergeNonNulls<arrow::Int64Type>(arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*)::<lambda(arrow::compute::internal::CompressedChunkLocation,
arrow::compute::internal::CompressedChunkLocation)>
>(arrow::compute::internal::CompressedChunkLocation *,
arrow::compute::internal::CompressedChunkLocation *,
arrow::compute::internal::CompressedChunkLocation *,
arrow::compute::internal::CompressedChunkLocation *,
arrow::compute::internal::CompressedChunkLocation *, struct {...})
(__first1=__first1@entry=0x55dac2821580,
__last1=<optimized out>, __first2=<optimized out>,
__last2=__last2@entry=0x55dac2821590, __result=__result@entry=0x55dac2a6b4c0,
__comp=...) at
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/predefined_ops.h:150
#6 0x00007fca8a0bd62c in arrow::compute::internal::(anonymous
namespace)::TableSorter::MergeNonNulls<arrow::Int64Type> (
this=<optimized out>, range_begin=0x55dac2821580,
range_middle=<optimized out>, range_end=0x55dac2821590,
temp_indices=0x55dac2a6b4c0)
at
/usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort.cc:840
#7 0x00007fca8a0bd664 in operator() (__closure=<optimized out>,
range_begin=<optimized out>, range_middle=<optimized out>,
range_end=<optimized out>, temp_indices=<optimized out>)
at
/usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort.cc:756
#8 std::__invoke_impl<void, arrow::compute::internal::(anonymous
namespace)::TableSorter::MergeInternal<arrow::Int64Type>(std::vector<arrow::compute::internal::GenericNullPartitionResult<arrow::compute::internal::CompressedChunkLocation>
>*, int64_t)::<lambda(arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*)>&,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*> (__f=...) at
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/invoke.h:63
#9 std::__invoke_r<void, arrow::compute::internal::(anonymous
namespace)::TableSorter::MergeInternal<arrow::Int64Type>(std::vector<arrow::compute::internal::GenericNullPartitionResult<arrow::compute::internal::CompressedChunkLocation>
>*, int64_t)::<lambda(arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*)>&,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*> (__fn=...) at
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/invoke.h:113
#10
std::_Function_handler<void(arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*),
arrow::compute::internal::(anonymous
namespace)::TableSorter::MergeInternal<arrow::Int64Type>(std::vector<arrow::compute::internal::GenericNullPartitionResult<arrow::compute::internal::CompressedChunkLocation>
>*, int64_t)::<lambda(arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*)> >::_M_invoke(const
std::_Any_data &, arrow::compute::internal::CompressedChunkLocation *&&,
arrow::compute::internal::CompressedChunkLocation *&&,
arrow::compute::internal::CompressedChunkLocation *&&,
arrow::compute::internal::CompressedChunkLocation *&&) (
__functor=..., __args#0=<optimized out>, __args#1=<optimized out>,
__args#2=<optimized out>, __args#3=<optimized out>)
at
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/std_function.h:292
#11 0x00007fca8a0c43b3 in
std::function<void(arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*,
arrow::compute::internal::CompressedChunkLocation*)>::operator() (
this=this@entry=0x7ffc548cab78, __args#0=<optimized out>,
__args#1=<optimized out>, __args#2=<optimized out>,
__args#3=<optimized out>) at
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/std_function.h:593
#12 0x00007fca8a0cb247 in
arrow::compute::internal::GenericMergeImpl<arrow::compute::internal::CompressedChunkLocation,
arrow::compute::internal::GenericNullPartitionResult<arrow::compute::internal::CompressedChunkLocation>
>::MergeNullsAtEnd (this=0x7ffc548cab50,
left=..., right=..., null_count=0)
at
/usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort_internal.h:441
#13 0x00007fca8a0cb53a in
arrow::compute::internal::GenericMergeImpl<arrow::compute::internal::CompressedChunkLocation,
arrow::compute::internal::GenericNullPartitionResult<arrow::compute::internal::CompressedChunkLocation>
>::Merge (this=this@entry=0x7ffc548cab50,
left=..., right=..., null_count=null_count@entry=0)
at
/usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort_internal.h:373
#14 0x00007fca8a0bd296 in arrow::compute::internal::(anonymous
namespace)::TableSorter::MergeInternal<arrow::Int64Type> (
this=0x7ffc548cae70, sorted=0x7ffc548cac90, null_count=0)
at
/usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort.cc:770
#15 0x00007fca8a0c0a61 in Visitor::Visit (this=0x7ffc548cacb0, type=...)
at
/usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort.cc:721
#16 arrow::VisitTypeInline<arrow::compute::internal::(anonymous
namespace)::TableSorter::SortInternal()::Visitor> (type=...,
visitor=visitor@entry=0x7ffc548cacb0)
at
/usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/visit_type_inline.h:55
#17 0x00007fca8a0c1368 in arrow::compute::internal::(anonymous
namespace)::TableSorter::SortInternal (this=this@entry=0x7ffc548cae70)
at
/usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort.cc:731
#18 0x00007fca8a0c1755 in arrow::compute::internal::(anonymous
namespace)::TableSorter::Sort (this=this@entry=0x7ffc548cae70)
at
/usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort.cc:648
#19 0x00007fca8a0c1aa9 in arrow::compute::internal::(anonymous
namespace)::SortIndicesMetaFunction::SortIndices (
this=this@entry=0x55dac2665fc0, table=..., options=...,
ctx=ctx@entry=0x7ffc548cb0a0)
at
/usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort.cc:1070
#20 0x00007fca8a0c1daa in arrow::compute::internal::(anonymous
namespace)::SortIndicesMetaFunction::ExecuteImpl (this=0x55dac2665fc0,
args=..., options=0x55dac2b9c0e0, ctx=0x7ffc548cb0a0)
at
/usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort.cc:941
#21 0x00007fca8b5f3221 in arrow::compute::MetaFunction::Execute
(this=0x55dac2665fc0,
args=std::vector of length 1, capacity 1 = {...},
options=0x55dac2b9c0e0, ctx=0x7ffc548cb0a0)
at
/usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/function.cc:483
#22 0x00007fca836abf45 in __pyx_pf_7pyarrow_8_compute_8Function_6call
(__pyx_v_self=0x7fca83593670, __pyx_v_args=<optimized out>,
__pyx_v_options=<optimized out>, __pyx_v_memory_pool=<optimized out>,
__pyx_v_length=<optimized out>)
at
/usr/src/debug/dev-python/pyarrow-21.0.0/apache-arrow-21.0.0/python-python3_12/build0/temp.linux-x86_64-cpython-312/_compute.cpp:19602
#23 __pyx_pw_7pyarrow_8_compute_8Function_7call
(__pyx_v_self=<pyarrow._compute.MetaFunction at remote 0x7fca83593670>,
__pyx_args=<optimized out>, __pyx_nargs=<optimized out>,
__pyx_kwds=<optimized out>)
at
/usr/src/debug/dev-python/pyarrow-21.0.0/apache-arrow-21.0.0/python-python3_12/build0/temp.linux-x86_64-cpython-312/_compute.cpp:19389
#24 0x00007fca8da1817e in PyObject_Vectorcall () from
/usr/lib64/libpython3.12.so.1.0
#25 0x00007fca8da6dc26 in _PyEval_EvalFrameDefault () from
/usr/lib64/libpython3.12.so.1.0
#26 0x00007fca8da1801b in PyObject_VectorcallMethod () from
/usr/lib64/libpython3.12.so.1.0
#27 0x00007fca8c3e65f1 in __pyx_pf_7pyarrow_3lib_8_Tabular_33sort_by
(__pyx_v_self=0x7fca7bfcc130,
__pyx_v_sorting=[('date', 'ascending'), ('variable', 'ascending')],
__pyx_v_kwargs=Python Exception <class 'gdb.error'>: There is no member named
dk_nentries.
)
at
/usr/src/debug/dev-python/pyarrow-21.0.0/apache-arrow-21.0.0/python-python3_12/build0/temp.linux-x86_64-cpython-312/lib.cpp:168993
#28 __pyx_pw_7pyarrow_3lib_8_Tabular_34sort_by
(__pyx_v_self=<pyarrow.lib.Table at remote 0x7fca7bfcc130>,
__pyx_args=<optimized out>, __pyx_nargs=<optimized out>,
__pyx_kwds=<optimized out>)
at
/usr/src/debug/dev-python/pyarrow-21.0.0/apache-arrow-21.0.0/python-python3_12/build0/temp.linux-x86_64-cpython-312/lib.cpp:168827
#29 0x00007fca8da1817e in PyObject_Vectorcall () from
/usr/lib64/libpython3.12.so.1.0
#30 0x00007fca8da6dc26 in _PyEval_EvalFrameDefault () from
/usr/lib64/libpython3.12.so.1.0
#31 0x00007fca8db1bb5a in PyEval_EvalCode () from
/usr/lib64/libpython3.12.so.1.0
#32 0x00007fca8db40630 in ?? () from /usr/lib64/libpython3.12.so.1.0
#33 0x00007fca8db405a7 in ?? () from /usr/lib64/libpython3.12.so.1.0
#34 0x00007fca8db41d51 in ?? () from /usr/lib64/libpython3.12.so.1.0
#35 0x00007fca8db41af8 in _PyRun_SimpleFileObject () from
/usr/lib64/libpython3.12.so.1.0
#36 0x00007fca8db41918 in _PyRun_AnyFileObject () from
/usr/lib64/libpython3.12.so.1.0
#37 0x00007fca8db4da25 in Py_RunMain () from /usr/lib64/libpython3.12.so.1.0
#38 0x00007fca8db4d40a in Py_BytesMain () from
/usr/lib64/libpython3.12.so.1.0
#39 0x00007fca8d6294ae in ?? () from /usr/lib64/libc.so.6
#40 0x00007fca8d629569 in __libc_start_main () from /usr/lib64/libc.so.6
#41 0x000055daabdc2095 in _start ()
```
This is Gentoo Linux amd64. Curiously enough, I can't reproduce this with
PyArrow installed from PyPI wheels.
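For comparing a source build against the PyPI wheels, it can help to run the reproducer in a child interpreter, so that a crash shows up as a return code instead of killing the calling process. The following is a sketch only: `run_isolated` and `REPRODUCER` are hypothetical names introduced here, and the signal decoding assumes a POSIX system, where `subprocess` reports death by signal as a negative return code.

```python
import signal
import subprocess
import sys

# The reduced example from this report, as a string for the child process.
REPRODUCER = """
import pyarrow as pa
from datetime import datetime
tables = [
    pa.Table.from_pydict(
        {"date": [datetime(2020, 1, 1)], "variable": [v], "price": [p]}
    )
    for v, p in [("aapl", 110), ("tlsa", 220)]
]
unpivoted = pa.concat_tables(tables)
unpivoted.sort_by([("date", "ascending"), ("variable", "ascending")])
"""


def run_isolated(snippet):
    """Run `snippet` in a fresh interpreter.

    Returns (returncode, signal_name_or_None, stderr_text).
    """
    proc = subprocess.run(
        # -X faulthandler makes the child dump a Python traceback on a crash.
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
    )
    sig = None
    if proc.returncode < 0:
        # On POSIX a negative return code is the negated signal number.
        sig = signal.Signals(-proc.returncode).name
    return proc.returncode, sig, proc.stderr


if __name__ == "__main__":
    returncode, sig, stderr = run_isolated(REPRODUCER)
    print("return code:", returncode, "signal:", sig)
    print(stderr)
```

Running the same script in each environment gives a directly comparable result: a signal name such as `SIGSEGV` on a crashing build, and a zero return code where the sort completes.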
Build logs for the build I've used to get backtraces:
- C++: [dev-libs:apache-arrow-21.0.0:20250801-133657.log](https://github.com/user-attachments/files/21557832/dev-libs.apache-arrow-21.0.0.20250801-133657.log)
- Python: [dev-python:pyarrow-21.0.0:20250801-134715.log](https://github.com/user-attachments/files/21557838/dev-python.pyarrow-21.0.0.20250801-134715.log)
### Component(s)
Python, C++