mgorny opened a new issue, #47252:
URL: https://github.com/apache/arrow/issues/47252

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Consider the following reduced example:
   
   ```python
   import pyarrow as pa
   from datetime import datetime
   
   tables = [
       pa.Table.from_pydict(
           {
               "date": [datetime(2020, 1, 1)],
               "variable": ["aapl"],
               "price": [110],
           }
       ),
       pa.Table.from_pydict(
           {
               "date": [datetime(2020, 1, 1)],
               "variable": ["tlsa"],
               "price": [220],
           }
       ),
   ]
   
   unpivoted = pa.concat_tables(tables)
   unpivoted.sort_by([("date", "ascending"), ("variable", "ascending")])
   ```
   
   Running it with PyArrow 21.0.0 built from source, I get a segmentation fault:
   
   ```
   (gdb) bt
   #0  0x00007fca89f77834 in arrow::NumericArray<arrow::Int64Type>::GetView (this=0x0, i=0)
       at /usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/array/array_primitive.h:121
   #1  arrow::compute::internal::ResolvedChunk::Value<arrow::Int64Type, arrow::compute::internal::GetViewType<arrow::Int64Type, void> > (
       this=this@entry=0x7ffc548ca930)
       at /usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/chunked_internal.h:56
   #2  0x00007fca8a0bd3f5 in operator() (__closure=__closure@entry=0x7ffc548ca9c0, left=..., right=...)
       at /usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort.cc:849
   #3  0x00007fca8a0bd5b8 in 
__gnu_cxx::__ops::_Iter_comp_iter<arrow::compute::internal::(anonymous 
namespace)::TableSorter::MergeNonNulls<arrow::Int64Type>(arrow::compute::internal::CompressedChunkLocation*,
 arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*)::<lambda(arrow::compute::internal::CompressedChunkLocation,
 arrow::compute::internal::CompressedChunkLocation)> 
>::operator()<arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*> (this=0x7ffc548ca9c0, 
__it1=0x55dac2821588, __it2=0x55dac2821580)
       at 
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/predefined_ops.h:158
   #4  std::__merge<arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
__gnu_cxx::__ops::_Iter_comp_iter<arrow::compute::internal::(anonymous 
namespace)::TableSorter::MergeNonNulls<arrow::Int64Type>(arrow::compute::internal::CompressedChunkLocation*,
 arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*)::<lambda(arrow::compute::internal::CompressedChunkLocation,
 arrow::compute::internal::CompressedChunkLocation)> > 
>(arrow::compute::internal::CompressedChunkLocation *, 
arrow::compute::internal::CompressedChunkLocation *, 
arrow::compute::internal::CompressedChunkLocation *, 
arrow::compute::internal::CompressedChunkLocation *, 
arrow::compute::internal::CompressedChunkLocation *, 
__gnu_cxx::__ops::_Iter_comp_iter<arrow::compute::internal::(anonymous 
namespace)::TableSorter::MergeNonNulls<arrow::Int64Type>(arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*)::<lambda(arrow::compute::internal::CompressedChunkLocation,
 arrow::compute::internal::CompressedChunkLocation)> >) (
       __first1=__first1@entry=0x55dac2821580, __last1=0x55dac2821588, 
__first2=0x55dac2821588, __last2=__last2@entry=0x55dac2821590, 
       __result=__result@entry=0x55dac2a6b4c0, __comp=...) at 
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_algo.h:4887
   #5  0x00007fca8a0bd5ff in 
std::merge<arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::(anonymous 
namespace)::TableSorter::MergeNonNulls<arrow::Int64Type>(arrow::compute::internal::CompressedChunkLocation*,
 arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*)::<lambda(arrow::compute::internal::CompressedChunkLocation,
 arrow::compute::internal::CompressedChunkLocation)> 
>(arrow::compute::internal::CompressedChunkLocation *, 
arrow::compute::internal::CompressedChunkLocation *, 
arrow::compute::internal::CompressedChunkLocation *, 
arrow::compute::internal::CompressedChunkLocation *, 
arrow::compute::internal::CompressedChunkLocation *, struct {...}) 
(__first1=__first1@entry=0x55dac2821580, 
       __last1=<optimized out>, __first2=<optimized out>, 
__last2=__last2@entry=0x55dac2821590, __result=__result@entry=0x55dac2a6b4c0, 
       __comp=...) at 
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/predefined_ops.h:150
   #6  0x00007fca8a0bd62c in arrow::compute::internal::(anonymous namespace)::TableSorter::MergeNonNulls<arrow::Int64Type> (
       this=<optimized out>, range_begin=0x55dac2821580, range_middle=<optimized out>, range_end=0x55dac2821590,
       temp_indices=0x55dac2a6b4c0)
       at /usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort.cc:840
   #7  0x00007fca8a0bd664 in operator() (__closure=<optimized out>, range_begin=<optimized out>, range_middle=<optimized out>,
       range_end=<optimized out>, temp_indices=<optimized out>)
       at /usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort.cc:756
   #8  std::__invoke_impl<void, arrow::compute::internal::(anonymous 
namespace)::TableSorter::MergeInternal<arrow::Int64Type>(std::vector<arrow::compute::internal::GenericNullPartitionResult<arrow::compute::internal::CompressedChunkLocation>
 >*, int64_t)::<lambda(arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*)>&, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*> (__f=...) at 
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/invoke.h:63
   #9  std::__invoke_r<void, arrow::compute::internal::(anonymous 
namespace)::TableSorter::MergeInternal<arrow::Int64Type>(std::vector<arrow::compute::internal::GenericNullPartitionResult<arrow::compute::internal::CompressedChunkLocation>
 >*, int64_t)::<lambda(arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*)>&, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*> (__fn=...) at 
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/invoke.h:113
   #10 
std::_Function_handler<void(arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*), 
arrow::compute::internal::(anonymous 
namespace)::TableSorter::MergeInternal<arrow::Int64Type>(std::vector<arrow::compute::internal::GenericNullPartitionResult<arrow::compute::internal::CompressedChunkLocation>
 >*, int64_t)::<lambda(arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*)> >::_M_invoke(const 
std::_Any_data &, arrow::compute::internal::CompressedChunkLocation *&&, 
arrow::compute::internal::CompressedChunkLocation *&&, 
arrow::compute::internal::CompressedChunkLocation *&&, 
arrow::compute::internal::CompressedChunkLocation *&&) (
       __functor=..., __args#0=<optimized out>, __args#1=<optimized out>, 
__args#2=<optimized out>, __args#3=<optimized out>)
       at 
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/std_function.h:292
   #11 0x00007fca8a0c43b3 in 
std::function<void(arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*, 
arrow::compute::internal::CompressedChunkLocation*)>::operator() (
       this=this@entry=0x7ffc548cab78, __args#0=<optimized out>, 
__args#1=<optimized out>, __args#2=<optimized out>, 
       __args#3=<optimized out>) at 
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/std_function.h:593
   #12 0x00007fca8a0cb247 in arrow::compute::internal::GenericMergeImpl<arrow::compute::internal::CompressedChunkLocation, arrow::compute::internal::GenericNullPartitionResult<arrow::compute::internal::CompressedChunkLocation> >::MergeNullsAtEnd (this=0x7ffc548cab50,
       left=..., right=..., null_count=0)
       at /usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort_internal.h:441
   #13 0x00007fca8a0cb53a in arrow::compute::internal::GenericMergeImpl<arrow::compute::internal::CompressedChunkLocation, arrow::compute::internal::GenericNullPartitionResult<arrow::compute::internal::CompressedChunkLocation> >::Merge (this=this@entry=0x7ffc548cab50,
       left=..., right=..., null_count=null_count@entry=0)
       at /usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort_internal.h:373
   #14 0x00007fca8a0bd296 in arrow::compute::internal::(anonymous namespace)::TableSorter::MergeInternal<arrow::Int64Type> (
       this=0x7ffc548cae70, sorted=0x7ffc548cac90, null_count=0)
       at /usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort.cc:770
   #15 0x00007fca8a0c0a61 in Visitor::Visit (this=0x7ffc548cacb0, type=...)
       at /usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort.cc:721
   #16 arrow::VisitTypeInline<arrow::compute::internal::(anonymous namespace)::TableSorter::SortInternal()::Visitor> (type=...,
       visitor=visitor@entry=0x7ffc548cacb0)
       at /usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/visit_type_inline.h:55
   #17 0x00007fca8a0c1368 in arrow::compute::internal::(anonymous namespace)::TableSorter::SortInternal (this=this@entry=0x7ffc548cae70)
       at /usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort.cc:731
   #18 0x00007fca8a0c1755 in arrow::compute::internal::(anonymous namespace)::TableSorter::Sort (this=this@entry=0x7ffc548cae70)
       at /usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort.cc:648
   #19 0x00007fca8a0c1aa9 in arrow::compute::internal::(anonymous namespace)::SortIndicesMetaFunction::SortIndices (
       this=this@entry=0x55dac2665fc0, table=..., options=..., ctx=ctx@entry=0x7ffc548cb0a0)
       at /usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort.cc:1070
   #20 0x00007fca8a0c1daa in arrow::compute::internal::(anonymous namespace)::SortIndicesMetaFunction::ExecuteImpl (this=0x55dac2665fc0,
       args=..., options=0x55dac2b9c0e0, ctx=0x7ffc548cb0a0)
       at /usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/kernels/vector_sort.cc:941
   #21 0x00007fca8b5f3221 in arrow::compute::MetaFunction::Execute (this=0x55dac2665fc0,
       args=std::vector of length 1, capacity 1 = {...}, options=0x55dac2b9c0e0, ctx=0x7ffc548cb0a0)
       at /usr/src/debug/dev-libs/apache-arrow-21.0.0/apache-arrow-21.0.0/cpp/src/arrow/compute/function.cc:483
   #22 0x00007fca836abf45 in __pyx_pf_7pyarrow_8_compute_8Function_6call (__pyx_v_self=0x7fca83593670, __pyx_v_args=<optimized out>,
       __pyx_v_options=<optimized out>, __pyx_v_memory_pool=<optimized out>, __pyx_v_length=<optimized out>)
       at /usr/src/debug/dev-python/pyarrow-21.0.0/apache-arrow-21.0.0/python-python3_12/build0/temp.linux-x86_64-cpython-312/_compute.cpp:19602
   #23 __pyx_pw_7pyarrow_8_compute_8Function_7call (__pyx_v_self=<pyarrow._compute.MetaFunction at remote 0x7fca83593670>,
       __pyx_args=<optimized out>, __pyx_nargs=<optimized out>, __pyx_kwds=<optimized out>)
       at /usr/src/debug/dev-python/pyarrow-21.0.0/apache-arrow-21.0.0/python-python3_12/build0/temp.linux-x86_64-cpython-312/_compute.cpp:19389
   #24 0x00007fca8da1817e in PyObject_Vectorcall () from /usr/lib64/libpython3.12.so.1.0
   #25 0x00007fca8da6dc26 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.12.so.1.0
   #26 0x00007fca8da1801b in PyObject_VectorcallMethod () from /usr/lib64/libpython3.12.so.1.0
   #27 0x00007fca8c3e65f1 in __pyx_pf_7pyarrow_3lib_8_Tabular_33sort_by (__pyx_v_self=0x7fca7bfcc130,
       __pyx_v_sorting=[('date', 'ascending'), ('variable', 'ascending')], __pyx_v_kwargs=Python Exception <class 'gdb.error'>: There is no member named dk_nentries.
   )
       at /usr/src/debug/dev-python/pyarrow-21.0.0/apache-arrow-21.0.0/python-python3_12/build0/temp.linux-x86_64-cpython-312/lib.cpp:168993
   #28 __pyx_pw_7pyarrow_3lib_8_Tabular_34sort_by (__pyx_v_self=<pyarrow.lib.Table at remote 0x7fca7bfcc130>,
       __pyx_args=<optimized out>, __pyx_nargs=<optimized out>, __pyx_kwds=<optimized out>)
       at /usr/src/debug/dev-python/pyarrow-21.0.0/apache-arrow-21.0.0/python-python3_12/build0/temp.linux-x86_64-cpython-312/lib.cpp:168827
   #29 0x00007fca8da1817e in PyObject_Vectorcall () from /usr/lib64/libpython3.12.so.1.0
   #30 0x00007fca8da6dc26 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.12.so.1.0
   #31 0x00007fca8db1bb5a in PyEval_EvalCode () from /usr/lib64/libpython3.12.so.1.0
   #32 0x00007fca8db40630 in ?? () from /usr/lib64/libpython3.12.so.1.0
   #33 0x00007fca8db405a7 in ?? () from /usr/lib64/libpython3.12.so.1.0
   #34 0x00007fca8db41d51 in ?? () from /usr/lib64/libpython3.12.so.1.0
   #35 0x00007fca8db41af8 in _PyRun_SimpleFileObject () from /usr/lib64/libpython3.12.so.1.0
   #36 0x00007fca8db41918 in _PyRun_AnyFileObject () from /usr/lib64/libpython3.12.so.1.0
   #37 0x00007fca8db4da25 in Py_RunMain () from /usr/lib64/libpython3.12.so.1.0
   #38 0x00007fca8db4d40a in Py_BytesMain () from /usr/lib64/libpython3.12.so.1.0
   #39 0x00007fca8d6294ae in ?? () from /usr/lib64/libc.so.6
   #40 0x00007fca8d629569 in __libc_start_main () from /usr/lib64/libc.so.6
   #41 0x000055daabdc2095 in _start ()
   ```
   
   This is Gentoo Linux amd64. Curiously enough, I can't reproduce this with PyArrow installed from the PyPI wheels.
   
   Build logs for the build I used to obtain the backtrace:
   - C++: [dev-libs:apache-arrow-21.0.0:20250801-133657.log](https://github.com/user-attachments/files/21557832/dev-libs.apache-arrow-21.0.0.20250801-133657.log)
   - Python: [dev-python:pyarrow-21.0.0:20250801-134715.log](https://github.com/user-attachments/files/21557838/dev-python.pyarrow-21.0.0.20250801-134715.log)
   
   ### Component(s)
   
   Python, C++

