gitmodimo opened a new issue, #47248:
URL: https://github.com/apache/arrow/issues/47248
### Describe the bug, including details regarding any error messages, version, and platform.
While developing custom Acero nodes, I found that the exec plan does not handle errors properly. In a simple two-node scenario (source + sink):

1. A deadlock occurs when `InputReceived`, called by the source on the sink, returns an error (i.e. any status other than `Status::OK()`). In this case the source node does not complete its operation and the plan never finishes.
2. A use-after-free occurs when `StopProducing` in the sink node returns an error. It looks like the `StopProducingErrorReporter` task gets posted to an `async_scheduler` that has already finished and been destroyed, here:
https://github.com/apache/arrow/blob/87dca7d8320b549ad6ea17d84ef6c80360ef8c33/cpp/src/arrow/acero/exec_plan.cc#L234
```
Note: Google Test filter = ExecPlan/ExecPlanErrorReporting.StopProducing/4
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from ExecPlan/ExecPlanErrorReporting
[ RUN ] ExecPlan/ExecPlanErrorReporting.StopProducing/4
=================================================================
==4161531==ERROR: AddressSanitizer: heap-use-after-free on address 0x50b000084a90 at pc 0x7d3d669b8bc5 bp 0x7ffe22cff750 sp 0x7ffe22cff740
READ of size 8 at 0x50b000084a90 thread T0
#0 0x7d3d669b8bc4 in AddSimpleTask<arrow::acero::(anonymous namespace)::ExecPlanImpl::StopProducingImpl<__gnu_cxx::__normal_iterator<arrow::acero::ExecNode**, std::vector<arrow::acero::ExecNode*> > >(__gnu_cxx::__normal_iterator<arrow::acero::ExecNode**, std::vector<arrow::acero::ExecNode*> >, __gnu_cxx::__normal_iterator<arrow::acero::ExecNode**, std::vector<arrow::acero::ExecNode*> >)::<lambda()> > /data/home/gitmodimo/arrow/cpp/src/arrow/util/async_util.h:171
#1 0x7d3d669b8bc4 in StopProducingImpl<__gnu_cxx::__normal_iterator<arrow::acero::ExecNode**, std::vector<arrow::acero::ExecNode*> > > /data/home/gitmodimo/arrow/cpp/src/arrow/acero/exec_plan.cc:234
#2 0x7d3d669b8e01 in operator() /data/home/gitmodimo/arrow/cpp/src/arrow/acero/exec_plan.cc:218
#3 0x7d3d669b8e01 in __invoke_impl<void, arrow::acero::(anonymous namespace)::ExecPlanImpl::StopProducing()::<lambda()>&> /usr/include/c++/13/bits/invoke.h:61
#4 0x7d3d669b8e01 in __invoke_r<void, arrow::acero::(anonymous namespace)::ExecPlanImpl::StopProducing()::<lambda()>&> /usr/include/c++/13/bits/invoke.h:111
#5 0x7d3d669b8e01 in _M_invoke /usr/include/c++/13/bits/std_function.h:290
#6 0x7d3d66c6f751 in std::function<void ()>::operator()() const /usr/include/c++/13/bits/std_function.h:591
#7 0x7d3d66c6f751 in arrow::acero::TaskSchedulerImpl::Abort(std::function<void ()>) /data/home/gitmodimo/arrow/cpp/src/arrow/acero/task_util.cc:436
#8 0x7d3d669bcf92 in StopProducing /data/home/gitmodimo/arrow/cpp/src/arrow/acero/exec_plan.cc:217
#9 0x6481a74c33c3 in arrow::acero::ExecPlanErrorReporting_StopProducing_Test::TestBody() /data/home/gitmodimo/arrow/cpp/src/arrow/acero/plan_test.cc:1927
#10 0x7d3d67baafce in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /data/home/gitmodimo/arrow/cpp/build/_deps/googletest-src/googletest/src/gtest.cc:2653
#11 0x7d3d67baafce in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /data/home/gitmodimo/arrow/cpp/build/_deps/googletest-src/googletest/src/gtest.cc:2689
#12 0x7d3d67b96455 in testing::Test::Run() /data/home/gitmodimo/arrow/cpp/build/_deps/googletest-src/googletest/src/gtest.cc:2728
#13 0x7d3d67b96455 in testing::Test::Run() /data/home/gitmodimo/arrow/cpp/build/_deps/googletest-src/googletest/src/gtest.cc:2718
#14 0x7d3d67b965e4 in testing::TestInfo::Run() /data/home/gitmodimo/arrow/cpp/build/_deps/googletest-src/googletest/src/gtest.cc:2874
#15 0x7d3d67b9678e in testing::TestSuite::Run() /data/home/gitmodimo/arrow/cpp/build/_deps/googletest-src/googletest/src/gtest.cc:3052
#16 0x7d3d67b9678e in testing::TestSuite::Run() /data/home/gitmodimo/arrow/cpp/build/_deps/googletest-src/googletest/src/gtest.cc:3006
#17 0x7d3d67b9f653 in testing::internal::UnitTestImpl::RunAllTests() /data/home/gitmodimo/arrow/cpp/build/_deps/googletest-src/googletest/src/gtest.cc:6004
#18 0x7d3d67bab6a6 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /data/home/gitmodimo/arrow/cpp/build/_deps/googletest-src/googletest/src/gtest.cc:2653
#19 0x7d3d67bab6a6 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /data/home/gitmodimo/arrow/cpp/build/_deps/googletest-src/googletest/src/gtest.cc:2689
#20 0x7d3d67b96949 in testing::UnitTest::Run() /data/home/gitmodimo/arrow/cpp/build/_deps/googletest-src/googletest/src/gtest.cc:5583
#21 0x6481a73fd0a6 in RUN_ALL_TESTS() /data/home/gitmodimo/arrow/cpp/build/_deps/googletest-src/googletest/include/gtest/gtest.h:2334
#22 0x6481a73fd0a6 in main /data/home/gitmodimo/arrow/cpp/src/arrow/compute/test_env.cc:50
#23 0x7d3d5f82a1c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#24 0x7d3d5f82a28a in __libc_start_main_impl ../csu/libc-start.c:360
#25 0x6481a7407594 in _start (/data/home/gitmodimo/arrow/cpp/build/relwithdebinfo/arrow-acero-plan-test+0x80594) (BuildId: b680597df36b43e27569204de05aa316f5e5fec3)

0x50b000084a90 is located 0 bytes inside of 112-byte region [0x50b000084a90,0x50b000084b00)
freed by thread T1 here:
#0 0x7d3d674ff5e8 in operator delete(void*, unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:164
#1 0x7d3d62505d03 in ~AsyncTaskSchedulerImpl /data/home/gitmodimo/arrow/cpp/src/arrow/util/async_util.cc:163
#2 0x7d3d62505d03 in operator() /usr/include/c++/13/bits/unique_ptr.h:99
#3 0x7d3d62505d03 in ~unique_ptr /usr/include/c++/13/bits/unique_ptr.h:404
#4 0x7d3d62505d03 in ~<lambda> /data/home/gitmodimo/arrow/cpp/src/arrow/util/async_util.cc:472
#5 0x7d3d62505d03 in ~ThenOnComplete /data/home/gitmodimo/arrow/cpp/src/arrow/util/future.h:518
#6 0x7d3d62505d03 in ~Callback /data/home/gitmodimo/arrow/cpp/src/arrow/util/future.h:440
#7 0x7d3d62505d03 in ~FnImpl /data/home/gitmodimo/arrow/cpp/src/arrow/util/functional.h:150
#8 0x7d3d62505d03 in ~FnImpl /data/home/gitmodimo/arrow/cpp/src/arrow/util/functional.h:150

previously allocated by thread T0 here:
#0 0x7d3d674fe548 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:95
#1 0x7d3d625182de in make_unique<arrow::util::(anonymous namespace)::AsyncTaskSchedulerImpl, arrow::StopToken, arrow::internal::FnOnce<void(const arrow::Status&)> > /usr/include/c++/13/bits/unique_ptr.h:1070
#2 0x7d3d625182de in arrow::util::AsyncTaskScheduler::Make(arrow::internal::FnOnce<arrow::Status (arrow::util::AsyncTaskScheduler*)>, arrow::internal::FnOnce<void (arrow::Status const&)>, arrow::StopToken) /data/home/gitmodimo/arrow/cpp/src/arrow/util/async_util.cc:468

Thread T1 created by T0 here:
#0 0x7d3d674f51f9 in pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:245
#1 0x7d3d5fceceb0 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xeceb0) (BuildId: ca77dae775ec87540acd7218fa990c40d1c94ab1)
#2 0x7d3d6270aac2 in thread<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> > /usr/include/c++/13/bits/std_thread.h:164
#3 0x7d3d6270aac2 in arrow::internal::ThreadPool::LaunchWorkersUnlocked(int) /data/home/gitmodimo/arrow/cpp/src/arrow/util/thread_pool.cc:643

SUMMARY: AddressSanitizer: heap-use-after-free /data/home/gitmodimo/arrow/cpp/src/arrow/util/async_util.h:171 in AddSimpleTask<arrow::acero::(anonymous namespace)::ExecPlanImpl::StopProducingImpl<__gnu_cxx::__normal_iterator<arrow::acero::ExecNode**, std::vector<arrow::acero::ExecNode*> > >(__gnu_cxx::__normal_iterator<arrow::acero::ExecNode**, std::vector<arrow::acero::ExecNode*> >, __gnu_cxx::__normal_iterator<arrow::acero::ExecNode**, std::vector<arrow::acero::ExecNode*> >)::<lambda()> >
Shadow bytes around the buggy address:
0x50b000084800: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
0x50b000084880: 00 00 00 00 00 00 00 00 00 00 00 00 00 fa fa fa
0x50b000084900: fa fa fa fa fa fa 00 00 00 00 00 00 00 00 00 00
0x50b000084980: 00 00 00 fa fa fa fa fa fa fa fa fa fd fd fd fd
0x50b000084a00: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
=>0x50b000084a80: fa fa[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd
0x50b000084b00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x50b000084b80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x50b000084c00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x50b000084c80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x50b000084d00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==4161531==ABORTING
```
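To make the deadlock in scenario 1 concrete, here is a deliberately simplified model in plain C++. It is not Acero's control flow: `Status`, `PlanModel`, `SinkInputReceived`, `RunSourceBuggy`, and `RunSourceFixed` are hypothetical stand-ins. The model plan counts as finished only once every node has reported completion or an error has been recorded. A source that simply returns when the sink rejects a batch does neither, so the completion condition is never met, which is what surfaces as a hang. The fixed variant forwards the error so the plan can finish.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for arrow::Status: an empty message means OK.
struct Status {
  std::string msg;
  bool ok() const { return msg.empty(); }
  static Status OK() { return {}; }
  static Status Invalid(std::string m) { return {std::move(m)}; }
};

// Hypothetical plan model: finished once all nodes are done or an error is recorded.
struct PlanModel {
  int total_nodes = 2;  // source + sink
  int finished_nodes = 0;
  Status error = Status::OK();

  void MarkFinished() { ++finished_nodes; }
  void RecordError(Status st) {
    if (error.ok()) error = std::move(st);  // keep the first error only
  }
  bool IsFinished() const { return !error.ok() || finished_nodes == total_nodes; }
};

// Hypothetical sink whose InputReceived always fails, as in the reported scenario.
Status SinkInputReceived(int /*batch_index*/) {
  return Status::Invalid("sink rejected batch");
}

// Buggy source: bails out on error without finishing or reporting to the plan.
void RunSourceBuggy(PlanModel& plan, int num_batches) {
  for (int i = 0; i < num_batches; ++i) {
    Status st = SinkInputReceived(i);
    if (!st.ok()) return;  // plan.IsFinished() can now never become true
  }
  plan.MarkFinished();
}

// Fixed source: forwards the error so the plan can complete.
void RunSourceFixed(PlanModel& plan, int num_batches) {
  for (int i = 0; i < num_batches; ++i) {
    Status st = SinkInputReceived(i);
    if (!st.ok()) {
      plan.RecordError(std::move(st));
      return;
    }
  }
  plan.MarkFinished();
}
```

Running both variants against fresh `PlanModel` instances leaves the buggy plan unfinished, while the fixed plan is finished and carries the sink's error. Whether Acero should propagate the error from the source, from the sink, or from the plan itself is a design question this model does not settle.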
I tried extending the lifetime of `async_scheduler`, and it does fix the problem, but the [fix seems a bit ugly](https://github.com/gitmodimo/arrow/commit/4fc235f86b6def734d8305fd121c54b28c339165). Note that this is not a complete fix, as some modules are not in sync with those changes; I am only using Acero and dataset.
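The trace shows the scheduler being freed on thread T1 when a future-completion callback that owns it through a `unique_ptr` is destroyed, while thread T0 later calls `AddSimpleTask` on it. The following toy model illustrates the lifetime-extension idea in standard C++. It is not the linked commit and not Arrow's `AsyncTaskScheduler`; `ToyScheduler`, `ToyPlan`, and `ReportStopProducingError` are hypothetical. The plan shares ownership through `std::shared_ptr`, so the completion path dropping its reference no longer frees memory that a late error report still touches, and once the scheduler has ended, a late task is rejected and the error is reported inline instead.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified scheduler (NOT arrow::util::AsyncTaskScheduler).
class ToyScheduler {
 public:
  // Queues a task; returns false if the scheduler has already ended.
  bool AddSimpleTask(std::function<void()> task) {
    std::lock_guard<std::mutex> lock(mu_);
    if (ended_) return false;
    pending_.push_back(std::move(task));
    return true;
  }

  // Marks the scheduler as ended, then runs whatever was queued before that.
  void RunAndEnd() {
    std::vector<std::function<void()>> tasks;
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks.swap(pending_);
      ended_ = true;
    }
    for (auto& t : tasks) t();
  }

  bool ended() const {
    std::lock_guard<std::mutex> lock(mu_);
    return ended_;
  }

 private:
  mutable std::mutex mu_;
  bool ended_ = false;
  std::vector<std::function<void()>> pending_;
};

// Hypothetical plan that co-owns the scheduler instead of holding a raw pointer.
struct ToyPlan {
  std::shared_ptr<ToyScheduler> scheduler = std::make_shared<ToyScheduler>();
  std::vector<std::string> reported_errors;

  // May be called after the scheduler has ended (the late StopProducing error path).
  void ReportStopProducingError(const std::string& msg) {
    bool queued = scheduler->AddSimpleTask([this, msg] { reported_errors.push_back(msg); });
    if (!queued) reported_errors.push_back(msg);  // scheduler ended: report inline
  }
};
```

In use, a second `shared_ptr` stands in for the completion callback: it runs and ends the scheduler, then resets. A subsequent `ReportStopProducingError` still operates on a live object, because the plan holds the last reference. The trade-off is that ownership is no longer obvious from the types alone, which may be part of why the real fix feels awkward.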
I created test cases for these two scenarios in my
[branch](https://github.com/gitmodimo/arrow/tree/ExecPlanErrorReporting):
1. ExecPlan/ExecPlanErrorReporting.Finish/1
2. ExecPlan/ExecPlanErrorReporting.StopProducing/4
### Component(s)
C++