gaodayue opened a new issue #3706:
URL: https://github.com/apache/incubator-doris/issues/3706
**Describe the bug**
BE crashes with the following logs in be.out
```
start time: Tue May 19 18:15:02 CST 2020
*** Aborted at 1590397931 (unix time) try "date -d @1590397931" if you are
using GNU date ***
PC: @ 0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 23260 (TID 0x7f0f49140700) from PID 0;
stack trace: ***
@ 0x7f0faa6575e0 (unknown)
```
For unknown reason Glog fails to capture backtrace. Fortunately we have
coredump enabled and got the following backtrace
```
#0 0x0000000000000000 in ?? ()
#1 0x0000000000ff40e2 in doris::MemTracker::~MemTracker (this=0x2543c3040,
__in_chrg=<optimized out>) at
/home/gaodayue/doris-git/be/src/runtime/mem_tracker.cpp:194
#2 0x00000000015c79cb in operator() (this=<optimized out>,
__ptr=0x2543c3040) at /usr/local/include/c++/7.2.0/bits/unique_ptr.h:78
#3 ~unique_ptr (this=0x238bff9e0, __in_chrg=<optimized out>) at
/usr/local/include/c++/7.2.0/bits/unique_ptr.h:268
#4 doris::DataStreamSender::~DataStreamSender (this=0x238bff860,
__in_chrg=<optimized out>) at
/home/gaodayue/doris-git/be/src/runtime/data_stream_sender.cpp:430
#5 0x00000000015c7c61 in doris::DataStreamSender::~DataStreamSender
(this=0x238bff860, __in_chrg=<optimized out>) at
/home/gaodayue/doris-git/be/src/runtime/data_stream_sender.cpp:434
#6 0x000000000102d96e in checked_delete<doris::DataSink> (x=<optimized
out>) at
/home/gaodayue/doris-git/thirdparty/installed/include/boost/core/checked_delete.hpp:34
#7 ~scoped_ptr (this=0x6da945260, __in_chrg=<optimized out>) at
/home/gaodayue/doris-git/thirdparty/installed/include/boost/smart_ptr/scoped_ptr.hpp:89
#8 doris::PlanFragmentExecutor::~PlanFragmentExecutor (this=0x6da9450f0,
__in_chrg=<optimized out>) at
/home/gaodayue/doris-git/be/src/runtime/plan_fragment_executor.cpp:62
#9 0x0000000000fbbe2d in doris::FragmentExecState::~FragmentExecState
(this=0x6da945080, __in_chrg=<optimized out>) at
/home/gaodayue/doris-git/be/src/runtime/fragment_mgr.cpp:178
#10 0x0000000000fc34ea in std::_Sp_counted_ptr<doris::FragmentExecState*,
(__gnu_cxx::_Lock_policy)2>::_M_dispose (this=<optimized out>) at
/usr/local/include/c++/7.2.0/bits/shared_ptr_base.h:376
#11 0x0000000000d72f76 in
std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0xfa38b180)
at /usr/local/include/c++/7.2.0/bits/shared_ptr_base.h:154
#12 0x0000000000fc3dca in ~list3 (this=0x43930d9b0, __in_chrg=<optimized
out>) at
/home/gaodayue/doris-git/thirdparty/installed/include/boost/bind/bind.hpp:358
#13 ~bind_t (this=0x43930d9a0, __in_chrg=<optimized out>) at
/home/gaodayue/doris-git/thirdparty/installed/include/boost/bind/bind.hpp:1275
#14 manager (op=<optimized out>, out_buffer=..., in_buffer=...) at
/home/gaodayue/doris-git/thirdparty/installed/include/boost/function/function_base.hpp:358
#15 manager (op=<optimized out>, out_buffer=..., in_buffer=...) at
/home/gaodayue/doris-git/thirdparty/installed/include/boost/function/function_base.hpp:379
#16 boost::detail::function::functor_manager<boost::_bi::bind_t<void,
boost::_mfi::mf2<void, doris::FragmentMgr,
std::shared_ptr<doris::FragmentExecState>, std::function<void
(doris::PlanFragmentExecutor*)> >
, boost::_bi::list3<boost::_bi::value<doris::FragmentMgr*>,
boost::_bi::value<std::shared_ptr<doris::FragmentExecState> >,
boost::_bi::value<std::function<void (doris::PlanFragmentExecutor*)> > > >
>::manage(boost::detail::function::function_buffer const&,
boost::detail::function::function_buffer&,
boost::detail::function::functor_manager_operation_type) (in_buffer=...,
out_buffer=..., op=<optimized out>)
at
/home/gaodayue/doris-git/thirdparty/installed/include/boost/function/function_base.hpp:407
#17 0x0000000000ec23aa in clear (this=<optimized out>, functor=...) at
/home/gaodayue/doris-git/thirdparty/installed/include/boost/function/function_template.hpp:510
#18 clear (this=0x7f0f4913dfc0) at
/home/gaodayue/doris-git/thirdparty/installed/include/boost/function/function_template.hpp:868
#19 ~function0 (this=0x7f0f4913dfc0, __in_chrg=<optimized out>) at
/home/gaodayue/doris-git/thirdparty/installed/include/boost/function/function_template.hpp:752
#20 ~function (this=0x7f0f4913dfc0, __in_chrg=<optimized out>) at
/home/gaodayue/doris-git/thirdparty/installed/include/boost/function/function_template.hpp:1041
#21 doris::ThreadPool::work_thread (this=0x68d0b00, thread_id=<optimized
out>) at /home/gaodayue/doris-git/be/src/util/thread_pool.hpp:117
#22 0x0000000001a0adfd in thread_proxy ()
#23 0x00007f0faa64fe25 in start_thread () from /lib64/libpthread.so.0
#24 0x00007f0fa9a5934d in clone () from /lib64/libc.so.6
```
The program crashes when destructing DataStreamSender's MemTracker. Let's
look at its state
```
(gdb) f 1
(gdb) p this
$25 = (doris::MemTracker * const) 0x2543c3040
(gdb) p _all_trackers
$26 = std::vector of length 4, capacity 4 = {0x2543c3040, 0x181ea2b60,
0x12cdae000, 0x77f0d00}
```
From cores, we see that it has three parent trackers
* instance tracker 0x181ea2b60
* query tracker 0x12cdae000
* process tracker 0x77f0d00
After looking into the state of the these trackers, I'm pretty sure the
memory of instance and query tracker is corrupted.
For example
1) `_type` of instance tracker's `_consumption` field is invalid value
630994688.
```
(gdb) p *(((doris::MemTracker*) 0x181ea2b60)->_consumption)
$28 = {<doris::RuntimeProfile::Counter> = {_vptr.Counter = 0x406cadda0,
_value = {_value = 4273231}, _type = 630994688}, current_value_ = {_value = 0}}
```
2) `_child_trackers` of query tracker is corrupted
```
(gdb) p ((doris::MemTracker*) 0x12cdae000)->_child_trackers
$29 = std::__cxx11::list = {[0] = 0x197756b60, [1] = 0x0, [2] = Cannot
access memory at address 0x10
```
Since instance and query tracker belongs to RuntimeState, we then look at
the state of RuntimeState. It looks that it has already been destructed.
```
(gdb) p _runtime_state
$30 = {px = 0x56fdb8a00}
(gdb) p ((doris::RuntimeState*) 0x56fdb8a00)->_block_mgr
$31 = {px = 0x0, pn = {pi_ = 0x0}}
(gdb) p ((doris::RuntimeState*) 0x56fdb8a00)->_block_mgr2
$32 = {px = 0x0, pn = {pi_ = 0x0}}
```
So far the cause of problem is clear, DataSink uses MemTrackers from
RuntimeState but the latter is destructed before the former. There is one last
remaining question: What are the conditions that trigger the BUG?
We can get the fragment instance id 36bf368c3c0d4ce2-8034bef11abe15cf from
the coredump. Grepping be.INFO, we found that the fragment is cancelled due to
timeout.
```
I0525 17:11:51.110321 24030 internal_service.cpp:149] exec plan fragment,
fragment_instance_id=36bf368c3c0d4ce2-8034bef11abe15cf,
coord=TNetworkAddress(hostname=xx.xx.xx.xx port=9020), backend=13
I0525 17:11:51.110373 24030 plan_fragment_executor.cpp:79] Prepare():
query_id=36bf368c3c0d4ce2-8034bef11abe15b2
fragment_instance_id=36bf368c3c0d4ce2-8034bef11abe15cf backend_num=13
I0525 17:11:51.111650 23886 plan_fragment_executor.cpp:272] Open():
fragment_instance_id=36bf368c3c0d4ce2-8034bef11abe15cf
I0525 17:12:11.104353 24028 internal_service.cpp:167] cancel framgent,
fragment_instance_id=36bf368c3c0d4ce2-8034bef11abe15cf, reason: 3
I0525 17:12:11.104423 24028 plan_fragment_executor.cpp:536] cancel():
fragment_instance_id=36bf368c3c0d4ce2-8034bef11abe15cf
I0525 17:12:11.115552 23886 plan_fragment_executor.cpp:536] cancel():
fragment_instance_id=36bf368c3c0d4ce2-8034bef11abe15cf
I0525 17:12:11.116982 23886 plan_fragment_executor.cpp:594] Fragment
36bf368c3c0d4ce2-8034bef11abe15cf:(Active: 19s992ms, non-child: 0.00%)
```
In this case `PlanFragmentExecutor::open_internal` will return early at
https://github.com/apache/incubator-doris/blob/bc35f3a31f7004fd548a12c8dee3ab2e8c9e4f22/be/src/runtime/plan_fragment_executor.cpp#L288
Then RuntimeState will be deallocated before DataSink according their
declaration order.
```
boost::scoped_ptr<DataSink> _sink;
boost::scoped_ptr<RuntimeState> _runtime_state;
boost::scoped_ptr<RowBatch> _row_batch;
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]