zhiqiang-hhhh opened a new pull request, #40495: URL: https://github.com/apache/doris/pull/40495
We have dead lock when submit scanner to scheduler failed. pstack looks like ```txt Thread 2012 (Thread 0x7f87363fb700 (LWP 4179707) "Pipe_normal [wo"): #0 0x00007f8b8f3dc82d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007f8b8f3d5ad9 in pthread_mutex_lock () from /lib64/libpthread.so.0 #2 0x000055b20f333e7a in __gthread_mutex_lock (__mutex=0x7f8733d960a8) at /mnt/disk1/hezhiqiang/toolchains/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default .h:749 #3 std::mutex::lock (this=0x7f8733d960a8) at /mnt/disk1/hezhiqiang/toolchains/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:100 #4 std::lock_guard<std::mutex>::lock_guard (__m=..., this=<optimized out>) at /mnt/disk1/hezhiqiang/toolchains/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:229 #5 doris::vectorized::ScannerContext::append_block_to_queue (this=<optimized out>, scan_task=...) at /mnt/disk1/hezhiqiang/doris/be/src/vec/exec/scan/scanner_context.cpp:234 #6 0x000055b20f32c0f9 in doris::vectorized::ScannerScheduler::submit (this=<optimized out>, ctx=..., scan_task=...) at /mnt/disk1/hezhiqiang/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:209 #7 0x000055b20f3338fc in doris::vectorized::ScannerContext::submit_scan_task (this=this@entry=0x7f8733d96010, scan_task=...) at /mnt/disk1/hezhiqiang/doris/be/src/vec/exec/scan/scanner_context.cpp:217 #8 0x000055b20f3346cd in doris::vectorized::ScannerContext::get_block_from_queue (this=0x7f8733d96010, state=<optimized out>, block=0x7f871f728de0, eos=0x7f871abce470, id=<optimized out>) at /mnt/disk1/hezhiqiang/doris/be/src/vec/exec/scan/scanner_context.cpp:290 #9 0x000055b214cb4f13 in doris::pipeline::ScanOperatorX<doris::pipeline::OlapScanLocalState>::get_block (this=<optimized out>, state=0x7f872f0eb400, block=0x7f8b8f3dc82d <__lll_lock_wait+29>, eos=0x7f871abce470) at /mnt/disk1/hezhiqiang/doris/be/src/pipeline/exec/scan_operator.cpp:1292 #10 0x000055b2142b5772 in doris::pipeline::ScanOperatorX<doris::pipeline::OlapScanLocalState>::get_block_after_projects (this=0x80, state=0x0, block=0x7f8b8f3dc82d <__lll_lock_wait+29>, eos=0x7f8733d960a8) at /mnt/disk1/hezhiqiang/doris/be/src/pipeline/exec/scan_operator.h:363 #11 0x000055b2142e7880 in doris::pipeline::StatefulOperatorX<doris::pipeline::StreamingAggLocalState>::get_block (this=0x7f871f9bee00, state=0x7f872f0eb400, block=0x7f8716d49060, eos=0x7f87363f4937) at /mnt/disk1/hezhiqiang/doris/be/src/pipeline/exec/operator.cpp:587 ``` Deallock happens with following ```cpp Status ScannerContext::get_block_from_queue { std::unique_lock l(_transfer_lock); ... if (scan_task->is_eos()) { ... } else { // resubmit current running scanner to read the next block submit_scan_task(scan_task); } } ScannerContext::submit_scan_task(std::shared_ptr<ScanTask> scan_task) { _scanner_scheduler->submit(shared_from_this(), scan_task); } void ScannerScheduler::submit(std::shared_ptr<ScannerContext> ctx, std::shared_ptr<ScanTask> scan_task) { ... if (auto ret = sumbit_task(); !ret) { scan_task->set_status(Status::InternalError( "Failed to submit scanner to scanner pool reason:" + std::string(ret.msg()) + "|type:" + std::to_string(type))); ctx->append_block_to_queue(scan_task); return; } } void ScannerContext::append_block_to_queue(std::shared_ptr<ScanTask> scan_task) { ... std::lock_guard<std::mutex> l(_transfer_lock); ... } ``` Since mutex in cpp is not re-enterable, so the scanner thread will deadlock with itself. This pr fix the problem by making `ScannerScheduler::submit` return a Status instead of doing append failed task to the ScannerContext. The caller itself will decide where resubmit the scanner or just abort the execution of the query. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org