zclllyybb opened a new pull request, #61782:
URL: https://github.com/apache/doris/pull/61782

   The callback's call() method may reuse the callback object (e.g., in 
vdata_stream_sender.h get_send_callback()), triggering a new RPC that mutates 
response_ and cntl_. If AutoReleaseClosure::Run() invokes call() before 
checking cntl_->Failed() or response_->status(), it reads the NEW RPC's state 
instead of the ORIGINAL RPC's result, which can crash like this:
   ```
   *** SIGSEGV address not mapped to object (@0x0) received by PID 238162 (TID 
240463 OR 0xfffa2c9898e0) from PID 0; stack trace: ***
    0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, 
siginfo_t*, void*) at 
/home/zcp/repo_center/doris_release/doris/be/src/common/signal_handler.h:421
    1# os::Linux::chained_handler(int, siginfo_t*, void*) in 
/opt/module/doris/java8/jre/lib/aarch64/server/libjvm.so
    2# JVM_handle_linux_signal in 
/opt/module/doris/java8/jre/lib/aarch64/server/libjvm.so
    3# signalHandler(int, siginfo_t*, void*) in 
/opt/module/doris/java8/jre/lib/aarch64/server/libjvm.so
    4# 0x0000FFFF0AB107C0 in linux-vdso.so.1
    5# doris::Status doris::Status::create<true>(doris::PStatus const&) at 
/home/zcp/repo_center/doris_release/doris/be/src/common/status.h:398
    6# void doris::AutoReleaseClosure<doris::PTransmitDataParams, 
doris::pipeline::ExchangeSendCallback<doris::PTransmitDataResult> 
>::_process_status<doris::PTransmitDataResult>(doris::PTransmitDataResult*) at 
/home/zcp/repo_center/doris_release/doris/be/src/util/ref_count_closure.h:128
    7# doris::AutoReleaseClosure<doris::PTransmitDataParams, 
doris::pipeline::ExchangeSendCallback<doris::PTransmitDataResult> >::Run() at 
/home/zcp/repo_center/doris_release/doris/be/src/util/ref_count_closure.h:102
    8# brpc::Controller::EndRPC(brpc::Controller::CompletionInfo const&) in 
/opt/module/doris/be/lib/doris_be
    9# brpc::policy::ProcessRpcResponse(brpc::InputMessageBase*) in 
/opt/module/doris/be/lib/doris_be
   10# brpc::ProcessInputMessage(void*) in /opt/module/doris/be/lib/doris_be
   11# bthread::TaskGroup::task_runner(long) in 
/opt/module/doris/be/lib/doris_be
   12# bthread_make_fcontext in /opt/module/doris/be/lib/doris_be
   ```
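
The ordering hazard described above can be illustrated with a minimal, single-threaded sketch. It is not the actual patch and does not reproduce the cross-thread race: `FakeController`, `FakeResponse`, `ReusableCallback`, `run_call_first` and `run_check_first` are hypothetical stand-ins, and the only assumption carried over from the description is that call() may start a new RPC that resets the shared controller and response.

```cpp
#include <cassert>

// Hypothetical stand-ins for the brpc controller and the protobuf response.
struct FakeController {
    bool failed = false;
};

struct FakeResponse {
    int status_code = 0;  // 0 means OK in this sketch
};

// Hypothetical reusable callback: call() hands the object back to the sender,
// which may immediately start the next RPC and reset the shared state.
struct ReusableCallback {
    FakeController cntl;
    FakeResponse response;
    int calls = 0;

    void call() {
        ++calls;
        cntl.failed = false;       // state now belongs to the next RPC
        response.status_code = 0;  // the original result is gone
    }
};

// What the closure observed about the RPC that just completed.
struct Observed {
    bool rpc_failed;
    int status_code;
};

// Problematic ordering: call() runs first, so the checks read the new state.
inline Observed run_call_first(ReusableCallback& cb) {
    cb.call();
    return {cb.cntl.failed, cb.response.status_code};
}

// Safer ordering: snapshot the original result, then invoke call().
inline Observed run_check_first(ReusableCallback& cb) {
    Observed seen{cb.cntl.failed, cb.response.status_code};
    cb.call();
    return seen;
}

// Usage check: the same failed RPC is hidden by one ordering and seen by the other.
inline void usage_example() {
    ReusableCallback a;
    a.cntl.failed = true;
    a.response.status_code = 7;
    assert(!run_call_first(a).rpc_failed);  // failure is lost

    ReusableCallback b;
    b.cntl.failed = true;
    b.response.status_code = 7;
    assert(run_check_first(b).rpc_failed);  // failure is observed
}
```

In this sketch the failure is only hidden; in the real closure the reused response_ may also be written concurrently by the new RPC, which is consistent with the crash inside Status::create shown in the trace.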
   
   We confirmed that the data race is real by adding temporary log checks, 
which have since been removed:
   ```
   F20260325 21:46:58.465230 3453395 brpc_closure.h:116] Check failed: 
_debug_generation_at_construction == current_gen (2 vs. 3) RACE DETECTED: 
AutoReleaseClosure response_ was reused by a new RPC (generation changed from 2 
to 3) while still in Run(). The old closure is about to read 
response_->status() but the new RPC may be concurrently writing to the same 
response_ object.
   ```
   
   We also added BE unit tests that pass only WITH this patch.
   Before the fix:
   ```
   [----------] 7 tests from ExchangeSinkTest (5 ms total)
   
   [----------] Global test environment tear-down
   [==========] 7 tests from 1 test suite ran. (5 ms total)
   [  PASSED  ] 4 tests.
   [  FAILED  ] 3 tests, listed below:
   [  FAILED  ] ExchangeSinkTest.test_closure_call_must_not_corrupt_status_check
   [  FAILED  ] ExchangeSinkTest.test_closure_call_must_not_hide_error_status
   [  FAILED  ] ExchangeSinkTest.test_closure_call_must_not_hide_rpc_failure
   ```
   
   After the fix:
   ```
   [----------] 7 tests from ExchangeSinkTest (4 ms total)
   
   [----------] Global test environment tear-down
   [==========] 7 tests from 1 test suite ran. (5 ms total)
   [  PASSED  ] 7 tests.
   ```

