Mryange opened a new pull request, #57888:
URL: https://github.com/apache/doris/pull/57888
### What problem does this PR solve?
We previously had a crash. The cause is that we should not access the
request after calling add_block(...) because add_block may enqueue a closure
that runs on another thread and frees the request
```
==730145==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x7be1efd803a0 at pc 0x556b38d9d625 bp 0x7b16bf0193f0 sp 0x7b16bf0193e8
READ of size 4 at 0x7be1efd803a0 thread T1559
#0 0x556b38d9d624 in
google::protobuf::internal::RepeatedPtrFieldBase::size() const
/home/zcp/repo_center/doris_master/doris/thirdparty/installed/include/google/protobuf/repeated_ptr_field.h:185:29
#1 0x556b408ab062 in
google::protobuf::RepeatedPtrField<doris::PBlock>::size() const
/home/zcp/repo_center/doris_master/doris/thirdparty/installed/include/google/protobuf/repeated_ptr_field.h:1248:32
#2 0x556b408aaff4 in doris::PTransmitDataParams::_internal_blocks_size()
const
/home/zcp/repo_center/doris_master/doris/be/../gensrc/build/gen_cpp/internal_service.pb.h:32149:25
#3 0x556b4089731c in doris::PTransmitDataParams::blocks_size() const
/home/zcp/repo_center/doris_master/doris/be/../gensrc/build/gen_cpp/internal_service.pb.h:32152:10
#4 0x556b60a83c17 in
doris::vectorized::VDataStreamMgr::transmit_block(doris::PTransmitDataParams
const*, google::protobuf::Closure**, long)
/home/zcp/repo_center/doris_master/doris/be/src/vec/runtime/vdata_stream_mgr.cpp:150:38
#5 0x556b407f7408 in
doris::PInternalService::_transmit_block(google::protobuf::RpcController*,
doris::PTransmitDataParams const*, doris::PTransmitDataResult*,
google::protobuf::Closure*, doris::Status const&, long)
/home/zcp/repo_center/doris_master/doris/be/src/service/internal_service.cpp:1673:40
#6 0x556b407f52bb in
doris::PInternalService::transmit_block(google::protobuf::RpcController*,
doris::PTransmitDataParams const*, doris::PTransmitDataResult*,
google::protobuf::Closure*)
/home/zcp/repo_center/doris_master/doris/be/src/service/internal_service.cpp:1610:9
#7 0x556b43fceba2 in
doris::PBackendService::CallMethod(google::protobuf::MethodDescriptor const*,
google::protobuf::RpcController*, google::protobuf::Message const*,
google::protobuf::Message*, google::protobuf::Closure*)
/home/zcp/repo_center/doris_master/doris/gensrc/build/gen_cpp/internal_service.pb.cc:49452:7
#8 0x556b6736273e in
brpc::policy::ProcessRpcRequest(brpc::InputMessageBase*)
(/mnt/hdd01/selectdb-cloud-chaos/cluster0/be/lib/doris_be+0x770bd73e)
#9 0x556b67357426 in brpc::ProcessInputMessage(void*)
(/mnt/hdd01/selectdb-cloud-chaos/cluster0/be/lib/doris_be+0x770b2426)
#10 0x556b67357f20 in
brpc::InputMessenger::InputMessageClosure::~InputMessageClosure()
(/mnt/hdd01/selectdb-cloud-chaos/cluster0/be/lib/doris_be+0x770b2f20)
#11 0x556b673588dd in brpc::InputMessenger::OnNewMessages(brpc::Socket*)
(/mnt/hdd01/selectdb-cloud-chaos/cluster0/be/lib/doris_be+0x770b38dd)
#12 0x556b674a0adc in brpc::Socket::ProcessEvent(void*)
(/mnt/hdd01/selectdb-cloud-chaos/cluster0/be/lib/doris_be+0x771fbadc)
#13 0x556b672e0f76 in bthread::TaskGroup::task_runner(long)
(/mnt/hdd01/selectdb-cloud-chaos/cluster0/be/lib/doris_be+0x7703bf76)
#14 0x556b672cbbe0 in bthread_make_fcontext
(/mnt/hdd01/selectdb-cloud-chaos/cluster0/be/lib/doris_be+0x77026be0)
```
```
Note: The done pointer will be saved in add_block and may be called
in another thread via done->Run().
For example, when blocks_size == 1, the process is as follows:
transmit_block (i=0)
└─> recvr->add_block(..., done, ...) // Pass done
└─> SenderQueue::add_block
└─> _pending_closures.push(done) // done is saved
get_batch() [another thread]
└─> closure_pair.first->Run() // ⚠️ done->Run() is called
└─> brpc releases request and response
transmit_block (i=1) [original thread continues]
└─> request->blocks_size() // ⚠️ request has already been
released!
At this point, a use-after-free issue occurs.
TODO: We should consider refactoring this part because add_block may
release the request.
We should not access the request after calling add_block.
```
### Release note
None
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
- [ ] Regression test
- [ ] Unit Test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [ ] Previous test can cover this change.
- [ ] No code files have been changed.
- [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
- [ ] No.
- [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
- [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should
merge into -->
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]