liaoxin01 opened a new pull request, #64574:
URL: https://github.com/apache/doris/pull/64574

   ## Problem
   
   The tablet header lock (`_meta_lock`) is a `std::shared_mutex`, which under 
libstdc++ wraps `pthread_rwlock_t` and is **thread-affine**: unlocking it from 
an OS thread other than the one that acquired it is undefined behavior.
   
   In cloud mode the header **write** lock can be held across a **suspending** 
call. Concretely, when a query-driven rowset sync pulls an overlapping 
(compacted) rowset, `CloudTablet::add_rowsets` warms up the new remote file 
**while holding the write lock**; on a cold-restarted BE, resolving the storage 
vault / building the S3 client issues a meta-service RPC. The holding bthread 
suspends and may **migrate to another worker pthread**, so the matching unlock 
runs on a **different OS thread**. This corrupts the glibc rwlock (the 
write-locked bit is left set with no owner), **permanently wedging** the lock: 
all readers and writers on that tablet pile up and queries time out (~90s).
   
   Observed in graceful-restart tests: a tablet's header lock stuck with 
`active_writer=[none]` yet every `try_lock`/`try_lock_shared` failing for >70 
min, and the last writer's acquire OS-tid differing from its release OS-tid 
(proof of the cross-thread unlock).
   
   ## Fix
   
   Replace `_meta_lock` with `BthreadSharedMutex`, a port of libc++'s 
`std::shared_mutex` (the two-gate condition-variable algorithm) onto 
`bthread::Mutex` / `bthread::ConditionVariable`:
   
   - Ownership is an integer state guarded by a briefly held internal mutex and 
carries **no OS-thread identity**, so locking on one worker and unlocking on 
another after a bthread migration is well defined, and the permanent wedge can 
no longer happen.
   - Waiting blocks on a bthread condition variable, suspending the bthread 
instead of blocking the worker.
   - Writer-preferring; satisfies the C++ SharedMutex requirements, so it is a 
drop-in replacement that works with `std::unique_lock` / `std::shared_lock`.
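
   The properties listed above can be illustrated with a minimal sketch of the 
two-gate algorithm. This is not the code from this PR: the class name 
`TwoGateSharedMutex` and the helper `unlock_on_other_thread_then_relock` are 
hypothetical, and `std::mutex` / `std::condition_variable` stand in for 
`bthread::Mutex` / `bthread::ConditionVariable` so that the example builds 
without brpc. The state layout (one write-entered bit plus a reader count) 
follows the general shape of libc++'s algorithm.

```cpp
#include <climits>
#include <condition_variable>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical sketch: std::mutex / std::condition_variable are stand-ins for
// bthread::Mutex / bthread::ConditionVariable.
class TwoGateSharedMutex {
public:
    void lock() {
        std::unique_lock<std::mutex> lk(mu_);
        // Gate 1: wait until no other writer has entered.
        gate1_.wait(lk, [this] { return (state_ & kWriteEntered) == 0; });
        state_ |= kWriteEntered;  // new readers are now held back (writer-preferring)
        // Gate 2: wait for the readers that are already inside to drain.
        gate2_.wait(lk, [this] { return (state_ & kReaderMask) == 0; });
    }
    bool try_lock() {
        std::lock_guard<std::mutex> lk(mu_);
        if (state_ != 0) return false;
        state_ = kWriteEntered;
        return true;
    }
    void unlock() {
        // No owner identity is recorded, so any thread may perform the release.
        std::lock_guard<std::mutex> lk(mu_);
        state_ = 0;
        gate1_.notify_all();
    }
    void lock_shared() {
        std::unique_lock<std::mutex> lk(mu_);
        gate1_.wait(lk, [this] {
            return (state_ & kWriteEntered) == 0 &&
                   (state_ & kReaderMask) != kReaderMask;
        });
        ++state_;  // the reader count lives in the low bits
    }
    bool try_lock_shared() {
        std::lock_guard<std::mutex> lk(mu_);
        if ((state_ & kWriteEntered) != 0 ||
            (state_ & kReaderMask) == kReaderMask) {
            return false;
        }
        ++state_;
        return true;
    }
    void unlock_shared() {
        std::lock_guard<std::mutex> lk(mu_);
        --state_;
        const unsigned readers = state_ & kReaderMask;
        if ((state_ & kWriteEntered) != 0) {
            if (readers == 0) gate2_.notify_one();  // last reader admits the writer
        } else if (readers == kReaderMask - 1) {
            gate1_.notify_one();  // reader count was saturated; admit one waiter
        }
    }

private:
    static constexpr unsigned kWriteEntered = 1u << (sizeof(unsigned) * CHAR_BIT - 1);
    static constexpr unsigned kReaderMask = ~kWriteEntered;
    std::mutex mu_;  // held only briefly, always locked and unlocked by the same thread
    std::condition_variable gate1_;
    std::condition_variable gate2_;
    unsigned state_ = 0;
};

// Acquire the write lock on the calling thread, release it on another thread,
// then check that the lock is still fully usable.
bool unlock_on_other_thread_then_relock() {
    TwoGateSharedMutex m;
    m.lock();
    std::thread other([&m] { m.unlock(); });
    other.join();
    if (!m.try_lock_shared()) return false;
    const bool writer_blocked = !m.try_lock();  // a reader is inside
    m.unlock_shared();
    const bool writer_ok = m.try_lock();
    if (writer_ok) m.unlock();
    return writer_blocked && writer_ok;
}
```

   The internal `mu_` is always locked and unlocked within a single call, so 
the thread-affinity constraint applies only to that short critical section and 
never to the logical reader/writer ownership. With real bthread primitives the 
waits would suspend the bthread rather than block a worker pthread.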
   
   Only the tablet header lock is switched; unrelated `std::shared_mutex` 
members (e.g. `TabletMeta::_meta_lock`) are left untouched. Call sites that 
named the type explicitly are converted to class template argument deduction or 
to `BthreadSharedMutex`.
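
   As a rough illustration of the call-site conversion, the sketch below shows 
why class template argument deduction (C++17) keeps a guard declaration 
independent of the mutex type. `CountingSharedMutex`, `read_header` and 
`write_header` are hypothetical names for this example only. The stub does no 
real locking; it only exposes the SharedMutex interface so that the deduced 
guard type can be exercised.

```cpp
#include <mutex>
#include <shared_mutex>

// Hypothetical stub: exposes the SharedMutex interface and counts calls.
// It performs no real synchronization.
struct CountingSharedMutex {
    int shared_locks = 0;
    int exclusive_locks = 0;
    void lock() { ++exclusive_locks; }
    bool try_lock() { ++exclusive_locks; return true; }
    void unlock() {}
    void lock_shared() { ++shared_locks; }
    bool try_lock_shared() { ++shared_locks; return true; }
    void unlock_shared() {}
};

template <typename Mutex>
int read_header(Mutex& meta_lock, const int& version) {
    // Deduced as std::shared_lock<Mutex>; no mutex type is spelled out here.
    std::shared_lock rdlock(meta_lock);
    return version;
}

template <typename Mutex>
void write_header(Mutex& meta_lock, int& version, int new_version) {
    // Deduced as std::unique_lock<Mutex>.
    std::unique_lock wrlock(meta_lock);
    version = new_version;
}
```

   A declaration written as `std::shared_lock<std::shared_mutex>` stops 
compiling once the member type changes, whereas the deduced form follows the 
member's type automatically.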
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

