John Doe created HBASE-30213:
--------------------------------

             Summary: Spurious RegionTooBusyException due to non-atomic update 
of correlated fields heapSize/offHeapSize in ThreadSafeMemStoreSizing
                 Key: HBASE-30213
                 URL: https://issues.apache.org/jira/browse/HBASE-30213
             Project: HBase
          Issue Type: Bug
          Components: regionserver
            Reporter: John Doe


A multi-variable concurrency bug in ThreadSafeMemStoreSizing can cause write 
threads to observe a transiently inflated MemStore size and throw a spurious 
RegionTooBusyException immediately after a flush completes.

ThreadSafeMemStoreSizing maintains two semantically correlated 
AtomicLong fields, heapSize and offHeapSize, whose sum is used by 
HRegion.checkResources() to determine whether incoming writes should be 
rejected. During decrMemStoreSize(), these two fields are decremented by two 
separate addAndGet() calls with no common lock: offHeapSize is decremented 
first (line 59), heapSize second (line 60).
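
The update pattern described above can be sketched as follows. This is a 
simplified stand-in, not the HBase source: the class name SizingSketch and the 
method names incr, decr and total are hypothetical, and the real class may 
track more state than the two fields shown.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of the pattern described in this report.
final class SizingSketch {
    private final AtomicLong heapSize = new AtomicLong();
    private final AtomicLong offHeapSize = new AtomicLong();

    void incr(long heapDelta, long offHeapDelta) {
        heapSize.addAndGet(heapDelta);
        offHeapSize.addAndGet(offHeapDelta);
    }

    // Each addAndGet() is atomic on its own, but the pair is not atomic as a unit.
    void decr(long heapDelta, long offHeapDelta) {
        offHeapSize.addAndGet(-offHeapDelta); // first decrement
        // A reader running here sees the new offHeapSize with the old heapSize.
        heapSize.addAndGet(-heapDelta);       // second decrement
    }

    // Reads heapSize first, then offHeapSize, with no common lock.
    long total() {
        long heap = heapSize.get();
        long offHeap = offHeapSize.get();
        return heap + offHeap;
    }
}
```

When no reader interleaves with decr(), total() returns the expected sum; the 
problem only appears when a read lands in the marked window.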

A concurrent write RPC thread calling getMemStoreSize() reads heapSize first 
and offHeapSize second (line 53). If the read falls between the two decrements, 
it observes the stale pre-flush heapSize combined with the already-decremented 
offHeapSize, producing a sum that overestimates the true MemStore size by the 
full heapSizeDelta of the flush.
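
To make the interleaving concrete, here is a deterministic single-threaded 
replay of the schedule described above. It is only an illustration: the class 
TornReadReplay, the method replay and the sizes used are assumptions, and the 
steps that would run on two threads in a region server are simply executed in 
the problematic order.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical replay of the problematic schedule; not HBase code.
final class TornReadReplay {
    // Returns {observed, actual}: the sum a reader computes between the two
    // decrements, and the true sum once both decrements have been applied.
    static long[] replay(long heap, long offHeap, long heapDelta, long offHeapDelta) {
        AtomicLong heapSize = new AtomicLong(heap);
        AtomicLong offHeapSize = new AtomicLong(offHeap);

        // Flush thread: first decrement.
        offHeapSize.addAndGet(-offHeapDelta);

        // Write RPC thread: reads heapSize, then offHeapSize, in the gap.
        long observed = heapSize.get() + offHeapSize.get();

        // Flush thread: second decrement.
        heapSize.addAndGet(-heapDelta);

        long actual = heapSize.get() + offHeapSize.get();
        return new long[] {observed, actual};
    }
}
```

With illustrative values, replay(100, 50, 80, 40) yields an observed sum of 110 
against an actual sum of 30, so the overestimate equals the heap delta of 80.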

If this inflated sum exceeds blockingMemStoreSize, checkResources() incorrectly 
throws RegionTooBusyException (HRegion.java:5029), even though the MemStore is 
already safely below the threshold.
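
The effect on the resource check can be sketched as below. The shape of the 
check is an assumption based on the description above (reject when the observed 
size exceeds the limit); BusyCheckSketch, TooBusy and rejects are hypothetical 
names, and TooBusy stands in for RegionTooBusyException so that the example has 
no HBase dependency.

```java
// Hypothetical sketch of the threshold check; not the HRegion implementation.
final class BusyCheckSketch {
    // Stand-in for RegionTooBusyException.
    static final class TooBusy extends RuntimeException {
        TooBusy(String message) {
            super(message);
        }
    }

    // Assumed check: reject the write when the observed size exceeds the limit.
    static void checkResources(long observedMemStoreSize, long blockingMemStoreSize) {
        if (observedMemStoreSize > blockingMemStoreSize) {
            throw new TooBusy("Over memstore limit=" + blockingMemStoreSize
                + ", observed=" + observedMemStoreSize);
        }
    }

    // Convenience wrapper that reports whether the write would be rejected.
    static boolean rejects(long observedMemStoreSize, long blockingMemStoreSize) {
        try {
            checkResources(observedMemStoreSize, blockingMemStoreSize);
            return false;
        } catch (TooBusy e) {
            return true;
        }
    }
}
```

With an illustrative limit of 100, an inflated reading of 110 is rejected while 
the true post-flush size of 30 would have been accepted.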



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
