qqvpp opened a new pull request, #7035:
URL: https://github.com/apache/hbase/pull/7035

   **Description**
   We have identified a bug in the `SimpleRegionNormalizer` logic that leads to 
incorrect region splits when region size information is missing. If the size 
cannot be determined for one or more regions (e.g. due to unavailable metrics 
from `RegionServers`), the average region size is calculated incorrectly. 
As a result, regions may be considered too large and get split 
unintentionally.
   
   **Observed Behavior:**
   
   When region size data is not available (e.g., `getRegionSizeMB()` returns 
-1), the computed average does not exclude those regions, so it is skewed. 
Regions with a valid size may then appear excessively large compared to the 
average, resulting in multiple unnecessary splits.
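
A minimal sketch of how the skew can arise, not the actual `SimpleRegionNormalizer` code. It assumes that the -1 sentinel is summed like a real size and that a region is a split candidate when its size exceeds twice the average; the class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical illustration only; not taken from the HBase code base.
final class NaiveAverageDemo {
    // Assumption: -1 marks a region whose size could not be determined.
    static final long UNKNOWN = -1;

    // Naive average: every value is summed, including the -1 sentinels,
    // and the divisor counts every region.
    static double naiveAverageMb(List<Long> sizesMb) {
        if (sizesMb.isEmpty()) {
            return 0.0;
        }
        long total = 0;
        for (long size : sizesMb) {
            total += size;
        }
        return (double) total / sizesMb.size();
    }

    // Assumed split rule: a region is too large if it exceeds twice the average.
    static boolean looksTooLarge(long sizeMb, double averageMb) {
        return sizeMb > 2 * averageMb;
    }
}
```

With sizes `[10, -1, -1, -1]` the naive average is (10 - 3) / 4 = 1.75 MB, so the single 10 MB region exceeds twice the average and looks like a split candidate even though it is the only region with a known size.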
   
   **Expected Behavior:**
   
   If region size is unknown for some regions, those regions should be skipped 
during normalization. The average region size should be computed only from the 
regions for which the size is known. No region should be split or merged unless 
its size is known.
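
The expected behavior can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the class and method names are hypothetical, -1 is assumed to be the "unknown size" sentinel, and the "more than twice the average" split rule is an assumption made for the example.

```java
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical illustration only; not taken from the patch.
final class KnownSizeAverage {
    // Assumption: any negative value (e.g. -1) means the size is unknown.
    static boolean isKnown(long sizeMb) {
        return sizeMb >= 0;
    }

    // Average over regions with a known size only.
    // Empty when no region has a known size.
    static OptionalDouble averageOfKnownMb(List<Long> sizesMb) {
        return sizesMb.stream()
                .filter(size -> isKnown(size))
                .mapToLong(Long::longValue)
                .average();
    }

    // A region is a split candidate only if its own size is known,
    // an average exists, and the size exceeds twice that average.
    static boolean shouldSplit(long sizeMb, OptionalDouble averageMb) {
        return isKnown(sizeMb)
                && averageMb.isPresent()
                && sizeMb > 2 * averageMb.getAsDouble();
    }
}
```

For sizes `[10, 12, 50, -1]` the average over the three known regions is 24 MB, so only the 50 MB region qualifies for a split and the unknown region is left alone. When every size is unknown, the average is empty and nothing is split.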
   
   **Patch:**
   
   - Excludes regions with unknown size from the average size computation.
   - Prevents split and merge operations on regions with unknown size.
   - Adds unit tests for scenarios with partial or total absence of size data.
   Patch author: Milan Vymazal <[email protected]>
   
   **Tests:**
   
   - `testSplitOfLargeRegionIfOneIsNotKnow` verifies correct behavior when one 
region has an unknown size.
   - `testSplitOfAllUnknownSize` ensures that no split happens if size data is 
missing for all regions.
   
   **Reproduction:**
   
   Unfortunately, we are unable to reliably reproduce this bug in a live 
environment, since we cannot easily simulate the condition where RegionServer 
metrics are missing. However, we have confirmed the behavior through code 
analysis and the added unit tests.


