[ 
https://issues.apache.org/jira/browse/HBASE-29660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18029522#comment-18029522
 ] 

Duo Zhang commented on HBASE-29660:
-----------------------------------

So the problem here is that when the master issues an open region request to a 
region server on which the region is already open, we still schedule an 
AssignRegionHandler and put it into the cache map, but AssignRegionHandler 
just returns without clearing the cache map.

This is a problem: we should always clear the cache map when 
AssignRegionHandler finishes. However, I do not think we should make it a cache 
object.
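
To illustrate the "always clear the map when the handler finishes" idea, here 
is a minimal, self-contained sketch. It is not the HBase code: ProcedureTracker, 
OpenHandlerSketch, and the simulateFailure flag are hypothetical names made up 
for this example, and the real AssignRegionHandler does much more. The only 
point is that the cleanup sits in a finally block, so the early return for an 
already-online region and the failure path both remove the entry.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for the region server's bookkeeping; not HBase code. */
class ProcedureTracker {
  // procId -> region name, for procedures that were submitted but have not finished
  private final ConcurrentMap<Long, String> submitted = new ConcurrentHashMap<>();

  /** Returns false if this procedure id is already tracked, so the duplicate is ignored. */
  boolean submit(long procId, String regionName) {
    return submitted.putIfAbsent(procId, regionName) == null;
  }

  /** Removes the entry; safe to call even if the id is no longer tracked. */
  void finish(long procId) {
    submitted.remove(procId);
  }

  int pending() {
    return submitted.size();
  }
}

/** Hypothetical handler that clears the tracker on every exit path. */
class OpenHandlerSketch {
  enum Outcome { OPENED, ALREADY_ONLINE, FAILED }

  private final ProcedureTracker tracker;
  private final Set<String> onlineRegions;

  OpenHandlerSketch(ProcedureTracker tracker, Set<String> onlineRegions) {
    this.tracker = tracker;
    this.onlineRegions = onlineRegions;
  }

  Outcome process(long procId, String regionName, boolean simulateFailure) {
    try {
      if (onlineRegions.contains(regionName)) {
        // Early return: without the finally block this path would leak the entry.
        return Outcome.ALREADY_ONLINE;
      }
      if (simulateFailure) {
        // Failure path: the entry must be cleared here as well.
        return Outcome.FAILED;
      }
      onlineRegions.add(regionName);
      return Outcome.OPENED;
    } finally {
      tracker.finish(procId);
    }
  }
}
```

With this shape, a later request carrying the same procedure id is accepted 
again after the handler has returned, whichever path it took.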

Thanks.

> submittedRegionProcedures data leak in HRegionServer when region open failed.
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-29660
>                 URL: https://issues.apache.org/jira/browse/HBASE-29660
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>            Reporter: chaijunjie
>            Priority: Major
>
> There are two caches/maps that track region procedures in HRegionServer:
> https://github.com/apache/hbase/blob/e57552521e2c228e8ec8d09ebef4657ba8a2b1d9/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L268C3-L268C97
> and
> https://github.com/apache/hbase/blob/e57552521e2c228e8ec8d09ebef4657ba8a2b1d9/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L273
> When the RS is asked to submit the same region procedure again, it ignores the request:
> https://github.com/apache/hbase/blob/e57552521e2c228e8ec8d09ebef4657ba8a2b1d9/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L3596C1-L3596C47
> However, executedRegionProcedures is a cache object and cleans itself up, while 
> submittedRegionProcedures does not. So when the region open fails on the RS, the 
> handler just returns, see
> https://github.com/apache/hbase/blob/e57552521e2c228e8ec8d09ebef4657ba8a2b1d9/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/AssignRegionHandler.java#L146
> Then rs.finishRegionProcedure is never called on this RS for this region.
> Sometimes the "MasterData" dir loses data in HDFS. I tried to fix it by 
> recreating the master region (restarting HMaster), but after that some regions 
> could not open (triggered by balancer/SCP). I just found these logs... Then we 
> needed to restart many RegionServers.
> I think we could call finishRegionProcedure in the cleanUpAndReportFailure method 
> after the failure has been reported to the master successfully. We could also 
> make submittedRegionProcedures a cache object rather than a map to avoid...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
