[
https://issues.apache.org/jira/browse/HBASE-29364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhiwen Deng updated HBASE-29364:
--------------------------------
Description:
We have repeatedly seen regions opened on RegionServers (RS) that had already
gone offline. Only recently did we identify a potential cause; the details of
the problem are as follows:
Our HDFS storage reached its capacity threshold, which left the HBase Master
and the RegionServers unable to write, so they aborted. We then manually
deleted some data and HDFS recovered. Afterwards, the hbck report showed that
some regions were open on offline RegionServers, which left these regions
unable to serve requests. We finally used hbck2 to assign these regions, and
the problem was solved.
Here is the analysis of the region transition for one specific region:
19f709990ad65ce3d51ddeaf29acf436:
* 2025-05-21, 05:48:11: The region was assigned to
rs-hostname,20700,1747777624803, but due to some anomalies, it could not be
opened on the target RS. At this point, the RS reported the open result to the
Master:
{code:java}
2025-05-21,05:48:11,646 INFO
[RpcServer.priority.RWQ.Fifo.write.handler=2,queue=0,port=20600]
org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase: Received
report from rs-hostname,20700,1747777624803, transitionCode=FAILED_OPEN,
seqId=-1, regionNode=state=OPENING, location=rs-hostname,20700,1747777624803,
table=test:xxx, region=19f709990ad65ce3d51ddeaf29acf436, proc=pid=78499,
ppid=78034, state=RUNNABLE;
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
{code}
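When tracing many regions through Master logs like the one above, it can help to pull the key fields out of each transition report mechanically. The following is a minimal sketch, not part of HBase: the class name TransitionReportParser and its parse method are hypothetical, and the regular expressions assume the log layout shown in this report.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not an HBase API): extracts fields from a transition report log line. */
class TransitionReportParser {
    // Assumes the "Received report from <server>, transitionCode=<code>" layout seen above.
    // \s+ is used between words so that hard-wrapped log text still matches.
    private static final Pattern REPORT = Pattern.compile(
        "Received\\s+report\\s+from\\s+(\\S+),\\s+transitionCode=(\\w+)");
    // Encoded region names are 32 lowercase hex characters.
    private static final Pattern REGION = Pattern.compile("region=([0-9a-f]{32})");
    private static final Pattern PID = Pattern.compile("proc=pid=(\\d+)");

    /** Returns the fields that were found; fields missing from the line are simply absent. */
    static Map<String, String> parse(String logLine) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher report = REPORT.matcher(logLine);
        if (report.find()) {
            fields.put("server", report.group(1));
            fields.put("transitionCode", report.group(2));
        }
        Matcher region = REGION.matcher(logLine);
        if (region.find()) {
            fields.put("region", region.group(1));
        }
        Matcher pid = PID.matcher(logLine);
        if (pid.find()) {
            fields.put("pid", pid.group(1));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "RegionRemoteProcedureBase: Received report from "
            + "rs-hostname,20700,1747777624803, transitionCode=FAILED_OPEN, seqId=-1, "
            + "regionNode=state=OPENING, location=rs-hostname,20700,1747777624803, "
            + "table=test:xxx, region=19f709990ad65ce3d51ddeaf29acf436, proc=pid=78499";
        System.out.println(parse(line));
    }
}
```

Filtering the parsed results on transitionCode=FAILED_OPEN and grouping them by region would give a per-region timeline similar to the one described here.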
> Region will be opened in unknown regionserver when master is changed & rs
> crashed
> ---------------------------------------------------------------------------------
>
> Key: HBASE-29364
> URL: https://issues.apache.org/jira/browse/HBASE-29364
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: 2.3.0
> Reporter: Zhiwen Deng
> Priority: Major
>