[jira] [Comment Edited] (HBASE-29463) Bidirectional serial replication will block if a region’s last edit before rs crashed was from the peer cluster

haosen chen (Jira) Sun, 20 Jul 2025 23:53:34 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-29463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18008605#comment-18008605
 ]


haosen chen edited comment on HBASE-29463 at 7/21/25 6:52 AM:
--------------------------------------------------------------

[~zhangduo] Sure. As edit 59-60 are generated within Cluster B, their 
clusterIds inherently contain the ID of Cluster B, thereby causing them to be 
filtered by the ClusterMarkingEntryFilter.


was (Author: JIRAUSER285292):
[~zhangduo] Sure. As edit 59-60 are generated within Cluster B, their 
clusterIds inherently contain the ID of Cluster B, thereby causing them to be 
filtered by the ClusterMarkingEntryFilter.

> Bidirectional serial replication will block if a region’s last edit before rs 
> crashed was from the peer cluster
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-29463
>                 URL: https://issues.apache.org/jira/browse/HBASE-29463
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 2.4.5
>            Reporter: haosen chen
>            Priority: Major
>         Attachments: image-2025-07-21-14-52-19-057.png, 
> image-2025-07-21-14-52-47-751.png
>
>
> For two HBase clusters that enable bidirectional replication and set up 
> serial replication, when a region in cluster A received last edit from peer 
> cluster before RS crashed, the replication from cluster A to B will block. 
> Because in this situation, the HBase replication system will wait until the 
> last pushed sequence id reaches the new barrier but edit received from peer 
> cluster will never be pushed.
> When Region r1 in Cluster A pushes its last edit (e.g., seqID 58) to Cluster 
> B and subsequently received two additional edits (seqID 59–60) from Cluster B 
> and then the rs crashed, Region r1 will be reopened on another RegionServer 
> and set a new barrier at seqID 61. However, edits 59–60 will never be pushed 
> to Cluster B again, causing the _last pushed sequenceId_ to stagnate. As a 
> result, the {{SerialReplicationChecker}} will repeatedly fail its checks.
> The new RS will keep print DEBUG LOG:
> 2025-07-14 20:05:53,953 DEBUG 
> [RS_OPEN_REGION-regionserver/172.16.0.43:6002-0.replicationSource.wal-reader.172.16.0.43%2C6002%2C1752216296629.172.16.0.43%2C6002%2C1752216296629.regiongroup-1,1]
>  regionserver.SerialReplicationChecker: Replication barrier for 
> test1/46b4ecbd63d7fbcb16d68e106f904013/30=[#edits: 0 = <>]: 
> ReplicationBarrierResult [barriers=[23, 29, 68], state=OPEN, 
> parentRegionNames=]
> 2025-07-14 20:05:53,953 DEBUG 
> [RS_OPEN_REGION-regionserver/172.16.0.43:6002-0.replicationSource.wal-reader.172.16.0.43%2C6002%2C1752216296629.172.16.0.43%2C6002%2C1752216296629.regiongroup-1,1]
>  regionserver.SerialReplicationChecker: Previous range for 
> test1/46b4ecbd63d7fbcb16d68e106f904013/30=[#edits: 0 = <>] has not been 
> finished yet, give up
> 2025-07-14 20:05:53,953 DEBUG 
> [RS_OPEN_REGION-regionserver/172.16.0.43:6002-0.replicationSource.wal-reader.172.16.0.43%2C6002%2C1752216296629.172.16.0.43%2C6002%2C1752216296629.regiongroup-1,1]
>  regionserver.SerialReplicationChecker: Can not push 
> test1/46b4ecbd63d7fbcb16d68e106f904013/30=[#edits: 0 = <>], wait



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (HBASE-29463) Bidirectional serial replication will block if a region’s last edit before rs crashed was from the peer cluster

Reply via email to