[ https://issues.apache.org/jira/browse/HBASE-29463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18008605#comment-18008605 ]
haosen chen edited comment on HBASE-29463 at 7/21/25 6:52 AM:
--------------------------------------------------------------

[~zhangduo] Sure. As edits 59-60 are generated within Cluster B, their clusterIds inherently contain the ID of Cluster B, which causes them to be filtered out by the ClusterMarkingEntryFilter.

was (Author: JIRAUSER285292):
[~zhangduo] Sure. As edits 59-60 are generated within Cluster B, their clusterIds inherently contain the ID of Cluster B, which causes them to be filtered out by the ClusterMarkingEntryFilter.

> Bidirectional serial replication will block if a region’s last edit before rs
> crashed was from the peer cluster
> ---------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-29463
> URL: https://issues.apache.org/jira/browse/HBASE-29463
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 2.4.5
> Reporter: haosen chen
> Priority: Major
> Attachments: image-2025-07-21-14-52-19-057.png,
> image-2025-07-21-14-52-47-751.png
>
>
> For two HBase clusters with bidirectional replication and serial
> replication enabled, if the last edit a region in cluster A received
> before its RS crashed came from the peer cluster, replication from
> cluster A to B blocks. In this situation the replication system waits
> until the last pushed sequence id reaches the new barrier, but an edit
> received from the peer cluster is never pushed back, so that point is
> never reached.
> Suppose Region r1 in Cluster A pushes its last edit (e.g., seqID 58) to
> Cluster B, then receives two additional edits (seqID 59–60) from Cluster B,
> and then the RS crashes. Region r1 is reopened on another RegionServer
> with a new barrier at seqID 61. However, edits 59–60 will never be pushed
> to Cluster B, causing the _last pushed sequenceId_ to stagnate. As a
> result, the {{SerialReplicationChecker}} repeatedly fails its checks.
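
To make the blocking condition concrete, here is a toy model of the scenario described above. It is only a sketch: the class and method names are hypothetical and are not HBase APIs, the first barrier value (1) and the cluster name are made up, and the range check (last pushed sequence id must reach the end barrier minus one) is an assumption about how the serial check behaves, not a copy of the real code.

```java
import java.util.Arrays;
import java.util.Set;

// Toy model of the blocking scenario. None of these names are HBase APIs.
final class SerialReplicationBlockSketch {

    // Assumed behaviour of ClusterMarkingEntryFilter: an edit whose clusterIds
    // already contain the peer's ID is dropped rather than shipped back.
    static boolean filteredForPeer(Set<String> editClusterIds, String peerClusterId) {
        return editClusterIds.contains(peerClusterId);
    }

    // Assumed shape of the serial check: an edit in a later barrier range may be
    // pushed only once the previous range is finished, i.e. the last pushed
    // sequence id has reached (end barrier - 1).
    static boolean canPush(long[] barriers, long seqId, long lastPushedSeqId) {
        int index = Arrays.binarySearch(barriers, seqId);
        if (index == -1) {
            return true; // before the first barrier
        }
        index = index < 0 ? -index - 1 : index + 1;
        if (index == 1) {
            return true; // first range; parent-region checks are omitted here
        }
        long endBarrierOfPreviousRange = barriers[index - 1];
        return lastPushedSeqId >= endBarrierOfPreviousRange - 1;
    }

    // Replays the scenario from the description and returns the last pushed id.
    static long replayScenario() {
        long lastPushed = 58L;                    // A pushed seqID 58 to B
        Set<String> fromB = Set.of("cluster-B");  // edits 59-60 originated in B
        for (long seqId = 59L; seqId <= 60L; seqId++) {
            if (!filteredForPeer(fromB, "cluster-B")) {
                lastPushed = seqId;               // never reached: both are filtered
            }
        }
        return lastPushed;
    }

    public static void main(String[] args) {
        long[] barriers = {1L, 61L};              // first barrier 1 is made up
        long lastPushed = replayScenario();
        System.out.println("lastPushed=" + lastPushed
            + " canPush(61)=" + canPush(barriers, 61L, lastPushed));
    }
}
```

Under these assumptions the last pushed id stays at 58, while the edit at seqID 61 needs it to be at least 60, so the check keeps failing until something advances the last pushed id past the filtered edits.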
> The new RS keeps printing the following DEBUG log:
> 2025-07-14 20:05:53,953 DEBUG [RS_OPEN_REGION-regionserver/172.16.0.43:6002-0.replicationSource.wal-reader.172.16.0.43%2C6002%2C1752216296629.172.16.0.43%2C6002%2C1752216296629.regiongroup-1,1] regionserver.SerialReplicationChecker: Replication barrier for test1/46b4ecbd63d7fbcb16d68e106f904013/30=[#edits: 0 = <>]: ReplicationBarrierResult [barriers=[23, 29, 68], state=OPEN, parentRegionNames=]
> 2025-07-14 20:05:53,953 DEBUG [RS_OPEN_REGION-regionserver/172.16.0.43:6002-0.replicationSource.wal-reader.172.16.0.43%2C6002%2C1752216296629.172.16.0.43%2C6002%2C1752216296629.regiongroup-1,1] regionserver.SerialReplicationChecker: Previous range for test1/46b4ecbd63d7fbcb16d68e106f904013/30=[#edits: 0 = <>] has not been finished yet, give up
> 2025-07-14 20:05:53,953 DEBUG [RS_OPEN_REGION-regionserver/172.16.0.43:6002-0.replicationSource.wal-reader.172.16.0.43%2C6002%2C1752216296629.172.16.0.43%2C6002%2C1752216296629.regiongroup-1,1] regionserver.SerialReplicationChecker: Can not push test1/46b4ecbd63d7fbcb16d68e106f904013/30=[#edits: 0 = <>], wait

--
This message was sent by Atlassian Jira
(v8.20.10#820010)