Apache9 commented on code in PR #7528:
URL: https://github.com/apache/hbase/pull/7528#discussion_r2610485310
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java:
##########
@@ -459,7 +458,8 @@ public ReplicationResult replicate(ReplicateContext
replicateContext) {
try {
// replicate the batches to sink side.
parallelReplicate(replicateContext, batches);
- return ReplicationResult.COMMITTED;
+
getReplicationSource().cleanupHFileRefsAndPersistOffsets(replicateContext.getEntries());
Review Comment:
OK I think I get why you call this method here, since here we can make sure
that the wal entries have been persistent, it is OK for us to persist the
offset. But for me, I prefer we follow the old way where call this in
ReplicationSourceShipper.
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/VerifyWALEntriesReplicationEndpoint.java:
##########
@@ -59,10 +60,11 @@ private void checkCell(Cell cell) {
}
@Override
- public ReplicationResult replicate(ReplicateContext replicateContext) {
+ public boolean replicate(ReplicateContext replicateContext) throws
IOException {
replicateContext.entries.stream().map(WAL.Entry::getEdit).flatMap(e ->
e.getCells().stream())
.forEach(this::checkCell);
- return ReplicationResult.COMMITTED;
+
getReplicationSource().cleanupHFileRefsAndPersistOffsets(replicateContext.getEntries());
Review Comment:
Why we need to call this here?
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##########
@@ -246,42 +230,11 @@ void shipEdits(WALEntryBatch entryBatch) {
}
}
- private ReplicationResult getReplicationResult() {
Review Comment:
As said above, we should keep these methods here, and before calling these
methods, we call the method in ReplicationEndpoint out to flush data out.
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationEndpoint.java:
##########
@@ -216,7 +216,7 @@ public int getTimeout() {
* the context are assumed to be persisted in the target cluster.
Review Comment:
What I mean is that, we should add a method may be called
`beforePersistingReplicationOffset`, and call it before we call
updateLogPosition in ReplicationSourceShipper method. For old implementation,
we just do nothing as we can make sure that every thing is persistent, and for
S3 based endpoint, we close the writer to persist the data on S3.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]