Apache9 commented on code in PR #7528:
URL: https://github.com/apache/hbase/pull/7528#discussion_r2616198948


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationEndpoint.java:
##########
@@ -216,7 +216,7 @@ public int getTimeout() {
    * the context are assumed to be persisted in the target cluster.

Review Comment:
   I would prefer that we control it with a time/size limit.
   
   Even if the endpoint can persist the data after every shipment, we do not 
need to record the offset every time, right? We only need to make sure that, 
when the `ReplicationSourceShipper` wants to record an offset, all the data 
before that offset has been persisted. So we can introduce a 
`beforePersistingReplicationOffset` method on the replication endpoint. If the 
endpoint persists the data after every shipment, the method does nothing. If it 
is an S3-based endpoint, the method closes the output file to persist the data.
   
   This way, the `ReplicationSourceShipper` does not need to know whether the 
endpoint persists the data after every shipment. And in the future, we could 
also introduce an asynchronous mechanism for 
`HBaseInterClusterReplicationEndpoint` to improve performance.
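
A rough sketch of the idea, to make the proposed contract concrete. Only the method name `beforePersistingReplicationOffset` comes from the comment above. Everything else here (`SketchEndpoint`, `ImmediateEndpoint`, `BufferedFileEndpoint`, `SketchShipper`, the constructor parameters, and using string length as a stand-in for entry size) is hypothetical and is not the real HBase API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a replication endpoint (not the HBase interface).
interface SketchEndpoint {
  void replicate(List<String> entries);

  // Hook proposed in the review: called right before the shipper records an offset.
  // Endpoints that persist on every shipment can keep this default no-op.
  default void beforePersistingReplicationOffset() {}
}

// Persists each shipment immediately, so the hook has nothing to do.
final class ImmediateEndpoint implements SketchEndpoint {
  final List<String> persisted = new ArrayList<>();

  @Override
  public void replicate(List<String> entries) {
    persisted.addAll(entries);
  }
}

// Models an S3-style endpoint: data sits in an open file until the file is closed.
final class BufferedFileEndpoint implements SketchEndpoint {
  final List<String> openFile = new ArrayList<>();
  final List<String> persisted = new ArrayList<>();

  @Override
  public void replicate(List<String> entries) {
    openFile.addAll(entries);
  }

  @Override
  public void beforePersistingReplicationOffset() {
    // "Closing the output file" is modeled as moving buffered entries to persisted.
    persisted.addAll(openFile);
    openFile.clear();
  }
}

// Hypothetical shipper: records the offset only when a size or time limit is hit.
final class SketchShipper {
  private final SketchEndpoint endpoint;
  private final long maxBytes;
  private final long maxMillis;
  private long bytesSinceRecord = 0;
  private long lastRecordMillis;
  long recordedOffset = -1; // -1 means no offset has been recorded yet

  SketchShipper(SketchEndpoint endpoint, long maxBytes, long maxMillis, long startMillis) {
    this.endpoint = endpoint;
    this.maxBytes = maxBytes;
    this.maxMillis = maxMillis;
    this.lastRecordMillis = startMillis;
  }

  // The clock is passed in explicitly so the sketch stays deterministic.
  void ship(List<String> entries, long endOffset, long nowMillis) {
    endpoint.replicate(entries);
    for (String entry : entries) {
      bytesSinceRecord += entry.length(); // string length approximates entry size
    }
    boolean sizeLimitHit = bytesSinceRecord >= maxBytes;
    boolean timeLimitHit = nowMillis - lastRecordMillis >= maxMillis;
    if (sizeLimitHit || timeLimitHit) {
      // Everything before endOffset must be durable before the offset is recorded.
      endpoint.beforePersistingReplicationOffset();
      recordedOffset = endOffset;
      bytesSinceRecord = 0;
      lastRecordMillis = nowMillis;
    }
  }
}
```

In this sketch the shipper never asks the endpoint whether it persists per shipment. It calls the hook before recording an offset, and each endpoint decides what, if anything, that requires.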



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

