[kudu-CR] KUDU-3662 [5/n] Add metrics support to replication

Alexey Serbin (Code Review) Thu, 04 Sep 2025 10:19:57 -0700

Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/23246 )


Change subject: KUDU-3662 [5/n] Add metrics support to replication
......................................................................


Patch Set 5: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/23246/4/java/kudu-replication/src/main/java/org/apache/kudu/replication/wrappedsource/MetricWrappedKuduEnumerator.java
File java/kudu-replication/src/main/java/org/apache/kudu/replication/wrappedsource/MetricWrappedKuduEnumerator.java:

http://gerrit.cloudera.org:8080/#/c/23246/4/java/kudu-replication/src/main/java/org/apache/kudu/replication/wrappedsource/MetricWrappedKuduEnumerator.java@127
PS4, Line 127:   private long getLastEndTimestamp() {
             :     long hybridTime = ReflectionSecurityUtils.getLongFieldValue(lastEndTimestampField, delegate);
             :     long[] parsed = HybridTimeUtil.HTTimestampToPhysicalAndLogical(hybridTime);
             :     long epochMicros = parsed[0];
             :     return epochMicros / 1_000;
             :   }
> The physical timestamp alone provides all the necessary information to 
> understand replication progress, detect lag, and trigger appropriate alerts.

OK, so this approach assumes it's not necessary to track changes that happened 
within the same physical timestamp (i.e. the same microsecond).  I guess that's 
totally fine in most situations, but keep in mind that this approach can't 
tell whether all the operations were replicated if there were many fast 
updates at the end of the recent activity and the very last ones ended up 
with the same physical timestamp because they arrived so close together.  
It's rare, but it can still happen.
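
To make the concern concrete, here is a minimal, self-contained sketch. It does 
not use the Kudu client; it assumes a hybrid timestamp layout in which the low 
12 bits hold the logical counter and the remaining bits hold the physical time 
in microseconds. The class and method names (HybridTimeSketch, encode, and so 
on) are hypothetical and exist only for illustration.

```java
/** Sketch only: shows what is lost when the logical component is dropped. */
final class HybridTimeSketch {
  // Assumption: the low 12 bits are the logical counter, the rest is
  // the physical time in microseconds since the epoch.
  static final int LOGICAL_BITS = 12;
  static final long LOGICAL_MASK = (1L << LOGICAL_BITS) - 1;

  /** Packs a physical time (micros) and a logical counter into one long. */
  public static long encode(long physicalMicros, long logical) {
    return (physicalMicros << LOGICAL_BITS) | (logical & LOGICAL_MASK);
  }

  /** Extracts the physical component in microseconds. */
  public static long physicalMicros(long hybridTime) {
    return hybridTime >>> LOGICAL_BITS;
  }

  /** Extracts the logical counter. */
  public static long logical(long hybridTime) {
    return hybridTime & LOGICAL_MASK;
  }

  /** Same reduction as the quoted method: physical micros to epoch millis. */
  public static long toEpochMillis(long hybridTime) {
    return physicalMicros(hybridTime) / 1_000;
  }

  public static void main(String[] args) {
    long micros = 1_756_919_997_123_456L;
    // Two operations that landed in the same microsecond.
    long first = encode(micros, 0);
    long last = encode(micros, 7);

    // The full hybrid timestamps are ordered, so progress is visible...
    System.out.println(first < last); // prints true
    // ...but the derived metric value is identical for both.
    System.out.println(toEpochMillis(first) == toEpochMillis(last)); // prints true
  }
}
```

With only the physical part exported, a metric reader sees the same value 
whether the replicated position is the first or the last of those operations.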


http://gerrit.cloudera.org:8080/#/c/23246/4/java/kudu-replication/src/test/java/org/apache/kudu/replication/TestReplicationMetrics.java
File java/kudu-replication/src/test/java/org/apache/kudu/replication/TestReplicationMetrics.java:

http://gerrit.cloudera.org:8080/#/c/23246/4/java/kudu-replication/src/test/java/org/apache/kudu/replication/TestReplicationMetrics.java@129
PS4, Line 129:
> Initially we can have more delay due to job startup and initial snapshot sc
OK, that sounds reasonable.  To avoid introducing a new flaky test, consider 
double-checking that the initial delay is long enough for TSAN builds running 
on relatively busy machines.
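
For illustration only, one common way to make a wait less sensitive to slow 
machines is to poll for the expected condition up to a generous deadline 
rather than relying on a single delay. This is not what the test in the patch 
does; AwaitSketch and awaitCondition are hypothetical names, and the timeout 
values would need tuning for the actual test.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Sketch only: poll a condition until it holds or a deadline passes. */
final class AwaitSketch {
  /**
   * Returns true as soon as the condition holds, or false if the timeout
   * elapses (or the thread is interrupted) before that happens.
   */
  public static boolean awaitCondition(BooleanSupplier condition,
                                       Duration timeout,
                                       Duration pollInterval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollInterval.toMillis());
      } catch (InterruptedException e) {
        // Preserve the interrupt status and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

A test could then assert on something like 
awaitCondition(() -> metricIsVisible(), Duration.ofMinutes(2), Duration.ofMillis(500)), 
where metricIsVisible() stands in for whatever check the test performs. A fast 
machine returns early, and a slow one gets the full budget.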

Thanks!



--
To view, visit http://gerrit.cloudera.org:8080/23246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibfbd34c707e7539ee88863399ae3061683f8bb3b
Gerrit-Change-Number: 23246
Gerrit-PatchSet: 5
Gerrit-Owner: Marton Greber <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Gabriella Lotz <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber <[email protected]>
Gerrit-Reviewer: Zoltan Chovan <[email protected]>
Gerrit-Comment-Date: Thu, 04 Sep 2025 17:19:41 +0000
Gerrit-HasComments: Yes
