Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/23246 )
Change subject: KUDU-3662 [5/n] Add metrics support to replication
......................................................................

Patch Set 5: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/23246/4/java/kudu-replication/src/main/java/org/apache/kudu/replication/wrappedsource/MetricWrappedKuduEnumerator.java
File java/kudu-replication/src/main/java/org/apache/kudu/replication/wrappedsource/MetricWrappedKuduEnumerator.java:

http://gerrit.cloudera.org:8080/#/c/23246/4/java/kudu-replication/src/main/java/org/apache/kudu/replication/wrappedsource/MetricWrappedKuduEnumerator.java@127
PS4, Line 127:   private long getLastEndTimestamp() {
             :     long hybridTime = ReflectionSecurityUtils.getLongFieldValue(lastEndTimestampField, delegate);
             :     long[] parsed = HybridTimeUtil.HTTimestampToPhysicalAndLogical(hybridTime);
             :     long epochMicros = parsed[0];
             :     return epochMicros / 1_000;
             :   }
> The physical timestamp alone provides all the necessary information to
> understand replication progress, detect lag, and trigger appropriate alerts.

OK, so this approach assumes it's not necessary to track changes that happened within the same physical timestamp (i.e. the same microsecond). I guess that's totally fine for most situations, but keep in mind that with this approach the metric can't tell whether all the operations were replicated if there was a burst of fast updates at the end of the recent activity and the very last ones were assigned the same physical timestamp because they arrived so close together. It's rare, but it can still happen.

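To illustrate the point about operations sharing a physical timestamp, here is a minimal, self-contained sketch. It does not use the Kudu client; the class `HybridTimeSketch` and its helpers are hypothetical, and the 12-bit logical component is an assumption about the hybrid time layout made for the example. It shows two distinct hybrid timestamps that become indistinguishable once only the physical part is reported in milliseconds.

```java
// Hypothetical sketch, not part of the patch under review.
public final class HybridTimeSketch {
  // Assumption: the low 12 bits carry the logical counter and the
  // remaining high bits carry the physical time in microseconds.
  public static final int LOGICAL_BITS = 12;
  public static final long LOGICAL_MASK = (1L << LOGICAL_BITS) - 1;

  private HybridTimeSketch() {}

  // Packs a physical time (microseconds) and a logical counter into one long.
  public static long encode(long physicalMicros, long logical) {
    return (physicalMicros << LOGICAL_BITS) | (logical & LOGICAL_MASK);
  }

  // Extracts the physical component, in microseconds.
  public static long physicalMicros(long hybridTime) {
    return hybridTime >> LOGICAL_BITS;
  }

  // Extracts the logical counter.
  public static long logical(long hybridTime) {
    return hybridTime & LOGICAL_MASK;
  }

  // Mirrors the shape of the quoted getLastEndTimestamp(): drop the logical
  // part, then convert microseconds to milliseconds.
  public static long physicalMillis(long hybridTime) {
    return physicalMicros(hybridTime) / 1_000;
  }

  public static void main(String[] args) {
    long micros = 1_756_999_181_000_000L; // arbitrary example instant
    long first = encode(micros, 0);
    long second = encode(micros, 1); // same microsecond, later logical value

    // The full hybrid timestamps are ordered and distinct...
    System.out.println("distinct hybrid times: " + (first < second));
    // ...but the physical-only view reports the same value for both.
    System.out.println("same physical millis: "
        + (physicalMillis(first) == physicalMillis(second)));
  }
}
```

In this sketch a metric built on `physicalMillis` would report the same value after either operation, so it could not by itself confirm that the second one was replicated.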
http://gerrit.cloudera.org:8080/#/c/23246/4/java/kudu-replication/src/test/java/org/apache/kudu/replication/TestReplicationMetrics.java
File java/kudu-replication/src/test/java/org/apache/kudu/replication/TestReplicationMetrics.java:

http://gerrit.cloudera.org:8080/#/c/23246/4/java/kudu-replication/src/test/java/org/apache/kudu/replication/TestReplicationMetrics.java@129
PS4, Line 129:
> Initially we can have more delay due to job startup and initial snapshot sc

OK, that sounds reasonable. To avoid introducing a new flaky test, consider double-checking that the initial delay is long enough when TSAN bits run on relatively busy machines.

Thanks!

--
To view, visit http://gerrit.cloudera.org:8080/23246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibfbd34c707e7539ee88863399ae3061683f8bb3b
Gerrit-Change-Number: 23246
Gerrit-PatchSet: 5
Gerrit-Owner: Marton Greber <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Gabriella Lotz <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber <[email protected]>
Gerrit-Reviewer: Zoltan Chovan <[email protected]>
Gerrit-Comment-Date: Thu, 04 Sep 2025 17:19:41 +0000
Gerrit-HasComments: Yes
