Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/24096 )
Change subject: KUDU-3692 Export entity attributes as Prometheus labels ...................................................................... Patch Set 3: Code-Review+1 (22 comments) Thanks a lot for putting together this patch! Overall LGTM, just a few nits and questions. Also, maybe we should revisit turning this new Prometheus metrics format by default. http://gerrit.cloudera.org:8080/#/c/24096/3//COMMIT_MSG Commit Message: PS3: Just to clarify: this patch doesn't introduce host/node attribute into metrics yet, correct? If so, are you planning to add those in a separate patch, if at all? Thanks! http://gerrit.cloudera.org:8080/#/c/24096/3//COMMIT_MSG@12 PS3, Line 12: and attributes I guess only the ID was embedded in metric names, right? http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics-test.cc File src/kudu/util/metrics-test.cc: http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics-test.cc@152 PS3, Line 152: FLAGS_prometheus_use_entity_labels = true; nit: use google::FlagSaver if it's necessary to rollback the customized setting http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics-test.cc@184 PS3, Line 184: // HELP/TYPE is emitted once per metric name (deduped). It would be great to add automated scenario to verify this. IIUC, the current approach with ASSERT_STR_CONTAINS() doesn't verify the uniqueness of the HELP/TYPE lines. http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics-test.cc@247 PS3, Line 247: FLAGS_prometheus_use_entity_labels = true; Instead, use google::FlagSaver for more robust flags' rollback. http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics-test.cc@837 PS3, Line 837: FLAGS_prometheus_use_entity_labels = true; Use google::FlagSaver instead http://gerrit.cloudera.org:8080/#/c/24096/2/src/kudu/util/metrics.cc File src/kudu/util/metrics.cc: http://gerrit.cloudera.org:8080/#/c/24096/2/src/kudu/util/metrics.cc@42 PS2, Line 42: fals > Thank you for the reminder. I overlooked this point earlier. Marton, which later release you are thinking about? May 1.19.0 be such a release? Before this patch the Prometheus format metrics in Kudu were not quite usable, IMO. This update moves towards widely adopted notation that allows for metrics aggregation and better analysis, so for most users this change would be a welcome update, I guess. I don't think we need a two-phase approach here with a deprecation warning first, we might just go ahead and document the change in this behavior in release notes for Kudu 1.19.0, and also document the migration steps. Thoughts? http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc File src/kudu/util/metrics.cc: http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc@42 PS3, Line 42: prometheus_use_entity_labels Consider naming it to show it's related to metrics, e.g. metrics_prometheus_use_entity_labels or similar http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc@854 PS3, Line 854: const string& prefix, style nit: move this into a separate line a well once switching to multi-line parameter listing http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc@1015 PS3, Line 1015: const string& prefix, style nit: move this into a separate line a well once switching to multi-line parameter listing http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc@1092 PS3, Line 1092: const string& prefix, style nit: move this into a separate line a well once switching to multi-line parameter listing http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc@1175 PS3, Line 1175: // New format: labels injected, no space after comma, _sum/_count carry unit_type. : static constexpr const char* const kFmt = : "$0$1{$2unit_type=\"$3\",quantile=\"$4\"} $5\n"; : static constexpr const char* const kSumCountFmt = : "$0$1{$2unit_type=\"$3\"} $4\n"; : // Legacy format: no labels, space after comma, _sum/_count have no labels. : static constexpr const char* const kLegacyFmt = : "$0$1{unit_type=\"$2\", quantile=\"$3\"} $4\n"; : static constexpr const char* const kLegacySumCountFmt = : "$0$1 $2\n"; : static constexpr const char* const kHelpTypeFmt = : "# HELP $0$1 $2\n# TYPE $3$4 $5\n"; nit: split this block and move its parts under the corresponding 'if (FLAGS_prometheus_use_entity_labels) { ... } else { ... }' scope http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.h File src/kudu/util/prometheus_writer.h: http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.h@69 PS3, Line 69: std::string BuildServerPrometheusLabels( nit: add description for this utility function http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.h@73 PS3, Line 73: std::string BuildTableTabletPrometheusLabels( nit: add description for this utility function http://gerrit.cloudera.org:8080/#/c/24096/1/src/kudu/util/prometheus_writer.cc File src/kudu/util/prometheus_writer.cc: http://gerrit.cloudera.org:8080/#/c/24096/1/src/kudu/util/prometheus_writer.cc@56 PS1, Line 56: > Yes, I agree with you. The "partition" labels can become very long, increas +1 It's a good call. BTW, any plans for node/host metric attribution? http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc File src/kudu/util/prometheus_writer.cc: http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@31 PS3, Line 31: std:: style nit for there and elsewhere in this .cc file: add 'using std::string;' and other corresponding 'using' directives in the beginning of the file, and drop the 'std::' namespace prefix http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@56 PS3, Line 56: nit: consider adding DCHECK() here to make sure 'entity_type' == "tablet" http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@66 PS3, Line 66: "kudu.master" Make this a constant to use here and in WriteAsPrometheus() as well. Ideally, that constant should be used in src/kudu/master/master.cc as well -- that's where the metrics namespace is assigned for all the metrics emitted by kudu-master. http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@68 PS3, Line 68: "kudu.tabletserver" Make this a constant to use here and in WriteAsPrometheus() as well. Ideally, that constant should be used in src/kudu/tserver/tablet_server.cc as well -- that's where the metrics namespace is assigned for all the metrics emitted by kudu-tserver. http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@73 PS3, Line 73: LOG(ERROR) nit: change to LOG(DFATAL) -- in release build it will be LOG(ERROR), but in debug it will into LOG(FATAL) for faster/easier troubleshooting http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@78 PS3, Line 78: auto uuid = attrs.find("uuid"); : if (uuid != attrs.end()) { style nit: switch to using FindOrNull() or other helper from src/kudu/gutil/map-util.h http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@91 PS3, Line 91: EscapePrometheusLabelValue nit for here and elsewhere when well-known and controlled input is used: is there a way not to spend extra CPU cycles and memory to process strings that shouldn't have any symbols needing special treatment (e.g., current set of Kudu metric entities doesn't need any escaping, IIUC)? Instead, consider adding a DCHECK() to verify that there aren't any symbols here that require special treatment. -- To view, visit http://gerrit.cloudera.org:8080/24096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I690918d06f19c393369b8fc43c3ec90dd2231d3d Gerrit-Change-Number: 24096 Gerrit-PatchSet: 3 Gerrit-Owner: Yan-Daojiang <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber <[email protected]> Gerrit-Reviewer: Yan-Daojiang <[email protected]> Gerrit-Comment-Date: Wed, 18 Mar 2026 20:10:19 +0000 Gerrit-HasComments: Yes
