Yan-Daojiang has posted comments on this change. ( http://gerrit.cloudera.org:8080/24096 )
Change subject: KUDU-3692 Export entity attributes as Prometheus labels ...................................................................... Patch Set 4: (22 comments) Hi Alexey, Thank you for the detailed and thorough review. All comments and issues have been addressed. I would really appreciate it if you could take another look when you have time. Thanks again for your help! http://gerrit.cloudera.org:8080/#/c/24096/3//COMMIT_MSG Commit Message: PS3: > Just to clarify: this patch doesn't introduce host/node attribute into metr Correct, this patch does not introduce host/node attributes into metrics yet. I'm planning to add node-related labels in a follow-up patch. Since this information is not currently part of the metrics framework, I need to further investigate and design how the node-level labels should be structured and propagated. Will follow up on that separately. http://gerrit.cloudera.org:8080/#/c/24096/3//COMMIT_MSG@12 PS3, Line 12: and part of at > I guess only the ID was embedded in metric names, right? Yes, only the ID is embedded in metric names currently. http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics-test.cc File src/kudu/util/metrics-test.cc: http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics-test.cc@152 PS3, Line 152: EST_F(MetricsTest, TableAndTabletPrometheus > nit: use google::FlagSaver if it's necessary to rollback the customized set Done http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics-test.cc@247 PS3, Line 247: const auto& out = output.str(); > Instead, use google::FlagSaver for more robust flags' rollback. Done http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics-test.cc@837 PS3, Line 837: hist->IncrementBy(4, 40); > Use google::FlagSaver instead Done http://gerrit.cloudera.org:8080/#/c/24096/2/src/kudu/util/metrics.cc File src/kudu/util/metrics.cc: http://gerrit.cloudera.org:8080/#/c/24096/2/src/kudu/util/metrics.cc@42 PS2, Line 42: labe > Marton, which later release you are thinking about? May 1.19.0 be such a r Thanks for the suggestion. I agree that the legacy format is not quite usable for standard Prometheus workflows, and the new label-based format is the right direction. For now I've kept the flag metrics_prometheus_use_entity_labels defaulting to false to avoid breaking existing environments that may have Grafana dashboards or alerting rules built on the legacy metric names. Flipping the default would immediately invalidate those setups on upgrade. That said, I'm not very familiar with the community's release process and migration conventions, so I'd appreciate your and others' input on the right timing and approach for changing the default. I'm happy to help with whatever is needed — release notes, migration docs, or follow-up patches. http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc File src/kudu/util/metrics.cc: http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc@42 PS3, Line 42: metrics_prometheus_use_entit > Consider naming it to show it's related to metrics, e.g. metrics_prometheus Done http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc@854 PS3, Line 854: > style nit: move this into a separate line a well once switching to multi-li Done http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc@1015 PS3, Line 1015: st { > style nit: move this into a separate line a well once switching to multi-li Done http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc@1092 PS3, Line 1092: > style nit: move this into a separate line a well once switching to multi-li Done http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc@1175 PS3, Line 1175: }; : static constexpr const char* const kHelpTypeFmt = : "# HELP $0$1 $2\n# TYPE $3$4 $5\n"; : : const char* const name = prototype_->name(); : DCHECK(name); : const char* const unit = MetricUnit::Name(prototype_->unit()); : DCHECK(unit); : : // A snapshot is taken to have more consistent statistics while generating : // the output. : const HdrHistogram h(*histogram_); > nit: split this block and move its parts under the corresponding 'if (FLAGS Done http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc@1226 PS3, Line 1226: } else { : // Legacy format: no labels, space after comma, _sum/_count have no labels. > It seems this code generates incorrect unit for the summary count. In the Good catch! The _count metric indeed had two issues: Issue 1: unit_type label — _count is a dimensionless counter, but unit_type was incorrectly set to the histogram's native unit. Fixed by using MetricUnit::Name(MetricUnit::kUnits) for _count lines, while keeping the original unit for quantile lines and _sum. Issue 2: # HELP text — The HELP description for _count was generated from prototype_->description(), which often contains unit-specific wording like "Microseconds spent on appending to the log segment file". Fixed by using prototype_->label(), producing "Log Append Latency (count)" instead. Now: # HELP kudu_log_append_latency_count Log Append Latency (count) # TYPE kudu_log_append_latency_count counter kudu_log_append_latency_count{...,unit_type="units"} 6 Changes: metrics.cc: Fixed both issues in 3 codepaths — Histogram (new format), Histogram (legacy format), and MeanGauge. metrics-test.cc: Updated existing tests (HistogramPrometheusTest, HistogramLegacyPrometheusTest, MeanGaugePrometheusTest) and added 2 new dedicated tests (HistogramPrometheusCountUnitTest, MeanGaugePrometheusCountUnitTest) that verify: 1. _count carries unit_type="units"; 2. _count does NOT carry the native unit; 3. _count HELP uses label-based text, not description. http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.h File src/kudu/util/prometheus_writer.h: http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.h@69 PS3, Line 69: return labels + ","; > nit: add description for this utility function Done http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.h@73 PS3, Line 73: // Returns labels of the form: type="master",id="<uuid>" > nit: add description for this utility function Done http://gerrit.cloudera.org:8080/#/c/24096/1/src/kudu/util/prometheus_writer.cc File src/kudu/util/prometheus_writer.cc: http://gerrit.cloudera.org:8080/#/c/24096/1/src/kudu/util/prometheus_writer.cc@56 PS1, Line 56: > +1 Thanks! Yes, I'm planning to add node/host labels in a follow-up patch. http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc File src/kudu/util/prometheus_writer.cc: http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@31 PS3, Line 31: > style nit for there and elsewhere in this .cc file: add 'using std::string; Done http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@56 PS3, Line 56: > nit: consider adding DCHECK() here to make sure 'entity_type' == "tablet" Done http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@66 PS3, Line 66: verPrometheus > Make this a constant to use here and in WriteAsPrometheus() as well. Done http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@68 PS3, Line 68: rometheusLabels(ent > Make this a constant to use here and in WriteAsPrometheus() as well. Done http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@73 PS3, Line 73: const unor > nit: change to LOG(DFATAL) -- in release build it will be LOG(ERROR), but i Done http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@78 PS3, Line 78: labels = "type=\"tserver\""; : } else { > style nit: switch to using FindOrNull() or other helper from src/kudu/gutil Done http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@91 PS3, Line 91: > nit for here and elsewhere when well-known and controlled input is used: is Good point, thanks for raising this. I’ve applied the DCHECK-instead-of-escape approach for uuid (line 86–87), since it is internally generated and guaranteed to contain only hex characters. For the remaining call sites: BuildServerPrometheusLabels (line 81) This is the LOG(DFATAL) fallback path, which should be unreachable in debug builds and is kept as a defensive safeguard in release builds. Escaping here is intentional, as this path handles unexpected states. BuildTableTabletPrometheusLabels (line 113: table_id, table_name) While table_id is also a hex UUID and could theoretically use DCHECK, table_name is user-supplied input. ValidateIdentifier() only enforces valid UTF-8 and a length ? 256, so it may still contain characters like \, ", or \n, which require escaping for Prometheus correctness. Since both attributes are processed together in the loop over kKnownAttrs, splitting the logic just to avoid escaping table_id would introduce additional complexity. Therefore, I kept escaping for both fields to keep the implementation simple. Let me know if you think it's worth separating these paths, or if this approach looks reasonable to you. -- To view, visit http://gerrit.cloudera.org:8080/24096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I690918d06f19c393369b8fc43c3ec90dd2231d3d Gerrit-Change-Number: 24096 Gerrit-PatchSet: 4 Gerrit-Owner: Yan-Daojiang <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber <[email protected]> Gerrit-Reviewer: Yan-Daojiang <[email protected]> Gerrit-Comment-Date: Thu, 19 Mar 2026 10:23:56 +0000 Gerrit-HasComments: Yes
