Wubabalala opened a new pull request, #16167: URL: https://github.com/apache/dubbo/pull/16167
## What is the purpose of the change?

Fixes #16148

Dynamically added metric series in `MetricsNameCountSampler` subclasses (e.g., `ThreadRejectMetricsCountSampler`, `ErrorCodeSampler`) are not exported to Prometheus when the first event arrives after the initial reporter sync cycle.

## Root Cause Analysis

`MetricsNameCountSampler.samplesChanged` is initialized to `true` and consumed by the reporter's first `calSamplesChanged()` poll (CAS `true → false`). At that point, `sample()` returns nothing because no actual metric series exist yet.

When the first real event arrives later (e.g., a thread pool rejection), `SimpleMetricsCountSampler.inc()` creates a new metric series entry in `metricCounter`. However, `MetricsNameCountSampler.samplesChanged` is only updated during metric name registration; it is never set back to `true` when a new metric series is first created at runtime. The reporter sees `false` on subsequent polls and never re-registers the new metric series to the Prometheus registry.

## Brief changelog

- **`SimpleMetricsCountSampler`**: Refactored `getAtomicCounter()` into `incrementAndGetCreated()` which returns `true` when a new metric series is created for the first time (detected via reference equality against the candidate `AtomicLong`).
- **`MetricsNameCountSampler`**: Override `inc()` to call `incrementAndGetCreated()` and set `samplesChanged = true` only when a new metric series is created. This avoids unnecessary re-registration for updates to already-registered series. Also made `addMetricName()` idempotent.

## How to verify

**Thread pool reject path:**

1. Start a Dubbo 3.3.x provider with `dubbo.metrics.enable-threadpool: true` and `dubbo.protocol.threads: 2`.
2. Wait 3+ minutes (so the initial `samplesChanged` flag is consumed).
3. Send enough concurrent requests to trigger thread pool exhaustion.
4. Check `/actuator/prometheus` for `dubbo_thread_pool_reject_thread_count`; it should now appear.
**Error code path** is covered by the unit test `ErrorCodeSampleTest.testErrorCodeMetricChangesAfterFirstLateEvent`, which verifies the same timing sequence with error code events.

## New tests

- `ErrorCodeSampleTest.testErrorCodeMetricChangesAfterFirstLateEvent`: Verifies the exact timing sequence: the initial flag is consumed, the first error event sets the flag, a repeat of the same error does not, and a new error code sets the flag again.
- `PrometheusMetricsThreadPoolTest.testThreadPoolRejectMetricsExportedAfterLateFirstEvent`: End-to-end test that simulates the full startup → first poll → late event timeline and asserts that the Prometheus scrape output contains the late-arriving reject metric series.
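
For readers who want to see the late-event sequence in isolation, here is a minimal, self-contained sketch of the flag handling described in this PR. It is not the Dubbo code: the classes `LateSeriesSketch`, `CountSampler`, and `NameCountSampler`, the string series keys, and the use of `putIfAbsent` are assumptions made for illustration. Only the names `metricCounter`, `samplesChanged`, `inc()`, `incrementAndGetCreated()`, and `calSamplesChanged()` are taken from the description.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

public class LateSeriesSketch {

    /** Hypothetical stand-in for the counter map kept by SimpleMetricsCountSampler. */
    static class CountSampler {
        final Map<String, AtomicLong> metricCounter = new ConcurrentHashMap<>();

        /** Increments the series and reports whether this call created it. */
        boolean incrementAndGetCreated(String seriesKey) {
            AtomicLong candidate = new AtomicLong();
            // putIfAbsent returns null only when our candidate was stored.
            AtomicLong existing = metricCounter.putIfAbsent(seriesKey, candidate);
            AtomicLong counter = (existing == null) ? candidate : existing;
            counter.incrementAndGet();
            // Reference equality: true only for the thread whose candidate won.
            return counter == candidate;
        }
    }

    /** Hypothetical stand-in for MetricsNameCountSampler with the fix applied. */
    static class NameCountSampler extends CountSampler {
        // Starts as true, so the reporter's first poll consumes it.
        private final AtomicBoolean samplesChanged = new AtomicBoolean(true);

        void inc(String seriesKey) {
            // Raise the flag only when a brand-new series appears.
            if (incrementAndGetCreated(seriesKey)) {
                samplesChanged.set(true);
            }
        }

        /** Reporter poll: returns true once per change (CAS true -> false). */
        boolean calSamplesChanged() {
            return samplesChanged.compareAndSet(true, false);
        }
    }

    public static void main(String[] args) {
        NameCountSampler sampler = new NameCountSampler();
        System.out.println(sampler.calSamplesChanged()); // true: initial flag consumed
        sampler.inc("reject:pool-1");                    // first late event
        System.out.println(sampler.calSamplesChanged()); // true: new series created
        sampler.inc("reject:pool-1");                    // repeat of the same series
        System.out.println(sampler.calSamplesChanged()); // false: nothing new
        sampler.inc("reject:pool-2");                    // a different series
        System.out.println(sampler.calSamplesChanged()); // true: flag raised again
    }
}
```

Without the `samplesChanged.set(true)` call in `inc()`, the second poll in this sketch would return `false`, which corresponds to the behavior reported in #16148.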
