kezhuw commented on code in PR #2279:
URL: https://github.com/apache/zookeeper/pull/2279#discussion_r2226456903
##########
zookeeper-metrics-providers/zookeeper-prometheus-metrics/src/main/java/org/apache/zookeeper/metrics/prometheus/PrometheusMetricsProvider.java:
##########
@@ -714,14 +714,21 @@ public Thread newThread(final Runnable runnable) {
}
}
+ private static class PrometheusRejectedExecutionHandler implements
RejectedExecutionHandler {
+ private static boolean maxQueueSizeExceeded = false;
+
+ @Override
+ public void rejectedExecution(final Runnable r, final
ThreadPoolExecutor e) {
+ if (!maxQueueSizeExceeded) {
+ maxQueueSizeExceeded = true;
+ LOG.warn("Prometheus metrics queue size exceeded the max");
Review Comment:
> It's important to know whether rejection happens, but don't see too many
benefits of knowing when the rejection stops and restarts.
My concern is that: people could lost the rejection if we log it only once
as it could be too far from inspection time. ZooKeeper servers in production
are supposed to run for a long time, logs could be truncated or archived
periodically. If we log it only once at the first occurence, then it would be
hard or even impossible to notice that the metrics backend is overloading now.
> To avoid the unnecessary overhead of logging the same warn msg repeatedly
even it's rate limited
Is it possible to optimize this as the log string is a const.
I personally think rejection policy deserve rate limit as it could happen
shortly and periodically due to system load.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]