kezhuw commented on code in PR #2279:
URL: https://github.com/apache/zookeeper/pull/2279#discussion_r2226456903


##########
zookeeper-metrics-providers/zookeeper-prometheus-metrics/src/main/java/org/apache/zookeeper/metrics/prometheus/PrometheusMetricsProvider.java:
##########
@@ -714,14 +714,21 @@ public Thread newThread(final Runnable runnable) {
         }
     }
 
+    private static class PrometheusRejectedExecutionHandler implements 
RejectedExecutionHandler {
+        private static boolean maxQueueSizeExceeded = false;
+
+        @Override
+        public void rejectedExecution(final Runnable r, final 
ThreadPoolExecutor e) {
+            if (!maxQueueSizeExceeded) {
+                maxQueueSizeExceeded = true;
+                LOG.warn("Prometheus metrics queue size exceeded the max");

Review Comment:
   > It's important to know whether rejection happens, but don't see too many 
benefits of knowing when the rejection stops and restarts.
   
   My concern is that: people could lost the rejection if we log it only once 
as it could be too far from inspection time. ZooKeeper servers in production 
are supposed to run for a long time, logs could be truncated or archived 
periodically. If we log it only once at the first occurence, then it would be 
hard or even impossible to notice that the metrics backend is overloading now.
   
   > To avoid the unnecessary overhead of logging the same warn msg repeatedly 
even it's rate limited
   
   Is it possible to optimize this as the log string is a const.
   
   I personally think rejection policy deserve rate limit as it could happen 
shortly and periodically due to system load.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to