li4wang commented on PR #2086:
URL: https://github.com/apache/zookeeper/pull/2086#issuecomment-1853151200
We also looked into how to enable Prometheus metrics in production and have
recently done quite a lot of perf testing and profiling.
1. The metrics queue size is 1M by default. It can be tuned, but 1M seems
too large. When we reduced the queue size from 1M to 100K, the max GC pause
was reduced by 78% and the GC count by 80%.
2. We also noticed that when the thread pool queue is full, a large number
of `RejectedExecutionException` instances were created, which added more GC
overhead. This is because `ThreadPoolExecutor` uses `AbortPolicy` as its
default `RejectedExecutionHandler`. `AbortPolicy` instantiates a new
`RejectedExecutionException` for every rejected task and builds its message
with two quite involved `toString` calls:
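```java
// java.util.concurrent.ThreadPoolExecutor.AbortPolicy (JDK source)
public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
    // A new exception object, with a message built from r.toString()
    // and e.toString(), is allocated for every rejected task.
    throw new RejectedExecutionException("Task " + r.toString() +
                                         " rejected from " +
                                         e.toString());
}
```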
3. We created a patch that uses `DiscardPolicy` instead of `AbortPolicy`,
which silently drops the rejected task instead of throwing
`RejectedExecutionException`. With the patch and a 100K queue size, the max
GC pause was further reduced by about 7% and the GC count by about 61%. As
a result, read latency was reduced by 59% and throughput increased by 140%.
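The combination described in items 1 to 3 (a bounded work queue plus a non-throwing rejection policy) can be sketched with plain JDK classes. This is a minimal, hypothetical illustration, not the actual patch: `RejectionPolicyDemo`, `newMetricsExecutor` and `run` are made-up names, and a queue capacity of 2 stands in for the 100K used in the tests. The sketch keeps the only worker busy, overfills the bounded queue, and counts how many `RejectedExecutionException`s reach the caller under each policy.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class RejectionPolicyDemo {

    // Hypothetical factory: fixed-size pool, bounded work queue
    // (e.g. 100_000 instead of 1_000_000), pluggable overflow policy.
    static ThreadPoolExecutor newMetricsExecutor(int threads, int queueCapacity,
                                                 RejectedExecutionHandler handler) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), handler);
    }

    /** Submits {@code tasks} tasks to a 1-thread pool while its worker is blocked.
     *  Returns {tasksCompleted, exceptionsThrownToCaller}. */
    static int[] run(RejectedExecutionHandler handler, int capacity, int tasks)
            throws InterruptedException {
        ThreadPoolExecutor pool = newMetricsExecutor(1, capacity, handler);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch gate = new CountDownLatch(1);
        AtomicInteger completed = new AtomicInteger();
        int exceptions = 0;

        // The first task occupies the only worker until the gate opens.
        pool.execute(() -> {
            started.countDown();
            try {
                gate.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            completed.incrementAndGet();
        });
        started.await();

        // Remaining tasks fill the queue; once it is full the handler is invoked.
        for (int i = 1; i < tasks; i++) {
            try {
                pool.execute(completed::incrementAndGet);
            } catch (RejectedExecutionException e) {
                exceptions++; // AbortPolicy: one new exception object per rejected task
            }
        }

        gate.countDown();   // let the worker drain the queued tasks
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new int[] {completed.get(), exceptions};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] abort = run(new ThreadPoolExecutor.AbortPolicy(), 2, 5);
        int[] discard = run(new ThreadPoolExecutor.DiscardPolicy(), 2, 5);
        System.out.println("AbortPolicy:   completed=" + abort[0] + ", exceptions=" + abort[1]);
        System.out.println("DiscardPolicy: completed=" + discard[0] + ", exceptions=" + discard[1]);
    }
}
```

With one running task and a queue of 2, the last two of the five submissions are rejected: both policies complete 3 tasks, but `AbortPolicy` throws 2 exceptions while `DiscardPolicy` throws none. The trade-off is that every discarded task is a metric update that is silently lost, so the queue size should still be large enough to absorb normal bursts.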
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.