moonming commented on issue #12275:
URL: https://github.com/apache/apisix/issues/12275#issuecomment-2933136673

   @zhaoqiang1980 Thanks for the detailed report!
   
   This is actually a known issue. It tends to occur more frequently when the 
shared dictionary (prometheus-metrics) is configured with a relatively small 
size. Under high concurrency and with a large number of metrics, the shared 
dict becomes a hotspot and introduces lock contention.
   
   The most straightforward mitigation is to increase the size of the shared 
dict to reduce contention.
   
   I think a more robust solution would be to implement a graceful degradation 
mechanism in the prometheus plugin. For example, when it detects that the 
shared memory is full and lock contention is impacting performance, it could 
temporarily pause metrics collection for 5 minutes. Some metrics would be lost 
during the pause, but this would keep the CPU from hitting 100% and degrading 
overall system stability.
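
   To make the idea concrete, here is a rough, language-agnostic sketch in Python
(not the plugin's actual Lua code). Every name in it (`MetricsGate`, `record`,
`min_free_ratio`, `max_write_seconds`) is hypothetical, and the thresholds are
placeholder values, not tuned recommendations. It assumes the caller can supply
the fraction of free shared memory (in OpenResty this might be derived from
`ngx.shared.DICT:free_space()`) and that slow writes are an acceptable proxy for
lock contention.

```python
import time

PAUSE_SECONDS = 5 * 60  # the 5-minute pause suggested above


class MetricsGate:
    """Skips metric writes for a cooldown period once the store looks overloaded."""

    def __init__(self, pause_seconds=PAUSE_SECONDS, min_free_ratio=0.05,
                 max_write_seconds=0.005, clock=time.monotonic):
        self.pause_seconds = pause_seconds
        self.min_free_ratio = min_free_ratio        # hypothetical threshold
        self.max_write_seconds = max_write_seconds  # hypothetical threshold
        self.clock = clock                          # injectable for testing
        self.paused_until = float("-inf")
        self.dropped = 0                            # writes skipped while paused

    def is_paused(self):
        return self.clock() < self.paused_until

    def record(self, write_metric, free_ratio):
        """Run write_metric() unless paused; return True if the write ran."""
        if self.is_paused():
            self.dropped += 1  # accepted metric loss during the pause
            return False
        start = self.clock()
        write_metric()
        elapsed = self.clock() - start
        # Trip only when both signals agree: memory nearly full AND writes slow.
        if free_ratio < self.min_free_ratio and elapsed > self.max_write_seconds:
            self.paused_until = self.clock() + self.pause_seconds
        return True
```

Requiring both signals before pausing is a deliberate choice in this sketch: a
nearly full dict with fast writes, or a single slow write with plenty of free
space, would not stop collection. Counting dropped writes keeps the metric loss
visible to operators.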
   
   We’d love to hear what others think about this approach.

