Re: [I] help request: My APISIX is experiencing 100% CPU utilization and has become unresponsive [apisix]

via GitHub Mon, 02 Jun 2025 20:18:16 -0700


zhaoqiang1980 commented on issue #12275:
URL: https://github.com/apache/apisix/issues/12275#issuecomment-2933240266


   > [@zhaoqiang1980](https://github.com/zhaoqiang1980) Thanks for the detailed report!
   >
   > This is actually a known issue. It tends to occur more frequently when the
   > shared dictionary (prometheus-metrics) is configured with a relatively small
   > size. Under high concurrency and with a large number of metrics, the shared
   > dict becomes a hotspot and introduces lock contention.
   >
   > The most straightforward mitigation is to increase the size of the shared
   > dict to reduce contention.
   >
   > I think a more robust solution would be to implement a graceful
   > degradation mechanism in the prometheus plugin. For example, when it detects
   > that the shared memory is full and lock contention is impacting performance,
   > it could temporarily pause metrics collection for 5 minutes. This may result
   > in some metrics loss, but would prevent the CPU from hitting 100% and
   > affecting overall system stability.
   >
   > We’d love to hear what others think about this approach.
   
   --------------
    @moonming Thanks for your response. We will increase the size of the
shared dict. If that is not enough, we will try to backport the TTL capability
from a newer version into our current version, to address the issue where
metrics only increase and never decrease. Thanks also to @bzp2010.
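
For anyone weighing the graceful degradation idea quoted above, here is a rough sketch of the pause-on-failure logic. It is an illustration only, not APISIX code (the real plugin is written in Lua). The class name `MetricsCircuitBreaker`, the `write_metric` callback, and the use of a failed write as the "shared memory is full" signal are all assumptions made for this example; the 300-second cooldown mirrors the 5 minutes suggested in the quote.

```python
import time


class MetricsCircuitBreaker:
    """Hypothetical helper: pause metrics collection after a failed write."""

    def __init__(self, cooldown_seconds=300.0, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock          # injectable so the pause can be tested
        self._paused_until = 0.0     # clock value at which the pause ends
        self.dropped = 0             # number of metric updates skipped or lost

    def is_paused(self):
        return self._clock() < self._paused_until

    def record(self, write_metric):
        """Run write_metric() unless paused; return True if it was written.

        write_metric is assumed to return False when the store is full.
        """
        if self.is_paused():
            # Skip the write entirely so the contended store is not touched.
            self.dropped += 1
            return False
        if not write_metric():
            # Treat the failed write as the overload signal and start a pause.
            self._paused_until = self._clock() + self.cooldown_seconds
            self.dropped += 1
            return False
        return True
```

The trade-off is the one described in the quote: updates arriving during the pause are counted in `dropped` and lost, in exchange for not touching the contended store at all while it is overloaded.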


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
