> It turned out this was a combination of two problems which made it
> much more difficult to figure out.
>
> First of all I didn't have enough apache2 processes. That seems like
> it should have been obvious but it wasn't for two reasons. Firstly,
> my apache2 processes are always idle or nearly idle, even when traffic
> levels are high. But it must be the case that each request made to
> nginx which is then handed off to apache2 monopolizes an apache2
> process even though my backend application server is the one using all
> the CPU instead of apache2. The other thing that made it difficult to
> track down was the way munin graphs apache2 processes. On my graph,
> busy and free processes only appeared as tiny dots at the bottom
> because apache2's ServerLimit is drawn on the same graph which is many
> times greater than the number of busy and free processes. It would be
> better to draw MaxClients instead of ServerLimit since I think
> MaxClients is more likely to be tuned. It at least appears in the
> default config file on Gentoo. Since busy and free apache2 processes
> were virtually invisible on the munin graph, I wasn't able to
> correlate their ebb and flow with my server's response times.
>
> Once I fixed the apache2 problem, I was sure I had it nailed. That's
> when I emailed here a few days ago to say I think I got it. But it
> turned out there was another problem and that was Odoo (formerly known
> as OpenERP) which is also running in a reverse proxy configuration
> behind nginx. Whenever someone uses Odoo on my server, it absolutely
> destroys performance for my non-Odoo website. That would have been
> really easy to test and I did test stopping the odoo service early on,
> but I ruled it out when the problem persisted after stopping Odoo
> which I now realize must have been because of the apache2 problem.

The root of the Odoo problem was that I didn't have keepalive enabled between the nginx reverse proxy and the Odoo server. nginx enables keepalive by default on the client side (HTTP/1.1), but by default it speaks HTTP/1.0 to upstream servers and opens a new connection for every proxied request, so upstream keepalive has to be turned on explicitly. I still see TCP Queuing spikes in munin when Odoo is in use, but they no longer slow down the apache2/nginx reverse proxy running my main site.

- Grant
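
P.S. For anyone who finds this thread later, upstream keepalive in nginx is set up roughly as below. This is only a sketch and not a copy of my config: the upstream name "odoo", the hostname, and the keepalive count of 16 are placeholders, and 127.0.0.1:8069 assumes Odoo is listening on its default HTTP port on the same machine.

```nginx
# Sketch only: names, hostname, and the keepalive count are assumptions.
upstream odoo {
    server 127.0.0.1:8069;   # assumed: Odoo on its default HTTP port
    keepalive 16;            # idle upstream connections kept open per worker process
}

server {
    listen 80;
    server_name odoo.example.com;   # placeholder hostname

    location / {
        proxy_pass http://odoo;
        # Both of the following are needed for upstream keepalive to take effect:
        proxy_http_version 1.1;          # upstream requests otherwise go out as HTTP/1.0
        proxy_set_header Connection "";  # clear the "Connection: close" header nginx sends by default
    }
}
```

The keepalive directive caps the number of idle connections each worker keeps cached, not the total number of upstream connections, so it doesn't need to be large.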