On 2014-03-11 17:39, Georgi Todorov wrote:
On Friday, October 31, 2014 9:50:41 AM UTC-4, Georgi Todorov wrote:
> Actually, sometime last night something happened and puppet stopped
> processing requests altogether. Stopping and starting httpd fixed this,
> but this could be just some bug in one of the new versions of software
> I upgraded to. I'll keep monitoring.

So, unfortunately, the issue is not fixed :(. For whatever reason, everything ran great for a day. Catalog compiles were taking around 7 seconds and client runs finished in about 20 seconds - happy days. Then overnight, catalog compile times jumped to 20-30 seconds and client runs were taking 200+ seconds. A few hours later, no more requests were arriving at the puppet master at all. Is my HTTP server flaking out?

Running with --trace --evaltrace and strace, it looks like most of the time is spent stat-ing:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 83.01    5.743474           9    673606    612864 stat
  7.72    0.534393           7     72102     71510 lstat
  6.76    0.467930       77988         6           wait4

That's a pretty poor "hit" rate (7k out of 74k stats)...
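As a quick sanity check of those strace figures, the hit rates can be computed directly from the summary above (the call and error counts are copied from the strace -c output; in this context an "error" is typically ENOENT, i.e. a stat() on a path that does not exist):

```python
# (calls, errors) pairs taken from the strace -c summary quoted above.
syscalls = {
    "stat":  (673606, 612864),
    "lstat": (72102, 71510),
}

for name, (calls, errors) in syscalls.items():
    hits = calls - errors  # successful lookups
    print(f"{name}: {hits} hits out of {calls} calls "
          f"({100 * hits / calls:.1f}% hit rate)")
```

By these numbers stat() succeeds only about 9% of the time and lstat() under 1% of the time, which is consistent with something repeatedly probing long search paths for files that are not there.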
I've increased the check time to 1 hour on all clients, and the master seems to be keeping up for now - catalog compile avg 8 seconds, client run avg 15 seconds, queue size 0.

Here is what a client run looks like when the server is keeping up:

Notice: Finished catalog run in 11.93 seconds
Changes:
Events:
Resources:
            Total: 522
Time:
       Filebucket: 0.00
             Cron: 0.00
         Schedule: 0.00
          Package: 0.00
          Service: 0.68
             Exec: 1.07
             File: 1.72
Config retrieval: 13.35
         Last run: 1415032387
            Total: 16.82
Version:
           Config: 1415031292
           Puppet: 3.7.2

And when the server is just about dead:

Notice: Finished catalog run in 214.21 seconds
Changes:
Events:
Resources:
            Total: 522
Time:
             Cron: 0.00
       Filebucket: 0.00
         Schedule: 0.01
          Package: 0.02
          Service: 1.19
             File: 128.94
         Last run: 1415027092
            Total: 159.21
             Exec: 2.25
Config retrieval: 26.80
Version:
           Config: 1415025705
           Puppet: 3.7.2

Probably 500 of the "Resources" are autofs maps using https://github.com/pdxcat/puppet-module-autofs/commits/master

So there is definitely some bottleneck on the system; the problem is I can't figure out what it is. Is it disk IO (iostat doesn't seem to think so)? Is it CPU (top looks fine)? Is it memory (ditto)? Is the httpd/Passenger combo not up to the task, or is the postgres server not keeping up? There are so many components that it is hard for me to do a proper profile and find the bottleneck. Any ideas?

So far I've timed the ENC script that pulls the classes for a node - it takes less than 1 second. From the messages log, catalog compiles take from 7 seconds (normal) to 25 seconds (worst case, overloaded server).

Anyway, figured I'd share that; unfortunately ruby was not the issue. Back to poking around and testing.
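One way to narrow down where the extra wall-clock time goes is to diff the two run reports category by category. A small sketch, using only the per-category timings quoted above (all numbers are copied from the two reports; nothing here is measured independently):

```python
# Per-category client-run timings (seconds) from the healthy and
# overloaded run reports quoted above.
healthy = {"Service": 0.68, "Exec": 1.07, "File": 1.72,
           "Config retrieval": 13.35}
overloaded = {"Service": 1.19, "Exec": 2.25, "File": 128.94,
              "Config retrieval": 26.80}

# Sort categories by how much extra time the overloaded run spends there.
deltas = sorted(
    ((name, overloaded[name] - healthy[name]) for name in healthy),
    key=lambda item: item[1],
    reverse=True,
)
for category, delta in deltas:
    print(f"{category:>17}: +{delta:.2f}s")
```

The File category dominates the slowdown by two orders of magnitude over everything except config retrieval. Since File resources make the client fetch metadata (checksums) from the master's fileserver on every run, that fits the stat-heavy picture from the strace output: hundreds of file-metadata requests per run hammering the master.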
Your move away from Ruby 1.8.7 was a good one. It is essentially the same as installing more hardware, since Ruby versions after 1.8.7 are faster.
It may be worth trying to run with Ruby 1.9.3 (p448 or later) just to make sure this is not a Ruby 2.x issue. It should be roughly on par with Ruby 2.x in terms of performance.
That is, I am thinking that with the slow Ruby 1.8.7 you were simply running out of compute resources, and now on Ruby 2.x you may have hit something else.
- henrik

--
Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/
