On 2014-03-11 17:39, Georgi Todorov wrote:
On Friday, October 31, 2014 9:50:41 AM UTC-4, Georgi Todorov wrote:
> Actually, sometime last night something happened and puppet stopped
> processing requests altogether. Stopping and starting httpd fixed this,
> but this could be just some bug in one of the new versions of software
> I upgraded to. I'll keep monitoring.

So, unfortunately, the issue is not fixed :(. For whatever reason, everything ran great for a day. Catalog compiles were taking around 7 seconds and client runs finished in about 20 seconds - happy days. Then overnight, catalog compile times jumped to 20-30 seconds and client runs were taking 200+ seconds. A few hours later, no more requests were arriving at the puppet master at all. Is my HTTP server flaking out?

Running with --trace --evaltrace and strace, it looks like most of the time is spent stat-ing:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 83.01    5.743474           9    673606    612864 stat
  7.72    0.534393           7     72102     71510 lstat
  6.76    0.467930       77988         6           wait4

That's a pretty poor "hit" rate (7k out of 74k stats)...
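As a quick sanity check of those strace figures, the hit rates can be computed directly from the summary above (the call and error counts are copied from the strace -c output; in this context an "error" is typically ENOENT, i.e. a stat() on a path that does not exist):

```python
# (calls, errors) pairs taken from the strace -c summary quoted above.
syscalls = {
    "stat":  (673606, 612864),
    "lstat": (72102, 71510),
}

for name, (calls, errors) in syscalls.items():
    hits = calls - errors  # successful lookups
    print(f"{name}: {hits} hits out of {calls} calls "
          f"({100 * hits / calls:.1f}% hit rate)")
```

By these numbers stat() succeeds only about 9% of the time and lstat() under 1% of the time, which is consistent with something repeatedly probing long search paths for files that are not there.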
I've increased the check time to 1 hour on all clients, and the master seems to be keeping up for now - catalog compile avg 8 seconds, client run avg 15 seconds, queue size 0.

Here is what a client run looks like when the server is keeping up:

Notice: Finished catalog run in 11.93 seconds
Changes:
Events:
Resources:
            Total: 522
Time:
       Filebucket: 0.00
             Cron: 0.00
         Schedule: 0.00
          Package: 0.00
          Service: 0.68
             Exec: 1.07
             File: 1.72
Config retrieval: 13.35
         Last run: 1415032387
            Total: 16.82
Version:
           Config: 1415031292
           Puppet: 3.7.2

And when the server is just about dead:

Notice: Finished catalog run in 214.21 seconds
Changes:
Events:
Resources:
            Total: 522
Time:
             Cron: 0.00
       Filebucket: 0.00
         Schedule: 0.01
          Package: 0.02
          Service: 1.19
             File: 128.94
         Last run: 1415027092
            Total: 159.21
             Exec: 2.25
Config retrieval: 26.80
Version:
           Config: 1415025705
           Puppet: 3.7.2

Probably 500 of the "Resources" are autofs maps using https://github.com/pdxcat/puppet-module-autofs/commits/master

So there is definitely some bottleneck on the system; the problem is I can't figure out what it is. Is it disk IO (iostat doesn't seem to think so)? Is it CPU (top looks fine)? Is it memory (ditto)? Is the httpd/Passenger combo not up to the task, or is the postgres server not keeping up? There are so many components that it is hard for me to do a proper profile and find the bottleneck. Any ideas?

So far I've timed the ENC script that pulls the classes for a node - it takes less than 1 second. From the messages log, catalog compiles take from 7 seconds (normal) to 25 seconds (worst case, overloaded server).

Anyway, figured I'd share that; unfortunately ruby was not the issue. Back to poking around and testing.
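One way to narrow down where the extra wall-clock time goes is to diff the two run reports category by category. A small sketch, using only the per-category timings quoted above (all numbers are copied from the two reports; nothing here is measured independently):

```python
# Per-category client-run timings (seconds) from the healthy and
# overloaded run reports quoted above.
healthy = {"Service": 0.68, "Exec": 1.07, "File": 1.72,
           "Config retrieval": 13.35}
overloaded = {"Service": 1.19, "Exec": 2.25, "File": 128.94,
              "Config retrieval": 26.80}

# Sort categories by how much extra time the overloaded run spends there.
deltas = sorted(
    ((name, overloaded[name] - healthy[name]) for name in healthy),
    key=lambda item: item[1],
    reverse=True,
)
for category, delta in deltas:
    print(f"{category:>17}: +{delta:.2f}s")
```

The File category dominates the slowdown by two orders of magnitude over everything except config retrieval. Since File resources make the client fetch metadata (checksums) from the master's fileserver on every run, that fits the stat-heavy picture from the strace output: hundreds of file-metadata requests per run hammering the master.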
Your move away from Ruby 1.8.7 was a good one. It is essentially the same as installing more hardware, since Ruby versions after 1.8.7 are faster.
It may be worth trying to run with Ruby 1.9.3 (p448 or later) just to make sure this is not a Ruby 2.x issue. It should be roughly on par with Ruby 2.x in terms of performance.
That is, I am thinking that with the slow Ruby 1.8.7 you were simply running out of compute resources, and now on Ruby 2.x you may have hit something else.
- henrik

--
Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/
