On 10/30/14 10:45 AM, Georgi Todorov wrote:
> Hi group,
>
> We have a VM with 24 E7-8857 v2 @ 3.00GHz cores and 32G of RAM (on big
> ESX hosts and a fast backend) that is our foreman/puppetmaster, with the
> following tuning params:
>
> Passenger:
> PassengerMaxRequests 10000
> PassengerStatThrottleRate 180
> PassengerMaxRequestQueueSize 300
> PassengerMaxPoolSize 18
> PassengerMinInstances 1
> PassengerHighPerformance on
>
> PostgreSQL:
> constraint_exclusion = on
> checkpoint_completion_target = 0.9
> checkpoint_segments = 16
> max_connections = 100
> maintenance_work_mem = 1GB
> effective_cache_size = 22GB
> work_mem = 192MB
> wal_buffers = 8MB
> shared_buffers = 7680MB
>
> Apache:
> StartServers 50
> MinSpareServers 5
> MaxSpareServers 20
> ServerLimit 256
> MaxClients 256
> MaxRequestsPerChild 4000
>
> IPv6 disabled
> vm.swappiness = 0
> SELinux disabled
> iptables flushed
>
> We have about 1400 hosts that check in every 30 minutes and report facts.
> Facter execution time is less than 1 second on the nodes.
>
> The bottleneck seems to be:
> Passenger RackApp: /etc/puppet/rack
>
> There is one of these for each Passenger proc, and it sits at 100% all the
> time. A typical strace of it looks like this:
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  96.17   42.235808        1320     31988     15964 futex
>   3.17    1.393038           0   5722020           rt_sigprocmask
>   0.51    0.225576          14     16157         3 select
>   0.12    0.051727           1     93402     83142 stat
>   0.01    0.006303           0     13092     13088 lstat
>   0.01    0.003000        1500         2           fsync
> ...
>
> Here are the versions of software we've moved through:
> Master OS: CentOS 6.5, 6.6
> Foreman: 1.4.9, 1.5.1, 1.6.2
> Puppet: 3.5.1, 3.6.2, 3.7.2
> Ruby: 1.8.7 (CentOS...)
> Passenger: 4.0.18, 4.0.53
>
> Settings we've tried in various combinations:
> PassengerMaxPoolSize 12, 18, 24
> PassengerMaxRequestQueueSize 150, 200, 250, 350
> PassengerStatThrottleRate 120, 180
> ServerLimit 256, 512
> MaxClients 256, 512
>
> Requests in queue are always maxed out, and a lot of nodes simply time out.
>
> What am I missing? Our node count doesn't seem to be that big, and our
> catalogs are fairly small too (basically just a bunch of autofs maps via a
> module and 2-3 files).
>
> Thanks!
Hi Georgi,

How long does it take to compile a catalog? Is your VM server
oversubscribed?

Here's the formula for figuring out how many cores you need dedicated to
compiling catalogs. Note this is *dedicated* to compiling, so subtract two
for the OS; if you run Dashboard, subtract the number of workers; and if
you are running PuppetDB and Postgres, subtract a few more. Take a look at
my post[1] to ask.puppetlabs.com regarding sizing.

cores = (nodes) * (check-ins per hour) * (seconds per catalog) / (seconds per hour)

Another way to look at this is how many nodes the current hardware should
support:

nodes = (cores) * (seconds per hour) / (check-ins per hour) / (seconds per catalog)

[1] - http://ask.puppetlabs.com/question/3/where-can-i-find-information-about-sizing-for-puppet-servers/?answer=101#post-id-101

Best regards,
-g

--
Garrett Honeycutt
@learnpuppet
Puppet Training with LearnPuppet.com
Mobile: +1.206.414.8658
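The sizing formulas above can be sketched as a short script. The node count (1400) and check-in interval (every 30 minutes, i.e. 2 check-ins per hour) come from the thread; the 10-second catalog compile time is a hypothetical placeholder, since the actual compile time is exactly what the reply asks for.

```python
import math

SECONDS_PER_HOUR = 3600

def cores_needed(nodes, checkins_per_hour, seconds_per_catalog):
    """Dedicated compile cores required to keep up with check-ins."""
    return nodes * checkins_per_hour * seconds_per_catalog / SECONDS_PER_HOUR

def nodes_supported(cores, checkins_per_hour, seconds_per_catalog):
    """Nodes a given number of dedicated compile cores can serve."""
    return cores * SECONDS_PER_HOUR / checkins_per_hour / seconds_per_catalog

# 1400 nodes, 2 check-ins/hour, assumed 10 s per catalog compile:
print(math.ceil(cores_needed(1400, 2, 10)))   # -> 8 dedicated cores

# Conversely: what could 18 dedicated cores (the PassengerMaxPoolSize
# from the original post) handle at the same assumed compile time?
print(int(nodes_supported(18, 2, 10)))        # -> 3240 nodes
```

At an assumed 10 s per compile the 24-core box looks comfortably sized, which is why the compile time (and what else shares those cores) is the key unknown.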
