Hi group,

We have a VM with 24 cores (E7-8857 v2 @ 3.00GHz) and 32 GB of RAM (on big ESX 
hosts with a fast storage backend) that is our Foreman/puppetmaster, with the 
following tuning parameters:

Passenger:
  PassengerMaxRequests 10000
  PassengerStatThrottleRate 180 
  PassengerMaxRequestQueueSize 300
  PassengerMaxPoolSize 18
  PassengerMinInstances 1
  PassengerHighPerformance on

PGSQL:
constraint_exclusion = on
checkpoint_completion_target = 0.9
checkpoint_segments = 16
max_connections = 100
maintenance_work_mem = 1GB
effective_cache_size = 22GB
work_mem = 192MB
wal_buffers = 8MB
shared_buffers = 7680MB

Apache:
  StartServers        50
  MinSpareServers     5
  MaxSpareServers     20
  ServerLimit         256
  MaxClients          256
  MaxRequestsPerChild 4000


IPv6 disabled
vm.swappiness = 0
SELinux disabled
iptables flushed.
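
For completeness, this is how we verify those host-level settings took effect 
(standard commands, nothing exotic):

```shell
# Confirm the OS-level tuning is actually in place:
sysctl vm.swappiness           # should print: vm.swappiness = 0
getenforce                     # should print: Disabled
iptables -nL | grep -c DROP    # 0 when the chains are flushed
ip a | grep -c inet6           # 0 when IPv6 is fully disabled
```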

We have about 1400 hosts that check in every 30 minutes and report facts. 
Facter execution time is less than 1 second on the nodes. 
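
For scale, a rough back-of-envelope (assuming ~3 HTTP requests per agent run 
— node object, catalog, report; file metadata requests would add more):

```shell
# Estimate the steady-state request rate the master must absorb:
hosts=1400
interval_secs=1800   # 30-minute check-in interval
reqs_per_run=3       # assumed: node + catalog + report

awk -v h="$hosts" -v i="$interval_secs" -v r="$reqs_per_run" 'BEGIN {
  runs = h / i
  printf "agent runs/sec: %.2f\n", runs        # ~0.78
  printf "requests/sec:   %.2f\n", runs * r    # ~2.33
}'
```

So even with only 18 Passenger workers, each request would have to average 
several seconds just to keep the pool busy, let alone fill a 300-deep queue.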

The bottleneck seems to be the "Passenger RackApp: /etc/puppet/rack" 
processes. There is one of these for each Passenger worker, and each sits at 
100% CPU all the time. A typical strace summary of one looks like this:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 96.17   42.235808        1320     31988     15964 futex
  3.17    1.393038           0   5722020           rt_sigprocmask
  0.51    0.225576          14     16157         3 select
  0.12    0.051727           1     93402     83142 stat
  0.01    0.006303           0     13092     13088 lstat
  0.01    0.003000        1500         2           fsync
...
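
For reference, the summary above was gathered with something like the 
following (PID is one of the busy RackApp workers; strace prints the -c 
table when it detaches):

```shell
# Summarize syscalls (-c) across threads (-f) of one RackApp worker
# for 60 seconds, then let strace detach and print its table:
timeout 60 strace -c -f -p <PID>
```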

Here are the versions of software we've moved through:
Master OS: CentOS 6.5, 6.6
Foreman: 1.4.9, 1.5.1, 1.6.2
Puppet: 3.5.1, 3.6.2, 3.7.2
Ruby: 1.8.7 (stock CentOS)
Passenger: 4.0.18, 4.0.53

Settings we've tried in various combinations:
  PassengerMaxPoolSize 12, 18, 24
  PassengerMaxRequestQueueSize 150, 200, 250, 350
  PassengerStatThrottleRate 120, 180
  ServerLimit 256, 512
  MaxClients 256, 512

Requests in queue is always maxed out, and a lot of nodes simply time out.
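
For what it's worth, this is how we watch the queue (Passenger 4.x prints 
"Requests in queue" and process counts in its status output):

```shell
# Poll Passenger's own view of the pool and queue every 5 seconds:
watch -n 5 'passenger-status | grep -E "Requests in queue|Processes"'
```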

What am I missing? Our node count doesn't seem that big, and our catalogs 
are fairly small too (basically just a bunch of autofs maps via a module and 
2-3 files). 
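
One thing we can do to rule out slow compiles is to time a single catalog 
compile outside Passenger entirely (nodename is any host with cached facts 
on the master):

```shell
# Compile one catalog directly, bypassing Apache/Passenger, and time it:
time puppet master --compile <nodename> > /dev/null
```

If that comes back in well under a second, the problem is in the 
Passenger/Ruby layer rather than in catalog compilation itself.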

Thanks!

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Users" group.