On a couple of clusters that have been running for a little while (without 
fencing), I'm seeing runaway server.rb processes, each using 100% of a single 
CPU core.

When I look at ps, I can see that these have something to do with pcsd:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      6103  0.0  0.3 1076744 59200 ?       Ssl  Apr06  59:09 /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
root     17548 99.3  0.2 873648 46308 ?        Rl   Apr18 43356:57  \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
root     16688 98.9  0.3 941160 49472 ?        Rl   May01 24300:52  \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
root      6009 98.8  0.3 942188 49688 ?        R    May02 22607:08  \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
root     15556 98.8  0.3 1076344 51836 ?       R    May03 21410:12  \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
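
For anyone wanting to check their own nodes for the same symptom, here is a rough Python sketch of picking such processes out of a listing like the one above. It assumes `ps aux`-style output with ten fixed columns before COMMAND and unwrapped lines; the function name `find_busy_pcsd` and the 90% threshold are made up for illustration and are not part of pcsd.

```python
def find_busy_pcsd(ps_text, cpu_threshold=90.0):
    """Return (pid, cpu) pairs for pcsd-related processes at or above cpu_threshold.

    Assumes `ps aux`-style columns: USER PID %CPU %MEM VSZ RSS TTY STAT
    START TIME COMMAND, one process per line.
    """
    busy = []
    for line in ps_text.splitlines():
        # Split off the ten fixed columns; the remainder is the full COMMAND.
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip the header and any short or blank lines
        pid, cpu, command = int(fields[1]), float(fields[2]), fields[10]
        if "pcsd" in command and cpu >= cpu_threshold:
            busy.append((pid, cpu))
    return busy


# Two lines taken from the listing above (shell redirection suffix omitted).
sample = "\n".join([
    "USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND",
    r"root      6103  0.0  0.3 1076744 59200 ?       Ssl  Apr06  59:09 /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb",
    r"root     17548 99.3  0.2 873648 46308 ?        Rl   Apr18 43356:57  \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb",
])

print(find_busy_pcsd(sample))
```

In practice the text would come from something like `ps auxf` on the node. The parent pcsd process sits near 0% CPU, so a threshold filter leaves it out and reports only the spinning children.
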

Running strace on one of the processes shows that they are looping on 
sched_yield().

What are these processes and what is causing them to occur?  It appears that 
killing them frees up the CPU without detrimental impact on the cluster...

Thanks,
-- 
Casey
_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
