On a couple of clusters that have been running for a while (without fencing), I'm seeing runaway server.rb processes, each using 100% of a single CPU core.
When I look at ps, I can see that these have something to do with pcsd:

    USER       PID %CPU %MEM     VSZ   RSS TTY STAT START      TIME COMMAND
    root      6103  0.0  0.3 1076744 59200 ?   Ssl  Apr06     59:09 /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
    root     17548 99.3  0.2  873648 46308 ?   Rl   Apr18  43356:57  \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
    root     16688 98.9  0.3  941160 49472 ?   Rl   May01  24300:52  \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
    root      6009 98.8  0.3  942188 49688 ?   R    May02  22607:08  \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
    root     15556 98.8  0.3 1076344 51836 ?   R    May03  21410:12  \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &

Running strace on one of the processes shows that they are looping on sched_yield().

What are these processes, and what is causing them? It appears that killing them frees up the CPU without any detrimental impact on the cluster...

Thanks,
-- 
Casey

_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
