On 18/05/18 20:04 +0000, Shobe, Casey wrote:
> On a couple clusters that have been running for a little while
> (without fencing), I'm seeing runaway server.rb processes using 100%
> of a single CPU core each.
>
> When I look at ps, I can see that these have something to do with
> pcsd:
>
> USER       PID %CPU %MEM     VSZ   RSS TTY STAT START      TIME COMMAND
> root      6103  0.0  0.3 1076744 59200 ?   Ssl  Apr06     59:09 /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
> root     17548 99.3  0.2  873648 46308 ?   Rl   Apr18  43356:57  \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
> root     16688 98.9  0.3  941160 49472 ?   Rl   May01  24300:52  \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
> root      6009 98.8  0.3  942188 49688 ?   R    May02  22607:08  \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
> root     15556 98.8  0.3 1076344 51836 ?   R    May03  21410:12  \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
>
> Running strace on one of the processes shows that they are looping
> on sched_yield().

Can you share some HW specs with us, at least the architecture to
start with: x86_64 (amd64), ARM (which generation/mode?), or something
else? My suspicion is that only the first of these is sufficiently
free of porting glitches at the Ruby interpreter level or lower.

-- Jan (Poki)
_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
