On Jul 28, 2008, at 1:05 PM, Rainer Traut wrote:
Andrew Beekhof wrote:
On Jul 28, 2008, at 12:24 PM, Rainer Traut wrote:
System: CentOS 5 x86_64, 2 nodes
# rpm -qa|grep heartbeat
pacemaker-heartbeat-0.6.5-8.2
heartbeat-ldirectord-2.1.3-23.1
heartbeat-resources-2.1.3-23.1
heartbeat-common-2.1.3-23.1
heartbeat-2.1.3-23.1
One cluster member constantly reboots with these logs:
Jul 28 12:11:47 n02asp7 ccm: [8768]: ERROR: socket_wait_conn_new: unlink failure(/var/run/heartbeat/ccm/ccm): Permission denied
Jul 28 12:11:47 n02asp7 ccm: [8768]: ERROR: socket_wait_conn_new: trying to create in /var/run/heartbeat/ccm/ccm bind:: Permission denied
Jul 28 12:11:47 n02asp7 ccm: [8768]: ERROR: Can't create wait channel: Resource temporarily unavailable
Jul 28 12:11:47 n02asp7 heartbeat: [8756]: WARN: Managed /usr/lib64/heartbeat/ccm process 8768 exited with return code 1.
Jul 28 12:11:47 n02asp7 stonithd: [8771]: info: Signing in with heartbeat.
Jul 28 12:11:47 n02asp7 heartbeat: [8756]: EMERG: Rebooting system. Reason: /usr/lib64/heartbeat/ccm
and:
# ls -la /var/run/heartbeat/ccm/ccm
srwxrwxrwx 1 hacluster haclient 0 12. Jul 14:04 /var/run/heartbeat/ccm/ccm
What about the directories it is in?
What user is ccm running as?
uid 90, gid 90:
Jul 28 12:11:47 n02asp7 heartbeat: [8756]: info: Starting child client "/usr/lib64/heartbeat/ccm" (90,90)
Jul 28 12:11:47 n02asp7 heartbeat: [8768]: info: Starting "/usr/lib64/heartbeat/ccm" as uid 90 gid 90 (pid 8768)
# ls -la /var/run/heartbeat/ccm
total 8
drwxr-x--- 2 root root 4096 18. Jul 17:30 .
drwxr-xr-x 5 root root 4096 28. Jul 12:13 ..
srwxrwxrwx 1 hacluster haclient 0 12. Jul 14:04 ccm
And indeed:
# chown hacluster:haclient /var/run/heartbeat/ccm/
fixes the problem.
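
For anyone hitting the same error: removing or creating the socket needs write permission on the containing directory, and the listing above shows that directory owned by root:root with mode drwxr-x---, while ccm runs as uid 90 gid 90. A quick check along these lines can confirm such a mismatch before changing ownership. This is only a sketch: check_dir_owner is a hypothetical helper, not part of heartbeat, and it assumes GNU stat (as shipped on CentOS).

```shell
# check_dir_owner DIR UID GID
# Hypothetical helper: compares the numeric owner and group of DIR
# with the uid/gid the daemon is expected to run as.
# Assumes GNU coreutils stat (the -c format option).
check_dir_owner() {
    dir=$1
    want_uid=$2
    want_gid=$3

    # %u and %g print the numeric owner and group of the path.
    owner=$(stat -c '%u:%g' "$dir") || return 2

    if [ "$owner" = "$want_uid:$want_gid" ]; then
        echo "ok: $dir is owned by $owner"
    else
        echo "mismatch: $dir is owned by $owner, expected $want_uid:$want_gid"
        return 1
    fi
}
```

With the values from the logs in this thread, the call would be: check_dir_owner /var/run/heartbeat/ccm 90 90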
Let me guess, it's because ccm now drops root privileges?
To be honest, I thought it always did...
I don't think anyone has touched the ccm code in a very long time.
_______________________________________________
Pacemaker mailing list
http://list.clusterlabs.org/mailman/listinfo/pacemaker