Andrew Beekhof wrote:
On Jul 28, 2008, at 12:24 PM, Rainer Traut wrote:
System: CentOS 5 x86_64, 2 nodes
# rpm -qa|grep heartbeat
pacemaker-heartbeat-0.6.5-8.2
heartbeat-ldirectord-2.1.3-23.1
heartbeat-resources-2.1.3-23.1
heartbeat-common-2.1.3-23.1
heartbeat-2.1.3-23.1
One cluster member constantly reboots with these logs:
Jul 28 12:11:47 n02asp7 ccm: [8768]: ERROR: socket_wait_conn_new: unlink failure(/var/run/heartbeat/ccm/ccm): Permission denied
Jul 28 12:11:47 n02asp7 ccm: [8768]: ERROR: socket_wait_conn_new: trying to create in /var/run/heartbeat/ccm/ccm bind:: Permission denied
Jul 28 12:11:47 n02asp7 ccm: [8768]: ERROR: Can't create wait channel: Resource temporarily unavailable
Jul 28 12:11:47 n02asp7 heartbeat: [8756]: WARN: Managed /usr/lib64/heartbeat/ccm process 8768 exited with return code 1.
Jul 28 12:11:47 n02asp7 stonithd: [8771]: info: Signing in with heartbeat.
Jul 28 12:11:47 n02asp7 heartbeat: [8756]: EMERG: Rebooting system. Reason: /usr/lib64/heartbeat/ccm
and:
# ls -la /var/run/heartbeat/ccm/ccm
srwxrwxrwx 1 hacluster haclient 0 12. Jul 14:04 /var/run/heartbeat/ccm/ccm
What about the directories it is in?
What user is ccm running as?
uid 90, gid 90:
Jul 28 12:11:47 n02asp7 heartbeat: [8756]: info: Starting child client "/usr/lib64/heartbeat/ccm" (90,90)
Jul 28 12:11:47 n02asp7 heartbeat: [8768]: info: Starting "/usr/lib64/heartbeat/ccm" as uid 90 gid 90 (pid 8768)
# ls -la /var/run/heartbeat/ccm
total 8
drwxr-x--- 2 root root 4096 18. Jul 17:30 .
drwxr-xr-x 5 root root 4096 28. Jul 12:13 ..
srwxrwxrwx 1 hacluster haclient 0 12. Jul 14:04 ccm
And indeed:
# chown hacluster:haclient /var/run/heartbeat/ccm/
fixes the problem.
Let me guess, it's because ccm now drops root privileges?
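
For the record, the listing and the log fit together: removing or creating the socket needs write and search permission on the directory that contains it, and the socket's own srwxrwxrwx mode does not help. Below is a rough sketch of that check in plain sh. The helper name can_modify_dir is made up for illustration, it looks only at the classic mode bits (no root override, supplementary groups, ACLs or SELinux), and it assumes hacluster/haclient map to uid/gid 90 as the log suggests.

```shell
#!/bin/sh
# can_modify_dir MODE DIR_UID DIR_GID PROC_UID PROC_GID
# Hypothetical helper: succeeds if the directory's mode bits grant both
# write (2) and search (1) permission to the given process uid/gid.
# Simplified on purpose: ignores root, supplementary groups, ACLs, SELinux.
can_modify_dir() {
    mode=$((0$1))                      # leading 0 makes the mode octal
    if [ "$4" -eq "$2" ]; then
        bits=$(( (mode >> 6) & 7 ))    # process owns the directory: owner bits
    elif [ "$5" -eq "$3" ]; then
        bits=$(( (mode >> 3) & 7 ))    # process gid matches: group bits
    else
        bits=$(( mode & 7 ))           # everyone else: other bits
    fi
    [ $(( bits & 3 )) -eq 3 ]          # need w and x together
}

# Before the chown: drwxr-x--- root:root, ccm running as 90:90
if can_modify_dir 750 0 0 90 90; then echo "before: allowed"; else echo "before: denied"; fi

# After chown hacluster:haclient (assumed to be 90:90)
if can_modify_dir 750 90 90 90 90; then echo "after: allowed"; else echo "after: denied"; fi
```

With the directory owned by root:root and mode 750, uid 90 falls under the "other" bits, which are empty, so both the unlink and the bind are refused. After the chown the owner bits (rwx) apply.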
Rainer
_______________________________________________
Pacemaker mailing list
[email protected]
http://list.clusterlabs.org/mailman/listinfo/pacemaker