On Oct 14, 2008, at 3:15 PM, Roderick van Domburg wrote:

Hello everyone,

We have been running cman+gfs2 and heartbeat+pacemaker simultaneously on our systems. This worked great until we updated to heartbeat-2.99.2 and pacemaker-1.0.0 yesterday, which crashes while calling is_openais_cluster(). Previously we ran heartbeat-2.99.1 and pacemaker-0.7.3 successfully.

Not so much a core dump (unexpected termination) as an assertion failure (self-initiated "let's get out of here NOW").

What you're seeing is me in the middle of refreshing all the packages... specifically I haven't enabled Heartbeat support in the Pacemaker packages which is why you're seeing:

Oct 14 14:50:55 node1 stonithd: [1492]: ERROR: crm_abort: is_heartbeat_cluster: Triggered fatal assert at utils.c:1626 : is_openais_cluster()

Which is basically Pacemaker saying "You're trying to run me on top of Heartbeat and I wasn't built to support that".
A saner error might not be a bad idea.

I'll go enable Heartbeat support now.




I'll post this to the linux-ha list too.

/var/log/messages:

Oct 14 14:49:55 node1 logd: [1455]: info: logd started with default configuration.
Oct 14 14:49:55 node1 logd: [1463]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Oct 14 14:49:55 node1 logd: [1455]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Oct 14 14:49:55 node1 heartbeat: [1479]: info: Enabling logging daemon
Oct 14 14:49:55 node1 heartbeat: [1479]: info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
Oct 14 14:49:55 node1 heartbeat: [1479]: info: ******************
Oct 14 14:49:55 node1 heartbeat: [1479]: info: Configuration validated. Starting heartbeat 2.99.2
Oct 14 14:49:55 node1 heartbeat: [1480]: info: heartbeat: version 2.99.2
Oct 14 14:49:55 node1 heartbeat: [1480]: info: Heartbeat generation: 1219055953
Oct 14 14:49:55 node1 heartbeat: [1480]: info: glib: UDP multicast heartbeat started for group 239.0.0.45 port 694 interface eth0 (ttl=1 loop=0)
Oct 14 14:49:55 node1 heartbeat: [1480]: info: G_main_add_TriggerHandler: Added signal manual handler
Oct 14 14:49:55 node1 heartbeat: [1480]: info: G_main_add_TriggerHandler: Added signal manual handler
Oct 14 14:49:55 node1 heartbeat: [1480]: notice: Using watchdog device: /dev/watchdog
Oct 14 14:49:55 node1 heartbeat: [1480]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Oct 14 14:49:55 node1 heartbeat: [1480]: info: Local status now set to: 'up'
Oct 14 14:50:55 node1 heartbeat: [1480]: WARN: node node2: is dead
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Comm_now_up(): updating status to active
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Local status now set to: 'active'
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/ccm" (498,496)
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/cib" (498,496)
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/lrmd -r" (0,0)
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/stonithd" (0,0)
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/attrd" (498,496)
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/crmd" (498,496)
Oct 14 14:50:55 node1 heartbeat: [1489]: info: Starting "/usr/lib64/heartbeat/ccm" as uid 498 gid 496 (pid 1489)
Oct 14 14:50:55 node1 heartbeat: [1492]: info: Starting "/usr/lib64/heartbeat/stonithd" as uid 0 gid 0 (pid 1492)
Oct 14 14:50:55 node1 heartbeat: [1491]: info: Starting "/usr/lib64/heartbeat/lrmd -r" as uid 0 gid 0 (pid 1491)
Oct 14 14:50:55 node1 heartbeat: [1493]: info: Starting "/usr/lib64/heartbeat/attrd" as uid 498 gid 496 (pid 1493)
Oct 14 14:50:55 node1 heartbeat: [1490]: info: Starting "/usr/lib64/heartbeat/cib" as uid 498 gid 496 (pid 1490)
Oct 14 14:50:55 node1 heartbeat: [1494]: info: Starting "/usr/lib64/heartbeat/crmd" as uid 498 gid 496 (pid 1494)
Oct 14 14:50:55 node1 lrmd: [1491]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Oct 14 14:50:55 node1 stonithd: [1492]: info: G_main_add_SignalHandler: Added signal handler for signal 10
Oct 14 14:50:55 node1 stonithd: [1492]: info: G_main_add_SignalHandler: Added signal handler for signal 12
Oct 14 14:50:55 node1 cib: [1490]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Oct 14 14:50:55 node1 cib: [1490]: info: G_main_add_TriggerHandler: Added signal manual handler
Oct 14 14:50:55 node1 cib: [1490]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Oct 14 14:50:55 node1 attrd: [1493]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Oct 14 14:50:55 node1 attrd: [1493]: info: main: Starting up....
Oct 14 14:50:55 node1 attrd: [1493]: ERROR: main: HA Signon failed
Oct 14 14:50:55 node1 attrd: [1493]: ERROR: main: Aborting startup
Oct 14 14:50:55 node1 heartbeat: [1480]: WARN: Managed /usr/lib64/heartbeat/attrd process 1493 exited with return code 100.
Oct 14 14:50:55 node1 ccm: [1489]: info: Hostname: node1
Oct 14 14:50:55 node1 crmd: [1494]: info: main: CRM Hg Version: node: 9a6c6d1dd87154b11fdf9ff7fadf5fd33500bca4
Oct 14 14:50:55 node1 crmd: [1494]: info: crmd_init: Starting crmd
Oct 14 14:50:55 node1 crmd: [1494]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Oct 14 14:50:55 node1 crmd: [1494]: info: G_main_add_TriggerHandler: Added signal manual handler
Oct 14 14:50:55 node1 crmd: [1494]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Oct 14 14:50:55 node1 stonithd: [1492]: ERROR: crm_abort: is_heartbeat_cluster: Triggered fatal assert at utils.c:1626 : is_openais_cluster()
Oct 14 14:50:55 node1 cib: [1490]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
Oct 14 14:50:55 node1 lrmd: [1491]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Oct 14 14:50:55 node1 lrmd: [1491]: info: G_main_add_SignalHandler: Added signal handler for signal 10
Oct 14 14:50:55 node1 lrmd: [1491]: info: G_main_add_SignalHandler: Added signal handler for signal 12
Oct 14 14:50:55 node1 lrmd: [1491]: info: Started.
Oct 14 14:50:55 node1 heartbeat: [1480]: WARN: Managed /usr/lib64/heartbeat/stonithd process 1492 killed by signal 6 [SIGABRT - Abort].
Oct 14 14:50:55 node1 heartbeat: [1480]: ERROR: Managed /usr/lib64/heartbeat/stonithd process 1492 dumped core
Oct 14 14:50:55 node1 heartbeat: [1480]: ERROR: Respawning client "/usr/lib64/heartbeat/stonithd":
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/stonithd" (0,0)
Oct 14 14:50:56 node1 cib: [1490]: info: startCib: CIB Initialization completed successfully
Oct 14 14:50:56 node1 cib: [1490]: CRIT: cib_init: Cannot sign in to the cluster... terminating
Oct 14 14:50:56 node1 heartbeat: [1480]: WARN: Managed /usr/lib64/heartbeat/cib process 1490 exited with return code 100.
Oct 14 14:50:56 node1 heartbeat: [1480]: EMERG: Rebooting system. Reason: /usr/lib64/heartbeat/cib
Oct 14 14:50:56 node1 crmd: [1494]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
Oct 14 14:50:56 node1 crmd: [1494]: info: crmd_init: Starting crmd's mainloop
Oct 14 14:50:56 node1 heartbeat: [1495]: info: Starting "/usr/lib64/heartbeat/stonithd" as uid 0 gid 0 (pid 1495)
Oct 14 14:50:56 node1 stonithd: [1495]: info: G_main_add_SignalHandler: Added signal handler for signal 10
Oct 14 14:50:56 node1 stonithd: [1495]: info: G_main_add_SignalHandler: Added signal handler for signal 12
Oct 14 14:50:56 node1 stonithd: [1495]: ERROR: crm_abort: is_heartbeat_cluster: Triggered fatal assert at utils.c:1626 : is_openais_cluster()
Oct 14 14:50:57 node1 kernel: md: stopping all md devices.
Oct 14 14:51:17 node1 syslogd 1.4.1: restart.
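
For anyone triaging a log like the one above, most of the lines are informational and the handful that matter (the assert, the failed sign-ons, the reboot) are easy to miss. The following is only a rough sketch of how the severe entries could be pulled out. The names severe_entries, LINE_RE and SEVERE are hypothetical, and the pattern assumes the "date host daemon: [pid]: LEVEL: message" layout seen in this excerpt, so lines without a [pid] field (the kernel and syslogd lines, for instance) are deliberately skipped.

```python
import re

# Assumed layout, taken from the excerpt above:
#   "Oct 14 14:50:55 node1 stonithd: [1492]: ERROR: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s(?P<daemon>[\w-]+):\s"
    r"\[(?P<pid>\d+)\]:\s(?P<level>[A-Za-z]+):\s(?P<msg>.*)$"
)

# Severity labels that appear in this log and indicate a real problem.
SEVERE = {"ERROR", "CRIT", "EMERG"}


def severe_entries(lines):
    """Return (timestamp, daemon, level, message) for each severe log line."""
    found = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") in SEVERE:
            found.append(
                (
                    match.group("ts"),
                    match.group("daemon"),
                    match.group("level"),
                    match.group("msg"),
                )
            )
    return found


if __name__ == "__main__":
    sample = [
        "Oct 14 14:50:55 node1 attrd: [1493]: info: main: Starting up....",
        "Oct 14 14:50:55 node1 attrd: [1493]: ERROR: main: HA Signon failed",
        "Oct 14 14:50:56 node1 heartbeat: [1480]: EMERG: Rebooting system. Reason: /usr/lib64/heartbeat/cib",
        "Oct 14 14:50:57 node1 kernel: md: stopping all md devices.",
    ]
    for entry in severe_entries(sample):
        print(" | ".join(entry))
```

Run against the full excerpt, this should surface the attrd sign-on failure, the stonithd assert, the cib CRIT and the EMERG reboot in order, which matches the sequence of events described in this thread.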

This occurs no matter whether cman and openais are running or not.

I have attached the core dump.

Version information:

- CentOS 5.2 x86_64 (2.6.18-92.1.13.el5xen)
- heartbeat-common.x86_64 2.99.2-21.1
- heartbeat-resources.x86_64 2.99.2-21.1
- heartbeat.x86_64 2.99.2-21.1
- libheartbeat2.x86_64 2.99.2-21.1
- pacemaker.x86_64 1.0.0-1.6
- libpacemaker3.x86_64 1.0.0-1.6
- openais.x86_64 0.80.3-19.1
- cman.x86_64 2.0.84-2.el5_2.1

ha.cf:

autojoin none
mcast eth0 239.0.0.45 694 1 0
warntime 15
deadtime 60
initdead 60
keepalive 3
node node1
node node2
crm on
watchdog /dev/watchdog
use_logd on

openais.conf:

totem {
        version: 2
        secauth: on
        threads: 1
        heartbeat_failures_allowed: 3
        interface {
                ringnumber: 0
                bindnetaddr: 10.0.3.1
                mcastaddr: 239.0.0.45
                mcastport: 5405
        }
}

logging {
        debug: off
        timestamp: on
}

amf {
        mode: disabled
}
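
Both configurations above point at multicast group 239.0.0.45, on different ports. As a quick way to confirm what each stack is configured to use, here is a small sketch that reads the two settings side by side. The function names heartbeat_mcast and openais_mcast are made up for illustration, and the parsing assumes the "mcast <dev> <group> <port> <ttl> <loop>" field order shown in ha.cf and the "mcastaddr:" / "mcastport:" keys shown in openais.conf. It only reports the values; it does not say anything about whether sharing a group is the cause of a given failure.

```python
import re


def heartbeat_mcast(ha_cf_text):
    """Return (group, port) from the first 'mcast' line of an ha.cf, or None."""
    for line in ha_cf_text.splitlines():
        fields = line.split()
        # Assumed field order: mcast <dev> <group> <port> <ttl> <loop>
        if len(fields) >= 4 and fields[0] == "mcast":
            return fields[2], int(fields[3])
    return None


def openais_mcast(conf_text):
    """Return (group, port) from the first mcastaddr/mcastport pair, or None."""
    addr = re.search(r"^\s*mcastaddr:\s*(\S+)", conf_text, re.MULTILINE)
    port = re.search(r"^\s*mcastport:\s*(\d+)", conf_text, re.MULTILINE)
    if addr and port:
        return addr.group(1), int(port.group(1))
    return None


if __name__ == "__main__":
    ha_cf = "autojoin none\nmcast eth0 239.0.0.45 694 1 0\nwarntime 15\n"
    openais_conf = (
        "totem {\n"
        "        interface {\n"
        "                mcastaddr: 239.0.0.45\n"
        "                mcastport: 5405\n"
        "        }\n"
        "}\n"
    )
    hb = heartbeat_mcast(ha_cf)
    ais = openais_mcast(openais_conf)
    print("heartbeat:", hb)
    print("openais:  ", ais)
    if hb and ais and hb[0] == ais[0]:
        print("Both stacks use the same multicast group.")
```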

I have tried switching either one to another IP address, but to no avail.
Any insights into this behavior?

Kind regards,

Roderick
<core.1492>


_______________________________________________
Pacemaker mailing list
[email protected]
http://list.clusterlabs.org/mailman/listinfo/pacemaker