I tried setting corosync.conf to log to a log file, but after a restart /var/log/corosync just has a .empty file in it, referencing bug #588515. At any rate, I turned on debugging in corosync.conf and found that /var/log/pacemaker now has debug messages. /shrug
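In case it's useful for comparison, this is roughly the logging stanza I'd expect to need for file logging, based on the options documented in corosync.conf(5); the logfile path here is just an example, use whatever your distro lays down:

```
logging {
    to_syslog: yes
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    timestamp: on
    debug: on
}
```

The directory has to exist and be writable by corosync, which may be why only the .empty placeholder shows up.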
I’ll put as much info as I can here. I also fixed bindnetaddr to be just the network address (duh... should have noticed that), so thank you for that.

Ah geez. I just now realized that I should be digging in `journalctl` for the corosync bits. So here we are. Unfortunately, there doesn’t seem to be a lot here. The timestamp was Dec 18 23:44:41 in pacemaker.log for the info I originally posted. The corosync bits in the journal at that time only show a bunch of this (not pasting it all unless you want/need it):

Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] IPC credentials authenticated (2946-2997-26)
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] connecting to client [2997]
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] IPC credentials authenticated (2946-2997-27)
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] connecting to client [2997]
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] HUP conn (2946-2997-27)
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] qb_ipcs_disconnect(2946-2997-27) state:2
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cmap-response-2946-2997-27-header
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cmap-event-2946-2997-27-header
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cmap-request-2946-2997-27-header
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] IPC credentials authenticated (2946-2997-27)
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] connecting to client [2997]
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] HUP conn (2946-2997-27)
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] qb_ipcs_disconnect(2946-2997-27) state:2
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cmap-response-2946-2997-27-header
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cmap-event-2946-2997-27-header
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cmap-request-2946-2997-27-header
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] IPC credentials authenticated (2946-2997-27)
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] connecting to client [2997]
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] HUP conn (2946-2997-27)
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] qb_ipcs_disconnect(2946-2997-27) state:2
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cmap-response-2946-2997-27-header
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cmap-event-2946-2997-27-header
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cmap-request-2946-2997-27-header
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] IPC credentials authenticated (2946-2997-27)
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] connecting to client [2997]
Dec 18 23:44:41 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168

A reference to our errant node shows up 20 seconds before that, which bore:

Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] entering GATHER state from 15(interface change).
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] Creating commit token because I am the rep.
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] Saving state aru 0 high seq received 0
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] entering COMMIT state.
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] got commit token
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] entering RECOVERY state.
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] position [0] member 192.168.99.225:
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] previous ring seq 74 rep 192.168.99.225
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] aru 0 high delivered 0 received flag 1
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] Did not need to originate any messages in recovery.
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] got commit token
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] Sending initial ORF token
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] install seq 0 aru 0 high seq received 0
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] install seq 0 aru 0 high seq received 0
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] install seq 0 aru 0 high seq received 0
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] install seq 0 aru 0 high seq received 0
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] Resetting old ring state
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] recovery to regular 1-0
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] waiting_trans_ack changed to 1
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] entering OPERATIONAL state.
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] A new membership (192.168.99.225:120) was formed. Members joined: 1084777441
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [TOTEM ] waiting_trans_ack changed to 0
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [QB ] IPC credentials authenticated (2946-2958-18)
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [QB ] connecting to client [2958]
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [QB ] IPC credentials authenticated (2946-2958-19)
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [QB ] connecting to client [2958]
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [QB ] IPC credentials authenticated (2946-2958-20)
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [QB ] connecting to client [2958]
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [QB ] HUP conn (2946-2958-20)
Dec 18 23:44:21 region-ctrl-2 corosync[2946]: [QB ] qb_ipcs_disconnect(2946-2958-20) state:2

This was not from the previous run (earlier email posting) but from a fresh run.
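Incidentally, that mystery nodeid looks like it's derived from this node's interface address. If I understand corosync's auto-nodeid behavior correctly (an assumption on my part, worth confirming against the source), when no nodeid is configured it takes the 32-bit IPv4 address and clears the high bit, and 192.168.99.225 works out to exactly 1084777441. A quick sanity check:

```python
import ipaddress

# Hypothesis: corosync's auto-generated nodeid is the interface's IPv4
# address as a 32-bit integer with the sign bit cleared.
ip = int(ipaddress.IPv4Address("192.168.99.225"))  # 0xC0A863E1
nodeid = ip & 0x7FFFFFFF                           # clear the high bit
print(nodeid)  # 1084777441, the nodeid in the logs
```

If that hypothesis holds, the phantom member is just this host joining under an IP-derived nodeid instead of the configured nodeid 2, which would fit the "Node 1084777441 and 2 share the same name" warning.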
- Jim

> On Dec 18, 2019, at 13:24, Ken Gaillot <[email protected]> wrote:
>
> On Wed, 2019-12-18 at 12:21 -0800, JC wrote:
>> Adding logs (minus time stamps)
>>
>> info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores
>> info: get_cluster_type: Detected an active 'corosync' cluster
>> info: qb_ipcs_us_publish: server name: pacemakerd
>> info: pcmk__ipc_is_authentic_process_active: Could not connect to lrmd IPC: Connection refused
>> info: pcmk__ipc_is_authentic_process_active: Could not connect to cib_ro IPC: Connection refused
>> info: pcmk__ipc_is_authentic_process_active: Could not connect to crmd IPC: Connection refused
>> info: pcmk__ipc_is_authentic_process_active: Could not connect to attrd IPC: Connection refused
>> info: pcmk__ipc_is_authentic_process_active: Could not connect to pengine IPC: Connection refused
>> info: pcmk__ipc_is_authentic_process_active: Could not connect to stonith-ng IPC: Connection refused
>> info: corosync_node_name: Unable to get node name for nodeid 1084777441
>> notice: get_node_name: Could not obtain a node name for corosync nodeid 1084777441
>
> This ID appears to be coming from corosync. You have only to_syslog turned on in corosync.conf, so look in the system log around this same time to see what corosync is thinking. It does seem odd; I wonder if --purge is missing something.
>
> BTW you don't need bindnetaddr to be different for each host; it's the network address (e.g. the .0 for a /24), not the host address.
>
>> info: crm_get_peer: Created entry ea4ec23e-e676-4798-9b8b-00af39d3bb3d/0x5555f74984d0 for node (null)/1084777441 (1 total)
>> info: crm_get_peer: Node 1084777441 has uuid 1084777441
>> info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[1084777441] - corosync-cpg is now online
>> notice: cluster_connect_quorum: Quorum acquired
>> info: crm_get_peer: Created entry 882c0feb-d546-44b7-955f-4c8a844a0db1/0x5555f7499fd0 for node postgres-sb/3 (2 total)
>> info: crm_get_peer: Node 3 is now known as postgres-sb
>> info: crm_get_peer: Node 3 has uuid 3
>> info: crm_get_peer: Created entry 4e6a6b1e-d687-4527-bffc-5d701ff60a66/0x5555f749a6f0 for node region-ctrl-2/2 (3 total)
>> info: crm_get_peer: Node 2 is now known as region-ctrl-2
>> info: crm_get_peer: Node 2 has uuid 2
>> info: crm_get_peer: Created entry 5532a3cc-2577-4764-b9ee-770d437ccec0/0x5555f749a0a0 for node region-ctrl-1/1 (4 total)
>> info: crm_get_peer: Node 1 is now known as region-ctrl-1
>> info: crm_get_peer: Node 1 has uuid 1
>> info: corosync_node_name: Unable to get node name for nodeid 1084777441
>> notice: get_node_name: Defaulting to uname -n for the local corosync node name
>> warning: crm_find_peer: Node 1084777441 and 2 share the same name: 'region-ctrl-2'
>> info: crm_get_peer: Node 1084777441 is now known as region-ctrl-2
>> info: pcmk_quorum_notification: Quorum retained | membership=32 members=3
>> notice: crm_update_peer_state_iter: Node region-ctrl-1 state is now member | nodeid=1 previous=unknown source=pcmk_quorum_notification
>> notice: crm_update_peer_state_iter: Node postgres-sb state is now member | nodeid=3 previous=unknown source=pcmk_quorum_notification
>> notice: crm_update_peer_state_iter: Node region-ctrl-2 state is now member | nodeid=1084777441 previous=unknown source=pcmk_quorum_notification
>> info: crm_reap_unseen_nodes: State of node region-ctrl-2[2] is still unknown
>> info: pcmk_cpg_membership: Node 1084777441 joined group pacemakerd (counter=0.0, pid=32765, unchecked for rivals)
>> info: pcmk_cpg_membership: Node 1 still member of group pacemakerd (peer=region-ctrl-1:900, counter=0.0, at least once)
>> info: crm_update_peer_proc: pcmk_cpg_membership: Node region-ctrl-1[1] - corosync-cpg is now online
>> info: pcmk_cpg_membership: Node 3 still member of group pacemakerd (peer=postgres-sb:976, counter=0.1, at least once)
>> info: crm_update_peer_proc: pcmk_cpg_membership: Node postgres-sb[3] - corosync-cpg is now online
>> info: pcmk_cpg_membership: Node 1084777441 still member of group pacemakerd (peer=region-ctrl-2:3016, counter=0.2, at least once)
>> pengine: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores
>> lrmd: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores
>> lrmd: info: qb_ipcs_us_publish: server name: lrmd
>> pengine: info: qb_ipcs_us_publish: server name: pengine
>> cib: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores
>> attrd: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores
>> attrd: info: get_cluster_type: Verifying cluster type: 'corosync'
>> attrd: info: get_cluster_type: Assuming an active 'corosync' cluster
>> info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores
>> attrd: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
>> cib: info: get_cluster_type: Verifying cluster type: 'corosync'
>> cib: info: get_cluster_type: Assuming an active 'corosync' cluster
>> info: get_cluster_type: Verifying cluster type: 'corosync'
>> info: get_cluster_type: Assuming an active 'corosync' cluster
>> notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
>> attrd: info: corosync_node_name: Unable to get node name for nodeid 1084777441
>> cib: info: validate_with_relaxng: Creating RNG parser context
>> crmd: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores
>> crmd: info: get_cluster_type: Verifying cluster type: 'corosync'
>> crmd: info: get_cluster_type: Assuming an active 'corosync' cluster
>> crmd: info: do_log: Input I_STARTUP received in state S_STARTING from crmd_init
>> attrd: notice: get_node_name: Could not obtain a node name for corosync nodeid 1084777441
>> attrd: info: crm_get_peer: Created entry af5c62c9-21c5-4428-9504-ea72a92de7eb/0x560870420e90 for node (null)/1084777441 (1 total)
>> attrd: info: crm_get_peer: Node 1084777441 has uuid 1084777441
>> attrd: info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[1084777441] - corosync-cpg is now online
>> attrd: notice: crm_update_peer_state_iter: Node (null) state is now member | nodeid=1084777441 previous=unknown source=crm_update_peer_proc
>> attrd: info: init_cs_connection_once: Connection to 'corosync': established
>> info: corosync_node_name: Unable to get node name for nodeid 1084777441
>> notice: get_node_name: Could not obtain a node name for corosync nodeid 1084777441
>> info: crm_get_peer: Created entry 5bcb51ae-0015-4652-b036-b92cf4f1d990/0x55f583634700 for node (null)/1084777441 (1 total)
>> info: crm_get_peer: Node 1084777441 has uuid 1084777441
>> info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[1084777441] - corosync-cpg is now online
>> notice: crm_update_peer_state_iter: Node (null) state is now member | nodeid=1084777441 previous=unknown source=crm_update_peer_proc
>> attrd: info: corosync_node_name: Unable to get node name for nodeid 1084777441
>> attrd: notice: get_node_name: Defaulting to uname -n for the local corosync node name
>> attrd: info: crm_get_peer: Node 1084777441 is now known as region-ctrl-2
>> info: corosync_node_name: Unable to get node name for nodeid 1084777441
>> notice: get_node_name: Defaulting to uname -n for the local corosync node name
>> info: init_cs_connection_once: Connection to 'corosync': established
>> info: corosync_node_name: Unable to get node name for nodeid 1084777441
>> notice: get_node_name: Defaulting to uname -n for the local corosync node name
>> info: crm_get_peer: Node 1084777441 is now known as region-ctrl-2
>> cib: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
>> cib: info: corosync_node_name: Unable to get node name for nodeid 1084777441
>> cib: notice: get_node_name: Could not obtain a node name for corosync nodeid 1084777441
>> cib: info: crm_get_peer: Created entry a6ced2c1-9d51-445d-9411-2fb19deab861/0x55848365a150 for node (null)/1084777441 (1 total)
>> cib: info: crm_get_peer: Node 1084777441 has uuid 1084777441
>> cib: info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[1084777441] - corosync-cpg is now online
>> cib: notice: crm_update_peer_state_iter: Node (null) state is now member | nodeid=1084777441 previous=unknown source=crm_update_peer_proc
>> cib: info: init_cs_connection_once: Connection to 'corosync': established
>> cib: info: corosync_node_name: Unable to get node name for nodeid 1084777441
>> cib: notice: get_node_name: Defaulting to uname -n for the local corosync node name
>> cib: info: crm_get_peer: Node 1084777441 is now known as region-ctrl-2
>> cib: info: qb_ipcs_us_publish: server name: cib_ro
>> cib: info: qb_ipcs_us_publish: server name: cib_rw
>> cib: info: qb_ipcs_us_publish: server name: cib_shm
>> cib: info: pcmk_cpg_membership: Node 1084777441 joined group cib (counter=0.0, pid=0, unchecked for rivals)
>
> --
> Ken Gaillot <[email protected]>
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
