On Fri, Jul 29, 2022 at 12:52 PM Ross Sponholtz <[email protected]> wrote:
>
> I’m running a RHEL pacemaker cluster on Azure, and I’ve gotten a failure &
> fencing where I get these messages in the log file:
>
> warning: vip_ABC_30_monitor_10000 process (PID 1779737) timed out
> crit: vip_ABC_30_monitor_10000 process (PID 1779737) will not die!
>
> This resource uses the IPAddr2 resource agent. I’ve looked at the agent
> code, and I can’t pinpoint any reason it would hang up, and since the node
> gets fenced, I can’t tell why this happens – any ideas on what kinds of
> failures could cause this problem?
>
> Thanks,
> Ross
>

Are you able to reproduce this? I suggest adding `trace_ra=1` to the
resource configuration in order to determine where it's hanging.

    # pcs resource update vip_ABC trace_ra=1

This will produce a shell trace of each operation in
/var/lib/heartbeat/trace_ra/IPaddr2. This is naturally quite a lot of
logging, so remove the option when you've gotten what you need.

    # pcs resource update vip_ABC trace_ra=

Also discussed in this article (you should have access if you're on RHEL):
  - How can I determine exactly what is happening with every operation on
    a resource in Pacemaker? (https://access.redhat.com/solutions/3182931)

> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

-- 
Regards,

Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker
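
P.S. Once a trace has been captured, something like the following might help pick out the newest trace file for the resource. This is only a rough sketch: the helper name `latest_trace` is made up for illustration, and it assumes the trace file names begin with the resource name, which is worth checking against what actually shows up in the directory.

```shell
# Hypothetical helper (not part of pcs or resource-agents):
# print the most recently modified trace file for one resource.
latest_trace() {
    dir=$1   # trace directory, e.g. /var/lib/heartbeat/trace_ra/IPaddr2
    rsc=$2   # resource name, e.g. vip_ABC
    # Assumption: trace file names start with "<resource name>."
    # ls -t sorts by modification time, newest first.
    ls -t "$dir"/"$rsc".* 2>/dev/null | head -n 1
}
```

For example, `tail -n 40 "$(latest_trace /var/lib/heartbeat/trace_ra/IPaddr2 vip_ABC)"` would show the end of the newest trace. For an operation that hung, the last lines should be at or near the command that never returned.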
