In clouds you can't just use VIPs. Use the azure-lb resource instead.

Best Regards,
Strahil Nikolov

On Fri, Jul 29, 2022 at 23:21, Reid Wahl <[email protected]> wrote:

On Fri, Jul 29, 2022 at 1:02 PM Reid Wahl <[email protected]> wrote:
>
> On Fri, Jul 29, 2022 at 12:52 PM Ross Sponholtz <[email protected]> wrote:
> >
> > I'm running a RHEL Pacemaker cluster on Azure, and I've gotten a failure
> > and fencing where I get these messages in the log file:
> >
> > warning: vip_ABC_30_monitor_10000 process (PID 1779737) timed out
> > crit: vip_ABC_30_monitor_10000 process (PID 1779737) will not die!
> >
> > This resource uses the IPaddr2 resource agent. I've looked at the agent
> > code, and I can't pinpoint any reason it would hang, and since the node
> > gets fenced, I can't tell why this happens -- any ideas on what kinds of
> > failures could cause this problem?
> >
> > Thanks,
> > Ross
>
> Are you able to reproduce this? I suggest adding `trace_ra=1` to the
> resource configuration in order to determine where it's hanging.
>
> # pcs resource update vip_ABC trace_ra=1
>
> This will produce a shell trace of each operation in
> /var/lib/heartbeat/trace_ra/IPaddr2. This is naturally quite a lot of
> logging, so remove the option when you've gotten what you need.
>
> # pcs resource update vip_ABC trace_ra=
>
> Also discussed in this article (you should have access if you're on RHEL):
> - How can I determine exactly what is happening with every operation
>   on a resource in Pacemaker?
>   (https://access.redhat.com/solutions/3182931)
You may also want to set on-fail=block for the stop operation to prevent
the node from getting fenced while you troubleshoot this.

# pcs resource update vip_ABC op stop interval=0s timeout=<whatever_the_current_timeout_is> on-fail=block

Other than that, trace_ra=1 will generally tell us quite a lot -- I just
hope that it _does_ get written, given that the child process becomes
unkillable.

The IPaddr2 resource agent doesn't do all that much. It runs a few `ip`
commands and sends an ARP refresh. That's about it. I generally wouldn't
expect any of those to hang unless there's a deeper issue.

--
Regards,

Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker
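For illustration, the start operation Reid describes boils down to roughly
the following; the address, prefix, and interface here are hypothetical
values, not taken from this thread:

# ip addr add 10.0.0.10/24 dev eth0
# arping -U -c 5 -I eth0 10.0.0.10

The first plumbs the VIP as a secondary address; the second sends a
gratuitous ARP so that neighbors update their caches (the agent itself uses
a helper such as send_arp for this step). The monitor operation essentially
just checks with `ip addr show` that the address is still present.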
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
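On Strahil's azure-lb suggestion at the top of the thread, a minimal
sketch, assuming the Azure load balancer's health probe is configured for
port 61000 and using hypothetical resource and group names:

# pcs resource create health_probe_ABC azure-lb port=61000
# pcs resource group add g_vip_ABC health_probe_ABC vip_ABC

The azure-lb agent just listens on the probe port on whichever node holds
the resource, so the load balancer's frontend IP follows the group; in the
common Azure pattern it is grouped with the IPaddr2 VIP rather than
replacing it outright.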
