Hi!

I wanted to inform you of an unpleasant bug in ldirectord of SLES12 SP5:
A short network problem occurred while some redundancy paths in the 
infrastructure were being reconfigured, so some network services were 
temporarily unreachable.
Unfortunately, the cluster-controlled ldirectord itself reported a failure 
(the director, not the services being directed to):

h11 crmd[28930]:   notice: h11-prm_lvs_mail_monitor_300000:369 [ Use of uninitialized value $ip_port in pattern match (m//) at /usr/sbin/ldirectord line 1830, <CFGFILE> line 21. Error [33159] reading file /etc/ldirectord/mail.conf at line 10: invalid address for virtual service\n ]
h11 ldirectord[33266]: Exiting with exit_status 2: config_error: Configuration Error

You can guess what happened:
Pacemaker tried to recover (stop, then start), but the stop failed, too:
h11 lrmd[28927]:   notice: prm_lvs_mail_stop_0:35047:stderr [ Use of uninitialized value $ip_port in pattern match (m//) at /usr/sbin/ldirectord line 1830, <CFGFILE> line 21. ]
h11 lrmd[28927]:   notice: prm_lvs_mail_stop_0:35047:stderr [ Error [36293] reading file /etc/ldirectord/mail.conf at line 10: invalid address for virtual service ]
h11 crmd[28930]:   notice: Result of stop operation for prm_lvs_mail on h11: 1 (unknown error)

A stop failure meant that the node was fenced, interrupting all the other 
services.

Examining the logs I also found this interesting type of error:
h11 attrd[28928]:   notice: Cannot update fail-count-prm_lvs_rksapds5#monitor_300000[monitor]=(null) because peer UUID not known (will retry if learned)

Finally, here is the code that caused the error:

sub _ld_read_config_virtual_resolve
{
        my($line, $vsrv, $ip_port, $af)=(@_);

        if($ip_port){
                $ip_port=&ld_gethostservbyname($ip_port, $vsrv->{protocol}, $af);
                if ($ip_port =~ /(\[[0-9A-Fa-f:]+\]):(\d+)/) {
                        $vsrv->{server} = $1;
                        $vsrv->{port} = $2;
                } elsif($ip_port){
                        ($vsrv->{server}, $vsrv->{port}) = split /:/, $ip_port;
                }
                else {
                        &config_error($line,
                                "invalid address for virtual service");
                }
...

The value returned by ld_gethostservbyname is undefined, so the pattern match 
on $ip_port triggers the warning and the else branch reports a config error. 
I also wonder about the program logic:
If the host looks like a hex address in square brackets, host and port are 
split at the colon; otherwise host and port are split at the colon as well.
Why not simply split at the last colon if the value is defined, AND THEN check 
whether the components look OK?
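
What I have in mind could look roughly like the sketch below. This is not the 
actual ldirectord code: split_host_port is a hypothetical helper name, and the 
sketch assumes the port is already numeric (the real code resolves service 
names through ld_gethostservbyname first). The validation patterns are my own 
guess at what "looks OK" should mean.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ldirectord): split "host:port" at the
# LAST colon, then validate both parts. Returns an empty list on failure.
sub split_host_port {
    my ($ip_port) = @_;

    # Reject an undefined or empty value (e.g. a failed lookup) before any
    # pattern match, so no "uninitialized value" warning is emitted.
    return unless defined $ip_port && length $ip_port;

    # The greedy (.+) forces the split to happen at the last colon.
    my ($host, $port) = $ip_port =~ /^(.+):([^:]+)$/
        or return;

    # Assumption: numeric ports only, within the valid range.
    return unless $port =~ /^\d+$/ && $port >= 1 && $port <= 65535;

    # Host is either a bracketed IPv6 literal or a name/IPv4 address
    # that contains no colon.
    return unless $host =~ /^\[[0-9A-Fa-f:.]+\]$/
               || $host =~ /^[A-Za-z0-9.-]+$/;

    return ($host, $port);
}

# Usage: the undefined value models the failed name lookup.
for my $in ('[2001:db8::1]:25', '192.0.2.10:25', undef) {
    my ($host, $port) = split_host_port($in);
    print defined $host ? "$host port $port\n" : "invalid\n";
}
```

A caller could then tell "lookup failed" apart from "address is malformed" and 
treat the former as a temporary condition instead of a configuration error.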

So the address reported as "invalid address for virtual service" is invalid 
only while the resolver service (e.g. via LDAP) is unavailable.
I use host and service names in the configuration for readability.

(I reported the issue to SLES support)

Regards,
Ulrich


