severity 609242 normal
thanks

My apologies for the long time it took me to answer this email!
> I encounter a serious issue on a Dell R310 server with both its NICs bonded
> the Debian way:

[...]

> auto bond0
> iface bond0 inet static
>     slaves eth0 eth1
>     bond_mode 802.3ad
>     bond_xmit_hash_policy layer2+3
>     bond_miimon 100
>     bond_downdelay 5000
>     bond_updelay 10000
>     address 192.168.1.10
>     netmask 255.255.255.0
>     gateway 192.168.1.254
>
> As said in the subject, ifupdown exits before the bond interface is active.
> I tried to raise the updelay to get the slaves active, but it has no effect:
>
> [ 11.426497] bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex
> [ 11.478596] bonding: bond0: link status up for interface eth0, enabling it
> in 0 ms.
> [ 11.486325] bonding: bond0: link status definitely up for interface eth0.
> [ 11.493738] ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready

According to those log entries, the bond0 interface has become active, because
the first Ethernet interface has been detected and is enabled immediately.

> [ 11.515333] bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
> [ 11.590403] bonding: bond0: link status up for interface eth1, enabling it
> in 10000 ms.

The second Ethernet interface is also detected, but since the bond0 interface
is already active, the kernel waits bond_updelay milliseconds before enabling
it. This is correct behaviour.

> Starting LDAP connection daemon: nslcd[ 21.581245] bonding: bond0: link
> status definitely up for interface eth1.
> nslcd: failed to bind to LDAP server ldaps://ldap.eve/: Can't contact LDAP
> server: Connection timed out
> nslcd: no available LDAP server found
> nslcd: no base defined in config and couldn't get one from server
> failed!
>
> You can see in this log that sysv-rc tries to start the nslcd daemon *before*
> the bonding module reports bond0 to be effectively up, causing nslcd to fail
> to start... As everything in my setup relies on LDAP for auth, nothing works
> until I log in locally as root to manually restart the failed services...
>
> I'm not sure if I should assign this bug to the ifenslave or to the ifupdown
> package, so sorry for the noise if I'm wrong!

This does not seem like a bug in either ifenslave or ifupdown to me. I have
reproduced your setup, and I see the same things in the kernel logs. Also,
running "ifconfig bond0" and "ethtool bond0" immediately after "ifup bond0"
shows that the bond0 interface is properly configured and up. I can send
packets to the bond0 device immediately, and they come out of the first slave
as expected.

Perhaps there is another reason why the LDAP connection right after the bond0
device comes up does not work? If you start pinging immediately after
"ifup bond0", do you see responses immediately as well, or, if not, how long
does it take for them to arrive? Perhaps you can run tcpdump on both sides to
see when packets start to flow (a quick check along these lines is sketched
below).

Also, if there is a misconfiguration and your bonding setup somehow needs eth1
to be active as well, then it would actually take bond_updelay milliseconds
before you have a working connection. You could try to set bond_updelay to 0
to rule this out (an example stanza is sketched below).

If your machine needs LDAP to work, I would check whether the LDAP daemon can
be configured to keep trying to connect to the server indefinitely, as opposed
to quitting when the first connection attempt doesn't work (a possible
nslcd.conf excerpt is sketched below).

-- 
Met vriendelijke groet / with kind regards,
    Guus Sliepen <g...@debian.org>
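A minimal sketch of such a check, assuming the interface names and gateway
address from the quoted configuration; the exact layout of the /proc output
depends on the bonding driver version:

    # Bonding driver's own view of bond0 and its slaves (MII status, per-slave state)
    cat /proc/net/bonding/bond0

    # In one terminal: watch when packets actually start to flow on the bond
    tcpdump -ni bond0

    # In another terminal, right after "ifup bond0": ping the gateway
    ping 192.168.1.254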
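For reference, the quoted stanza with only bond_updelay changed to 0, to rule
out a dependency on eth1 becoming active; all other values are taken unchanged
from the quoted /etc/network/interfaces:

    auto bond0
    iface bond0 inet static
        slaves eth0 eth1
        bond_mode 802.3ad
        bond_xmit_hash_policy layer2+3
        bond_miimon 100
        bond_downdelay 5000
        bond_updelay 0
        address 192.168.1.10
        netmask 255.255.255.0
        gateway 192.168.1.254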
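On the nslcd side, nss-pam-ldapd exposes reconnect tuning in /etc/nslcd.conf.
The option names below are the ones documented in nslcd.conf(5); the values
are only illustrative and should be checked against the installed version:

    # /etc/nslcd.conf (excerpt); URI taken from the quoted boot log
    uri ldaps://ldap.eve/

    # seconds to sleep before retrying when no LDAP server could be contacted
    reconnect_sleeptime 1

    # seconds after which an unreachable server is treated as unavailable;
    # see nslcd.conf(5) for the exact semantics
    reconnect_retrytime 30

    # timeout in seconds for connecting and binding to the server
    bind_timelimit 10

Whether this lets the initial start survive bond0 coming up late is something
to verify on the affected machine.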