Hi,
We're encountering a problem in 4.4 LTS where, rarely, the bond link state is not updated when the slave link changes.

I've traced the issue to the arp monitor unable to get the rtnl lock. The sequence resulting in failure is as below.

bond_loadbalance_arp_mon() periodically called, if slave link is _down_, it checks if the slave is sending/receiving packets. If it is, it sets flags to be processed later down the function for bond link update. However, it sets the slave->link right away.

                if (slave->link != BOND_LINK_UP) {
                        if (bond_time_in_interval(bond, trans_start, 1) &&
bond_time_in_interval(bond, slave->last_rx, 1)) {

                                slave->link  = BOND_LINK_UP;
                                slave_state_changed = 1;


Later down the function, it tries to get the rtnl_lock. If it doesn't get it, it rearms and returns.

        if (do_failover || slave_state_changed) {
                if (!rtnl_trylock())
                        goto re_arm; <-- returns here

                if (slave_state_changed) {
                        bond_slave_state_change(bond);

This is the problem. The next time this function is called, the slave->link is already marked UP. And we will never update the bond link state to UP.

Changing the rtnl_trylock() -> rtnl_lock() _does_ fix the issue.

Is this the right way to fix it? If it is, I can submit this formally.

What are the guidelines around using rtnl_lock() vs rtnl_trylock()? Some places are using rtnl_lock() and other rtnl_trylock(). Sorry, I couldn't find much via a google search or in Documentation/.

Thanks,
Nithin.

--------------------

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5dca77e..1f60503 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2614,8 +2614,7 @@ static void bond_loadbalance_arp_mon(struct work_struct *work)
        rcu_read_unlock();

        if (do_failover || slave_state_changed) {
-               if (!rtnl_trylock())
-                       goto re_arm;
+               rtnl_lock();

                if (slave_state_changed) {
                        bond_slave_state_change(bond);


Reply via email to