On Mon, Nov 26, 2018 at 04:29:58PM -0800, Jay Vosburgh wrote: > Toni Peltonen <pel...@peltzi.fi> wrote: > > >Previously when unbinding a slave the 802.3ad implementation only told > >partner that the port is not suitable for aggregation by setting the port > >aggregation state from aggregatable to individual. This is not enough. If the > >physical layer still stays up and we only unbinded this port from the bond > >there > >is nothing in the aggregation status alone to prevent the partner from > >sending > >traffic towards us. To ensure that the partner doesn't consider this > >port at all anymore we should also disable collecting and distributing to > >signal that this actor is going away. > > > >I have tested this behaviour againts Arista EOS switches with mlx5 cards > >(physical link stays up when even when interface is down) and simulated > >the same situation virtually Linux <-> Linux with two network namespaces > >running two veth device pairs. In both cases setting aggregation to > >individual doesn't alone prevent traffic from being to sent towards this > >port given that the link stays up in partners end. Partner still keeps > >it's end in collecting + distributing state and continues until timeout is > >reached. In most cases this means we are losing the traffic partner sends > >towards our port while we wait for timeout. This is most visible with slow > >periodic time (LAPC rate slow). > > "LAPC" -> "LACP" >
I will fix the typing error and send updated version. > >Other open source implementations like Open VSwitch and libreswitch, and > >vendor implementations like Arista EOS, seem to disable collecting + > >distributing to when doing similar port disabling/detaching/removing change. > >With this patch kernel implementation would behave the same way and ensure > >partner doesn't consider our actor viable anymore. > > After re-reading the relevant bits of 802.1AX (particularly > 5.4.9 on recordPDU and update_Selected) I'm going to suggest also > clearing AD_STATE_SYNCHRONIZATION, based on: > > Partner_Oper_Port_State.Synchronization is also set to TRUE if > the value of Actor_State.Aggregation in the received PDU is set > to FALSE (i.e., indicates an Individual link), > Actor_State.Synchronization in the received PDU is set to TRUE, > and LACP will actively maintain the link. > > Per the above, learing _SYNC in the LACPDU should un-sync the > port, inducing the Mux state machine (figure 5-15) to exit C_D state and > go to ATTACHED state (disabling Coll/Dist). > > But, either way, as this is a hint to get the link partner to > stop using the port, this looks reasonable to me. After looking more closely into that I agree. It seems that also clearing the AD_STATE_SYNCHRONIZATION would make sense here. I will test this and send a new version with it if it doesn't cause any side effects. > > Acked-by: Jay Vosburgh <jay.vosbu...@canonical.com> > > -J > > >Signed-off-by: Toni Peltonen <pel...@peltzi.fi> > >--- > > drivers/net/bonding/bond_3ad.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > >diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c > >index f43fb2f958a5..6776c33753dc 100644 > >--- a/drivers/net/bonding/bond_3ad.c > >+++ b/drivers/net/bonding/bond_3ad.c > >@@ -2086,6 +2086,8 @@ void bond_3ad_unbind_slave(struct slave *slave) > > aggregator->aggregator_identifier); > > > > /* Tell the partner that this port is not suitable for aggregation */ > >+ port->actor_oper_port_state &= ~AD_STATE_COLLECTING; > >+ port->actor_oper_port_state &= ~AD_STATE_DISTRIBUTING; > > port->actor_oper_port_state &= ~AD_STATE_AGGREGATION; > > __update_lacpdu_from_port(port); > > ad_lacpdu_send(port); > >-- > >2.19.0 > > --- > -Jay Vosburgh, jay.vosbu...@canonical.com