On Mon, Nov 26, 2018 at 04:29:58PM -0800, Jay Vosburgh wrote:
> Toni Peltonen <pel...@peltzi.fi> wrote:
> 
> >Previously when unbinding a slave the 802.3ad implementation only told
> >partner that the port is not suitable for aggregation by setting the port
> >aggregation state from aggregatable to individual. This is not enough. If the
> >physical layer still stays up and we only unbinded this port from the bond 
> >there
> >is nothing in the aggregation status alone to prevent the partner from 
> >sending
> >traffic towards us. To ensure that the partner doesn't consider this
> >port at all anymore we should also disable collecting and distributing to
> >signal that this actor is going away.
> >
> >I have tested this behaviour againts Arista EOS switches with mlx5 cards
> >(physical link stays up when even when interface is down) and simulated
> >the same situation virtually Linux <-> Linux with two network namespaces
> >running two veth device pairs. In both cases setting aggregation to
> >individual doesn't alone prevent traffic from being to sent towards this
> >port given that the link stays up in partners end. Partner still keeps
> >it's end in collecting + distributing state and continues until timeout is
> >reached. In most cases this means we are losing the traffic partner sends
> >towards our port while we wait for timeout. This is most visible with slow
> >periodic time (LAPC rate slow).
> 
>       "LAPC" -> "LACP"
> 

I will fix the typing error and send updated version.

> >Other open source implementations like Open VSwitch and libreswitch, and
> >vendor implementations like Arista EOS, seem to disable collecting +
> >distributing to when doing similar port disabling/detaching/removing change.
> >With this patch kernel implementation would behave the same way and ensure
> >partner doesn't consider our actor viable anymore.
> 
>       After re-reading the relevant bits of 802.1AX (particularly
> 5.4.9 on recordPDU and update_Selected) I'm going to suggest also
> clearing AD_STATE_SYNCHRONIZATION, based on:
> 
>       Partner_Oper_Port_State.Synchronization is also set to TRUE if
>       the value of Actor_State.Aggregation in the received PDU is set
>       to FALSE (i.e., indicates an Individual link),
>       Actor_State.Synchronization in the received PDU is set to TRUE,
>       and LACP will actively maintain the link.
> 
>       Per the above, learing _SYNC in the LACPDU should un-sync the
> port, inducing the Mux state machine (figure 5-15) to exit C_D state and
> go to ATTACHED state (disabling Coll/Dist).
> 
>       But, either way, as this is a hint to get the link partner to
> stop using the port, this looks reasonable to me.  

After looking more closely into that I agree. It seems that also
clearing the AD_STATE_SYNCHRONIZATION would make sense here. I will test
this and send a new version with it if it doesn't cause any side
effects.

> 
> Acked-by: Jay Vosburgh <jay.vosbu...@canonical.com>
> 
>       -J      
> 
> >Signed-off-by: Toni Peltonen <pel...@peltzi.fi>
> >---
> > drivers/net/bonding/bond_3ad.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> >diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> >index f43fb2f958a5..6776c33753dc 100644
> >--- a/drivers/net/bonding/bond_3ad.c
> >+++ b/drivers/net/bonding/bond_3ad.c
> >@@ -2086,6 +2086,8 @@ void bond_3ad_unbind_slave(struct slave *slave)
> >                aggregator->aggregator_identifier);
> > 
> >     /* Tell the partner that this port is not suitable for aggregation */
> >+    port->actor_oper_port_state &= ~AD_STATE_COLLECTING;
> >+    port->actor_oper_port_state &= ~AD_STATE_DISTRIBUTING;
> >     port->actor_oper_port_state &= ~AD_STATE_AGGREGATION;
> >     __update_lacpdu_from_port(port);
> >     ad_lacpdu_send(port);
> >-- 
> >2.19.0
> 
> ---
>       -Jay Vosburgh, jay.vosbu...@canonical.com

Reply via email to