Wed, Nov 23, 2016 at 05:59:05PM CET, [email protected] wrote: >On Wed, Nov 23, 2016, at 17:04, Jiri Pirko wrote: >> Wed, Nov 23, 2016 at 05:00:00PM CET, [email protected] wrote: >> >On Wed, Nov 23, 2016, at 15:48, Jiri Pirko wrote: >> >> From: Ido Schimmel <[email protected]> >> >> >> >> Make sure the device has a complete view of the FIB tables by invoking >> >> their dump during module init. >> >> >> >> Signed-off-by: Ido Schimmel <[email protected]> >> >> Signed-off-by: Jiri Pirko <[email protected]> >> >> --- >> >> drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 16 >> >> ++++++++++++++++ >> >> 1 file changed, 16 insertions(+) >> >> >> >> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c >> >> b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c >> >> index 14bed1d..36a71d2 100644 >> >> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c >> >> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c >> >> @@ -2027,6 +2027,21 @@ static int mlxsw_sp_router_fib_event(struct >> >> notifier_block *nb, >> >> return NOTIFY_DONE; >> >> } >> >> >> >> +static void mlxsw_sp_router_fib_dump(struct mlxsw_sp *mlxsw_sp) >> >> +{ >> >> + while (!fib_notifier_dump(&mlxsw_sp->fib_nb)) { >> >> + /* Flush pending FIB notifications and then flush the >> >> + * device's table before requesting another dump. Do >> >> + * that with RTNL held, as FIB notification block is >> >> + * already registered. >> >> + */ >> >> + mlxsw_core_flush_owq(); >> >> + rtnl_lock(); >> >> + mlxsw_sp_router_fib_flush(mlxsw_sp); >> >> + rtnl_unlock(); >> >> + } >> >> +} >> > >> >I think it is fine to use this kind of synchronization. >> > >> >But I think that this part of the logic still belongs into the core >> >> Core does not know how driver handles the offloaded fibs. So only driver >> knows how/if he needs to do flush in case of retry. > >Sure, but an abort function can be provided to the kernel anyway and the >driver can care about that.
Ok, how? > >> >kernel. I still think it could happen that we will loop here >> >indefinitely because of a lot of routing updates and as such would need >> >to abort this loop after a number of tries. >> >> In theory, it is possible, howevery quite unlikely. > >I think the "quite unlikely" already got us down the path to not using >rtnl_lock in the first place. > >As I said, I am not sure about this as I didn't try any hardware >offloading before and delays how long it needs to be transferred to >hardware, but having a fail case for that seems like a nice improvement. >At the same time I know of Linux boxes running in internet exchanges >having several peers. The high update rates actually led to bgp >implementation specifying flap damping which is actually nowadays >considered harmful. > >Seriously, while most of the time convergence in routing protocols is >good and most updates only hit the BGP user space table anyway and the >change is suppressed because recursive routing lookup idempotence, quite >unlikely events happen to the internet now and then: >http://research.dyn.com/2009/02/longer-is-not-better/, which caused *a >lot* of flapping and ongoing events on BGP routers throughout the world. > >I agree it is unlikely that you have to refresh your hw dump during this >time, but who knows what customers do and what admins do in case >something like this happens. I just don't favor to looping endlessly >trying to sync up and getting into a stable state but tell the admin to >detach the control plane from the forwarding plane and sync up then. > >That said, I think a sysctl for a maximum number of loops respected by >drivers that needs to do so, should be enough for the time being. Okay. Point taken.
