On Mon, Jul 25, 2022 at 12:16:50PM +0200, Claudio Jeker wrote: > The nexthop validation or actually invalidation is buggy in bgpd since > revision 1.90 of rde_decide.c. When I removed re->active and replaced it > with a value that is calculated on the spot I did not realize that this > calculation depends on the current nexthop state and not on the state used > on the previous decision process. > > The decision process used prefix_best() to get the previous best prefix. > Now if that prefix just became invalid (because of a nexthop state change) > prefix_best() will return NULL and oldbest and newbest will be NULL too. > Because of this no update will be sent out because prefix_evaluate() > thinks nothing changed. > > This is also the reason why prefix_set_dmetric() sees bad dmetrics during > nexthop flaps. > > To solve this issue I decided to introduce a NEXTHOP_VALID flag in struct > prefix (as part of the nhflags). Which is then used to validate the > nexthop state. When a nexthop changes state nexthop_runner calls > prefix_evaluate_nexthop() which adjust the NEXTHOP_VALID flag but this is > done after removing the prefix from the evaluation list and before > inserting it again. Because of this prefix_best() works correctly and > oldbest is not NULL in the case mentioned above. > > I decided to set the initial state of NEXTHOP_VALID in nexthop_link() and I > clear it in nexthop_unlink(). The link call happens after nhflags is set > and prefix_nhflags() masks out the NEXTHOP_VALID flag. > > With this nexthops that turn invalid are properly removed from the FIB and > Adj-RIB-Out.
Wow, that's nasty. The diff makes sense and looks correct to me. ok tb