On Mon, Jul 25, 2022 at 12:16:50PM +0200, Claudio Jeker wrote:
> The nexthop validation or actually invalidation is buggy in bgpd since
> revision 1.90 of rde_decide.c. When I removed re->active and replaced it
> with a value that is calculated on the spot I did not realize that this
> calculation depends on the current nexthop state and not on the state used
> on the previous decision process.
> 
> The decision process used prefix_best() to get the previous best prefix.
> Now if that prefix just became invalid (because of a nexthop state change)
> prefix_best() will return NULL and oldbest and newbest will be NULL too.
> Because of this no update will be sent out because prefix_evaluate()
> thinks nothing changed.
> 
> This is also the reason why prefix_set_dmetric() sees bad dmetrics during
> nexthop flaps.
> 
> To solve this issue I decided to introduce a NEXTHOP_VALID flag in struct
> prefix (as part of the nhflags). Which is then used to validate the
> nexthop state. When a nexthop changes state nexthop_runner calls
> prefix_evaluate_nexthop() which adjust the NEXTHOP_VALID flag but this is
> done after removing the prefix from the evaluation list and before
> inserting it again. Because of this prefix_best() works correctly and
> oldbest is not NULL in the case mentioned above.
> 
> I decided to set the initial state of NEXTHOP_VALID in nexthop_link() and I
> clear it in nexthop_unlink(). The link call happens after nhflags is set
> and prefix_nhflags() masks out the NEXTHOP_VALID flag.
> 
> With this nexthops that turn invalid are properly removed from the FIB and
> Adj-RIB-Out.

Wow, that's nasty. The diff makes sense and looks correct to me.

ok tb

Reply via email to