Hoi,

To close out my monologue -- I sent https://gerrit.fd.io/r/c/vpp/+/38854 to
make VPP's Linux Controlplane plugin aware of NLM_F_REPLACE messages. Rolled
that out at AS8283 this morning, and our duplicate FIB entry issue is gone.
Nothing to see here, moving along :)

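For readers following along, here is a minimal sketch of why a consumer that ignores the replace flag ends up with a duplicate entry. It is a toy Python model, not the VPP change linked above: RouteMsg and apply_route_msg are hypothetical names, the FIB is just a dict, and only the numeric constants are taken from the Linux headers. It assumes the route notification carries NLM_F_REPLACE in its flags, as discussed in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass

# Values from <linux/rtnetlink.h> and <linux/netlink.h>.
RTM_NEWROUTE = 24
RTM_DELROUTE = 25
NLM_F_REPLACE = 0x100
NLM_F_CREATE = 0x400


@dataclass(frozen=True)
class RouteMsg:
    """Hypothetical, already-parsed route message (not a kernel or VPP type)."""
    msg_type: int
    flags: int
    prefix: str
    nexthop: str


def apply_route_msg(fib: dict[str, list[str]], msg: RouteMsg,
                    honor_replace: bool = True) -> None:
    """Apply one route message to a toy FIB mapping prefix -> list of nexthops."""
    if msg.msg_type == RTM_DELROUTE:
        paths = fib.get(msg.prefix, [])
        if msg.nexthop in paths:
            paths.remove(msg.nexthop)
        if not paths:
            fib.pop(msg.prefix, None)
        return
    if msg.msg_type == RTM_NEWROUTE:
        paths = fib.setdefault(msg.prefix, [])
        # A replace means "this route supersedes what is installed",
        # so drop the old paths instead of waiting for a delete.
        if honor_replace and msg.flags & NLM_F_REPLACE:
            paths.clear()
        if msg.nexthop not in paths:
            paths.append(msg.nexthop)


# The swap from the thread: first via enp1s0f3, then replaced by bond0.130.
PREFIX = "2a02:898:0:300::3"
VIA_ENP1S0F3 = "fe80::669d:99ff:feb1:3910"
VIA_BOND0_130 = "fe80::669d:99ff:feb1:31af"
messages = [
    RouteMsg(RTM_NEWROUTE, NLM_F_CREATE, PREFIX, VIA_ENP1S0F3),
    RouteMsg(RTM_NEWROUTE, NLM_F_CREATE | NLM_F_REPLACE, PREFIX, VIA_BOND0_130),
]

fib_naive: dict[str, list[str]] = {}
fib_aware: dict[str, list[str]] = {}
for m in messages:
    apply_route_msg(fib_naive, m, honor_replace=False)
    apply_route_msg(fib_aware, m)

print("ignoring NLM_F_REPLACE:", fib_naive[PREFIX])
print("honoring NLM_F_REPLACE:", fib_aware[PREFIX])
```

With the flag ignored, both nexthops stay installed for the prefix, which is the duplicate-entry symptom; with the flag honored, only the bond0.130 nexthop remains and no separate RTM_DELROUTE is needed.
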
groet,
Pim

On Sat, May 20, 2023 at 11:50 PM Pim van Pelt <[email protected]> wrote:

> Hoi,
>
> I think I've found the answer to my question by taking a look at the git
> history of the netlink handling.
>
> This commit:
>
> > commit 8235c4747dcc92de2ea991f78cdf9c6b8fa7f522
> > Author: Ondrej Zajicek (work) <[email protected]>
> > Date: Mon Jul 15 16:23:18 2019 +0200
> >
> > Netlink: Use route replace for IPv4
>
> started using NL_OP_REPLACE for IPv4, but kept it disabled for IPv6,
> and then this commit:
>
> > commit 722daa950046a7ad307fd7aca8e0506f30b3d000
> > Author: Ondrej Zajicek <[email protected]>
> > Date: Mon Jul 25 00:11:40 2022 +0200
> >
> > Netlink: Simplify handling of IPv6 ECMP routes
>
> started using it for IPv6 as well, where this commit:
>
> > commit ddb1bdf2819ce69248d5a51e71d803f13548b217
> > Author: Ondrej Zajicek <[email protected]>
> > Date: Tue Jul 26 18:45:20 2022 +0200
> >
> > Netlink: Restrict route replace for IPv6
>
> added a nice guard in nl_allow_replace() -- this explains the replace
> semantics (which 'ip monitor route' does not show), and answers my
> question. For my application, I'll have to take a good look at consuming
> messages with the flags NLM_F_CREATE|NLM_F_REPLACE set; and otherwise
> perhaps add the ability to Bird2/Bird3 to hold back and issue
> NL_OP_DELETE + NL_OP_ADD.
>
> For the curious, the application is Vector Packet Processing [ref
> <https://ipng.ch/s/articles/2021/09/02/vpp-5.html>] which consumes
> Netlink messages from the Linux kernel, and uses them to program a
> userspace dataplane, see [Linux Control Plane
> <https://s3-docs.fd.io/vpp/23.06/developer/plugins/lcp.html>] for
> details. Until now, this system consumes RTM_NEWROUTE and RTM_DELROUTE but
> is not yet capable of consuming this replacing logic. I'll take a look at
> adding that.
>
> groet,
> Pim
>
> On Sat, May 20, 2023 at 11:10 PM Pim van Pelt <[email protected]> wrote:
>
>> Hoi,
>>
>> As a quick followup on why I'm asking about versions -- on Bird 2.0.7, I
>> do see the delete-before-insert:
>>
>> root@chgtg0:~# ip -6 monitor route | grep 2001:678:d78::6
>>
>> # Raise OSPFv3 cost to prefer tf0-0
>> *Deleted* 2001:678:d78::6 via fe80::21b:21ff:febd:c718 dev xe0-3.3102.20 proto bird metric 32 pref medium
>> 2001:678:d78::6 via fe80::6eb3:11ff:fe20:e0c4 dev tf0-0 proto bird metric 32 pref medium
>>
>> # Lower OSPFv3 cost to prefer xe0-3.3102.20 again
>> *Deleted* 2001:678:d78::6 via fe80::6eb3:11ff:fe20:e0c4 dev tf0-0 proto bird metric 32 pref medium
>> 2001:678:d78::6 via fe80::21b:21ff:febd:c718 dev xe0-3.3102.20 proto bird metric 32 pref medium
>>
>> groet,
>> Pim
>>
>> On Sat, May 20, 2023 at 10:51 PM Pim van Pelt <[email protected]> wrote:
>>
>>> Hoi folks,
>>>
>>> At Coloclue AS8283, we upgraded from Bird 1.6.8 to Bird 2.0.12 this week.
>>> We use two separate processes, one for IPv4 and one for IPv6, and 2.0.7
>>> in Debian is missing the ability to select 'accept ipv4' and 'accept
>>> ipv6' in BFD, so we installed version 2.0.12 from backports.
>>>
>>> I am wondering if Bird2 later than 2.0.7 perhaps has an optimization
>>> when swapping routes? I would expect a swap to be "delete + add" but I am
>>> seeing only "add with new nexthop" appear in Netlink.
>>>
>>> Consider the following topology, with link names and their associated
>>> OSPFv3 costs:
>>>
>>> dcg-1 bond0.130 ---- bond0.130 eun-2
>>>   |        2000        |
>>> enp1s0f3             enp1s0f2
>>>   |                    |
>>>   | 10              10 |
>>>   |                    |
>>> enp1s0f3             enp1s0f3
>>>   |        1000        |
>>> dcg-2 eno2.3469 ---- eno2.3469 eun-3
>>>
>>> If I restart the OSPFv3 protocol, I see that the topology settles in the
>>> expected way. What I observe with Bird 2.0.12 is a deletion of the
>>> currently selected route followed by one addition once the shortest
>>> path is found (dcg-1 - dcg-2 - eun-3 - eun-2, ospf_metric1 is 1020,
>>> this is fine):
>>>
>>> root@dcg-1:~# birdc -s /run/bird/bird6.ctl restart ospf1
>>> root@dcg-1:~# ip -6 monitor route | grep 2a02:898:0:300::3
>>> Deleted 2a02:898:0:300::3 via fe80::669d:99ff:feb1:31af dev bond0.130 proto bird metric 32 pref medium
>>> 2a02:898:0:300::3 via fe80::669d:99ff:feb1:3910 dev enp1s0f3 proto bird metric 32 pref medium
>>>
>>> Now I lower the cost of the dcg-1 -- eun-2 link from 2000 to 100, so
>>> that it becomes preferred (cost ospf_metric is 120):
>>>
>>> root@dcg-1:~# birdc -s /run/bird/bird6.ctl reconfigure ospf1
>>> root@dcg-1:~# ip -6 monitor route | grep 2a02:898:0:300::3
>>> *[[ HERE ]]*
>>> 2a02:898:0:300::3 via fe80::669d:99ff:feb1:31af dev bond0.130 proto bird metric 32 pref medium
>>>
>>> I would expect this new addition of the installed route on bond0.130 to
>>> be *preceded by a deletion* of the previous route from enp1s0f3, but
>>> this is not the case (marked with [[ HERE ]]).
>>>
>>> To anyone's knowledge: *Has this behavior changed between 2.0.7 and
>>> 2.0.12?*
>>>
>>> groet,
>>> Pim
>>> --
>>> Pim van Pelt <[email protected]>
>>> PBVP1-RIPE - http://www.ipng.nl/

--
Pim van Pelt <[email protected]>
PBVP1-RIPE - http://www.ipng.nl/
