[ cc Ben ]

On 4/15/21 9:51 AM, Rob Dover wrote:
> Hi there,
>
> I'm working on an application that's programming IPSec connections via XFRM
> on VRFs. I'm seeing some strange behaviour in cases where there is an
> enslaved interface on the VRF - was wondering if anyone has seen something
> like this before or perhaps knows how this is supposed to work?

Ben was / is looking at ipsec and VRF. Maybe he has some thoughts.

>
> In our setup, we have a VRF and an enslaved (sidebar: should I be using a
> different term for this? I would prefer to use something with fewer negative
> historic connotations if possible!) interface like so:

for the sidebar, you can just say that a netdev is a member of the L3
domain. iproute2 supports 'vrf' keyword for better user semantics than
'master'

Any chance you can create a shell script that creates your setup using
network namespaces? tools/testing/selftests/net/fcnal-test.sh has some
helpers -- create_vrf, create_ns and connect_ns -- which simplify the
'namespace as a node' concept and configuring the interconnects. A
standalone script would allow runs across kernel versions and make it
easier for others (me) to debug.

>
> ```
> # ip link show vrf-1
> 33: vrf-1: <NOARP,MASTER,UP,LOWER_UP> mtu 65536 qdisc noqueue state UP mode
> DEFAULT group default qlen 1000
>     link/ether 2a:71:ba:bd:33:4d brd ff:ff:ff:ff:ff:ff
>
> # ip link show master vrf-1
> 32: serv1: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master vrf-1 state
> UP mode DEFAULT group default qlen 1000
>     link/ether 00:13:3e:00:16:68 brd ff:ff:ff:ff:ff:ff
> ```
>
> The serv1 interface has some associated IPs but the vrf-1 interface does not:
>
> ```
> # ip addr show dev vrf-1
> 33: vrf-1: <NOARP,MASTER,UP,LOWER_UP> mtu 65536 qdisc noqueue state UP group
> default qlen 1000
>     link/ether 2a:71:ba:bd:33:4d brd ff:ff:ff:ff:ff:ff
>
> # ip addr show dev serv1
> 32: serv1: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master vrf-1 state
> UP group default qlen 1000
>     link/ether 00:13:3e:00:16:68 brd ff:ff:ff:ff:ff:ff
>     inet 10.248.0.191/16 brd 10.248.255.255 scope global serv1
>        valid_lft forever preferred_lft forever
>     inet 10.248.0.250/16 brd 10.248.255.255 scope global secondary serv1
>        valid_lft forever preferred_lft forever
>     inet6 fd5f:5d21:845:1401:213:3eff:fe00:1668/64 scope global
>        valid_lft forever preferred_lft forever
>     inet6 fe80::213:3eff:fe00:1668/64 scope link
>        valid_lft forever preferred_lft forever
> ```
>
> We're trying to program XFRM using these addresses to send and receive IPSec
> traffic in transport mode. The interesting question is which interface the
> XFRM state should be programmed on. I started off by programming the
> following policies and SAs on the VRF:
>
> ```
> # ip xfrm policy show
> src 10.254.13.16/32 dst 10.248.0.191/32 sport 37409 dport 5080 dev vrf-1
>     dir in priority 2147483648 ptype main
>     tmpl src 0.0.0.0 dst 0.0.0.0
>         proto esp reqid 0 mode transport
> src 10.248.0.191/32 dst 10.254.13.16/32 sport 16381 dport 37409 dev vrf-1
>     dir out priority 2147483648 ptype main
>     tmpl src 0.0.0.0 dst 0.0.0.0
>         proto esp reqid 0 mode transport
> # ip xfrm state show
> src 10.254.13.16 dst 10.248.0.191
>     proto esp spi 0x03a0392c reqid 3892838400 mode transport
>     replay-window 0
>     auth-trunc hmac(md5) 0x00112233445566778899aabbccddeeff 96
>     enc cbc(des3_ede) 0xfeefdccdbaab98897667544532231001feefdccdbaab9889
>     anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
>     sel src 0.0.0.0/0 dst 0.0.0.0/0 sport 37409 dport 5080 dev vrf-1
> src 10.248.0.191 dst 10.254.13.16
>     proto esp spi 0x00124f80 reqid 0 mode transport
>     replay-window 0
>     auth-trunc hmac(md5) 0x00112233445566778899aabbccddeeff 96
>     enc cbc(des3_ede) 0xfeefdccdbaab98897667544532231001feefdccdbaab9889
>     anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
>     sel src 0.0.0.0/0 dst 0.0.0.0/0 sport 16381 dport 37409 dev vrf-1
> ```
>
> Having programmed these, I can then send ESP packets from 10.254.13.16:37409
> -> 10.248.0.191:5080 and they are successfully decoded and passed up to my
> application. However, when I try to send UDP packets out again from
> 10.248.0.191:16381 -> 10.254.13.16:37409, the packets are not encrypted but
> sent out in the clear!
>
> Now, I've done some experimentation and found that if I program the outbound
> XFRM policy (eg. 10.248.0.191->10.254.13.16) to be on serv1 rather than
> vrf-1, the packets are correctly encrypted. But if I program the inbound XFRM
> policy (eg. 10.254.13.16->10.248.0.191) to be on serv1 rather than vrf-1, the
> inbound packets are not passed up to my application! That leaves me in a
> situation where I need to program the inbound and outbound XFRM policies
> asymmetrically in order to get my traffic to be sent properly, like so:
>
> ```
> # ip xfrm policy show
> src 10.254.13.16/32 dst 10.248.0.191/32 sport 37409 dport 5080 dev vrf-1
>     dir in priority 2147483648 ptype main
>     tmpl src 0.0.0.0 dst 0.0.0.0
>         proto esp reqid 0 mode transport
> src 10.248.0.191/32 dst 10.254.13.16/32 sport 16381 dport 37409 dev serv1
>     dir out priority 2147483648 ptype main
>     tmpl src 0.0.0.0 dst 0.0.0.0
>         proto esp reqid 0 mode transport
> # ip xfrm state show
> src 10.254.13.16 dst 10.248.0.191
>     proto esp spi 0x03a0392c reqid 3892838400 mode transport
>     replay-window 0
>     auth-trunc hmac(md5) 0x00112233445566778899aabbccddeeff 96
>     enc cbc(des3_ede) 0xfeefdccdbaab98897667544532231001feefdccdbaab9889
>     anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
>     sel src 0.0.0.0/0 dst 0.0.0.0/0 sport 37409 dport 5080 dev vrf-1
> src 10.248.0.191 dst 10.254.13.16
>     proto esp spi 0x00124f80 reqid 0 mode transport
>     replay-window 0
>     auth-trunc hmac(md5) 0x00112233445566778899aabbccddeeff 96
>     enc cbc(des3_ede) 0xfeefdccdbaab98897667544532231001feefdccdbaab9889
>     anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
>     sel src 0.0.0.0/0 dst 0.0.0.0/0 sport 16381 dport 37409 dev serv1
> ```
>
> It feels like I'm doing something wrong here - the asymmetrical programming
> of the interfaces doesn't seem like the 'correct' approach. Seems like there
> are three possibilities:
> (a) this is standard behaviour, I just have to live with it (although maybe
> there are some tweaks I can make to settings to change things?),
> (b) somewhere along the line the way the application is passing packets down
> to the kernel is incorrect and that's what's causing the mismatch,
> (c) this is a bug in how the kernel works and it's not attributing the
> packets to the appropriate interface.
>
> Any idea which of these is the right answer?
>
> I'm running with the kernel that ships with Centos8 (4.18.0-240.1.1), so I
> know I'm a bit out of date! But I've done a trawl through recent changes to
> the kernel code in this area and I can't see anything that would have
> obviously changed the behaviour I'm seeing (feel free to correct me if I've
> missed something!).
>
> Thanks for your help,
> Rob Dover
>
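
As a starting point for the standalone reproducer asked for above, here is a
rough sketch of the topology only. It does not use the fcnal-test.sh helpers,
and the namespace names (ns-vrf, ns-peer), the peer-side interface name
(peer1), the VRF table id (1001) and the two static routes are assumptions
made for illustration; only vrf-1, serv1 and the addresses come from the
report. By default it just prints the commands; DRY_RUN=0 would execute them
and needs root. The XFRM policies and states from the report would still have
to be added on top.

```shell
#!/bin/bash
# Sketch of a namespace-based reproducer for the VRF + member-interface setup.
# Assumed (not from the report): NS_VRF, NS_PEER, peer1, VRF_TABLE, the routes.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only (default), 0 = execute (root)
NS_VRF="ns-vrf"           # assumed name: node that owns vrf-1 and serv1
NS_PEER="ns-peer"         # assumed name: node standing in for the remote peer
VRF_TABLE=1001            # assumed routing table id for vrf-1

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Runs in a subshell so that 'set -e' stops at the first failing command
# without affecting the calling shell.
build_topology() (
    set -e
    run ip netns add "$NS_VRF"
    run ip netns add "$NS_PEER"

    # VRF device with no addresses of its own, as in the report.
    run ip -n "$NS_VRF" link add vrf-1 type vrf table "$VRF_TABLE"
    run ip -n "$NS_VRF" link set dev vrf-1 up

    # veth pair: serv1 becomes a member of the VRF, peer1 lives in the peer ns.
    run ip -n "$NS_VRF" link add serv1 type veth peer name peer1 netns "$NS_PEER"
    run ip -n "$NS_VRF" link set dev serv1 vrf vrf-1
    run ip -n "$NS_VRF" addr add 10.248.0.191/16 dev serv1
    run ip -n "$NS_VRF" link set dev serv1 up

    # Peer side, plus static routes so the two prefixes can reach each other.
    run ip -n "$NS_PEER" addr add 10.254.13.16/32 dev peer1
    run ip -n "$NS_PEER" link set dev peer1 up
    run ip -n "$NS_PEER" route add 10.248.0.0/16 dev peer1
    run ip -n "$NS_VRF" route add 10.254.13.16/32 dev serv1 vrf vrf-1
)

build_topology
```

Running it as-is lists the commands for review; the same file run with
DRY_RUN=0 on different kernel versions would give a like-for-like comparison.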