I've been able to narrow the scope down: the issue is with macsec itself. I set up two hosts with a macsec link between them and let a couple of iperf3 sessions blast traffic across. At approximately 4.2 billion packets / 6TB of data transferred, one end stopped transmitting packets. Running tcpdump on the impacted node's macsec0 interface shows packets coming in from the remote node (in this case ARP requests) and ARP replies from the local host, but the interface counters for macsec0 show no packets being recorded as transmitted. Again, nothing in dmesg implies an error.
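One way to confirm the stall from the counters is to sample the transmit packet count twice and compare. The following is a minimal sketch rather than the exact procedure used here: it assumes the standard Linux sysfs layout (/sys/class/net/<interface>/statistics/tx_packets), and the function names and the SYSFS_NET override are hypothetical, introduced only for illustration.

```shell
#!/bin/bash
# Sketch: sample an interface's TX packet counter to check for a stall.
# Assumes the standard Linux sysfs statistics layout.
# SYSFS_NET is a hypothetical override for the sysfs base directory.

# Print the current tx_packets counter for the given interface.
read_tx_packets() {
    local ifname=$1
    local base=${SYSFS_NET:-/sys/class/net}
    cat "$base/$ifname/statistics/tx_packets"
}

# Print "stalled" if the counter did not advance between two samples,
# otherwise print "ok".
tx_state() {
    local before=$1 after=$2
    if [ "$after" -eq "$before" ]; then
        echo "stalled"
    else
        echo "ok"
    fi
}

# Example use (macsec0 as in this report; the 5 second interval is arbitrary):
#   a=$(read_tx_packets macsec0); sleep 5; b=$(read_tx_packets macsec0)
#   echo "macsec0 tx: $a -> $b ($(tx_state "$a" "$b"))"
```

Run while iperf3 is still trying to send, a "stalled" result on macsec0 alongside a moving counter on the underlying physical interface of the remote host would match the behaviour described above. If the installed iproute2 supports it, `ip -s macsec show` should additionally list per-SA statistics.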
Deleting the macsec interface via ip link delete macsec0 and re-creating it gets traffic flowing again without a reboot. Meanwhile my traffic control bridge without macsec has shuffled 19TB via 22 billion packets and not skipped a beat, so it appears my initial assumption of it being the culprit was wrong.

To replicate:
- Set up two hosts with a direct ethernet link between each other.
- Bring up macsec between the two hosts, and set up a dedicated /30 on the link.
- Use iperf3 or another traffic generating tool over the /30, one session for each direction.
- Wait for traffic to stop.

My test bed is on Ubuntu Server 18.04 currently, kernel 4.15.0-36. I'm going to spin up a vanilla kernel on 4.15 and then -current to see if this is an Ubuntu-ism from their patches, specific to 4.15, or a general issue with macsec.

The script I used on each host (keys, rxmacs and IPs updated as appropriate):

#!/bin/bash

# Interfaces:
# dif = Egress physical interface (Dest)
# eif = Encrypted interface
dif=ens224
eif=macsec0

# MACSec Keys:
# txkey = Transmit (Local) key
# rxkey = Receive (Remote) key
# rxmac = Receive (Remote) MAC addy
txkey=60995924232808431491190820961556
rxkey=87345530111733181210202106249824
rxmac=00:0c:29:c5:95:df

# Clear any existing IP config
ifconfig $dif 0.0.0.0

# Bring up macsec:
echo "* Enable MACSec"
modprobe macsec
ip link add link "$dif" "$eif" type macsec
ip macsec add "$eif" tx sa 0 pn 1 on key 02 "$txkey"
ip macsec add "$eif" rx address "$rxmac" port 1
ip macsec add "$eif" rx address "$rxmac" port 1 sa 0 pn 1 on key 01 "$rxkey"
ip link set "$eif" type macsec encrypt on

# Bring up the interfaces:
echo "* Light tunnel NICS"
ip link set "$dif" up
ip link set "$eif" up

# Set IP
ifconfig $eif 192.168.211.1/30

echo " --=[ MACSec Up ]=--"

On Thu, Oct 11, 2018 at 10:05 AM Josh Coombs <jcoo...@staff.gwi.net> wrote:
>
> I'm actually leaning towards macsec now. I'm at 6TB transferred in a
> double hop, no macsec over the bridge setup without triggering the
> fault.
> I'm going to let it continue to churn and setup a second
> testbed that JUST uses macsec without traffic control bridging to see
> if I can trip the issue there. That should determine if it's macsec
> itself, or an interaction between macsec and traffic control.
>
> Joshua Coombs
> GWI
>
> office 207-494-2140
> www.gwi.net
>
> On Wed, Oct 10, 2018 at 12:39 PM Cong Wang <xiyou.wangc...@gmail.com> wrote:
> >
> > On Wed, Oct 10, 2018 at 8:54 AM Josh Coombs <jcoo...@staff.gwi.net> wrote:
> > >
> > > 2.3 billion 1 byte packets failed to re-create the bug. To try and
> > > simplify the setup I removed macsec from the equation, using a single
> > > host in the middle as the bridge. Interestingly, rather than 1.3Gbits
> > > a second in both directions, it ran around 8Mbits a second. Switching
> > > the filter from u32 to matchall didn't change the performance. Going
> > > back to the four machine test bed, again removing macsec and just
> > > bridging through radically decreased the throughput to around 8Mbits.
> > > Flip on macsec for the bridge and 1.3Gbits?
> >
> > This is a great narrow down! We can rule out macsec for guilty.
> >
> > Can you share a minimum reproducer for this problem? If so I can take
> > a look.