Re: ixgbe: driver drops packets routed from an IPSec interface with a "bad sa_idx" error

Shannon Nelson Tue, 10 Sep 2019 14:44:45 -0700

On 9/9/19 11:45 AM, Michael Marley wrote:

On 2019-09-09 14:21, Shannon Nelson wrote:
On 9/6/19 11:13 AM, Michael Marley wrote:
(This is also reported at https://bugzilla.kernel.org/show_bug.cgi?id=204551, but it was recommended that I send it to this list as well.)
I have put together a router that routes traffic from several local subnets from a switch attached to an i82599ES card through an IPSec VPN interface set up with StrongSwan. (The VPN is running on an unrelated second interface with a different driver.) Traffic from the local interfaces to the VPN works as it should and eventually makes it through the VPN server and out to the Internet. The return traffic makes it back to the router and tcpdump shows it leaving by the i82599, but the traffic never actually makes it onto the wire and I instead get one of
enp1s0: ixgbe_ipsec_tx: bad sa_idx=64512 handle=0
for each packet that should be transmitted. (The sa_idx and handle values are always the same.)
I realized this was probably related to ixgbe's IPSec offloading feature, so I tried with the motherboard's integrated e1000e device and didn't have the problem. I tried using ethtool to disable all the IPSec-related offloads (tx-esp-segmentation, esp-hw-offload, esp-tx-csum-hw-offload), but the problem persisted. I then tried recompiling the kernel with CONFIG_IXGBE_IPSEC=n and that worked around the problem.
I was also able to find another instance of the same problem reported in Debian at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=930443. That person seems to be having exactly the same issue as me, down to the sa_idx and handle values being the same.
If there are any more details I can provide to make this easier to track down, please let me know.
Thanks,

Michael Marley
Hi Michael,

Thanks for pointing this out.  The issue this error message is
complaining about is that the handle given to the driver is a bad
value.  The handle is what helps the driver find the right encryption
information, and in this case is an index into an array, one array for
Rx and one for Tx, each of which has up to 1024 entries.  In order to
encode them into a single value, 1024 is added to the Tx values to
make the handle, and 1024 is subtracted to use the handle later.  Note
that the bad sa_idx is 64512, which is -1024 as a 16-bit value; if the
Tx handle given to ixgbe for xmit is 0, we subtract 1024 from that and
get this bad sa_idx value.
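
The arithmetic above can be sketched in a few lines of C. This is only an
illustration of the encoding as described in this message, not the driver's
actual code: the helper names and constants below are hypothetical, and the
16-bit index width is an assumption chosen because it reproduces the 64512
seen in the log.

```c
#include <stdint.h>

/* Assumed from the description above: each table holds up to 1024
 * entries, and Tx handles are offset by 1024. Names are hypothetical. */
#define SA_TABLE_SIZE  1024u
#define TX_HANDLE_BASE 1024u

/* Hypothetical helper: turn a Tx table index into an opaque handle. */
static uint32_t tx_index_to_handle(uint16_t idx)
{
    return (uint32_t)idx + TX_HANDLE_BASE;
}

/* Hypothetical helper: recover the Tx table index from a handle.
 * The result is truncated to 16 bits (an assumption), so a handle
 * below the base wraps around instead of going negative. */
static uint16_t handle_to_tx_index(uint32_t handle)
{
    return (uint16_t)(handle - TX_HANDLE_BASE);
}

/* An index is usable only if it falls inside the table. */
static int tx_index_is_valid(uint16_t idx)
{
    return idx < SA_TABLE_SIZE;
}
```

With these definitions, handle_to_tx_index(0) wraps to 0xFC00, which is
64512, and tx_index_is_valid() rejects it. That matches the "bad
sa_idx=64512 handle=0" line in the report.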

That handle is supposed to be an opaque value only used by the
driver.  It looks to me like either (a) the driver is not setting up
the handle correctly when the SA is first set up, or (b) something in
the upper levels of the ipsec code is clearing the handle value. We
would need to know more about all the bits in your SA set up to have a
better idea what parts of the ipsec code are being exercised when this
problem happens.

I currently don't have access to a good ixgbe setup on which to
test/debug this, and I haven't been paying much attention lately to
what's happening in the upper ipsec layers, so my help will be
somewhat limited.  I'm hoping the Intel folks can add a little
help, so I've copied Jeff Kirsher on this (they'll probably point back
to me since I wrote this chunk :-) ).  I've also copied Stephen
Klassert for his ipsec thoughts.

In the meantime, can you give more details on the exact ipsec rules
that are used here, and are there any error messages coming from ixgbe
regarding the ipsec rule setup that might help us identify what's
happening?

Thanks,
sln
Hi Shannon,
Thanks for your response! I apologize, I am a bit of a newbie to IPSec myself, so I'm not 100% sure what is the best way to provide the information you need, but here is the (slightly-redacted) output of swanctl --list-sas first from the server and then from the client:
<servername>: #24, ESTABLISHED, IKEv2, 3cb75c180ee5dc68_icc7dae551b603bb7_r*
  local  '<serverip>' @ <serverip>[4500]
  remote '<clientip>' @ <clientip>[4500]
  AES_GCM_16-256/PRF_HMAC_SHA2_512/ECP_384
  established 174180s ago
<servername>: #110, reqid 12, INSTALLED, TUNNEL-in-UDP, ESP:AES_GCM_16-256/ECP_384
    installed 469s ago
    in  c51a0f11 (-|0x00000064), 1548864 bytes, 19575 packets, 6s ago
    out c3bd9741 (-|0x00000064), 23618807 bytes, 22865 packets, 7s ago
    local  0.0.0.0/0 ::/0
    remote 0.0.0.0/0 ::/0
<clientname>: #1, ESTABLISHED, IKEv2, 3cb75c180ee5dc68_i*cc7dae551b603bb7_r
  local  '<clientip>' @ <clientip>[4500]
  remote '<serverip>' @ <serverip>[4500]
  AES_GCM_16-256/PRF_HMAC_SHA2_512/ECP_384
  established 174013s ago
<clientname>: #54, reqid 1, INSTALLED, TUNNEL-in-UDP, ESP:AES_GCM_16-256/ECP_384
    installed 303s ago, rekeying in 2979s, expires in 3657s
    in  c3bd9741 (-|0x00000064), 23178523 bytes, 20725 packets, 0s ago
    out c51a0f11 (-|0x00000064), 1429124 bytes, 17719 packets, 0s ago
    local  0.0.0.0/0 ::/0
    remote 0.0.0.0/0 ::/0
It might also be worth mentioning that I am using an xfrm interface to do "regular" routing rather than the policy-based routing that StrongSwan/IPSec normally uses. If there is anything else that would help more, I would be happy to provide it.
Just to be clear though, I'm not trying to run IPSec on the ixgbe interface at all. The ixgbe adapter is being used to connect the router to the switch on the LAN side of the network. IPSec is running on the WAN interface without any hardware acceleration (besides AES-NI). The problem occurs when a computer on the LAN tries to access the WAN. The outgoing packets work as expected and the incoming packets are routed back out through the ixgbe device toward the LAN client, but the driver drops the packets with the sa_idx error.
I hope this helps.

Thanks,

Michael

I'm not familiar with StrongSwan and its configurations, but I'm guessing that if you didn't expressly enable it, perhaps StrongSwan enabled the ipsec offload capability. I would suggest turning it off to at least get you past the immediate issue. If there isn't an obvious configuration knob in StrongSwan, perhaps you can at least use ethtool to disable the offload, which should be off by default anyway.

You can check it with "ethtool -k ethX | grep esp-hw-offload" and see if it is set. You can disable it with "ethtool -K ethX esp-hw-offload off".


Meanwhile, can you please send the output of the following commands:
uname -a
ip xfrm s
ip xfrm p
dmesg | grep ixgbe

And any other /var/log/syslog or /var/log/messages that look suspicious and might give any more insight to what's happening.


Thanks,
sln
