This patchset optimizes the ICMP-reply code path for ICMP packets
that get rate limited. A remote party can easily trigger this code
path by sending packets to a port number with no listening service.

Generally, the patchset moves the sysctl_icmp_msgs_per_sec rate-limit
check earlier in the code path and removes an allocation.
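
To illustrate why the position of the check matters, here is a minimal
userspace sketch. It is not the kernel code: the struct, the function names
and the "expensive setup" counter are all hypothetical stand-ins, and the
token bucket only loosely mirrors a per-second limit with a burst size.

```c
#include <stdbool.h>

/* Hypothetical token bucket standing in for the global ICMP rate limit. */
struct icmp_bucket_sketch {
	unsigned int credit;    /* tokens currently available */
	unsigned int burst;     /* maximum tokens the bucket can hold */
	unsigned int per_sec;   /* refill rate, in the spirit of icmp_msgs_per_sec */
	unsigned long last_ms;  /* timestamp of the last refill */
};

/* Hypothetical counters used to compare the two orderings. */
struct reply_stats_sketch {
	unsigned long expensive_calls; /* route lookup, allocation, etc. */
	unsigned long sent;            /* replies that passed the limit */
};

/* Refill by elapsed time (capped at one second), then try to take a token. */
static bool bucket_allow(struct icmp_bucket_sketch *b, unsigned long now_ms)
{
	unsigned long elapsed = now_ms - b->last_ms;
	unsigned long refill;

	if (elapsed > 1000)
		elapsed = 1000;
	refill = elapsed * b->per_sec / 1000;
	if (refill > 0) {
		unsigned long c = b->credit + refill;

		b->credit = c > b->burst ? b->burst : (unsigned int)c;
		b->last_ms = now_ms;
	}
	if (b->credit == 0)
		return false;
	b->credit--;
	return true;
}

/* Stand-in for the costly work done before a reply can be sent. */
static void expensive_reply_setup(struct reply_stats_sketch *st)
{
	st->expensive_calls++;
}

/* Late check: pay for the setup even when the reply gets dropped. */
static bool reply_check_late(struct icmp_bucket_sketch *b,
			     struct reply_stats_sketch *st, unsigned long now_ms)
{
	expensive_reply_setup(st);
	if (!bucket_allow(b, now_ms))
		return false;
	st->sent++;
	return true;
}

/* Early check: a rate-limited reply costs almost nothing. */
static bool reply_check_early(struct icmp_bucket_sketch *b,
			      struct reply_stats_sketch *st, unsigned long now_ms)
{
	if (!bucket_allow(b, now_ms))
		return false;
	expensive_reply_setup(st);
	st->sent++;
	return true;
}
```

In this sketch both orderings send the same number of replies, but only
the early check skips the setup work for the replies that are dropped.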

Use-case: The specific case where I experienced this being a bottleneck
is sending UDP packets to a port with no listener, which results in the
kernel replying with ICMP Destination Unreachable (type:3), Port
Unreachable (code:3). Generating these replies is the bottleneck.

After Eric and Paolo optimized the UDP socket code, the kernel's PPS
processing capability is lower for no-listen ports than for normal UDP
sockets. This is bad for capacity planning when restarting a service,
because the port has no listener while the service is down.

UDP no-listen benchmark on 8 CPUs using pktgen_sample04_many_flows.sh:
 Baseline:  6.6 Mpps
 Patch:    14.7 Mpps
 Driver: mlx5 at 50Gbit/s.
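
For reference, the Mpps figures above can be converted into an aggregate
per-packet time budget. The helpers below are a small hypothetical sketch
of that arithmetic only; they are not part of the patchset.

```c
/* Convert an aggregate packet rate in Mpps to nanoseconds per packet. */
static double ns_per_packet(double mpps)
{
	/* 1e9 ns per second divided by (mpps * 1e6) packets per second */
	return 1000.0 / mpps;
}

/* Nanoseconds saved per packet when the rate goes from before to after. */
static double ns_saved_per_packet(double before_mpps, double after_mpps)
{
	return ns_per_packet(before_mpps) - ns_per_packet(after_mpps);
}
```

With 6.6 Mpps and 14.7 Mpps this works out to roughly 151.5 ns and 68.0 ns
per packet, so about 83.5 ns saved per packet, measured across all 8 CPUs.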
---
Jesper Dangaard Brouer (3):
      Revert "icmp: avoid allocating large struct on stack"
      net: reduce cycles spend on ICMP replies that gets rate limited
      net: for rate-limited ICMP replies save one atomic operation

 net/ipv4/icmp.c | 125 +++++++++++++++++++++++++++++++++----------------------
 net/ipv6/icmp.c |  68 +++++++++++++++++++++---------
 2 files changed, 123 insertions(+), 70 deletions(-)
--