From: Jesper Dangaard Brouer <[email protected]>

On servers with many IPv4 addresses, __ip_dev_find() becomes visible in
perf profiles on the unconnected UDP sendmsg path. The call chain is:

  udpv6_sendmsg / udp_sendmsg
    ip_route_output_flow
      ip_route_output_key_hash_rcu
        __ip_dev_find              <-- source address validation

__ip_dev_find() calls inet_lookup_ifaddr_rcu() which walks a hash chain
in inet_addr_lst. With the current fixed table size of 256 buckets, a
host with ~700 IPv4 addresses averages ~2.8 entries per chain, adding
unnecessary cache misses under RCU on every unconnected send.

Add CONFIG_INET_ADDR_HASH_BUCKETS (default 256, range 64-16384, EXPERT)
so hosts with many addresses can size the table appropriately. The value
is rounded up to the next power of 2 at compile time via
order_base_2(). Memory cost is one hlist_head (a single pointer) per
bucket per net namespace.
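
As an illustration only (not part of the patch), the rounding and the
memory cost can be sketched in userspace C. order_base_2_sketch() is a
hypothetical stand-in that mimics the kernel's order_base_2() for n > 0.
The helper names are made up for this example, and the size estimate
assumes hlist_head holds exactly one pointer:

```c
#include <assert.h>

/* Hypothetical userspace stand-in for order_base_2(): ceil(log2(n)), n > 0. */
static unsigned int order_base_2_sketch(unsigned long n)
{
	unsigned int order = 0;

	assert(n > 0);
	while ((1UL << order) < n)
		order++;
	return order;
}

/* Bucket count actually used once the configured value is rounded up. */
static unsigned long effective_buckets(unsigned long configured)
{
	return 1UL << order_base_2_sketch(configured);
}

/* Table size in bytes per net namespace, assuming one pointer per bucket. */
static unsigned long table_bytes(unsigned long configured)
{
	return effective_buckets(configured) * sizeof(void *);
}
```

With these helpers, a configured value of 1000 yields 1024 buckets,
while 256 and 1024 are already powers of 2 and stay unchanged.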

Reported-by: Ivan Babrou <[email protected]>
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---
 net/ipv4/Kconfig   | 16 ++++++++++++++++
 net/ipv4/devinet.c |  2 +-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index df922f9f5289..3c5e5e74b3e4 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -402,6 +402,22 @@ config INET_IPCOMP
 
          If unsure, say Y.
 
+config INET_ADDR_HASH_BUCKETS
+       int "IPv4 address hash table size" if EXPERT
+       range 64 16384
+       default 256
+       help
+         Number of hash buckets for looking up local IPv4 addresses,
+         e.g. during route output to validate the source address via
+         __ip_dev_find().  Rounded up to the nearest power of 2.
+
+         Hosts with many IPv4 addresses benefit from a larger table to reduce
+         hash chain lengths. This is particularly relevant when sending using
+         unconnected UDP sockets.
+
+         The default of 256 is fine for most systems.  A value of 1024
+         suits hosts with ~500+ addresses.
+
 config INET_TABLE_PERTURB_ORDER
        int "INET: Source port perturbation table size (as power of 2)" if EXPERT
        default 16
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 58fe7cb69545..9e3da06fb618 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -108,7 +108,7 @@ static const struct nla_policy ifa_ipv4_policy[IFA_MAX+1] = {
        [IFA_PROTO]             = { .type = NLA_U8 },
 };
 
-#define IN4_ADDR_HSIZE_SHIFT   8
+#define IN4_ADDR_HSIZE_SHIFT   order_base_2(CONFIG_INET_ADDR_HASH_BUCKETS)
 #define IN4_ADDR_HSIZE         (1U << IN4_ADDR_HSIZE_SHIFT)
 
 static u32 inet_addr_hash(const struct net *net, __be32 addr)
-- 
2.43.0

