On Fri, 2016-08-19 at 18:09 -0700, David Miller wrote:
> From: Eric Dumazet <eric.duma...@gmail.com>
> Date: Fri, 19 Aug 2016 12:56:56 -0700
> 
> > On Fri, 2016-08-19 at 16:47 -0300, Thadeu Lima de Souza Cascardo wrote:
> >> Instead of using flow stats per NUMA node, use them per CPU. When using
> >> megaflows, the stats lock can be a bottleneck in scalability.
> > 
> > ...
> > 
> >> 
> >> 	flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow)
> >> -				       + (nr_node_ids
> >> +				       + (num_possible_cpus()
> >> 					  * sizeof(struct flow_stats *)),
> >> 				       0, 0, NULL);
> >> 	if (flow_cache == NULL)
> > 
> > This is bogus.
> > 
> > Please use nr_cpu_ids instead of num_possible_cpus().
> 
> Then how would hot-plugged cpus be handled properly?
> 
> The only other way is you'd have to free all objects in the current
> flow_cache, destroy it, then make a new kmem_cache and reallocate
> all of the existing objects and hook them back up every time a cpu
> up event occurs.
> 
> That doesn't make any sense at all.
> 
> Therefore, the size of the objects coming out of "sw_flow" has
> to accommodate all possible cpu indexes.
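Hotplug is handled fine either way; the only change I am asking for is
the size expression. A sketch of the corrected allocation (untested,
same context as the hunk quoted above):

	flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow)
				       + (nr_cpu_ids
					  * sizeof(struct flow_stats *)),
				       0, 0, NULL);

nr_cpu_ids is the highest possible cpu id + 1, so every slot a cpu id
can ever index exists, including after hotplug of any possible cpu.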
Here is an example of a system I had in the past.

8 cpus (num_possible_cpus() returns 8)

CPUs were numbered 0, 1, 2, 3, 8, 9, 10, 11  (nr_cpu_ids = 12)

I am pretty sure such systems are still alive.

Since you want an array indexed by cpu numbers, you need to size it by
nr_cpu_ids, otherwise array[11] will crash / access garbage.

This is why you find many uses of nr_cpu_ids, not num_possible_cpus(),
to size arrays in the Linux tree.

# git grep -n nr_cpu_ids -- net
net/bridge/netfilter/ebtables.c:900:	vmalloc(nr_cpu_ids * sizeof(*(newinfo->chainstack)));
net/bridge/netfilter/ebtables.c:1132:	countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
net/bridge/netfilter/ebtables.c:1185:	countersize = COUNTER_OFFSET(repl->nentries) * nr_cpu_ids;
net/bridge/netfilter/ebtables.c:2200:	countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
net/core/dev.c:3483:	if (next_cpu < nr_cpu_ids) {
net/core/dev.c:3588:	*   - Current CPU is unset (>= nr_cpu_ids).
net/core/dev.c:3596:	(tcpu >= nr_cpu_ids || !cpu_online(tcpu) ||
net/core/dev.c:3603:	if (tcpu < nr_cpu_ids && cpu_online(tcpu)) {
net/core/dev.c:3651:	if (rflow->filter == filter_id && cpu < nr_cpu_ids &&
net/core/neighbour.c:2737:	for (cpu = *pos-1; cpu < nr_cpu_ids; ++cpu) {
net/core/neighbour.c:2751:	for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) {
net/core/net-procfs.c:122:	while (*pos < nr_cpu_ids)
net/core/sysctl_net_core.c:70:	rps_cpu_mask = roundup_pow_of_two(nr_cpu_ids) - 1;
net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c:360:	for (cpu = *pos-1; cpu < nr_cpu_ids; ++cpu) {
net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c:375:	for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) {
net/ipv4/route.c:254:	for (cpu = *pos-1; cpu < nr_cpu_ids; ++cpu) {
net/ipv4/route.c:267:	for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) {
net/ipv6/icmp.c:918:	kzalloc(nr_cpu_ids * sizeof(struct sock *), GFP_KERNEL);
net/netfilter/nf_conntrack_netlink.c:1265:	for (cpu = cb->args[0]; cpu < nr_cpu_ids; cpu++) {
net/netfilter/nf_conntrack_netlink.c:2003:	if (cb->args[0] == nr_cpu_ids)
net/netfilter/nf_conntrack_netlink.c:2006:	for (cpu = cb->args[0]; cpu < nr_cpu_ids; cpu++) {
net/netfilter/nf_conntrack_netlink.c:3211:	if (cb->args[0] == nr_cpu_ids)
net/netfilter/nf_conntrack_netlink.c:3214:	for (cpu = cb->args[0]; cpu < nr_cpu_ids; cpu++) {
net/netfilter/nf_conntrack_standalone.c:309:	for (cpu = *pos-1; cpu < nr_cpu_ids; ++cpu) {
net/netfilter/nf_conntrack_standalone.c:324:	for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) {
net/netfilter/nf_synproxy_core.c:255:	for (cpu = *pos - 1; cpu < nr_cpu_ids; cpu++) {
net/netfilter/nf_synproxy_core.c:270:	for (cpu = *pos; cpu < nr_cpu_ids; cpu++) {
net/netfilter/x_tables.c:1064:	size = sizeof(void **) * nr_cpu_ids;
net/sunrpc/svc.c:167:	unsigned int maxpools = nr_cpu_ids;
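To make the indexing hazard concrete, here is a minimal sketch (the
helper name stats_array_alloc is hypothetical; only the standard
cpumask/slab APIs are assumed) of an array sized by nr_cpu_ids on the
machine described above:

	#include <linux/cpumask.h>	/* nr_cpu_ids, for_each_possible_cpu() */
	#include <linux/slab.h>		/* kcalloc() */

	struct flow_stats;		/* per-cpu stats, as in net/openvswitch */

	/* Possible cpus are 0-3 and 8-11 on that box:
	 * num_possible_cpus() == 8, nr_cpu_ids == 12. */
	static struct flow_stats **stats_array_alloc(void)
	{
		struct flow_stats **stats;
		int cpu;

		/* Must be nr_cpu_ids slots: indexing is by cpu id, and
		 * ids reach 11 even though only 8 cpus exist.  Sizing
		 * by num_possible_cpus() leaves stats[8..11] out of
		 * bounds. */
		stats = kcalloc(nr_cpu_ids, sizeof(*stats), GFP_KERNEL);
		if (!stats)
			return NULL;

		/* Only the 8 possible cpus are ever visited; the holes
		 * at ids 4-7 stay NULL but must remain addressable. */
		for_each_possible_cpu(cpu)
			stats[cpu] = NULL;	/* e.g. allocated lazily per cpu */

		return stats;
	}

The holes cost one pointer each; resizing on hotplug would cost a full
reallocation of every live object, which is why sizing for nr_cpu_ids
up front is the idiomatic choice.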