On Fri, 2016-08-19 at 18:09 -0700, David Miller wrote:
> From: Eric Dumazet <eric.duma...@gmail.com>
> Date: Fri, 19 Aug 2016 12:56:56 -0700
> 
> > On Fri, 2016-08-19 at 16:47 -0300, Thadeu Lima de Souza Cascardo wrote:
> >> Instead of using flow stats per NUMA node, use them per CPU. When using
> >> megaflows, the stats lock can be a bottleneck in scalability.
> > 
> > ...
> > 
> >> 
> >> 	flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow)
> >> -				       + (nr_node_ids
> >> +				       + (num_possible_cpus()
> >> 					  * sizeof(struct flow_stats *)),
> >> 				       0, 0, NULL);
> >> 	if (flow_cache == NULL)
> > 
> > This is bogus.
> > 
> > Please use nr_cpu_ids instead of num_possible_cpus().
> 
> Then how would hot-plugged cpus be handled properly?
> 
> The only other way is you'd have to free all objects in the current
> flow_cache, destroy it, then make a new kmem_cache and reallocate
> all of the existing objects and hook them back up every time a cpu
> up event occurs.
> 
> That doesn't make any sense at all.
> 
> Therefore, the size of the objects coming out of "sw_flow" has
> to accommodate all possible cpu indexes.
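Hotplug is handled fine either way; the only change I am asking for is
the size expression. A sketch of the corrected allocation (untested,
same context as the hunk quoted above):

	flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow)
				       + (nr_cpu_ids
					  * sizeof(struct flow_stats *)),
				       0, 0, NULL);

nr_cpu_ids is the highest possible cpu id + 1, so every slot a cpu id
can ever index exists, including after hotplug of any possible cpu.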
Here is an example of a system I had in the past.

8 cpus (num_possible_cpus() returns 8)

CPUs were numbered 0, 1, 2, 3, 8, 9, 10, 11  (nr_cpu_ids = 12)

I am pretty sure such systems are still alive.

Since you want an array indexed by cpu numbers, you need to size it by
nr_cpu_ids, otherwise array[11] will crash / access garbage.

This is why you find many uses of nr_cpu_ids, not num_possible_cpus(),
to size arrays in the Linux tree.

# git grep -n nr_cpu_ids -- net
net/bridge/netfilter/ebtables.c:900:	vmalloc(nr_cpu_ids * sizeof(*(newinfo->chainstack)));
net/bridge/netfilter/ebtables.c:1132:	countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
net/bridge/netfilter/ebtables.c:1185:	countersize = COUNTER_OFFSET(repl->nentries) * nr_cpu_ids;
net/bridge/netfilter/ebtables.c:2200:	countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
net/core/dev.c:3483:	if (next_cpu < nr_cpu_ids) {
net/core/dev.c:3588:	*   - Current CPU is unset (>= nr_cpu_ids).
net/core/dev.c:3596:	(tcpu >= nr_cpu_ids || !cpu_online(tcpu) ||
net/core/dev.c:3603:	if (tcpu < nr_cpu_ids && cpu_online(tcpu)) {
net/core/dev.c:3651:	if (rflow->filter == filter_id && cpu < nr_cpu_ids &&
net/core/neighbour.c:2737:	for (cpu = *pos-1; cpu < nr_cpu_ids; ++cpu) {
net/core/neighbour.c:2751:	for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) {
net/core/net-procfs.c:122:	while (*pos < nr_cpu_ids)
net/core/sysctl_net_core.c:70:	rps_cpu_mask = roundup_pow_of_two(nr_cpu_ids) - 1;
net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c:360:	for (cpu = *pos-1; cpu < nr_cpu_ids; ++cpu) {
net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c:375:	for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) {
net/ipv4/route.c:254:	for (cpu = *pos-1; cpu < nr_cpu_ids; ++cpu) {
net/ipv4/route.c:267:	for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) {
net/ipv6/icmp.c:918:	kzalloc(nr_cpu_ids * sizeof(struct sock *), GFP_KERNEL);
net/netfilter/nf_conntrack_netlink.c:1265:	for (cpu = cb->args[0]; cpu < nr_cpu_ids; cpu++) {
net/netfilter/nf_conntrack_netlink.c:2003:	if (cb->args[0] == nr_cpu_ids)
net/netfilter/nf_conntrack_netlink.c:2006:	for (cpu = cb->args[0]; cpu < nr_cpu_ids; cpu++) {
net/netfilter/nf_conntrack_netlink.c:3211:	if (cb->args[0] == nr_cpu_ids)
net/netfilter/nf_conntrack_netlink.c:3214:	for (cpu = cb->args[0]; cpu < nr_cpu_ids; cpu++) {
net/netfilter/nf_conntrack_standalone.c:309:	for (cpu = *pos-1; cpu < nr_cpu_ids; ++cpu) {
net/netfilter/nf_conntrack_standalone.c:324:	for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) {
net/netfilter/nf_synproxy_core.c:255:	for (cpu = *pos - 1; cpu < nr_cpu_ids; cpu++) {
net/netfilter/nf_synproxy_core.c:270:	for (cpu = *pos; cpu < nr_cpu_ids; cpu++) {
net/netfilter/x_tables.c:1064:	size = sizeof(void **) * nr_cpu_ids;
net/sunrpc/svc.c:167:	unsigned int maxpools = nr_cpu_ids;
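To make the indexing hazard concrete, here is a minimal sketch (the
helper name stats_array_alloc is hypothetical; only the standard
cpumask/slab APIs are assumed) of an array sized by nr_cpu_ids on the
machine described above:

	#include <linux/cpumask.h>	/* nr_cpu_ids, for_each_possible_cpu() */
	#include <linux/slab.h>		/* kcalloc() */

	struct flow_stats;		/* per-cpu stats, as in net/openvswitch */

	/* Possible cpus are 0-3 and 8-11 on that box:
	 * num_possible_cpus() == 8, nr_cpu_ids == 12. */
	static struct flow_stats **stats_array_alloc(void)
	{
		struct flow_stats **stats;
		int cpu;

		/* Must be nr_cpu_ids slots: indexing is by cpu id, and
		 * ids reach 11 even though only 8 cpus exist.  Sizing
		 * by num_possible_cpus() leaves stats[8..11] out of
		 * bounds. */
		stats = kcalloc(nr_cpu_ids, sizeof(*stats), GFP_KERNEL);
		if (!stats)
			return NULL;

		/* Only the 8 possible cpus are ever visited; the holes
		 * at ids 4-7 stay NULL but must remain addressable. */
		for_each_possible_cpu(cpu)
			stats[cpu] = NULL;	/* e.g. allocated lazily per cpu */

		return stats;
	}

The holes cost one pointer each; resizing on hotplug would cost a full
reallocation of every live object, which is why sizing for nr_cpu_ids
up front is the idiomatic choice.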