At 2018-05-11 11:54:55, "Willem de Bruijn" <willemdebruijn.ker...@gmail.com> wrote:
>On Thu, May 10, 2018 at 4:28 AM, <gfree.w...@vip.163.com> wrote:
>> From: Gao Feng <gfree.w...@vip.163.com>
>>
>> The skb flow limit is implemented for each CPU independently. In the
>> current code, the function skb_flow_limit gets the softnet_data by
>> this_cpu_ptr. But the target cpu of enqueue_to_backlog may not be
>> the current cpu when RPS is enabled. As a result, skb_flow_limit checks
>> the stats of the current CPU, while the skb is going to be appended to
>> the queue of another CPU. That isn't the expected behavior.
>>
>> Now pass the softnet_data as a param to skb_flow_limit to make them
>> consistent.
>
>The local cpu softnet_data is used on purpose. The operations in
>skb_flow_limit() on sd fields could race if not executed on the local cpu.

I think the race doesn't exist, because of the rps_lock: enqueue_to_backlog
already holds the rps_lock when it calls skb_flow_limit.
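To illustrate, here is a trimmed excerpt of enqueue_to_backlog, paraphrased
from net/core/dev.c of this era (error paths and some checks elided), showing
that skb_flow_limit runs with the rps_lock of the target sd already held:

    static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
                                  unsigned int *qtail)
    {
            struct softnet_data *sd;
            unsigned long flags;
            unsigned int qlen;

            /* sd belongs to the RPS target cpu, not necessarily the local one */
            sd = &per_cpu(softnet_data, cpu);

            local_irq_save(flags);

            rps_lock(sd);   /* serializes writers of this sd's backlog */
            qlen = skb_queue_len(&sd->input_pkt_queue);
            if (qlen <= netdev_max_backlog && !skb_flow_limit(skb, qlen)) {
                    /* ... enqueue skb to sd->input_pkt_queue ... */
            }
            /* ... drop path, rps_unlock, local_irq_restore ... */
    }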
>
>Flow limit tries to detect large ("elephant") DoS flows with a fixed
>four-tuple.
>These would always hit the same RPS cpu, so that cpu being backlogged

They may hit different target CPUs when RFS is enabled. The application
could be scheduled to another CPU, and RFS then tries to deliver the skb
to the latest core, which has the hot cache.

>may be an indication that such a flow is active. But the flow will also always
>arrive on the same initial cpu courtesy of RSS. So storing the lookup table

RSS can't make sure the irq is handled by the same cpu; it may be balanced
across the cpus.

>on the initial CPU is also fine. There may be false positives on other CPUs
>with the same RPS destination, but that is unlikely with a highly concurrent
>traffic server mix ("mice").

If my comment above is right, the flow doesn't always arrive on the same
initial cpu, although it may be sent to the same target cpu.

>
>Note that the sysctl net.core.flow_limit_cpu_bitmap enables the feature
>for the cpus on which traffic initially lands, not the RPS destination cpus.
>See also Documentation/networking/scaling.txt
>
>That said, I had to reread the code, as it does seem sensible that the
>same softnet_data is intended to be used both when testing qlen and
>flow_limit.

In most cases, the user configures flow_limit with the same mask as the
RPS map, e.g. 0xff, because the user can't predict which core the evil
flow will arrive on.

Take an example: there are 2 cores, cpu0 and cpu1. One flow is an evil
flow, and its irq is sent to cpu0. After RPS/RFS, the target cpu is cpu1.
Now cpu0 invokes enqueue_to_backlog, and skb_flow_limit checks the queue
length of cpu0. It certainly passes the check, because there is no evil
flow on cpu0. Then cpu0 inserts the skb into the queue of cpu1.

As a result, skb_flow_limit doesn't work as expected. A rough sketch of
the proposed change follows at the end of this mail.

BTW, I have already sent the v2 patch, which only adds the "Fixes:" tag.

Best Regards
Feng
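The sketch mentioned above, paraphrased from net/core/dev.c rather than the
exact v2 diff: enqueue_to_backlog passes the target cpu's softnet_data down
to skb_flow_limit, so the flow history and the backlog queue belong to the
same cpu.

    /* Sketch of the idea only; not the exact v2 diff. */
    static bool skb_flow_limit(struct sk_buff *skb, struct softnet_data *sd,
                               unsigned int qlen)
    {
    #ifdef CONFIG_NET_FLOW_LIMIT
            struct sd_flow_limit *fl;
            unsigned int old_flow, new_flow;

            if (qlen < (netdev_max_backlog >> 1))
                    return false;

            /* Before: sd = this_cpu_ptr(&softnet_data), i.e. the local cpu.
             * Now sd comes from the caller and is the RPS target cpu's,
             * matching the queue the skb is actually appended to.
             */
            rcu_read_lock();
            fl = rcu_dereference(sd->flow_limit);
            if (fl) {
                    new_flow = skb_get_hash(skb) & (fl->num_buckets - 1);
                    old_flow = fl->history[fl->history_head];
                    fl->history[fl->history_head] = new_flow;

                    fl->history_head++;
                    fl->history_head &= FLOW_LIMIT_HISTORY - 1;

                    if (likely(fl->buckets[old_flow]))
                            fl->buckets[old_flow]--;

                    if (fl->buckets[new_flow]++ > (FLOW_LIMIT_HISTORY >> 1)) {
                            fl->count++;
                            rcu_read_unlock();
                            return true;
                    }
            }
            rcu_read_unlock();
    #endif
            return false;
    }

    /* and in enqueue_to_backlog, with rps_lock(sd) already held: */
            if (qlen <= netdev_max_backlog && !skb_flow_limit(skb, sd, qlen)) {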