On Tue, 2016-12-06 at 19:31 +0100, Paolo Abeni wrote:
> cacheline 2 boundary (128 bytes) is 8 bytes before sk_lock: cacheline 2
> includes also skc_refcnt and skc_rxhash from __sk_common (I use 'pahole
> -E ...' to get the full blown output). skc_rxhash is read for each
> packet in inet_recvmsg()/sock_rps_record_flow() if CONFIG_RPS is set. I
> get a cache miss per packet there and inet_recvmsg() in my test takes
> about 8% of the whole u/s processing time.
Wait a minute, this sk->sk_rxhash should only be read on connected
sockets.

Relying on it being 0 was okay only as long as we did not care about
false sharing. And UDP sockets used to grab the socket refcount, so we
had false sharing a _lot_ in the past.

We must fix this if it has not already been done properly.

Can you take care of this problem?

Thanks!
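To make the false-sharing concern concrete, here is a minimal userspace sketch. It is not the real struct sock layout and not a proposed patch: the struct names, the DEMO_* macros and the 64-byte cache line size are assumptions made purely for illustration. It only shows how offsetof() can tell whether a read-mostly field (rxhash) starts on the same cache line as a frequently written one (refcnt), which is the kind of information pahole reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHELINE 64 /* assumed cache line size in bytes */

/* Index of the cache line on which a member starts. */
#define DEMO_LINE_OF(type, member) (offsetof(type, member) / DEMO_CACHELINE)

/* True when two members of 'type' start on the same cache line. */
#define DEMO_SAME_LINE(type, a, b) \
	(DEMO_LINE_OF(type, a) == DEMO_LINE_OF(type, b))

/*
 * Hypothetical layout, NOT the real struct sock: a refcount that is
 * written often sits right next to a hash that is read per packet.
 */
struct demo_sock_shared {
	char     pad[2 * DEMO_CACHELINE]; /* stands in for the first 128 bytes */
	uint32_t refcnt;                  /* offset 128, frequently written */
	uint32_t rxhash;                  /* offset 132, read per packet */
	uint64_t lock;                    /* offset 136 */
};

/*
 * Same fields, but the refcount is pushed onto its own cache line so
 * that writes to it no longer invalidate the line holding rxhash.
 */
struct demo_sock_split {
	char     pad[2 * DEMO_CACHELINE];
	uint32_t rxhash;                          /* offset 128 */
	_Alignas(DEMO_CACHELINE) uint32_t refcnt; /* offset 192 */
	uint64_t lock;                            /* offset 200 */
};

/* Compile-time checks of the two layouts described above. */
static_assert(DEMO_SAME_LINE(struct demo_sock_shared, rxhash, refcnt),
	      "shared layout: rxhash and refcnt share a cache line");
static_assert(!DEMO_SAME_LINE(struct demo_sock_split, rxhash, refcnt),
	      "split layout: rxhash and refcnt are on different cache lines");
```

Separating the fields costs padding, so in a real structure the other option is to avoid the read altogether on paths that do not need the value, such as unconnected sockets.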