On Tue, 2017-12-19 at 19:01 +0800, zhangliping wrote:
> At 2017-12-18 22:45:30, "Paolo Abeni" <pab...@redhat.com> wrote:
> > Understood, thanks. Still the time spent in 'udp4_lib_lookup2' looks
> > quite different/higher than what I observe in my tests. Are you using
> > x86_64? if not, do you see many cache misses in udp4_lib_lookup2?
>
> Yes, x86_64. Here is the host's lscpu output info:
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 12
> On-line CPU(s) list: 0-11
> Thread(s) per core: 1
> Core(s) per socket: 6
> CPU socket(s): 2
> NUMA node(s): 2
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 62
> Stepping: 4
> CPU MHz: 2095.074
> BogoMIPS: 4196.28
> Virtualization: VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 256K
> L3 cache: 15360K
> NUMA node0 CPU(s): 0-5
> NUMA node1 CPU(s): 6-11
>
> Btw, my guest OS is Centos 3.10.0-514.26.2.el7.x86_64, is this kernel
> too old to be tested?
Understood. Yes, that kernel is a bit too old. So the perf trace you
reported refers to the CentOS kernel?

If you try a current vanilla kernel (or an upcoming RHEL 7.5, for
shameless self promotion) you should see much better figures (and a
smaller difference with your patch in).

Cheers,

Paolo