Hi David, Florian,

Results of slowing down the CPU follow below.

On 16-05-2018 20:01, Florian Fainelli wrote:
> On 05/16/2018 11:56 AM, David Miller wrote:
>> From: Jose Abreu <jose.ab...@synopsys.com>
>> Date: Wed, 16 May 2018 13:50:42 +0100
>>
>>> David raised some rightfull constrains about the use of indirect callbacks in
>>> the code. I did iperf tests with and without patches 3-12 and the performance
>>> remained equal. I guess for 1Gb/s and because my setup has a powerfull
>>> processor these patches don't affect the performance.
>> Does your cpu need Spectre v1 and v2 workarounds which cause indirect calls to
>> be extremely expensive?
> Given how widespread stmmac is within the ARM CPU's ecosystem, the
> answer is more than likely yes.
>
> To get a better feeling of whether your indirect branches introduce a
> difference, either don't run the CPU at full speed (e.g: use cpufreq to
> slow it down), and/or profile the number of cycles and instruction cache
> hits/miss ratio for the functions called in hot-path.

It turns out my CPU has every single vulnerability detected so far :D

---
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: PTI
# cat /sys/devices/system/cpu/vulnerabilities/spectre_v1
Mitigation: __user pointer sanitization
# cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
Vulnerable: Minimal generic ASM retpoline
---

I'm not sure if the workaround is active for spectre_v2 though, because it
just says "vulnerable" ...

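As an aside, a loop along these lines would dump every entry in that sysfs
directory in one go. It is only a sketch: show_vulns is a name made up for
illustration, and the optional directory argument is there just so the helper
can be pointed at another path.

```shell
# Print every status file in a sysfs directory as "name: status".
# show_vulns is a hypothetical helper name, not an existing tool.
show_vulns() {
    dir="${1:-/sys/devices/system/cpu/vulnerabilities}"
    for f in "$dir"/*; do
        # Skip anything that is not a regular file (also covers a missing
        # or empty directory, where the glob stays unexpanded).
        [ -f "$f" ] || continue
        printf '%s: %s\n' "$(basename "$f")" "$(cat "$f")"
    done
}
```

Calling show_vulns with no argument reads the default directory used above.
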
Now, I'm using an 8-core Intel running @ 3.4 GHz:

---
# cat /proc/cpuinfo | grep -i mhz
cpu MHz : 3988.358
cpu MHz : 3991.775
cpu MHz : 3995.003
cpu MHz : 3996.003
cpu MHz : 3995.113
cpu MHz : 3996.512
cpu MHz : 3954.454
cpu MHz : 3937.402
---

So, following Florian's advice, I turned off 7 cores and changed the CPU
frequency to the minimum allowed (800 MHz):

---
# cat /sys/bus/cpu/devices/cpu0/cpufreq/scaling_min_freq
800000
---

---
# for file in /sys/bus/cpu/devices/cpu*/cpufreq/scaling_governor; do echo userspace > $file; done
# for file in /sys/bus/cpu/devices/cpu*/cpufreq/scaling_setspeed; do echo 800000 > $file; done
# echo 0 > /sys/devices/system/cpu/cpu1/online
# echo 0 > /sys/devices/system/cpu/cpu2/online
# echo 0 > /sys/devices/system/cpu/cpu3/online
# echo 0 > /sys/devices/system/cpu/cpu4/online
# echo 0 > /sys/devices/system/cpu/cpu5/online
# echo 0 > /sys/devices/system/cpu/cpu6/online
# echo 0 > /sys/devices/system/cpu/cpu7/online
---

---
# cat /proc/cpuinfo | grep -i mhz
cpu MHz : 900.076
---

And these are the iperf results:

---
*With* patches 3-12, 8xCPU @ 3.4GHz:
iperf = 0.0-60.0 sec 6.62 GBytes 948 Mbits/sec 0.045 ms 37/4838564 (0.00076%)

*With* patches 3-12, 1xCPU @ 800MHz:
iperf = 0.0-60.0 sec 6.62 GBytes 947 Mbits/sec 0.000 ms 18/4833009 (0%)

*Without* patches 3-12, 8xCPU @ 3.4GHz:
iperf = 0.0-60.0 sec 6.60 GBytes 945 Mbits/sec 0.049 ms 31/4819455 (0.00064%)

*Without* patches 3-12, 1xCPU @ 800MHz:
iperf = 0.0-60.0 sec 6.62 GBytes 948 Mbits/sec 0.000 ms 0/4837257 (0%)
---

Given that the difference between the best and worst results is < 1%, I think
we can conclude that patches 3-12 don't affect the overall performance. I
didn't profile the cache hit/miss ratio though ...

Any comments? Unfortunately, I don't have access to an ARM board to test
this yet ...

Thanks and Best Regards,
Jose Miguel Abreu
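
P.S. For anyone who wants to double-check the "< 1%" figure, the spread between
the four throughput numbers above can be computed with something like the
sketch below. spread_pct is a helper name made up for illustration; it takes
the gap between the highest and lowest value as a percentage of the highest,
which is just one reasonable way to define the spread.

```shell
# spread_pct is a hypothetical helper: it prints the gap between the highest
# and lowest of its arguments as a percentage of the highest.
spread_pct() {
    printf '%s\n' "$@" | awk '
        {
            v = $1 + 0                      # force numeric comparison
            if (NR == 1 || v > max) max = v
            if (NR == 1 || v < min) min = v
        }
        END { printf "%.2f\n", (max - min) / max * 100 }'
}

# Throughput in Mbits/sec from the four iperf runs above.
spread_pct 948 947 945 948
```

With those four values the gap is 3 Mbits/sec out of 948, so this prints 0.32.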