On 8/4/20 5:55 AM, Ard Biesheuvel wrote:
> On Mon, 3 Aug 2020 at 21:11, Ben Greear <[email protected]> wrote:
>> Hello,
>>
>> This helps a bit... now download sw-crypt performance is about 150Mbps, but still not as good as with my patch on the 5.4 kernel, and the FPU is still high in perf top:
>>
>>   13.89%  libc-2.29.so  [.] __memset_sse2_unaligned_erms
>>    6.62%  [kernel]      [k] kernel_fpu_begin
>>    4.14%  [kernel]      [k] _aesni_enc1
>>    2.06%  [kernel]      [k] __crypto_xor
>>    1.95%  [kernel]      [k] copy_user_generic_string
>>    1.93%  libjvm.so     [.] SpinPause
>>    1.01%  [kernel]      [k] aesni_encrypt
>>    0.98%  [kernel]      [k] crypto_ctr_crypt
>>    0.93%  [kernel]      [k] udp_sendmsg
>>    0.78%  [kernel]      [k] crypto_inc
>>    0.74%  [kernel]      [k] __ip_append_data.isra.53
>>    0.65%  [kernel]      [k] aesni_cbc_enc
>>    0.64%  [kernel]      [k] __dev_queue_xmit
>>    0.62%  [kernel]      [k] ipt_do_table
>>    0.62%  [kernel]      [k] igb_xmit_frame_ring
>>    0.59%  [kernel]      [k] ip_route_output_key_hash_rcu
>>    0.57%  [kernel]      [k] memcpy
>>    0.57%  libjvm.so     [.] InstanceKlass::oop_follow_contents
>>    0.56%  [kernel]      [k] irq_fpu_usable
>>    0.56%  [kernel]      [k] mac_do_update
>>
>> If you'd like help setting up a test rig and have an ath10k PCIe NIC or an ath9k PCIe NIC, then I can help. Possibly hwsim would also be a good test case, but I have not tried that.
>
> I don't think this is likely to be reproducible on other micro-architectures, so setting up a test rig is unlikely to help.
>
> I'll send out a v2 which implements an ahash instead of an shash (along with some other tweaks) so that kernel_fpu_begin() is only called twice for each packet on the cbcmac path.
>
> Do you have any numbers for the old kernel without your patch? This pathological FPU preserve/restore behavior could be caused by the optimizations, or by other changes that landed in the meantime, so I would like to know whether kernel_fpu_begin() is as prominent in those traces as well.
This same patch lets i7 mobile processors handle 1Gbps+ software decrypt rates, whereas without the patch the rate was badly constrained and CPU load was much higher, so it is definitely noticeable on other processors too. The weak processor on the current test rig is convenient because the problem is so noticeable even at slower wifi speeds.

We can do some tests on 5.4 with our patch reverted.

Thanks,
Ben

-- 
Ben Greear <[email protected]>
Candela Technologies Inc  http://www.candelatech.com
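To illustrate why the number of kernel_fpu_begin() calls per packet matters, here is a minimal user-space sketch. It assumes only that each FPU section carries a roughly fixed save/restore cost. The names fake_fpu_begin(), fake_fpu_end(), toy_encrypt(), mac_per_block() and mac_batched() are hypothetical stand-ins. toy_encrypt() is a plain XOR and not AES, and the code counts sections rather than modelling the real crypto API or the shash-to-ahash change itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* Hypothetical stand-ins for kernel_fpu_begin()/kernel_fpu_end(): they only
 * count how many FPU sections are opened. In the kernel, opening a section
 * may require saving and later restoring the task's FPU/SIMD state. */
static unsigned int fpu_sections;
static int fpu_active;

static void fake_fpu_begin(void) { assert(!fpu_active); fpu_active = 1; fpu_sections++; }
static void fake_fpu_end(void)   { assert(fpu_active);  fpu_active = 0; }

/* Hypothetical toy "cipher" (XOR with a key, NOT AES); it only stands in
 * for the SIMD work done inside an FPU section. */
static void toy_encrypt(uint8_t st[BLOCK], const uint8_t key[BLOCK])
{
    for (size_t i = 0; i < BLOCK; i++)
        st[i] ^= key[i];
}

/* CBC-MAC style chaining: state = E(state ^ block). */
static void mac_block(uint8_t st[BLOCK], const uint8_t *blk, const uint8_t key[BLOCK])
{
    for (size_t i = 0; i < BLOCK; i++)
        st[i] ^= blk[i];
    toy_encrypt(st, key);
}

/* Pattern 1: one FPU section per block, as when every small update call
 * brackets its own work. Trailing partial blocks are ignored for brevity. */
static void mac_per_block(uint8_t st[BLOCK], const uint8_t *data, size_t len,
                          const uint8_t key[BLOCK])
{
    for (size_t off = 0; off + BLOCK <= len; off += BLOCK) {
        fake_fpu_begin();
        mac_block(st, data + off, key);
        fake_fpu_end();
    }
}

/* Pattern 2: a single FPU section around the whole buffer. */
static void mac_batched(uint8_t st[BLOCK], const uint8_t *data, size_t len,
                        const uint8_t key[BLOCK])
{
    fake_fpu_begin();
    for (size_t off = 0; off + BLOCK <= len; off += BLOCK)
        mac_block(st, data + off, key);
    fake_fpu_end();
}
```

For a 1504-byte buffer (94 blocks of 16 bytes), mac_per_block() opens 94 FPU sections while mac_batched() opens one, and both leave the same MAC state. The per-section cost is paid once instead of once per block.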
