Re: [PATCH 1/4] crypto/arm64: ghash - reduce performance impact of NEON yield checks
On 2018-07-25 06:57:42 [+], Vakul Garg wrote:
> I tested this patch. It helped but didn't regain the performance to previous
> level.
> Are there more files remaining to be fixed? (In your original patch series
> for adding
> preemptability check, there were lot more files changed than this series with
> 4 files).
>
> Instead of using hardcoded 32 block/16 block limit, should it be controlled
> using Kconfig?
> I believe that on different cores, these values could be required to be
> different.

What about PREEMPT_NONE (server)?

Sebastian
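For readers following the "32 block/16 block limit" discussion, here is a minimal userspace sketch of the general pattern: work is done in runs of a fixed number of blocks, with a yield check between runs. It is not the code from the patch. YIELD_BLOCKS, process_block(), maybe_yield() and process_chunked() are hypothetical names made up for illustration, and the compile-time macro only stands in for the kind of tunable asked about above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tunable: number of blocks processed between yield checks. */
#ifndef YIELD_BLOCKS
#define YIELD_BLOCKS ((size_t)32)
#endif

/* Stand-in for the real per-block work; it only counts blocks. */
static size_t blocks_done;
static void process_block(const unsigned char *block)
{
    (void)block;
    blocks_done++;
}

/* Stand-in for a "does someone else want the CPU?" check; it only counts calls. */
static size_t yield_checks;
static void maybe_yield(void)
{
    yield_checks++;
}

/*
 * Process nblocks blocks of block_size bytes each, checking for a pending
 * yield after every YIELD_BLOCKS blocks while work remains.
 * Returns the number of yield checks performed by this call.
 */
size_t process_chunked(const unsigned char *data, size_t nblocks, size_t block_size)
{
    assert(block_size > 0);
    size_t checks_before = yield_checks;

    while (nblocks > 0) {
        size_t n = nblocks < YIELD_BLOCKS ? nblocks : YIELD_BLOCKS;

        for (size_t i = 0; i < n; i++)
            process_block(data + i * block_size);

        data += n * block_size;
        nblocks -= n;

        /* No check after the final run: nothing is left to interrupt. */
        if (nblocks > 0)
            maybe_yield();
    }
    return yield_checks - checks_before;
}
```

With the default of 32, a 100-block request performs 3 yield checks (after blocks 32, 64 and 96). A larger YIELD_BLOCKS means fewer checks but a longer uninterrupted run, which is the trade-off being discussed.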
Re: [PATCH 1/4] crypto/arm64: ghash - reduce performance impact of NEON yield checks
On 2018-07-25 07:04:55 [+], Vakul Garg wrote:
> >
> > What about PREEMPT_NONE (server)?
>
> Why not have best of both the worlds :)

The NEON code gets interrupted because another task wants to run and
the scheduler allows it. With "low latency desktop" this gets done
right away. The lower preemption levels won't schedule as quickly. So
if you seek performance, the lower levels should give you more. If you
seek low latency…

Sebastian
Re: [PATCH 0/4] crypto/arm64: reduce impact of NEON yield checks
On 2018-07-25 11:54:53 [+0200], Ard Biesheuvel wrote:
> Indeed. OTOH, if the -rt people (Sebastian?) turn up and say that a
> 1000 cycle limit to the quantum of work performed with preemption
> disabled is unreasonably low, we can increase the yield block counts
> and approach the optimal numbers a bit closer. But with diminishing
> returns.

So I tested on a SoftIron Overdrive 1000, which has A57 cores. I added
this series and didn't notice any spikes: cyclictest reported a max
value of ~20us (which means the crypto code was not noticeable). I
played a little with it and with the tcrypt tests for aes/sha1, and
also saw no huge spikes. So at this point this looks fantastic.

I also set up cryptsetup / dm-crypt with the usual xts(aes) mode and
saw no spikes. At this point, on this hardware, if you want to raise
the block count, I wouldn't mind.

I remember on x86 the SIMD accelerated ciphers led to ~1ms+ spikes once
dm-crypt started its jobs.

Sebastian
Re: [PATCH 0/4] crypto/arm64: reduce impact of NEON yield checks
On 2018-07-26 09:25:40 [+0200], Ard Biesheuvel wrote:
> Thanks a lot.
>
> So 20 us ~= 20,000 cycles on my 1 GHz Cortex-A53, and if I am
> understanding you correctly, you wouldn't mind the quantum of work to
> be in the order 16,000 cycles or even substantially more?

I currently have that one box, and it does not seem to be a problem. At
idle it now reports around 20us max. So if we add "only" 20us for NEON /
your preempt-disable section, then we may end up at 20+20 = 40us. At
this point I am not sure how "bad" that is. It works, it does not seem
like much, and you can disable it if you don't want the extra 20us here.

> That is good news, but it is also rather interesting, given that these
> algorithms run at ~4 cycles per byte, meaning that you'd manage an
> entire 4 KB page without ever yielding. (GCM is used on network
> packets, XTS on disk sectors which are all smaller than that)
>
> Do you remember how you found out NEON use is a problem for -rt on
> arm64 in the first place? Which algorithm did you test at the time to
> arrive at this conclusion?

I *think* that the yield got in there by chance. The main problem back
then was that the scatter list walk happened within the neon begin/end
section. That walk may invoke kmap() / kmalloc() / kfree(), which is not
allowed on RT within a preempt-disable section. This was my main
concern.

> Note that AES-GCM using ordinary SIMD instructions runs at 29 cpb, and
> plain AES at ~20 (on A53), so perhaps it would make sense to
> distinguish between algos using crypto instructions and ones using
> plain SIMD.

I was looking at AES-CE and AES-NEON (aes-neon-blk / aes_ce_blk) with
modprobe tcrypt mode=200 sec=1 and mode=403 +404 for the sha1/256 test.

Sebastian
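The cycle arithmetic quoted above can be written out as a small C sketch. The helper names (usec_to_cycles, bytes_to_cycles, budget_to_bytes) are hypothetical and exist only for this illustration. The inputs are the figures from the thread (1 GHz clock, 20 us, ~4 and 29 cycles per byte, 4 KB page), treated as exact for the sake of the calculation.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles elapsed in `usec` microseconds at `mhz` MHz (1 MHz = 1 cycle per us). */
static uint64_t usec_to_cycles(uint64_t usec, uint64_t mhz)
{
    return usec * mhz;
}

/* Cycles needed to process `bytes` bytes at `cpb` cycles per byte. */
static uint64_t bytes_to_cycles(uint64_t bytes, uint64_t cpb)
{
    return bytes * cpb;
}

/* Largest whole number of bytes that fits in `cycles` at `cpb` cycles per byte. */
static uint64_t budget_to_bytes(uint64_t cycles, uint64_t cpb)
{
    assert(cpb > 0);
    return cycles / cpb;
}
```

Under those assumptions, usec_to_cycles(20, 1000) gives 20000 cycles, and bytes_to_cycles(4096, 4) gives 16384 cycles, so a 4 KB page at ~4 cpb fits inside a 20 us window. At 29 cpb the same page costs bytes_to_cycles(4096, 29) = 118784 cycles, and budget_to_bytes(20000, 29) shows that only 689 bytes fit in the window, which is why the crypto-instruction and plain-SIMD cases may deserve different yield block counts.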