On Thu, 1 Jun 2023 at 14:33, Ard Biesheuvel <[email protected]> wrote:
>
> Use the AArch64 PMULL{2}.P64 instructions to implement PCLMULQDQ instead
> of emulating them in C code if the host supports this. This is used in
> the implementation of GCM, which is widely used in IPsec VPN and HTTPS.
>
> Somewhat surprising results: on my ThunderX2, enabling this on top of
> the AES acceleration I sent out earlier, the speedup is substantial.
>
> (1420 is a typical IPsec block size - in HTTPS, GCM operates on much
> larger block sizes but the kernel mode benchmarks are not the best place
> to measure its performance in this mode)
>
> tcrypt: testing speed of rfc4106(gcm(aes)) (rfc4106-gcm-aesni) encryption
>
> No acceleration
> tcrypt: test 5 (160 bit key, 1420 byte blocks): 10046 operations in 1 seconds
> (14265320 bytes)
>
> AES acceleration
> tcrypt: test 5 (160 bit key, 1420 byte blocks): 13970 operations in 1 seconds
> (19837400 bytes)
>
> AES + PMULL acceleration
> tcrypt: test 5 (160 bit key, 1420 byte blocks): 24372 operations in 1 seconds
> (34608240 bytes)
>
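For context on what is being replaced: PCLMULQDQ computes a 64x64 -> 128-bit carry-less product (polynomial multiplication over GF(2), where partial products are combined with XOR instead of addition), with imm8 bits 0 and 4 selecting which 64-bit half of each source is used. A C fallback has to do something like the bit-by-bit shift-and-XOR loop sketched below; the names clmul64, pclmulqdq_model and u128_pair are hypothetical, and this is an illustration of the technique, not QEMU's actual helper. On a host with PMULL, each such product is a single instruction (PMULL uses the low 64-bit lanes, PMULL2 the high ones).

```c
#include <stdint.h>

/* Hypothetical result type: a 128-bit value held as two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_pair;

/*
 * Carry-less multiply of two 64-bit operands, producing the full
 * 128-bit product. For every set bit i of b, a << i is XORed into the
 * result, so no carries ever propagate between bit positions.
 * Illustrative only; not the code from the patch.
 */
u128_pair clmul64(uint64_t a, uint64_t b)
{
    u128_pair r = { 0, 0 };

    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            /* Low 64 bits of (a << i). */
            r.lo ^= a << i;
            /* Bits of a shifted past bit 63; skipped for i == 0 since
             * a shift by 64 would be undefined behaviour in C. */
            if (i != 0) {
                r.hi ^= a >> (64 - i);
            }
        }
    }
    return r;
}

/*
 * Model of PCLMULQDQ's operand selection: imm8 bit 0 picks the quadword
 * of the first source, bit 4 the quadword of the second. Index 0 is the
 * low quadword, index 1 the high one (hypothetical array layout).
 */
u128_pair pclmulqdq_model(const uint64_t src1[2], const uint64_t src2[2],
                          unsigned imm8)
{
    return clmul64(src1[imm8 & 1], src2[(imm8 >> 4) & 1]);
}
```

Note that clmul64(3, 3) yields 5, not 9: (x + 1)^2 = x^2 + 1 over GF(2), because the two middle x terms cancel under XOR.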
User space benchmark (using the OS's qemu-x86_64 vs one built with these
changes applied)

Speedup is about 5x for the largest block sizes (roughly 4.5x at 8192
bytes and 4.8x at 16384 bytes; smaller blocks gain less, about 1.6x at
16 bytes)

ard@gambale:~/build/openssl$ apps/openssl speed -evp aes-128-gcm
Doing AES-128-GCM for 3s on 16 size blocks: 1692138 AES-128-GCM's in 2.98s
Doing AES-128-GCM for 3s on 64 size blocks: 665012 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 256 size blocks: 203784 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 1024 size blocks: 49397 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 8192 size blocks: 6447 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 16384 size blocks: 3058 AES-128-GCM's in 3.00s
version: 3.2.0-dev
built on: Thu Jun 1 17:06:09 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfed8320b0fcbfffd:0x8001020c01d843a9
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-GCM          9085.30k    14186.92k    17389.57k    16860.84k    17604.61k    16700.76k

ard@gambale:~/build/openssl$ ../qemu/build/qemu-x86_64 apps/openssl speed -evp aes-128-gcm
Doing AES-128-GCM for 3s on 16 size blocks: 2703271 AES-128-GCM's in 2.99s
Doing AES-128-GCM for 3s on 64 size blocks: 1537884 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 256 size blocks: 653008 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 1024 size blocks: 203579 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 8192 size blocks: 29020 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 16384 size blocks: 14716 AES-128-GCM's in 2.99s
version: 3.2.0-dev
built on: Thu Jun 1 17:06:09 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfed8320b0fcbfffd:0x8001020c01d843a9
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-GCM         14465.66k    32808.19k    55723.35k    69488.30k    79243.95k    80637.77k
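The table figures can be cross-checked against the "Doing ..." lines: openssl speed reports operations x block size / elapsed seconds / 1000, and the tcrypt byte counts are simply operations x 1420. The small sketch below (helper names kbytes_per_sec and tcrypt_bytes are hypothetical) recomputes a few of the numbers quoted in this thread and the resulting ratios.

```c
#include <stdint.h>

/* openssl speed prints 1000s of bytes per second: ops * block / secs / 1000. */
double kbytes_per_sec(double ops, double block_bytes, double secs)
{
    return ops * block_bytes / secs / 1000.0;
}

/* tcrypt prints the total bytes processed: operations * block size. */
uint64_t tcrypt_bytes(uint64_t ops, uint64_t block_bytes)
{
    return ops * block_bytes;
}

/* Ratio of the patched throughput to the baseline throughput. */
double speedup(double after, double before)
{
    return after / before;
}
```

For example, the 16384-byte lines give kbytes_per_sec(3058, 16384, 3.00), about 16700.76k, for the OS build and kbytes_per_sec(14716, 16384, 2.99), about 80637.77k, for the patched build, a ratio of roughly 4.8x. For tcrypt, 24372 / 10046 is about 2.4x over no acceleration and 24372 / 13970 about 1.7x over AES acceleration alone.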
