On Sun, May 6, 2018 at 6:01 PM, Bruno Haible <br...@clisp.org> wrote:
> Now, here's a draft patch for adding support for AF_ALG also for the
> sha1_buffer etc. functions.
>
> But I have a problem here: On 4 different systems, I don't get a speedup
> from this patch.
>
> To benchmark it, I use this set of commands:
>
> $ ./gnulib-tool --create-testdir --dir=testdir --single-configure --symlink
> crypto/md5 crypto/sha1 crypto/sha256 crypto/sha512
> $ cd testdir
> $ mkdir without; (cd without; ../configure CPPFLAGS=-Wall CFLAGS=-O2
> --without-linux-crypto; make && make check)
> $ mkdir with; (cd with; ../configure CPPFLAGS=-Wall CFLAGS=-O2
> --with-linux-crypto; make && make check)
>
> $ without/gltests/bench-md5 100 1000000
> real 0.391257
> user 0.388
> sys 0.004
> $ with/gltests/bench-md5 100 1000000
> real 9.800789
> user 1.088
> sys 8.648
> $ without/gltests/bench-md5 1000 100000
> real 0.289286
> user 0.288
> sys 0.000
> $ with/gltests/bench-md5 1000 100000
> real 1.220016
> user 0.104
> sys 1.116
> $ without/gltests/bench-md5 10000 10000
> real 0.270131
> user 0.268
> sys 0.000
> $ with/gltests/bench-md5 10000 10000
> real 0.375399
> user 0.020
> sys 0.352
> $ without/gltests/bench-md5 100000 1000
> real 0.280091
> user 0.276
> sys 0.000
> $ with/gltests/bench-md5 100000 1000
> real 0.295650
> user 0.000
> sys 0.292
> $ without/gltests/bench-md5 100000 1000
> real 0.276514
> user 0.276
> sys 0.000
> $ with/gltests/bench-md5 100000 1000
> real 0.292350
> user 0.000
> sys 0.292
> $ without/gltests/bench-md5 1000000 100
> real 0.261845
> user 0.260
> sys 0.004
> $ with/gltests/bench-md5 1000000 100
> real 0.265650
> user 0.000
> sys 0.260
> [and similarly for sha1 etc.]
>
> Tested this on
> - Intel Xeon X5450
> - Intel Xeon E5-2603 v3
> - Intel Core i7-2600
> - Intel Core m3-6Y30
> On all four, no speedup is visible.
>
> On machines without crypto instructions or crypto devices, I would expect
> that
> - sha1_stream gets slightly faster with than without linux-crypto
> (because the copy of data from the file to user-space is optimized away).
> - sha1_buffer is slightly slower with than without linux-crypto
> (because of the overhead of copying the data from user to kernel space).
>
> Whereas on machines with crypto instructions or crypto devices, I would
> expect a significant benefit for both functions.
>
> You showed us significant benefits for sha1_stream, whereas I see no benefit
> for sha1_buffer. How is this possible?
>
> In <https://en.wikipedia.org/wiki/AES_instruction_set> I read that there are
> specialized instructions for AES. Does it mean that there are NO specialized
> instructions for MD5, SHA-1, SHA-224 ... SHA-512? In this case, all the work
> we have done is futile for Intel CPUs and only beneficial for embedded CPUs??
>
> Can you try this comparison on the Intel Xeon you have access to, please?
>
> Bruno

Hi Bruno,

I've checked out the latest gnulib, and after double-checking that commit
761523ddea70f0456b556c09868910686751fff5 was there, I ran this:

matteo@turbo:~/src/gnulib/testdir$ strace -e trace=%network with/gltests/bench-md5 10000000 100
real 1.138617
user 1.139
sys 0.000
+++ exited with 0 +++
matteo@turbo:~/src/gnulib/testdir$ strace -e trace=%network with/gltests/bench-sha1 10000000 100
real 1.259929
user 1.260
sys 0.000
+++ exited with 0 +++

It seems that the kernel API is not used in this test, or am I running them
the wrong way?

--
Matteo Croce
per aspera ad upstream
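
For readers wondering why `trace=%network` is the relevant strace filter: AF_ALG is reached through the socket API, so a run that really used the kernel crypto interface would be expected to show socket, bind and accept calls in the trace. The following is only a minimal sketch of that call sequence, assuming a Linux kernel with AF_ALG and the "md5" hash available. The function name `af_alg_md5` is hypothetical, and this is not the code from the gnulib patch.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_alg.h>

#ifndef AF_ALG
# define AF_ALG 38   /* fallback for old libc headers */
#endif

/* Hypothetical helper (not gnulib's implementation): compute the MD5 of a
   non-empty memory buffer through the kernel's AF_ALG interface.
   Returns 0 on success, -1 if the interface is unavailable or an I/O
   step fails, so that a caller could fall back to a user-space hash.  */
int af_alg_md5(const void *buf, size_t len, unsigned char digest[16])
{
    struct sockaddr_alg sa;
    memset(&sa, 0, sizeof sa);
    sa.salg_family = AF_ALG;
    strcpy((char *) sa.salg_type, "hash");
    strcpy((char *) sa.salg_name, "md5");

    /* These three calls are what a network-class strace would show.  */
    int cfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
    if (cfd < 0)
        return -1;
    if (bind(cfd, (struct sockaddr *) &sa, sizeof sa) < 0) {
        close(cfd);
        return -1;
    }
    int ofd = accept(cfd, NULL, NULL);
    close(cfd);
    if (ofd < 0)
        return -1;

    /* Copy the data from user space to the kernel.  MSG_MORE keeps the
       hash open in case send() accepts only part of the buffer.  */
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(ofd, p, len, MSG_MORE);
        if (n <= 0) {
            close(ofd);
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }

    /* Reading from the operation socket finalizes the hash.  */
    ssize_t got = read(ofd, digest, 16);
    close(ofd);
    return got == 16 ? 0 : -1;
}
```

Calling `af_alg_md5("abc", 3, digest)` on a system where the interface works should fill `digest` with the MD5 of "abc"; a return value of -1 means the kernel path was not taken. Every call pays for the socket setup and the user-to-kernel copy.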