http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838
Bill Pringlemeir <bpringlemeir at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bpringlemeir at gmail dot
                   |                            |com

--- Comment #12 from Bill Pringlemeir <bpringlemeir at gmail dot com> 2013-04-20 15:47:36 UTC ---
(In reply to comment #11)
> Note that using -O3 for embedded targets isn't recommended; use -Os instead.

In this case the code is computationally intensive. It doesn't make sense to
compile with '-Os' for cryptographic algorithms. However, I think that a
performance increase can be achieved by working with gcc.

I have worked on an ARM project where two different developers chose
'TomsFastMath' and 'libgcrypt' as a crypto-base. It seems that 'libgcrypt' was
performing better on the ARM. I believe this is because it used 'gcc' inline
assembler to map op-codes not available in 'C'. Gcc's inline assembler is very
nice, as you don't have to do register allocation yourself and you still get
all the other nice things that 'gcc' does for us.

http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=blob;f=mpi/longlong.h;hb=HEAD

The use of the carry bit for multi-precision arithmetic gives a large advantage
for algorithms such as RSA, which was cited as being worse with ARMcc versus
'gcc' on the ARM.

For the original issue for which the bug was filed (x86 sha), I can understand
your frustration. I also tried to expand the SHA to handle 64 bits at a time,
as you have done with MMX ('__builtin_ia32_pslld', etc). It is difficult to get
this to work with 'gcc'; I only had a 30% speedup versus the 32-bit versions.
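
To illustrate the carry-bit point above: plain C has no way to name the
processor's carry flag, so a multi-limb addition has to recover the carry with
comparisons. The following is only a minimal sketch under that assumption;
'mp_add_n' is a hypothetical helper written for this note and is not taken
from libgcrypt or TomsFastMath.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from any library): r = a + b over n 32-bit limbs,
 * least significant limb first. Returns the carry out of the top limb (0 or 1). */
uint32_t mp_add_n(uint32_t *r, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t s  = a[i] + carry;  /* add the incoming carry first */
        uint32_t c1 = (s < carry);   /* wrapped only if a[i] was all ones and carry was 1 */
        uint32_t t  = s + b[i];
        uint32_t c2 = (t < s);       /* unsigned wrap-around means a carry out */

        r[i]  = t;
        carry = c1 | c2;             /* c1 and c2 can never both be set */
    }
    return carry;
}
```

Each limb costs two additions and two comparisons here. With inline assembler
of the kind found in longlong.h, the same step can map onto the target's
add-with-carry instructions (on ARM, ADDS followed by ADCS), which is where
the advantage for multi-precision code such as RSA would come from.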