http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

           Summary: Major performance regression in parallel SSE2 impl of
                    SHA256 hash algorithm
           Product: gcc
           Version: 4.5.1
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: c
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: jgar...@pobox.com


Created attachment 22805
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22805
4-way SHA256 implementation whose performance decreases markedly from gcc
4.4.x to 4.5.x

OS: Fedora 14

My "cpuminer" open source project is -very- sensitive to performance of
generated code, and experiences a severe performance regression going from gcc
4.4.x to 4.5.x.

Our program core is essentially
     for (n = 0; n < 0xffffff; n++)
          sha256( sha256( data ) )      /* one iteration of inner loop */

Building with gcc 4.4.5 -or- Fedora 13 gcc (4.4.x derivative), we achieve
     1850.85 kilo-iterations per second

Building with gcc 4.5.1 -or- Fedora 14 gcc (4.5.x derivative), we achieve
     1389.82 kilo-iterations per second

This is a performance decrease of roughly 25%, and the only variable is the
compiler.  The figures above are for x86_64, but similar slowdowns are seen
on i686-mingw builds: Fedora 13 (gcc 4.4.x) is fast, Fedora 14 (gcc 4.5.x) is
slow.

This interesting variant of the standard SHA256 algorithm is implemented using
Intel/AMD SSE2-specific operations, effectively running four (4) SHA256
iterations in parallel, generating four (4) SHA256 hashes on four distinct
datasets.
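
To illustrate what "4-way" means here, the following is a minimal sketch, not
code taken from the attachment. It assumes each 128-bit SSE2 register holds
one 32-bit SHA256 working word from each of four independent hash
computations, and applies the standard SHA256 "big sigma 0" function to all
four lanes at once. The helper names (rotr4, big_sigma0_4way, and so on) are
hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Rotate each of the four 32-bit lanes right by n bits (0 < n < 32).
 * SSE2 has no vector rotate, so it is built from two shifts and an OR. */
static inline __m128i rotr4(__m128i x, int n)
{
    __m128i right = _mm_srl_epi32(x, _mm_cvtsi32_si128(n));
    __m128i left  = _mm_sll_epi32(x, _mm_cvtsi32_si128(32 - n));
    return _mm_or_si128(right, left);
}

/* SHA256 big sigma 0, ROTR2(x) ^ ROTR13(x) ^ ROTR22(x), on four lanes. */
static inline __m128i big_sigma0_4way(__m128i x)
{
    return _mm_xor_si128(rotr4(x, 2),
                         _mm_xor_si128(rotr4(x, 13), rotr4(x, 22)));
}

/* Scalar reference versions, one 32-bit word at a time. */
static inline uint32_t rotr1(uint32_t x, int n)
{
    return (x >> n) | (x << (32 - n));
}

static inline uint32_t big_sigma0_scalar(uint32_t x)
{
    return rotr1(x, 2) ^ rotr1(x, 13) ^ rotr1(x, 22);
}

/* Apply the 4-way function to four words, one from each hypothetical
 * independent hash state. */
static inline void big_sigma0_x4(const uint32_t in[4], uint32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, big_sigma0_4way(v));
}

/* Self-check: every vector lane must match the scalar result. */
static inline void check_big_sigma0(const uint32_t in[4])
{
    uint32_t out[4];
    int i;

    big_sigma0_x4(in, out);
    for (i = 0; i < 4; i++)
        assert(out[i] == big_sigma0_scalar(in[i]));
}
```

Since the whole inner loop is built from short sequences of shifts, ORs,
XORs and adds like these, throughput is likely to depend heavily on how the
compiler schedules instructions and allocates the sixteen XMM registers.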

See attachment sha256_4way.i.

--------------------------------------------------------------------------
fast, working gcc -v:
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/gcc-4.4.5/configure --prefix=/garz/gcc44 --enable-languages=c
Thread model: posix
gcc version 4.4.5 (GCC) 

--------------------------------------------------------------------------
slow, broken gcc -v:
Using built-in specs.
COLLECT_GCC=/garz/gcc45/bin/gcc
COLLECT_LTO_WRAPPER=/garz/gcc45/libexec/gcc/x86_64-unknown-linux-gnu/4.5.1/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../src/gcc-4.5.1/configure --prefix=/garz/gcc45 --enable-languages=c
Thread model: posix
gcc version 4.5.1 (GCC)
