[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2010-12-18 Thread jgarzik at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000
--- Comment #20 from Jeff Garzik 2010-12-18 21:25:46 UTC ---
(In reply to comment #16)
> I don't think it is a good idea to change inliner heuristics in 4.5 at this
> point. If it is always a good idea to inline that function, it should be
> __a

[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2010-12-18 Thread jgarzik at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000
--- Comment #19 from Jeff Garzik 2010-12-18 21:17:09 UTC ---
(In reply to comment #14)
> Created attachment 22813 [details]
> A new patch
>
> Try this.
This patch successfully fixes the performance regression in 4.5.1. Thanks!

[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2010-12-18 Thread jgarzik at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000
--- Comment #18 from Jeff Garzik 2010-12-18 21:16:31 UTC ---
argh, please ignore comment #17. misquote.

[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2010-12-18 Thread jgarzik at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000
--- Comment #17 from Jeff Garzik 2010-12-18 21:15:28 UTC ---
(In reply to comment #8)
> -if (decl && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_MD)
> +/* Do not special case builtins where we see the body.
> + This just confuse inliner.

[Bug target/47000] Major performance regression in parallel SSE2 impl of SHA256 hash algorithm

2010-12-18 Thread jgarzik at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000
--- Comment #12 from Jeff Garzik 2010-12-18 19:09:25 UTC ---
Any other patches for me to try, for gcc 4.5.1?

[Bug target/47000] Major performance regression in parallel SSE2 impl of SHA256 hash algorithm

2010-12-18 Thread jgarzik at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000
--- Comment #11 from Jeff Garzik 2010-12-18 19:08:45 UTC ---
(In reply to comment #4)
> GCC 4.6 (trunk revision 167996) also inlines ROTR. Is it possible for the
> reporter to measure the number of k-iters with a recent snapshot of the trunk?
La

[Bug target/47000] Major performance regression in parallel SSE2 impl of SHA256 hash algorithm

2010-12-18 Thread jgarzik at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000
--- Comment #10 from Jeff Garzik 2010-12-18 18:49:35 UTC ---
(In reply to comment #8)
> Can you try
>
> --
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index af1adf4..dd00de6 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
>

[Bug target/47000] Major performance regression in parallel SSE2 impl of SHA256 hash algorithm

2010-12-18 Thread jgarzik at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000
--- Comment #9 from Jeff Garzik 2010-12-18 18:24:09 UTC ---
(In reply to comment #2)
> What compiler options are you using?
Pretty basic: -O3 -Wall -msse2 -g
Sometimes -O3 -Wall -g -march=native, on a quad core Intel box
Results are the same:

[Bug c/47000] Major performance regression in parallel SSE2 impl of SHA256 hash algorithm

2010-12-17 Thread jgarzik at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000
--- Comment #1 from Jeff Garzik 2010-12-18 07:48:23 UTC ---
Besides the attached sha256_4way.i, the full source code is at
http://yyz.us/bitcoin/cpuminer-0.2.2.tar.gz
It's really quite small and easy to build and use. A sample RPC destination,

[Bug c/47000] New: Major performance regression in parallel SSE2 impl of SHA256 hash algorithm

2010-12-17 Thread jgarzik at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000
Summary: Major performance regression in parallel SSE2 impl of SHA256 hash algorithm
Product: gcc
Version: 4.5.1
Status: UNCONFIRMED
Severity: major
Priority: P3