[Bug target/38496] Gcc misaligns arrays when stack is forced follow the x8632 ABI

2009-07-24 Thread whaley at cs dot utsa dot edu
--- Comment #25 from whaley at cs dot utsa dot edu 2009-07-24 17:05 --- Richard, >GCC does not assume the stack is aligned to 16 bytes if it cannot prove that >it is. If this is true now, it is a change from previous behavior. When I reported this problem, gcc *assumed* 1

[Bug target/38496] Gcc misaligns arrays when stack is forced follow the x8632 ABI

2008-12-15 Thread whaley at cs dot utsa dot edu
--- Comment #19 from whaley at cs dot utsa dot edu 2008-12-15 23:39 --- >There is the problem, LSB did the incorrect thing of thinking the written >standard applied to what really was being done when the LSB was doing its >work. Standards are made to be amended. Witness

[Bug target/38496] Gcc misaligns arrays when stack is forced follow the x8632 ABI

2008-12-15 Thread whaley at cs dot utsa dot edu
--- Comment #17 from whaley at cs dot utsa dot edu 2008-12-15 22:01 --- >LSB was written years after we had already did this back in gcc 3.0. >Please check the history before saying gcc followed a written standard >when none existed when this change was done. LSB was m

[Bug target/38496] Gcc misaligns arrays when stack is forced follow the x8632 ABI

2008-12-15 Thread whaley at cs dot utsa dot edu
--- Comment #15 from whaley at cs dot utsa dot edu 2008-12-15 21:32 --- >GCC chose to change the *unwritten* standard for the ABI in use for IA32 >GNU/Linux. This is not true. Prior to this change, gcc followed the *written* standard provided by the LSB. You chose to viola

[Bug target/38496] Gcc misaligns arrays when stack is forced follow the x8632 ABI

2008-12-15 Thread whaley at cs dot utsa dot edu
--- Comment #13 from whaley at cs dot utsa dot edu 2008-12-15 14:52 --- >No; "The nice thing about standards is that there are so many to choose from" >is a well-known saying. And also one without application here. I am aware of no other standard for Linux ABI other

[Bug target/38496] Gcc misaligns arrays when stack is forced follow the x8632 ABI

2008-12-11 Thread whaley at cs dot utsa dot edu
--- Comment #11 from whaley at cs dot utsa dot edu 2008-12-12 01:48 --- >LSB may be a starting point for plausible hypotheses about the ABIs, but >you need to evaluate it critically to see whether each statement is >actually an accurate description of fact. I.e., you are s

[Bug target/38496] Gcc misaligns arrays when stack is forced follow the x8632 ABI

2008-12-11 Thread whaley at cs dot utsa dot edu
--- Comment #8 from whaley at cs dot utsa dot edu 2008-12-12 00:51 --- >I suppose that by "32-bit ABI for the x86" you mean a document with >1990-1996 SCO copyrights. I was going by the linux standards base, which still links to: http://www.caldera.com/developers/d

[Bug target/38496] Gcc misaligns arrays when stack is forced follow the x8632 ABI

2008-12-11 Thread whaley at cs dot utsa dot edu
--- Comment #6 from whaley at cs dot utsa dot edu 2008-12-11 23:42 --- >GCC can and will realign the loop in 4.4 and above if the function needs a >bigger alignment than the required 4 byte. So again I don't see any issues >here really. Is this the response to another

[Bug target/38496] Gcc misaligns arrays when stack is forced follow the x8632 ABI

2008-12-11 Thread whaley at cs dot utsa dot edu
--- Comment #4 from whaley at cs dot utsa dot edu 2008-12-11 23:25 --- >aligning the stack to 16 bytes is complaint It might be complaint, but it certainly isn't compliant. The ABI says that you can assume 4-byte alignment, and not all 4-byte alignments are 16-byte aligned (o
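
The point in the comment above, that a 4-byte-aligned stack address is not necessarily 16-byte aligned, can be checked with plain integer arithmetic. The following is a minimal sketch only: the helper names `is_aligned` and `align_down` and the sample addresses are hypothetical, and it does not reproduce GCC's actual stack-realignment code.

```python
def is_aligned(addr: int, boundary: int) -> bool:
    """Return True if addr is a multiple of boundary (boundary must be a power of two)."""
    return addr & (boundary - 1) == 0


def align_down(addr: int, boundary: int) -> int:
    """Round addr down to the nearest multiple of boundary (the x86 stack grows downward)."""
    return addr & ~(boundary - 1)


# Four consecutive addresses that all satisfy a 4-byte alignment guarantee.
addrs = [0x1000, 0x1004, 0x1008, 0x100C]

# Only one of the four also satisfies 16-byte alignment.
aligned16 = [a for a in addrs if is_aligned(a, 16)]

# A callee that needs 16-byte alignment would have to realign explicitly.
realigned = [align_down(a, 16) for a in addrs]
```

Under these assumptions, a callee that is only promised 4-byte alignment has a one-in-four chance of receiving a 16-byte-aligned stack unless it realigns the stack itself.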

[Bug fortran/38496] Gcc misaligns arrays when stack is forced follow the x8632 ABI

2008-12-11 Thread whaley at cs dot utsa dot edu
--- Comment #1 from whaley at cs dot utsa dot edu 2008-12-11 23:01 --- Created an attachment (id=16893) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16893&action=view) tarfile demonstrating problem tarfile containing Makefile, align.f, and printvp.c that can show this

[Bug fortran/38496] New: Gcc misaligns arrays when stack is forced follow the x8632 ABI

2008-12-11 Thread whaley at cs dot utsa dot edu
x8632 ABI Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: whaley at cs dot utsa dot edu http

[Bug target/32523] disastrous scheduling for POWER5

2007-06-28 Thread whaley at cs dot utsa dot edu
--- Comment #8 from whaley at cs dot utsa dot edu 2007-06-28 14:18 --- I've been doing further testing on the g5 (the only machine where I have local and root access), and this problem does not occur with stock gcc 4.1.1 either. Therefore, whatever problem is avoided by throwing

[Bug target/32523] disastrous scheduling for POWER5

2007-06-27 Thread whaley at cs dot utsa dot edu
--- Comment #7 from whaley at cs dot utsa dot edu 2007-06-28 05:25 --- This problem affects the g5/970 as well: Darwin. uname -a Darwin etl-g52.cs.utsa.edu 8.10.0 Darwin Kernel Version 8.10.0: Wed May 23 16:50:59 PDT 2007; root:xnu-792.21.3~1/RELEASE_PPC Power Macintosh powerpc Darwin

[Bug target/32524] unable to build 4.2 on OS X G5

2007-06-27 Thread whaley at cs dot utsa dot edu
--- Comment #2 from whaley at cs dot utsa dot edu 2007-06-28 05:23 --- Fixed, thanks. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32524

[Bug target/32523] disastrous scheduling for POWER5

2007-06-27 Thread whaley at cs dot utsa dot edu
--- Comment #6 from whaley at cs dot utsa dot edu 2007-06-27 19:09 --- Andrew, OK, I installed stock gnu gcc 3.4.6: 78n04 TEST/MMBENCH_PPC> ~/local/gcc-3.4.6/bin/gcc -v Reading specs from /u/noibm122/local/gcc-3.4.6/lib/gcc/powerpc64-unknown-linux-gnu/3.4.6/specs Configured w

[Bug c/32523] disastrous scheduling for POWER5

2007-06-27 Thread whaley at cs dot utsa dot edu
--- Comment #4 from whaley at cs dot utsa dot edu 2007-06-27 17:00 --- Andrew, >PowerPC970FX is not a direct descendent of Power5 Sorry, completely misremembered this. Since Power4 didn't suffer as bad as Power5 (I think it lost maybe 10% rather than 50), maybe the 970 will

[Bug c/32524] New: unable to build 4.2 on OS X G5

2007-06-27 Thread whaley at cs dot utsa dot edu
uild 4.2 on OS X G5 Product: gcc Version: 4.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: whaley at cs dot utsa dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32524

[Bug c/32523] disastrous scheduling for POWER5

2007-06-27 Thread whaley at cs dot utsa dot edu
--- Comment #1 from whaley at cs dot utsa dot edu 2007-06-27 16:21 --- Created an attachment (id=13794) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13794&action=view) Makefile and source demonstrating problem Creates directory MMBENCH_PPC. Edit the Makefile and set G

[Bug c/32523] New: disastrous scheduling for POWER5

2007-06-27 Thread whaley at cs dot utsa dot edu
you need. Cheers, Clint -- Summary: disastrous scheduling for POWER5 Product: gcc Version: 4.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org

[Bug target/30599] long double declaration rounds to double instead

2007-06-20 Thread whaley at cs dot utsa dot edu
--- Comment #6 from whaley at cs dot utsa dot edu 2007-06-20 23:17 --- Anybody have enough __asm__ foo to write a inline assembly macro taking a long double operand and returning one, which I can use to call fsqrt directly in inline assembly? I'm scoping the docs, but have never

[Bug target/30599] long double declaration rounds to double instead

2007-06-20 Thread whaley at cs dot utsa dot edu
--- Comment #5 from whaley at cs dot utsa dot edu 2007-06-20 22:17 --- It may be C99, but since it doesn't work on 90% of the machines in the world, it is a bit of a stretch to call it portable. My point is no standard mandates you round down a long double (where you don't ro

[Bug target/30599] long double declaration rounds to double instead

2007-06-20 Thread whaley at cs dot utsa dot edu
--- Comment #3 from whaley at cs dot utsa dot edu 2007-06-20 21:52 --- Turns out the proposed solution of using sqrtl is not portable. In particular, all code using it fails to link on Windows using cygwin. Any idea how to make this work portably? I still don't understand why

[Bug rtl-optimization/323] optimized code gives strange floating point results

2007-03-09 Thread whaley at cs dot utsa dot edu
--- Comment #92 from whaley at cs dot utsa dot edu 2007-03-09 20:22 --- I'd like to welcome the newest members of the bug 323 community, where all x87 floating point errors in gcc come to die! All floating point errors that use the x87 are welcome, despite the fact that many of

[Bug target/30599] long double declaration rounds to double instead

2007-01-26 Thread whaley at cs dot utsa dot edu
--- Comment #1 from whaley at cs dot utsa dot edu 2007-01-26 16:21 --- Created an attachment (id=12963) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12963&action=view) Can be compiled to .s as described in report to duplicate error -- http://gcc.gnu.org/bugzilla/show_

[Bug target/30599] New: long double declaration rounds to double instead

2007-01-26 Thread whaley at cs dot utsa dot edu
: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: whaley at cs dot utsa dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30599

[Bug target/30255] register spills in x87 unit need to be 80-bit, not 64

2006-12-19 Thread whaley at cs dot utsa dot edu
--- Comment #10 from whaley at cs dot utsa dot edu 2006-12-19 17:18 --- Guys, In the interests of full disclosure, I did some quick timings on the Core2Duo, and as I kind of suspected, scalar SSE crushed x87 there. I was pretty sure scalar SSE could achieve 2 flop/cycle, while Intel

[Bug target/30255] register spills in x87 unit need to be 80-bit, not 64

2006-12-19 Thread whaley at cs dot utsa dot edu
--- Comment #9 from whaley at cs dot utsa dot edu 2006-12-19 16:04 --- Ian, Thanks for the info. I see I failed to consider the cross-register moves you mentioned. However, can't those be moved through memory, where something destined for a 64-bit register is first written fro

[Bug target/30255] register spills in x87 unit need to be 80-bit, not 64

2006-12-18 Thread whaley at cs dot utsa dot edu
--- Comment #7 from whaley at cs dot utsa dot edu 2006-12-19 00:31 --- >Depends on what you mean by fixable by the programmer because most people don't know anything about precusion issues. Most people don't know programming at all, so I guess you are suggesting that er

[Bug target/30255] register spills in x87 unit need to be 80-bit, not 64

2006-12-18 Thread whaley at cs dot utsa dot edu
--- Comment #5 from whaley at cs dot utsa dot edu 2006-12-18 22:14 --- I cannot, of course, force you to admit it, but 323 is a bug fixable by the programmer, and this one is not. The other requires a lot of work in the compiler, and this does not. So, viewing them as the same can be

[Bug target/30255] register spills in x87 unit need to be 80-bit, not 64

2006-12-18 Thread whaley at cs dot utsa dot edu
--- Comment #3 from whaley at cs dot utsa dot edu 2006-12-18 21:16 --- BTW, in case it isn't obvious, here's the fix that I typically use for problems like bug 323 that I cannot when it is gcc itself that is unpredictably spilling the computation: void test(double x

[Bug target/30255] register spills in x87 unit need to be 80-bit, not 64

2006-12-18 Thread whaley at cs dot utsa dot edu
--- Comment #2 from whaley at cs dot utsa dot edu 2006-12-18 20:43 --- Hi, While it may be decided not to fix this problem, this is not a duplicate of bug 323, and so it should be closed for another reason if you want to ignore it. 323 has a problem because of the function call, where

[Bug target/30255] New: register spills in x87 unit need to be 80-bit, not 64

2006-12-18 Thread whaley at cs dot utsa dot edu
Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: whaley at cs dot utsa dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30255

[Bug middle-end/28684] Imprecise -funsafe-math-optimizations definition

2006-08-17 Thread whaley at cs dot utsa dot edu
--- Comment #2 from whaley at cs dot utsa dot edu 2006-08-17 14:17 --- Richard, Thanks for confirmation. There's no chance of this happening soon, I guess? I'm working on a release of ATLAS (fast linear algebra), and I can't enable gcc vectorization until its nece

[Bug target/27827] [4.0 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-11 Thread whaley at cs dot utsa dot edu
--- Comment #67 from whaley at cs dot utsa dot edu 2006-08-11 15:22 --- Uros, >Slightly offtopic, but to put some numbers to comment #8 and comment #11, >equivalent SSE code now reaches only 50% of x87 single performance and 60% of >x87 double performance on AMD x86_64 FYI,

[Bug c/28684] New: Imprecise -funsafe-math-optimizations definition

2006-08-10 Thread whaley at cs dot utsa dot edu
FIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: whaley at cs dot utsa dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28684

[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-10 Thread whaley at cs dot utsa dot edu
--- Comment #62 from whaley at cs dot utsa dot edu 2006-08-10 15:15 --- Paolo, >The IEEE standard mandates particular rules for performing operations on >infinities, NaNs, signed zeros, denormals, ... The C standard, by >mandating no reassociation, ensures that you don'

[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-10 Thread whaley at cs dot utsa dot edu
--- Comment #60 from whaley at cs dot utsa dot edu 2006-08-10 14:08 --- Paolo, Thanks for the explanation of what -funsafe is presently doing. >You are also confusing -funsafe-math-optimizations with -ffast-math. No, what I'm doing is reading the man page (the closest th

[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-09 Thread whaley at cs dot utsa dot edu
--- Comment #58 from whaley at cs dot utsa dot edu 2006-08-09 23:01 --- Andrew, >Except for the fact IEEE compliant fp does not allow for reordering at all >except >in some small cases. For an example is (a + b) + (-a) is not the same as (a + >(-a)) + b, >so reorderi
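
The quoted claim, that (a + b) + (-a) need not equal (a + (-a)) + b in IEEE arithmetic, is easy to reproduce. This is a minimal sketch with values chosen for illustration (they do not come from the thread); Python floats are IEEE 754 doubles on common platforms.

```python
# a is large enough that adding 1.0 to it is lost to rounding:
# the spacing between adjacent doubles near 2**60 is 256.
a = 2.0 ** 60
b = 1.0

left = (a + b) + (-a)   # b is absorbed by a before a is cancelled
right = (a + (-a)) + b  # a is cancelled first, so b survives

reordering_changes_result = left != right
```

Here `left` evaluates to 0.0 and `right` to 1.0, so a compiler that reorders the additions changes the result.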

[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-09 Thread whaley at cs dot utsa dot edu
--- Comment #56 from whaley at cs dot utsa dot edu 2006-08-09 21:33 --- Dorit, >This flag is needed in order to allow vectorization of reduction (summation >in your case) of floating-point data. OK, but this is a bad flag to require. From the computational scientist'
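
Why vectorizing a floating-point summation needs permission to reassociate can be sketched as follows. The function names `sequential_sum` and `lane_sum`, the lane width, and the input values are all hypothetical, and this models only the change in summation order, not GCC's vectorizer.

```python
def sequential_sum(xs):
    """Sum in source order, as a scalar loop would."""
    total = 0.0
    for x in xs:
        total += x
    return total


def lane_sum(xs, width=2):
    """Sum with `width` independent partial sums, as a SIMD reduction would,
    then combine the lanes at the end."""
    lanes = [0.0] * width
    for i, x in enumerate(xs):
        lanes[i % width] += x
    total = 0.0
    for lane in lanes:
        total += lane
    return total


# The spacing between adjacent doubles near 2**60 is 256, so adding 1.0 is lost.
big = 2.0 ** 60
data = [1.0, big, 1.0, -big]

scalar_result = sequential_sum(data)  # both 1.0 terms are absorbed by big
vector_result = lane_sum(data)        # lane 0 holds 1.0 + 1.0, lane 1 cancels
```

With this input the scalar order yields 0.0 and the two-lane order yields 2.0. Neither is what the source order specifies unless the compiler is told that reordering is acceptable, which is the role the flag plays in the discussion.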

[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-09 Thread whaley at cs dot utsa dot edu
--- Comment #54 from whaley at cs dot utsa dot edu 2006-08-09 16:08 --- Dorit, OK, I've posted a new tarfile with a safe kernel code where the loop is not unrolled, so that the vectorizer has a chance. With this kernel, I can make it vectorize code, but only if I throw the -fu

[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-09 Thread whaley at cs dot utsa dot edu
--- Comment #53 from whaley at cs dot utsa dot edu 2006-08-09 15:52 --- Created an attachment (id=12047) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12047&action=view) benchmark wt vectorizable kernel -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827

[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-09 Thread whaley at cs dot utsa dot edu
--- Comment #52 from whaley at cs dot utsa dot edu 2006-08-09 14:33 --- Paolo, >In some sense, this is the peephole I would rather *not* do. But the answer >is yes. :-) Ahh, got it :) >So, do you now agree that the bug would be fixed if the patch that is in GCC >4.2 wa

[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-08 Thread whaley at cs dot utsa dot edu
--- Comment #50 from whaley at cs dot utsa dot edu 2006-08-08 18:36 --- Guys, I've been scoping this a little closer on the Athlon64X2. I have found that the patched gcc can achieve as much as 93% of theoretical peak (5218Mflop on a 2800Mhz Athlon64X2!) for in-cache matmul whe

[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-08 Thread whaley at cs dot utsa dot edu
--- Comment #49 from whaley at cs dot utsa dot edu 2006-08-08 16:43 --- Paolo, >Yes, so far so good and this part has already been committed. But does >a *single* load-and-execute instruction execute faster than the two >instructions in a load+execute sequence? As I said, i

[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-07 Thread whaley at cs dot utsa dot edu
--- Comment #45 from whaley at cs dot utsa dot edu 2006-08-08 02:59 --- Guys, OK, with Dorit's -fdump-tree-vect-details, I made a little progress on vectorization. In order to get vectorization to work, I had to add the flag '-funsafe-math-optimizations'. I will

[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-07 Thread whaley at cs dot utsa dot edu
--- Comment #44 from whaley at cs dot utsa dot edu 2006-08-07 21:56 --- Guys, OK, the mystery of why my hand-patched gcc didn't work is now cleared up. My first clue was that neither did the SVN-build gcc! Turns out, your peephole opt is only done if I throw the flag -O3 rather

[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-07 Thread whaley at cs dot utsa dot edu
--- Comment #41 from whaley at cs dot utsa dot edu 2006-08-07 17:19 --- Paolo, >Actually, the peephole phase may not change the register usage, but it >could peruse a scratch register if available. But it would be much more >controversial (even if backed by your hard numbers

[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-07 Thread whaley at cs dot utsa dot edu
--- Comment #39 from whaley at cs dot utsa dot edu 2006-08-07 16:47 --- Paolo, OK, never mind about all the questions on assembly/patches/SVN/gcc3 perf: I checked out the main branch, and vi'd the patched file, and I see that your patch is there. I am presently building the SVN g

[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-07 Thread whaley at cs dot utsa dot edu
--- Comment #38 from whaley at cs dot utsa dot edu 2006-08-07 15:32 --- Paolo, Thanks for all the help. I'm not sure I understand everything perfectly though, so there's some questions below . . . >I don't see how the last fmul[sl] can be removed without increasing

[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-06 Thread whaley at cs dot utsa dot edu
--- Comment #36 from whaley at cs dot utsa dot edu 2006-08-06 15:03 --- Paolo, Thanks for working on this. We are making progress, but I have some mixed results. I timed the assemblies you provided directly. I added a target "asgexe" that builds the same benchmark, assumin

[Bug target/27827] [4.0/4.1/4.2 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-05 Thread whaley at cs dot utsa dot edu
--- Comment #35 from whaley at cs dot utsa dot edu 2006-08-05 18:26 --- Created an attachment (id=12020) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12020&action=view) new Makefile targets OK, this is same benchmark again, now creating MMBENCHS directory. In addition

[Bug target/27827] [4.0/4.1/4.2 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-05 Thread whaley at cs dot utsa dot edu
--- Comment #33 from whaley at cs dot utsa dot edu 2006-08-05 14:24 --- Paolo, Can you post the assembly and the patch as attachments? If necessary, I can hack the benchmark to call the assembly routines on a couple of platforms. Also, did you see what I did wrong in applying the

[Bug target/27827] [4.0/4.1/4.2 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-04 Thread whaley at cs dot utsa dot edu
--- Comment #31 from whaley at cs dot utsa dot edu 2006-08-04 16:24 --- Paolo, Thanks for the update. I attempted to apply this patch, but apparently I failed, as it made absolutely no difference. I mean, not only did it not change performance, but if you diff the assembly, you get

[Bug c/28519] New: unkillable, nonstandard warning

2006-07-27 Thread whaley at cs dot utsa dot edu
Status: UNCONFIRMED Severity: trivial Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: whaley at cs dot utsa dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28519

[Bug target/27827] [4.0/4.1/4.2 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-07-04 Thread whaley at cs dot utsa dot edu
--- Comment #29 from whaley at cs dot utsa dot edu 2006-07-04 13:15 --- Guys, The integer and fp differences do not appear to be strongly related. In particular, on my P4e, gcc 4's integer code is actually faster than gcc 3's. Further, if you look at the assemblies of t

[Bug target/27827] [4.0/4.1/4.2 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-06-28 Thread whaley at cs dot utsa dot edu
--- Comment #28 from whaley at cs dot utsa dot edu 2006-06-29 04:17 --- Guys, If you are looking for the reason that the new code might be slower, my feeling from the benchmark data is that involves hiding the cost of the loads. Notice that, except for the cases where the double

[Bug target/27827] [4.0/4.1/4.2 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-06-28 Thread whaley at cs dot utsa dot edu
--- Comment #26 from whaley at cs dot utsa dot edu 2006-06-28 19:57 --- Created an attachment (id=11773) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11773&action=view) raw runs table is generated from As promised, here is the raw data I built the table out of, includin

[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-06-27 Thread whaley at cs dot utsa dot edu
--- Comment #24 from whaley at cs dot utsa dot edu 2006-06-27 16:44 --- Guys, OK, here is a table summarizing the performance you can see using the mmbench4s.tar.gz. I believe this covers a strong majority of the x86 architectures in use today (there are some specialty processors such

[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-06-27 Thread whaley at cs dot utsa dot edu
--- Comment #23 from whaley at cs dot utsa dot edu 2006-06-27 14:20 --- Uros, OK, I made the stupid assumption that the P4 would behave like the P4e, should've known better :) I got access to a Pentium 4 (family=15, model=2), and indeed I can repeat the several surprising thing

[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-06-26 Thread whaley at cs dot utsa dot edu
--- Comment #21 from whaley at cs dot utsa dot edu 2006-06-26 15:03 --- Uros, Thanks for the reply; I think some confusion has set in (see below) :) >And the results are a bit suprising (this is the exact output of your test): Note that you are running the opposite of my test c

[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-06-25 Thread whaley at cs dot utsa dot edu
--- Comment #19 from whaley at cs dot utsa dot edu 2006-06-26 00:55 --- Thanks for the info. I'm sorry to hear that no performance regression tests are done, but I guess it kind of explains why these problems reoccur :) As to not unrolling, the fully unrolled case is almost a

[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-06-25 Thread whaley at cs dot utsa dot edu
--- Comment #17 from whaley at cs dot utsa dot edu 2006-06-25 13:17 --- OK, thanks for the reply. I will assume gcc 4 won't be fixed in the near future. My guess is this will make icc an easier compiler for users, which I kind of hate, which is why I worked as much as I did on

[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-06-24 Thread whaley at cs dot utsa dot edu
--- Comment #15 from whaley at cs dot utsa dot edu 2006-06-24 18:10 --- Hi, Can someone tell me if anyone is looking into this problem with the hopes of fixing it? I just noticed that despite the posted code demonstrating the problem, and verification on: Pentium Pro, Pentium III

[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-06-13 Thread whaley at cs dot utsa dot edu
--- Comment #14 from whaley at cs dot utsa dot edu 2006-06-14 02:40 --- OK, I got access to some older machines, and it appears that Core is the only architecture that likes gcc 4's code. More precisely, I have confirmed that the following architectures run significantly slower

[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-06-07 Thread whaley at cs dot utsa dot edu
--- Comment #13 from whaley at cs dot utsa dot edu 2006-06-07 22:28 --- Subject: Re: gcc 4 produces worse x87 code on all platforms than gcc 3 Guys, Just got access to a CoreDuo machine, and tested things there. I had to do some hand-translation of assemblies, as I didn't

[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-06-01 Thread whaley at cs dot utsa dot edu
--- Comment #12 from whaley at cs dot utsa dot edu 2006-06-01 18:43 --- Subject: Re: gcc 4 produces worse x87 code on all platforms than gcc 3 Uros, >gcc version 3.4.6 >vs. >gcc version 4.2.0 20060601 (experimental) > >-fomit-frame-pointer -O -msse2 -mfpmath=sse >

[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-06-01 Thread whaley at cs dot utsa dot edu
--- Comment #11 from whaley at cs dot utsa dot edu 2006-06-01 16:26 --- Subject: Re: gcc 4 produces worse x87 code on all platforms than gcc 3 Uros, OK, I originally replied a couple of hours ago, but that is not appearing on bugzilla for some reason, so I'll try again, this

[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-06-01 Thread whaley at cs dot utsa dot edu
--- Comment #10 from whaley at cs dot utsa dot edu 2006-06-01 16:02 --- Created an attachment (id=11571) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11571&action=view) Same benchmark, but with single precision timing included Here's the same benchmark, but can ti

[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-05-31 Thread whaley at cs dot utsa dot edu
--- Comment #8 from whaley at cs dot utsa dot edu 2006-05-31 14:12 --- Subject: Re: gcc 4 produces worse x87 code on all platforms than gcc 3 Uros, >IMO the fact that gcc 3.x beats 4.x on this code could be attributed to pure >luck. As far as understanding from first prin

[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-05-30 Thread whaley at cs dot utsa dot edu
--- Comment #6 from whaley at cs dot utsa dot edu 2006-05-31 01:09 --- Subject: Re: gcc 4 produces worse x87 code on all platforms than gcc 3 Yes, I agree it is an x86/x86_64 issue. I have not yet scoped the performance of any of the other architectures with gcc 4 vs. 3: since 90% of