--- Comment #8 from Joey dot ye at intel dot com 2008-11-21 12:00 ---
In short, set A={-favx, -ffma}, set B={-f3dnow, -f3dnowa, -fsse4a, -fsse5}. Any
option combination from both sets should be prohibited.
Please add more options to these sets in case I missed any.
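A minimal sketch of the rule above, not taken from any GCC patch: each option gets a hypothetical bit flag, and a selection is rejected when it draws from both sets. The flag names and the function isa_opts_conflict are illustrative assumptions; only the set membership comes from the comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit flags, one per option named in the comment. */
#define OPT_AVX    (1 << 0)
#define OPT_FMA    (1 << 1)
#define OPT_3DNOW  (1 << 2)
#define OPT_3DNOWA (1 << 3)
#define OPT_SSE4A  (1 << 4)
#define OPT_SSE5   (1 << 5)

/* Set A and set B exactly as listed in the comment. */
#define SET_A (OPT_AVX | OPT_FMA)
#define SET_B (OPT_3DNOW | OPT_3DNOWA | OPT_SSE4A | OPT_SSE5)

/* Returns true when the selection mixes at least one option from
   set A with at least one option from set B. */
bool isa_opts_conflict(unsigned opts)
{
  /* Only the flags defined above are meaningful in this sketch. */
  assert((opts & ~(unsigned)(SET_A | SET_B)) == 0);
  return (opts & SET_A) != 0 && (opts & SET_B) != 0;
}
```

Under these assumptions, isa_opts_conflict(OPT_AVX | OPT_SSE5) reports a conflict, while a selection drawn from only one set does not. Extending either set is a one-line change to SET_A or SET_B.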
--
http
--- Comment #4 from Joey dot ye at intel dot com 2008-11-28 03:39 ---
142250 doesn't fix this regression. 416.gamess and 481.wrf still fail.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38280
--- Comment #6 from Joey dot ye at intel dot com 2008-11-28 15:11 ---
Patch at http://gcc.gnu.org/ml/gcc-patches/2008-11/msg01428.html fixed this
regression.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38280
--- Comment #8 from Joey dot ye at intel dot com 2008-12-01 02:18 ---
Yes. It fixes 416/481 on 32 bits and 481 on 64 bits.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38280
--- Comment #12 from Joey dot ye at intel dot com 2008-12-10 03:01 ---
Fixed at trunk 142631
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948
--- Comment #45 from Joey dot ye at intel dot com 2008-12-30 01:49 ---
(In reply to comment #44)
> Does anyone have new numbers?
Fixed on both i386/x86_64:
x86_64:
4.4 (trunk 142847): 5.4s
4.3.2 release: 5.4s
4.2.4 release: 5.4s
i386:
4.4 (trunk 142847): 2.7s
4.3.2 rele
--- Comment #6 from Joey dot ye at intel dot com 2008-12-30 02:50 ---
(In reply to comment #4)
> Revision 141860 caused 30% slowdown on 454.calculix in SPEC CPU 2006
> with -O2 -ffast-math on Linux/Intel64.
This regression has been fixed in some revision between 142187 and
--- Comment #5 from Joey dot ye at intel dot com 2009-01-07 02:45 ---
More places with BIGGEST_ALIGN:
$ grep -r "(aligned)" .|grep attribute|grep -v testsuite|grep -v texi
./libstdc++-v3/libsupc++/eh_alloc.cc:typedef char
one_buffer[EMERGENCY_OBJ_SIZE] __attribute_
--- Comment #7 from Joey dot ye at intel dot com 2009-01-14 10:08 ---
(In reply to comment #5)
> Joern, re. comment #4, Richi refers to my patch to enable PRE at -Os, see
> [1].
> An extension to this patch that we tested on x86 machines, is to disable PRE
> for sc
--- Comment #2 from Joey dot ye at intel dot com 2009-01-21 02:40 ---
The following case isn't vectorized with -O3 on x86_64 either, although the arrays are
aligned:
#include
float __attribute__((aligned(16))) in1[] = {
1.2, 3.5, 1.7, 2.8
};
float __attribute__((align
--- Comment #20 from Joey dot ye at intel dot com 2009-01-26 11:49 ---
(In reply to comment #10)
> This is caused by stack alignment change, revision 138335. Joey and
> Xuepeng will look into it after holiday, Feb. 1.
This must be stack alignment change. Looks we didn't h
Summary: Missing mf-runtime.h after make -j2 install
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: libmudflap
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey d
--- Comment #2 from Joey dot ye at intel dot com 2007-08-20 08:53 ---
(In reply to comment #1)
> Nobody does "make install" with -j.
I guess so, that's why I set it "minor". But does that mean error is expected
with -j? My script had -j by accident and it cos
--- Comment #15 from Joey dot ye at intel dot com 2008-08-05 01:01 ---
(In reply to comment #12)
> I think the problem is in
> /* Set offset to aligned because the realigned frame tarts from here. */
> if (stack_realign_fp)
> offset = (offset + stack_alignme
--- Comment #9 from Joey dot ye at intel dot com 2008-08-06 08:05 ---
Fixed
--
Joey dot ye at intel dot com changed:
What|Removed |Added
Status|NEW
--- Comment #3 from Joey dot ye at intel dot com 2008-08-07 07:55 ---
Although 138318 fixes the compiler ICE, it miscompiles with -O3 -ffast-math on
x86-64:
Running 172.mgrid ref base o3 default
*** Miscompare of mgrid.out, see
/home/jye2/cpu2000/benchspec/CFP2000/172.mgrid/run
--- Comment #6 from Joey dot ye at intel dot com 2008-08-11 05:52 ---
(In reply to comment #4)
> If you remove -ffast-math, does it miscompare?
Passes without -ffast-math.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36983
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37124
--- Comment #1 from Joey dot ye at intel dot com 2008-08-19 08:19 ---
Check out such code in i386.c:
/* Figure out whether to use ordered or unordered fp comparisons.
Return the appropriate mode to use. */
enum machine_mode
ix86_fp_compare_mode (enum rtx_code code ATTRIBUTE_UNUSED
--- Comment #7 from Joey dot ye at intel dot com 2008-08-27 08:07 ---
Created an attachment (id=16155)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16155&action=view)
Test case from 2006.434.zeusmp
Though I failed to extract a smaller case, hopefully it is helpful.
Compile with g
--- Comment #8 from Joey dot ye at intel dot com 2008-08-27 08:11 ---
GDB output:
(gdb) b tranx1_
Breakpoint 1 at 0x43a670
(gdb) r
Breakpoint 1, 0x0043a670 in tranx1_ ()
(gdb) b *0x43accd
Breakpoint 2 at 0x43accd
(gdb) b *0x43acf4
Breakpoint 3 at 0x43acf4
(gdb) b *0x43ad2f
--- Comment #11 from Joey dot ye at intel dot com 2008-08-28 06:14 ---
(In reply to comment #4)
> We got
> Running 416.gamess ref base lnx32-gcc default
> 416.gamess: copy #0 non-zero return code (rc=0, signal=11)
> 416.gamess: copy #0 non-zero return code (rc=0, sign
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37571
--- Comment #1 from Joey dot ye at intel dot com 2008-09-18 16:01 ---
Root cause is that the instruction length of fused jcc is set to 16, which prevents
the block from being merged and copied. For some reason Core2 runs poorly with an
unmerged branch block under certain circumstances.
Following
--- Comment #17 from Joey dot ye at intel dot com 2008-10-23 08:42 ---
CPU2006/454.calculix has about 10% regression with IRA + core2 + fpmath=sse on
Core2 ix86:
              IRA    IRA_core2  NO_IRA_core2
454.calculix  1.00   0.90       1.01
Revision: trunk 140514
Options in
--- Comment #18 from Joey dot ye at intel dot com 2008-10-24 08:36 ---
Created an attachment (id=16536)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16536&action=view)
Reduced performance case from cpu2006/454.calculix
50% regression with IRA core2 on trunk revision 140
--- Comment #21 from Joey dot ye at intel dot com 2008-10-25 04:14 ---
To me the scheduler is irrelevant here. GCC has no core2 pipeline description, so
the instruction scheduling doesn't look optimized. But for an OOO processor like
core2, IMHO scheduling shouldn't make that much
--- Comment #23 from Joey dot ye at intel dot com 2008-10-28 01:19 ---
(In reply to comment #22)
> Created an attachment (id=16571)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16571&action=view) [edit]
> A patch to re-enable regmove
> After applying this pa
--- Comment #1 from Joey dot ye at intel dot com 2008-01-22 06:38 ---
This patch should fix it:
Index: gcc/tree-nested.c
===
--- gcc/tree-nested.c (revision 131342)
+++ gcc/tree-nested.c (working copy)
@@ -183,6 +183,10
referenced by nested function
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com
http://gcc.gn
--- Comment #5 from Joey dot ye at intel dot com 2008-01-23 01:45 ---
(In reply to comment #2)
> I bet if you put jj in struct and don't have a nested function, this will be
> the same issue.
Not the same. In fact it passes if not referenced by a nested function. The
root
--- Comment #1 from Joey dot ye at intel dot com 2009-02-04 02:17 ---
GCC doesn't follow the x86-64 psABI in this case.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39082
--- Comment #1 from Joey dot ye at intel dot com 2009-02-10 05:35 ---
The argument needs 32-byte alignment. There is no way to guarantee the argument won't be
spilled. That's why the stack adjustment is there.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39146
--- Comment #10 from Joey dot ye at intel dot com 2009-02-11 01:03 ---
(In reply to comment #9)
> Created an attachment (id=17279)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17279&action=view) [edit]
> A patch to add a new -malign-double= option
This patch l
--- Comment #5 from Joey dot ye at intel dot com 2009-02-12 01:45 ---
Stack realign is finalized by
stack_realign = (incoming_stack_boundary
< (current_function_is_leaf
? crtl->max_used_stack_slot_ali
--- Comment #7 from Joey dot ye at intel dot com 2009-02-12 02:26 ---
Created an attachment (id=17283)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17283&action=view)
A patch to fix this problem
Impact on other tests is unknown. Testing is under way.
HJ, can you also help to ver
--- Comment #6 from Joey dot ye at intel dot com 2009-02-12 02:33 ---
(In reply to comment #5)
> If ACCUMULATE_OUTGOING_ARGS is off, ECX will be used
> for stack alignment and it may lead to code size
> increase due to register spill since ia32 has very
> few registe
--- Comment #9 from Joey dot ye at intel dot com 2009-02-12 02:40 ---
(In reply to comment #8)
> We still have push and mov. I guess it may be the best we can do.
I believe so too.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39146
--- Comment #10 from Joey dot ye at intel dot com 2009-02-12 15:20 ---
(In reply to comment #8)
> We still have push and mov. I guess it may be the best we can do.
> But please run full 32 and 64bit testsuite with your patch as well
> as under emx-avx-sim.
full 32/64 bit test
--- Comment #12 from Joey dot ye at intel dot com 2009-02-16 08:49 ---
Created an attachment (id=17305)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17305&action=view)
New patch attached
Testing finished. No regressions with emx_avx_sim. Waiting to check in to 4.5.
--
Joey do
--- Comment #20 from Joey dot ye at intel dot com 2009-02-17 09:18 ---
(In reply to comment #19)
> Just for the record, here is an unsuccessful attempt to avoid stack
> realignment
> just because of DImode for -m32 or because of DFmode at -m32 -Os. This patch
> unfortunat
--- Comment #31 from Joey dot ye at intel dot com 2009-02-23 03:15 ---
How about this patch?
1. Only reduce DI mode when -Os
2. Ignore TYPE_USER_ALIGN, so that stack realign happens for case in
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137#c28, which IMHO is
acceptable.
Index
--- Comment #3 from Joey dot ye at intel dot com 2009-02-27 02:53 ---
(In reply to comment #2)
> Created an attachment (id=17368)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17368&action=view) [edit]
> A patch
> Does this patch make sense?
It works fi
--- Comment #35 from Joey dot ye at intel dot com 2009-03-04 01:41 ---
(In reply to comment #32)
> I don't see the reason for && optimize_function_for_size_p (cfun), care to
> back
> up with benchmarks that forcing dynamic realignment for long long variables
>
--- Comment #47 from Joey dot ye at intel dot com 2009-03-12 06:51 ---
(In reply to comment #46)
> Created an attachment (id=17444)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17444&action=view) [edit]
> gcc.target/i386/stackalign/longlong-2.c for -mnostackalig
--- Comment #4 from Joey dot ye at intel dot com 2007-07-04 01:17 ---
126198 brought the regression
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32598
--- Comment #1 from Joey dot ye at intel dot com 2007-07-13 09:21 ---
Created an attachment (id=13909)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13909&action=view)
Reduced testcase
GCC crashes with gcc -O2 -fsee case-see.c -c
Fails on all recent 4.3 trunk revisions.
--
fault when compile CPU2000 with -fsee
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at int
--- Comment #2 from Joey dot ye at intel dot com 2007-07-13 09:27 ---
The root cause appears to be at see.c line 1643:
emit_insn_after (merged_ref, ref);
delete_insn (ref);
where merged_ref and ref have the same INSN_UID. delete_insn will clear the df
information of that UID
--- Comment #28 from Joey dot ye at intel dot com 2007-10-23 02:23 ---
Got a similar result on x86_64: Core 2 improves 24% from 129469 to 129504. That's
great.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921
--- Comment #8 from Joey dot ye at intel dot com 2008-01-17 10:11 ---
A small case and patch are available at
http://gcc.gnu.org/ml/gcc-patches/2008-01/msg00747.html
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34709
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36078
--- Comment #5 from Joey dot ye at intel dot com 2008-04-29 10:41 ---
This may be related to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36078, where I do
have a small case.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36074
--- Comment #8 from Joey dot ye at intel dot com 2008-04-30 10:53 ---
(In reply to comment #6)
> (In reply to comment #4)
> > > have you tried to compile with -march=core2 -mfpmath=sse -msse?
> > Yes, I've compiled it as following:
> > % g++ -g -O3 -m
--- Comment #9 from Joey dot ye at intel dot com 2008-04-30 10:56 ---
(In reply to comment #8)
> -m32 doesn't work. You have to use 4.3.0 release branch. Recent mainline
> change
Correction: -m32 is a must, but doesn't fix all. Options I'm using:
g++ -g -O3 -mar
--- Comment #11 from Joey dot ye at intel dot com 2008-05-01 04:31 ---
Tim,
Since it doesn't link, I can only check the .s file. There are a couple of
constructors called Environment. Which one is the problematic function?
grep Environment kernel_build.s|grep glob
...
.
--- Comment #13 from Joey dot ye at intel dot com 2008-05-05 07:22 ---
It is helpful. Root cause is that memory allocated by new is only aligned to 8
bytes under i386. In your case, object Environment is allocated by new and its
constructor tried to use movdqa to initialize its members
--- Comment #14 from Joey dot ye at intel dot com 2008-05-05 07:29 ---
HJ,
AVX will have a similar problem on x86_64, where new only returns objects
aligned to 16 bytes. A dynamically allocated __m256 isn't guaranteed to be on a
32-byte boundary.
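A minimal sketch of one way to get the 32-byte boundary explicitly, written in C rather than with C++ new and assuming a C11 library that provides aligned_alloc. The helper names is_aligned and alloc_avx_blocks are hypothetical; this is not what GCC or libstdc++ does, only an illustration of the alignment requirement.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* True when p sits on an `alignment`-byte boundary.
   `alignment` must be a nonzero power of two. */
int is_aligned(const void *p, size_t alignment)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Allocate `count` 32-byte blocks (the size of one __m256) on a
   32-byte boundary.  C11 aligned_alloc wants the size to be a
   multiple of the alignment, which count * 32 always is.
   Overflow of count * 32 is not checked in this sketch.
   Release the result with free(). */
void *alloc_avx_blocks(size_t count)
{
  return aligned_alloc(32, count * 32);
}
```

A pointer obtained this way can be checked with is_aligned(p, 32) before it is used with aligned 256-bit loads and stores. A plain malloc or new result carries no such guarantee.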
--
http://gcc.gnu.org/bug
--- Comment #1 from Joey dot ye at intel dot com 2008-07-11 05:46 ---
Created an attachment (id=15897)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15897&action=view)
Small test case reduced from cpu2006.464.h264ref
/home/jye2/work/bug-37665> gcc -v
Using built-in spe
--- Comment #2 from Joey dot ye at intel dot com 2008-07-11 05:49 ---
Effect of line 76
buffer_frame[0] = InitFullness;
is eliminated by optimizer due to bug in GCC.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36765
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36835
--- Comment #1 from Joey dot ye at intel dot com 2008-07-16 13:14 ---
Fixed by revision 137859
--
Joey dot ye at intel dot com changed:
What|Removed |Added
07 miscompiles 172.mgrid on x86-64
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36983
--- Comment #2 from Joey dot ye at intel dot com 2008-07-31 10:50 ---
Yes. I just noticed that the latest trunk passes.
--
Joey dot ye at intel dot com changed:
What|Removed |Added
Summary: Trunk 138207 miscompiles 447.dealII
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at in
--- Comment #1 from Joey dot ye at intel dot com 2008-07-31 11:33 ---
Created an attachment (id=15982)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15982&action=view)
Preprocessed test case
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36986
--- Comment #18 from Joey dot ye at intel dot com 2008-08-04 07:24 ---
(In reply to comment #9)
> Joey, I think the problem is the usage of STACK_BOUNDARY / BITS_PER_UNIT
> for stack alignment. On MacOS, STACK_BOUNDARY 128 on ia32. Shouldn't
> we use UNITS_PER_WORD in some
--- Comment #6 from Joey dot ye at intel dot com 2008-08-04 08:28 ---
(In reply to comment #3)
> Joey, when we compute frame layout, we don't count the duplicated
> return address pushed onto stack when DRAP is used. Also when we
> push return address, shouldn't we
--- Comment #7 from Joey dot ye at intel dot com 2008-08-04 09:03 ---
This problem is associated with -mpreferred-stack-boundary=2, rather than with
stack alignment. The following case fails on trunk before the merge with the stack
branch:
$ cat y1.c
/* PR middle-end/37010 */
/* { dg-do run
--- Comment #8 from Joey dot ye at intel dot com 2008-08-04 09:11 ---
Root cause is that outgoing parameter frame is aligned based on stack pointer.
Namely, address_of_stack_param = SP + offset + fixed_padding.
With -mpreferred-stack-boundary=2, alignment of SP is only 4 bytes
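A small worked sketch of the formula above, using made-up SP values. The function names are hypothetical and the reading is an assumption about the truncated comment: if SP is only known to be 4-byte aligned, no padding fixed at compile time can make SP + offset + fixed_padding 16-byte aligned for every possible SP.

```c
#include <assert.h>
#include <stdint.h>

/* Address of an outgoing stack parameter, as given in the comment:
   address_of_stack_param = SP + offset + fixed_padding. */
uintptr_t stack_param_addr(uintptr_t sp, uintptr_t offset,
                           uintptr_t fixed_padding)
{
  return sp + offset + fixed_padding;
}

/* True when addr is a multiple of `alignment`, which must be a
   nonzero power of two. */
int is_aligned_to(uintptr_t addr, uintptr_t alignment)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (addr & (alignment - 1)) == 0;
}
```

With the illustrative values sp = 0x1004, offset = 16 and fixed_padding = 12, the parameter lands on 0x1020, which is 16-byte aligned. The same padding with sp = 0x1008 gives 0x1024, which is not. Both SP values are 4-byte aligned, so a padding chosen at compile time cannot cover both.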
--- Comment #11 from Joey dot ye at intel dot com 2008-08-04 14:11 ---
(In reply to comment #10)
> Did you mean we needed 2 "additional 'and $-16, sp" insns to align the
> stack? I don't think so.
Definitely not.
Solution 1: Just ignore it. __m128 paramete