[Bug target/31850] gcc.c-torture/compile/limits-fnargs.c is slow at compiling for spu-elf

2008-11-17 Thread tehila at il dot ibm dot com
--- Comment #9 from tehila at il dot ibm dot com 2008-11-18 07:35 --- This testcase is indeed very slow on SPU with -O2 and above. I don't see any slowness at -O1. If I turn off the instruction scheduler (with -fno-schedule-insns), it is much faster: 4x faster for 1,000 args (ARG3),

[Bug target/31850] gcc.c-torture/compile/limits-fnargs.c is slow at compiling for spu-elf

2008-11-25 Thread tehila at il dot ibm dot com
--- Comment #11 from tehila at il dot ibm dot com 2008-11-25 12:17 --- (In reply to comment #10) > If you only get slow compilation at -O2 and above then your problem is > probably > due to PR 37790. The original problem affected -O1 compiles as well as -O2. PR 37790 does

[Bug target/31850] gcc.c-torture/compile/limits-fnargs.c is slow at compiling for spu-elf

2008-11-27 Thread tehila at il dot ibm dot com
--- Comment #13 from tehila at il dot ibm dot com 2008-11-27 12:20 --- (In reply to comment #12) Thanks, Andrey. I think there are 2 "issues" here: 1. register-renaming. (more related to this PR, I think) 2. schedule-insns. Both of them slow compilation. With ARG4, on SPU,

[Bug target/31850] gcc.c-torture/compile/limits-fnargs.c is slow at compiling for spu-elf

2008-11-27 Thread tehila at il dot ibm dot com
--- Comment #15 from tehila at il dot ibm dot com 2008-11-27 12:57 --- (In reply to comment #14) > (In reply to comment #13) > > (In reply to comment #12) > > Thanks, Andrey. > > I think there are 2 "issues" here: > > 1. register-renaming. (mo

[Bug middle-end/31055] New: missed auto-vectorization optimization, when there is float to int conversion

2007-03-06 Thread tehila at il dot ibm dot com
t to int conversion Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: tehila at il dot ibm dot com GCC build triplet:

[Bug tree-optimization/32821] New: tree-if-conv:combine_blocks with -ftree-dump-tree-all-details fails on ICE in compilation: segfault

2007-07-19 Thread tehila at il dot ibm dot com
il dot ibm dot com GCC build triplet: i386-redhat-linux (also powerpc-*-linux) GCC host triplet: i386-redhat-linux (also powerpc-*-linux) GCC target triplet: i386-redhat-linux (also powerpc-*-linux) http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32821

[Bug tree-optimization/32821] tree-if-conv:combine_blocks with -ftree-dump-tree-all-details fails on ICE in compilation: segfault

2007-07-19 Thread tehila at il dot ibm dot com
--- Comment #1 from tehila at il dot ibm dot com 2007-07-19 13:38 --- (In reply to comment #0) > #0 first_stmt (bb=0xb7fa75a0) at ../../gcc/gcc/tree-iterator.h:43 > #1 0x0838d46e in dump_generic_bb (file=0x9785710, bb=0xb7fa75a0, indent=0, > flags=16448) at ../../gcc/gcc/tr

[Bug tree-optimization/32821] tree-if-conv:combine_blocks with -ftree-dump-tree-all-details fails on ICE in compilation: segfault

2007-07-19 Thread tehila at il dot ibm dot com
--- Comment #2 from tehila at il dot ibm dot com 2007-07-19 13:51 --- (In reply to comment #1) I've just tried to comment out the code: if (dump_flags & TDF_DETAILS) { dump_bb (bb, dump_file, 0); fprintf (dump_file, "\n"); } from

[Bug tree-optimization/32821] tree-if-conv:combine_blocks with -ftree-dump-tree-all-details fails on ICE in compilation: segfault

2007-07-19 Thread tehila at il dot ibm dot com
--- Comment #4 from tehila at il dot ibm dot com 2007-07-19 14:15 --- > No, it ICEs when empty BB is to be pretty-printed. A tree pretty-printer > should > be fixed/updated for this situation, this is all this PR is about. Thanks for the quick response. You're right
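
The failure mode described in this thread (a dump routine crashing when asked to print an empty basic block) can be illustrated with a minimal sketch. The types and the function `toy_dump_bb` below are hypothetical stand-ins invented for illustration; they are not GCC's data structures or its pretty-printer, and this is not the fix that went into GCC.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical toy types, NOT GCC's basic_block or statement types. */
struct toy_stmt {
    const char *text;
    struct toy_stmt *next;
};

struct toy_bb {
    int index;
    struct toy_stmt *first;   /* NULL when the block has no statements */
};

/* Print a block and return the number of statements printed.
   The empty-block case is handled explicitly instead of
   dereferencing a missing first statement. */
int toy_dump_bb(FILE *out, const struct toy_bb *bb)
{
    int count = 0;

    fprintf(out, "<bb %d>:\n", bb->index);
    if (bb->first == NULL) {
        fprintf(out, "  (empty)\n");
        return 0;
    }
    for (const struct toy_stmt *s = bb->first; s != NULL; s = s->next) {
        fprintf(out, "  %s\n", s->text);
        count++;
    }
    return count;
}
```

Calling `toy_dump_bb(stdout, &bb)` on a block whose `first` pointer is NULL prints the header plus an "(empty)" marker and returns 0, rather than faulting.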

[Bug tree-optimization/32826] Reduction into a global variable causes a Load Hit Store Hazard (for the Cell)

2007-07-26 Thread tehila at il dot ibm dot com
--- Comment #2 from tehila at il dot ibm dot com 2007-07-26 10:46 --- (In reply to comment #2) Just want a clarification: I see you're compiling on PPU (since you're using -maltivec). Is this also problematic on SPU? Does SPU have this LHS hazard? Another question: lwz

[Bug c/37221] New: GCC for Cell SPU produces poor code when there is load-after-store in different loops

2008-08-24 Thread tehila at il dot ibm dot com
Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: tehila at il dot ibm dot com GCC target triplet: Cell SPU http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37221

[Bug middle-end/37221] GCC for Cell SPU produces poor code when there is load-after-store in different loops

2008-08-25 Thread tehila at il dot ibm dot com
--- Comment #2 from tehila at il dot ibm dot com 2008-08-25 08:18 --- Andrew, thanks for your response and ideas. From what we see, if -funroll-loops is on, the loops: for (j = 0; j < 4; j++) arr[j] = mat2[i][j]; and for (k = 0; k < 3; k++)

[Bug middle-end/37221] GCC for Cell SPU produces poor code when there is load-after-store in different loops

2008-08-25 Thread tehila at il dot ibm dot com
--- Comment #3 from tehila at il dot ibm dot com 2008-08-25 08:45 --- (In reply to comment #2) > Andrew, thanks for your response and ideas. > From what we see, if -funroll-loops is on, the loops: > for (j = 0; j < 4; j++) > arr[j] = mat2[i][j]; > and &

[Bug middle-end/37221] GCC for Cell SPU produces poor code when there is load-after-store in different loops

2008-08-25 Thread tehila at il dot ibm dot com
--- Comment #4 from tehila at il dot ibm dot com 2008-08-25 14:52 --- (In reply to comment #2) > Hopefully, if that loop would be unrolled, the SRA will have the opportunity > to do the transformation we expect it to do. I've tried it manually, and that indeed works. i.e

[Bug middle-end/37221] GCC for Cell SPU produces poor code when there is load-after-store in different loops

2008-08-26 Thread tehila at il dot ibm dot com
--- Comment #5 from tehila at il dot ibm dot com 2008-08-26 20:47 --- (In reply to comment #3) > The meaning here is to the second > for (j = 0; j < 4; j++) > loop. > It's loop #4 in cunrolli pass. > > cunrolli doesn't recognize # of iterations = 4. >

[Bug middle-end/37221] Missed early loop-unroll optimization - causes 40% degradation on SPU

2008-09-02 Thread tehila at il dot ibm dot com
--- Comment #8 from tehila at il dot ibm dot com 2008-09-02 12:47 --- Thank you, Richard! This patch indeed does the work and unrolls the loop. The SRA works fine and we get 40% improvement. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37221
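
As a rough illustration of the pattern discussed in this thread, the sketch below stores into a small local array in one loop and loads from it in a second loop, then shows the same computation with both loops unrolled by hand so the array can be replaced by scalars. Only the names `arr` and `mat2` and the trip counts 4 and 3 come from the quoted snippets; `ROWS`, the function names, and the body of the second loop are assumptions made for the example, since the original testcase is not reproduced here.

```c
#define ROWS 8   /* hypothetical row count, not from the bug report */

/* Rolled form: arr[] is written in one loop and read in the next,
   which is the load-after-store pattern the thread describes. */
int sum_rolled(int mat2[ROWS][4])
{
    int arr[4];
    int total = 0;

    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < 4; j++)
            arr[j] = mat2[i][j];
        for (int k = 0; k < 3; k++)      /* assumed loop body */
            total += arr[k] * arr[k + 1];
    }
    return total;
}

/* Hand-unrolled form: with both inner loops fully unrolled, the
   array elements become plain scalars, which is the kind of
   replacement scalar replacement of aggregates (SRA) aims for. */
int sum_unrolled(int mat2[ROWS][4])
{
    int total = 0;

    for (int i = 0; i < ROWS; i++) {
        int a0 = mat2[i][0];
        int a1 = mat2[i][1];
        int a2 = mat2[i][2];
        int a3 = mat2[i][3];
        total += a0 * a1 + a1 * a2 + a2 * a3;
    }
    return total;
}
```

Both functions return the same value for the same input; the sketch makes no claim about what any particular GCC version generates for either form.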

[Bug middle-end/37221] Missed early loop-unroll optimization - causes 40% degradation on SPU

2008-09-02 Thread tehila at il dot ibm dot com
--- Comment #10 from tehila at il dot ibm dot com 2008-09-03 06:58 --- (In reply to comment #9) > If you give the patch bootstrap & testing I'll approve it for trunk. > Richard. Great. I'm bootstrapping and testing it on x86 now. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37221

[Bug middle-end/37221] Missed early loop-unroll optimization - causes 40% degradation on SPU

2008-09-04 Thread tehila at il dot ibm dot com
--- Comment #11 from tehila at il dot ibm dot com 2008-09-04 19:46 --- (In reply to comment #10) > I'm bootstrapping and testing it on x86 now. Bootstrap fails (at least on x86_64) (with ICE). Tehila. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37221

[Bug middle-end/37221] Missed early loop-unroll optimization - causes 40% degradation on SPU

2008-09-08 Thread tehila at il dot ibm dot com
--- Comment #12 from tehila at il dot ibm dot com 2008-09-08 08:21 --- (In reply to comment #11) > (In reply to comment #10) > > I'm bootstrapping and testing it on x86 now. > Bootstrap fails (at least on x86_64) (with ICE). > Tehila. It fails at tree-ssa-loop-m

[Bug tree-optimization/24659] Conversions are not vectorized

2007-01-07 Thread tehila at il dot ibm dot com
--- Comment #7 from tehila at il dot ibm dot com 2007-01-07 08:03 --- Right, the vectorizer currently supports conversions only between integral types. Support for type conversions that also involve floating-point types is in the works. -- http://gcc.gnu.org/bugzilla/show_bug.cgi
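
To make the distinction in this comment concrete, here is a minimal sketch of the two kinds of conversion loop: one between integral types, and one from floating point to integer (the case also reported in PR 31055). The function names are hypothetical, and the sketch says nothing about which loops a given GCC release actually vectorizes; the comment above describes the state of the vectorizer in early 2007.

```c
#include <stddef.h>

/* Integral-to-integral conversion: the kind of loop the comment
   says the vectorizer supported at the time. */
void widen_short_to_int(int *dst, const short *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int) src[i];
}

/* Float-to-int conversion: the kind of loop the comment describes
   as not yet supported. In C the cast truncates toward zero. */
void convert_float_to_int(int *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int) src[i];
}
```

The two loop bodies look identical at the source level; only the element types differ, which is why support for one does not automatically imply support for the other.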