--- Comment #9 from tehila at il dot ibm dot com 2008-11-18 07:35 ---
This testcase is indeed very slow to compile on SPU at -O2 and above.
I don't see any slowdown at -O1.
If I turn off the instruction scheduler (with -fno-schedule-insns), compilation is much faster:
4x faster for 1,000 args (ARG3),
--- Comment #11 from tehila at il dot ibm dot com 2008-11-25 12:17 ---
(In reply to comment #10)
> If you only get slow compilation at -O2 and above then your problem is
> probably due to PR 37790. The original problem affected -O1 compiles as well as -O2.
PR 37790 does
--- Comment #13 from tehila at il dot ibm dot com 2008-11-27 12:20 ---
(In reply to comment #12)
Thanks, Andrey.
I think there are 2 "issues" here:
1. register-renaming. (more related to this PR, I think)
2. schedule-insns.
Both of them slow compilation.
With ARG4, on SPU,
--- Comment #15 from tehila at il dot ibm dot com 2008-11-27 12:57 ---
(In reply to comment #14)
> (In reply to comment #13)
> > (In reply to comment #12)
> > Thanks, Andrey.
> > I think there are 2 "issues" here:
> > 1. register-renaming. (mo
t to int conversion
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: tehila at il dot ibm dot com
GCC build triplet: i386-redhat-linux (also powerpc-*-linux)
GCC host triplet: i386-redhat-linux (also powerpc-*-linux)
GCC target triplet: i386-redhat-linux (also powerpc-*-linux)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32821
--- Comment #1 from tehila at il dot ibm dot com 2007-07-19 13:38 ---
(In reply to comment #0)
> #0 first_stmt (bb=0xb7fa75a0) at ../../gcc/gcc/tree-iterator.h:43
> #1 0x0838d46e in dump_generic_bb (file=0x9785710, bb=0xb7fa75a0, indent=0,
> flags=16448) at ../../gcc/gcc/tr
--- Comment #2 from tehila at il dot ibm dot com 2007-07-19 13:51 ---
(In reply to comment #1)
I've just tried commenting out the code:
if (dump_flags & TDF_DETAILS)
  {
    dump_bb (bb, dump_file, 0);
    fprintf (dump_file, "\n");
  }
from
--- Comment #4 from tehila at il dot ibm dot com 2007-07-19 14:15 ---
> No, it ICEs when empty BB is to be pretty-printed. A tree pretty-printer should
> be fixed/updated for this situation, this is all this PR is about.
Thanks for the quick response.
You're right
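The backtrace quoted above starts in `first_stmt`, i.e. the pretty-printer assumed an empty basic block still had a first statement. As a rough illustration of the kind of guard such a fix needs, here is a minimal sketch; `struct stmt_sketch`, `struct bb_sketch` and `dump_bb_sketch` are hypothetical, heavily simplified stand-ins, not GCC's real types, API, or the actual patch for this PR.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins: GCC's real basic_block and
   statement-list types (see tree-iterator.h) are different. */
struct stmt_sketch {
  const char *text;
  struct stmt_sketch *next;
};

struct bb_sketch {
  int index;
  struct stmt_sketch *first;  /* NULL when the block has no statements */
};

/* Print every statement of BB to OUT and return how many were printed.
   An empty block is handled explicitly instead of assuming that a first
   statement exists. */
static int
dump_bb_sketch (FILE *out, const struct bb_sketch *bb)
{
  int count = 0;

  fprintf (out, "<bb %d>:\n", bb->index);
  if (bb->first == NULL)
    {
      fprintf (out, "  (empty)\n");
      return 0;
    }
  for (const struct stmt_sketch *s = bb->first; s != NULL; s = s->next)
    {
      fprintf (out, "  %s\n", s->text);
      count++;
    }
  return count;
}
```

Calling `dump_bb_sketch (stdout, &bb)` on a block whose `first` is `NULL` prints a placeholder line and returns 0 rather than dereferencing a missing statement.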
--- Comment #2 from tehila at il dot ibm dot com 2007-07-26 10:46 ---
(In reply to comment #2)
I'd just like a clarification:
I see you're compiling on PPU (since you're using -maltivec).
Is this also a problem on SPU? Does SPU have this LHS hazard?
Another question:
lwz
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: tehila at il dot ibm dot com
GCC target triplet: Cell SPU
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37221
--- Comment #2 from tehila at il dot ibm dot com 2008-08-25 08:18 ---
Andrew, thanks for your response and ideas.
From what we see, if -funroll-loops is on, the loops:
for (j = 0; j < 4; j++)
  arr[j] = mat2[i][j];
and
for (k = 0; k < 3; k++)
--- Comment #3 from tehila at il dot ibm dot com 2008-08-25 08:45 ---
(In reply to comment #2)
> Andrew, thanks for your response and ideas.
> From what we see, if -funroll-loops is on, the loops:
> for (j = 0; j < 4; j++)
>   arr[j] = mat2[i][j];
> and
&
--- Comment #4 from tehila at il dot ibm dot com 2008-08-25 14:52 ---
(In reply to comment #2)
> Hopefully, if that loop would be unrolled, the SRA will have the opportunity
> to do the transformation we expect it to do.
I've tried it manually, and that indeed works.
i.e
--- Comment #5 from tehila at il dot ibm dot com 2008-08-26 20:47 ---
(In reply to comment #3)
> The meaning here is to the second
> for (j = 0; j < 4; j++)
> loop.
> It's loop #4 in cunrolli pass.
> > cunrolli doesn't recognize # of iterations = 4.
>
--- Comment #8 from tehila at il dot ibm dot com 2008-09-02 12:47 ---
Thank you, Richard!
This patch indeed does the work and unrolls the loop.
SRA then works fine, and we get a 40% improvement.
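To make the unrolling discussion above concrete, here is a minimal sketch of the quoted copy loop `for (j = 0; j < 4; j++) arr[j] = mat2[i][j];` before and after complete unrolling. It assumes `float` elements, a 4x4 `mat2` and a local `arr`; the summing consumer is hypothetical, since the original second loop is cut off in this thread. The idea, as the comments describe, is that once every `arr` access uses a constant index, SRA can replace the array with scalars.

```c
/* Rolled form of the quoted loop. Assumptions: float elements, a 4x4
   mat2, and a hypothetical summing consumer of arr. */
float
row_sum_rolled (float mat2[4][4], int i)
{
  float arr[4];
  float sum = 0.0f;

  for (int j = 0; j < 4; j++)
    arr[j] = mat2[i][j];
  for (int k = 0; k < 4; k++)  /* hypothetical consumer loop */
    sum += arr[k];
  return sum;
}

/* The same function after complete unrolling: every access to arr now
   uses a constant index, which is what allows SRA to replace arr[0..3]
   with four scalar temporaries. */
float
row_sum_unrolled (float mat2[4][4], int i)
{
  float arr[4];

  arr[0] = mat2[i][0];
  arr[1] = mat2[i][1];
  arr[2] = mat2[i][2];
  arr[3] = mat2[i][3];
  return arr[0] + arr[1] + arr[2] + arr[3];
}
```

Comparing `-fdump-tree-cunrolli` and `-fdump-tree-sra` output for the rolled version against the hand-unrolled one is one way to check whether the compiler reaches the second form on its own.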
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37221
--- Comment #10 from tehila at il dot ibm dot com 2008-09-03 06:58 ---
(In reply to comment #9)
> If you give the patch bootstrap & testing I'll approve it for trunk.
> Richard.
Great.
I'm bootstrapping and testing it on x86 now.
--- Comment #11 from tehila at il dot ibm dot com 2008-09-04 19:46 ---
(In reply to comment #10)
> I'm bootstrapping and testing it on x86 now.
Bootstrap fails (at least on x86_64) (with ICE).
Tehila.
--- Comment #12 from tehila at il dot ibm dot com 2008-09-08 08:21 ---
(In reply to comment #11)
> (In reply to comment #10)
> > I'm bootstrapping and testing it on x86 now.
> Bootstrap fails (at least on x86_64) (with ICE).
> Tehila.
It fails at tree-ssa-loop-m
--- Comment #7 from tehila at il dot ibm dot com 2007-01-07 08:03 ---
Right, the vectorizer currently supports conversions only between integral
types. Support for type conversions that also involve floating-point types is
in the works.
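As an illustration of the distinction drawn above, here are two loops of the kind in question: one converts between integral types (`short` to `int`), the other from floating point to integer (`float` to `int`). Per this comment, at the time only the first kind was supported by the vectorizer; whether a particular GCC version, target and flag set actually vectorizes either loop should be checked (e.g. with `-fopt-info-vec` on newer GCC releases) rather than assumed. The function names are hypothetical.

```c
#include <stddef.h>

/* Integral-to-integral conversion (short -> int): the kind of
   conversion the vectorizer already supported per this comment. */
void
widen_short_to_int (int *restrict dst, const short *restrict src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i];
}

/* Floating-point-to-integral conversion (float -> int): per this
   comment, vectorizer support for such conversions was still being
   worked on. */
void
float_to_int (int *restrict dst, const float *restrict src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (int) src[i];  /* C conversion truncates toward zero */
}
```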
--
http://gcc.gnu.org/bugzilla/show_bug.cgi