Yes, the loop vectorizer relies on earlier passes to straighten out control
flow (unswitching, index splitting, loop distribution, ifcvt, etc.). Intel
ICC is pretty good at it. For the following simple made-up case, icc
vectorizes the loop.
int a[1];
int b[1];
int foo (int n)
{
int i;
for (i
On Fri, May 31, 2013 at 6:54 AM, Jakub Jelinek wrote:
> On Fri, May 31, 2013 at 03:48:59PM +0200, Toon Moene wrote:
>> >If you rewrite the above into:
>> >SUBROUTINE XYZ(A, B, N)
>> >DIMENSION A(N), B(N)
>> >DO I = 1, N
>> >C = B(I)
>> >IF (A(I)> 0.0) THEN
>> > A(I) = C / A(I)
>> >
On 05/31/2013 03:54 PM, Jakub Jelinek wrote:
> I wrote:
But this "inner loop" has at least 3 basic blocks - so what does the
"loop->num_nodes != 2" test exactly codify?
With the above testcase it has just 2.
Before ifcvt pass it still has 4:
Ah, I missed that subtle part. So my example is
On Fri, May 31, 2013 at 03:48:59PM +0200, Toon Moene wrote:
> >If you rewrite the above into:
> >      SUBROUTINE XYZ(A, B, N)
> >      DIMENSION A(N), B(N)
> >      DO I = 1, N
> >         C = B(I)
> >         IF (A(I) > 0.0) THEN
> >            A(I) = C / A(I)
> >         ELSE
> >            A(I) = C
> >         ENDIF
> >      ENDDO
> >      END
> >
> >th
On Fri, May 31, 2013 at 3:48 PM, Toon Moene wrote:
> On 05/31/2013 03:41 PM, Jakub Jelinek wrote:
>
>> On Fri, May 31, 2013 at 03:21:51PM +0200, Toon Moene wrote:
>
>
>>> SUBROUTINE XYZ(A, B, N)
>>> DIMENSION A(N), B(N)
>>> DO I = 1, N
>>> IF (A(I)> 0.0) THEN
>>>A(I) = B(I) / A(I)
>>>
On 05/31/2013 03:41 PM, Jakub Jelinek wrote:
On Fri, May 31, 2013 at 03:21:51PM +0200, Toon Moene wrote:
      SUBROUTINE XYZ(A, B, N)
      DIMENSION A(N), B(N)
      DO I = 1, N
         IF (A(I) > 0.0) THEN
            A(I) = B(I) / A(I)
         ELSE
            A(I) = B(I)
         ENDIF
      ENDDO
      END
Well, in this case (with -Ofas
On Fri, May 31, 2013 at 03:21:51PM +0200, Toon Moene wrote:
>       SUBROUTINE XYZ(A, B, N)
>       DIMENSION A(N), B(N)
>       DO I = 1, N
>          IF (A(I) > 0.0) THEN
>             A(I) = B(I) / A(I)
>          ELSE
>             A(I) = B(I)
>          ENDIF
>       ENDDO
>       END
Well, in this case (with -Ofast) it is just the case that ifcvt
or
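
For readers following along in C, the XYZ loop quoted above can be sketched roughly as below. This is only an illustration, not GCC's actual transformation: the function names xyz_branchy and xyz_ifcvt are hypothetical, and the second form assumes IEEE-style, non-trapping float division so that computing the quotient unconditionally is harmless. The first loop body has a conditionally executed store on each arm; the second does the loads and the division up front and selects the value for a single store, which is roughly the shape if-conversion aims for so that the loop body collapses to one block.

```c
#include <stddef.h>

/* Hypothetical C rendering of SUBROUTINE XYZ: each arm of the branch
   performs its own store, so the loop body spans several basic blocks. */
void xyz_branchy(float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0.0f)
            a[i] = b[i] / a[i];
        else
            a[i] = b[i];
    }
}

/* Hand if-converted sketch: loads and the division happen
   unconditionally, and a select picks the value for the single store.
   Assumption: the division does not trap when a[i] is zero or negative;
   its result is simply discarded in that case. */
void xyz_ifcvt(float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float c = b[i];
        float q = c / a[i];
        a[i] = (a[i] > 0.0f) ? q : c;
    }
}
```

Both functions should leave the same values in a for inputs where the unconditional division is well defined; whether a compiler vectorizes either form depends on its version and flags.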
On Fri, May 31, 2013 at 3:21 PM, Toon Moene wrote:
> On 05/31/2013 10:20 AM, Richard Biener wrote:
>
>> So - I doubt that you both do not get any ICEs and more performance.
>
>
> I added the second suggested patch:
>
> Index: tree-vect-loop-manip.c
> ===
On 05/31/2013 10:20 AM, Richard Biener wrote:
So - I doubt that you both do not get any ICEs and more performance.
I added the second suggested patch:
Index: tree-vect-loop-manip.c
===
--- tree-vect-loop-manip.c (revision 19
On Fri, May 31, 2013 at 10:20:01AM +0200, Richard Biener wrote:
> The limit is there because a loop with more than one basic-block with code
> necessarily has to have conditionally executed BBs and eventually PHI nodes
> at merge points.
>
> Now, it may be that we properly determine if we can hand
On Thu, May 30, 2013 at 2:46 AM, Dehao Chen wrote:
> Hi,
>
> In tree-vect-loop.c, it limits the vectorization only to loops that have 2
> BBs:
>
> /* Inner-most loop. We currently require that the number of BBs is
> exactly 2 (the header and latch). Vectorizable inner-most loops
Actually, you need another patch to make this work:
Index: gcc/tree-vect-loop-manip.c
===
--- gcc/tree-vect-loop-manip.c (revision 199416)
+++ gcc/tree-vect-loop-manip.c (working copy)
@@ -855,7 +855,6 @@
/* All loops have an o
On 05/30/2013 02:46 AM, Dehao Chen wrote:
In tree-vect-loop.c, it limits the vectorization only to loops that have 2 BBs:
/* Inner-most loop. We currently require that the number of BBs is
exactly 2 (the header and latch). Vectorizable inner-most loops
look like thi
Hi,
In tree-vect-loop.c, it limits the vectorization only to loops that have 2 BBs:
/* Inner-most loop. We currently require that the number of BBs is
exactly 2 (the header and latch). Vectorizable inner-most loops
look like this:
(pre-header)