Re: Question about vectorization limit

Richard Biener Fri, 31 May 2013 01:20:06 -0700

On Thu, May 30, 2013 at 2:46 AM, Dehao Chen <de...@google.com> wrote:
> Hi,
>
> In tree-vect-loop.c, it limits the vectorization only to loops that have 2
> BBs:
>
>       /* Inner-most loop.  We currently require that the number of BBs is
>          exactly 2 (the header and latch).  Vectorizable inner-most loops
>          look like this:
>
>                         (pre-header)
>                            |
>                           header <--------+
>                            | |            |
>                            | +--> latch --+
>                            |
>                         (exit-bb)  */
>
>       if (loop->num_nodes != 2)
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                              "not vectorized: control flow in loop.");
>           return NULL;
>         }
>
> Any insights why the limit is set to 2? We found that removing this
> limit actually improves performance for many applications.


The limit is there because a loop with more than one basic-block with code
necessarily has to have conditionally executed BBs and eventually PHI nodes
at merge points.
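
To make that concrete, here is a minimal sketch of the two loop shapes.  The
function names are made up for illustration, and the block counts in the
comments are what I would expect from the source as written; an earlier pass
such as if-conversion may flatten the second loop before the vectorizer
sees it.

```c
#include <stddef.h>

/* Straight-line body: the loop should consist of just the header and the
   latch, which is the shape the num_nodes == 2 check accepts.  */
int
sum_all (const int *a, size_t n)
{
  int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += a[i];
  return sum;
}

/* Conditional body: the "if" introduces a conditionally executed block,
   and "sum" needs a PHI node where the taken and not-taken paths merge.  */
int
sum_positive (const int *a, size_t n)
{
  int sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (a[i] > 0)
        sum += a[i];
    }
  return sum;
}
```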

Now, it may be that we properly determine if we can handle the PHIs in
the non-header BB and that we properly bail out if we hit a conditional
statement.  But especially the latter would mean that we would not vectorize
the loop.

So I doubt that you get more performance without also hitting ICEs.  Thus,
please provide a testcase where your patch vectorizes a loop with more than 2
basic-blocks (these should be trivial to detect by re-checking
loop->num_nodes after vectorization analysis has succeeded).
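
The re-check could look roughly like the sketch below.  It is a standalone
model, not GCC code: struct loop_stub and
vectorized_despite_control_flow are hypothetical stand-ins, and only the
num_nodes field from the snippet quoted above is modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the vectorizer's loop structure; only the
   num_nodes field used in the quoted check is modelled here.  */
struct loop_stub
{
  unsigned num_nodes;
};

/* Hypothetical helper: returns true when a loop that passed vectorization
   analysis has something other than the header + latch pair, i.e. a case
   worth reporting as a testcase.  */
bool
vectorized_despite_control_flow (const struct loop_stub *loop,
                                 bool analysis_succeeded)
{
  return analysis_succeeded && loop->num_nodes != 2;
}
```

Any loop for which this returns true with the limit removed is a candidate
testcase.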

So in the end the test is to save us from useless analysis work that would
just end up with not vectorizing the loop anyway.

Thanks,
Richard.

> Thanks,
> Dehao
