On Thu, May 30, 2013 at 2:46 AM, Dehao Chen <de...@google.com> wrote:
> Hi,
>
> In tree-vect-loop.c, it limits the vectorization only to loops that have 2
> BBs:
>
>   /* Inner-most loop.  We currently require that the number of BBs is
>      exactly 2 (the header and latch).  Vectorizable inner-most loops
>      look like this:
>
>                         (pre-header)
>                            |
>                           header <--------+
>                            | |            |
>                            | +--> latch --+
>                            |
>                         (exit-bb)  */
>
>       if (loop->num_nodes != 2)
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                              "not vectorized: control flow in loop.");
>           return NULL;
>         }
>
> Any insights why the limit is set to 2? We found that removing this
> limit actually improve performance for many applications.

The limit is there because a loop with more than one basic block
containing code necessarily has conditionally executed BBs, and possibly
PHI nodes at the merge points. Now, it may be that we properly determine
whether we can handle the PHIs in the non-header BB, and that we properly
bail out if we hit a conditional statement. But the latter in particular
would mean that we do not vectorize the loop. So I doubt that you get
both no ICEs and more performance.

Thus, please provide a testcase where, with your patch, you vectorize a
function whose loop has more than 2 basic blocks (it should be trivial to
detect those by re-checking loop->num_nodes after vectorization analysis
has succeeded).

So in the end the test is there to save us from useless analysis work
that would just end up not vectorizing the loop anyway.

Thanks,
Richard.

> Thanks,
> Dehao
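
For illustration only, here is a minimal sketch of the two loop shapes the reply distinguishes. It is not taken from the thread or from the GCC testsuite; the function names `scale` and `scale_positive` are hypothetical. The first loop has a straight-line body, matching the header-plus-latch shape in the quoted comment. The second guards its store with a condition, so in the source its body contains a conditionally executed block. Whether that block still exists when the vectorizer runs depends on earlier passes and is not claimed here.

```c
#include <stddef.h>

/* Straight-line body: every iteration executes the same statements,
   so the loop can be represented as just a header and a latch. */
void scale(float *restrict dst, const float *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * 2.0f;
}

/* Conditional store: the assignment runs only for positive inputs,
   so the loop body has a conditionally executed block in the source. */
void scale_positive(float *restrict dst, const float *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (src[i] > 0.0f)
            dst[i] = src[i] * 2.0f;
}
```

A function shaped like `scale_positive` is the kind of candidate one might try when looking for the requested testcase, keeping in mind that the reply expects the analysis to reject the conditional statement later anyway.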