Hello,
I have an example with two very similar loops. The cunrolli pass unrolls one
loop completely but not the other, based on slightly different cost estimates.
The loop that is not unrolled gets SLP-vectorized and is then unrolled by the
"cunroll" pass, whereas the unrolled loop cannot be vectorized because it is no
longer a loop. In the end, there is a big performance difference between the
two loops.
My question is why SLP vectorization has to be performed on loops (it is a
sub-pass under pass_tree_loop). Conceptually, couldn't it be done on any basic
block? Our port is still stuck at 4.5, but I checked 4.7 and it seems to be the
same. I also looked at the functions in tree-vect-slp.c. They use loop_vinfo
structures heavily, but in some places they check whether a loop_vinfo exists
and use either it or an alternative. I tried adding an extra SLP pass after
pass_tree_loop, but it didn't work. I wonder how easy it would be to make SLP
work for non-loop code.
Thanks,
Bingfeng Mei
Broadcom UK
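void foo (int *__restrict__ temp_hist_buffer,
          int *__restrict__ p_hist_buff,
          int *__restrict__ p_input)
{
  int i;
  /* Copy 4 ints from p_hist_buff into temp_hist_buffer[0..3].  */
  for (i = 0; i < 4; i++)
    temp_hist_buffer[i] = p_hist_buff[i];
  /* Copy 4 ints from p_input into temp_hist_buffer[4..7].  */
  for (i = 0; i < 4; i++)
    temp_hist_buffer[i + 4] = p_input[i];
}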
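For illustration, here is a hand-unrolled equivalent of foo: roughly the straight-line code that is left once a loop has been completely unrolled. This is a sketch of the shape of that code, not actual compiler output, and the name foo_unrolled is hypothetical. Each run of four adjacent loads and stores is the kind of isomorphic group a basic-block SLP pass would have to pack into a single 4 x int vector copy.

```c
/* Hypothetical hand-unrolled version of foo (not GCC output).
   After complete unrolling there is no loop left, only two groups of
   four adjacent, isomorphic load/store pairs in one basic block.  */
void foo_unrolled (int *__restrict__ temp_hist_buffer,
                   int *__restrict__ p_hist_buff,
                   int *__restrict__ p_input)
{
  /* Group 1: temp_hist_buffer[0..3] = p_hist_buff[0..3].  */
  temp_hist_buffer[0] = p_hist_buff[0];
  temp_hist_buffer[1] = p_hist_buff[1];
  temp_hist_buffer[2] = p_hist_buff[2];
  temp_hist_buffer[3] = p_hist_buff[3];

  /* Group 2: temp_hist_buffer[4..7] = p_input[0..3].  */
  temp_hist_buffer[4] = p_input[0];
  temp_hist_buffer[5] = p_input[1];
  temp_hist_buffer[6] = p_input[2];
  temp_hist_buffer[7] = p_input[3];
}
```

Both functions write the same eight values; only the form the vectorizer sees differs.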