Thank you for pointing it out. I didn't realized that alias analysis
has influences on this issue.

The current problem is that the epilogue may be unnecessary if the
loop bound cannot be larger than the number of iterations of the
vectorized loop multiplied by VF when the vectorized loop is supposed
to be executed. My method is incorrect because I assume the vectorized
loop will be executed which is actually guaranteed by loop bound check
(and also alias checks). So if the alias checks exist, my method is
fine as both conditions are met. If there is no alias checks, I must
consider the possibility that the vectorized loop may not be executed
at runtime and then the epilogue should not be eliminated. The warning
appears on epilogue, and with loop bound checks (and without alias
checks) the warning will be gone. So I think the key is alias checks:
my method only works if there is no alias checks.

How about adding one more condition that checks if alias checks are
needed, as the code shown below?

  else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
  || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
      < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo))
      && (!LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)
                   || (unsigned HOST_WIDE_INT)max_stmt_executions_int
       (LOOP_VINFO_LOOP (loop_vinfo)) > (unsigned)th)))
    LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;


thanks,
Cong


On Wed, Mar 12, 2014 at 1:24 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote:
>> This patch is fixing PR60505 in which the vectorizer may produce
>> unnecessary epilogues.
>>
>> Bootstrapped and tested on a x86_64 machine.
>>
>> OK for trunk?
>
> That looks wrong.  Consider the case where the loop isn't versioned,
> if you disable generation of the epilogue loop, you end up only with
> a vector loop.
>
> Say:
> unsigned char ovec[16] __attribute__((aligned (16))) = { 0 };
> void
> foo (char *__restrict in, char *__restrict out, int num)
> {
>   int i;
>
>   in = __builtin_assume_aligned (in, 16);
>   out = __builtin_assume_aligned (out, 16);
>   for (i = 0; i < num; ++i)
>     out[i] = (ovec[i] = in[i]);
>   out[num] = ovec[num / 2];
> }
> -O2 -ftree-vectorize.  Now, consider if this function is called
> with num != 16 (num > 16 is of course invalid, but num 0 to 15 is
> valid and your patch will cause a wrong-code in this case).
>
>         Jakub

Reply via email to