Thank you for pointing it out. I didn't realize that alias analysis affects this issue.

The current problem is that the epilogue may be unnecessary if the loop bound cannot be larger than the number of iterations of the vectorized loop multiplied by VF, provided the vectorized loop is actually executed. My method is incorrect because it assumes the vectorized loop will be executed, which is in fact guaranteed by the loop bound check (and also the alias checks). So if alias checks exist, my method is fine, as both conditions are met. If there are no alias checks, I must consider the possibility that the vectorized loop is not executed at runtime, and then the epilogue must not be eliminated.

The warning appears on the epilogue, and with loop bound checks (and without alias checks) the warning goes away. So I think the key is alias checks: my method only works when alias checks are present.

How about adding one more condition that checks whether alias checks are needed, as in the code below?

  else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
           || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
               < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo))
               && (!LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)
                   || (unsigned HOST_WIDE_INT)max_stmt_executions_int
                        (LOOP_VINFO_LOOP (loop_vinfo)) > (unsigned)th)))
    LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;


thanks,
Cong


On Wed, Mar 12, 2014 at 1:24 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote:
>> This patch is fixing PR60505 in which the vectorizer may produce
>> unnecessary epilogues.
>>
>> Bootstrapped and tested on a x86_64 machine.
>>
>> OK for trunk?
>
> That looks wrong.  Consider the case where the loop isn't versioned,
> if you disable generation of the epilogue loop, you end up only with
> a vector loop.
>
> Say:
> unsigned char ovec[16] __attribute__((aligned (16))) = { 0 };
> void
> foo (char *__restrict in, char *__restrict out, int num)
> {
>   int i;
>
>   in = __builtin_assume_aligned (in, 16);
>   out = __builtin_assume_aligned (out, 16);
>   for (i = 0; i < num; ++i)
>     out[i] = (ovec[i] = in[i]);
>   out[num] = ovec[num / 2];
> }
> -O2 -ftree-vectorize.  Now, consider if this function is called
> with num != 16 (num > 16 is of course invalid, but num 0 to 15 is
> valid and your patch will cause a wrong-code in this case).
>
>         Jakub
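
To make the counterexample above concrete, here is a minimal sketch that models a vectorized copy loop in plain scalar C. It is not the code GCC generates: the vectorization factor VF = 16 and the names vector_part, copy_with_epilogue and copy_without_epilogue are assumptions chosen only for illustration. It shows that when the trip count is not a multiple of VF, the iterations the vector loop cannot cover are handled only by the epilogue.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed vectorization factor: 16 chars per vector (illustrative only). */
enum { VF = 16 };

/* Number of iterations covered by whole groups of VF elements. */
size_t vector_part (size_t niters)
{
  return niters - niters % VF;
}

/* Model of a vectorized copy followed by a scalar epilogue.
   Returns the number of elements actually copied.  */
size_t copy_with_epilogue (const char *in, char *out, size_t niters)
{
  size_t i = 0;
  size_t vec_end = vector_part (niters);

  /* "Vector" loop: each outer iteration stands in for one vector op.  */
  for (; i < vec_end; i += VF)
    for (size_t j = 0; j < VF; ++j)
      out[i + j] = in[i + j];

  /* Epilogue: the remaining niters % VF scalar iterations.  */
  for (; i < niters; ++i)
    out[i] = in[i];

  return i;
}

/* The same model with the epilogue removed.  Any iterations beyond the
   last whole group of VF are silently dropped.  */
size_t copy_without_epilogue (const char *in, char *out, size_t niters)
{
  size_t i = 0;
  size_t vec_end = vector_part (niters);

  for (; i < vec_end; i += VF)
    for (size_t j = 0; j < VF; ++j)
      out[i + j] = in[i + j];

  return i;
}
```

In this model, a call with niters from 0 to 15 copies nothing when the epilogue is missing, while niters == 16 is handled entirely by the vector loop. That is the case split the quoted example relies on.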