I noticed that the epilogue vectorization code doesn't compute the correct number of iterations for the epilogue when peeling for gaps is in effect. This prevents epilogue vectorization in some cases and, given that the code also sets nb_iterations_upper_bound, can cause wrong code (I think we probably want to remove that code since it should be redundant).

Bootstrapped and tested on x86_64-unknown-linux-gnu. I also ran SPEC 2k6 with epilogue vectorization enabled on a core-avx2 machine successfully. Applied to trunk.

Richard.

2018-12-03  Richard Biener  <rguent...@suse.de>

        * tree-vect-loop.c (vect_transform_loop): Properly compute
        upper bound for the epilogue when doing epilogue vectorization.

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c        (revision 266665)
+++ gcc/tree-vect-loop.c        (working copy)
@@ -8548,9 +8548,12 @@ vect_transform_loop (loop_vec_info loop_
        {
          unsigned int eiters
            = (LOOP_VINFO_INT_NITERS (loop_vinfo)
-              - LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo));
-         eiters = eiters % lowest_vf;
+              - LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
+              - LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo));
+         eiters
+           = eiters % lowest_vf + LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);
          epilogue->nb_iterations_upper_bound = eiters - 1;
+         epilogue->any_upper_bound = true;

          unsigned int ratio;
          while (next_size < vector_sizes.length ()