http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49518
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-06-24 09:21:53 UTC ---

nelements is 16 here, while vf is just 2 (as the loop also operates on ints). mis is 2 (one iteration has already been peeled before vectorization). So npeel_tmp is 14, and as the cost model is used, the loop with the assertion that npeel_tmp <= vf iterates at least once. Richard, can you please look at this? Perhaps we want to AND with the smaller of vf and nelements, i.e. & (nelements - 1) & (vf - 1)?
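
The arithmetic above can be checked with a small standalone sketch. It is not GCC code: the helper names `npeel_current` and `npeel_proposed` are hypothetical, and it assumes that npeel_tmp is computed as `(nelements - mis) & (nelements - 1)`, which is consistent with the values quoted in the comment (16, 2 and 14). The second helper is one reading of the suggested change, masking additionally with `vf - 1`.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; these are not GCC functions.
   Variable names follow the ones used in the comment. */

/* Nonzero if x is a positive power of two. */
static int is_pow2(int x)
{
    return x > 0 && (x & (x - 1)) == 0;
}

/* Assumed current computation: elements to peel, masked to the
   number of elements in the vector. */
int npeel_current(int nelements, int mis)
{
    assert(is_pow2(nelements));
    return (nelements - mis) & (nelements - 1);
}

/* One reading of the proposal: also mask with vf - 1, so the result
   is bounded by the smaller of vf and nelements. */
int npeel_proposed(int nelements, int mis, int vf)
{
    assert(is_pow2(nelements) && is_pow2(vf));
    return (nelements - mis) & (nelements - 1) & (vf - 1);
}
```

With nelements = 16, mis = 2 and vf = 2, `npeel_current` gives 14, which violates npeel_tmp <= vf, while `npeel_proposed` gives 14 & 15 & 1 = 0, which satisfies it. Whether 0 is the right peel count for this loop is a separate question that the sketch does not answer.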