http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55600
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> 2012-12-05 10:33:06 UTC ---
GCC fully unrolls the vectorized loop.  ICC does not.  The loop rolls 16 times:

  <bb 3>:
  # vect_p.5_30 = PHI <vect_p.5_45(4), vect_p.8_31(2)>
  # vect_su.12_52 = PHI <vect_su.12_53(4), { 0, 0, 0, 0 }(2)>
  # ivtmp_61 = PHI <ivtmp_62(4), 0(2)>
  vect_var_.9_46 = MEM[(int *)vect_p.5_30];
  vect_p.5_47 = vect_p.5_30 + 16;
  vect_var_.10_48 = MEM[(int *)vect_p.5_47];
  vect_perm_even_49 = VEC_PERM_EXPR <vect_var_.9_46, vect_var_.10_48, { 0, 2, 4, 6 }>;
  vect_perm_odd_50 = VEC_PERM_EXPR <vect_var_.9_46, vect_var_.10_48, { 1, 3, 5, 7 }>;
  vect_var_.11_51 = vect_perm_even_49 * vect_perm_odd_50;
  vect_su.12_53 = vect_var_.11_51 + vect_su.12_52;
  vect_p.5_45 = vect_p.5_47 + 16;
  ivtmp_62 = ivtmp_61 + 1;
  if (ivtmp_62 < 16)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  goto <bb 3>;

but at -O3 we don't care too much about code size in this case.  So I'm
not sure you can call this a "bug".  Does it run slower?
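For readers who do not read GIMPLE dumps daily, here is a rough C model of what the loop above computes. It is only a sketch reconstructed from the dump, not the reporter's testcase. The scalar form (a sum of products of adjacent even/odd elements, 64 pairs over 128 ints) is inferred from the two 16-byte loads, the { 0, 2, 4, 6 } / { 1, 3, 5, 7 } permutes and the trip count of 16. The function names and the final lane reduction are assumptions, since the epilogue is not part of the dump.

```c
#define VEC_ITERS 16                  /* from "if (ivtmp_62 < 16)" in the dump */
#define LANES     4                   /* one 16-byte vector holds 4 ints */
#define N_PAIRS   (VEC_ITERS * LANES) /* assumed scalar trip count: 64 */
#define N_ELEMS   (2 * N_PAIRS)       /* 128 ints are read in total */

/* Assumed scalar source loop: multiply each even element by its odd
   neighbour and accumulate. */
int sum_pairs_scalar(const int *a)
{
    int su = 0;
    for (int i = 0; i < N_PAIRS; i++)
        su += a[2 * i] * a[2 * i + 1];
    return su;
}

/* Scalar emulation of the vectorized loop in <bb 3>. */
int sum_pairs_vector_model(const int *a)
{
    int vect_su[LANES] = { 0, 0, 0, 0 };  /* PHI initial value { 0, 0, 0, 0 } */
    const int *p = a;                     /* vect_p.5 */

    for (int ivtmp = 0; ivtmp < VEC_ITERS; ivtmp++) {
        /* The two loads cover p[0..3] and p[4..7]; the permutes pick
           elements 0,2,4,6 and 1,3,5,7 of that 8-element concatenation. */
        for (int lane = 0; lane < LANES; lane++) {
            int even = p[2 * lane];       /* vect_perm_even */
            int odd  = p[2 * lane + 1];   /* vect_perm_odd */
            vect_su[lane] += even * odd;  /* vect_su.12_53 */
        }
        p += 2 * LANES;                   /* two "+ 16" byte steps per iteration */
    }

    /* Assumed epilogue (not shown in the dump): horizontal add of the lanes. */
    int su = 0;
    for (int lane = 0; lane < LANES; lane++)
        su += vect_su[lane];
    return su;
}

/* Hypothetical helper: checks both forms agree on small input values
   (kept small so no signed overflow can occur). */
int models_agree(void)
{
    int a[N_ELEMS];
    for (int i = 0; i < N_ELEMS; i++)
        a[i] = i % 7 - 3;
    return sum_pairs_scalar(a) == sum_pairs_vector_model(a);
}
```

Full unrolling here means GCC emits the body of the outer loop in sum_pairs_vector_model 16 times back to back with no branch, which is where the code size difference against ICC would come from. Calling models_agree() from a small driver is enough to check that the model matches the assumed scalar loop.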