https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119876
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unknown                     |16.0
             Target|                            |x86_64-*-*

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
We're actually vectorizing this as

  vect_pretmp_6.14_58 = MEM <vector(16) int> [(int *)vectp_c.12_56];
  vect__28.15_59 = VIEW_CONVERT_EXPR<vector(16) unsigned int>(vect_pretmp_6.14_58);
  vect__31.18_62 = vect__28.15_59 + { 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 };
  vect_iftmp.19_63 = VIEW_CONVERT_EXPR<vector(16) int>(vect__31.18_62);
  vect__29.16_60 = vect__28.15_59 + { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };
  vect_iftmp.17_61 = VIEW_CONVERT_EXPR<vector(16) int>(vect__29.16_60);
  vect_iftmp.20_64 = VEC_COND_EXPR <mask__4.11_55, vect_iftmp.17_61, vect_iftmp.19_63>;
  MEM <vector(16) int> [(int *)vectp_a.21_65] = vect_iftmp.20_64;

where clang seems to vectorize

  a[i] = c[i] + ((b[i] > 0) ? 1 : 2);

Using a merge-masking add is of course expensive.
Folding turns the above to

  vect_pretmp_6.14_58 = MEM <vector(16) int> [(int *)&c + ivtmp.44_72 * 1];
  vect__28.15_59 = VIEW_CONVERT_EXPR<vector(16) unsigned int>(vect_pretmp_6.14_58);
  vect__31.18_62 = vect__28.15_59 + { 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 };
  _52 = .COND_ADD (mask__4.11_55, vect__28.15_59, { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }, vect__31.18_62);
  vect_iftmp.20_64 = VIEW_CONVERT_EXPR<vector(16) int>(_52);
  MEM <vector(16) int> [(int *)&a + ivtmp.44_72 * 1] = vect_iftmp.20_64;

I think if-conversion (not phiopt) should have linearized

  pretmp_6 = c[i_14];
  if (_1 > 0)
    goto <bb 4>; [59.00%]
  else
    goto <bb 5>; [41.00%]

  <bb 4> [local count: 627172604]:
  iftmp.0_9 = pretmp_6 + 1;
  goto <bb 6>; [100.00%]

  <bb 5> [local count: 435831803]:
  iftmp.0_8 = pretmp_6 + 2;

  <bb 6> [local count: 1063004410]:
  # iftmp.0_5 = PHI <iftmp.0_9(4), iftmp.0_8(5)>

as an add of a conditional 1 or 2 instead (possibly using folding and match
during building of the COND_EXPR).  Note it requires PRE code hoisting to
hoist the load out of the conditional if/else.  One could argue we miss a
phiopt after the PRE/SINK/DSE/DCE pass group before loop opts or that ifcvt
should also run (parts of) PHI-OPT first.
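
For illustration, the two shapes discussed above can be sketched at the source
level.  The original test case is not quoted in this comment, so the function
names, signatures and the use of restrict below are assumptions made for the
sketch.  add_branchy mirrors the if/else diamond where the add appears on both
arms; add_linearized is the "add of a conditional 1 or 2" form.

```c
#include <stddef.h>

/* Hypothetical reconstruction: the add is duplicated on both arms of the
   branch, matching the <bb 4>/<bb 5> diamond in the GIMPLE dump.  */
void add_branchy(int *restrict a, const int *restrict b,
                 const int *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (b[i] > 0)
            a[i] = c[i] + 1;
        else
            a[i] = c[i] + 2;
    }
}

/* Linearized form: select the constant first, then do a single
   unconditional add.  Only the constant depends on the mask.  */
void add_linearized(int *restrict a, const int *restrict b,
                    const int *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = c[i] + (b[i] > 0 ? 1 : 2);
}
```

Both functions compute the same result for inputs where the add does not
overflow.  Which vector code a given compiler emits for each form would have
to be checked in its own dumps.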