https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119876

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unknown                     |16.0
             Target|                            |x86_64-*-*

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
We're actually vectorizing this as

  vect_pretmp_6.14_58 = MEM <vector(16) int> [(int *)vectp_c.12_56];
  vect__28.15_59 = VIEW_CONVERT_EXPR<vector(16) unsigned int>(vect_pretmp_6.14_58);
  vect__31.18_62 = vect__28.15_59 + { 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 };
  vect_iftmp.19_63 = VIEW_CONVERT_EXPR<vector(16) int>(vect__31.18_62);
  vect__29.16_60 = vect__28.15_59 + { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };
  vect_iftmp.17_61 = VIEW_CONVERT_EXPR<vector(16) int>(vect__29.16_60);
  vect_iftmp.20_64 = VEC_COND_EXPR <mask__4.11_55, vect_iftmp.17_61, vect_iftmp.19_63>;
  MEM <vector(16) int> [(int *)vectp_a.21_65] = vect_iftmp.20_64;

where clang seems to vectorize

  a[i] = c[i] + ((b[i] > 0) ? 1 : 2);

Using a merge-masking add is of course expensive.  Folding turns the GCC
vectorized code above into

  vect_pretmp_6.14_58 = MEM <vector(16) int> [(int *)&c + ivtmp.44_72 * 1];
  vect__28.15_59 = VIEW_CONVERT_EXPR<vector(16) unsigned int>(vect_pretmp_6.14_58);
  vect__31.18_62 = vect__28.15_59 + { 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 };
  _52 = .COND_ADD (mask__4.11_55, vect__28.15_59, { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }, vect__31.18_62);
  vect_iftmp.20_64 = VIEW_CONVERT_EXPR<vector(16) int>(_52);
  MEM <vector(16) int> [(int *)&a + ivtmp.44_72 * 1] = vect_iftmp.20_64;

I think if-conversion (not phiopt) should have linearized

  pretmp_6 = c[i_14];
  if (_1 > 0)
    goto <bb 4>; [59.00%]
  else
    goto <bb 5>; [41.00%]

  <bb 4> [local count: 627172604]:
  iftmp.0_9 = pretmp_6 + 1;
  goto <bb 6>; [100.00%]

  <bb 5> [local count: 435831803]:
  iftmp.0_8 = pretmp_6 + 2;

  <bb 6> [local count: 1063004410]:
  # iftmp.0_5 = PHI <iftmp.0_9(4), iftmp.0_8(5)>

as an add of a conditional 1 or 2 instead (possibly using folding and match
during building of the COND_EXPR).  Note that this requires PRE code hoisting
to hoist the load out of the if/else arms.  One could argue that we are
missing a phiopt run after the PRE/SINK/DSE/DCE pass group before the loop
opts, or that ifcvt should also run (parts of) PHI-OPT first.
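
For illustration, the two shapes discussed above can be written at the
source level roughly as follows.  This is only a sketch: the original
testcase is not quoted in this comment, so the function names, the
signatures and the use of plain int arrays are assumptions inferred from
the GIMPLE dumps.

```c
#include <stddef.h>

/* Hypothetical reconstruction of the shape GCC currently vectorizes:
   both sums are formed and one of them is selected per element.  */
void select_of_sums (int *a, const int *b, const int *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = (b[i] > 0) ? c[i] + 1 : c[i] + 2;
}

/* Hypothetical linearized shape: select the constant first, then do a
   single unconditional add.  */
void sum_of_select (int *a, const int *b, const int *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = c[i] + ((b[i] > 0) ? 1 : 2);
}
```

Both functions store the same values as long as the additions do not
overflow; the second form needs only one vector add plus a select between
two constant vectors.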
