https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70359

--- Comment #32 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
As mentioned in the previous comment, the proposed patch brings the byte count
down from 116 to 108 on ARM, but that is still shy of the desired 96.

The remaining difference can be attributed to forwprop folding this (IL
expanded for illustration):

  if (ui_7 / 10 != 0)

into:

  if (ui_7 > 9)

More specifically, changing this:

  # ui_7 = PHI <ui_13(2), ui_21(3)>
  ...
  ui_21 = ui_7 / 10;
  if (ui_21 != 0)

into:

  # ui_7 = PHI <ui_13(2), ui_21(3)>
  ...
  ui_21 = ui_7 / 10;
  if (ui_7 > 9)
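
For illustration only, a source loop of roughly the following shape would give
IL like the above.  This is a hypothetical sketch, not the testcase from the
PR; the function names are made up.  The first function tests the quotient
against zero, the second mirrors the folded form and tests the pre-division
value against 9.

```c
/* Hypothetical digit-counting loops; the names are illustrative and are
   not taken from the PR's testcase.  */

/* Unfolded form: the loop exit test uses the quotient.  */
unsigned
digits_quotient_test (unsigned ui)
{
  unsigned n = 0;
  do
    {
      n++;
      ui = ui / 10;      /* ui_21 = ui_7 / 10;  */
    }
  while (ui != 0);       /* if (ui_21 != 0)  */
  return n;
}

/* Folded form: the loop exit test uses the value before the division.  */
unsigned
digits_range_test (unsigned ui)
{
  unsigned n = 0;
  unsigned prev;
  do
    {
      n++;
      prev = ui;         /* ui_7  */
      ui = prev / 10;    /* ui_21 = ui_7 / 10;  */
    }
  while (prev > 9);      /* if (ui_7 > 9)  */
  return n;
}
```

Both functions return the same result for every unsigned input.  Note that in
the folded form the exit test reads the value from before the division, while
in the unfolded form it only needs the quotient; whether that accounts for
the size difference is not something this sketch shows.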

Inhibiting this optimization brings the byte count down to 92, which is even
lower than our 96 bogeyman, so it is perhaps worth pursuing.  (This assumes my
proposed patch is also applied.)  I'm no expert, but isn't an EQ/NE comparison
against 0 preferable to a </> comparison against a non-zero constant?

If so, should we restrict the folding somewhat, or clean this up after the
fact?

For reference, the folding (in forwprop) is due to this match.pd pattern:

/* X / C1 op C2 into a simple range test.  */

...though eliminating it causes another pattern to pick up the slack and do the
same:

/* Transform:
 * (X / Y) == 0 -> X < Y if X, Y are unsigned.
 * (X / Y) != 0 -> X >= Y, if X, Y are unsigned.
 */
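
As a sanity check on the arithmetic behind the two patterns, here is a small
C sketch that brute-forces the identities for unsigned operands.  It only
checks the underlying equivalences; it does not reproduce the conditions under
which match.pd actually applies them.  The function name, the bound, and the
constants 10 and 3 are arbitrary choices for illustration.

```c
/* Returns the number of mismatches found; 0 means every identity held
   for all values below LIMIT.  Hypothetical helper, not GCC code.  */
unsigned
check_div_identities (unsigned limit)
{
  unsigned bad = 0;
  for (unsigned x = 0; x < limit; x++)
    {
      /* Start at 1: division by zero is undefined.  */
      for (unsigned y = 1; y < limit; y++)
        {
          /* (X / Y) == 0 -> X < Y  */
          if (((x / y) == 0) != (x < y))
            bad++;
          /* (X / Y) != 0 -> X >= Y  */
          if (((x / y) != 0) != (x >= y))
            bad++;
        }
      /* X / C1 op C2 as a range test, here with C1 = 10, C2 = 3.  */
      if (((x / 10) == 3) != (x >= 30 && x <= 39))
        bad++;
      /* The case from this PR: X / 10 != 0 is the same as X > 9.  */
      if (((x / 10) != 0) != (x > 9))
        bad++;
    }
  return bad;
}
```

Calling check_div_identities (200) from a test harness should return 0.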

Eliminating both patterns "fixes" the problem.

Suggestions welcome :).
