https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102705

--- Comment #1 from Andrew Macleod <amacleod at redhat dot com> ---
The impact of that patch on this PR is that it teaches range extraction that
[0,1] % [5,5] is [0,1], whereas before it came up with [0,4].
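
As a sanity check on the range arithmetic, here is a minimal brute-force sketch.
The helper names are hypothetical and this is not how GCC's range code computes
the result; it only assumes non-negative operands and a positive divisor. For
x in [0,1] the result of x % 5 stays in [0,1], while [0,4] is the bound one
gets from the divisor alone.

```c
#include <assert.h>

/* Hypothetical helpers, not GCC code.  Both assume 0 <= lo <= hi and d > 0,
   and simply enumerate every x in [lo, hi].  */

/* Smallest value of x % d over the range.  */
int mod_range_min(int lo, int hi, int d)
{
    int min = lo % d;
    for (int x = lo; x <= hi; x++)
        if (x % d < min)
            min = x % d;
    return min;
}

/* Largest value of x % d over the range.  */
int mod_range_max(int lo, int hi, int d)
{
    int max = lo % d;
    for (int x = lo; x <= hi; x++)
        if (x % d > max)
            max = x % d;
    return max;
}

/* The case from this PR: [0,1] % [5,5] is [0,1].  */
void check_pr_case(void)
{
    assert(mod_range_min(0, 1, 5) == 0);
    assert(mod_range_max(0, 1, 5) == 1);
}
```

Calling check_pr_case() passes silently; a wider dividend range such as [0,9]
is what it takes to reach the full [0,4].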

This in turn causes the thread1 pass to decide to thread something it didn't
thread before.

In the absence of thread1 making this decision (ie, before the patch), the next
pass is VRP/VRpthread which ends up performing the thread anyway, but via
different means, and with ever so slightly different IL.

By the time we hit DCE3, the differences are very slight:
(We know b.1_1 has a range of [0,1])
Original code where we get the optimization:

  _2 = 1 >> b.1_1;
  iftmp.0_10 = (char) _2;
  _3 = (int) iftmp.0_10;
  b = _3;
  _4 = iftmp.0_10 ^ 1;
  _5 = (int) _4;
  iftmp.6_22 = (short int) _5;
  _6 = (short int) iftmp.0_10;
  if (_6 == iftmp.6_22)
    goto <bb 4>; [49.37%]
  else
    goto <bb 5>; [50.63%]


The next pass is forwprop3, and it reports:
gimple_simplified to iftmp.6_22 = (short int) _4;
gimple_simplified to if (0 != 0)

I think it can see that, with iftmp.0_10 having range [0, 1], _4, _5 and
iftmp.6_22 are therefore basically ~iftmp.0_10.
Thus it can fold the condition as never being true.
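
To illustrate why the condition can never hold, here is a minimal C sketch that
mirrors the statements of the dump one for one. The function names are
hypothetical and the C variables only stand in for the SSA names; it assumes,
as stated above, that iftmp.0_10 is 0 or 1.

```c
#include <assert.h>

/* Mirrors the original GIMPLE, one statement per line.  The parameter
   stands in for iftmp.0_10 and is assumed to be 0 or 1.  */
int original_condition(char iftmp_0_10)
{
    char  _4 = (char) (iftmp_0_10 ^ 1);   /* flips 0 <-> 1 */
    int   _5 = (int) _4;
    short iftmp_6_22 = (short) _5;
    short _6 = (short) iftmp_0_10;
    return _6 == iftmp_6_22;
}

/* Exhaustive check over the known range [0, 1].  */
void check_original_condition(void)
{
    for (char v = 0; v <= 1; v++)
        assert(!original_condition(v));
}
```

Both possible inputs give a false comparison, which matches the
"if (0 != 0)" that forwprop3 reports.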

And it turns this into:

  _2 = 1 >> b.1_1;
  iftmp.0_10 = (char) _2;
  _3 = (int) iftmp.0_10;
  b = _3;
  _4 = iftmp.0_10 ^ 1;
  _5 = (int) _4;
  iftmp.6_22 = (short int) _4;
  _6 = (short int) iftmp.0_10;


Meanwhile, trunk threads earlier, and produces slightly different code.  At
DSE3 time it looks like:

  _2 = 1 >> b.1_1;
  iftmp.0_10 = (char) _2;
  b = _2; 
  _4 = iftmp.0_10 ^ 1;
  _5 = (int) _4;
  iftmp.6_22 = (short int) _5;
  _6 = (short int) _2;
  if (_6 == iftmp.6_22)
    goto <bb 5>; [50.37%]
  else
    goto <bb 6>; [49.63%]

So we have already skipped a few casts and use _2 more directly.

The problem is that the next pass, forwprop3, does not like this new code and
does not perform the same fold, leaving the condition.  And thus we never lose
the call to foo().

Again, _4, _5 and iftmp.6_22 would all be known to be ~iftmp.0_10, but it looks
like forwprop no longer recognizes that _6 = (short int) _2 makes it
equivalent to iftmp.0_10?
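
For comparison, a minimal C sketch of the trunk form, again with hypothetical
names standing in for the SSA names and assuming b.1_1 is in [0, 1]. It only
shows that the values still agree; it says nothing about what forwprop
actually matches on.

```c
#include <assert.h>

/* Mirrors the trunk GIMPLE, where _6 is computed from _2 directly.
   The parameter stands in for b.1_1 and is assumed to be 0 or 1.  */
int trunk_condition(int b_1_1)
{
    int   _2 = 1 >> b_1_1;
    char  iftmp_0_10 = (char) _2;
    char  _4 = (char) (iftmp_0_10 ^ 1);
    int   _5 = (int) _4;
    short iftmp_6_22 = (short) _5;
    short _6 = (short) _2;

    /* With _2 in [0, 1], narrowing through char loses nothing, so _6
       has the same value as (short int) iftmp.0_10.  */
    assert(_6 == (short) iftmp_0_10);

    return _6 == iftmp_6_22;
}
```

For both b.1_1 == 0 and b.1_1 == 1 the function returns 0, so the condition is
just as foldable in value terms as in the original IL.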
