https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80791

--- Comment #6 from amker at gcc dot gnu.org ---
So before the change, the ivopts dump looks like:
  <bb 2> [15.00%]:
  _15 = (unsigned int) m_8(D);
  ivtmp.10_16 = _15 * 8;

  <bb 3> [100.00%]:
  # i_5 = PHI <m_8(D)(2), i_12(4)>
  # sh_6 = PHI <256(2), sh_10(4)>
  # ivtmp.10_18 = PHI <ivtmp.10_16(2), ivtmp.10_17(4)>
  _2 = ivtmp.10_18;
  _3 = (sizetype) _2;
  sh_10 = sh_6 >> 1;
  _4 = &a + _3;
  a[sh_10] = _4;
  i_12 = i_5 + 4;
  ivtmp.10_17 = ivtmp.10_18 + 32;
  if (i_12 <= 1073741839)
    goto <bb 4>; [85.00%]
  else
    goto <bb 5>; [15.00%]

  <bb 4> [85.00%]:
  goto <bb 3>; [100.00%]


After the change, it becomes:
  <bb 2> [15.00%]:

  <bb 3> [100.00%]:
  # i_5 = PHI <m_8(D)(2), i_12(4)>
  # sh_6 = PHI <256(2), sh_10(4)>
  _18 = (unsigned int) i_5;
  _17 = _18 * 8;
  _2 = _17;
  _3 = (sizetype) _2;
  sh_10 = sh_6 >> 1;
  _4 = &a + _3;
  a[sh_10] = _4;
  i_12 = i_5 + 4;
  if (i_12 <= 1073741839)
    goto <bb 4>; [85.00%]
  else
    goto <bb 5>; [15.00%]

  <bb 4> [85.00%]:
  goto <bb 3>; [100.00%]

So it chooses 1 candidate instead of 2.  So far I think that's good?  Though we
now need one instruction to compute _17 in each iteration, ivopts turns an IV
into a temp var.  Thus register pressure is lower and there is less
initialization code.
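
To make the two shapes concrete, here is a rough C model of the offset
computation in both dumps.  It is only a sketch: the function names are made up
for illustration, and it models just the 32-bit unsigned offset (ivtmp.10 before
the change, _17 after), not the real loop.

```c
#include <assert.h>
#include <stdint.h>

/* "Before" shape: a dedicated 32-bit unsigned IV holds the byte offset.
   It is initialised once outside the loop and bumped by 32 per iteration. */
static uint32_t offset_separate_iv(int32_t m, uint32_t k)
{
    uint32_t ivtmp = (uint32_t)m * 8u;      /* _15 * 8 in <bb 2> */
    for (uint32_t n = 0; n < k; n++)
        ivtmp += 32u;                       /* ivtmp + 32 in <bb 3> */
    return ivtmp;
}

/* "After" shape: the offset is recomputed from i in every iteration. */
static uint32_t offset_derived(int32_t m, uint32_t k)
{
    uint32_t i = (uint32_t)m + 4u * k;      /* i_5 after k steps, as unsigned */
    return i * 8u;                          /* (unsigned int) i_5 * 8 */
}

/* Hypothetical helper: asserts that both shapes give the same offset
   for the first `iters` iterations starting from m. */
static void check_offsets_agree(int32_t m, uint32_t iters)
{
    for (uint32_t k = 0; k < iters; k++)
        assert(offset_separate_iv(m, k) == offset_derived(m, k));
}
```

Calling check_offsets_agree (m, n) for a few starting values should pass, since
both forms compute the same value modulo 2^32.  The difference is only where
the work happens: one extra IV plus its setup in <bb 2>, versus one multiply
(or shift) per iteration.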

With the new code, modulo-sched.c fails at:
          if (!schedule_reg_moves (ps))
            {
              mii = ps->ii + 1;
              free_partial_schedule (ps);
              continue;
            }
I don't know anything about modulo scheduling, so I'm not sure whether this is
mod-sched's problem or its input is wrong.  Note doloop could also change the
RTL before mod-sched.

With mod-sched disabled, the generated assembly before the change is:
.L2:
        sradi 10,10,1
        add 6,8,9
        sldi 7,10,3
        addi 9,9,32
        rldicl 9,9,0,32
        stdx 6,8,7
        bdnz .L2

And after the change it becomes:
.L2:
        rldic 9,3,3,32
        sradi 10,10,1
        sldi 7,10,3
        add 9,8,9
        addi 3,3,4
        stdx 9,8,7
        extsw 3,3
        bdnz .L2

It looks like the new ivopts does result in one more instruction (extsw?).  I'm
not sure whether the extension is necessary here.  If it is necessary and
required by the ivopts decision, then it should be counted in the cost (it
isn't for now).  Again, it could be a problem in the RTL passes...
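
For reference, a small C model of the three extension-related instructions in
the loops above.  This is a sketch based on my reading of the instruction
semantics; the helper names are hypothetical, and it says nothing about whether
i is needed in extended form elsewhere (e.g. after the loop).

```c
#include <assert.h>
#include <stdint.h>

/* Model of "rldicl rX,rX,0,32": clear the high 32 bits (zero-extension). */
static uint64_t zero_extend32(uint64_t reg)
{
    return reg & 0xffffffffull;
}

/* Model of "extsw": sign-extend the low 32 bits to 64 bits. */
static uint64_t sign_extend32(uint64_t reg)
{
    uint64_t low = reg & 0xffffffffull;
    return (low ^ 0x80000000ull) - 0x80000000ull;
}

/* Model of "rldic rA,rS,3,32": rotate left by 3, then keep only
   bits 3..31 (counting from the least significant bit). */
static uint64_t rldic_3_32(uint64_t reg)
{
    uint64_t rotated = (reg << 3) | (reg >> 61);
    return rotated & 0xfffffff8ull;
}

/* Hypothetical check: the offset produced by rldic is the same whether
   or not the source register was extended first. */
static void check_offset_ignores_high_bits(uint64_t reg)
{
    assert(rldic_3_32(reg) == rldic_3_32(sign_extend32(reg)));
    assert(rldic_3_32(reg) == rldic_3_32(zero_extend32(reg)));
}
```

Under this model the rldic result only depends on the low 29 bits of r3, so
the offset itself does not need the extsw.  Whether some other use of i
requires it can't be told from this snippet alone.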
