Hello all,

I've been looking at a code generation issue in GCC 5.2 at -O3 -funroll-loops, 
where a register-to-register move is done through memory. For reference, the C 
code is at the end of this mail. The generated code for MIPS is (cut down for 
clarity; ldc1 and sdc1 are the double-word floating-point load and store):

        div.d   $f8,$f6,$f4
        mul.d   $f2,$f8,$f8
        sdc1    $f2,8($7)
$L38:
        ldc1    $f0,8($7)               <- load instead of move
        li      $11,1                   # 0x1
        
        <snip>
$L49:
        ....
        div.d   $f8,$f6,$f4
        addiu   $11,$10,3
        mul.d   $f2,$f8,$f8
        sdc1    $f2,8($7)
        ldc1    $f0,8($7)               <- load instead of move
$L48:
        mul.d   $f2,$f2,$f0

        <snip>


$L45:
        mul.d   $f2,$f4,$f4
        mov.d   $f8,$f4
        j       $L38
        sdc1    $f2,8($7)

For basic block $L38, every block shown that reaches it stores $f2 to 8($7), 
and $L38 then loads that same value back from 8($7) into another 
floating-point register ($f0) instead of using a register move.

Disabling predictive commoning generates:

        div.d   $f4,$f18,$f2
        mul.d   $f0,$f4,$f4
$L37:
        mul.d   $f6,$f0,$f0
        li      $10,1                   # 0x1
        mul.d   $f8,$f0,$f6
        mul.d   $f10,$f0,$f8
        mul.d   $f12,$f0,$f10
        mul.d   $f14,$f0,$f12
        mul.d   $f16,$f0,$f14
        beq     $4,$10,$L38
        mul.d   $f20,$f0,$f16

This is the code for the same basic block.

Following Jeff's advice[1] to extract more information from GCC, I've narrowed 
the cause down to the predictive commoning pass inserting the load in a 
loop-header-style basic block. However, the next pass in GCC, tree-cunroll, 
promptly removes the loop and joins the loop header to the body of the 
(non-)loop. More oddly, disabling any one of the conditional store elimination 
pass, the dominator optimizations pass, or jump threading (with --param 
max-jump-thread-duplication-stmts=0) also produces the assembly above, without 
the load. Any ideas on an approach for this issue?

[1] https://gcc.gnu.org/ml/gcc-help/2015-08/msg00162.html

Thanks,
Simon

double N;
int i1;
double T;
double poly[9];


void 
g (int iterations)
{ 
  int count = 0;
  for (count = 0; count < iterations; count++)
    {
      if (N > 1)
        {
          T = 1 / N;
        }
      else
        {
          T = N;
        }

      poly[1] = T * T;
      for (i1 = 2; i1 <= 8; i1++)
        {
          poly[i1] = poly[i1 - 1] * poly[1];
        }
    }

  return;
}
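
To make the expected result concrete, here is a hand-written sketch of the 
inner loop with the recurrence kept in locals, which is roughly what the 
assembly without the load corresponds to. It is only an illustration, not 
output from GCC: the function names fill_reference and fill_scalarized are 
made up for this mail, and the loop counter is a local rather than the 
global i1 used in the test case.

```c
/* Inner loop as written in g(): each iteration reads poly[i - 1] and
   poly[1] back from the array.  (Hypothetical helper, not from GCC.)  */
void
fill_reference (double *poly, double t)
{
  int i;

  poly[1] = t * t;
  for (i = 2; i <= 8; i++)
    {
      poly[i] = poly[i - 1] * poly[1];
    }
}

/* Hand-scalarized version: poly[1] and the previous element are carried
   in locals, so the loop only stores to the array and never reloads a
   value it has just written.  (Hypothetical helper, not from GCC.)  */
void
fill_scalarized (double *poly, double t)
{
  int i;
  double p1 = t * t;
  double prev = p1;

  poly[1] = p1;
  for (i = 2; i <= 8; i++)
    {
      prev = prev * p1;
      poly[i] = prev;
    }
}
```

Both functions perform the same multiplications in the same order, so they 
should fill poly[1..8] with identical values; the difference is only whether 
the previous element is reloaded from memory.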
