Hello all,

I've been looking at a code generation issue in GCC 5.2 lately, involving register-to-register moves that go through memory at -O3 -funroll-loops. For reference, the C code is at the end of this mail. The generated code for MIPS is (cut down for clarity; ldc1 and sdc1 are the doubleword floating-point load and store, respectively):

        div.d   $f8,$f6,$f4
        mul.d   $f2,$f8,$f8
        sdc1    $f2,8($7)
$L38:
        ldc1    $f0,8($7)       <- load instead of move
        li      $11,1           # 0x1
        <snip>
$L49:
        ....
        div.d   $f8,$f6,$f4
        addiu   $11,$10,3
        mul.d   $f2,$f8,$f8
        sdc1    $f2,8($7)
        ldc1    $f0,8($7)       <- load instead of move
$L48:
        mul.d   $f2,$f2,$f0
        <snip>
$L45:
        mul.d   $f2,$f4,$f4
        mov.d   $f8,$f4
        j       $L38
        sdc1    $f2,8($7)

For the basic block L38, all dominating blocks store to 8($7), which is then loaded back into another floating-point register. Disabling predictive commoning generates:

        div.d   $f4,$f18,$f2
        mul.d   $f0,$f4,$f4
$L37:
        mul.d   $f6,$f0,$f0
        li      $10,1           # 0x1
        mul.d   $f8,$f0,$f6
        mul.d   $f10,$f0,$f8
        mul.d   $f12,$f0,$f10
        mul.d   $f14,$f0,$f12
        mul.d   $f16,$f0,$f14
        beq     $4,$10,$L38
        mul.d   $f20,$f0,$f16

for the same basic block.

Following Jeff's advice[1] to extract more information from GCC, I've narrowed the cause down to the predictive commoning pass inserting the load in a loop-header-style basic block. However, the next pass in GCC, tree-cunroll, promptly removes the loop and joins the loop header to the body of the (non)loop. More oddly, disabling the conditional store elimination pass or the dominator optimizations pass, or disabling jump threading with --param max-jump-thread-duplication-stmts=0, nets the above assembly code.

Any ideas on an approach for this issue?

[1] https://gcc.gnu.org/ml/gcc-help/2015-08/msg00162.html

Thanks,
Simon

double N;
int i1;
double T;
double poly[9];

void
g (int iterations)
{
  int count = 0;

  for (count = 0; count < iterations; count++)
    {
      if (N > 1)
        {
          T = 1 / N;
        }
      else
        {
          T = N;
        }

      poly[1] = T * T;
      for (i1 = 2; i1 <= 8; i1++)
        {
          poly[i1] = poly[i1 - 1] * poly[1];
        }
    }
  return;
}
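
For illustration only, here is a minimal source-level sketch of the value reuse discussed above: the values of poly[1] and poly[i1 - 1] are carried in local scalars, so nothing stored to poly[] has to be read back. The function name g_scalar and the locals base and prev are hypothetical and do not come from GCC or the original test case. This is a hand-written equivalent of g, not a description of what the predictive commoning pass emits, and whether it changes the MIPS output of GCC 5.2 has not been checked.

```c
double N;
int i1;
double T;
double poly[9];

/* Hypothetical hand-written variant of g: same stores to T, poly[] and i1,
   but the multiplication chain reads only local scalars.  */
void
g_scalar (int iterations)
{
  int count;

  for (count = 0; count < iterations; count++)
    {
      double base;  /* value stored to poly[1] */
      double prev;  /* value stored to poly[i1 - 1] */

      if (N > 1)
        T = 1 / N;
      else
        T = N;

      base = T * T;
      poly[1] = base;
      prev = base;

      for (i1 = 2; i1 <= 8; i1++)
        {
          prev = prev * base;  /* same product as poly[i1 - 1] * poly[1] */
          poly[i1] = prev;
        }
    }
}
```

Called with N = 2.0, this should leave the same contents in poly[1] through poly[8] as g does, since it performs the same multiplications in the same order.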