https://gcc.gnu.org/bugzilla/show_bug.cgi?id=17108

Segher Boessenkool <segher at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org

--- Comment #8 from Segher Boessenkool <segher at gcc dot gnu.org> ---
We currently generate (for -O2 -m64; -O3 unrolls it completely, see comment 7):

        li 9,8
        mtctr 9
        .p2align 4,,15
.L2:
        stfs 1,0(3)
        addi 3,3,4
        bdnz .L2
        blr



and for -m32 we get

        li 9,8
        addi 3,3,-4
        mtctr 9
        .p2align 4,,15
.L2:
        stfsu 1,4(3)
        bdnz .L2
        blr




The difference is partly the selected -mcpu=, but that doesn't explain it
completely.
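
For reference, the two loop shapes in the listings above can be modelled
in C. This is only a sketch: the function names are made up, and the loop
(store one float into 8 consecutive elements) is inferred from the
assembly rather than copied from the testcase in the bug.

```c
#include <stdint.h>

/* Hypothetical names; a model of the two listings, not the PR testcase. */

/* -m64 shape: store at offset 0, then bump the pointer separately
   (stfs 1,0(3) followed by addi 3,3,4). */
void fill_separate_add(float *p, float d)
{
    for (int n = 8; n != 0; n--) {      /* li 9,8; mtctr 9; ... bdnz */
        *p = d;                         /* stfs 1,0(3) */
        p = p + 1;                      /* addi 3,3,4 */
    }
}

/* -m32 shape: bias the address down by one element up front, then update
   the address and store in one step (stfsu 1,4(3)).  The address is kept
   as an integer so that no out-of-bounds pointer is ever formed in C. */
void fill_pre_inc(float *p, float d)
{
    uintptr_t addr = (uintptr_t)p - sizeof(float);   /* addi 3,3,-4 */
    for (int n = 8; n != 0; n--) {
        addr += sizeof(float);                       /* update half of stfsu */
        *(float *)addr = d;                          /* store half of stfsu */
    }
}
```

Both functions write the same 8 elements; the second form removes one
instruction from the loop body at the price of one extra instruction
before the loop.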

The gimple passes (probably ivopts) have decided to do a pre_inc here
(except for -mcpu=power9, where they didn't); all differences are at RTL level.

Even in a case where it works as expected (-O2 -m32 -mcpu=power4), the
auto_inc_dec pass does not help (this is caused by rtx_cost issues):

starting bb 3
   11: [r122:SI]=r127:SF
   11: [r122:SI]=r127:SF
found mem(11) *(r[122]+0)
   10: r122:SI=r122:SI+0x4
   10: r122:SI=r122:SI+0x4
found pre inc(10) r[122]+=4
   11: [r122:SI]=r127:SF
found mem(11) *(r[122]+0)
trying SIMPLE_PRE_INC
cost failure old=16 new=408

(I have a patch for that).



but then combine comes along and does

Trying 10 -> 11:
   10: r122:SI=r122:SI+0x4
   11: [r122:SI]=r127:SF
Successfully matched this instruction:
(parallel [
        (set (mem:SF (plus:SI (reg:SI 122 [ ivtmp.10 ])
                    (const_int 4 [0x4])) [1 MEM[base: _17, offset: 0B]+0 S4 A32])
            (reg/v:SF 127 [ d ]))
        (set (reg:SI 122 [ ivtmp.10 ])
            (plus:SI (reg:SI 122 [ ivtmp.10 ])
                (const_int 4 [0x4])))
    ])
allowing combination of insns 10 and 11
original costs 4 + 4 = 8
replacement cost 4
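
Both dumps above show a cost gate deciding whether a rewrite is kept. A
minimal sketch of that kind of check follows; the helper name is
hypothetical and this is not GCC's code, only the comparison the dump
lines suggest. The two passes appear to compute their costs differently,
so the numbers are only comparable within one pass.

```c
#include <stdbool.h>

/* Hypothetical helper, not a GCC function: keep a rewrite only if it is
   no more expensive than the code it replaces. */
bool rewrite_is_profitable(int old_cost, int new_cost)
{
    return new_cost <= old_cost;
}
```

With the figures from the dumps, rewrite_is_profitable(16, 408) is false
(the auto_inc_dec "cost failure"), while rewrite_is_profitable(4 + 4, 4)
is true (combine's "allowing combination").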



-m64 however says

Trying 10 -> 11:
   10: r122:DI=r122:DI+0x4
   11: [r122:DI]=r127:SF
Failed to match this instruction:
(parallel [
        (set (mem:SF (plus:DI (reg:DI 122 [ ivtmp.11 ])
                    (const_int 4 [0x4])) [1 MEM[base: _17, offset: 0B]+0 S4 A32])
            (reg/v:SF 127 [ d ]))
        (set (reg:DI 122 [ ivtmp.11 ])
            (plus:DI (reg:DI 122 [ ivtmp.11 ])
                (const_int 4 [0x4])))
    ])



Oh dear, we do not have the float load/store-with-update instructions for -m64.
On all modern 64-bit CPUs these are cracked, so they execute the same as the
separate addi and store instructions, but it costs code space.  And if we do
not want them we should make them more expensive, not just pretend the insns
do not exist :-)
