https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932

--- Comment #22 from ktkachov at gcc dot gnu.org ---
So in an attempt to make some progress on this, I've tried Jim's approach of
changing PROMOTE_MODE to not convert the short modes to unsigned.

Building SPEC2006 INT and looking at the generated assembly actually doesn't
look too bad, with the codesize being a tiny bit smaller overall with this
patch.
Many load operations followed by extend are expressed as single ldrsh
instructions.

Then there's the ldrh+smullbb vs ldrsh+mla issue. I'm looking at the wmul-1.c
testcase. The cse1 pass is the one that eliminates the sign-extends from the
arms of the plus-mult. Before cse1 we have:
(insn 41 40 43 3 (set (reg:SI 136 [ _12 ])
        (sign_extend:SI (mem:HI (reg:SI 135))))
(insn 44 43 47 3 (set (reg:SI 138 [ _16 ])
        (sign_extend:SI (mem:HI (reg:SI 144))))

(insn 47 44 49 3 (set (reg/v:SI 141 [ dotp ])
        (plus:SI (mult:SI (sign_extend:SI (subreg/s/u:HI (reg:SI 136 [ _12 ])
0))
                (sign_extend:SI (subreg/s/u:HI (reg:SI 138 [ _16 ]) 0)))
            (reg/v:SI 141 [ dotp ]))))
(insn 49 47 51 3 (set (reg/v:SI 153 [ sqr ])
        (plus:SI (mult:SI (sign_extend:SI (subreg/s/u:HI (reg:SI 136 [ _12 ])
0))
                (sign_extend:SI (subreg/s/u:HI (reg:SI 136 [ _12 ]) 0)))
            (reg/v:SI 153 [ sqr ]))))

and cse1 transforms this into:
(insn 41 40 43 3 (set (reg:SI 136 [ _12 ])
        (sign_extend:SI (mem:HI (reg:SI 135 [ ivtmp.10 ])))))
(insn 44 43 47 3 (set (reg:SI 138 [ _16 ])
        (sign_extend:SI (mem:HI (reg:SI 144 [ ivtmp.13 ])))))
(insn 47 44 49 3 (set (reg/v:SI 141 [ dotp ])
        (plus:SI (mult:SI (reg:SI 136 [ _12 ])
                (reg:SI 138 [ _16 ]))
            (reg/v:SI 141 [ dotp ]))))
(insn 49 47 51 3 (set (reg/v:SI 153 [ sqr ])
        (plus:SI (mult:SI (reg:SI 136 [ _12 ])
                (reg:SI 136 [ _12 ]))
            (reg/v:SI 153 [ sqr ]))))

Notice how the extends on the loads remain but the extends from the plus-mult
have disappeared.
Before Jim's patch the same sequence is:
(insn 41 40 43 3 (set (reg:SI 136 [ _12 ])
        (zero_extend:SI (mem:HI (reg:SI 135 [ ivtmp.10 ])))))
(insn 44 43 47 3 (set (reg:SI 138 [ _16 ])
        (zero_extend:SI (mem:HI (reg:SI 144 [ ivtmp.13 ])))))
(insn 47 44 49 3 (set (reg/v:SI 141 [ dotp ])
        (plus:SI (mult:SI (sign_extend:SI (subreg/s/v:HI (reg:SI 136 [ _12 ])
0))
                (sign_extend:SI (subreg/s/v:HI (reg:SI 138 [ _16 ]) 0)))
            (reg/v:SI 141 [ dotp ]))))
(insn 49 47 51 3 (set (reg/v:SI 153 [ sqr ])
        (plus:SI (mult:SI (sign_extend:SI (subreg/s/v:HI (reg:SI 136 [ _12 ])
0))
                (sign_extend:SI (subreg/s/v:HI (reg:SI 136 [ _12 ]) 0)))
            (reg/v:SI 153 [ sqr ]))))

So the sign-extends inside the plus-mult don't match the zero_extends from the
loads, so it can't cse the sign-extends away.
If the SImode mult-accumulate is indeed more expensive than
extend-mult-accumulate on some cores, then cse and costs should have
gated that transformation? Worth looking into IMO. If we fix these
widening-mult-accumulate issues will Jim's arm backend patch
 be a valid patch correctness-wise?

Reply via email to