https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932
--- Comment #22 from ktkachov at gcc dot gnu.org --- So in an attempt to make some progress on this, I've tried Jim's approach of changing PROMOTE_MODE to not convert the short modes to unsigned. Building SPEC2006 INT and looking at the generated assembly actually doesn't look too bad, with the codesize being a tiny bit smaller overall with this patch. Many load operations followed by extend are expressed as single ldrsh instructions. Then there's the ldrh+smullbb vs ldrsh+mla issue. I'm looking at the wmul-1.c testcase. The cse1 pass is the one that eliminates the sign-extends from the arms of the plus-mult. Before cse1 we have: (insn 41 40 43 3 (set (reg:SI 136 [ _12 ]) (sign_extend:SI (mem:HI (reg:SI 135)))) (insn 44 43 47 3 (set (reg:SI 138 [ _16 ]) (sign_extend:SI (mem:HI (reg:SI 144)))) (insn 47 44 49 3 (set (reg/v:SI 141 [ dotp ]) (plus:SI (mult:SI (sign_extend:SI (subreg/s/u:HI (reg:SI 136 [ _12 ]) 0)) (sign_extend:SI (subreg/s/u:HI (reg:SI 138 [ _16 ]) 0))) (reg/v:SI 141 [ dotp ])))) (insn 49 47 51 3 (set (reg/v:SI 153 [ sqr ]) (plus:SI (mult:SI (sign_extend:SI (subreg/s/u:HI (reg:SI 136 [ _12 ]) 0)) (sign_extend:SI (subreg/s/u:HI (reg:SI 136 [ _12 ]) 0))) (reg/v:SI 153 [ sqr ])))) and cse1 transforms this into: (insn 41 40 43 3 (set (reg:SI 136 [ _12 ]) (sign_extend:SI (mem:HI (reg:SI 135 [ ivtmp.10 ]))))) (insn 44 43 47 3 (set (reg:SI 138 [ _16 ]) (sign_extend:SI (mem:HI (reg:SI 144 [ ivtmp.13 ]))))) (insn 47 44 49 3 (set (reg/v:SI 141 [ dotp ]) (plus:SI (mult:SI (reg:SI 136 [ _12 ]) (reg:SI 138 [ _16 ])) (reg/v:SI 141 [ dotp ])))) (insn 49 47 51 3 (set (reg/v:SI 153 [ sqr ]) (plus:SI (mult:SI (reg:SI 136 [ _12 ]) (reg:SI 136 [ _12 ])) (reg/v:SI 153 [ sqr ])))) Notice how the extends on the loads remain but the extends from the plus-mult have disappeared. Before Jim's patch the same sequence is: (insn 41 40 43 3 (set (reg:SI 136 [ _12 ]) (zero_extend:SI (mem:HI (reg:SI 135 [ ivtmp.10 ]))))) (insn 44 43 47 3 (set (reg:SI 138 [ _16 ]) (zero_extend:SI (mem:HI (reg:SI 144 [ ivtmp.13 ]))))) (insn 47 44 49 3 (set (reg/v:SI 141 [ dotp ]) (plus:SI (mult:SI (sign_extend:SI (subreg/s/v:HI (reg:SI 136 [ _12 ]) 0)) (sign_extend:SI (subreg/s/v:HI (reg:SI 138 [ _16 ]) 0))) (reg/v:SI 141 [ dotp ])))) (insn 49 47 51 3 (set (reg/v:SI 153 [ sqr ]) (plus:SI (mult:SI (sign_extend:SI (subreg/s/v:HI (reg:SI 136 [ _12 ]) 0)) (sign_extend:SI (subreg/s/v:HI (reg:SI 136 [ _12 ]) 0))) (reg/v:SI 153 [ sqr ])))) So the sign-extends inside the plus-mult don't match the zero_extends from the loads, so it can't cse the sign-extends away. If the SImode mult-accumulate is indeed more expensive than extend-mult-accumulate on some cores, then cse and costs should have gated that transformation? Worth looking into IMO. If we fix these widening-mult-accumulate issues will Jim's arm backend patch be a valid patch correctness-wise?