https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706
--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #9)
> (In reply to Jakub Jelinek from comment #8)
> > The #c5 patch obviously doesn't help here, because the testcase triggers the
> > last of these 4 peephole2s. But #c7 works.
>
> Thanks! It looks like we'll have to live with extra stores then...

Can't we improve it in the combiner?
For PR71245 testcase obviously, we have:
(insn 5 2 6 2 (parallel [
            (set (reg:DI 89 [ _4 ])
                (unspec:DI [
                        (mem/v:DI (symbol_ref:SI ("d") [flags 0x2] <var_decl 0x7fcf8ee5c510 d>) [-1 S8 A64])
                    ] UNSPEC_LDA))
            (clobber (mem/c:DI (plus:SI (reg/f:SI 20 frame)
                        (const_int -8 [0xfffffffffffffff8])) [0 S8 A64]))
            (clobber (scratch:DF))
        ]) "/usr/include/c++/6.3.1/atomic":235 4970 {atomic_loaddi_fpu}
     (nil))
...
(insn 8 7 9 2 (set (reg:DF 91)
        (plus:DF (subreg:DF (reg:DI 89 [ _4 ]) 0)
            (reg:DF 92))) "pr71245.C":5 805 {*fop_df_comm}
     (expr_list:REG_DEAD (reg:DF 92)
        (expr_list:REG_DEAD (reg:DI 89 [ _4 ])
            (nil))))
and apparently the combiner attempts to match:
(set (reg:DF 92)
    (subreg:DF (unspec:DI [
                (mem/v:DI (symbol_ref:SI ("d") [flags 0x2] <var_decl 0x7fcf8ee5c510 d>) [-1 S8 A64])
            ] UNSPEC_LDA) 0))
Perhaps if we had such a pattern that we'd split into a normal DFmode load
(perhaps with unspec before reload to guarantee it is atomic load), we wouldn't
need the temporary at all?
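
For readers without the testcase at hand: the RTL quoted here has the shape one would expect from a std::atomic<double> load whose result feeds straight into a floating-point add on 32-bit x86. The following is only a minimal sketch under that assumption, not the actual pr71245.C; the function name load_and_add is hypothetical, and only the global "d" is taken from the dump.

```cpp
#include <atomic>

// Global atomic double; the name "d" matches the symbol_ref in the RTL dump.
std::atomic<double> d{1.5};

// Hypothetical reduction: an atomic 64-bit load whose value is used
// immediately as a double operand of an addition. On 32-bit x86 this is
// the kind of source that could yield a DImode atomic load followed by a
// DFmode add on a subreg of the loaded value.
double load_and_add(double x)
{
    return d.load(std::memory_order_relaxed) + x;
}
```

Compiling something like this with -m32 -O2 and inspecting the combine dump (-fdump-rtl-combine) should show whether the temporary stack slot is still needed.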