https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70465
--- Comment #6 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---

Neither IRA/LRA nor the old RA is (or was) aware of how to generate good code for the x87 FP stack.

GCC 6 generates the following before IRA (more precisely, before coloring in IRA):

    (insn 16 4 17 2 (set (reg:DF 90 [ res ])
            (mem/c:DF (plus:SI (reg/f:SI 16 argp)
                    (const_int 8 [0x8])) [1 x+0 S8 A32])) b3.c:6 126 {*movdf_internal}
         (nil))
    (insn 17 16 8 2 (set (reg/v:DF 88 [ y ])
            (mem/c:DF (reg/f:SI 16 argp) [1 y+0 S8 A32])) b3.c:6 126 {*movdf_internal}
         (expr_list:REG_EQUIV (mem/c:DF (reg/f:SI 16 argp) [1 y+0 S8 A32])
            (nil)))

while gcc-4.3 has the following before global/reload:

    (insn:HI 2 5 3 2 b3.c:6 (set (reg/v:DF 60 [ y ])
            (mem/c/i:DF (reg/f:SI 16 argp) [2 y+0 S8 A32])) 102 {*movdf_nointeger} (nil))
    (insn:HI 3 2 4 2 b3.c:6 (set (reg/v:DF 61 [ x ])
            (mem/c/i:DF (plus:SI (reg/f:SI 16 argp)
                    (const_int 8 [0x8])) [2 x+0 S8 A32])) 102 {*movdf_nointeger} (nil))

So gcc-4.3 was lucky to load y first and then x, while gcc-6 is unlucky to load x first and then y.

There are a lot of PRs, usually with tiny tests, where the old RA (or reload) produces better code. Unfortunately, this will always be the case, because register allocation is all about heuristics. There are no PRs in the opposite direction (where reload or the old RA generates worse code) simply because it is no longer used. In any case, if we exchange x and y in the argument list, gcc-4.3 also generates fxch.

Still, I think this can be fixed. update_equiv_regs transforms the code

        2: r88:DF=[argp:SI]
        3: r89:DF=[argp:SI+0x8]
       16: r90:DF=r89:DF

into

       16: r90:DF=[argp:SI+0x8]
       17: r88:DF=[argp:SI]

This is the source of the fxch. If we exchange the positions of insns 16 and 17, the fxch goes away, although I cannot guarantee that this will not lead to new PRs, as such a change might result in worse code elsewhere.
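To make the load-order argument concrete, here is a toy Python model of the x87 register stack. It is a simplified sketch, not GCC's reg-stack pass: each `fld` pushes a value into st(0), and `fxch st(i)` swaps st(0) with st(i). Because the test case b3.c is not shown in this comment, the assumption that the consuming instruction needs x in st(0) is hypothetical, chosen only because it reproduces the behavior described (loading y then x needs no fxch, loading x then y needs one). The function name `simulate` is likewise made up for illustration.

```python
"""Toy model of the x87 register stack (illustrative only, not GCC's reg-stack pass)."""

from typing import List, Tuple


def simulate(loads: List[str], required_top: str) -> Tuple[List[str], int]:
    """Push each value in `loads` onto a model x87 stack, then make sure
    `required_top` sits in st(0), as a hypothetical consumer demands.

    Returns the schematic instruction list and the number of fxch emitted.
    """
    stack: List[str] = []  # stack[0] models st(0), the top of the stack
    code: List[str] = []
    for name in loads:
        stack.insert(0, name)  # fld pushes: the new value becomes st(0)
        code.append(f"fld {name}")

    fxch_count = 0
    i = stack.index(required_top)
    if i != 0:
        # fxch st(i) exchanges st(0) and st(i)
        stack[0], stack[i] = stack[i], stack[0]
        code.append(f"fxch st({i})")
        fxch_count += 1
    return code, fxch_count


if __name__ == "__main__":
    # gcc-4.3 order (y, then x) versus gcc-6 order (x, then y)
    for order in (["y", "x"], ["x", "y"]):
        print(order, simulate(order, "x"))
```

With the gcc-4.3 order, `simulate(["y", "x"], "x")` emits only the two loads; with the gcc-6 order, `simulate(["x", "y"], "x")` has to append `fxch st(1)`. In this model, exchanging insns 16 and 17 corresponds to turning the second load list into the first.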