https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115610

            Bug ID: 115610
           Summary: -flate-combine disabled by default for x86 port
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
                CC: crazylht at gmail dot com, hubicka at gcc dot gnu.org,
                    ubizjak at gmail dot com
  Target Milestone: ---
            Target: i?86-*-* x86_64-*-*

The late-combine pass is disabled by default for x86:

  /* Late combine tends to undo some of the effects of STV and RPAD,
     by combining instructions back to their original form.  */
  if (!OPTION_SET_P (flag_late_combine_instructions))
    flag_late_combine_instructions = 0;

To give more details, from an earlier version of the pass:

----------------------------------------------------------------
For example, gcc.target/i386/minmax-6.c tests whether the code
compiles without any spilling.  The RTL created by STV contains:

  (insn 33 31 3 2 (set (subreg:V4SI (reg:SI 120) 0)
          (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 116))
              (const_vector:V4SI [
                      (const_int 0 [0]) repeated x4
                  ])
              (const_int 1 [0x1]))) -1
       (nil))
  (insn 3 33 34 2 (set (subreg:V4SI (reg:SI 118) 0)
          (subreg:V4SI (reg:SI 120) 0)) {movv4si_internal}
       (expr_list:REG_DEAD (reg:SI 120)
          (nil)))
  (insn 34 3 32 2 (set (reg/v:SI 108 [ y ])
          (reg:SI 118)) -1
       (nil))

and it's crucial for the test that reg 108 is kept, rather than
propagated into uses.  As things stand, 118 can be allocated
a vector register and 108 a scalar register.  If 108 is propagated,
there will be scalar and vector uses of 118, and so it will be
spilled to memory.

That one could be solved by running STV2 later.  But RPAD is
a bigger problem.
In gcc.target/i386/pr87007-5.c, RPAD converts:

  (insn 27 26 28 6 (set (reg:DF 100 [ _15 ])
          (sqrt:DF (mem/c:DF (symbol_ref:DI ("d2"))))) {*sqrtdf2_sse}
       (nil))

into:

  (insn 45 26 44 6 (set (reg:V4SF 108)
          (const_vector:V4SF [
                  (const_double:SF 0.0 [0x0.0p+0]) repeated x4
              ])) -1
       (nil))
  (insn 44 45 27 6 (set (reg:V2DF 109)
          (vec_merge:V2DF (vec_duplicate:V2DF (sqrt:DF (mem/c:DF (symbol_ref:DI ("d2")))))
              (subreg:V2DF (reg:V4SF 108) 0)
              (const_int 1 [0x1]))) -1
       (nil))
  (insn 27 44 28 6 (set (reg:DF 100 [ _15 ])
          (subreg:DF (reg:V2DF 109) 0)) {*movdf_internal}
       (nil))

But both the pre-RA and post-RA passes are able to combine these
instructions back to the original form.
----------------------------------------------------------------
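
For readers unfamiliar with the OPTION_SET_P idiom in the override quoted
at the top of this report, the following is a minimal standalone model of
its effect, not GCC code.  The struct and function names are hypothetical;
the only behaviour taken from the report is "force the flag to 0 unless
the user set it explicitly".

```c
#include <stdbool.h>

/* Hypothetical stand-in for one command-line flag: its current value
   and whether the user wrote it explicitly on the command line (the
   question that OPTION_SET_P answers inside GCC).  */
struct flag_state
{
  int value;
  bool set_by_user;
};

/* Model of the x86 override: an explicit user choice is kept as-is;
   otherwise the flag is forced off, whatever default the optimization
   level would have picked.  */
int
x86_resolve_late_combine (struct flag_state flag)
{
  if (!flag.set_by_user)
    return 0;
  return flag.value;
}
```

Under this model, a flag that is on only because of the optimization-level
default resolves to 0, while an explicit request on the command line still
enables the pass on x86.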