https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115610

            Bug ID: 115610
           Summary: -flate-combine disabled by default for x86 port
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
                CC: crazylht at gmail dot com, hubicka at gcc dot gnu.org,
                    ubizjak at gmail dot com
  Target Milestone: ---
            Target: i?86-*-* x86_64-*-*

The late-combine pass is disabled by default for x86:

  /* Late combine tends to undo some of the effects of STV and RPAD,
     by combining instructions back to their original form.  */
  if (!OPTION_SET_P (flag_late_combine_instructions))
    flag_late_combine_instructions = 0;

To give more details, from an earlier version of the pass:

----------------------------------------------------------------
For example, gcc.target/i386/minmax-6.c tests whether the code
compiles without any spilling.  The RTL created by STV contains:

(insn 33 31 3 2 (set (subreg:V4SI (reg:SI 120) 0)
        (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 116))
            (const_vector:V4SI [
                    (const_int 0 [0]) repeated x4
                ])
            (const_int 1 [0x1]))) -1
     (nil))
(insn 3 33 34 2 (set (subreg:V4SI (reg:SI 118) 0)
        (subreg:V4SI (reg:SI 120) 0)) {movv4si_internal}
     (expr_list:REG_DEAD (reg:SI 120)
        (nil)))
(insn 34 3 32 2 (set (reg/v:SI 108 [ y ])
        (reg:SI 118)) -1
     (nil))

and it's crucial for the test that reg 108 is kept, rather than
propagated into uses.  As things stand, 118 can be allocated
a vector register and 108 a scalar register.  If 108 is propagated,
there will be scalar and vector uses of 118, and so it will be
spilled to memory.

and it's crucial for the test that reg 108 is kept, rather than
propagated into uses.  As things stand, 118 can be allocated
a vector register and 108 a scalar register.  If 108 is propagated,
there will be scalar and vector uses of 118, and so it will be
spilled to memory.

That one could be solved by running STV2 later.  But RPAD is
a bigger problem.  In gcc.target/i386/pr87007-5.c, RPAD converts:

(insn 27 26 28 6 (set (reg:DF 100 [ _15 ])
        (sqrt:DF (mem/c:DF (symbol_ref:DI ("d2"))))) {*sqrtdf2_sse}
     (nil))

into:

(insn 45 26 44 6 (set (reg:V4SF 108)
        (const_vector:V4SF [
                (const_double:SF 0.0 [0x0.0p+0]) repeated x4
            ])) -1
     (nil))
(insn 44 45 27 6 (set (reg:V2DF 109)
        (vec_merge:V2DF (vec_duplicate:V2DF (sqrt:DF (mem/c:DF (symbol_ref:DI
(\
"d2")))))
            (subreg:V2DF (reg:V4SF 108) 0)
            (const_int 1 [0x1]))) -1
     (nil))
(insn 27 44 28 6 (set (reg:DF 100 [ _15 ])
        (subreg:DF (reg:V2DF 109) 0)) {*movdf_internal}
     (nil))

But both the pre-RA and post-RA passes are able to combine these
instructions back to the original form.
----------------------------------------------------------------

Reply via email to