On Tue, Apr 29, 2025 at 10:08 AM Hongtao Liu <[email protected]> wrote:
>
> On Mon, Apr 28, 2025 at 5:07 PM H.J. Lu <[email protected]> wrote:
> >
> > On Mon, Apr 28, 2025 at 4:26 PM H.J. Lu <[email protected]> wrote:
> > >
> >
> > > > > This is what my patch does:
> > > > But it iterates through vector_insns, using a def-ref chain to find
> > > > those insns. I think we can just record those single_set with src as
> > > > const_m1/zero, and replace src for them.
> > >
> > > Will fix it.
> >
> > Fixed in the v2 patch.
> >
> > > > >
> > > > > /* Check the single definition of CONST0_RTX and integer
> > > > > CONSTM1_RTX. */
> > > > > rtx src = SET_SRC (set);
> > > > > rtx replace;
> > > > > if (vector_const0 && src == CONST0_RTX (mode))
> > > > > {
> > > > > /* Replace REG with VECTOR_CONST0. */
> > > > > if (SUBREG_P (reg) || mode == zero_mode)
> > > > > replace = vector_const0;
> > > > > else
> > > > > replace = gen_rtx_SUBREG (mode, vector_const0, 0);
> > > > > *DF_REF_REAL_LOC (ref) = replace;
> > > > > replaced = true;
> > > > > zero_replaced = true;
> > > > > }
> > > > >
> > > > > It changed the source to a subreg directly.
> > > > >
> > > > > > Also we also need to change ix86_modes_tieable_p to make sure those
> > > > > > inserted subreg can be handled by LRA and other passes?
> > > > >
> > > > > ix86_modes_tieable_p is OK:
> > > > >
> > > > > /* If MODE2 is only appropriate for an SSE register, then tie with
> > > > > any other mode acceptable to SSE registers. */
> > > > > if (GET_MODE_SIZE (mode2) == 64
> > > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
> > > > > return (GET_MODE_SIZE (mode1) == 64
> > > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
> > > > > if (GET_MODE_SIZE (mode2) == 32
> > > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
> > > > > return (GET_MODE_SIZE (mode1) == 32
> > > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
> > > > > if (GET_MODE_SIZE (mode2) == 16
> > > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
> > > > > return (GET_MODE_SIZE (mode1) == 16
> > > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
> > > > >
> > > > It's ok only size of mode1 is equal to size of mode2.
> > > > But in the testcase, there are different size vectors(32-bytes,
> > > > 16-bytes).
> > > >
> > > > So it would be better as, for mode2 >= 16 bytes, it can only be put
> > > > into SSE_REGS(except for TImode, but TImode still can be tied to
> > > > <=16bytes mode1 which can be put into SSE_REGS) , if mode1 can also be
> > > > put into SSE_REGS, then mode2 tie with mode1.
> > > >
> > > > /* If MODE2 is only appropriate for an SSE register, then tie with
> > > > any other mode acceptable to SSE registers. */
> > > > - if (GET_MODE_SIZE (mode2) == 64
> > > > - && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
> > > > - return (GET_MODE_SIZE (mode1) == 64
> > > > - && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
> > > > - if (GET_MODE_SIZE (mode2) == 32
> > > > - && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
> > > > - return (GET_MODE_SIZE (mode1) == 32
> > > > - && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
> > > > - if (GET_MODE_SIZE (mode2) == 16
> > > > + if (GET_MODE_SIZE (mode2) >= 16
> > > > + && GET_MODE_SIZE (mode1) <= GET_MODE_SIZE (mode2)
> > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
> > > > - return (GET_MODE_SIZE (mode1) == 16
> > > > - && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
> > > > + return ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1);
> > > >
> > >
> > > This caused:
> > >
> > > FAIL: gcc.target/i386/pr111267.c scan-assembler-not movd
> > > FAIL: gcc.target/i386/pr111267.c scan-assembler-not movq
> > > FAIL: gcc.target/i386/pr82580.c scan-assembler-not \\mmovzb
> > >
> > > since GCC thinks it is cheap to get QI/HI/SI/DI from TI in XMM.
> > > I am testing:
> > >
> > > /* If MODE2 is only appropriate for an SSE register, then tie with
> > > any other mode acceptable to SSE registers, excluding
> > > (subreg:QI (reg:TI 99) 0))
> > > (subreg:HI (reg:TI 99) 0))
> > > (subreg:SI (reg:TI 99) 0))
> > > (subreg:DI (reg:TI 99) 0))
> > > to avoid unnecessary move from SSE register to integer register.
> > > */
> > > if (GET_MODE_SIZE (mode2) >= 16
> > > && (GET_MODE_SIZE (mode1) == GET_MODE_SIZE (mode2)
> > > || (!INTEGRAL_MODE_P (mode1)
> > > && GET_MODE_SIZE (mode1) <= GET_MODE_SIZE (mode2)))
> > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
> It looks like only scalar_int_mode_p should be exclude, not vector_mode?
> so, how about
> if (GET_MODE_SIZE (mode2) >= 16
> && (GET_MODE_SIZE (mode1) == GET_MODE_SIZE (mode2)
> || ((VECTOR_MODE_P (mode1) || SCALAR_FLOAT_MODE_P (mode1))
> && GET_MODE_SIZE (mode1) <= GET_MODE_SIZE (mode2)))
> && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
>
> Only allow vector mode or scalar floating point in mode1 to be tied to
> mode2 in SSE_REGS?
Will fix.
>
> >+ /* NB: Don't run recog_memoized here since vector SUBREG may not
> >+ be valid. Let LRA handle vector SUBREG. */
> It's tricky, we can insert a move pattern with dest same size as
> SET_SRC (set), but with same component mode as constm1/constm0, so it
> should be something like
> mov v8qi, const0(v32qimode)
> mov v2sf, subreg:v2sf (v8qi)
IIt is exactly what my patch generates:
(insn 31 2 5 2 (set (reg:V32QI 110)
(const_vector:V32QI [
(const_int 0 [0]) repeated x32
])) -1
(nil))
(insn 5 31 6 2 (set (reg:V2DF 98)
(subreg:V2DF (reg:V32QI 110) 0)) "z2.c":35:6 2421 {movv2df_internal}
(nil))
(insn 6 5 7 2 (set (mem/c:V2DF (symbol_ref:DI ("d1") [flags 0x2]
<var_decl 0x7fd11d6341c8 d1>) [1 d1+0 S16 A128])
(reg:V2DF 98)) "z2.c":35:6 2421 {movv2df_internal}
(expr_list:REG_DEAD (reg:V2DF 98)
(nil)))
(insn 7 6 8 2 (set (reg:V4SF 99)
(subreg:V4SF (reg:V32QI 110) 0)) "z2.c":36:6 2418 {movv4sf_internal}
(nil))
(insn 8 7 9 2 (set (mem/c:V4SF (symbol_ref:DI ("f1") [flags 0x2]
<var_decl 0x7fd11d634130 f1>) [2 f1+0 S16 A128])
(reg:V4SF 99)) "z2.c":36:6 2418 {movv4sf_internal}
(expr_list:REG_DEAD (reg:V4SF 99)
(nil)))
...
and LRA does eliminate the redundant moves.
> I think LRA could eliminate the redundant move.
>
> >+static void
> >+ix86_place_single_vector_set (rtx dest, rtx src, bitmap bbs)
> >+{
> For the convenience of maintain, can we also replace the corresponding
> code in the remove_partial_avx_dependency function with a call to this
> ix86_place_single_vector_set function
Will fix.
Thanks.
> +}
>
> >+ SET_SRC (set) = replace;
> >+ /* Drop possible dead definitions. */
> >+ PATTERN (insn) = set;
> >+ df_insn_rescan (insn);
>
> > > return ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1);
> > >
> >
> > Tested. There is no regression.
> >
> > Here is the v2 patch. Ok for master?
> >
> > Thanks.
> >
> > --
> > H.J.
>
>
>
> --
> BR,
> Hongtao
--
H.J.