On Mon, Apr 28, 2025 at 5:07 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > On Mon, Apr 28, 2025 at 4:26 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > > > > This is what my patch does: > > > But it iterates through vector_insns, using a def-ref chain to find > > > those insns. I think we can just record those single_set with src as > > > const_m1/zero, and replace src for them. > > > > Will fix it. > > Fixed in the v2 patch. > > > > > > > > > /* Check the single definition of CONST0_RTX and integer > > > > CONSTM1_RTX. */ > > > > rtx src = SET_SRC (set); > > > > rtx replace; > > > > if (vector_const0 && src == CONST0_RTX (mode)) > > > > { > > > > /* Replace REG with VECTOR_CONST0. */ > > > > if (SUBREG_P (reg) || mode == zero_mode) > > > > replace = vector_const0; > > > > else > > > > replace = gen_rtx_SUBREG (mode, vector_const0, 0); > > > > *DF_REF_REAL_LOC (ref) = replace; > > > > replaced = true; > > > > zero_replaced = true; > > > > } > > > > > > > > It changed the source to a subreg directly. > > > > > > > > > Also we also need to change ix86_modes_tieable_p to make sure those > > > > > inserted subreg can be handled by LRA and other passes? > > > > > > > > ix86_modes_tieable_p is OK: > > > > > > > > /* If MODE2 is only appropriate for an SSE register, then tie with > > > > any other mode acceptable to SSE registers. */ > > > > if (GET_MODE_SIZE (mode2) == 64 > > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2)) > > > > return (GET_MODE_SIZE (mode1) == 64 > > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1)); > > > > if (GET_MODE_SIZE (mode2) == 32 > > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2)) > > > > return (GET_MODE_SIZE (mode1) == 32 > > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1)); > > > > if (GET_MODE_SIZE (mode2) == 16 > > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2)) > > > > return (GET_MODE_SIZE (mode1) == 16 > > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1)); > > > > > > > It's ok only size of mode1 is equal to size of mode2. > > > But in the testcase, there are different size vectors(32-bytes, 16-bytes). > > > > > > So it would be better as, for mode2 >= 16 bytes, it can only be put > > > into SSE_REGS(except for TImode, but TImode still can be tied to > > > <=16bytes mode1 which can be put into SSE_REGS) , if mode1 can also be > > > put into SSE_REGS, then mode2 tie with mode1. > > > > > > /* If MODE2 is only appropriate for an SSE register, then tie with > > > any other mode acceptable to SSE registers. */ > > > - if (GET_MODE_SIZE (mode2) == 64 > > > - && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2)) > > > - return (GET_MODE_SIZE (mode1) == 64 > > > - && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1)); > > > - if (GET_MODE_SIZE (mode2) == 32 > > > - && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2)) > > > - return (GET_MODE_SIZE (mode1) == 32 > > > - && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1)); > > > - if (GET_MODE_SIZE (mode2) == 16 > > > + if (GET_MODE_SIZE (mode2) >= 16 > > > + && GET_MODE_SIZE (mode1) <= GET_MODE_SIZE (mode2) > > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2)) > > > - return (GET_MODE_SIZE (mode1) == 16 > > > - && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1)); > > > + return ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1); > > > > > > > This caused: > > > > FAIL: gcc.target/i386/pr111267.c scan-assembler-not movd > > FAIL: gcc.target/i386/pr111267.c scan-assembler-not movq > > FAIL: gcc.target/i386/pr82580.c scan-assembler-not \\mmovzb > > > > since GCC thinks it is cheap to get QI/HI/SI/DI from TI in XMM. > > I am testing: > > > > /* If MODE2 is only appropriate for an SSE register, then tie with > > any other mode acceptable to SSE registers, excluding > > (subreg:QI (reg:TI 99) 0)) > > (subreg:HI (reg:TI 99) 0)) > > (subreg:SI (reg:TI 99) 0)) > > (subreg:DI (reg:TI 99) 0)) > > to avoid unnecessary move from SSE register to integer register. > > */ > > if (GET_MODE_SIZE (mode2) >= 16 > > && (GET_MODE_SIZE (mode1) == GET_MODE_SIZE (mode2) > > || (!INTEGRAL_MODE_P (mode1) > > && GET_MODE_SIZE (mode1) <= GET_MODE_SIZE (mode2))) > > && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2)) It looks like only scalar_int_mode_p should be exclude, not vector_mode? so, how about if (GET_MODE_SIZE (mode2) >= 16 && (GET_MODE_SIZE (mode1) == GET_MODE_SIZE (mode2) || ((VECTOR_MODE_P (mode1) || SCALAR_FLOAT_MODE_P (mode1)) && GET_MODE_SIZE (mode1) <= GET_MODE_SIZE (mode2))) && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
Only allow vector mode or scalar floating point in mode1 to be tied to mode2 in SSE_REGS? >+ /* NB: Don't run recog_memoized here since vector SUBREG may not >+ be valid. Let LRA handle vector SUBREG. */ It's tricky, we can insert a move pattern with dest same size as SET_SRC (set), but with same component mode as constm1/constm0, so it should be something like mov v8qi, const0(v32qimode) mov v2sf, subreg:v2sf (v8qi) I think LRA could eliminate the redundant move. >+static void >+ix86_place_single_vector_set (rtx dest, rtx src, bitmap bbs) >+{ For the convenience of maintain, can we also replace the corresponding code in the remove_partial_avx_dependency function with a call to this ix86_place_single_vector_set function +} >+ SET_SRC (set) = replace; >+ /* Drop possible dead definitions. */ >+ PATTERN (insn) = set; >+ df_insn_rescan (insn); > > return ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1); > > > > Tested. There is no regression. > > Here is the v2 patch. Ok for master? > > Thanks. > > -- > H.J. -- BR, Hongtao