On Tue, Dec 5, 2023 at 3:29 AM Hongyu Wang <hongyu.w...@intel.com> wrote:
>
> Under APX NDD, the previous TImode allocation has an issue: TImode values
> were originally allocated to consecutive, overlapping register pairs such
> as rax:rdi, rdi:rdx.
>
> This breaks all TImode NDD patterns. With NDD we no longer assume that
> arithmetic operations like add tie dest to src1. So if the 2nd lowpart
> (rdi) takes a different source as input, its write overrides the write to
> the 1st highpart (rdi). The 1st highpart value is then lost, causing a
> miscompilation.
>
> To resolve this, under TARGET_APX_NDD only registers with an even regno
> are allowed for TImode, so TImode values are allocated to non-overlapping
> pairs.
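A minimal C sketch of the overlap argument above, assuming a hypothetical flat register file in which a TImode value starting at regno R occupies R (low half) and R + 1 (high half). GCC's real i386 hard register numbering is more involved, and `timode_regno_ok`, `pairs_partially_overlap` and `allocation_is_safe` are illustrative names, not GCC functions. The point it checks: once every TImode value must start at an even regno, two pairs are either identical or disjoint, so one value's highpart can never share a register with another value's lowpart.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not GCC's real numbering: a flat file of 32 GPRs in
   which a TImode value starting at regno R occupies R (low half) and
   R + 1 (high half).  */
#define NUM_GPRS 32u

/* Mirrors the patch's rule: under APX NDD, TImode may only start at an
   even regno.  The pair must also fit inside the register file.  */
static bool timode_regno_ok(unsigned regno, bool apx_ndd)
{
  if (regno + 1 >= NUM_GPRS)
    return false;
  if (apx_ndd)
    return regno % 2 == 0;
  return true;
}

/* True if the pairs starting at A and B share exactly one register, e.g.
   regnos 0:1 and 1:2 (analogous to rax:rdx, rdx:rcx).  */
static bool pairs_partially_overlap(unsigned a, unsigned b)
{
  assert(a + 1 < NUM_GPRS && b + 1 < NUM_GPRS);
  return a + 1 == b || b + 1 == a;
}

/* True if no two TImode start registers allowed in the given mode
   partially overlap.  */
static bool allocation_is_safe(bool apx_ndd)
{
  for (unsigned a = 0; a < NUM_GPRS; a++)
    for (unsigned b = 0; b < NUM_GPRS; b++)
      if (timode_regno_ok(a, apx_ndd) && timode_regno_ok(b, apx_ndd)
          && pairs_partially_overlap(a, b))
        return false;
  return true;
}
```

Without the restriction, starts 0 and 1 are both allowed and share register 1; with it, any two allowed starts differ by an even amount, so they can never differ by exactly one.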
Perhaps you could use earlyclobber with __doubleword instructions:

(define_insn_and_split "*add<dwi>3_doubleword"
  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
        (plus:<DWI>
          (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0")
          (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o")))
   (clobber (reg:CC FLAGS_REG))]

For the above pattern, you can add an earlyclobbered "=&r" output
alternative, which guarantees that the output won't be allocated to any of
the input registers.

Uros.

> This could cause errors in inline assembly that forces an __int128 into
> an odd-numbered general register.
>
> gcc/ChangeLog:
>
>         * config/i386/i386.cc (ix86_hard_regno_mode_ok): Restrict TImode
>         to even regno if APX NDD is enabled.
> ---
>  gcc/config/i386/i386.cc | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 93a9cb556a5..3efeed396c4 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -20873,6 +20873,16 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>         return true;
>        return !can_create_pseudo_p ();
>      }
> +  /* TImode patterns previously assumed that src1 and dest use the same
> +     register, so the highpart/lowpart allocation could be consecutive and
> +     two TImode insns could hold their low/high parts in overlapping pairs
> +     such as rax:rdx, rdx:rcx.  This does not work for APX_NDD, since NDD
> +     allows different registers for dest and src1: a write to the 2nd
> +     lowpart can override the write to the 1st highpart, and the 1st insn
> +     may then be optimized out.  So when a TImode pattern supports the NDD
> +     form, only even register numbers are allowed, to avoid such overlap.  */
> +  else if (TARGET_APX_NDD && mode == TImode)
> +    return regno % 2 == 0;
>    /* We handle both integer and floats in the general purpose registers.  */
>    else if (VALID_INT_MODE_P (mode)
>            || VALID_FP_MODE_P (mode))
> --
> 2.31.1
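To make the earlyclobber suggestion concrete, here is a small C simulation (not GCC code) of a doubleword add split into a low-half add followed by a high-half adc, where the NDD form lets dest differ from src1. `split_add_ti`, `split_add_correct` and the 8-register file are hypothetical. If dest's low register is the high register of an input, the first write clobbers that input before the second insn reads it. A fully tied or a fully disjoint allocation gives the right sum: an `=&r` earlyclobber output forces the disjoint case, while the even-regno restriction rules out partial overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 8u  /* hypothetical small register file */

/* Simulates dest = src1 + src2 on 128-bit values held in register pairs
   (low half at R, high half at R + 1), split into two insns the way a
   doubleword add is: add on the low halves, then adc on the high halves.
   NDD-style: dest need not equal src1.  */
static void split_add_ti(uint64_t regs[NUM_REGS],
                         unsigned dest, unsigned src1, unsigned src2)
{
  assert(dest + 1 < NUM_REGS && src1 + 1 < NUM_REGS && src2 + 1 < NUM_REGS);

  /* Insn 1 (add): read both low halves, then write dest's low half.  */
  uint64_t lo1 = regs[src1];
  uint64_t lo = lo1 + regs[src2];
  uint64_t carry = lo < lo1;          /* unsigned wraparound => carry out */
  regs[dest] = lo;

  /* Insn 2 (adc): the high halves are read only now, after insn 1's write.  */
  regs[dest + 1] = regs[src1 + 1] + regs[src2 + 1] + carry;
}

/* Runs the split add with src1 = 1:UINT64_MAX and src2 = 2:1 (hi:lo),
   whose correct sum is 4:0, and reports whether dest holds that value.
   The two source pairs must not overlap each other.  */
static bool split_add_correct(unsigned dest, unsigned src1, unsigned src2)
{
  assert(src1 + 1 < NUM_REGS && src2 + 1 < NUM_REGS);
  assert(src1 + 1 < src2 || src2 + 1 < src1);

  uint64_t regs[NUM_REGS] = {0};
  regs[src1] = UINT64_MAX;
  regs[src1 + 1] = 1;
  regs[src2] = 1;
  regs[src2 + 1] = 2;

  split_add_ti(regs, dest, src1, src2);
  return regs[dest] == 0 && regs[dest + 1] == 4;
}
```

For example, `split_add_correct(0, 0, 2)` (tied) and `split_add_correct(4, 0, 2)` (disjoint) succeed, while `split_add_correct(1, 0, 3)` fails because dest's low half lands on src1's high half. In this model the reverse partial overlap (dest's high half on src1's low half) happens to be harmless, since that low half is already read, but both fixes exclude partial overlap altogether.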