On Tue, Dec 5, 2023 at 3:29 AM Hongyu Wang <hongyu.w...@intel.com> wrote:
>
> Under APX NDD, the existing TImode allocation is problematic: TImode values
> were originally allocated to consecutive register pairs, like rax:rdi, rdi:rdx.
>
> This causes problems for all TImode NDD patterns. With NDD we no longer
> assume that arithmetic operations like add tie dest to src1, so a write
> to the 1st highpart rdi can be overridden by the write to the 2nd lowpart
> rdi when that lowpart takes a different src as input. The write to the
> 1st highpart rdi is then lost, which causes a miscompilation.
>
> To resolve this, under TARGET_APX_NDD we only allow registers with an even
> regno to be allocated for TImode, so that TImode values are always allocated
> to non-overlapping pairs.

Perhaps you could use earlyclobber with __doubleword instructions:

(define_insn_and_split "*add<dwi>3_doubleword"
  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
    (plus:<DWI>
      (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0")
      (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o")))
   (clobber (reg:CC FLAGS_REG))]

For the above pattern, you can add an earlyclobbered &r output
alternative, which guarantees that the output won't be allocated to any of
the input registers.
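
As a rough illustration of the overlap that an earlyclobber would rule out,
here is a toy C model of a three-operand 128-bit add split into a low add and
a high add-with-carry. It is only a sketch and not GCC code: the register
file, the index-based pair layout (low half at r, high half at r + 1) and the
names set_pair, ndd_add128 and high_after_add are all assumptions made for
this example.

```c
#include <assert.h>
#include <stdint.h>

enum { NREGS = 8 };  /* hypothetical register file size */

/* Toy model, not GCC code: a 128-bit value starting at register r keeps
   its low half in regs[r] and its high half in regs[r + 1].  */
static void set_pair(uint64_t *regs, int r, uint64_t lo, uint64_t hi)
{
    assert(r >= 0 && r + 1 < NREGS);
    regs[r] = lo;
    regs[r + 1] = hi;
}

/* NDD-style 128-bit add (dst = src1 + src2), split into two 64-bit steps
   in the way a doubleword pattern is split.  */
static void ndd_add128(uint64_t *regs, int dst, int src1, int src2)
{
    assert(dst >= 0 && dst + 1 < NREGS);
    /* Step 1: low add, writing the low half of dst.  */
    uint64_t lo = regs[src1] + regs[src2];
    uint64_t carry = lo < regs[src1];
    regs[dst] = lo;
    /* Step 2: high add with carry.  The high inputs are read only now,
       after step 1 has already written regs[dst].  */
    regs[dst + 1] = regs[src1 + 1] + regs[src2 + 1] + carry;
}

/* Adds (lo=1, hi=10) and (lo=2, hi=20) and returns the high half of the
   result; the correct answer is 30.  */
static uint64_t high_after_add(int dst, int src1, int src2)
{
    uint64_t regs[NREGS] = {0};
    set_pair(regs, src1, 1, 10);
    set_pair(regs, src2, 2, 20);
    ndd_add128(regs, dst, src1, src2);
    return regs[dst + 1];
}
```

In this model high_after_add (2, 0, 4) returns 30, as does the tied case
high_after_add (0, 0, 4), while high_after_add (1, 0, 4) returns 23 because
the low half of the destination shares a register with the high half of src1.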

Uros.

> There could be errors for inline assembly that forcibly allocates an __int128
> to an odd-numbered general register.
>
> gcc/ChangeLog:
>
>         * config/i386/i386.cc (ix86_hard_regno_mode_ok): Restrict TImode to
>         even regnos if APX NDD is enabled.
> ---
>  gcc/config/i386/i386.cc | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 93a9cb556a5..3efeed396c4 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -20873,6 +20873,16 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>         return true;
>        return !can_create_pseudo_p ();
>      }
> +  /* For TImode we previously assumed that src1 and dest use the same
> +     register, so highpart/lowpart allocation could be consecutive, and two
> +     TImode insns could hold their low/highparts in a contiguous sequence like
> +     rax:rdx, rdx:rcx.  This does not work for APX_NDD, since NDD allows
> +     different registers as dest/src1: a write to the 2nd lowpart then
> +     clobbers the write to the 1st highpart, and that insn is optimized out.
> +     So if a TImode pattern supports the NDD form, the allowed register
> +     number must be even to avoid such mixed high/low part overrides.  */
> +  else if (TARGET_APX_NDD && mode == TImode)
> +    return regno % 2 == 0;
>    /* We handle both integer and floats in the general purpose registers.  */
>    else if (VALID_INT_MODE_P (mode)
>            || VALID_FP_MODE_P (mode))
> --
> 2.31.1
>
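
The effect of the even-regno restriction in the patch can be checked with a
small standalone C sketch. It assumes an abstract numbering in which a TImode
value starting at regno n occupies n and n + 1; the bound NSTARTS and the
helper names are hypothetical and do not come from i386.cc.

```c
#include <assert.h>
#include <stdbool.h>

enum { NSTARTS = 15 };  /* hypothetical: pairs may start at regno 0..14 */

/* Mirrors the idea of the patch: with the restriction on, only even
   starting regnos are accepted for a register pair.  */
static bool timode_start_ok(int regno, bool even_only)
{
    return !even_only || regno % 2 == 0;
}

/* Pairs (a, a + 1) and (b, b + 1) share exactly one register when their
   starting regnos differ by one.  */
static bool partially_overlap(int a, int b)
{
    assert(a >= 0 && b >= 0);
    return a - b == 1 || b - a == 1;
}

/* Counts unordered pairs of allowed starting regnos whose register pairs
   partially overlap.  */
static int count_partial_overlaps(bool even_only)
{
    int count = 0;
    for (int a = 0; a < NSTARTS; a++)
        for (int b = a + 1; b < NSTARTS; b++)
            if (timode_start_ok(a, even_only)
                && timode_start_ok(b, even_only)
                && partially_overlap(a, b))
                count++;
    return count;
}
```

Without the restriction, count_partial_overlaps (false) finds 14 partially
overlapping combinations in this model; with it, count_partial_overlaps (true)
finds none, since two even starting regnos never differ by one.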
