Re: [PATCH] [x86]Refine constraint "Bk" to define_special_memory_constraint.

2024-07-25 Thread Hongtao Liu
On Wed, Jul 24, 2024 at 3:57 PM liuhongt  wrote:
>
> For the pattern below, RA may still allocate r162 as a v/k register and try to
> reload the address with leaq __libc_tsd_CTYPE_B@gottpoff(%rip), %rsi,
> which results in a linker error.
>
> (set (reg:DI 162)
>  (mem/u/c:DI
>(const:DI (unspec:DI
>  [(symbol_ref:DI ("a") [flags 0x60]   a>)]
>  UNSPEC_GOTNTPOFF))
>
> Quote from H.J. explaining why the linker issues an error:
> >What do these do:
> >
> >leaq    __libc_tsd_CTYPE_B@gottpoff(%rip), %rax
> >vmovq   (%rax), %xmm0
> >
> >From x86-64 TLS psABI:
> >
> >The assembler generates for the x@gottpoff(%rip) expressions a
> >R_X86_64_GOTTPOFF relocation for the symbol x which requests the linker to
> >generate a GOT entry with a R_X86_64_TPOFF64 relocation. The offset of
> >the GOT entry relative to the end of the instruction is then used in
> >the instruction. The R_X86_64_TPOFF64 relocation is processed at
> >program startup time by the dynamic linker by looking up the symbol x
> >in the modules loaded at that point. The offset is written in the GOT
> >entry and later loaded by the addq instruction.
> >
> >The above code sequence looks wrong to me.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk and backport?
>
> gcc/ChangeLog:
>
> PR target/116043
> * config/i386/constraints.md (Bk): Refine to
> define_special_memory_constraint.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr116043.c: New test.
> ---
>  gcc/config/i386/constraints.md   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr116043.c | 33 
>  2 files changed, 34 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr116043.c
>
> diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
> index 7508d7a58bd..b760e7c221a 100644
> --- a/gcc/config/i386/constraints.md
> +++ b/gcc/config/i386/constraints.md
> @@ -187,7 +187,7 @@ (define_special_memory_constraint "Bm"
>"@internal Vector memory operand."
>(match_operand 0 "vector_memory_operand"))
>
> -(define_memory_constraint "Bk"
> +(define_special_memory_constraint "Bk"
>"@internal TLS address that allows insn using non-integer registers."
>(and (match_operand 0 "memory_operand")
> (not (match_test "ix86_gpr_tls_address_pattern_p (op)"
> diff --git a/gcc/testsuite/gcc.target/i386/pr116043.c 
> b/gcc/testsuite/gcc.target/i386/pr116043.c
> new file mode 100644
> index 000..76553496c10
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr116043.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512bf16 -O3" } */
> +/* { dg-final { scan-assembler-not {(?n)lea.*@gottpoff} } } */
> +
> +extern __thread int a, c, i, j, k, l;
> +int *b;
> +struct d {
> +  int e;
> +} f, g;
> +char *h;
> +
> +void m(struct d *n) {
> +  b = &k;
> +  for (; n->e; b++, n--) {
> +i = b && a;
> +if (i)
> +  j = c;
> +  }
> +}
> +
> +char *o(struct d *n) {
> +  for (; n->e;)
> +return h;
> +}
> +
> +int q() {
> +  if (l)
> +return 1;
> +  int p = *o(&g);
> +  m(&f);
> +  m(&g);
> +  l = p;
> +}
> --
> 2.31.1
>


-- 
BR,
Hongtao


Re: [PATCH] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-25 Thread Kewen.Lin
Hi Carl,

Some minor comments are inlined on top of Segher's and Peter's comments.

on 2024/7/20 04:04, Carl Love wrote:
> GCC developers:
> 
> The following patch adds the int128 variants to the existing overloaded 
> built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, vec_srl, 
> vec_sro.  These variants were requested by Steve Munroe.
> 
> The patch has been tested on a Power 10 system with no regressions.
> 
> Please let me know if the patch is acceptable for mainline.
> 
>    Carl
> 
> 
> ---
>  rs6000, Add new overloaded vector shift builtin int128 variants
> 
> Add the signed __int128 and unsigned __int128 argument types for the
> overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
> vec_srdb, vec_srl, vec_sro.  For each of the new argument types add a
> testcase and update the documentation for the built-in.
> 
> Add the missing internal names for the float and double types of the
> overloaded builtin vec_sld.

This isn't needed, see below explanation.

> 
> gcc/ChangeLog:
>     * config/rs6000/altivec.md (vs<SLDB_lr>db_<mode>): Change
>     define_insn iterator to VEC_IC.
>     * config/rs6000/rs6000-builtins.def (__builtin_altivec_vsldoi_v1ti,
>     __builtin_vsx_xxsldwi_v1ti, __builtin_altivec_vsldb_v1ti,
>     __builtin_altivec_vsrdb_v1ti): New builtin definitions.
>     * config/rs6000/rs6000-overload.def (vec_sld, vec_sldb, vec_sldw,
>     vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro): New overloaded
>     definitions.
>     (vec_sld): Add missing internal names.
>     * doc/extend.texi (vec_sld, vec_sldb, vec_sldw,    vec_sll, vec_slo,
>     vec_srdb, vec_srl, vec_sro): Add documentation for new overloaded
>     built-ins.
> 
> gcc/testsuite/ChangeLog:
>     * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test
>     file.
> ---
>  gcc/config/rs6000/altivec.md  |   6 +-
>  gcc/config/rs6000/rs6000-builtins.def |  12 +
>  gcc/config/rs6000/rs6000-overload.def |  44 ++-
>  gcc/doc/extend.texi   |  42 +++
>  .../vec-shift-double-runnable-int128.c    | 349 ++
>  5 files changed, 448 insertions(+), 5 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 5af9bf920a2..2a18ee44526 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
>  (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])
> 
>  (define_insn "vs<SLDB_lr>db_<mode>"
> - [(set (match_operand:VI2 0 "register_operand" "=v")
> -  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
> -       (match_operand:VI2 2 "register_operand" "v")
> + [(set (match_operand:VEC_IC 0 "register_operand" "=v")
> +  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
> +       (match_operand:VEC_IC 2 "register_operand" "v")
>     (match_operand:QI 3 "const_0_to_12_operand" "n")]
>    VSHIFT_DBL_LR))]
>    "TARGET_POWER10"
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 77eb0f7e406..fbb6e1ddf85 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -964,6 +964,9 @@
>    const vss __builtin_altivec_vsldoi_8hi (vss, vss, const int<4>);
>  VSLDOI_8HI altivec_vsldoi_v8hi {}
> 
> +  const vsq __builtin_altivec_vsldoi_v1ti (vsq, vsq, const int<4>);
> +    VSLDOI_V1TI altivec_vsldoi_v1ti {}
> +
>    const vss __builtin_altivec_vslh (vss, vus);
>  VSLH vashlv8hi3 {}
> 
> @@ -1831,6 +1834,9 @@
>    const vsll __builtin_vsx_xxsldwi_2di (vsll, vsll, const int<2>);
>  XXSLDWI_2DI vsx_xxsldwi_v2di {}
> 
> +  const vsq __builtin_vsx_xxsldwi_v1ti (vsq, vsq, const int<2>);
> +    XXSLDWI_Q vsx_xxsldwi_v1ti {}
> +
>    const vf __builtin_vsx_xxsldwi_4sf (vf, vf, const int<2>);
>  XXSLDWI_4SF vsx_xxsldwi_v4sf {}
> 
> @@ -3299,6 +3305,9 @@
>    const vss __builtin_altivec_vsldb_v8hi (vss, vss, const int<3>);
>  VSLDB_V8HI vsldb_v8hi {}
> 
> +  const vsq __builtin_altivec_vsldb_v1ti (vsq, vsq, const int<3>);
> +    VSLDB_V1TI vsldb_v1ti {}
> +
>    const vsq __builtin_altivec_vslq (vsq, vuq);
>  VSLQ vashlv1ti3 {}
> 
> @@ -3317,6 +3326,9 @@
>    const vss __builtin_altivec_vsrdb_v8hi (vss, vss, const int<3>);
>  VSRDB_V8HI vsrdb_v8hi {}
> 
> +  const vsq __builtin_altivec_vsrdb_v1ti (vsq, vsq, const int<3>);
> +    VSRDB_V1TI vsrdb_v1ti {}
> +
>    const vsq __builtin_altivec_vsrq (vsq, vuq);
>  VSRQ vlshrv1ti3 {}
> 
> diff --git a/gcc/config/rs6000/rs6000-overload.def 
> b/gcc/config/rs6000/rs6000-overload.def
> index c4ecafc6f7e..302e0232533 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ b/gcc/config/rs6000/rs

Re: [PATCH 0/2] rs6000, remove vec and vsx set builtins

2024-07-25 Thread Kewen.Lin
Hi Carl,

on 2024/7/24 01:32, Carl Love wrote:
> GCC maintainers:
> 
> The code generated by using C-code to set a vector element versus using a 
> built-in has been investigated.  The assembly code generated from the C-code 
> is as good or better than the assembly code generated for the built-ins for 
> both the -O0 and -O3 levels of optimization.
> 
> For the vec_insert built-in, whose resolution previously made use of the 
> now-removed vec_set bif, the generated code is as good as before with optimization.
> 
> This two-patch series removes the __builtin_vec_set_v1ti, 
> __builtin_vec_set_v2df, __builtin_vec_set_v2di built-ins and the 
> __builtin_vsx_set_1ti, __builtin_vsx_set_2df, __builtin_vsx_set_2di 
> built-ins in favor of using C-code instead.  The built-ins use the built-in 
> set attribute in the definitions of the built-ins.  With the removal of these 
> 6 built-ins, the set built-in attribute is no longer used and the related 
> code for the attribute is removed.
> 
> The patch, first patch in this series, to remove the __builtin_vec_set_v1ti, 
> __builtin_vec_set_v2df, __builtin_vec_set_v2di was previously posted.  The 
> feedback on the patch was that we could also remove set bif attribute.  
> Removal of the set bif attribute requires also removing the 
> __builtin_vsx_set_1ti,  __builtin_vsx_set_2df, __builtin_vsx_set_2di 
> built-ins.  The second patch removes the vsx set built-ins and the now no 
> longer used set built-in attribute and associated code.
> 
> The patches have been tested on a Power 10 LE system with no regressions.

It would be good to test this on BE as well (both 64-bit and 32-bit).

BR,
Kewen
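
For reference, a minimal sketch of what "using C-code to set a vector element" can look like. It assumes GCC's generic vector extension (`__attribute__ ((vector_size))`) rather than the AltiVec `vector double` type, so that it builds on any target; `v2df` and `set_element` are hypothetical names chosen for illustration and are not part of the patch series.

```cpp
// Sketch only: v2df and set_element are illustrative names, not GCC internals.
// Uses the GCC/Clang generic vector extension, so it is not rs6000-specific.
typedef double v2df __attribute__ ((vector_size (16)));

// Replace element IDX of VEC with VAL using ordinary subscripting, which is
// the operation the removed set built-ins provided.
v2df
set_element (v2df vec, double val, int idx)
{
  vec[idx & 1] = val;   // wrap the selector modulo 2 so it stays in range
  return vec;
}
```

With a constant index and optimization enabled, the compiler is free to pick whatever insert sequence the target offers, which is the behaviour the series relies on.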



Re: [PATCH 2/2] rs6000, remove built-ins __builtin_vsx_set_1ti, __builtin_vsx_set_2df, __builtin_vsx_set_2di

2024-07-25 Thread Kewen.Lin
Hi Carl,

on 2024/7/24 01:52, Carl Love wrote:
> GCC maintainers:
> 
> This patch removes the vsx set built-ins: __builtin_vsx_set_1ti, 
> __builtin_vsx_set_2df, __builtin_vsx_set_2di.  With the  removal of these 
> built-ins, the built-in attribute "set", used in the built-in definition 
> file, is no longer needed.  The "set" attribute and its associated code 
> are removed.
> 
> The assembly code generated by using C code to set an element of a vector 
> versus using the vsx set built-in to set an element was investigated.  With 
> -O0 optimization the generated assembly code is comparable in terms of the 
> generated assembly instructions and number of instructions.  For the -O3 
> optimization level, in the 2DI and 2DF cases the built-ins and the C code 
> generate identical assembly code.  The assembly code generated for the 1TI 
> case for the C code has one less instruction.  The built-in generates an 
> extra load instruction.  Hence, the C code is better as it has fewer load 
> instructions.
> 
> The testcase for the __builtin_vsx_set_2df is removed.  The other built-ins 
> do not have testcases.
> 
> The patch has been tested on a Power 10 LE system with no regressions.
> 
> Please let me know if the patch is acceptable for mainline.  Thanks.
> 
>    Carl
> 
> --
> rs6000, remove built-ins __builtin_vsx_set_1ti, __builtin_vsx_set_2df, 
> __builtin_vsx_set_2di
> 
> The built-ins set a value in a vector.  The same operation can be done
> in C-code.  The assembly code generated from the C-code is as good or
> better than the code generated by the built-ins.  With default
> optimization the number of assembly instructions generated by the two methods
> is similar.  With -O3 optimization, the assembly generated for the two
> approaches is identical for the 2DF and 2DI types.  The assembly for
> the C-code version of the 1Ti requres one less assembly instruction.

Nit: s/requres/requires/

> It also only uses one load versus two loads for the built-in.
> 
> With the removal of the built-ins, there are no other uses of the
> set built-in attribute.  The code associated with the set built-in
> attribute is removed.
> 
> Finally, the testcase for the __builtin_vsx_set_2df is removed.  The
> other built-ins do not have testcases.
> 
> gcc/ChangeLog:
>     * config/rs6000/rs6000-builtin.cc (get_element_number,
>     altivec_expand_vec_set_builtin): Remove functions.
>     (rs6000_expand_builtin): Remove the if statement to call
>     altivec_expand_vec_set_builtin.
>     * config/rs6000/rs6000-builtins.def (__builtin_vsx_set_1ti,
>     __builtin_vsx_set_2df, __builtin_vsx_set_2di): Remove the
>     built-in definitions.
>     * config/rs6000/rs6000-gen-builtins.cc (struct attrinfo):
>     Remove the isset variable from the structure.
>     (parse_bif_attrs): Remove the uses of the isset variable.
> 
> gcc/testsuite/ChangeLog:
>     * gcc.target/powerpc/vsx-builtin-3.c: Remove test cases for the
>     __builtin_vsx_set_2df built-in.
> ---
>  gcc/config/rs6000/rs6000-builtin.cc   | 53 ---
>  gcc/config/rs6000/rs6000-builtins.def | 10 
>  gcc/config/rs6000/rs6000-gen-builtins.cc  | 29 --
>  .../gcc.target/powerpc/vsx-builtin-3.c    |  6 ---
>  4 files changed, 11 insertions(+), 87 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index 117cf0125f8..099cbc82245 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -2313,56 +2313,6 @@ altivec_expand_predicate_builtin (enum insn_code 
> icode, tree exp, rtx target)
>    return target;
>  }
> 
> -/* Return the integer constant in ARG.  Constrain it to be in the range
> -   of the subparts of VEC_TYPE; issue an error if not.  */
> -
> -static int
> -get_element_number (tree vec_type, tree arg)
> -{
> -  unsigned HOST_WIDE_INT elt, max = TYPE_VECTOR_SUBPARTS (vec_type) - 1;
> -
> -  if (!tree_fits_uhwi_p (arg)
> -  || (elt = tree_to_uhwi (arg), elt > max))
> -    {
> -  error ("selector must be an integer constant in the range [0, %wi]", 
> max);
> -  return 0;
> -    }
> -
> -  return elt;
> -}
> -
> -/* Expand vec_set builtin.  */
> -static rtx
> -altivec_expand_vec_set_builtin (tree exp)
> -{
> -  machine_mode tmode, mode1;
> -  tree arg0, arg1, arg2;
> -  int elt;
> -  rtx op0, op1;
> -
> -  arg0 = CALL_EXPR_ARG (exp, 0);
> -  arg1 = CALL_EXPR_ARG (exp, 1);
> -  arg2 = CALL_EXPR_ARG (exp, 2);
> -
> -  tmode = TYPE_MODE (TREE_TYPE (arg0));
> -  mode1 = TYPE_MODE (TREE_TYPE (TREE_TYPE (arg0)));
> -  gcc_assert (VECTOR_MODE_P (tmode));
> -
> -  op0 = expand_expr (arg0, NULL_RTX, tmode, EXPAND_NORMAL);
> -  op1 = expand_expr (arg1, NULL_RTX, mode1, EXPAND_NORMAL);
> -  elt = get_element_number (TREE_TYPE (arg0), arg2);
> -
> -  if (GET_MODE (op1) 

Re: [PATCH 1/2] rs6000, Remove __builtin_vec_set_v1ti,, __builtin_vec_set_v2df, __builtin_vec_set_v2di

2024-07-25 Thread Kewen.Lin
Hi Carl,

on 2024/7/24 01:52, Carl Love wrote:
> 
> GCC maintainers:
> 
> This patch was previously posted.  Per the feedback, it is now the first of 
> two patches to remove the set built-ins.
> 
> This patch removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df and 
> __builtin_vec_set_v2di built-ins.  The users should just use normal C-code to 
> update the various vector elements.  This change was originally intended to 
> be part of the earlier series of cleanup patches.  It was initially thought 
> that some additional work would be needed to do some gimple generation 
> instead of these built-ins.  However, the existing default code generation 
> does produce the needed code.    For the vec_set bif, the equivalent C code 
> is as good or better than the built-in.  For the vec_insert bif whose 
> resolving previously made use of the vec_set bif, the assembly code 
> generation is as good as before with the -O3 optimization.

This background information will also be mentioned in the commit log, right?

> 
> The patch has been tested on Power 10 LE with no regressions.
> 
> Please let me know if the patch is acceptable for mainline.  Thanks.
> 
>    Carl
> 
> -
> rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, 
> __builtin_vec_set_v2di
> 
> Remove the built-ins, use the default gimple generation instead.

OK for trunk with a better commit log like the above paragraph, thanks!

// Assuming testing on BE goes well too. :)

BR,
Kewen

> 
> gcc/ChangeLog:
>     * config/rs6000/rs6000-builtins.def (__builtin_vec_set_v1ti,
>     __builtin_vec_set_v2df, __builtin_vec_set_v2di): Remove built-in
>     definitions.
>     * config/rs6000/rs6000-c.cc (resolve_vec_insert): Remove the
>     handling for constant vec_insert position with
>     VECTOR_UNIT_VSX_P V1TImode, V2DFmode and V2DImode modes.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 13 -
>  gcc/config/rs6000/rs6000-c.cc | 40 ---
>  2 files changed, 53 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 47830b7dcb0..75c33aa9ffc 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1263,19 +1263,6 @@
>    const signed long long __builtin_vec_ext_v2di (vsll, signed int);
>  VEC_EXT_V2DI nothing {extract}
> 
> -;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
> -;; resolve_vec_insert(), rs6000-c.cc
> -;; TODO: Remove VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI once the uses
> -;; in resolve_vec_insert are replaced by the equivalent gimple statements.
> -  const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
> -    VEC_SET_V1TI nothing {set}
> -
> -  const vd __builtin_vec_set_v2df (vd, double, const int<1>);
> -    VEC_SET_V2DF nothing {set}
> -
> -  const vsll __builtin_vec_set_v2di (vsll, signed long long, const int<1>);
> -    VEC_SET_V2DI nothing {set}
> -
>    const vsc __builtin_vsx_cmpge_16qi (vsc, vsc);
>  CMPGE_16QI vector_nltv16qi {}
> 
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index 68519e1397f..04882c396bf 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -1524,46 +1524,6 @@ resolve_vec_insert (resolution *res, vec 
> *arglist,
>    return error_mark_node;
>  }
> 
> -  /* If we can use the VSX xxpermdi instruction, use that for insert.  */
> -  machine_mode mode = TYPE_MODE (arg1_type);
> -
> -  if ((mode == V2DFmode || mode == V2DImode)
> -  && VECTOR_UNIT_VSX_P (mode)
> -  && TREE_CODE (arg2) == INTEGER_CST)
> -    {
> -  wide_int selector = wi::to_wide (arg2);
> -  selector = wi::umod_trunc (selector, 2);
> -  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
> -
> -  tree call = NULL_TREE;
> -  if (mode == V2DFmode)
> -    call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DF];
> -  else if (mode == V2DImode)
> -    call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DI];
> -
> -  /* Note, __builtin_vec_insert_ has vector and scalar types
> -     reversed.  */
> -  if (call)
> -    {
> -      *res = resolved;
> -      return build_call_expr (call, 3, arg1, arg0, arg2);
> -    }
> -    }
> -
> -  else if (mode == V1TImode
> -       && VECTOR_UNIT_VSX_P (mode)
> -       && TREE_CODE (arg2) == INTEGER_CST)
> -    {
> -  tree call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V1TI];
> -  wide_int selector = wi::zero(32);
> -  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
> -
> -  /* Note, __builtin_vec_insert_ has vector and scalar types
> -     reversed.  */
> -  *res = resolved;
> -  return build_call_expr (call, 3, arg1, arg0, arg2);
> -    }
> -
>    /* Build *(((arg1_inner_type*) & (vector type){arg1}) + arg2) = arg0

Re: [PATCH ver 2] rs6000, remove __builtin_vsx_xvcmp* built-ins

2024-07-25 Thread Kewen.Lin
Hi Carl,

on 2024/7/24 01:06, Carl Love wrote:
> GCC maintainers:
> 
> version 2, Updated patch comments, added missing ChangeLog.  Fixed unintended 
> line removal.
> 
> The following patch removes the three __builtin_vsx_xvcmp[eq|ge|gt]sp 
> builtins as they are similar to the overloaded vec_cmp[eq|ge|gt] built-ins.  The 
> difference is that the overloaded built-ins return a vector of boolean or a vector 
> of long long booleans, whereas the removed built-ins returned a vector of 
> floats or vector of doubles.
> 
> The tests for __builtin_vsx_xvcmp[eq|ge|gt]sp and 
> __builtin_vsx_xvcmp[eq|ge|gt]dp are updated to use the overloaded 
> vec_cmp[eq|ge|gt] built-in with the required changes for the return type.  
> Note __builtin_vsx_xvcmp[eq|ge|gt]dp are used internally.
> 
> The patches have been tested on a Power 10 LE system with no regressions.
> 
> Please let me know if the patch is acceptable for mainline.  Thanks.
> 
>    Carl
> -
> rs6000, remove __builtin_vsx_xvcmp* built-ins
> 
> This patch removes the built-ins:
>  __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp,
>  __builtin_vsx_xvcmpgtsp.
> 
> which are similar to the recommended PVIPR documented overloaded
> vec_cmpeq, vec_cmpgt and vec_cmpge built-ins.
> 
> The difference is that the overloaded built-ins return a vector of
> 32-bit booleans.  The removed built-ins returned a vector of floats.
> 
> The __builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpgedp and
> __builtin_vsx_xvcmpgtdp are not removed as they are used by the
> overloaded vec_cmpeq, vec_cmpgt and vec_cmpge built-ins.
> 
> The test cases for the __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp,
> __builtin_vsx_xvcmpgtsp, __builtin_vsx_xvcmpeqdp,
> __builtin_vsx_xvcmpgedp and __builtin_vsx_xvcmpgtdp  are changed to use
> the overloaded vec_cmpeq, vec_cmpgt, vec_cmpge built-ins.  Use of the
> overloaded built-ins requires the result to be stored in a vector of
> boolean of the appropriate size or the result must be cast to the return
> type used by the original __builtin_vsx_xvcmp* built-ins.
> 
> gcc/ChangeLog:
>     * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp,
>     __builtin_vsx_xvcmpgesp, __builtin_vsx_xvcmpgtsp): Remove
>     definitions.
> 
> gcc/testsuite/ChangeLog:
>     * gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xvcmpeqdp,
>     __builtin_vsx_xvcmpgtdp, __builtin_vsx_xvcmpgedp,
>     __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgtsp,
>     __builtin_vsx_xvcmpgesp): Remove.
>     (vec_cmpeq, vec_cmpgt, vec_cmpge): Add tests for float
>     arguments that     store result in boolean and cast result to
>     store result in float.  Add tests for double arguments that
>     store the result in long long boolean and cast result to
>     double.

Nit: Normally the one in "()" is the name of the function you changed,
so how about:

(do_cmp): Replace __builtin_vsx_xvcmp{eq,gt,ge}{sp,dp} by vec_cmp{eq,gt,ge}
respectively and add explicit casts to vector {float,double}.  Add more
testing code assigning to vector boolean types.

OK for trunk with this nit tweaked, thanks!

BR,
Kewen
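
As a rough illustration of the return-type difference described in the quoted commit message, the sketch below uses GCC's generic vector extension instead of the AltiVec types (an assumption made so that it builds on any target). The names `v4sf`, `v4si`, `cmpeq_mask` and `cmpeq_as_float` are hypothetical; the real interfaces are the overloaded vec_cmp* built-ins.

```cpp
// Sketch only: generic GCC/Clang vector types stand in for the AltiVec
// "vector float" and "vector bool int"; all names here are illustrative.
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

// Element-wise compare: each lane of the result is all ones (-1) where the
// inputs are equal and 0 where they differ, i.e. a mask rather than a float.
v4si
cmpeq_mask (v4sf a, v4sf b)
{
  return a == b;
}

// Old-style interface: the same mask bits, reinterpreted as a vector of floats.
v4sf
cmpeq_as_float (v4sf a, v4sf b)
{
  return (v4sf) (a == b);
}
```

The cast in the second helper only reinterprets the bits; it mirrors the explicit casts the updated tests need when they keep storing the result in a float vector.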

> ---
>  gcc/config/rs6000/rs6000-builtins.def |  9 --
>  .../gcc.target/powerpc/vsx-builtin-3.c    | 28 ++-
>  2 files changed, 21 insertions(+), 16 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 77eb0f7e406..47830b7dcb0 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1579,18 +1579,12 @@
>    const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
>  XVCMPEQDP_P vector_eq_v2df_p {pred}
> 
> -  const vf __builtin_vsx_xvcmpeqsp (vf, vf);
> -    XVCMPEQSP vector_eqv4sf {}
> -
>    const vd __builtin_vsx_xvcmpgedp (vd, vd);
>  XVCMPGEDP vector_gev2df {}
> 
>    const signed int __builtin_vsx_xvcmpgedp_p (signed int, vd, vd);
>  XVCMPGEDP_P vector_ge_v2df_p {pred}
> 
> -  const vf __builtin_vsx_xvcmpgesp (vf, vf);
> -    XVCMPGESP vector_gev4sf {}
> -
>    const signed int __builtin_vsx_xvcmpgesp_p (signed int, vf, vf);
>  XVCMPGESP_P vector_ge_v4sf_p {pred}
> 
> @@ -1600,9 +1594,6 @@
>    const signed int __builtin_vsx_xvcmpgtdp_p (signed int, vd, vd);
>  XVCMPGTDP_P vector_gt_v2df_p {pred}
> 
> -  const vf __builtin_vsx_xvcmpgtsp (vf, vf);
> -    XVCMPGTSP vector_gtv4sf {}
> -
>    const signed int __builtin_vsx_xvcmpgtsp_p (signed int, vf, vf);
>  XVCMPGTSP_P vector_gt_v4sf_p {pred}
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> index 60f91aad23c..d67f97c8011 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> @@ -156,13 +156,27 @@ int do_cmp (void)
>  {
>    int i = 0;
> 
> -  d[i][0] = __builtin_vsx_xvcmpeqdp (d[i][1], 

Re: [PATCH] doc: Document -O1 as the preferred level for large machine-generated code

2024-07-25 Thread Eric Gallager
On Tue, Jul 23, 2024 at 10:07 AM Sam James  wrote:
>
> At -O1, the intention is that we compile things in a "reasonable" amount
> of time (ditto memory use). In particular, we try to especially avoid
> optimizations which scale poorly on pathological cases, as is the case
> for large machine-generated code.
>
> Recommend -O1 for large machine-generated code, as has been informally
> done on bugs for a while now.
>
> This applies (broadly speaking) to large machine-generated functions and,
> to a lesser extent, to repetitive small-but-still-not-tiny functions
> from a generator program.
>
> gcc/ChangeLog:
> PR middle-end/114855
> * doc/invoke.texi (Optimize options): Mention machine-generated
> code for -O1.
> ---
> richi, does this accurately reflect the discussion we had on IRC a little
> while ago?
>
> Please push if OK, thanks.
>
>  gcc/doc/invoke.texi | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index e0a641213ae4..9fb0925ed292 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -12560,6 +12560,11 @@ With @option{-O}, the compiler tries to reduce code 
> size and execution
>  time, without performing any optimizations that take a great deal of
>  compilation time.
>
> +@option{-O} is the recommended optimization level for large machine-generated
> +code as a sensible balance between time taken to compile and memory use:
> +higher optimization levels perform optimizations with greater algorithmic
> +complexity than at @option{-O}.
> +

Personally, I get confused when "-O1" is written as just "-O"...

>  @c Note that in addition to the default_options_table list in opts.cc,
>  @c several optimization flags default to true but control optimization
>  @c passes that are explicitly disabled at -O0.
>
> --
> 2.45.2
>


Re: [PATCH, gfortran] libgfortran: implement fpu-macppc for Darwin, support IEEE arithmetic

2024-07-25 Thread FX Coudert
Can you post an updated version of the patch, following the first round of 
review?

FX

Re: [PATCH] libstdc++: Fix testsuite for remote testing (and sim)

2024-07-25 Thread Jonathan Wakely
On Thu, 25 Jul 2024 at 07:32, Jonathan Wakely  wrote:
>
>
>
> On Thu, 25 Jul 2024, 02:58 Andrew Pinski,  wrote:
>>
>> The problem here is that v3_additional_files will have a space
>> at the beginning of the string as dg-additional-files will append
>> `" " $files` to it.  Then when split is called on that string,
>> there will be an empty file and copying a dir will just fail for
>> remote/sim testing (I didn't look at why it works for native
>> testing though).
>>
>> Ran a full libstdc++ testsuite using a sim board for testing.
>
>
> Ah I did notice the extra space but since it worked for my naive testing I 
> didn't worry about it.

That was supposed to say native, but naive works too!

>
> OK for trunk, thanks.
>
>
>>
>> libstdc++-v3/ChangeLog:
>>
>> * testsuite/lib/libstdc++.exp (v3_target_compile): Call
>> string trim on v3_additional_files before calling split.
>>
>> Signed-off-by: Andrew Pinski 
>> ---
>>  libstdc++-v3/testsuite/lib/libstdc++.exp | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp 
>> b/libstdc++-v3/testsuite/lib/libstdc++.exp
>> index 4bf88e72d05..c11e752ecfb 100644
>> --- a/libstdc++-v3/testsuite/lib/libstdc++.exp
>> +++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
>> @@ -639,7 +639,8 @@ proc v3_target_compile { source dest type options } {
>>  lappend options "timeout=[timeout_value]"
>>
>>  global v3_additional_files
>> -foreach file [split $v3_additional_files " "] {
>> +# There will be an empty file at the begining of the list so trim it 
>> off.
>> +foreach file [split [string trim $v3_additional_files] " "] {
>> global srcdir
>> v3-copy-file "$srcdir/data/$file" $file
>>  }
>> --
>> 2.43.0
>>
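
The failure mode described in the quoted commit message can be sketched outside of Tcl. The snippet below is a hypothetical C++ model (`trim` and `count_empty_fields` are made-up names) of splitting on a single separator character the way Tcl's `split` does: a leading separator yields an empty first field, and trimming the string first removes it.

```cpp
#include <cstddef>
#include <string_view>

// Strip leading and trailing spaces; a rough stand-in for Tcl's `string trim`,
// which also strips other whitespace characters.
constexpr std::string_view
trim (std::string_view s)
{
  const std::size_t first = s.find_first_not_of (' ');
  if (first == std::string_view::npos)
    return {};
  const std::size_t last = s.find_last_not_of (' ');
  return s.substr (first, last - first + 1);
}

// Count the empty fields produced when S is split on every occurrence of SEP.
// An empty input is treated as producing no fields at all.
constexpr std::size_t
count_empty_fields (std::string_view s, char sep)
{
  if (s.empty ())
    return 0;
  std::size_t empty = 0;
  std::size_t start = 0;
  while (true)
    {
      const std::size_t end = s.find (sep, start);
      const std::size_t stop = (end == std::string_view::npos) ? s.size () : end;
      if (stop == start)
        ++empty;            // two separators in a row, or one at either edge
      if (end == std::string_view::npos)
        break;
      start = end + 1;
    }
  return empty;
}
```

For an input such as `" a.txt b.txt"` the untrimmed split contains one empty field, which corresponds to the bogus empty file name that the copy step trips over.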


[PATCH] tree-optimization/116081 - typedef vs. non-typedef in vectorization

2024-07-25 Thread Richard Biener
The following addresses a behavioral difference in vector type
analysis for typedef vs. non-typedef.  It doesn't fix the issue
at hand but avoids a spurious difference in the dumps.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/116081
* tree-vect-stmts.cc (vect_get_vector_types_for_stmt):
Properly compare types.
---
 gcc/tree-vect-stmts.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a47482375c1..aa98599c1f5 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -14905,7 +14905,7 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, 
stmt_vec_info stmt_info,
 vector size per vectorization).  */
   scalar_type = vect_get_smallest_scalar_type (stmt_info,
   TREE_TYPE (vectype));
-  if (scalar_type != TREE_TYPE (vectype))
+  if (!types_compatible_p (scalar_type, TREE_TYPE (vectype)))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
-- 
2.43.0
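
Why a pointer comparison can tell a typedef apart from the type it names can be shown with a toy model. This is not GCC's tree representation and not the real `types_compatible_p`; `TypeNode`, `main_variant`, `same_node` and `compatible` are hypothetical names, and the only assumption carried over is that a typedef gets its own type node that refers back to the underlying type.

```cpp
// Toy model only; not GCC's data structures.
struct TypeNode
{
  const char *name;
  const TypeNode *underlying;   // null for a type that is not a typedef
};

// Follow typedef links to the type that is ultimately named.
constexpr const TypeNode *
main_variant (const TypeNode *t)
{
  while (t->underlying != nullptr)
    t = t->underlying;
  return t;
}

// Identity check: a typedef and its underlying type are different nodes.
constexpr bool
same_node (const TypeNode *a, const TypeNode *b)
{
  return a == b;
}

// Compatibility check: look through typedefs before comparing.
constexpr bool
compatible (const TypeNode *a, const TypeNode *b)
{
  return main_variant (a) == main_variant (b);
}

constexpr TypeNode int_type = { "int", nullptr };
constexpr TypeNode my_int_typedef = { "my_int", &int_type };
```

In this model `same_node` reports a difference between `int_type` and `my_int_typedef` while `compatible` does not, which is the kind of spurious typedef vs. non-typedef difference the patch removes from the dumps.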


[PATCH] tree-optimization/116079 - store motion and clobbers

2024-07-25 Thread Richard Biener
When we move a store out of an inner loop and remove a clobber in
the process, analysis of the inner loop can run into the clobber
via the meta-data and crash when accessing its basic-block.  The
following avoids this by clearing the VDEF which is how it identifies
already processed stores.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/116079
* tree-ssa-loop-im.cc (hoist_memory_references): Clear
VDEF of elided clobbers.

* gcc.dg/torture/pr116079.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr116079.c | 20 
 gcc/tree-ssa-loop-im.cc |  2 ++
 2 files changed, 22 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116079.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr116079.c 
b/gcc/testsuite/gcc.dg/torture/pr116079.c
new file mode 100644
index 000..e9120969d91
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116079.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+char g_132;
+int g_701, g_1189, func_24___trans_tmp_15, func_24_l_2691;
+long func_24___trans_tmp_9;
+int *func_24_l_2684;
+void func_24() {
+  for (; g_1189;) {
+g_132 = 0;
+for (; g_132 < 6; ++g_132) {
+  func_24___trans_tmp_9 = *func_24_l_2684 = func_24_l_2691;
+  g_701 = 4;
+  for (; g_701; g_701 -= 1) {
+int l_2748[4];
+int si2 = l_2748[3];
+func_24___trans_tmp_15 = si2;
+  }
+}
+  }
+}
diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
index c53efbb8d59..ccc56dc42f6 100644
--- a/gcc/tree-ssa-loop-im.cc
+++ b/gcc/tree-ssa-loop-im.cc
@@ -2880,6 +2880,7 @@ hoist_memory_references (class loop *loop, bitmap 
mem_refs,
  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
  unlink_stmt_vdef (stmt);
  release_defs (stmt);
+ gimple_set_vdef (stmt, NULL_TREE);
  gsi_remove (&gsi, true);
}
 
@@ -3062,6 +3063,7 @@ hoist_memory_references (class loop *loop, bitmap 
mem_refs,
   gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
   unlink_stmt_vdef (stmt);
   release_defs (stmt);
+  gimple_set_vdef (stmt, NULL_TREE);
   gsi_remove (&gsi, true);
 }
 
-- 
2.43.0


Re: [PATCH] libstdc++: fix uses of explicit object parameter [PR116038]

2024-07-25 Thread Jonathan Wakely
On Thu, 25 Jul 2024 at 01:04, Patrick Palka  wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk/14 (after
> 14.2 is released)?

Yes to both, thanks.


>
> -- >8 --
>
> The type of an implicit object parameter is always the current class.
> For an explicit object parameter however, its deduced type can be a
> class derived from the current class.  So when combining multiple
> implicit-object overloads into a single explicit-object overload we need
> to account for this possibility.  For example, when accessing a member
> from the current class through an explicit object parameter, it may be a
> derived class from which the member is not accessible, as in the below
> testcases.
>
> This pitfall is discussed[1] in the deducing this paper.  The general
> solution is to cast the explicit object parameter to (a reference to)
> the current class, appropriately qualified, rather than e.g. using
> std::forward which preserves the deduced type.
>
> This patch corrects the existing problematic uses of explicit object
> parameters in the library, all of which forward the parameter via
> std::forward, to instead cast the parameter to the current class via
> our __like_t alias template.  Note that unlike the paper's like_t,
> ours always returns a reference so we can just write
>
>   __like_t<Self, B>(self)
>
> instead of
>
>   (_like_t<Self, B>&&)self
>
> as the paper does.
>
> [1]: https://wg21.link/P0847#name-lookup-within-member-functions and the
> section after that
>
> PR libstdc++/116038
>
> libstdc++-v3/ChangeLog:
>
> * include/std/functional (_Bind_front::operator()): Use __like_t
> instead of std::forward when forwarding __self.
> (_Bind_back::operator()): Likewise.
> * include/std/ranges (_Partial::operator()): Likewise.
> (_Pipe::operator()): Likewise.
> * testsuite/20_util/function_objects/bind_back/116038.cc: New test.
> * testsuite/20_util/function_objects/bind_front/116038.cc: New test.
> * testsuite/std/ranges/adaptors/116038.cc: New test.
> ---
>  libstdc++-v3/include/std/functional   |  4 +--
>  libstdc++-v3/include/std/ranges   | 11 ---
>  .../function_objects/bind_back/116038.cc  | 27 +
>  .../function_objects/bind_front/116038.cc | 27 +
>  .../testsuite/std/ranges/adaptors/116038.cc   | 29 +++
>  5 files changed, 92 insertions(+), 6 deletions(-)
>  create mode 100644 
> libstdc++-v3/testsuite/20_util/function_objects/bind_back/116038.cc
>  create mode 100644 
> libstdc++-v3/testsuite/20_util/function_objects/bind_front/116038.cc
>  create mode 100644 libstdc++-v3/testsuite/std/ranges/adaptors/116038.cc
>
> diff --git a/libstdc++-v3/include/std/functional 
> b/libstdc++-v3/include/std/functional
> index 99364286a72..7788a963757 100644
> --- a/libstdc++-v3/include/std/functional
> +++ b/libstdc++-v3/include/std/functional
> @@ -944,7 +944,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> noexcept(is_nothrow_invocable_v<__like_t<_Self, _Fd>,
> __like_t<_Self, _BoundArgs>..., 
> _CallArgs...>)
> {
> - return _S_call(std::forward<_Self>(__self), _BoundIndices(),
> + return _S_call(__like_t<_Self, _Bind_front>(__self), 
> _BoundIndices(),
>  std::forward<_CallArgs>(__call_args)...);
> }
>  #else
> @@ -1072,7 +1072,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> noexcept(is_nothrow_invocable_v<__like_t<_Self, _Fd>,
> _CallArgs..., __like_t<_Self, 
> _BoundArgs>...>)
> {
> - return _S_call(std::forward<_Self>(__self), _BoundIndices(),
> + return _S_call(__like_t<_Self, _Bind_back>(__self), _BoundIndices(),
>  std::forward<_CallArgs>(__call_args)...);
> }
>
> diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
> index 3f335b95a08..b7c7aa36ddc 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -1033,7 +1033,7 @@ namespace views::__adaptor
> return _Adaptor{}(std::forward<_Range>(__r),
>   std::forward(__args)...);
>   };
> - return std::apply(__forwarder, std::forward<_Self>(__self)._M_args);
> + return std::apply(__forwarder, __like_t<_Self, 
> _Partial>(__self)._M_args);
> }
>  #else
>template
> @@ -1082,7 +1082,10 @@ namespace views::__adaptor
> requires __adaptor_invocable<_Adaptor, _Range, __like_t<_Self, _Arg>>
> constexpr auto
> operator()(this _Self&& __self, _Range&& __r)
> -   { return _Adaptor{}(std::forward<_Range>(__r), 
> std::forward<_Self>(__self)._M_arg); }
> +   {
> + return _Adaptor{}(std::forward<_Range>(__r),
> +   __like_t<_Self, _Partial>(__self)._M_arg);
> +   }
>  #else
>template
> requires __adaptor_invocable<_Adapto

[committed] libstdc++: Use concepts and conditional explicit in std::optional

2024-07-25 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

For C++20 mode we can improve compile times by using conditional
explicit to reduce the number of constructor overloads. We can also use
requires-clauses instead of SFINAE to implement constraints on the
constructors and assignment operators.

libstdc++-v3/ChangeLog:

* include/std/optional (optional): Use C++20 features to
simplify overload sets for constructors and assignment
operators.
---
 libstdc++-v3/include/std/optional | 130 +++---
 1 file changed, 121 insertions(+), 9 deletions(-)

diff --git a/libstdc++-v3/include/std/optional 
b/libstdc++-v3/include/std/optional
index 700e7047aba..2cc0221865e 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -768,6 +768,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
is_assignable<_Tp&, const optional<_Up>&&>,
is_assignable<_Tp&, optional<_Up>&&>>;
 
+#if __cpp_concepts && __cpp_conditional_explicit && __glibcxx_remove_cvref
+# define _GLIBCXX_USE_CONSTRAINTS_FOR_OPTIONAL 1
+#endif
+
   /**
 * @brief Class template for optional values.
 */
@@ -794,17 +798,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using _Base = _Optional_base<_Tp>;
 
   // SFINAE helpers
-  template
-   using __not_self = __not_>>;
-  template
-   using __not_tag = __not_>>;
-  template
-   using _Requires = enable_if_t<__and_v<_Cond...>, bool>;
 
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 3836. std::expected conversion constructor
   // expected(const expected&) should take precedence over
   // expected(U&&) with operator bool
+#ifdef _GLIBCXX_USE_CONSTRAINTS_FOR_OPTIONAL
+  template>
+   static constexpr bool __not_constructing_bool_from_optional
+ = true;
+
+  // If T is cv bool, remove_cvref_t is not a specialization of optional
+  // i.e. do not initialize a bool from optional::operator bool().
+  template
+   static constexpr bool
+   __not_constructing_bool_from_optional<_From, bool>
+ = !__is_optional_v>;
+
+  // If T is not cv bool, converts-from-any-cvref> is false.
+  // The constructor that converts from optional is disabled if the
+  // contained value can be initialized from optional, so that the
+  // optional(U&&) constructor can be used instead.
+  template>
+   static constexpr bool __construct_from_contained_value
+ = !__converts_from_optional<_Tp, _From>::value;
+
+  // However, optional can always be converted to bool, so don't apply
+  // this constraint when T is cv bool.
+  template
+   static constexpr bool __construct_from_contained_value<_From, bool>
+ = true;
+#else
   template>
struct __not_constructing_bool_from_optional
: true_type
@@ -825,6 +849,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
: true_type
{ };
 
+  template
+   using __not_self = __not_>>;
+  template
+   using __not_tag = __not_>>;
+  template
+   using __is_bool = is_same, bool>;
+  template
+   using _Requires = enable_if_t<__and_v<_Cond...>, bool>;
+#endif
+
 public:
   using value_type = _Tp;
 
@@ -833,6 +867,58 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   constexpr optional(nullopt_t) noexcept { }
 
   // Converting constructors for engaged optionals.
+#ifdef _GLIBCXX_USE_CONSTRAINTS_FOR_OPTIONAL
+  template
+   requires (!is_same_v>)
+ && (!is_same_v>)
+ && is_constructible_v<_Tp, _Up>
+ && __not_constructing_bool_from_optional<_Up>
+   constexpr explicit(!is_convertible_v<_Up, _Tp>)
+   optional(_Up&& __t)
+   noexcept(is_nothrow_constructible_v<_Tp, _Up>)
+   : _Base(std::in_place, std::forward<_Up>(__t)) { }
+
+  template
+   requires (!is_same_v>)
+ && is_constructible_v<_Tp, const _Up&>
+ && __construct_from_contained_value<_Up>
+   constexpr explicit(!is_convertible_v)
+   optional(const optional<_Up>& __t)
+   noexcept(is_nothrow_constructible_v<_Tp, const _Up&>)
+   {
+ if (__t)
+   emplace(__t._M_get());
+   }
+
+  template
+   requires (!is_same_v>)
+ && is_constructible_v<_Tp, _Up>
+ && __construct_from_contained_value<_Up>
+   constexpr explicit(!is_convertible_v<_Up, _Tp>)
+   optional(optional<_Up>&& __t)
+   noexcept(is_nothrow_constructible_v<_Tp, _Up>)
+   {
+ if (__t)
+   emplace(std::move(__t._M_get()));
+   }
+
+  template
+   requires is_constructible_v<_Tp, _Args...>
+   explicit constexpr
+   optional(in_place_t, _Args&&... __args)
+   noexcept(is_nothrow_constructible_v<_Tp, _Args...>)
+   : _Base(std::in_place, std::forward<_Args>(__args)...)
+   { }
+
+  template
+   requires is_constructible_v<_Tp, initializer_list<_Up>&, _Args...>
+   explicit constexpr
+   optional(in_place_t, initialize

Re: [PATCH] doc: Document -O1 as the preferred level for large machine-generated code

2024-07-25 Thread Sam James
Eric Gallager  writes:

> On Tue, Jul 23, 2024 at 10:07 AM Sam James  wrote:
>>
>> At -O1, the intention is that we compile things in a "reasonable" amount
>> of time (ditto memory use). In particular, we try to especially avoid
>> optimizations which scale poorly on pathological cases, as is the case
>> for large machine-generated code.
>>
>> Recommend -O1 for large machine-generated code, as has been informally
>> done on bugs for a while now.
>>
>> This applies (broadly speaking) for both large machine-generated functions
>> but also to a lesser extent repetitive small-but-still-not-tiny functions
>> from a generator program.
>>
>> gcc/ChangeLog:
>> PR middle-end/114855
>> * doc/invoke.texi (Optimize options): Mention machine-generated
>> code for -O1.
>> ---
>> richi, does this accurately reflect the discussion we had on IRC a little
>> while ago?
>>
>> Please push if OK, thanks.
>>
>>  gcc/doc/invoke.texi | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index e0a641213ae4..9fb0925ed292 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -12560,6 +12560,11 @@ With @option{-O}, the compiler tries to reduce code 
>> size and execution
>>  time, without performing any optimizations that take a great deal of
>>  compilation time.
>>
>> +@option{-O} is the recommended optimization level for large 
>> machine-generated
>> +code as a sensible balance between time taken to compile and memory use:
>> +higher optimization levels perform optimizations with greater algorithmic
>> +complexity than at @option{-O}.
>> +
>
> Personally, I get confused when "-O1" is written as just "-O"...

I, too, prefer -O1, but I was trying to be good and stick to
convention. But then I did -O1 in the commit message.

If people are fine with it, I'd prefer to do -O1.

thanks,
sam


Re: [PATCH] c++/modules: Implement P2615 'Meaningful Exports' [PR107688]

2024-07-25 Thread Jakub Jelinek
On Tue, Apr 30, 2024 at 02:46:15PM -0400, Jason Merrill wrote:
> On 3/4/24 06:18, Nathaniel Shead wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu. This should probably
> > wait for GCC 15 I suppose, but sending it in now in case there are any
> > comments.
> 
> OK for trunk.

I've committed the following patch to mark it as implemented:

diff --git a/htdocs/projects/cxx-status.html b/htdocs/projects/cxx-status.html
index 4f5def93..b25959b6 100644
--- a/htdocs/projects/cxx-status.html
+++ b/htdocs/projects/cxx-status.html
@@ -569,7 +569,7 @@
 
DR: Meaningful exports 
 <a href="https://wg21.link/p2615">P2615R1</a>
-   <a href="https://gcc.gnu.org/PR107688">No</a>
+   15

 
 


Jakub



[PATCH] tree-optimization/116081 - typedef vs. non-typedef in vectorization

2024-07-25 Thread Richard Biener
The following fixes the code generation difference when using
a typedef for the scalar type.  The issue is using a pointer
equality test for an INTEGER_CST which fails when the types
are different variants.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/116081
* tree-vect-loop.cc (get_initial_defs_for_reduction):
Use operand_equal_p for comparing the element with the
neutral op.
---
 gcc/tree-vect-loop.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d7d628efa60..856ce491c3e 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5652,7 +5652,7 @@ get_initial_defs_for_reduction (loop_vec_info loop_vinfo,
  init = gimple_build_vector_from_val (&ctor_seq, vector_type,
   neutral_op);
  int k = nunits;
- while (k > 0 && elts[k - 1] == neutral_op)
+ while (k > 0 && operand_equal_p (elts[k - 1], neutral_op))
k -= 1;
  while (k > 0)
{
-- 
2.43.0
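
For readers unfamiliar with the bug class, here is a minimal standalone sketch of why a pointer comparison misses equal constants of different type variants. It is not GCC code: `Type`, `IntCst`, `same_node`, `same_value` and `trim_trailing_neutral` are hypothetical stand-ins, and `same_value` is only a rough analogue of what operand_equal_p does for constants.

```cpp
#include <cassert>

// Hypothetical stand-ins for GCC tree nodes; none of these names exist in GCC.
struct Type {
  const char *name;   // e.g. "int" or a typedef name
  int precision;      // width in bits
  bool is_unsigned;
};

struct IntCst {
  const Type *type;
  long value;
};

// Identity comparison, like `elts[k - 1] == neutral_op` before the fix.
inline bool same_node(const IntCst *a, const IntCst *b) { return a == b; }

// Value comparison, a rough analogue of operand_equal_p for constants.
inline bool same_value(const IntCst *a, const IntCst *b) {
  return a->value == b->value
         && a->type->precision == b->type->precision
         && a->type->is_unsigned == b->type->is_unsigned;
}

// Mirrors the `while (k > 0 && ...) k -= 1;` loop from the patch: drop
// trailing elements that are equal to the neutral value.
template <typename Eq>
int trim_trailing_neutral(const IntCst *const *elts, int n,
                          const IntCst *neutral, Eq eq) {
  int k = n;
  while (k > 0 && eq(elts[k - 1], neutral))
    k -= 1;
  return k;
}

// Returns how many leading elements remain after trimming.
inline int demo(bool by_value) {
  static const Type int_type{"int", 32, false};
  static const Type my_int{"my_int", 32, false};  // typedef variant of int
  static const IntCst one{&int_type, 1};
  static const IntCst zero_int{&int_type, 0};
  static const IntCst zero_typedef{&my_int, 0};   // distinct node, same value
  const IntCst *elts[] = {&one, &zero_typedef, &zero_typedef};
  return by_value ? trim_trailing_neutral(elts, 3, &zero_int, same_value)
                  : trim_trailing_neutral(elts, 3, &zero_int, same_node);
}
```

With the identity comparison nothing is trimmed, because the typedef'd zero is a different node than the neutral zero; with the value comparison both trailing zeros are recognised.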


Re: [PATCH] aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2

2024-07-25 Thread Richard Sandiford
Jennifer Schmitz  writes:
> Thank you for the feedback. I added checks for SCALAR_INT_MODE_P for the reg 
> operands of the compare and if-then-else expressions. As it is not legal to 
> have different modes in the operand registers, I only added one check for 
> each of the expressions.
> The updated patch was bootstrapped and tested again.
> Best,
> Jennifer
>
> From 8da609be99fece8130cf1429bd938b2a26c6672b Mon Sep 17 00:00:00 2001
> From: Jennifer Schmitz 
> Date: Wed, 24 Jul 2024 06:13:59 -0700
> Subject: [PATCH] aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2
>
> According to the Neoverse V2 Software Optimization Guide (section 4.14), the
> instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been
> implemented so far. This patch implements and tests the two fusion pairs.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> There was also no non-noise impact on SPEC CPU2017 benchmark.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz 
>
> gcc/
>
>   * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
>   fusion logic.
>   * config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry.
>   (cmp+cset): Likewise.
>   * config/aarch64/tuning_models/neoversev2.h: Enable logic in
>   field fusible_ops.
>
> gcc/testsuite/
>
>   * gcc.target/aarch64/fuse_cmp_csel.c: New test.
>   * gcc.target/aarch64/fuse_cmp_cset.c: Likewise.

Thanks for the update.

It looks from a quick scan like the main three instructions associated
with single-set integer COMPAREs are CMP, CMN and TST.  TST could be
distinguished from CMP and CMN based on get_attr_type (), although it
looks like:

(define_insn "*and<mode>_compare0"
  [(set (reg:CC_Z CC_REGNUM)
(compare:CC_Z
 (match_operand:SHORT 0 "register_operand" "r")
 (const_int 0)))]
  ""
  "tst\\t%<w>0, <short_mask>"
  [(set_attr "type" "alus_imm")]
)

should use logics_imm instead of alus_imm.

Alternatively, we could add a new attribute for "compare_type" and use
that.  That would make the test in aarch_macro_fusion_pair_p slightly
simpler, since it could use get_attr_compare_type without having to
look at the pattern of prev_set.  But there's a danger that we'd
forget to add the new attribute for new comparison instructions.

I did wonder whether we could simply punt on CC_Zmode, but that's
not a reliable test.

But I suppose the counter-argument to my questions above is: how bad
would it be if we fused CMN and TST?  They are at least plausible
fusions, so it probably doesn't matter if we include them too.

So:

> ---
>  gcc/config/aarch64/aarch64-fusion-pairs.def   |  2 ++
>  gcc/config/aarch64/aarch64.cc | 22 +
>  gcc/config/aarch64/tuning_models/neoversev2.h |  5 ++-
>  .../gcc.target/aarch64/fuse_cmp_csel.c| 33 +++
>  .../gcc.target/aarch64/fuse_cmp_cset.c| 31 +
>  5 files changed, 92 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/fuse_cmp_csel.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/fuse_cmp_cset.c
>
> diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def 
> b/gcc/config/aarch64/aarch64-fusion-pairs.def
> index 9a43b0c8065..bf5e85ba8fe 100644
> --- a/gcc/config/aarch64/aarch64-fusion-pairs.def
> +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
> @@ -37,5 +37,7 @@ AARCH64_FUSION_PAIR ("aes+aesmc", AES_AESMC)
>  AARCH64_FUSION_PAIR ("alu+branch", ALU_BRANCH)
>  AARCH64_FUSION_PAIR ("alu+cbz", ALU_CBZ)
>  AARCH64_FUSION_PAIR ("addsub_2reg_const1", ADDSUB_2REG_CONST1)
> +AARCH64_FUSION_PAIR ("cmp+csel", CMP_CSEL)
> +AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET)
>  
>  #undef AARCH64_FUSION_PAIR
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 9e51236ce9f..7a0351f7dac 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -27348,6 +27348,28 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn 
> *curr)
>&& reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
>  return true;
>  
> +  /* Fuse CMP and CSEL.  */

How about adding something like:

  The test for CMP is not exact, and includes things like CMN and TST
  as well.  However, fusing with those instructions doesn't seem
  inherently bad.

> +  if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
> +  && prev_set && curr_set
> +  && GET_CODE (SET_SRC (prev_set)) == COMPARE
> +  && GET_CODE (SET_SRC (curr_set)) == IF_THEN_ELSE
> +  && REG_P (XEXP (SET_SRC (curr_set), 1))
> +  && REG_P (XEXP (SET_SRC (curr_set), 2))

I think we could use aarch64_reg_or_zero instead of REG_P for the last
two lines, which would allow selects involving W/XZR as well.

Looks good to me otherwise, but I'd be interested to hear what others
think about the CMP match.

Thanks,
Richard

> +  && SCALAR_INT_MODE_P (GET_MODE (XEXP (SET_SRC (prev_set), 0)))
> +  && SCALAR_INT_MODE_P (GET_MODE (XEXP (SET_S
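
As a point of reference for the discussion above, these are the kinds of source-level constructs that usually end up as the instruction pairs in question. This is only a sketch: the function names are made up for illustration, and the instruction sequences in the comments are what one would typically expect at -O2 on AArch64, not a guarantee.

```cpp
#include <cassert>

// Conditional select between two registers; typically CMP followed by CSEL.
int select_min(int a, int b) { return a < b ? a : b; }

// Materialising a comparison result; typically CMP followed by CSET.
int is_less(int a, int b) { return a < b; }

// Select against zero; this is the shape that the aarch64_reg_or_zero
// suggestion would additionally cover (CSEL with WZR as one operand).
int keep_if_less(int a, int b) { return a < b ? a : 0; }
```

Compiling these with -O2 -mcpu=neoverse-v2 and inspecting the scheduled assembly is one way to check whether the pairs stay adjacent.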

Re: [PATCH 4/5] MATCH: Create BIT_ANDN and BIT_IORN from matching

2024-07-25 Thread Richard Biener
On Thu, Jul 25, 2024 at 4:16 AM Andrew Pinski  wrote:
>
> To better create rtl directly from gimple, we can use
> these already existing internal functions from gimple.
>
> That is, simplify `a & ~b` into BIT_ANDN.
> Likewise `a | ~b` into BIT_IORN.
> We only want to do this late after vectorization as some
> targets (e.g. aarch64 SVE) have BIT_IORN on scalars but not on
> some vector modes; even though the vectorizer could expand it back.
>
> Note a few testcases need to be changed to not look
> into optimized dump and catch them earlier.
> The modified testcases could catch BIT_ANDN and BIT_IORN so move the
> testing to forwprop2 before simplification happens.
>
> Built and tested on aarch64-linux-gnu with no regressions.

I think we want these only for ISEL as they happen way too often and will
disturb the IL too much in ways not handled by passes.  not/and/or are
too important ops to "hide" from most of the gimple pipeline.

Richard.

> PR target/115086
>
> gcc/ChangeLog:
>
> * match.pd (`a & ~b`, `a | ~b`): New pattern.
> (BIT_ANDN/BIT_IORN with CST): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/bic-cst-1.c: New test.
> * gcc.target/aarch64/bic_simd-1.c: New test.
> * gcc.dg/tree-ssa/bitops-1.c: Move testing from optimized to 
> forwprop2.
> * gcc.dg/tree-ssa/bitops-6.c: Likewise.
> * gcc.dg/tree-ssa/cmpbit-4.c: Likewise.
> * gcc.dg/tree-ssa/pr110637-2.c: Likewise.
> * gcc.dg/tree-ssa/pr94880.c: Likewise.
> * gcc.dg/tree-ssa/pr96671-1.c: Likewise.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/match.pd  | 17 ++
>  gcc/testsuite/gcc.dg/tree-ssa/bitops-1.c  | 10 +++---
>  gcc/testsuite/gcc.dg/tree-ssa/bitops-6.c  | 12 +++
>  gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c  |  8 ++---
>  gcc/testsuite/gcc.dg/tree-ssa/cmpbit-4.c  | 12 +++
>  gcc/testsuite/gcc.dg/tree-ssa/pr110637-2.c|  8 ++---
>  gcc/testsuite/gcc.dg/tree-ssa/pr94880.c   |  6 ++--
>  gcc/testsuite/gcc.dg/tree-ssa/pr96671-1.c |  8 ++---
>  gcc/testsuite/gcc.target/aarch64/bic-cst-1.c  | 31 ++
>  gcc/testsuite/gcc.target/aarch64/bic_simd-1.c | 32 +++
>  10 files changed, 112 insertions(+), 32 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/bic-cst-1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/bic_simd-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index cf359b0ec0f..56f631dfeec 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -9979,6 +9979,23 @@ and,
> (cond_op:s @1 @2 @3 @4 @5) @5)
>(cond_op (bit_and @1 @0) @2 @3 @4 @5)))
>
> +#if GIMPLE
> +/* Create bit_andc and bit_iorc internal functions. */
> +(for bitop  (bit_and  bit_ior)
> + bitopc (IFN_BIT_ANDN IFN_BIT_IORN)
> + (simplify
> +  (bitop:c (bit_not:s @0) @1)
> +  (if (canonicalize_math_after_vectorization_p ()
> +   && direct_internal_fn_supported_p (as_internal_fn (bitopc),
> + type, OPTIMIZE_FOR_BOTH))
> +   (bitopc @1 @0)))
> + /* If the second operand is a constant, then reduce it to a & ~cst if
> +the not simplifies. */
> + (simplify
> +  (bitopc @0 CONSTANT_CLASS_P@1)
> +  (bitop (bit_not! @1) @0)))
> +#endif
> +
>  /* For pointers @0 and @2 and nonnegative constant offset @1, look for
> expressions like:
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/bitops-1.c
> index cf2823deb62..3a394b1f188 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/bitops-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-options "-O -fdump-tree-optimized-raw" } */
> +/* { dg-options "-O -fdump-tree-forwprop2-raw" } */
>
>  #define DECLS(n,VOL)   \
>  __attribute__((noinline,noclone))  \
> @@ -66,7 +66,7 @@ int main(){
> }
>  }
>
> -/* { dg-final { scan-tree-dump-times "bit_not_expr" 12 "optimized"} } */
> -/* { dg-final { scan-tree-dump-times "bit_and_expr"  9 "optimized"} } */
> -/* { dg-final { scan-tree-dump-times "bit_ior_expr" 10 "optimized"} } */
> -/* { dg-final { scan-tree-dump-times "bit_xor_expr"  9 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "bit_not_expr, " 12 "forwprop2"} } */
> +/* { dg-final { scan-tree-dump-times "bit_and_expr, "  9 "forwprop2"} } */
> +/* { dg-final { scan-tree-dump-times "bit_ior_expr, " 10 "forwprop2"} } */
> +/* { dg-final { scan-tree-dump-times "bit_xor_expr, "  9 "forwprop2"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-6.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/bitops-6.c
> index e6ab2fd6c71..e08132e2ab5 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/bitops-6.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-6.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
> +/* { dg-options "-O2 -fdump-tree-forwprop2-raw" } */
>  /* PR tree-optimization/111282 */
>
>
> @@ -25,

Re: [PATCH 5/5] MATCH: Add an alt pattern for ANDN and IORN with constants

2024-07-25 Thread Richard Biener
On Thu, Jul 25, 2024 at 4:18 AM Andrew Pinski  wrote:
>
> With constants we can match `~(a | CST)` into `~CST & ~a`.
> Likewise `~(a & CST)` into `~CST | ~a`.
>
> Built and tested for aarch64-linux-gnu with no regressions.

Similarly, I think this should be in ISEL instead.

> PR target/116013
> PR target/115086
>
> gcc/ChangeLog:
>
> * match.pd (`~(a & CST)`, `~(a | CST)`): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/bic-cst-2.c: New test.
> * gcc.target/aarch64/bic_simd-2.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/match.pd  | 10 ++
>  gcc/testsuite/gcc.target/aarch64/bic-cst-2.c  | 31 +
>  gcc/testsuite/gcc.target/aarch64/bic_simd-2.c | 33 +++
>  3 files changed, 74 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/bic-cst-2.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/bic_simd-2.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 56f631dfeec..680dfea523f 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -9994,6 +9994,16 @@ and,
>   (simplify
>(bitopc @0 CONSTANT_CLASS_P@1)
>(bitop (bit_not! @1) @0)))
> +
> +/* Create bit_andc and bit_iorc internal functions. */
> +(for rbitop  (bit_ior  bit_and)
> + bitopc  (IFN_BIT_ANDN IFN_BIT_IORN)
> + (simplify
> +  (bit_not (rbitop:s @0 CONSTANT_CLASS_P@1))
> +  (if (canonicalize_math_after_vectorization_p ()
> +   && direct_internal_fn_supported_p (as_internal_fn (bitopc),
> + type, OPTIMIZE_FOR_BOTH))
> +   (bitopc (bit_not! @1) @0))))
>  #endif
>
>  /* For pointers @0 and @2 and nonnegative constant offset @1, look for
> diff --git a/gcc/testsuite/gcc.target/aarch64/bic-cst-2.c 
> b/gcc/testsuite/gcc.target/aarch64/bic-cst-2.c
> new file mode 100644
> index 000..b89ac72dae1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/bic-cst-2.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +/**
> +**bar1:
> +** mov w([0-9]+), 4
> +** bic w0, w\1, w1
> +** ret
> +*/
> +int bar1(int a, int c)
> +{
> +  int b = ~((~4) | c);
> +  return b;
> +}
> +
> +/**
> +**foo1:
> +** mov w([0-9]+), 4
> +** orn w0, w\1, w1
> +** ret
> +*/
> +int foo1(int a, int c)
> +{
> +  int b = ~((~4) & c);
> +  return b;
> +}
> +
> +/* { dg-final { scan-tree-dump ".BIT_ANDN " "optimized" } } */
> +/* { dg-final { scan-tree-dump ".BIT_IORN " "optimized" } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/bic_simd-2.c 
> b/gcc/testsuite/gcc.target/aarch64/bic_simd-2.c
> new file mode 100644
> index 000..8543ce61400
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/bic_simd-2.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +/**
> +**bar1:
> +** moviv([0-9]+).2s, 0x4
> +** bic v0.8b, v\1.8b, v1.8b
> +** ret
> +*/
> +#define vect8 __attribute__((vector_size(8)))
> +vect8 int bar1(vect8 int a, vect8 int c)
> +{
> +  vect8 int b = ~((~4) | c);
> +  return b;
> +}
> +
> +/**
> +**foo1:
> +** moviv([0-9]+).2s, 0x4
> +** orn v0.8b, v\1.8b, v1.8b
> +** ret
> +*/
> +#define vect8 __attribute__((vector_size(8)))
> +vect8 int foo1(vect8 int a, vect8 int c)
> +{
> +  vect8 int b = ~((~4) & c);
> +  return b;
> +}
> +
> +/* { dg-final { scan-tree-dump ".BIT_ANDN " "optimized" } } */
> +/* { dg-final { scan-tree-dump ".BIT_IORN " "optimized" } } */
> +
> --
> 2.43.0
>
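
The identities the two MATCH patches rely on can be checked in isolation. The sketch below assumes the semantics BIT_ANDN (a, b) = a & ~b and BIT_IORN (a, b) = a | ~b; `bit_andn`, `bit_iorn` and the other function names are hypothetical helpers written for this illustration, not GCC functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical models of the internal functions' semantics.
constexpr std::uint32_t bit_andn(std::uint32_t a, std::uint32_t b) {
  return a & ~b;
}
constexpr std::uint32_t bit_iorn(std::uint32_t a, std::uint32_t b) {
  return a | ~b;
}

// ~(a | CST) == BIT_ANDN (~CST, a), i.e. ~CST & ~a (De Morgan).
constexpr bool not_ior_matches(std::uint32_t a, std::uint32_t cst) {
  return static_cast<std::uint32_t>(~(a | cst)) == bit_andn(~cst, a);
}

// ~(a & CST) == BIT_IORN (~CST, a), i.e. ~CST | ~a (De Morgan).
constexpr bool not_and_matches(std::uint32_t a, std::uint32_t cst) {
  return static_cast<std::uint32_t>(~(a & cst)) == bit_iorn(~cst, a);
}

// Same shape as bar1/foo1 in bic-cst-2.c, using unsigned operands.
constexpr std::uint32_t bar1(std::uint32_t c) { return ~((~4u) | c); }
constexpr std::uint32_t foo1(std::uint32_t c) { return ~((~4u) & c); }

static_assert(not_ior_matches(0x1234u, ~4u), "ANDN identity");
static_assert(not_and_matches(0x1234u, ~4u), "IORN identity");
static_assert(bar1(6u) == bit_andn(4u, 6u), "bar1 is 4 & ~c");
static_assert(foo1(6u) == bit_iorn(4u, 6u), "foo1 is 4 | ~c");
```

The last two assertions correspond to the expected `mov 4` plus `bic`/`orn` sequences in the test: the constant that gets materialised is the complement of the one written in the source.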


Re: [PATCH] MATCH: add abs support for half float

2024-07-25 Thread Richard Biener
On Thu, Jul 25, 2024 at 4:42 AM Kugan Vivekanandarajah
 wrote:
>
> On Tue, Jul 23, 2024 at 11:56 PM Richard Biener
>  wrote:
> >
> > On Tue, Jul 23, 2024 at 10:27 AM Kugan Vivekanandarajah
> >  wrote:
> > >
> > > On Tue, Jul 23, 2024 at 10:35 AM Andrew Pinski  wrote:
> > > >
> > > > On Mon, Jul 22, 2024 at 5:26 PM Kugan Vivekanandarajah
> > > >  wrote:
> > > > >
> > > > > Revised based on the comment and moved it into existing patterns as.
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > * match.pd: Extend A CMP 0 ? A : -A into (type)A CMP 0 ? A : -A.
> > > > > Extend A CMP 0 ? A : -A into (type) A CMP 0 ? A : -A.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > * gcc.dg/tree-ssa/absfloat16.c: New test.
> > > >
> > > > The testcase needs to make sure it runs only for targets that support
> > > > float16 so like:
> > > >
> > > > /* { dg-require-effective-target float16 } */
> > > > /* { dg-add-options float16 } */
> > > Added in the attached version.
> >
> > + /* (type)A >=/> 0 ? A : -A    same as abs (A) */
> >   (for cmp (ge gt)
> >(simplify
> > -   (cnd (cmp @0 zerop) @1 (negate @1))
> > -(if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
> > -&& !TYPE_UNSIGNED (TREE_TYPE(@0))
> > -&& bitwise_equal_p (@0, @1))
> > +   (cnd (cmp (convert?@0 @1) zerop) @2 (negate @2))
> > +(if (!HONOR_SIGNED_ZEROS (TREE_TYPE (@1))
> > +&& !TYPE_UNSIGNED (TREE_TYPE (@1))
> > +&& ((VECTOR_TYPE_P (type)
> > + && tree_nop_conversion_p (TREE_TYPE (@0), TREE_TYPE (@1)))
> > +   || (!VECTOR_TYPE_P (type)
> > +   && (TYPE_PRECISION (TREE_TYPE (@1))
> > +   <= TYPE_PRECISION (TREE_TYPE (@0)))))
> > +&& bitwise_equal_p (@1, @2))
> >
> > I wonder about the bitwise_equal_p which tests @1 against @2 now
> > with the convert still applied to @1 - that looks odd.  You are allowing
> > sign-changing conversions but doesn't that change ge/gt behavior?
> > Also why are sign/zero-extensions not OK for vector types?
> Thanks for the review.
> My main motivation here is for _Float16  as below.
>
> _Float16 absfloat16 (_Float16 x)
> {
>   float _1;
>   _Float16 _2;
>   _Float16 _4;
>   <bb 2> [local count: 1073741824]:
>   _1 = (float) x_3(D);
>   if (_1 < 0.0)
>     goto <bb 3>; [41.00%]
>   else
>     goto <bb 4>; [59.00%]
>   <bb 3> [local count: 440234144]:
>   _4 = -x_3(D);
>   <bb 4> [local count: 1073741824]:
>   # _2 = PHI <_4(3), x_3(D)(2)>
>   return _2;
> }
>
> This is why I added  bitwise_equal_p test of @1 against @2 with
> TYPE_PRECISION checks.
> I agree that I will have to check for sign-changing conversions.
>
> Just to keep it simple, I disallowed vector types. I am not sure if
> this would  hit vec types. I am happy to handle this if that is
> needed.

I think with __builtin_convertvector you should be able to construct
a testcase that does

>
> >
> > +  (absu:type @1)
> > +  (abs @1)
> >
> > I think this should use @2 now.
> I will change this.
>
> Thanks,
> Kugan
>
> >
> > > Thanks.
> > > Kugan
> > > >
> > > > (like what is in gcc.dg/c11-floatn-3.c and others).
> > > >
> > > > Other than that it looks good but I can't approve it.
> > > >
> > > > Thanks,
> > > > Andrew Pinski
> > > >
> > > > >
> > > > > Signed-off-by: Kugan Vivekanandarajah 
> > > > >
> > > > > Bootstrapped and regression test on aarch64-linux-gnu. Is this OK for 
> > > > > trunk?
> > > > > Thanks,
> > > > > Kugan
> > > > >
> > > > > 
> > > > > From: Andrew Pinski 
> > > > > Sent: Monday, 15 July 2024 5:30 AM
> > > > > To: Kugan Vivekanandarajah 
> > > > > Cc: gcc-patches@gcc.gnu.org ; 
> > > > > richard.guent...@gmail.com 
> > > > > Subject: Re: [PATCH] MATCH: add abs support for half float
> > > > >
> > > > >
> > > > > On Sun, Jul 14, 2024 at 1:12 AM Kugan Vivekanandarajah
> > > > >  wrote:
> > > > > >
> > > > > > This patch extends abs detection in matched for half float.
> > > > > >
> > > > > > Bootstrapped and regression test on aarch64-linux-gnu. Is this OK 
> > > > > > for trunk?
> > > > >
> > > > > This is basically this pattern:
> > > > > ```
> > > > > >  /* A >=/> 0 ? A : -A    same as abs (A) */
> > > > >  (for cmp (ge gt)
> > > > >   (simplify
> > > > >(cnd (cmp @0 zerop) @1 (negate @1))
> > > > > (if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
> > > > >  && !TYPE_UNSIGNED (TREE_TYPE(@0))
> > > > >  && bitwise_equal_p (@0, @1))
> > > > >  (if (TYPE_UNSIGNED (type))
> > > > >   (absu:type @0)
> > > > >   (abs @0)
> > > > > ```
> > > > >
> > > > > except extended to handle an optional convert. Why didn't you just
> > > > > extend the above pattern to handle the convert instead? Also I think
> > > > > you have an issue with unsigned types with the comparison.
> > > > > Also you should extend the -abs(A) pattern right below it in a 
> > > > > similar fashion.
> > > > >
> > > > > Thanks,
> > > > > Andrew Pinski
> > > > >

Re: [PATCH] doc: Document -O1 as the preferred level for large machine-generated code

2024-07-25 Thread Richard Biener
On Tue, Jul 23, 2024 at 4:07 PM Sam James  wrote:
>
> At -O1, the intention is that we compile things in a "reasonable" amount
> of time (ditto memory use). In particular, we try to especially avoid
> optimizations which scale poorly on pathological cases, as is the case
> for large machine-generated code.
>
> Recommend -O1 for large machine-generated code, as has been informally
> done on bugs for a while now.
>
> This applies (broadly speaking) for both large machine-generated functions
> but also to a lesser extent repetitive small-but-still-not-tiny functions
> from a generator program.
>
> gcc/ChangeLog:
> PR middle-end/114855
> * doc/invoke.texi (Optimize options): Mention machine-generated
> code for -O1.
> ---
> richi, does this accurately reflect the discussion we had on IRC a little
> while ago?
>
> Please push if OK, thanks.

OK (I pushed it).

Richard.

>  gcc/doc/invoke.texi | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index e0a641213ae4..9fb0925ed292 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -12560,6 +12560,11 @@ With @option{-O}, the compiler tries to reduce code 
> size and execution
>  time, without performing any optimizations that take a great deal of
>  compilation time.
>
> +@option{-O} is the recommended optimization level for large machine-generated
> +code as a sensible balance between time taken to compile and memory use:
> +higher optimization levels perform optimizations with greater algorithmic
> +complexity than at @option{-O}.
> +
>  @c Note that in addition to the default_options_table list in opts.cc,
>  @c several optimization flags default to true but control optimization
>  @c passes that are explicitly disabled at -O0.
>
> --
> 2.45.2
>


[pushed] rtl-ssa: Define INCLUDE_ARRAY

2024-07-25 Thread Richard Sandiford
g:72fbd3b2b2a497dbbe6599239bd61c5624203ed0 added a use of std::array
without explicitly forcing <array> to be included.  That didn't cause
problems in my local builds but understandably did for some people.
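
For anyone unfamiliar with the convention, here is a minimal sketch of why
the define has to come first.  It assumes system.h gates each standard
header on an INCLUDE_* macro roughly as below; the gate is a stand-in
written for illustration, not a copy of system.h, and count_nonzero is a
hypothetical helper.

```cpp
// Illustrative stand-in for the INCLUDE_* convention; not the real system.h.
#define INCLUDE_ARRAY   // must be defined before the gated include below

// What a system.h-like header is assumed to do:
#ifdef INCLUDE_ARRAY
# include <array>
#endif

#include <cassert>
#include <cstddef>

// Hypothetical helper that needs std::array.  Without the define above,
// this would only compile if some other header happened to pull in <array>.
inline std::size_t
count_nonzero (const std::array<int, 4> &values)
{
  std::size_t n = 0;
  for (int v : values)
    if (v != 0)
      ++n;
  return n;
}
```

Relying on a transitive include is what made the original commit build in
some environments and fail in others.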

Bootstrapped on aarch64-linux-gnu & pushed as obvious.

Richard

gcc/
* doc/rtl.texi: Document the need to define INCLUDE_ARRAY before
including rtl-ssa.h.
* rtl-ssa.h: Likewise (in comment).
* config/aarch64/aarch64-cc-fusion.cc: Add INCLUDE_ARRAY.
* config/aarch64/aarch64-early-ra.cc: Likewise.
* config/riscv/riscv-avlprop.cc: Likewise.
* config/riscv/riscv-vsetvl.cc: Likewise.
* fwprop.cc: Likewise.
* late-combine.cc: Likewise.
* pair-fusion.cc: Likewise.
* rtl-ssa/accesses.cc: Likewise.
* rtl-ssa/blocks.cc: Likewise.
* rtl-ssa/changes.cc: Likewise.
* rtl-ssa/functions.cc: Likewise.
* rtl-ssa/insns.cc: Likewise.
* rtl-ssa/movement.cc: Likewise.
---
 gcc/config/aarch64/aarch64-cc-fusion.cc | 1 +
 gcc/config/aarch64/aarch64-early-ra.cc  | 1 +
 gcc/config/riscv/riscv-avlprop.cc   | 1 +
 gcc/config/riscv/riscv-vsetvl.cc| 1 +
 gcc/doc/rtl.texi| 1 +
 gcc/fwprop.cc   | 1 +
 gcc/late-combine.cc | 1 +
 gcc/pair-fusion.cc  | 1 +
 gcc/rtl-ssa.h   | 1 +
 gcc/rtl-ssa/accesses.cc | 1 +
 gcc/rtl-ssa/blocks.cc   | 1 +
 gcc/rtl-ssa/changes.cc  | 1 +
 gcc/rtl-ssa/functions.cc| 1 +
 gcc/rtl-ssa/insns.cc| 1 +
 gcc/rtl-ssa/movement.cc | 1 +
 15 files changed, 15 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-cc-fusion.cc 
b/gcc/config/aarch64/aarch64-cc-fusion.cc
index e97c26682d0..3af8c00d846 100644
--- a/gcc/config/aarch64/aarch64-cc-fusion.cc
+++ b/gcc/config/aarch64/aarch64-cc-fusion.cc
@@ -63,6 +63,7 @@
 
 #define INCLUDE_ALGORITHM
 #define INCLUDE_FUNCTIONAL
+#define INCLUDE_ARRAY
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
diff --git a/gcc/config/aarch64/aarch64-early-ra.cc 
b/gcc/config/aarch64/aarch64-early-ra.cc
index 99324423ee5..5f269d029b4 100644
--- a/gcc/config/aarch64/aarch64-early-ra.cc
+++ b/gcc/config/aarch64/aarch64-early-ra.cc
@@ -40,6 +40,7 @@
 
 #define INCLUDE_ALGORITHM
 #define INCLUDE_FUNCTIONAL
+#define INCLUDE_ARRAY
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
diff --git a/gcc/config/riscv/riscv-avlprop.cc 
b/gcc/config/riscv/riscv-avlprop.cc
index 71d6f6a0495..caf5a93b234 100644
--- a/gcc/config/riscv/riscv-avlprop.cc
+++ b/gcc/config/riscv/riscv-avlprop.cc
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
 #define IN_TARGET_CODE 1
 #define INCLUDE_ALGORITHM
 #define INCLUDE_FUNCTIONAL
+#define INCLUDE_ARRAY
 
 #include "config.h"
 #include "system.h"
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index bbea2b5fd4f..017efa8bc17 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -63,6 +63,7 @@ along with GCC; see the file COPYING3.  If not see
 #define IN_TARGET_CODE 1
 #define INCLUDE_ALGORITHM
 #define INCLUDE_FUNCTIONAL
+#define INCLUDE_ARRAY
 
 #include "config.h"
 #include "system.h"
diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
index a1ede418c21..0cb36aae09b 100644
--- a/gcc/doc/rtl.texi
+++ b/gcc/doc/rtl.texi
@@ -4405,6 +4405,7 @@ A pass that wants to use the RTL SSA form should start 
with the following:
 @smallexample
 #define INCLUDE_ALGORITHM
 #define INCLUDE_FUNCTIONAL
+#define INCLUDE_ARRAY
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index bfdc7a1b749..2ebb2f146cc 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #define INCLUDE_ALGORITHM
 #define INCLUDE_FUNCTIONAL
+#define INCLUDE_ARRAY
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
diff --git a/gcc/late-combine.cc b/gcc/late-combine.cc
index 789d734692a..2b62e2956ed 100644
--- a/gcc/late-combine.cc
+++ b/gcc/late-combine.cc
@@ -30,6 +30,7 @@
 
 #define INCLUDE_ALGORITHM
 #define INCLUDE_FUNCTIONAL
+#define INCLUDE_ARRAY
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
index 31d2c21c88f..cb0374f426b 100644
--- a/gcc/pair-fusion.cc
+++ b/gcc/pair-fusion.cc
@@ -21,6 +21,7 @@
 #define INCLUDE_FUNCTIONAL
 #define INCLUDE_LIST
 #define INCLUDE_TYPE_TRAITS
+#define INCLUDE_ARRAY
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
diff --git a/gcc/rtl-ssa.h b/gcc/rtl-ssa.h
index 2718d97b6d9..0c0c71d8c32 100644
--- a/gcc/rtl-ssa.h
+++ b/gcc/rtl-ssa.h
@@ -27,6 +27,7 @@
 // Files that use this one should first have:
 #define INCLUDE_ALGORITHM
 #define INCLUDE_FUNCTIONAL
+#define INCLUDE_ARRAY
 #include "config.h"
 #include "s

Re: PING^3 [PATCH] rs6000: Adjust -fpatchable-function-entry* support for dual entry [PR112980]

2024-07-25 Thread Giuliano Belinassi
Pinging this again.

On Mon, 2024-07-22 at 17:18 +0800, Kewen.Lin wrote:
> Hi,
> 
> Gentle ping this patch:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651025.html
> 
> BR,
> Kewen
> 
> on 2024/7/12 00:15, Martin Jambor wrote:
> > Hi,
> > 
> > can I add myself to the bunch of people who are pinging this? 
> > Having
> > this in will make our life easier.
> > 
> > Thanks a lot,
> > 
> > Martin
> > 
> > 
> > On Wed, May 08 2024, Kewen.Lin wrote:
> > > Hi,
> > > 
> > > As the discussion in PR112980, although the current
> > > implementation for -fpatchable-function-entry* conforms
> > > with the documentation (making N NOPs be consecutive),
> > > it's inefficient for both kernel and userspace livepatching
> > > (see comments in PR for the details).
> > > 
> > > So this patch is to change the current implementation by
> > > emitting the "before" NOPs before global entry point and
> > > the "after" NOPs after local entry point.  The new behavior
> > > would not keep NOPs to be consecutive, so the documentation
> > > is updated to emphasize this.
> > > 
> > > Bootstrapped and regress-tested on powerpc64-linux-gnu
> > > P8/P9 and powerpc64le-linux-gnu P9 and P10.
> > > 
> > > Is it ok for trunk?  And backporting to active branches
> > > after burn-in time?  I guess we should also mention this
> > > change in changes.html?
> > > 
> > > BR,
> > > Kewen
> > > -
> > >   PR target/112980
> > > 
> > > gcc/ChangeLog:
> > > 
> > >   * config/rs6000/rs6000-logue.cc
> > > (rs6000_output_function_prologue):
> > >   Adjust the handling on patch area emitting with dual
> > > entry, remove
> > >   the restriction on "before" NOPs count, not emit
> > > "before" NOPs any
> > >   more but only emit "after" NOPs.
> > >   * config/rs6000/rs6000.cc
> > > (rs6000_print_patchable_function_entry):
> > >   Adjust by respecting cfun->machine-
> > > >stop_patch_area_print.
> > >   (rs6000_elf_declare_function_name): For ELFv2 with dual
> > > entry, set
> > >   cfun->machine->stop_patch_area_print as true.
> > >   * config/rs6000/rs6000.h (struct machine_function):
> > > Remove member
> > >   global_entry_emitted, add new member
> > > stop_patch_area_print.
> > >   * doc/invoke.texi (option -fpatchable-function-entry):
> > > Adjust the
> > >   documentation for PowerPC ELFv2 dual entry.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * c-c++-common/patchable_function_entry-default.c:
> > > Adjust.
> > >   * gcc.target/powerpc/pr99888-4.c: Likewise.
> > >   * gcc.target/powerpc/pr99888-5.c: Likewise.
> > >   * gcc.target/powerpc/pr99888-6.c: Likewise.
> > > ---
> > >  gcc/config/rs6000/rs6000-logue.cc | 40 +
> > > --
> > >  gcc/config/rs6000/rs6000.cc   | 15 +--
> > >  gcc/config/rs6000/rs6000.h    | 10 +++--
> > >  gcc/doc/invoke.texi   |  8 ++--
> > >  .../patchable_function_entry-default.c    |  3 --
> > >  gcc/testsuite/gcc.target/powerpc/pr99888-4.c  |  4 +-
> > >  gcc/testsuite/gcc.target/powerpc/pr99888-5.c  |  4 +-
> > >  gcc/testsuite/gcc.target/powerpc/pr99888-6.c  |  4 +-
> > >  8 files changed, 33 insertions(+), 55 deletions(-)
> > > 
> > > diff --git a/gcc/config/rs6000/rs6000-logue.cc
> > > b/gcc/config/rs6000/rs6000-logue.cc
> > > index 60ba15a8bc3..0eb019b44b3 100644
> > > --- a/gcc/config/rs6000/rs6000-logue.cc
> > > +++ b/gcc/config/rs6000/rs6000-logue.cc
> > > @@ -4006,43 +4006,21 @@ rs6000_output_function_prologue (FILE
> > > *file)
> > >     fprintf (file, "\tadd 2,2,12\n");
> > >   }
> > > 
> > > -  unsigned short patch_area_size = crtl->patch_area_size;
> > > -  unsigned short patch_area_entry = crtl->patch_area_entry;
> > > -  /* Need to emit the patching area.  */
> > > -  if (patch_area_size > 0)
> > > - {
> > > -   cfun->machine->global_entry_emitted = true;
> > > -   /* As ELFv2 ABI shows, the allowable bytes between the
> > > global
> > > -  and local entry points are 0, 4, 8, 16, 32 and 64
> > > when
> > > -  there is a local entry point.  Considering there
> > > are two
> > > -  non-prefixed instructions for global entry point
> > > prologue
> > > -  (8 bytes), the count for patchable nops before
> > > local entry
> > > -  point would be 2, 6 and 14.  It's possible to
> > > support those
> > > -  other counts of nops by not making a local entry
> > > point, but
> > > -  we don't have clear use cases for them, so leave
> > > them
> > > -  unsupported for now.  */
> > > -   if (patch_area_entry > 0)
> > > -     {
> > > -   if (patch_area_entry != 2
> > > -   && patch_area_entry != 6
> > > -   && patch_area_entry != 14)
> > > - error ("unsupported number of nops before
> > > function entry (%u)",
> > > -    patch_area_entry);
> > > -   rs6000_print_patchable_function_entry (file,
> > > patch_area_entry,
> > > -  true);
> > > -   patch_area_size -= patch_area_ent

Re: [PATCH] aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2

2024-07-25 Thread Kyrylo Tkachov


> On 25 Jul 2024, at 13:58, Richard Sandiford  wrote:
> 
> Jennifer Schmitz  writes:
>> Thank you for the feedback. I added checks for SCALAR_INT_MODE_P for the reg 
>> operands of the compare and if-then-else expressions. As it is not legal to 
>> have different modes in the operand registers, I only added one check for 
>> each of the expressions.
>> The updated patch was bootstrapped and tested again.
>> Best,
>> Jennifer
>> 
>> From 8da609be99fece8130cf1429bd938b2a26c6672b Mon Sep 17 00:00:00 2001
>> From: Jennifer Schmitz 
>> Date: Wed, 24 Jul 2024 06:13:59 -0700
>> Subject: [PATCH] aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2
>> 
>> According to the Neoverse V2 Software Optimization Guide (section 4.14), the
>> instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been
>> implemented so far. This patch implements and tests the two fusion pairs.
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> There was also no non-noise impact on SPEC CPU2017 benchmark.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz 
>> 
>> gcc/
>> 
>>  * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
>>  fusion logic.
>>  * config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry.
>>  (cmp+cset): Likewise.
>>  * config/aarch64/tuning_models/neoversev2.h: Enable logic in
>>  field fusible_ops.
>> 
>> gcc/testsuite/
>> 
>>  * gcc.target/aarch64/fuse_cmp_csel.c: New test.
>>  * gcc.target/aarch64/fuse_cmp_cset.c: Likewise.
> 
> Thanks for the update.
> 
> It looks from a quick scan like the main three instructions associated
> with single-set integer COMPAREs are CMP, CMN and TST.  TST could be
> distinguished from CMP and CMN based on get_attr_type (), although it
> looks like:
> 
> (define_insn "*and_compare0"
>  [(set (reg:CC_Z CC_REGNUM)
>(compare:CC_Z
> (match_operand:SHORT 0 "register_operand" "r")
> (const_int 0)))]
>  ""
>  "tst\\t%0, "
>  [(set_attr "type" "alus_imm")]
> )

We can change that independently.

> 
> should use logics_imm instead of alus_imm.
> 
> Alternatively, we could add a new attribute for "compare_type" and use
> that.  That would make the test in aarch_macro_fusion_pair_p slightly
> simpler, since it could use get_attr_compare_type without having to
> look at the pattern of prev_set.  But there's a danger that we'd
> forget to add the new attribute for new comparison instructions.
> 
> I did wonder whether we could simply punt on CC_Zmode, but that's
> not a reliable test.
> 
> But I suppose the counter-argument to my questions above is: how bad
> would it be if we fused CMN and TST?  They are at least plausible
> fusions, so it probably doesn't matter if we include them too.

CMN and TST can be fused with conditional branches, but not with CSEL according 
to my reading of the SWOG so I guess we’d want to keep them separate in 
principle. In practice, I can’t imagine the performance difference will be 
measurable in real workloads if they are kept together.
Jennifer’s benchmarking of this patch didn’t show any negative performance 
consequences of the more aggressive fusion, and even a slight improvement.


> 
> So:
> 
>> ---
>> gcc/config/aarch64/aarch64-fusion-pairs.def   |  2 ++
>> gcc/config/aarch64/aarch64.cc | 22 +
>> gcc/config/aarch64/tuning_models/neoversev2.h |  5 ++-
>> .../gcc.target/aarch64/fuse_cmp_csel.c| 33 +++
>> .../gcc.target/aarch64/fuse_cmp_cset.c| 31 +
>> 5 files changed, 92 insertions(+), 1 deletion(-)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/fuse_cmp_csel.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/fuse_cmp_cset.c
>> 
>> diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def 
>> b/gcc/config/aarch64/aarch64-fusion-pairs.def
>> index 9a43b0c8065..bf5e85ba8fe 100644
>> --- a/gcc/config/aarch64/aarch64-fusion-pairs.def
>> +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
>> @@ -37,5 +37,7 @@ AARCH64_FUSION_PAIR ("aes+aesmc", AES_AESMC)
>> AARCH64_FUSION_PAIR ("alu+branch", ALU_BRANCH)
>> AARCH64_FUSION_PAIR ("alu+cbz", ALU_CBZ)
>> AARCH64_FUSION_PAIR ("addsub_2reg_const1", ADDSUB_2REG_CONST1)
>> +AARCH64_FUSION_PAIR ("cmp+csel", CMP_CSEL)
>> +AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET)
>> 
>> #undef AARCH64_FUSION_PAIR
>> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> index 9e51236ce9f..7a0351f7dac 100644
>> --- a/gcc/config/aarch64/aarch64.cc
>> +++ b/gcc/config/aarch64/aarch64.cc
>> @@ -27348,6 +27348,28 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn 
>> *curr)
>>   && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
>> return true;
>> 
>> +  /* Fuse CMP and CSEL.  */
> 
> How about adding something like:
> 
>  The test for CMP is not exact, and includes things like CMN and TST
>  as well.  However, fusing wit

Re: [PATCH v2] RISC-V: Add basic support for the Zacas extension

2024-07-25 Thread Andrea Parri
On Tue, Jul 23, 2024 at 05:15:44PM -0700, Patrick O'Neill wrote:
> From: Gianluca Guida 
> 
> This patch adds support for amocas.{b|h|w|d}. Support for amocas.q
> (64/128 bit cas for rv32/64) will be added in a future patch.
> 
> Extension: https://github.com/riscv/riscv-zacas
> Ratification: https://jira.riscv.org/browse/RVS-680
> 
> gcc/ChangeLog:
> 
>   * common/config/riscv/riscv-common.cc
>   (riscv_subset_list::to_string): Skip zacas when not supported by
>   the assembler.
>   * config.in: Regenerate.
>   * config/riscv/arch-canonicalize: Make zacas imply zaamo.
>   * config/riscv/riscv.opt: Add zacas.
>   * config/riscv/sync.md (zacas_atomic_cas_value): New pattern.
>   (atomic_compare_and_swap): Use new pattern for compare-and-swap 
> ops.
>   (zalrsc_atomic_cas_value_strong): Rename atomic_cas_value_strong.
>   * configure: Regenerate.
>   * configure.ac: Regenerate.
>   * doc/sourcebuild.texi: Add Zacas documentation.
> 
> gcc/testsuite/ChangeLog:
> 
>   * lib/target-supports.exp: Add zacas testsuite infra support.
>   * 
> gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-acquire-release.c:
>   Remove zacas to continue to test the lr/sc pairs.
>   * gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-acquire.c: 
> Ditto.
>   * gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-consume.c: 
> Ditto.
>   * gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-relaxed.c: 
> Ditto.
>   * gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-release.c: 
> Ditto.
>   * 
> gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-seq-cst-relaxed.c: 
> Ditto.
>   * gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-seq-cst.c: 
> Ditto.
>   * 
> gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-acquire-release.c: 
> Ditto.
>   * gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-acquire.c: 
> Ditto.
>   * gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-consume.c: 
> Ditto.
>   * gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-relaxed.c: 
> Ditto.
>   * gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-release.c: 
> Ditto.
>   * 
> gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-seq-cst-relaxed.c: 
> Ditto.
>   * gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-seq-cst.c: 
> Ditto.
>   * gcc.target/riscv/amo/zabha-zacas-preferred-over-zalrsc.c: New test.
>   * gcc.target/riscv/amo/zacas-char-requires-zabha.c: New test.
>   * gcc.target/riscv/amo/zacas-char-requires-zacas.c: New test.
>   * gcc.target/riscv/amo/zacas-preferred-over-zalrsc.c: New test.
>   * gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-char-acq-rel.c: New 
> test.
>   * gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-char-acquire.c: New 
> test.
>   * gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-char-relaxed.c: New 
> test.
>   * gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-char-release.c: New 
> test.
>   * gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-char-seq-cst.c: New 
> test.
>   * 
> gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-compatability-mapping-no-fence.c:
>   New test.
>   * 
> gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-compatability-mapping.cc: 
> New test.
>   * gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-int-acq-rel.c: New 
> test.
>   * gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-int-acquire.c: New 
> test.
>   * gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-int-relaxed.c: New 
> test.
>   * gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-int-release.c: New 
> test.
>   * gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-int-seq-cst.c: New 
> test.
>   * gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-short-acq-rel.c: 
> New test.
>   * gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-short-acquire.c: 
> New test.
>   * gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-short-relaxed.c: 
> New test.
>   * gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-short-release.c: 
> New test.
>   * gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-short-seq-cst.c: 
> New test.
>   * gcc.target/riscv/amo/zacas-ztso-compare-exchange-char-seq-cst.c: New 
> test.
>   * gcc.target/riscv/amo/zacas-ztso-compare-exchange-char.c: New test.
>   * 
> gcc.target/riscv/amo/zacas-ztso-compare-exchange-compatability-mapping-no-fence.c:
>   New test.
>   * 
> gcc.target/riscv/amo/zacas-ztso-compare-exchange-compatability-mapping.cc: 
> New test.
>   * gcc.target/riscv/amo/zacas-ztso-compare-exchange-int-seq-cst.c: New 
> test.
>   * gcc.target/riscv/amo/zacas-ztso-compare-exchange-int.c: New test.
>   * gcc.target/riscv/amo/zacas-ztso-compare-exchange-short-seq-cst.c: New 
> test.
>   * gcc.target/riscv/amo/zacas-ztso-compare-exchange-short.c: New test.
> 
> Co-authored-by: Patrick O'Neill 

LGTM; feel free to add:


[PATCH] tree-optimization/116083 - improve behavior when SLP discovery limit is reached

2024-07-25 Thread Richard Biener
The following avoids some useless work when the SLP discovery limit
is reached, for example allocating a node to cache the failure
and starting discovery on split store groups when analyzing BBs.

It does not address the issue in the PR, which is a generous budget
for discovery when the store group size approaches the number of
overall statements.
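
The reordering in vect_build_slp_tree can be sketched with a toy cache,
assuming only what the diff below shows: the budget check now comes before
the stub node is allocated and cached.  All names here (node, cache_t,
build_node) are made up for illustration and are not GCC's.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

// Toy stand-ins; none of these names are GCC's.
struct node
{
  std::vector<int> stmts;
};

using cache_t = std::map<std::vector<int>, std::unique_ptr<node>>;

// Return the cached or newly created node, or nullptr when the budget is
// exhausted.  The budget test comes before the allocation, so running out
// of budget leaves the cache untouched.
node *
build_node (cache_t &cache, const std::vector<int> &stmts, unsigned &limit)
{
  auto it = cache.find (stmts);
  if (it != cache.end ())
    return it->second.get ();

  // Single-statement groups are not charged against the budget.
  if (stmts.size () > 1)
    {
      if (limit == 0)
        return nullptr;
      --limit;
    }

  std::unique_ptr<node> res (new node);
  res->stmts = stmts;
  node *raw = res.get ();
  cache.emplace (stmts, std::move (res));
  return raw;
}
```

With a budget of 1, the first multi-statement group gets a node, the second
returns nullptr without adding a cache entry, and single-statement groups
still succeed.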

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/116083
* tree-vect-slp.cc (vect_build_slp_tree): Do not allocate
a discovery fail node when we reached the discovery limit.
(vect_build_slp_instance): Terminate early when the
discovery limit is reached.
---
 gcc/tree-vect-slp.cc | 26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 8c7dfc4feca..7da5853adf6 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1756,13 +1756,6 @@ vect_build_slp_tree (vec_info *vinfo,
   return NULL;
 }
 
-  /* Seed the bst_map with a stub node to be filled by vect_build_slp_tree_2
- so we can pick up backedge destinations during discovery.  */
-  slp_tree res = new _slp_tree;
-  SLP_TREE_DEF_TYPE (res) = vect_internal_def;
-  SLP_TREE_SCALAR_STMTS (res) = stmts;
-  bst_map->put (stmts.copy (), res);
-
   /* Single-lane SLP doesn't have the chance of run-away, do not account
  it to the limit.  */
   if (stmts.length () > 1)
@@ -1772,18 +1765,19 @@ vect_build_slp_tree (vec_info *vinfo,
  if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
 "SLP discovery limit exceeded\n");
- /* Mark the node invalid so we can detect those when still in use
-as backedge destinations.  */
- SLP_TREE_SCALAR_STMTS (res) = vNULL;
- SLP_TREE_DEF_TYPE (res) = vect_uninitialized_def;
- res->failed = XNEWVEC (bool, group_size);
- memset (res->failed, 0, sizeof (bool) * group_size);
  memset (matches, 0, sizeof (bool) * group_size);
  return NULL;
}
   --*limit;
 }
 
+  /* Seed the bst_map with a stub node to be filled by vect_build_slp_tree_2
+ so we can pick up backedge destinations during discovery.  */
+  slp_tree res = new _slp_tree;
+  SLP_TREE_DEF_TYPE (res) = vect_internal_def;
+  SLP_TREE_SCALAR_STMTS (res) = stmts;
+  bst_map->put (stmts.copy (), res);
+
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location,
 "starting SLP discovery for node %p\n", (void *) res);
@@ -3368,6 +3362,10 @@ vect_build_slp_instance (vec_info *vinfo,
 /* ???  We need stmt_info for group splitting.  */
 stmt_vec_info stmt_info_)
 {
+  /* If there's no budget left bail out early.  */
+  if (*limit == 0)
+return false;
+
   if (kind == slp_inst_kind_ctor)
 {
   if (dump_enabled_p ())
@@ -3525,7 +3523,7 @@ vect_build_slp_instance (vec_info *vinfo,
 
   stmt_vec_info stmt_info = stmt_info_;
   /* Try to break the group up into pieces.  */
-  if (kind == slp_inst_kind_store)
+  if (*limit > 0 && kind == slp_inst_kind_store)
 {
   /* ???  We could delay all the actual splitting of store-groups
 until after SLP discovery of the original group completed.
-- 
2.43.0


[PATCH] fold: Allow SSA names in inverse_conditions_p and fold VCOND_MASK.

2024-07-25 Thread Robin Dapp
Hi,

In preparation for the maskload else operand I split off this patch.  The patch
looks through SSA names for the conditions passed to inverse_conditions_p which
helps match.pd recognize more redundant vec_cond expressions.  It also adds
VCOND_MASK to the respective iterators in match.pd.

Is this acceptable without a separate test?  There will of course be several
hits once we emit VEC_COND_EXPRs after masked loads.

The initial version of the patch looked "through" each condition individually.
That caused the following problem on p10 during phiopt:

 foo = blah <= 0
 cond2: foo ? c : x
 cond1: blah > 0 ? b : cond2
 -> (match.pd:6205)
 res = blah > 0 ? b : c
 which is invalid gimple (as blah > 0 is directly used and not put in
 a variable).

Therefore, for now, I restricted the SSA_NAME check to both conditions
simultaneously so we don't run into this situation.  There must be a better
way, though?
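
To make the "both or neither" restriction concrete, here is a toy model of
it.  It is only a sketch: the cond struct, the defs table and inverse_p are
hypothetical stand-ins for trees, SSA definitions and inverse_conditions_p,
and invert assumes integer comparisons (no NaN handling).

```cpp
#include <cassert>
#include <map>
#include <string>

enum class cmp_code { LE, GT, LT, GE, EQ, NE };

// A toy condition: either a comparison "lhs CODE rhs" or, when is_name is
// set, a reference to a named definition stored in a defs table.
struct cond
{
  bool is_name;
  std::string name;   // used when is_name is true
  cmp_code code;      // used when is_name is false
  std::string lhs, rhs;
};

using defs_t = std::map<std::string, cond>;

// Logical inverse of an integer comparison code.
cmp_code
invert (cmp_code c)
{
  switch (c)
    {
    case cmp_code::LE: return cmp_code::GT;
    case cmp_code::GT: return cmp_code::LE;
    case cmp_code::LT: return cmp_code::GE;
    case cmp_code::GE: return cmp_code::LT;
    case cmp_code::EQ: return cmp_code::NE;
    case cmp_code::NE: return cmp_code::EQ;
    }
  return c; // not reached
}

// Look through names only when both conditions are names, then compare.
bool
inverse_p (cond c1, cond c2, const defs_t &defs)
{
  if (c1.is_name && c2.is_name)
    {
      auto d1 = defs.find (c1.name);
      if (d1 != defs.end ())
        c1 = d1->second;
      auto d2 = defs.find (c2.name);
      if (d2 != defs.end ())
        c2 = d2->second;
    }
  return (!c1.is_name && !c2.is_name
          && invert (c1.code) == c2.code
          && c1.lhs == c2.lhs
          && c1.rhs == c2.rhs);
}
```

A name paired with a bare comparison is deliberately not matched, which
mirrors how the patch sidesteps the phiopt case above.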

Bootstrapped and regtested on x86, aarch64 and power10.
Regtested on armv8.8-a+sve using qemu as well as riscv64.

Regards
 Robin

gcc/ChangeLog:

* fold-const.cc (inverse_conditions_p): Look through SSA names.
* match.pd: Add VCOND_MASK to "cond" iterators.
---
 gcc/fold-const.cc | 22 ++
 gcc/match.pd  | 28 +++-
 2 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 83c32dd10d4..1fc5d97dccc 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -86,6 +86,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "vec-perm-indices.h"
 #include "asan.h"
 #include "gimple-range.h"
+#include "cfgexpand.h"
 
 /* Nonzero if we are folding constants inside an initializer or a C++
manifestly-constant-evaluated context; zero otherwise.
@@ -3010,6 +3011,27 @@ compcode_to_comparison (enum comparison_code code)
 bool
 inverse_conditions_p (const_tree cond1, const_tree cond2)
 {
+  /* If both conditions are SSA names, look through them.
+ Right now callees in match use one of the conditions directly and
+ we might end up having one in a COND_EXPR like
+   res = a > b ? c : d
+ instead of
+   cnd = a > b
+   res = cnd ? c : d.
+
+ Therefore always consider both conditions simultaneously.  */
+  if (TREE_CODE (cond1) == SSA_NAME
+  && TREE_CODE (cond2) == SSA_NAME)
+{
+  gimple *gcond1 = SSA_NAME_DEF_STMT (cond1);
+  if (is_gimple_assign (gcond1))
+   cond1 = gimple_assign_rhs_to_tree (gcond1);
+
+  gimple *gcond2 = SSA_NAME_DEF_STMT (cond2);
+  if (is_gimple_assign (gcond2))
+   cond2 = gimple_assign_rhs_to_tree (gcond2);
+}
+
   return (COMPARISON_CLASS_P (cond1)
  && COMPARISON_CLASS_P (cond2)
  && (invert_tree_comparison
diff --git a/gcc/match.pd b/gcc/match.pd
index cf359b0ec0f..f244e6deff5 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5601,7 +5601,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* (a ? x : y) == (b ? x : y) --> (a^b) ? FALSE : TRUE  */
 /* (a ? x : y) != (b ? y : x) --> (a^b) ? FALSE : TRUE  */
 /* (a ? x : y) == (b ? y : x) --> (a^b) ? TRUE  : FALSE */
-(for cnd (cond vec_cond)
+(for cnd (cond vec_cond IFN_VCOND_MASK)
  (for eqne (eq ne)
   (simplify
(eqne:c (cnd @0 @1 @2) (cnd @3 @1 @2))
@@ -5614,14 +5614,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
types are compatible.  */
-(simplify
- (vec_cond @0 VECTOR_CST@1 VECTOR_CST@2)
- (if (VECTOR_BOOLEAN_TYPE_P (type)
-  && types_match (type, TREE_TYPE (@0)))
-  (if (integer_zerop (@1) && integer_all_onesp (@2))
-   (bit_not @0)
-   (if (integer_all_onesp (@1) && integer_zerop (@2))
-@0
+(for cnd (vec_cond IFN_VCOND_MASK)
+ (simplify
+  (cnd @0 VECTOR_CST@1 VECTOR_CST@2)
+  (if (VECTOR_BOOLEAN_TYPE_P (type)
+   && types_match (type, TREE_TYPE (@0)))
+   (if (integer_zerop (@1) && integer_all_onesp (@2))
+(bit_not @0)
+(if (integer_all_onesp (@1) && integer_zerop (@2))
+ @0)
 
 /* A few simplifications of "a ? CST1 : CST2". */
 /* NOTE: Only do this on gimple as the if-chain-to-switch
@@ -6049,7 +6050,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
{ build_int_cst (integer_type_node, prec - 1);}))
 #endif
 
-(for cnd (cond vec_cond)
+(for cnd (cond vec_cond IFN_VCOND_MASK)
  /* (a != b) ? (a - b) : 0 -> (a - b) */
  (simplify
   (cnd (ne:c @0 @1) (minus@2 @0 @1) integer_zerop)
@@ -6185,7 +6186,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (TYPE_UNSIGNED (type))
   (cond (ge @0 @1) (negate @0) @2)))
 
-(for cnd (cond vec_cond)
+(for cnd (cond vec_cond IFN_VCOND_MASK)
  /* A ? B : (A ? X : C) -> A ? B : C.  */
  (simplify
   (cnd @0 (cnd @0 @1 @2) @3)
@@ -6210,8 +6211,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  /* A ? B : B -> B.  */
  (simplify
   (cnd @0 @1 @1)
-  @1)
+  @1))
 
+(for cnd (cond vec_cond)
  /* !A ? B : C -> A ? C : B.  */
  (simplify
   (cnd (logical_inverted_value truth_valued_p@0) @1 @2)
@@ -6232,7 +6234,7 

Re: [PATCH v3 01/12] OpenMP: metadirective tree data structures and front-end interfaces

2024-07-25 Thread Tobias Burnus

Hi Sandra,

thanks for your patch. (Disclaimer: I have not finished reading through 
your patch.)


Some upfront generic remarks:

[* When first compiling it (incremental build), I did run into the issue 
that OMP_METADIRECTIVE_CHECK wasn't declared. Thus, there seems to be a 
dependency issue causing tree-check.h to be generated only after code 
that includes tree.h is processed. (Unrelated to your patch itself, but 
for completeness …)]


* Not required right now, but eventually we need to check whether 
https://gcc.gnu.org/PR112779 is fully fixed by this patch set or whether 
follow-up work is required (and if so which). There is also PR107067 for 
a Fortran ICE.


* There are some not-implemented/FIXME comments in the patches for 
missing features. I think we should ensure that those won't get 
forgotten, e.g. by filing PRs for those. – For declare variant, some PRs 
might already exist.


Can you eventually take care of the last two items?

(For the last item: e.g. 'target_device' for declare_variant, for which 
'sorry' already existed.)


* * *

I might have asked the following question before – and you might have 
answered it already:


Sandra Loosemore wrote:


This patch adds the OMP_METADIRECTIVE tree node and shared tree-level
support for manipulating metadirectives.  It defines/exposes
interfaces that will be used in subsequent patches that add front-end
and middle-end support, but nothing generates these nodes yet.


I have to admit that I do not understand the part:


+  else if (set == OMP_TRAIT_SET_TARGET_DEVICE)
+/* The target_device set is dynamic, so treat it as always
+   resolvable.  */
+continue;
+


The current code has 3 states:

* 0 - if a trait is false; this directly returns as it cannot be fixed later

* 1 - if all traits are known to match (initial value)

* -1 - if one trait cannot be evaluated, either because it is too early 
(e.g. during parsing) or because it is a dynamic context selector.
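
As a sketch of how I read those three states (the names trait_result and
combine_traits are made up for illustration and do not exist in omp-general.cc):

```cpp
#include <cassert>
#include <vector>

// Per-trait result: 0 = known false, 1 = known true,
// -1 = cannot be decided yet (too early, or a dynamic selector).
enum trait_result { TRAIT_FALSE = 0, TRAIT_TRUE = 1, TRAIT_UNKNOWN = -1 };

// Combine per-trait results the way the three states above describe:
// any false trait wins immediately, otherwise any unknown trait makes
// the whole selector unknown, otherwise it matches.
int
combine_traits (const std::vector<trait_result> &traits)
{
  int ret = 1;
  for (trait_result t : traits)
    {
      if (t == TRAIT_FALSE)
        return 0;
      if (t == TRAIT_UNKNOWN)
        ret = -1;
    }
  return ret;
}
```

In this reading, a bare 'continue' for a trait set corresponds to treating
each of its traits as TRAIT_TRUE.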


Thus, I had expected:

(a) ret = -1 as default in this case (not known)

(b) for cases where it is known, a 'return 0' / not-setting -1. In 
particular:


* n == const → device_num(n) – false if '< -1' and, for 
'!ENABLE_OFFLOADING || offload_targets == NULL' either false for n > 0 
or otherwise false.


* Checks similar to OMP_TRAIT_DEVICE_{KIND,ARCH,ISA}, i.e. kind(any) → 
true, kind(fpga) → false, arch(something_unknown) → false if not true 
for any device. With '!ENABLE_OFFLOADING || offload_targets == NULL', 
the kind_arch_isa check can be done as for the host.


* * *

Have I missed something and is it sensible to return 1 instead of -1 here?

* * *



@@ -1804,6 +1834,12 @@ omp_context_selector_matches (tree ctx)


   case OMP_TRAIT_USER_CONDITION:
 if (set == OMP_TRAIT_SET_USER)

for (tree p = OMP_TS_PROPERTIES (ts); p; p = TREE_CHAIN (p))
  if (OMP_TP_NAME (p) == NULL_TREE)
{
+ /* OpenMP 5.1 allows non-constant conditions for
+metadirectives.  */
+ if (metadirective_p
+ && !tree_fits_shwi_p (OMP_TP_VALUE (p)))
+   break;
+

 if (integer_zerop (OMP_TP_VALUE (p)))
   return 0;
 if (integer_nonzerop (OMP_TP_VALUE (p)))
   break;
 ret = -1;
   }



* Comment wording: Please change to imply >= 5.1 not == 5.0

* Comment: I don't see why the non-const only applies to metadirectives; the
OpenMP >= 5.1 seems to imply that it is also valid for declare variant. Thus,
I would change the wording.

* The current code seems to already handle non-const values as expected. ...
except that it changes 'ret' to -1, while the idea seems to be not to modify
'ret' in this case for metadirectives. (Why? Same question as above).

* * *

Quotes from the specifications regarding the expressions:

The current spec has:

"Restrictions to context selectors are as follows:" …

"A variable or procedure that is referenced in an expression that 
appears in a context selector
must be visible at the location of the directive on which the context 
selector appears unless
the directive is a declare_variant directive and the variable is an 
argument of the

associated base function."

5.1 wording is the following (approx. same except for argument bit):

"All variables that are referenced in an expression that appears in
the context selector of a match clause must be accessible at a call site 
to the base function

according to the base language rules."

5.0 had (e.g. for C): "The condition(boolean-expr) selector defines a 
constant expression that must evaluate to true for the selector to be true."


* * *


+ if (metadirective_p
+ && !tree_fits_shwi_p (OMP_TP_VALUE (p)))
+   break;
+
  if (integer_zerop (OMP_TP_VA

Re: [PATCH 2/2] cp+coroutines: teach convert_to_void to diagnose discarded co_awaits

2024-07-25 Thread Jason Merrill

On 7/24/24 4:52 PM, Arsen Arsenović wrote:

Jason Merrill  writes:


Ah, of course, I was overlooking the assignment.  The patch is OK.


Thanks.  Here's a range diff with a few changes to the commits, chiefly
in the commit messages.  If you agree, I can push with these changes
applied:


Looks good, thanks.


1:  32f810cca55 ! 1:  d2e74525965 cp/coroutines: do not rewrite parameters in 
unevaluated contexts
 @@ Commit message
  parameters.  Prevent this by simply skipping rewrites in unevaluated
  contexts.  Those won't mind the value not being present anyway.
  
 +This prevents an ICE during parameter substitution.  In the testcase

 +from the PR, the rewriting machinery finds a param in the body of the
 +coroutine, which it did not previously encounter while processing the
 +coroutine declaration, and that does not have a DECL_VALUE_EXPR, and
 +fails.
 +
  gcc/cp/ChangeLog:
  
  PR c++/111728

 @@ gcc/cp/coroutines.cc: rewrite_param_uses (tree *stmt, int *do_subtree 
ATTRIBUTE_
   
  +  if (unevaluated_p (TREE_CODE (*stmt)))

  +{
 -+  *do_subtree = 0; // Nothing to do.
 ++  /* No odr-uses in unevaluated operands.  */
 ++  *do_subtree = 0;
  +  return NULL_TREE;
  +}
  +
2:  a16f1d34047 ! 2:  adc77c732f5 cp+coroutines: teach convert_to_void to 
diagnose discarded co_awaits
 @@ Commit message
  as such, should inherit its nodiscard.  A discarded co_await 
expression
  should, hence, act as if its call to await_resume was discarded.
  
 -CO_AWAIT_EXPR trees do conveniently contain the expression for calling

 -await_resume in them, so we can discard it.
 +This patch teaches convert_to_void how to discard 'through' a
 +CO_AWAIT_EXPR. When we discard a CO_AWAIT_EXPR, we can also just 
discard
 +the await_resume() call conveniently embedded within it.  This results
 +in a [[nodiscard]] diagnostic that the PR noted was missing.
  
  gcc/cp/ChangeLog:
  
Thanks again, have a lovely evening.




[PATCH v2 0/3] aarch64: Add initial support for +fp8 arch extensions

2024-07-25 Thread Claudio Bantaloukas


This series introduces initial flags and functionality for the fp8 feature.

Specifically, the following are added:
- functions that enable constructing valid fpm register values.
- support for the '+fp8' -march modifier.
- support for reading and writing the new system register FPMR (Floating Point
  Mode Register) which configures the new FP8 features.

Tested against aarch64-unknown-linux-gnu.

V1 of this patch series had "aarch64: Add march flags for +fp8 arch extensions"
as cover letter title. Since then, changes in V2 are:

aarch64: Add march flags for +fp8 arch extensions
- Removed __ARM_FEATURE_FP8 define: will be added once the relevant features
  are in.
- Some unnecessary whitespace changes were removed.
- Helper function names now begin with __arm.

aarch64: Add support for moving fpm system register
- Removed a misleading comment.
- Removed unnecessary modifier in .md

aarch64: Add fpm register helper functions.
- Helper functions and fpm_t types are available unconditionally when including
  arm_acle.h

Is this ok for master? I do not have merge permissions. Can someone merge this
for me please?

Thanks,
Claudio Bantaloukas


Claudio Bantaloukas (3):
  aarch64: Add march flags for +fp8 arch extensions
  aarch64: Add support for moving fpm system register
  aarch64: Add fpm register helper functions.

 .../aarch64/aarch64-option-extensions.def |   2 +
 gcc/config/aarch64/aarch64.cc |   8 ++
 gcc/config/aarch64/aarch64.h  |  17 ++-
 gcc/config/aarch64/aarch64.md |  30 +++--
 gcc/config/aarch64/arm_acle.h |  33 +
 gcc/config/aarch64/constraints.md |   3 +
 gcc/doc/invoke.texi   |   2 +
 .../gcc.target/aarch64/acle/fp8-helpers.c |  52 
 gcc/testsuite/gcc.target/aarch64/acle/fp8.c   | 124 ++
 9 files changed, 257 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8.c

-- 
2.43.0



[PATCH v2 1/3] aarch64: Add march flags for +fp8 arch extensions

2024-07-25 Thread Claudio Bantaloukas

This introduces the relevant flags to enable access to the fpmr register and 
fp8 intrinsics, which will be added subsequently.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def (fp8): New.
* config/aarch64/aarch64.h (TARGET_FP8): Likewise.
* doc/invoke.texi (AArch64 Options): Document new -march flags
and extensions.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/fp8.c: New test.
---
 .../aarch64/aarch64-option-extensions.def |  2 ++
 gcc/config/aarch64/aarch64.h  |  3 +++
 gcc/doc/invoke.texi   |  2 ++
 gcc/testsuite/gcc.target/aarch64/acle/fp8.c   | 21 +++
 4 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8.c

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 42ec0eec31e..6998627f377 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -232,6 +232,8 @@ AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
 
 AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs")
 
+AARCH64_OPT_EXTENSION("fp8", FP8, (SIMD), (), (), "fp8")
+
 #undef AARCH64_OPT_FMV_EXTENSION
 #undef AARCH64_OPT_EXTENSION
 #undef AARCH64_FMV_FEATURE
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index b7e330438d9..2e75c6b81e2 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -463,6 +463,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
  && (aarch64_tune_params.extra_tuning_flags \
  & AARCH64_EXTRA_TUNE_AVOID_PRED_RMW))
 
+/* fp8 instructions are enabled through +fp8.  */
+#define TARGET_FP8 AARCH64_HAVE_ISA (FP8)
+
 /* Standard register usage.  */
 
 /* 31 64-bit general purpose registers R0-R30:
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e0a641213ae..f293d49c61a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21843,6 +21843,8 @@ Enable support for Armv9.4-a Guarded Control Stack extension.
 Enable support for Armv8.9-a/9.4-a translation hardening extension.
 @item rcpc3
 Enable the RCpc3 (Release Consistency) extension.
+@item fp8
+Enable the fp8 (8-bit floating point) extension.
 
 @end table
 
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8.c b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
new file mode 100644
index 0000000..4113758aa25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
@@ -0,0 +1,21 @@
+/* Test the fp8 ACLE intrinsics family.  */
+/* { dg-do compile } */
+/* { dg-options "-O1 -march=armv8-a" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <arm_acle.h>
+
+#ifdef __ARM_FEATURE_FP8
+#error "__ARM_FEATURE_FP8 feature macro defined."
+#endif
+
+#pragma GCC push_options
+#pragma GCC target("arch=armv9.4-a+fp8")
+
+/* We do not define __ARM_FEATURE_FP8 until all
+   relevant features have been added. */
+#ifdef __ARM_FEATURE_FP8
+#error "__ARM_FEATURE_FP8 feature macro defined."
+#endif
+
+#pragma GCC pop_options


[PATCH v2 3/3] aarch64: Add fpm register helper functions.

2024-07-25 Thread Claudio Bantaloukas

The ACLE declares several helper types and functions to
facilitate construction of `fpm` arguments.
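
As an illustration of how such helpers compose, here is a minimal, self-contained sketch. It is not the code from this patch: `fpm_set_field` and `example_fpm` are hypothetical names, and the field positions (dst format in bits [8:6], overflow_cvt in bit 15, lscale in bits [22:16], nscale in bits [31:24], lscale2 in bits [37:32]) and the encodings E4M3 = 1 and SATURATE = 1 are taken from the macros and enums in the diff below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generic helper: clear WIDTH bits starting at SHIFT in
   FPM and store the low WIDTH bits of VALUE there.  */
uint64_t
fpm_set_field (uint64_t fpm, unsigned shift, unsigned width, uint64_t value)
{
  assert (width > 0 && width < 64 && shift + width <= 64);
  uint64_t mask = ((UINT64_C (1) << width) - 1) << shift;
  return (fpm & ~mask) | ((value << shift) & mask);
}

/* Compose an fpm value step by step, starting from zero as
   __arm_fpm_init () does.  */
uint64_t
example_fpm (void)
{
  uint64_t fpm = 0;
  fpm = fpm_set_field (fpm, 6, 3, 1);     /* dst format = E4M3.  */
  fpm = fpm_set_field (fpm, 15, 1, 1);    /* overflow_cvt = saturate.  */
  fpm = fpm_set_field (fpm, 16, 7, 127);  /* lscale = 127.  */
  return fpm;
}
```

Each setter only touches its own field, so the calls can be chained in any order and a later call on the same field overwrites the earlier value.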

gcc/ChangeLog:

* config/aarch64/arm_acle.h (fpm_t): New type representing fpmr values.
(enum __ARM_FPM_FORMAT): New enum representing valid fp8 formats.
(enum __ARM_FPM_OVERFLOW): New enum representing how some fp8
calculations work.
(arm_fpm_init): New.
(arm_set_fpm_src1_format): Likewise.
(arm_set_fpm_src2_format): Likewise.
(arm_set_fpm_dst_format): Likewise.
(arm_set_fpm_overflow_cvt): Likewise.
(arm_set_fpm_overflow_mul): Likewise.
(arm_set_fpm_lscale): Likewise.
(arm_set_fpm_lscale2): Likewise.
(arm_set_fpm_nscale): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/fp8-helpers.c: New test of fpmr helper 
functions.
---
 gcc/config/aarch64/arm_acle.h | 33 
 .../gcc.target/aarch64/acle/fp8-helpers.c | 52 +++
 2 files changed, 85 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers.c

diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
index 2aa681090fa..fd4fa855b90 100644
--- a/gcc/config/aarch64/arm_acle.h
+++ b/gcc/config/aarch64/arm_acle.h
@@ -385,6 +385,39 @@ __rndrrs (uint64_t *__res)
 
 #pragma GCC pop_options
 
+typedef uint64_t fpm_t;
+
+enum __ARM_FPM_FORMAT
+{
+  __ARM_FPM_E5M2,
+  __ARM_FPM_E4M3,
+};
+
+enum __ARM_FPM_OVERFLOW
+{
+  __ARM_FPM_INFNAN,
+  __ARM_FPM_SATURATE,
+};
+
+#define __arm_fpm_init() (0)
+
+#define __arm_set_fpm_src1_format(__fpm, __format) \
+  ((__fpm & ~(uint64_t)0x7) | (__format & (uint64_t)0x7))
+#define __arm_set_fpm_src2_format(__fpm, __format) \
+  ((__fpm & ~((uint64_t)0x7 << 3)) | ((__format & (uint64_t)0x7) << 3))
+#define __arm_set_fpm_dst_format(__fpm, __format) \
+  ((__fpm & ~((uint64_t)0x7 << 6)) | ((__format & (uint64_t)0x7) << 6))
+#define __arm_set_fpm_overflow_cvt(__fpm, __behaviour) \
+  ((__fpm & ~((uint64_t)0x1 << 15)) | ((__behaviour & (uint64_t)0x1) << 15))
+#define __arm_set_fpm_overflow_mul(__fpm, __behaviour) \
+  ((__fpm & ~((uint64_t)0x1 << 14)) | ((__behaviour & (uint64_t)0x1) << 14))
+#define __arm_set_fpm_lscale(__fpm, __scale) \
+  ((__fpm & ~((uint64_t)0x7f << 16)) | ((__scale & (uint64_t)0x7f) << 16))
+#define __arm_set_fpm_lscale2(__fpm, __scale) \
+  ((__fpm & ~((uint64_t)0x3f << 32)) | ((__scale & (uint64_t)0x3f) << 32))
+#define __arm_set_fpm_nscale(__fpm, __scale) \
+  ((__fpm & ~((uint64_t)0xff << 24)) | ((__scale & (uint64_t)0xff) << 24))
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers.c b/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers.c
new file mode 100644
index 0000000..e235c3621d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers.c
@@ -0,0 +1,52 @@
+/* Test the fp8 ACLE helper functions.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c90 -pedantic-errors -O1 -march=armv9.4-a+fp8" } */
+
+#include <arm_acle.h>
+
+void
+test_prepare_fpmr_sysreg ()
+{
+
+#define _S_EQ(expr, expected)  \
+  _Static_assert (expr == expected, #expr " == " #expected)
+
+  _S_EQ (__arm_fpm_init (), 0);
+
+  /* Bits [2:0] */
+  _S_EQ (__arm_set_fpm_src1_format (__arm_fpm_init (), __ARM_FPM_E5M2), 0);
+  _S_EQ (__arm_set_fpm_src1_format (__arm_fpm_init (), __ARM_FPM_E4M3), 0x1);
+
+  /* Bits [5:3] */
+  _S_EQ (__arm_set_fpm_src2_format (__arm_fpm_init (), __ARM_FPM_E5M2), 0);
+  _S_EQ (__arm_set_fpm_src2_format (__arm_fpm_init (), __ARM_FPM_E4M3), 0x8);
+
+  /* Bits [8:6] */
+  _S_EQ (__arm_set_fpm_dst_format (__arm_fpm_init (), __ARM_FPM_E5M2), 0);
+  _S_EQ (__arm_set_fpm_dst_format (__arm_fpm_init (), __ARM_FPM_E4M3), 0x40);
+
+  /* Bit 14 */
+  _S_EQ (__arm_set_fpm_overflow_mul (__arm_fpm_init (), __ARM_FPM_INFNAN), 0);
+  _S_EQ (__arm_set_fpm_overflow_mul (__arm_fpm_init (), __ARM_FPM_SATURATE),
+	 0x4000);
+
+  /* Bit 15 */
+  _S_EQ (__arm_set_fpm_overflow_cvt (__arm_fpm_init (), __ARM_FPM_INFNAN), 0);
+  _S_EQ (__arm_set_fpm_overflow_cvt (__arm_fpm_init (), __ARM_FPM_SATURATE),
+	 0x8000);
+
+  /* Bits [22:16] */
+  _S_EQ (__arm_set_fpm_lscale (__arm_fpm_init (), 0), 0);
+  _S_EQ (__arm_set_fpm_lscale (__arm_fpm_init (), 127), 0x7F0000);
+
+  /* Bits [37:32] */
+  _S_EQ (__arm_set_fpm_lscale2 (__arm_fpm_init (), 0), 0);
+  _S_EQ (__arm_set_fpm_lscale2 (__arm_fpm_init (), 63), 0x3F00000000);
+
+  /* Bits [31:24] */
+  _S_EQ (__arm_set_fpm_nscale (__arm_fpm_init (), 0), 0);
+  _S_EQ (__arm_set_fpm_nscale (__arm_fpm_init (), 127), 0x7F000000);
+  _S_EQ (__arm_set_fpm_nscale (__arm_fpm_init (), -128), 0x80000000);
+
+#undef _S_EQ
+}


[PATCH v2 2/3] aarch64: Add support for moving fpm system register

2024-07-25 Thread Claudio Bantaloukas

Unlike most system registers, fpmr can be heavily written to in code that
exercises the fp8 functionality. That is because every fp8 instrinsic call
can potentially change the value of fpmr.
Rather than just use a an unspec, we treat the fpmr system register like
all other registers and use a move operation to read and write to it.

We introduce a new class of moveable system registers that, currently,
only accepts fpmr and a new constraint, Umv, that allows us to
selectively use mrs and msr instructions when expanding rtl for them.
Given that there is code that depends on "real" registers coming before
"fake" ones, we introduce a new constant FPM_REGNUM that uses an
existing value and renumber registers below that.
This requires us to update the bitmaps that describe which registers
belong to each register class.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_hard_regno_nregs): Add
support for MOVEABLE_SYSREGS class.
(aarch64_hard_regno_mode_ok): Allow reads and writes to fpmr.
(aarch64_regno_regclass): Support MOVEABLE_SYSREGS class.
(aarch64_class_max_nregs): Likewise.
* config/aarch64/aarch64.h (FIXED_REGISTERS): add fpmr.
(CALL_REALLY_USED_REGISTERS): Likewise.
(REGISTER_NAMES): Likewise.
(enum reg_class): Add MOVEABLE_SYSREGS class.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Update class bitmaps to deal with fpmr,
the new MOVEABLE_SYSREGS class and renumbering of registers.
* config/aarch64/aarch64.md: (FPM_REGNUM): added new register
number, reusing old value.
(FFR_REGNUM): Renumber.
(FFRT_REGNUM): Likewise.
(LOWERING_REGNUM): Likewise.
(TPIDR2_BLOCK_REGNUM): Likewise.
(SME_STATE_REGNUM): Likewise.
(TPIDR2_SETUP_REGNUM): Likewise.
(ZA_FREE_REGNUM): Likewise.
(ZA_SAVED_REGNUM): Likewise.
(ZA_REGNUM): Likewise.
(ZT0_REGNUM): Likewise.
(*mov_aarch64): Add support for moveable sysregs.
(*movsi_aarch64): Likewise.
(*movdi_aarch64): Likewise.
* config/aarch64/constraints.md (MOVEABLE_SYSREGS): New constraint.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/fp8.c: New tests.
---
 gcc/config/aarch64/aarch64.cc   |   8 ++
 gcc/config/aarch64/aarch64.h|  14 ++-
 gcc/config/aarch64/aarch64.md   |  30 --
 gcc/config/aarch64/constraints.md   |   3 +
 gcc/testsuite/gcc.target/aarch64/acle/fp8.c | 103 
 5 files changed, 144 insertions(+), 14 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index e0cf382998c..9810f2c0390 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2018,6 +2018,7 @@ aarch64_hard_regno_nregs (unsigned regno, machine_mode mode)
 case PR_HI_REGS:
   return mode == VNx32BImode ? 2 : 1;
 
+case MOVEABLE_SYSREGS:
 case FFR_REGS:
 case PR_AND_FFR_REGS:
 case FAKE_REGS:
@@ -2045,6 +2046,9 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
 /* This must have the same size as _Unwind_Word.  */
 return mode == DImode;
 
+  if (regno == FPM_REGNUM)
+return mode == QImode || mode == HImode || mode == SImode || mode == DImode;
+
   unsigned int vec_flags = aarch64_classify_vector_mode (mode);
   if (vec_flags == VEC_SVE_PRED)
 return pr_or_ffr_regnum_p (regno);
@@ -12680,6 +12684,9 @@ aarch64_regno_regclass (unsigned regno)
   if (PR_REGNUM_P (regno))
 return PR_LO_REGNUM_P (regno) ? PR_LO_REGS : PR_HI_REGS;
 
+  if (regno == FPM_REGNUM)
+return MOVEABLE_SYSREGS;
+
   if (regno == FFR_REGNUM || regno == FFRT_REGNUM)
 return FFR_REGS;
 
@@ -13068,6 +13075,7 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
 case PR_HI_REGS:
   return mode == VNx32BImode ? 2 : 1;
 
+case MOVEABLE_SYSREGS:
 case STACK_REG:
 case FFR_REGS:
 case PR_AND_FFR_REGS:
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 2e75c6b81e2..2dfb999bea5 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -523,6 +523,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
 1, 1, 1, 1,			/* SFP, AP, CC, VG */	\
 0, 0, 0, 0,   0, 0, 0, 0,   /* P0 - P7 */   \
 0, 0, 0, 0,   0, 0, 0, 0,   /* P8 - P15 */  \
+1,/* FPMR */		\
 1, 1,			/* FFR and FFRT */	\
 1, 1, 1, 1, 1, 1, 1, 1	/* Fake registers */	\
   }
@@ -547,6 +548,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
 1, 1, 1, 0,			/* SFP, AP, CC, VG */	\
 1, 1, 1, 1,   1, 1, 1, 1,	/* P0 - P7 */		\
 1, 1, 1, 1,   1, 1, 1, 1,	/* P8 - P15 */		\
+1,/* FPMR */		\
 1, 1,			/* FFR and FFRT */	\
 0, 0, 0, 0, 0, 0, 0, 0	/* Fake registers */	\
   }
@@ -564,6 +566,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
 "sfp", "ap",  "cc",  "vg",	\

[committed] Trivial testcase adjustment

2024-07-25 Thread Jeff Law
I made pr116037.c dependent on int32 just based on the constants used 
without noting the int128 vector type.  Naturally on targets that don't 
support int128 the test fails.  Fixed by changing the target selector 
from int32 to int128.


Pushed to the trunk.



Jeff



commit 2dd45655db47362153756261881413b368582597
Author: Jeff Law 
Date:   Thu Jul 25 08:42:04 2024 -0600

[committed] Trivial testcase adjustment

I made pr116037.c dependent on int32 just based on the constants used 
without
noting the int128 vector type.  Naturally on targets that don't support 
int128
the test fails.  Fixed by changing the target selector from int32 to int128.

Pushed to the trunk.

gcc/testsuite
* gcc.dg/torture/pr116037.c: Fix target selector.

diff --git a/gcc/testsuite/gcc.dg/torture/pr116037.c 
b/gcc/testsuite/gcc.dg/torture/pr116037.c
index cb34ba4e5d4..86ab50de4b2 100644
--- a/gcc/testsuite/gcc.dg/torture/pr116037.c
+++ b/gcc/testsuite/gcc.dg/torture/pr116037.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target int32 } */
+/* { dg-require-effective-target int128 } */
 /* { dg-additional-options "-Wno-psabi" } */
 
 typedef __attribute__((__vector_size__ (64))) unsigned char VC;


Re: [PATCH v2] RISC-V: xtheadmemidx: Fix mode test for pre/post-modify addressing

2024-07-25 Thread Jeff Law




On 7/24/24 9:15 AM, Christoph Müllner wrote:

auto_inc_dec (-O3) performs optimizations like the following
if RVV and XTheadMemIdx are enabled.

(insn 23 20 27 3 (set (mem:V4QI (reg:DI 136 [ ivtmp.13 ]) [0 MEM  [(char *)_39]+0 S4 A32])
 (reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27 3183 
{*movv4qi}
  (nil))
(insn 40 39 41 3 (set (reg:DI 136 [ ivtmp.13 ])
 (plus:DI (reg:DI 136 [ ivtmp.13 ])
 (const_int 20 [0x14]))) 5 {adddi3}
  (nil))
>
(insn 23 20 27 3 (set (mem:V4QI (post_modify:DI (reg:DI 136 [ ivtmp.13 ])
 (plus:DI (reg:DI 136 [ ivtmp.13 ])
 (const_int 20 [0x14]))) [0 MEM  [(char 
*)_39]+0 S4 A32])
 (reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27 3183 
{*movv4qi}
  (expr_list:REG_INC (reg:DI 136 [ ivtmp.13 ])
 (nil)))

The pass believes that this is legal because the mode test in
th_memidx_classify_address_modify() requires INTEGRAL_MODE_P (mode),
which includes vector modes.

Let's restrict the mode test so that only MODE_INT is allowed.
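
The difference between the two tests can be sketched with a simplified model. This is not GCC code: `enum mode_class` here is a hypothetical, reduced set of classes, and `integral_class_p` only mimics the aspect of INTEGRAL_MODE_P that matters for this bug, namely that it also accepts vector integer classes.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical subset of mode classes.  */
enum mode_class
{
  CLASS_INT,
  CLASS_FLOAT,
  CLASS_VECTOR_INT,
  CLASS_VECTOR_FLOAT,
  CLASS_COUNT
};

/* Broad test: accepts scalar and vector integer classes, which is how
   a vector mode such as V4QI can slip through.  */
bool
integral_class_p (enum mode_class c)
{
  assert (c < CLASS_COUNT);
  return c == CLASS_INT || c == CLASS_VECTOR_INT;
}

/* Narrow test: only the scalar integer class qualifies.  */
bool
scalar_int_class_p (enum mode_class c)
{
  assert (c < CLASS_COUNT);
  return c == CLASS_INT;
}
```

In this model a V4QI-like mode has class CLASS_VECTOR_INT, so the broad test accepts it while the narrow test rejects it.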

PR target/116033

gcc/ChangeLog:

* config/riscv/thead.cc (th_memidx_classify_address_modify):
Fix mode test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr116033.c: New test.

OK
jeff



Re: [patch,avr] Implement PR116056: attribute signal(n) and interrupt(n)

2024-07-25 Thread Jeff Law




On 7/23/24 2:19 PM, Georg-Johann Lay wrote:

This patch adds support for arguments to the signal and interrupt
function attributes.  It allows specifying the ISR by means of the
associated IRQ number, in addition to the current attributes that
require specifying the ISR name like "__vector_1" as the (assembly) name
for the function.  The new feature is more convenient, e.g. when the
ISR is implemented by a class method or in a namespace.  There is no
requirement that the ISR is externally visible.  The syntax is like:

__attribute__((signal(1, 2, ...), signal(3, 4, ...)))
[static] void isr_function (void)
{
     // Code
}

Ok for trunk?

Johann

--

AVR target 116056 - Support attribute signal(n) and interrupt(n).

This patch adds support for arguments to the signal and interrupt
function attributes.  It allows specifying the ISR by means of the
associated IRQ number, in addition to the current attributes that
require specifying the ISR name like "__vector_1" as the (assembly) name
for the function.  The new feature is more convenient, e.g. when the
ISR is implemented by a class method or in a namespace.  There is no
requirement that the ISR is externally visible.  The syntax is like:

__attribute__((signal(1, 2, ...), signal(3, 4, ...)))
[static] void isr_function (void)
{
     // Code
}

 PR target/116056
gcc/
 * config/avr/avr.h (ASM_DECLARE_FUNCTION_NAME): New define.
 * config/avr/avr-protos.h (avr_declare_function_name): New proto.
 * config/avr/avr-c.cc (avr_cpu_cpp_builtins) <__HAVE_SIGNAL_N__>: New
 built-in macro.
 * config/avr/avr.cc (avr_declare_function_name): New function.
 (avr_attribute_table) : Allow any number of args.
 (avr_insert_attributes): Check validity of "signal" and "interrupt"
 arguments.
 (avr_foreach_function_attribute, avr_interrupt_signal_function)
 (avr_isr_number, avr_asm_isr_alias, avr_handle_isr_attribute):
 New static functions.
 (avr_interrupt_function): New from avr_interrupt_function_p.
 Adjust callers.
 (avr_signal_function): New from avr_signal_function_p.
 Adjust callers.
 (avr_set_current_function): Only diagnose non-__vector ISR names
 when "signal" or "interrupt" attribute has no args.
 (struct avr_fun_cookie): New.
 * doc/extend.texi (AVR Function Attributes): Document
 signal(num) and interrupt(num).
 * doc/invoke.texi (AVR Built-in Macros) <__HAVE_SIGNAL_N__>: Document.
gcc/testsuite/
 * gcc.target/avr/torture/signal_n-1.c: New test.
 * gcc.target/avr/torture/signal_n-2.c: New test.
 * gcc.target/avr/torture/signal_n-3.c: New test.
 * gcc.target/avr/torture/signal_n-4.cpp: New test.

Just a couple whitespace nits.





+  avr_foreach_function_attribute (func, attr_name,
+[] (tree, tree attr, void *cookie)
+{
+  int*pcook = (int*) cookie;

Whitespace nit "int *pcook = (int *) cookie" would be better

+




+
+static void
+avr_asm_isr_alias (tree /*func*/, tree attr, void *pv)
+{
+  avr_fun_cookie*cookie = (avr_fun_cookie*) pv;

Similarly.



No need to wait for another review.  Just fix the whitespace issues and 
you're good to go.


jeff






Re: [PATCH v2] RISC-V: xtheadmemidx: Fix mode test for pre/post-modify addressing

2024-07-25 Thread Christoph Müllner
Ok, also to backport to GCC 14?

On Thu, Jul 25, 2024 at 4:56 PM Jeff Law  wrote:
>
>
>
> On 7/24/24 9:15 AM, Christoph Müllner wrote:
> > auto_inc_dec (-O3) performs optimizations like the following
> > if RVV and XTheadMemIdx is enabled.
> >
> > (insn 23 20 27 3 (set (mem:V4QI (reg:DI 136 [ ivtmp.13 ]) [0 MEM  > char> [(char *)_39]+0 S4 A32])
> >  (reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27 
> > 3183 {*movv4qi}
> >   (nil))
> > (insn 40 39 41 3 (set (reg:DI 136 [ ivtmp.13 ])
> >  (plus:DI (reg:DI 136 [ ivtmp.13 ])
> >  (const_int 20 [0x14]))) 5 {adddi3}
> >   (nil))
> > >
> > (insn 23 20 27 3 (set (mem:V4QI (post_modify:DI (reg:DI 136 [ ivtmp.13 ])
> >  (plus:DI (reg:DI 136 [ ivtmp.13 ])
> >  (const_int 20 [0x14]))) [0 MEM  [(char 
> > *)_39]+0 S4 A32])
> >  (reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27 
> > 3183 {*movv4qi}
> >   (expr_list:REG_INC (reg:DI 136 [ ivtmp.13 ])
> >  (nil)))
> >
> > The reason why the pass believes that this is legal is,
> > that the mode test in th_memidx_classify_address_modify()
> > requires INTEGRAL_MODE_P (mode), which includes vector modes.
> >
> > Let's restrict the mode test such, that only MODE_INT is allowed.
> >
> >   PR target/116033
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/thead.cc (th_memidx_classify_address_modify):
> >   Fix mode test.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/pr116033.c: New test.
> OK
> jeff
>


Re: [PATCH v2] RISC-V: xtheadmemidx: Fix mode test for pre/post-modify addressing

2024-07-25 Thread Jeff Law




On 7/25/24 9:06 AM, Christoph Müllner wrote:

Ok, also to backport to GCC 14?

Yes, of course.
jeff



Re: [PATCH v2] RISC-V: xtheadmemidx: Fix mode test for pre/post-modify addressing

2024-07-25 Thread Palmer Dabbelt

On Thu, 25 Jul 2024 08:10:25 PDT (-0700), jeffreya...@gmail.com wrote:



On 7/25/24 9:06 AM, Christoph Müllner wrote:

Ok, also to backport to GCC 14?

Yes, of course.


I'm OK with that, but according to the latest status report, we're
between the RC and the release for 14.2 and the homepage is saying
"frozen for release" (thanks to Andrew Pinski for pointing that out).


+Jakub


Re: [PATCH v2] RISC-V: xtheadmemidx: Fix mode test for pre/post-modify addressing

2024-07-25 Thread Christoph Müllner
On Thu, Jul 25, 2024 at 5:19 PM Palmer Dabbelt  wrote:
>
> On Thu, 25 Jul 2024 08:10:25 PDT (-0700), jeffreya...@gmail.com wrote:
> >
> >
> > On 7/25/24 9:06 AM, Christoph Müllner wrote:
> >> Ok, also to backport to GCC 14?
> > Yes, of course.
>
> I'm OK with that, but according to the latest status report
> , we're
> between the RC and the release for 14.2 and the homepage is saying
> "frozen for release" (thanks to Andrew Pinski for pointing that out).

This popped up when I was about to push, so I did not push yet.
The last thing I want to do is interfere with the release process,
so apologies for pushing the backport for PR116035 yesterday.

I will wait with this patch for the GCC 14.2 release, rebase/retest
and push then, unless the release manager or maintainers propose
another procedure.


Re: [PATCH v2] RISC-V: xtheadmemidx: Fix mode test for pre/post-modify addressing

2024-07-25 Thread Palmer Dabbelt

On Thu, 25 Jul 2024 08:37:05 PDT (-0700), christoph.muell...@vrull.eu wrote:

On Thu, Jul 25, 2024 at 5:19 PM Palmer Dabbelt  wrote:


On Thu, 25 Jul 2024 08:10:25 PDT (-0700), jeffreya...@gmail.com wrote:
>
>
> On 7/25/24 9:06 AM, Christoph Müllner wrote:
>> Ok, also to backport to GCC 14?
> Yes, of course.

I'm OK with that, but according to the latest status report
, we're
between the RC and the release for 14.2 and the homepage is saying
"frozen for release" (thanks to Andrew Pinski for pointing that out).


This popped up when I was about to push, so I did not push yet.
The last thing I want to do is interfere with the release process,
so apologies for pushing the backport for PR116035 yesterday.

I will wait with this patch for the GCC 14.2 release, rebase/retest
and push then,
unless the release manager or maintainers propose another procedure.


That works for me, thanks!


[PATCH] MATCH: Optimize `VEC_SHL_INSERT (dup (A), A)` to just `dup (A) [PR116075]

2024-07-25 Thread Andrew Pinski
It was noticed that `.VEC_SHL_INSERT ({ 0, ... }, 0)` was not being
simplified to just `{ 0, ... }`. This was generated by the autovectorizer
(maybe even by accident, see PR tree-optimization/116081).

This adds a few SVE testcases to see if this is optimized since the
auto-vectorizer or intrinsics are the only two ways of getting this
produced.
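
Why the simplification is valid can be seen from a scalar model. The sketch below is not GCC code; it assumes that the internal function shifts every element one lane away from lane 0 and fills lane 0 with the scalar operand, and `shl_insert`, `all_lanes_equal`, and `insert_is_noop` are hypothetical names with a fixed 16-lane vector chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a shift-left-insert: move every element one lane
   away from lane 0 and put X into lane 0.  */
void
shl_insert (int8_t *v, size_t n, int8_t x)
{
  assert (v != NULL && n > 0);
  for (size_t i = n - 1; i > 0; i--)
    v[i] = v[i - 1];
  v[0] = x;
}

/* Return true if all N lanes of V hold X.  */
bool
all_lanes_equal (const int8_t *v, size_t n, int8_t x)
{
  for (size_t i = 0; i < n; i++)
    if (v[i] != x)
      return false;
  return true;
}

/* Build dup (a), insert b, and report whether the result is still
   dup (a).  */
bool
insert_is_noop (int8_t a, int8_t b)
{
  int8_t v[16];
  for (size_t i = 0; i < 16; i++)
    v[i] = a;
  shl_insert (v, 16, b);
  return all_lanes_equal (v, 16, a);
}
```

Under that assumption the insert is a no-op exactly when the inserted scalar equals the duplicated element, which is the split between the two new testcases: dup-insr-1.c covers the matching case and dup-insr-2.c the non-matching one.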

Built and tested for aarch64-linux-gnu with no regressions.

PR target/116075

gcc/ChangeLog:

* match.pd (`VEC_SHL_INSERT (dup (A), A)`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/dup-insr-1.c: New test.
* gcc.target/aarch64/sve/dup-insr-2.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd  | 17 
 .../gcc.target/aarch64/sve/dup-insr-1.c   | 26 +++
 .../gcc.target/aarch64/sve/dup-insr-2.c   | 26 +++
 3 files changed, 69 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/dup-insr-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/dup-insr-2.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 680dfea523f..a3a64bd742e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -10657,3 +10657,20 @@ and,
   }
   (if (full_perm_p)
(vec_perm (op@3 @0 @1) @3 @2))
+
+/* vec shift left insert (dup(A), A) -> dup(A) */
+(simplify
+ (IFN_VEC_SHL_INSERT vec_same_elem_p@0 @1)
+  (with {
+tree elem = uniform_vector_p (@0);
+if (!elem && TREE_CODE (@0) == SSA_NAME)
+  {
+gimple *def = SSA_NAME_DEF_STMT (@0);
+   if (gimple_assign_rhs_code (def) == CONSTRUCTOR)
+ elem = uniform_vector_p (gimple_assign_rhs1 (def));
+   else if (gimple_assign_rhs_code (def) == VEC_DUPLICATE_EXPR)
+ elem = gimple_assign_rhs1 (def);
+  }
+   }
+(if (elem && operand_equal_p (@1, elem))
+ @0)))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-1.c
new file mode 100644
index 0000000..41dcbba45cf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized" } */
+/* PR target/116075 */
+
+#include <arm_sve.h>
+
+svint8_t f(void)
+{
+  svint8_t tt;
+  tt = svdup_s8 (0);
+  tt = svinsr (tt, 0);
+  return tt;
+}
+
+svint8_t f1(int8_t t)
+{
+  svint8_t tt;
+  tt = svdup_s8 (t);
+  tt = svinsr (tt, t);
+  return tt;
+}
+
+/* The above 2 functions should have removed the VEC_SHL_INSERT. */
+
+/* { dg-final { scan-tree-dump-not ".VEC_SHL_INSERT " "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-2.c 
b/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-2.c
new file mode 100644
index 0000000..8eafe974624
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized" } */
+/* PR target/116075 */
+
+#include <arm_sve.h>
+
+svint8_t f(int8_t t)
+{
+  svint8_t tt;
+  tt = svdup_s8 (0);
+  tt = svinsr (tt, t);
+  return tt;
+}
+
+svint8_t f1(int8_t t)
+{
+  svint8_t tt;
+  tt = svdup_s8 (t);
+  tt = svinsr (tt, 0);
+  return tt;
+}
+
+/* The above 2 functions should not have removed the VEC_SHL_INSERT. */
+
+/* { dg-final { scan-tree-dump-times ".VEC_SHL_INSERT " 2 "optimized" } } */
+
-- 
2.43.0



Re: [PATCH] MATCH: Optimize `VEC_SHL_INSERT (dup (A), A)` to just `dup (A) [PR116075]

2024-07-25 Thread Richard Biener



> Am 25.07.2024 um 17:56 schrieb Andrew Pinski :
> 
> It was noticed if we have `.VEC_SHL_INSERT ({ 0, ... }, 0)` it was not being
> simplified to just `{ 0, ... }`. This was generated from the autovectorizer
> (maybe even on accident, see PR tree-optmization/116081).
> 
> This adds a few SVE testcases to see if this is optimized since the
> auto-vectorizer or intrinsics are the only two ways of getting this
> produced.
> 
> Build and tested for aarch64-linux-gnu with no regressions.

For the case in question implementing fold_const_call would be better.  Also …

>PR target/116075
> 
> gcc/ChangeLog:
> 
>* match.pd (`VEC_SHL_INSERT (dup (A), A)`): New pattern.
> 
> gcc/testsuite/ChangeLog:
> 
>* gcc.target/aarch64/sve/dup-insr-1.c: New test.
>* gcc.target/aarch64/sve/dup-insr-2.c: New test.
> 
> Signed-off-by: Andrew Pinski 
> ---
> gcc/match.pd  | 17 
> .../gcc.target/aarch64/sve/dup-insr-1.c   | 26 +++
> .../gcc.target/aarch64/sve/dup-insr-2.c   | 26 +++
> 3 files changed, 69 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/dup-insr-1.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/dup-insr-2.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 680dfea523f..a3a64bd742e 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -10657,3 +10657,20 @@ and,
>   }
>   (if (full_perm_p)
>(vec_perm (op@3 @0 @1) @3 @2))
> +
> +/* vec shift left insert (dup(A), A) -> dup(A) */
> +(simplify
> + (IFN_VEC_SHL_INSERT vec_same_elem_p@0 @1)
> +  (with {
> +tree elem = uniform_vector_p (@0);
> +if (!elem && TREE_CODE (@0) == SSA_NAME)
> +  {
> +gimple *def = SSA_NAME_DEF_STMT (@0);
> +if (gimple_assign_rhs_code (def) == CONSTRUCTOR)
> +  elem = uniform_vector_p (gimple_assign_rhs1 (def));
> +else if (gimple_assign_rhs_code (def) == VEC_DUPLICATE_EXPR)
> +  elem = gimple_assign_rhs1 (def);
> +  }
> +   }
> +(if (elem && operand_equal_p (@1, elem))

Ugh.  Two predicates involved and we still have to do this?

> + @0)))
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-1.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-1.c
> new file mode 100644
> index 000..41dcbba45cf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-1.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-optimized" } */
> +/* PR target/116075 */
> +
> +#include 
> +
> +svint8_t f(void)
> +{
> +  svint8_t tt;
> +  tt = svdup_s8 (0);
> +  tt = svinsr (tt, 0);
> +  return tt;
> +}
> +
> +svint8_t f1(int8_t t)
> +{
> +  svint8_t tt;
> +  tt = svdup_s8 (t);
> +  tt = svinsr (tt, t);
> +  return tt;
> +}
> +
> +/* The above 2 functions should have removed the VEC_SHL_INSERT. */
> +
> +/* { dg-final { scan-tree-dump-not ".VEC_SHL_INSERT " "optimized" } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-2.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-2.c
> new file mode 100644
> index 000..8eafe974624
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-2.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-optimized" } */
> +/* PR target/116075 */
> +
> +#include 
> +
> +svint8_t f(int8_t t)
> +{
> +  svint8_t tt;
> +  tt = svdup_s8 (0);
> +  tt = svinsr (tt, t);
> +  return tt;
> +}
> +
> +svint8_t f1(int8_t t)
> +{
> +  svint8_t tt;
> +  tt = svdup_s8 (t);
> +  tt = svinsr (tt, 0);
> +  return tt;
> +}
> +
> +/* The above 2 functions should not have removed the VEC_SHL_INSERT. */
> +
> +/* { dg-final { scan-tree-dump-times ".VEC_SHL_INSERT " 2 "optimized" } } */
> +
> --
> 2.43.0
> 
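
For readers unfamiliar with the internal function, here is a minimal scalar model of the fold being discussed. It assumes .VEC_SHL_INSERT (v, s) moves every element one lane away from lane 0 and puts s into lane 0; the names vec_shl_insert and dup_lanes are made up for this sketch and are not GCC APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a vector with every lane equal to a (models dup (A)).
std::vector<int> dup_lanes(int a, std::size_t lanes) {
    return std::vector<int>(lanes, a);
}

// Toy model of .VEC_SHL_INSERT (v, s): each element moves one lane away
// from lane 0, the last element is dropped, and s fills lane 0.
std::vector<int> vec_shl_insert(const std::vector<int>& v, int s) {
    std::vector<int> out(v.size());
    if (v.empty())
        return out;
    out[0] = s;
    for (std::size_t i = 1; i < v.size(); ++i)
        out[i] = v[i - 1];
    return out;
}
```

Under this model, inserting A into dup (A) gives back dup (A), which is what dup-insr-1.c expects to be folded, while inserting a different scalar changes lane 0, which is why dup-insr-2.c expects both calls to survive.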


Re: [PATCH] c++: alias of alias tmpl with dependent attrs [PR115897]

2024-07-25 Thread Patrick Palka
On Mon, 22 Jul 2024, Jason Merrill wrote:

> On 7/19/24 10:30 AM, Patrick Palka wrote:
> > On Thu, 18 Jul 2024, Jason Merrill wrote:
> > 
> > > On 7/18/24 12:45 PM, Patrick Palka wrote:
> > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > > > OK for trunk/14?
> > > > 
> > > > -- >8 --
> > > > 
> > > > As a followup of r15-2047-g7954bb4fcb6fa8, we also need to consider
> > > > dependent attributes when recursing into a non-template alias that names
> > > > a dependent alias template specialization (and so STF_STRIP_DEPENDENT
> > > > is set), otherwise in the first testcase below we undesirably strip B
> > > > all the way to T instead of to A.
> > > > 
> > > > We also need to move the typedef recursion case of strip_typedefs up to
> > > > get checked before the compound type recursion cases.  Otherwise for C
> > > > below (which ultimately aliases T*) we end up stripping it to T* instead
> > > > of to A because the POINTER_TYPE recursion dominates the typedef
> > > > recursion.  It also means we issue an unexpected extra error in the
> > > > third testcase below.
> > > > 
> > > > Ideally we would also want to consider dependent attributes on
> > > > non-template aliases, so that we accept the second testcase below, but
> > > > making that work correctly would require broader changes to e.g.
> > > > spec_hasher which currently assumes all non-template aliases are
> > > > stripped and hence it'd conflate the dependent specializations A
> > > > and A even if we didn't strip B.
> > > 
> > > Wouldn't that just be a matter of changing structural_comptypes to
> > > consider
> > > dependent attributes as well as dependent specializations?
> > 
> > Pretty much, it seems.  ISTM we should check dependent attributes even
> > when !comparing_dependent_aliases since they affect type identity rather
> > than just SFINAE behavior.
> > 
> > > 
> > > Or better, adding attributes to dependent_alias_template_spec_p (and
> > > changing
> > > its name)?  It seems like other callers would also benefit from that
> > > change.
> > 
> > I ended up adding a new predicate opaque_alias_p separate from
> > dependent_alias_template_spec_p since ISTM we need to call it from
> > there and from alias_template_specialization_p to avoid looking through
> > such aliases.
> 
> Sounds good, but I think let's add the word "dependent" to the name of the new
> function.

Done.

> 
> > So opaque_alias_p checks for type identity of an alias, whereas
> > dependent_alias_template_spec_p more broadly checks for SFINAE identity.
> > 
> > Something like the following (as an incremental patch on top of the
> > previous one, to consider separately for backportings since it's riskier):
> > 
> > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > index 0620c8c023a..4d4a5cef92c 100644
> > --- a/gcc/cp/pt.cc
> > +++ b/gcc/cp/pt.cc
> > @@ -6508,6 +6508,19 @@ alias_type_or_template_p (tree t)
> >   || DECL_ALIAS_TEMPLATE_P (t));
> >   }
> >   +/* Return true if substituting into T would yield a different type than
> > +   substituting into its expansion.  */
> 
> Please discuss when to use this vs dependent_alias_template_spec_p (both here
> and there).  Maybe just to say that any place that checks one probably also
> wants to check the other.
> 
> Other places that use d_a_t_s_p and seem to need adjusting:
> dependent_type_p_r, any_dependent_arguments_need_structural_equality_p,
> alias_ctad_tweaks.

For dependent_type_p_r, I think we can get rid of the existing d_a_t_s_p
check if we manually set TYPE_DEPENDENT_P/_VALID to true at parse time
like we do at instantiation time from instantiate_alias_template.

For any_dependent_arguments_need_structural_equality_p, those checks
can't be removed but we can relax nt_transparent to nt_opaque since we
know the template arguments have already gone through strip_typedefs.

For alias_ctad_tweaks I think we can replace the d_a_t_s_p with
template_args_equal which has the desired comparing_dependent_aliases
behavior.

After the above, it seems no relevant user of d_a_t_s_p survives that
passes nt_transparent instead of nt_opaque, and so we can avoid adding
a transparent_typedefs flag to the new predicate.

> 
> > +bool
> > +opaque_alias_p (const_tree t)
> > +{
> > +  return (TYPE_P (t)
> > + && typedef_variant_p (t)
> > + && uses_template_parms (DECL_ORIGINAL_TYPE (TYPE_NAME (t)))
> 
> Checking this seems wrong; a vector of dependent size seems opaque even if
> it's a vector of int.

Makes sense, fixed.  I added a testcase variant too.

-- >8 --

Subject: [PATCH 2/1] c++: non-template alias with dependent attributes 
[PR115897]

PR c++/115897

gcc/cp/ChangeLog:

* cp-tree.h (dependent_opaque_alias_p): Declare.
* pt.cc (push_template_decl): Manually mark a dependent opaque
alias or dependent alias template specialization as dependent,
and use structural equality for them.
* pt.cc (dependent_opaque_alias_p): Define.
(alias_template_specializat

Re: [PATCH] c++: Implement C++26 P2558R2 - Add @, $, and ` to the basic character set [PR110343]

2024-07-25 Thread Jason Merrill

On 7/17/24 6:04 PM, Jakub Jelinek wrote:

Hi!

The following patch implements the easy parts of the paper.
When @$` are added to the basic character set, it means that
R"@$`()@$`" should now be valid (here I've noticed most of the
raw string tests were tested solely with -std=c++11 or -std=gnu++11
and I've tried to change that), and on the other side even if
by extension $ is allowed in identifiers, \u0024 or \U0024
or \u{24} should not be, similarly how \u0041 is not allowed.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

The paper in 3.1 claims though that
#include 

#define STR(x) #x

int main()
{
   printf("%s", STR(\u0060)); // U+0060 is ` GRAVE ACCENT
}
should have been accepted before this paper (and rejected after it),
but g++ rejects it.

I've tried to understand it, but am confused on what is the right
behavior and why.

Consider
#define STR(x) #x
const char *a = "\u00b7";
const char *b = STR(\u00b7);
const char *c = "\u0041";
const char *d = STR(\u0041);
const char *e = STR(a\u00b7);
const char *f = STR(a\u0041);
const char *g = STR(a \u00b7);
const char *h = STR(a \u0041);
const char *i = "\u066d";
const char *j = STR(\u066d);
const char *k = "\u0040";
const char *l = STR(\u0040);
const char *m = STR(a\u066d);
const char *n = STR(a\u0040);
const char *o = STR(a \u066d);
const char *p = STR(a \u0040);

Neither clang nor gcc emits any diagnostics on the a, c, i and k
initializers, those are certainly valid (c is invalid in C23 though).  g++
emits with -pedantic-errors errors on all the others, while clang++ on the
ones with STR involving \u0041, \u0040 and a\u066d.  The chosen values are
\u0040 '@' as something being changed by this paper, \u0041 'A' as basic
character set char valid in identifiers before/after, \u00b7 as an example
of character which is pedantically valid in identifiers if not at the start
and \u066d as something pedantically not valid in identifiers.

Now, https://eel.is/c++draft/lex.charset#6 says that UCN used outside of a
string/character literal which corresponds to basic character set character
(or control character) is ill-formed, that would make d, f, h cases invalid
for C++ and l, n, p cases invalid for C++26.
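
To keep the four chosen code points straight, here is a small lookup sketch. It encodes only what the two paragraphs above say about U+0040, U+0041, U+00B7 and U+066D; the enum and function names are hypothetical and this is not how libcpp represents any of it.

```cpp
#include <cassert>

// Hypothetical classification of where a character may appear in an identifier.
enum class IdentUse { Anywhere, NotAtStart, Never };

// Basic character set membership, modelled only for the code points in the
// message: 'A' is always a member, '@' becomes one with P2558R2 (C++26).
bool in_basic_charset(char32_t cp, bool cxx26) {
    if (cp == 0x0041)
        return true;
    if (cp == 0x0040)
        return cxx26;
    return false;
}

// Pedantic identifier validity for the same four code points.
IdentUse ident_use(char32_t cp) {
    switch (cp) {
    case 0x0041: return IdentUse::Anywhere;    // 'A'
    case 0x00B7: return IdentUse::NotAtStart;  // valid, but not first
    default:     return IdentUse::Never;       // U+0040 and U+066D
    }
}

// [lex.charset]/6 as summarised above: a UCN naming a basic character set
// character is ill-formed outside string and character literals.
bool ucn_allowed_outside_literal(char32_t cp, bool cxx26) {
    return !in_basic_charset(cp, cxx26);
}
```

With this table, \u0041 is rejected outside literals in every mode, while \u0040 flips from allowed to rejected once C++26 is selected, matching the d, f, h versus l, n, p split.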

https://eel.is/c++draft/lex.name states which characters can appear at the
start of the identifier and which can appear after the start.  And
https://eel.is/c++draft/lex.pptoken states that preprocessing-token is
either identifier, or tons of other things, or "each non-whitespace
character that cannot be one of the above"

Then https://eel.is/c++draft/lex.pptoken#1 says that this last category is
invalid if the preprocessing token is being converted into token.

And https://eel.is/c++draft/lex.pptoken#2 includes "If any character not in
the basic character set matches the last category, the program is
ill-formed."

Now, e.g.  for the C++23 STR(\u0040) case, \u0040 is there not in the basic
character set, so valid outside of the literals (not the case anymore in
C++26), but it isn't nondigit and doesn't have XID_Start property, so it
isn't IMHO an identifier and so must be the "each non-whitespace character
that cannot be one of the above" case.  Why doesn't the above mentioned
https://eel.is/c++draft/lex.pptoken#2 sentence make that invalid?


Your argument makes sense to me, though...


Ignoring
that, I'd say it would be then stringized and that feels like it is what
clang++ is doing.  Now, e.g.  for the STR(a\u066d) case, I wonder why that
isn't lexed as a identifier followed by \u066d "each non-whitespace
character that cannot be one of the above" token and stringified similarly,
clang++ rejects that.

What GCC libcpp seems to be doing is that forms_identifier_p calls
_cpp_valid_utf8 or _cpp_valid_ucn with an argument which tells whether it is
first or second+ in the identifier, and e.g.  _cpp_valid_ucn then for UCNs
valid in string literals does
   else if (identifier_pos)
     {
       int validity = ucn_valid_in_identifier (pfile, result, nst);

       if (validity == 0)
         cpp_error (pfile, CPP_DL_ERROR,
                    "universal character %.*s is not valid in an identifier",
                    (int) (str - base), base);
       else if (validity == 2 && identifier_pos == 1)
         cpp_error (pfile, CPP_DL_ERROR,
                    "universal character %.*s is not valid at the start of an identifier",
                    (int) (str - base), base);
     }
so basically all those invalid in identifiers cases emit an error and
pretend to be valid in identifiers, rather than what e.g.  _cpp_valid_utf8
does for C but not for C++ and only for the chars completely invalid in
identifiers rather than just valid in identifiers but not at the start:
   /* In C++, this is an error for invalid character in an identifier
  because logically, the UTF-8 was converted to a UCN during
  translation phase 1 (even though we don't physically do it that
  way).  In C, this byte rather becomes grammati

[committed][PR rtl-optimization/116039] Fix life computation for promoted subregs

2024-07-25 Thread Jeff Law
So this turned out to be a neat little test and while the fuzzer found 
it on RISC-V, I wouldn't be surprised if the underlying issue is also 
the root cause of the loongarch issue with ext-dce.


The key issue is that if we have something like

(set (dest) (any_extend (subreg (source))))

If the subreg object is marked with SUBREG_PROMOTED and the 
sign/unsigned state matches the any_extend opcode, then combine (and I 
guess anything using simplify-rtx) may simplify that to


(set (dest) (source))

That implies that bits outside the mode of the subreg are actually live 
and valid.  This needs to be accounted for during liveness computation.
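
A scalar sketch of why those upper bits matter, assuming a 64-bit register whose low 32 bits are viewed through the subreg and a signed promotion. The helper names are invented for illustration and have nothing to do with the ext-dce code itself.

```cpp
#include <cassert>
#include <cstdint>

// Models (subreg:SI (reg:DI src) 0): just the low 32 bits.
uint32_t low_part(int64_t src) {
    return static_cast<uint32_t>(src);
}

// Models (sign_extend:DI x) for a 32-bit x.  The narrowing cast relies on
// the usual two's complement behaviour of the host compiler.
int64_t sign_extend32(uint32_t x) {
    return static_cast<int64_t>(static_cast<int32_t>(x));
}

// The promotion invariant: the full register already equals the sign
// extension of its low part.
bool is_sign_promoted(int64_t src) {
    return sign_extend32(low_part(src)) == src;
}
```

When the invariant holds, extending the low part is the identity, so the simplified form reads all 64 bits of the source directly. If liveness only tracked the low 32 bits, a later pass could legitimately clobber the upper half and the plain copy would then see garbage.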


We have to be careful here though. If we're too conservative about 
setting additional bits live, then we'll inhibit the desired 
optimization in the coremark examples.  To do a good job we need to know 
the extension opcode.


I'm extremely unhappy with how the use handling works in ext-dce.  It 
mixes different conceptual steps and has horribly complex control flow. 
It only handles a subset of the unary/binary opcodes, etc etc.  It's 
just a damn mess.  It's going to need some more noodling around.


In the meantime this is a bit hacky in that it depends on non-obvious 
behavior to know it can get the extension opcode, but I don't want to 
leave the trunk in a broken state while I figure out the refactoring 
problem.




Bootstrapped and regression tested on x86 and tested on the crosses. 
Pushing to the trunk.


Jeff

commit 34fb0feca71f763b2fbe832548749666d34a4a76
Author: Jeff Law 
Date:   Thu Jul 25 12:32:28 2024 -0600

[PR rtl-optimization/116039] Fix life computation for promoted subregs

So this turned out to be a neat little test and while the fuzzer found it on
RISC-V, I wouldn't be surprised if the underlying issue is also the root cause
of the loongarch issue with ext-dce.

The key issue is that if we have something like

(set (dest) (any_extend (subreg (source))))

If the subreg object is marked with SUBREG_PROMOTED and the sign/unsigned state
matches the any_extend opcode, then combine (and I guess anything using
simplify-rtx) may simplify that to

(set (dest) (source))

That implies that bits outside the mode of the subreg are actually live and
valid.  This needs to be accounted for during liveness computation.

We have to be careful here though. If we're too conservative about setting
additional bits live, then we'll inhibit the desired optimization in the
coremark examples.  To do a good job we need to know the extension opcode.

I'm extremely unhappy with how the use handling works in ext-dce.  It mixes
different conceptual steps and has horribly complex control flow.  It only
handles a subset of the unary/binary opcodes, etc etc.  It's just a damn mess.
It's going to need some more noodling around.

In the meantime this is a bit hacky in that it depends on non-obvious behavior
to know it can get the extension opcode, but I don't want to leave the trunk in
a broken state while I figure out the refactoring problem.

Bootstrapped and regression tested on x86 and tested on the crosses.
Pushing to the trunk.

PR rtl-optimization/116039
gcc/
* ext-dce.cc (ext_dce_process_uses): Add some comments about 
concerns
with current code.  Mark additional bit groups as live when we have
an extension of a suitably promoted subreg.

gcc/testsuite
* gcc.dg/torture/pr116039.c: New test.

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index c94d1fc3414..14f163a01d6 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -667,6 +667,12 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj,
  if (modify && !skipped_dest && (dst_mask & ~src_mask) == 0)
ext_dce_try_optimize_insn (insn, x);
 
+ /* Stripping the extension here just seems wrong on multiple
+levels.  It's source side handling, so it seems like it
+belongs in the loop below.  Stripping here also makes it
+harder than necessary to properly handle live bit groups
+for (ANY_EXTEND (SUBREG)) where the SUBREG has
+SUBREG_PROMOTED state.  */
  dst_mask &= src_mask;
  src = XEXP (src, 0);
  code = GET_CODE (src);
@@ -674,8 +680,8 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj,
 
  /* Optimization is done at this point.  We just want to make
 sure everything that should get marked as live is marked
-from here onward.  */
-
+from here onward.  Shouldn't the backpropagate step happen
+before optimization?  */
  dst_mask = carry_backpropagate (dst_mask, code, src);
 
  /* We will handle the other operand of a binary operator
@

Re: [PATCH] c++: structured bindings and lookup of tuple_size/tuple_element [PR115605]

2024-07-25 Thread Jason Merrill

On 6/25/24 1:00 AM, Andrew Pinski wrote:

The problem here is that even though we pass the std namespace to
lookup_template_class as the context, it will look at the current scope for
the name too.  The fix is to look up the qualified name first and then use
that for lookup_template_class.


If lookup_template_class is mishandling an explicit context argument, 
let's fix that rather than work around it.
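
For context, a minimal sketch of the tuple-like protocol whose lookup is at issue: structured bindings consult std::tuple_size and std::tuple_element for the type and then call get. The Triple type and sum_fields function are invented for this illustration. The sketch deliberately does not declare a local named tuple_size, since that is exactly the case the PR says is mishandled before the fix.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>        // primary templates std::tuple_size / std::tuple_element
#include <type_traits>  // std::integral_constant

// Hypothetical user type that opts in to the tuple-like protocol.
struct Triple {
    int a, b, c;
    template <std::size_t I>
    int& get() {
        if constexpr (I == 0) return a;
        else if constexpr (I == 1) return b;
        else return c;
    }
};

namespace std {
// These specializations live in namespace std, which is where the compiler
// is required to look them up.
template <>
struct tuple_size<Triple> : integral_constant<size_t, 3> {};
template <size_t I>
struct tuple_element<I, Triple> { using type = int; };
}

// Because std::tuple_size<Triple> is complete, the binding goes through
// the member get<I>() rather than binding to the data members directly.
int sum_fields(Triple t) {
    auto [x, y, z] = t;
    return x + y + z;
}
```

This needs -std=c++17 or later. The point of decomp61.C is that the same code must keep working when an unrelated local entity called tuple_size or tuple_element is in scope.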



This is how std::initializer_list is handled in listify.

Note g++.dg/cpp1z/decomp22.C testcase now fails correctly
with an error, that tuple_size is not in the std namespace.
I copied a fixed up testcase into g++.dg/cpp1z/decomp62.C.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR c++/115605

gcc/cp/ChangeLog:

* decl.cc (get_tuple_size): Call lookup_qualified_name
before calling lookup_template_class.
(get_tuple_element_type): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/decomp22.C: Expect an error
* g++.dg/cpp1z/decomp61.C: New test.
* g++.dg/cpp1z/decomp62.C: Copied from decomp22.C
and wrap tuple_size/tuple_element inside std namespace.

Signed-off-by: Andrew Pinski 
---
  gcc/cp/decl.cc| 16 +---
  gcc/testsuite/g++.dg/cpp1z/decomp22.C |  2 +-
  gcc/testsuite/g++.dg/cpp1z/decomp61.C | 53 +++
  gcc/testsuite/g++.dg/cpp1z/decomp62.C | 23 
  4 files changed, 88 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/decomp61.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/decomp62.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 03deb1493a4..81dde4d51a3 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -9195,10 +9195,13 @@ get_tuple_size (tree type)
  {
tree args = make_tree_vec (1);
TREE_VEC_ELT (args, 0) = type;
-  tree inst = lookup_template_class (tuple_size_identifier, args,
+  tree std_tuple_size = lookup_qualified_name (std_node, 
tuple_size_identifier);
+  if (std_tuple_size == error_mark_node)
+return NULL_TREE;
+  tree inst = lookup_template_class (std_tuple_size, args,
 /*in_decl*/NULL_TREE,
-/*context*/std_node,
-tf_none);
+/*context*/NULL_TREE,
+tf_warning_or_error);
inst = complete_type (inst);
if (inst == error_mark_node
|| !COMPLETE_TYPE_P (inst)
@@ -9224,9 +9227,12 @@ get_tuple_element_type (tree type, unsigned i)
tree args = make_tree_vec (2);
TREE_VEC_ELT (args, 0) = build_int_cst (integer_type_node, i);
TREE_VEC_ELT (args, 1) = type;
-  tree inst = lookup_template_class (tuple_element_identifier, args,
+  tree std_tuple_elem = lookup_qualified_name (std_node, 
tuple_element_identifier);
+  if (std_tuple_elem == error_mark_node)
+return NULL_TREE;
+  tree inst = lookup_template_class (std_tuple_elem, args,
 /*in_decl*/NULL_TREE,
-/*context*/std_node,
+/*context*/NULL_TREE,
 tf_warning_or_error);
return make_typename_type (inst, type_identifier,
 none_type, tf_warning_or_error);
diff --git a/gcc/testsuite/g++.dg/cpp1z/decomp22.C 
b/gcc/testsuite/g++.dg/cpp1z/decomp22.C
index 9e6b8df486a..4131486e292 100644
--- a/gcc/testsuite/g++.dg/cpp1z/decomp22.C
+++ b/gcc/testsuite/g++.dg/cpp1z/decomp22.C
@@ -17,5 +17,5 @@ int
  foo (C t)
  {
auto[x0] = t;   // { dg-warning "structured bindings only available with" 
"" { target c++14_down } }
-  return x0;
+  return x0; /* { dg-error "cannot convert" } */
  }
diff --git a/gcc/testsuite/g++.dg/cpp1z/decomp61.C 
b/gcc/testsuite/g++.dg/cpp1z/decomp61.C
new file mode 100644
index 000..874844b2c61
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/decomp61.C
@@ -0,0 +1,53 @@
+// PR c++/115605
+// { dg-do compile { target c++17 } }
+// { dg-options "" }
+
+using size_t = decltype(sizeof(0));
+
+namespace std
+{
+  template
+  struct tuple_size;
+  template
+  struct tuple_element;
+}
+
+struct mytuple
+{
+  int t;
+  template
+  int &get()
+  {
+return t;
+  }
+};
+
+namespace std
+{
+  template<>
+  struct tuple_size
+  {
+static constexpr int value = 3;
+  };
+  template
+  struct tuple_element
+  {
+using type = int;
+  };
+}
+
+/* The tuple_size/tuple_element lookup should only be from std and not
+   from the current scope so these 2 functions should work. */
+int foo() {
+int const tuple_size = 5;
+mytuple array;
+auto [a, b, c] = array;
+return c;
+}
+int foo1() {
+int const tuple_element = 5;
+mytuple array;
+auto [a, b, c] = array;
+return c;
+}
+
diff --git a/gcc/testsuite/g++.dg/cpp1z/decomp62.C 
b/gcc/testsuite/g++.dg/cpp1z/decomp62.C
new file mode 100644
index 000..694f3263bd8
--- /dev/null
+++ b/gcc/testsuite/g

Re: [PATCH] c++: alias of alias tmpl with dependent attrs [PR115897]

2024-07-25 Thread Patrick Palka
On Thu, 25 Jul 2024, Patrick Palka wrote:

> On Mon, 22 Jul 2024, Jason Merrill wrote:
> 
> > On 7/19/24 10:30 AM, Patrick Palka wrote:
> > > On Thu, 18 Jul 2024, Jason Merrill wrote:
> > > 
> > > > On 7/18/24 12:45 PM, Patrick Palka wrote:
> > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > > > > OK for trunk/14?
> > > > > 
> > > > > -- >8 --
> > > > > 
> > > > > As a followup of r15-2047-g7954bb4fcb6fa8, we also need to consider
> > > > > dependent attributes when recursing into a non-template alias that 
> > > > > names
> > > > > a dependent alias template specialization (and so STF_STRIP_DEPENDENT
> > > > > is set), otherwise in the first testcase below we undesirably strip B
> > > > > all the way to T instead of to A.
> > > > > 
> > > > > We also need to move the typedef recursion case of strip_typedefs up 
> > > > > to
> > > > > get checked before the compound type recursion cases.  Otherwise for C
> > > > > below (which ultimately aliases T*) we end up stripping it to T* 
> > > > > instead
> > > > > of to A because the POINTER_TYPE recursion dominates the typedef
> > > > > recursion.  It also means we issue an unexpected extra error in the
> > > > > third testcase below.
> > > > > 
> > > > > Ideally we would also want to consider dependent attributes on
> > > > > non-template aliases, so that we accept the second testcase below, but
> > > > > making that work correctly would require broader changes to e.g.
> > > > > spec_hasher which currently assumes all non-template aliases are
> > > > > stripped and hence it'd conflate the dependent specializations A
> > > > > and A even if we didn't strip B.
> > > > 
> > > > Wouldn't that just be a matter of changing structural_comptypes to
> > > > consider
> > > > dependent attributes as well as dependent specializations?
> > > 
> > > Pretty much, it seems.  ISTM we should check dependent attributes even
> > > when !comparing_dependent_aliases since they affect type identity rather
> > > than just SFINAE behavior.
> > > 
> > > > 
> > > > Or better, adding attributes to dependent_alias_template_spec_p (and
> > > > changing
> > > > its name)?  It seems like other callers would also benefit from that
> > > > change.
> > > 
> > > I ended up adding a new predicate opaque_alias_p separate from
> > > dependent_alias_template_spec_p since ISTM we need to call it from
> > > there and from alias_template_specialization_p to avoid looking through
> > > such aliases.
> > 
> > Sounds good, but I think let's add the word "dependent" to the name of the 
> > new
> > function.
> 
> Done.
> 
> > 
> > > So opaque_alias_p checks for type identity of an alias, whereas
> > > dependent_alias_template_spec_p more broadly checks for SFINAE identity.
> > > 
> > > Something like the following (as an incremental patch on top of the
> > > previous one, to consider separately for backportings since it's riskier):
> > > 
> > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > > index 0620c8c023a..4d4a5cef92c 100644
> > > --- a/gcc/cp/pt.cc
> > > +++ b/gcc/cp/pt.cc
> > > @@ -6508,6 +6508,19 @@ alias_type_or_template_p (tree t)
> > > || DECL_ALIAS_TEMPLATE_P (t));
> > >   }
> > >   +/* Return true if substituting into T would yield a different type than
> > > +   substituting into its expansion.  */
> > 
> > Please discuss when to use this vs dependent_alias_template_spec_p (both 
> > here
> > and there).  Maybe just to say that any place that checks one probably also
> > wants to check the other.
> > 
> > Other places that use d_a_t_s_p and seem to need adjusting:
> > dependent_type_p_r, any_dependent_arguments_need_structural_equality_p,
> > alias_ctad_tweaks.
> 
> For dependent_type_p_r, I think we can get rid of the existing d_a_t_s_p
> check if we manually set TYPE_DEPENDENT_P/_VALID to true at parse time
> like we do at instantiation time from instantiate_alias_template.
> 
> For any_dependent_arguments_need_structural_equality_p, those checks
> can't be removed but we can relax nt_transparent to nt_opaque since we
> know the template arguments have already gone through strip_typedefs.
> 
> For alias_ctad_tweaks I think we can replace the d_a_t_s_p with
> template_args_equal which has the desired comparing_dependent_aliases
> behavior.
> 
> After the above, it seems no relevant user of d_a_t_s_p survives that
> passes nt_transparent instead of nt_opaque, and so we can avoid adding
> a transparent_typedefs flag to the new predicate.
> 
> > 
> > > +bool
> > > +opaque_alias_p (const_tree t)
> > > +{
> > > +  return (TYPE_P (t)
> > > +   && typedef_variant_p (t)
> > > +   && uses_template_parms (DECL_ORIGINAL_TYPE (TYPE_NAME (t)))
> > 
> > Checking this seems wrong; a vector of dependent size seems opaque even if
> > it's a vector of int.
> 
> Makes sense, fixed.  I added a testcase variant too.

I forgot to mention, the related PRs PR83997 etc seem to be about
propagating/respecting attributes attached to the aliased type rather
than to the alias declaratio

Re: [PATCH v2] MATCH: Simplify (a ? x : y) eq/ne (b ? x : y) [PR111150]

2024-07-25 Thread Adhemerval Zanella Netto



On 17/07/24 14:00, Andrew Pinski wrote:
> On Wed, Jul 17, 2024 at 5:24 AM Richard Biener
>  wrote:
>>
>> On Tue, Jul 16, 2024 at 3:36 PM Eikansh Gupta  
>> wrote:
>>>
>>> This patch adds match pattern for `(a ? x : y) eq/ne (b ? x : y)`.
>>> In forwprop1 pass, depending on the type of `a` and `b`, GCC produces
>>> `vec_cond` or `cond_expr`. Based on the observation that `(x != y)` is
>>> TRUE, the pattern can be optimized to produce `(a^b ? TRUE : FALSE)`.
>>>
>>> The patch adds match pattern for a, b:
>>> (a ? x : y) != (b ? x : y) --> (a^b) ? TRUE  : FALSE
>>> (a ? x : y) == (b ? x : y) --> (a^b) ? FALSE : TRUE
>>> (a ? x : y) != (b ? y : x) --> (a^b) ? FALSE : TRUE
>>> (a ? x : y) == (b ? y : x) --> (a^b) ? TRUE  : FALSE
>>
>> OK.
> 
> Pushed as r15-2106-g44fcc1ca11e7ea (with one small change to the
> commit message in the changelog where tabs should be used before the
> *; most likely a copy and paste error).
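
As a sanity check of the scalar identity behind the pattern, here is a brute-force sketch over the four (a, b) combinations for the forms where both conditionals select the same arms. The function names are made up, and this only models plain bool/int semantics, not the vec_cond case that shows up in the failure below.

```cpp
#include <cassert>

// The comparison as written in the source program.
bool ne_direct(bool a, bool b, int x, int y) { return (a ? x : y) != (b ? x : y); }
bool eq_direct(bool a, bool b, int x, int y) { return (a ? x : y) == (b ? x : y); }

// The folded forms: they depend only on a and b.
bool ne_folded(bool a, bool b) { return a != b; }  // a ^ b
bool eq_folded(bool a, bool b) { return a == b; }  // !(a ^ b)

// True when the direct and folded forms agree for every (a, b) pair.
bool forms_agree(int x, int y) {
    for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b) {
            if (ne_direct(a, b, x, y) != ne_folded(a, b)) return false;
            if (eq_direct(a, b, x, y) != eq_folded(a, b)) return false;
        }
    return true;
}
```

The forms agree whenever x and y differ and stop agreeing when x equals y, which is why the pattern leans on the observation that `(x != y)` is TRUE.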

It seems that this change triggered an ICE in the Linaro CI on 32-bit Arm [1]:

--
Executing on host: 
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/x86_64-pc-linux-gnu/bin/arm-eabi-g++
   
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/tree-ssa/pr50.C
  -mthumb -march=armv8.1-m.main+mve.fp+fp.dp -mtune=cortex-m55 -mfloat-abi=hard 
-mfpu=auto   -fdiagnostics-plain-output  -nostdinc++ 
-I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/x86_64-pc-linux-gnu/arm-eabi/gcc-gcc.git~master-stage2/arm-eabi/libstdc++-v3/include/arm-eabi
 
-I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/x86_64-pc-linux-gnu/arm-eabi/gcc-gcc.git~master-stage2/arm-eabi/libstdc++-v3/include
 
-I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libstdc++-v3/libsupc++
 
-I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libstdc++-v3/include/backward
 
-I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libstdc++-v3/testsuite/util
 -fmessage-length=0  -std=gnu++98 -O1 -fdump-tree-forwprop1 -Wno-psabi  -S  -o 
pr50.s(timeout = 600)
spawn -ignore SIGHUP 
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/x86_64-pc-linux-gnu/bin/arm-eabi-g++
 
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/tree-ssa/pr50.C
 -mthumb -march=armv8.1-m.main+mve.fp+fp.dp -mtune=cortex-m55 -mfloat-abi=hard 
-mfpu=auto -fdiagnostics-plain-output -nostdinc++ 
-I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/x86_64-pc-linux-gnu/arm-eabi/gcc-gcc.git~master-stage2/arm-eabi/libstdc++-v3/include/arm-eabi
 
-I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/x86_64-pc-linux-gnu/arm-eabi/gcc-gcc.git~master-stage2/arm-eabi/libstdc++-v3/include
 
-I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libstdc++-v3/libsupc++
 
-I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libstdc++-v3/include/backward
 
-I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libstdc++-v3/testsuite/util
 -fmessage-length=0 -std=gnu++98 -O1 -fdump-tree-forwprop1 -Wno-psabi -S -o 
pr50.s
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/tree-ssa/pr50.C:
 In function 'v4si f1_(v4si, v4si, v4si, v4si, v4si, v4si)':
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/tree-ssa/pr50.C:13:1:
 error: unrecognizable insn:
(insn 22 21 26 2 (set (reg:V4SI 120 [  ])
(unspec:V4SI [
(reg:V4SI 136)
(reg:V4SI 137)
(subreg:V4BI (reg:HI 135) 0)
] VPSELQ_S)) 
"/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/tree-ssa/pr50.C":12:17
 -1
 (nil))
during RTL pass: vregs
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/tree-ssa/pr50.C:13:1:
 internal compiler error: in extract_insn, at recog.cc:2848
0x21fd635 internal_error(char const*, ...)
../../../../../../gcc/gcc/diagnostic-global-context.cc:491
0x9a0958 fancy_abort(char const*, int, char const*)
../../../../../../gcc/gcc/diagnostic.cc:1725
0x840e4d _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
../../../../../../gcc/gcc/rtl-error.cc:108
0x840e6f _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
../../../../../../gcc/gcc/rtl-error.cc:116
0x83f76b extract_insn(rtx_insn*)
../../../../../../gcc/gcc/recog.cc:2848
0xf1a805 instantiate_virtual_regs_in_insn
../../../../../../gcc/gcc/function.cc:1612
0xf1a805 instantiate_virtual_regs
../../../../../../gcc/gcc/function.cc:1995
0xf1a805 execute
../../../../../../gcc/gcc/function.cc:2042
--

Should I open a bug report?

[1] 
https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m55_hard_eabi-build/517/artifact/artifacts/00-sumfiles/


Re: [PATCH v3 01/12] OpenMP: metadirective tree data structures and front-end interfaces

2024-07-25 Thread Sandra Loosemore

On 7/25/24 08:00, Tobias Burnus wrote:

Hi Sandra,

thanks for your patch. (Disclaimer: I have not finished reading through 
your patch.)


Some upfront generic remarks:

[* When first compiling it (incremental build), I did run into the issue 
that OMP_METADIRECTIVE_CHECK wasn't declared. Thus, there seems to be a 
dependency issue causing that tree-check.h might be generated after code 
that includes tree.h is processed. (Unrelated to your patch itself, but 
for completeness …)]


I've never run into this.  Are you saying some .cc file is missing a 
makefile dependency on tree-check.h?  Which one?  Or is it tree-check.h 
that is missing a dependency on something else and failing to get 
regenerated?  Or both?


* Not required right now, but eventually we need to check whether 
https://gcc.gnu.org/PR112779 is fully fixed by this patch set or whether 
follow-up work is required (and if so which). There is also PR107067 for 
a Fortran ICE.


* There are some not-implemented/FIXME comments in the patches for 
missing features. I think we should ensure that those won't get 
forgotten, e.g. by filing PRs for those. – For declare variant, some PRs 
might already exist.


Can you eventually take care of the last two items?


Yes.



(For the last item: e.g. 'target_device' for declare_variant, for which 
'sorry' already existed.)


* * *

I might have asked the following question before – and you might have 
answered it already:


Sandra Loosemore wrote:


This patch adds the OMP_METADIRECTIVE tree node and shared tree-level
support for manipulating metadirectives.  It defines/exposes
interfaces that will be used in subsequent patches that add front-end
and middle-end support, but nothing generates these nodes yet.


I have to admit that I do not understand the part:


+  else if (set == OMP_TRAIT_SET_TARGET_DEVICE)
+/* The target_device set is dynamic, so treat it as always
+   resolvable.  */
+continue;
+


The current code has 3 states:

* 0 - if a trait is false; this directly returns as it cannot be fixed 
later


* 1 - if all the traits are known to match (initial value)

* -1 - if one trait cannot be evaluated, either because it is too early 
(e.g. during parsing) or because it is a dynamic context selector.


Your last assertion is wrong.  The comments on the top of 
omp_context_selector_matches explicitly *say* "Dynamic properties (which 
are evaluated at run-time) should always return 1."  This is because 
dynamic selectors are *always* candidates, which are then scored and 
sorted according to the rules in the spec for the metadirective and 
declare variant constructs.
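
A minimal sketch of the candidate logic described above, not the actual 
omp_context_selector_matches code.  The enum, the function name and the way 
traits are represented are all hypothetical; only the three return values 
(0, 1, -1) and the rule that dynamic selectors stay candidates come from 
this thread.

```cpp
#include <vector>

// Hypothetical stand-in for the state of one trait in a context selector.
enum class TraitState { False, True, Unknown, Dynamic };

// Returns 0 if some trait is known to be false, -1 if some trait cannot be
// evaluated yet (for example, too early during parsing), and 1 otherwise.
// Dynamic traits never change the result, so a selector made only of true
// and dynamic traits remains a candidate for later scoring.
int selector_is_candidate(const std::vector<TraitState>& traits) {
    int ret = 1;  // initial value: everything seen so far matches
    for (TraitState t : traits) {
        switch (t) {
        case TraitState::False:
            return 0;  // cannot be fixed later
        case TraitState::Unknown:
            ret = -1;  // remember it, but keep looking for a definite false
            break;
        case TraitState::True:
        case TraitState::Dynamic:
            break;  // leave ret untouched
        }
    }
    return ret;
}
```

With this model, a selector containing only a dynamic trait yields 1, which 
is the behaviour the comment at the top of omp_context_selector_matches asks 
for.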


Maybe the problem is the name of the function.  Would it help if I 
renamed it from "omp_context_selector_matches" to 
"omp_context_selector_is_candidate"?



@@ -1804,6 +1834,12 @@ omp_context_selector_matches (tree ctx)


    case OMP_TRAIT_USER_CONDITION:
  if (set == OMP_TRAIT_SET_USER)

  for (tree p = OMP_TS_PROPERTIES (ts); p; p = TREE_CHAIN (p))
    if (OMP_TP_NAME (p) == NULL_TREE)
  {
+  /* OpenMP 5.1 allows non-constant conditions for
+ metadirectives.  */
+  if (metadirective_p
+  && !tree_fits_shwi_p (OMP_TP_VALUE (p)))
+    break;
+

  if (integer_zerop (OMP_TP_VALUE (p)))
    return 0;
  if (integer_nonzerop (OMP_TP_VALUE (p)))
    break;
  ret = -1;
    }



* Comment wording: Please change to imply >= 5.1 not == 5.0

* Comment: I don't see why the non-const only applies to metadirectives; the 
OpenMP >= 5.1 seems to imply that it is also valid for declare variant. Thus, 
I would change the wording.


The first 7 patches in the posted set implement support for dynamic 
selectors in metadirectives.  Dynamic selector support for declare 
variant comes in the later patches, which further modify this code.


If you'd find it easier to review, I can smash everything together into 
one gigantic patch (or some smaller number of patches), but I fear it 
would make everything harder to review given the amount of code removed 
or moved around.  Alternatively, trying to split it into more but smaller 
incremental patches is problematical due to inter-dependencies and, 
again, multiple patches touching the same bits of code, adding and 
removing temporary stubs, etc.


* The current code seems to already handle 
non-const values as expected. ... except that it changes 'ret' to -1, 
while the idea seems to be not to modify 'ret' in this case for 
metadirectives. (Why? Same question as above).


This gets back to the same point as earlier: dynamic selectors are always 
candidates.  According to the spec, the "user" selector is dynamic if 
the "condition" is not a constant, so the function should return 1.  The 
original code, without dynamic selector support, returned -1 because it 
actually required a 

Re: [PATCH v2] MATCH: Simplify (a ? x : y) eq/ne (b ? x : y) [PR111150]

2024-07-25 Thread Andrew Pinski
On Thu, Jul 25, 2024 at 12:01 PM Adhemerval Zanella Netto
 wrote:
>
>
>
> On 17/07/24 14:00, Andrew Pinski wrote:
> > On Wed, Jul 17, 2024 at 5:24 AM Richard Biener
> >  wrote:
> >>
> >> On Tue, Jul 16, 2024 at 3:36 PM Eikansh Gupta  
> >> wrote:
> >>>
> >>> This patch adds match pattern for `(a ? x : y) eq/ne (b ? x : y)`.
> >>> In forwprop1 pass, depending on the type of `a` and `b`, GCC produces
> >>> `vec_cond` or `cond_expr`. Based on the observation that `(x != y)` is
> >>> TRUE, the pattern can be optimized to produce `(a^b ? TRUE : FALSE)`.
> >>>
> >>> The patch adds match pattern for a, b:
> >>> (a ? x : y) != (b ? x : y) --> (a^b) ? TRUE  : FALSE
> >>> (a ? x : y) == (b ? x : y) --> (a^b) ? FALSE : TRUE
> >>> (a ? x : y) != (b ? y : x) --> (a^b) ? FALSE : TRUE
> >>> (a ? x : y) == (b ? y : x) --> (a^b) ? TRUE  : FALSE
> >>
> >> OK.
> >
> > Pushed as r15-2106-g44fcc1ca11e7ea (with one small change to the
> > commit message in the changelog where tabs should be used before the
> > *; most likely a copy and paste error).
>
> It seems that this change triggered a failure with Linaro CI on arm 32 bit [1]:
>
> --
> Executing on host: 
> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/x86_64-pc-linux-gnu/bin/arm-eabi-g++
>
> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/tree-ssa/pr50.C
>   -mthumb -march=armv8.1-m.main+mve.fp+fp.dp -mtune=cortex-m55 
> -mfloat-abi=hard -mfpu=auto   -fdiagnostics-plain-output  -nostdinc++ 
> -I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/x86_64-pc-linux-gnu/arm-eabi/gcc-gcc.git~master-stage2/arm-eabi/libstdc++-v3/include/arm-eabi
>  
> -I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/x86_64-pc-linux-gnu/arm-eabi/gcc-gcc.git~master-stage2/arm-eabi/libstdc++-v3/include
>  
> -I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libstdc++-v3/libsupc++
>  
> -I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libstdc++-v3/include/backward
>  
> -I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libstdc++-v3/testsuite/util
>  -fmessage-length=0  -std=gnu++98 -O1 -fdump-tree-forwprop1 -Wno-psabi  -S  
> -o pr50.s(timeout = 600)
> spawn -ignore SIGHUP 
> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/x86_64-pc-linux-gnu/bin/arm-eabi-g++
>  
> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/tree-ssa/pr50.C
>  -mthumb -march=armv8.1-m.main+mve.fp+fp.dp -mtune=cortex-m55 
> -mfloat-abi=hard -mfpu=auto -fdiagnostics-plain-output -nostdinc++ 
> -I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/x86_64-pc-linux-gnu/arm-eabi/gcc-gcc.git~master-stage2/arm-eabi/libstdc++-v3/include/arm-eabi
>  
> -I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/x86_64-pc-linux-gnu/arm-eabi/gcc-gcc.git~master-stage2/arm-eabi/libstdc++-v3/include
>  
> -I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libstdc++-v3/libsupc++
>  
> -I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libstdc++-v3/include/backward
>  
> -I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libstdc++-v3/testsuite/util
>  -fmessage-length=0 -std=gnu++98 -O1 -fdump-tree-forwprop1 -Wno-psabi -S -o 
> pr50.s
> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/tree-ssa/pr50.C:
>  In function 'v4si f1_(v4si, v4si, v4si, v4si, v4si, v4si)':
> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/tree-ssa/pr50.C:13:1:
>  error: unrecognizable insn:
> (insn 22 21 26 2 (set (reg:V4SI 120 [  ])
> (unspec:V4SI [
> (reg:V4SI 136)
> (reg:V4SI 137)
> (subreg:V4BI (reg:HI 135) 0)
> ] VPSELQ_S)) 
> "/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/tree-ssa/pr50.C":12:17
>  -1
>  (nil))
> during RTL pass: vregs
> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/tree-ssa/pr50.C:13:1:
>  internal compiler error: in extract_insn, at recog.cc:2848
> 0x21fd635 internal_error(char const*, ...)
> ../../../../../../gcc/gcc/diagnostic-global-context.cc:491
> 0x9a0958 fancy_abort(char const*, int, char const*)
> ../../../../../../gcc/gcc/diagnostic.cc:1725
> 0x840e4d _fatal_insn(char const*, rtx_def const*, char const*, int, char 
> const*)
> ../../../../../../gcc/gcc/rtl-error.cc:108
> 0x840e6f _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
> ../../../../../../gcc/gcc/rtl-error.cc:116
> 0x83f76b extract_insn(rtx_insn*)
> ../../../../../../gcc/gcc/recog.cc:2848
> 0xf1a805 instantiate_virtual_regs_in_insn
> ../../../../../../gcc/gcc/function.cc:1612
> 0xf1a805 instantiate_virtual_regs
> ../../../../../../gcc/gcc/function.cc:1995
> 0xf1a8
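
For reference, the relation between each of the four comparisons from the
commit message quoted above and a^b can be checked by brute force once
x != y is assumed.  The sketch below is plain C++ on scalars; it is not the
match.pd pattern and does not model the vector case from the failing test.
check_cond_identities is a hypothetical helper name.

```cpp
#include <cassert>

// Returns true if, for every combination of a and b, each comparison of the
// two conditionals agrees with the expected function of a ^ b.
// Precondition: x != y, which is the observation the simplification relies on.
bool check_cond_identities(int x, int y) {
    assert(x != y);
    for (int a = 0; a < 2; ++a) {
        for (int b = 0; b < 2; ++b) {
            const bool axb = (a ^ b) != 0;
            // Same arm order on both sides: "!=" holds exactly when a ^ b.
            if (((a ? x : y) != (b ? x : y)) != axb) return false;
            if (((a ? x : y) == (b ? x : y)) != !axb) return false;
            // Swapped arm order on the right: the roles are inverted.
            if (((a ? x : y) != (b ? y : x)) != !axb) return false;
            if (((a ? x : y) == (b ? y : x)) != axb) return false;
        }
    }
    return true;
}
```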

Re: [PATCH 1/2] rs6000, Remove __builtin_vec_set_v1ti,, __builtin_vec_set_v2df, __builtin_vec_set_v2di

2024-07-25 Thread Carl Love

Kewen:

On 7/25/24 1:22 AM, Kewen.Lin wrote:

-
rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, 
__builtin_vec_set_v2di

Remove the built-ins, use the default gimple generation instead.

OK for trunk with better commit log like the above paragraph, thanks!

// Assuming testing on BE goes well too. 🙂
Good point, I hadn't double checked things on BE.  I tested the patches 
today on BE.  The patches do not introduce any additional test regressions.


I also investigated the assembly code generation with and without the 
patches for -O0 and -O3 using the same scripts as I used previously on 
LE.  I see the same results.  With -O0 the generated assembly takes one 
extra instruction for the built-in.  With -O3, the code generated for the 
vsx set 2df and 2di cases is identical.  The code for the vsx set 1ti case 
requires fewer assembly instructions for the C code versus the built-in.


 Carl


Re: [PATCH 1/2] rs6000, Remove __builtin_vec_set_v1ti,, __builtin_vec_set_v2df, __builtin_vec_set_v2di

2024-07-25 Thread Carl Love

Kewen:

On 7/25/24 1:22 AM, Kewen.Lin wrote:

on 2024/7/24 01:52, Carl Love wrote:

GCC maintainers:

This patch was previously posted.  Per the feedback, it is now the first of two 
patches to remove the set built-ins.

This patch removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df and 
__builtin_vec_set_v2di built-ins.  The users should just use normal C-code to 
update the various vector elements.  This change was originally intended to be 
part of the earlier series of cleanup patches.  It was initially thought that 
some additional work would be needed to do some gimple generation instead of 
these built-ins.  However, the existing default code generation does produce 
the needed code.    For the vec_set bif, the equivalent C code is as good or 
better than the built-in.  For the vec_insert bif whose resolving previously 
made use of the vec_set bif, the assembly code generation is as good as before 
with the -O3 optimization.
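
As a rough illustration of "normal C-code" replacing the set built-ins, the 
sketch below uses the generic GNU vector extension (vector_size plus element 
subscripting, accepted by GCC and Clang) rather than anything rs6000 
specific.  The typedefs and function names are made up for this example; it 
is not one of the testcases from the patch and makes no claim about the 
instructions generated on Power.

```cpp
#include <cassert>

// GNU vector extension types (not ISO C++): two doubles / two 64-bit ints.
typedef double v2df __attribute__((vector_size(16)));
typedef long long v2di __attribute__((vector_size(16)));

// Update one element with ordinary subscripting and return the new vector.
v2df set_v2df(v2df v, double x, unsigned idx) {
    assert(idx < 2);  // only two elements in a V2DF vector
    v[idx] = x;
    return v;
}

v2di set_v2di(v2di v, long long x, unsigned idx) {
    assert(idx < 2);  // only two elements in a V2DI vector
    v[idx] = x;
    return v;
}
```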

This background information will also be mentioned in the commit log, right?


Forgot to mention, I added the paragraph to the commit log.

   Carl



[PATCH] aarch64: Fix target/optimize option handling with transiting between O1 to O2

2024-07-25 Thread Andrew Pinski
The problem here is that the aarch64 backend enables -mearly-ra at -O2 and
above, but it is not marked as Optimization in the .opt file, so enabling it
sometimes resets the target options when going from -O1 to -O2 for the first
time.

Built and tested for aarch64-linux-gnu with no regressions.

PR target/116065

gcc/ChangeLog:

* config/aarch64/aarch64.opt (mearly-ra=): Mark as Optimization rather
than Save.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/target_optimization-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64.opt   |  2 +-
 .../aarch64/sve/target_optimization-1.c  | 16 
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/target_optimization-1.c

diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 2f90f10352a..6229bcb371e 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -256,7 +256,7 @@ EnumValue
 Enum(early_ra_scope) String(none) Value(AARCH64_EARLY_RA_NONE)
 
 mearly-ra=
-Target RejectNegative Joined Enum(early_ra_scope) Var(aarch64_early_ra) Init(AARCH64_EARLY_RA_NONE) Save
+Target RejectNegative Joined Enum(early_ra_scope) Var(aarch64_early_ra) Init(AARCH64_EARLY_RA_NONE) Optimization
 Specify when to enable an early register allocation pass.  The possibilities
 are: all functions, functions that have access to strided multi-register
 instructions, and no functions.
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/target_optimization-1.c b/gcc/testsuite/gcc.target/aarch64/sve/target_optimization-1.c
new file mode 100644
index 000..3010f0c4189
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/target_optimization-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+#include <arm_sve.h>
+
+/* Turn off SVE overall */
+#pragma GCC target("+nosve")
+
+/* But the function turns it on again so it should work.
+   Even if changing the optimization level from O1 to O2. */
+int __attribute__((target ("+sve"), optimize(2)))
+bar (void)
+{
+  svfloat32_t xseg;
+  return svlen_f32(xseg);
+}
-- 
2.43.0



Re: [PATCH] c++: alias of alias tmpl with dependent attrs [PR115897]

2024-07-25 Thread Jason Merrill

On 7/25/24 2:17 PM, Patrick Palka wrote:

On Mon, 22 Jul 2024, Jason Merrill wrote:


On 7/19/24 10:30 AM, Patrick Palka wrote:

On Thu, 18 Jul 2024, Jason Merrill wrote:


On 7/18/24 12:45 PM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk/14?

-- >8 --

As a followup of r15-2047-g7954bb4fcb6fa8, we also need to consider
dependent attributes when recursing into a non-template alias that names
a dependent alias template specialization (and so STF_STRIP_DEPENDENT
is set), otherwise in the first testcase below we undesirably strip B
all the way to T instead of to A.

We also need to move the typedef recursion case of strip_typedefs up to
get checked before the compound type recursion cases.  Otherwise for C
below (which ultimately aliases T*) we end up stripping it to T* instead
of to A because the POINTER_TYPE recursion dominates the typedef
recursion.  It also means we issue an unexpected extra error in the
third testcase below.

Ideally we would also want to consider dependent attributes on
non-template aliases, so that we accept the second testcase below, but
making that work correctly would require broader changes to e.g.
spec_hasher which currently assumes all non-template aliases are
stripped and hence it'd conflate the dependent specializations A
and A even if we didn't strip B.


Wouldn't that just be a matter of changing structural_comptypes to
consider
dependent attributes as well as dependent specializations?


Pretty much, it seems.  ISTM we should check dependent attributes even
when !comparing_dependent_aliases since they affect type identity rather
than just SFINAE behavior.



Or better, adding attributes to dependent_alias_template_spec_p (and
changing
its name)?  It seems like other callers would also benefit from that
change.


I ended up adding a new predicate opaque_alias_p separate from
dependent_alias_template_spec_p since ISTM we need to call it from
there and from alias_template_specialization_p to avoid looking through
such aliases.


Sounds good, but I think let's add the word "dependent" to the name of the new
function.


Done.




So opaque_alias_p checks for type identity of an alias, whereas
dependent_alias_template_spec_p more broadly checks for SFINAE identity.

Something like the following (as an incremental patch on top of the
previous one, to consider separately for backportings since it's riskier):

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 0620c8c023a..4d4a5cef92c 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -6508,6 +6508,19 @@ alias_type_or_template_p (tree t)
  || DECL_ALIAS_TEMPLATE_P (t));
   }
   +/* Return true if substituting into T would yield a different type than
+   substituting into its expansion.  */


Please discuss when to use this vs dependent_alias_template_spec_p (both here
and there).  Maybe just to say that any place that checks one probably also
wants to check the other.

Other places that use d_a_t_s_p and seem to need adjusting:
dependent_type_p_r, any_dependent_arguments_need_structural_equality_p,
alias_ctad_tweaks.


For dependent_type_p_r, I think we can get rid of the existing d_a_t_s_p
check if we manually set TYPE_DEPENDENT_P/_VALID to true at parse time
like we do at instantiation time from instantiate_alias_template.

For any_dependent_arguments_need_structural_equality_p, those checks
can't be removed but we can relax nt_transparent to nt_opaque since we
know the template arguments have already gone through strip_typedefs.

For alias_ctad_tweaks I think we can replace the d_a_t_s_p with
template_args_equal which has the desired comparing_dependent_aliases
behavior.

After the above, it seems no relevant user of d_a_t_s_p survives that
passes nt_transparent instead of nt_opaque, and so we can avoid adding
a transparent_typedefs flag to the new predicate.




+bool
+opaque_alias_p (const_tree t)
+{
+  return (TYPE_P (t)
+ && typedef_variant_p (t)
+ && uses_template_parms (DECL_ORIGINAL_TYPE (TYPE_NAME (t)))


Checking this seems wrong; a vector of dependent size seems opaque even if
it's a vector of int.


Makes sense, fixed.  I added a testcase variant too.


OK, thanks.


-- >8 --

Subject: [PATCH 2/1] c++: non-template alias with dependent attributes 
[PR115897]

PR c++/115897

gcc/cp/ChangeLog:

* cp-tree.h (dependent_opaque_alias_p): Declare.
* pt.cc (push_template_decl): Manually mark a dependent opaque
alias or dependent alias template specialization as dependent,
and use structural equality for them.
* pt.cc (dependent_opaque_alias_p): Define.
(alias_template_specialization_p): Don't look through an
opaque alias.
(complex_alias_template_p): Use dependent_opaque_alias_p instead of
any_dependent_template_arguments_p directly.
(dependent_alias_template_spec_p): Don't look through an
opaque alias.
(get_underlying_template): Use dependent_opaque_alia

Re: [PATCH] MATCH: Optimize `VEC_SHL_INSERT (dup (A), A)` to just `dup (A) [PR116075]

2024-07-25 Thread Andrew Pinski
On Thu, Jul 25, 2024 at 10:20 AM Richard Biener
 wrote:
>
>
>
> > Am 25.07.2024 um 17:56 schrieb Andrew Pinski :
> >
> > It was noticed if we have `.VEC_SHL_INSERT ({ 0, ... }, 0)` it was not
> > being simplified to just `{ 0, ... }`. This was generated from the
> > autovectorizer (maybe even by accident, see PR tree-optimization/116081).
> >
> > This adds a few SVE testcases to see if this is optimized since the
> > auto-vectorizer or intrinsics are the only two ways of getting this
> > produced.
> >
> > Built and tested for aarch64-linux-gnu with no regressions.
>
> For the case in question implementing fold_const_call would be better.  Also …

Makes sense, I have a patch which I am testing that does this.

>
> >PR target/116075
> >
> > gcc/ChangeLog:
> >
> >* match.pd (`VEC_SHL_INSERT (dup (A), A)`): New pattern.
> >
> > gcc/testsuite/ChangeLog:
> >
> >* gcc.target/aarch64/sve/dup-insr-1.c: New test.
> >* gcc.target/aarch64/sve/dup-insr-2.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> > gcc/match.pd  | 17 
> > .../gcc.target/aarch64/sve/dup-insr-1.c   | 26 +++
> > .../gcc.target/aarch64/sve/dup-insr-2.c   | 26 +++
> > 3 files changed, 69 insertions(+)
> > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/dup-insr-1.c
> > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/dup-insr-2.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 680dfea523f..a3a64bd742e 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -10657,3 +10657,20 @@ and,
> >   }
> >   (if (full_perm_p)
> >(vec_perm (op@3 @0 @1) @3 @2))
> > +
> > +/* vec shift left insert (dup(A), A) -> dup(A) */
> > +(simplify
> > + (IFN_VEC_SHL_INSERT vec_same_elem_p@0 @1)
> > +  (with {
> > +tree elem = uniform_vector_p (@0);
> > +if (!elem && TREE_CODE (@0) == SSA_NAME)
> > +  {
> > +gimple *def = SSA_NAME_DEF_STMT (@0);
> > +if (gimple_assign_rhs_code (def) == CONSTRUCTOR)
> > +  elem = uniform_vector_p (gimple_assign_rhs1 (def));
> > +else if (gimple_assign_rhs_code (def) == VEC_DUPLICATE_EXPR)
> > +  elem = gimple_assign_rhs1 (def);
> > +  }
> > +   }
> > +(if (elem && operand_equal_p (@1, elem))
>
> Ugh.  Two predicates involved and we still have to do this?

vec_same_elem_p is not the best predicate, and its other uses do the
same as above.
Anyways I have simplified this down to just supporting vec_duplicate
and it still fixes the f1 in dup-insr-1.c.
Once my tests are finished, I will post the patch.

Thanks,
Andrew
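
A scalar model of the fold under discussion, assuming VEC_SHL_INSERT shifts
every element one position away from element 0 and places the scalar in
element 0.  The std::array helpers below are hypothetical stand-ins, not GCC
internals or SVE intrinsics.

```cpp
#include <array>
#include <cstddef>

// Model of a vector with every element equal to s.
template <typename T, std::size_t N>
std::array<T, N> dup(T s) {
    std::array<T, N> r;
    r.fill(s);
    return r;
}

// Model of VEC_SHL_INSERT (v, s): element i moves to i + 1, the last element
// is dropped, and s becomes the new element 0.
template <typename T, std::size_t N>
std::array<T, N> vec_shl_insert(const std::array<T, N>& v, T s) {
    static_assert(N > 0, "need at least one element");
    std::array<T, N> r{};
    r[0] = s;
    for (std::size_t i = 1; i < N; ++i)
        r[i] = v[i - 1];
    return r;
}
```

In this model vec_shl_insert (dup (A), A) gives back dup (A), while inserting
a different scalar changes element 0, which mirrors the split between
dup-insr-1.c and dup-insr-2.c.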

>
> > + @0)))
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-1.c 
> > b/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-1.c
> > new file mode 100644
> > index 000..41dcbba45cf
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-1.c
> > @@ -0,0 +1,26 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O -fdump-tree-optimized" } */
> > +/* PR target/116075 */
> > +
> > +#include <arm_sve.h>
> > +
> > +svint8_t f(void)
> > +{
> > +  svint8_t tt;
> > +  tt = svdup_s8 (0);
> > +  tt = svinsr (tt, 0);
> > +  return tt;
> > +}
> > +
> > +svint8_t f1(int8_t t)
> > +{
> > +  svint8_t tt;
> > +  tt = svdup_s8 (t);
> > +  tt = svinsr (tt, t);
> > +  return tt;
> > +}
> > +
> > +/* The above 2 functions should have removed the VEC_SHL_INSERT. */
> > +
> > +/* { dg-final { scan-tree-dump-not ".VEC_SHL_INSERT " "optimized" } } */
> > +
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-2.c 
> > b/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-2.c
> > new file mode 100644
> > index 000..8eafe974624
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-2.c
> > @@ -0,0 +1,26 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O -fdump-tree-optimized" } */
> > +/* PR target/116075 */
> > +
> > +#include <arm_sve.h>
> > +
> > +svint8_t f(int8_t t)
> > +{
> > +  svint8_t tt;
> > +  tt = svdup_s8 (0);
> > +  tt = svinsr (tt, t);
> > +  return tt;
> > +}
> > +
> > +svint8_t f1(int8_t t)
> > +{
> > +  svint8_t tt;
> > +  tt = svdup_s8 (t);
> > +  tt = svinsr (tt, 0);
> > +  return tt;
> > +}
> > +
> > +/* The above 2 functions should not have removed the VEC_SHL_INSERT. */
> > +
> > +/* { dg-final { scan-tree-dump-times ".VEC_SHL_INSERT " 2 "optimized" } } 
> > */
> > +
> > --
> > 2.43.0
> >


Re: [PATCH 0/2] rs6000, remove vec and vsx set builtins

2024-07-25 Thread Carl Love

Kewen:

On 7/25/24 1:21 AM, Kewen.Lin wrote:

The patch, first patch in this series, to remove the __builtin_vec_set_v1ti, 
__builtin_vec_set_v2df, __builtin_vec_set_v2di was previously posted.  The 
feedback on the patch was that we could also remove set bif attribute.  Removal 
of the set bif attribute requires also removing the __builtin_vsx_set_1ti,  
__builtin_vsx_set_2df, __builtin_vsx_set_2di built-ins.  The second patch 
removes the vsx set built-ins and the now no longer used set built-in attribute 
and associated code.

The patches have been tested on a Power 10 LE system with no regressions.

It would be good to test this on BE as well (both 64-bit and 32-bit).


Yes.  I updated my scripts to test the vec_set and vsx_set code 
generation for the various test cases with -m32 as well as -m64.  The 
code generation for -m32 versus -m64 is slightly different, as expected.  
When comparing the results with and 
without the patch for -m32 the generated assembly is again the same or 
better for the C code versus the built-ins.   So, no surprises with any 
of the testing with -m32.  It is consistent with the results for the 
-m64 testing.


    Carl



Re: [PATCH 2/2] rs6000, remove built-ins __builtin_vsx_set_1ti, __builtin_vsx_set_2df, __builtin_vsx_set_2di

2024-07-25 Thread Carl Love

Kewen:

On 7/25/24 1:24 AM, Kewen.Lin wrote:




optimization the number of assembly generated for the two methods are
similar.  With -O3 optimization, the assembly generated for the two
approaches is identical for the 2DF and 2DI types.  The assembly for
the C-code version of the 1Ti requres one less assembly instruction.

Nit: s/requres/requires/

Fixed

    fprintf (header_file, "\n");
-  fprintf (header_file,
-       "#define bif_is_set(x)\t\t((x).bifattrs & bif_set_bit)\n");
    fprintf (header_file,
     "#define bif_is_extract(x)\t((x).bifattrs & bif_extract_bit)\n");
    fprintf (header_file,
@@ -2497,10 +2491,9 @@ write_bif_static_init (void)
    fprintf (init_file, "  /* nargs */\t%d,\n",
     bifp->proto.nargs);
    fprintf (init_file, "  /* bifattrs */\t0");
-  if (bifp->attrs.isset)
-    fprintf (init_file, " | bif_set_bit");
    if (bifp->attrs.isextract)
  fprintf (init_file, " | bif_extract_bit");
+

Nit: unnecessary empty line.


Fixed

 Carl


Re: [PATCH 4/5] MATCH: Create BIT_ANDN and BIT_IORN from matching

2024-07-25 Thread Andrew Pinski
On Thu, Jul 25, 2024 at 5:16 AM Richard Biener
 wrote:
>
> On Thu, Jul 25, 2024 at 4:16 AM Andrew Pinski  
> wrote:
> >
> > To better create rtl directly from gimple, we can use
> > these already internal functions from the gimple.
> >
> > That is simplify `a & ~b` into BIT_ANDN.
> > Likewise `a | ~b` into BIT_IORN.
> > We only want to do this late after vectorization as some
> > targets (e.g. aarch64 SVE) have BIT_IORN on scalars but not on
> > some vector modes; even though the vectorizer could expand it back.
> >
> > Note a few testcases need to be changed to not look
> > into optimized dump and catch them earlier.
> > The modified testcases could catch BIT_ANDN and BIT_IORN so move the
> > testing to forwprop2 before simplification happens.
> >
> > Built and tested on aarch64-linux-gnu with no regressions.
>
> I think we want these only for ISEL as they happen way too often and will
> disturb the IL too much in ways not handled by passes.  not/and/or are
> too important ops to "hide" from most of the gimple pipeline.

I agree.

I also think the simplifications of `(VEC_COND @0 (uncond_expr) @1)`
-> COND_EXPR should also be done in isel rather than early on. I filed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116100 to record that and
I will change those too when I get back to this next week.

Thanks,
Andrew

>
> Richard.
>
> > PR target/115086
> >
> > gcc/ChangeLog:
> >
> > * match.pd (`a & ~b`, `a | ~b`): New pattern.
> > (BIT_ANDN/BIT_IORN with CST): New pattern.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/bic-cst-1.c: New test.
> > * gcc.target/aarch64/bic_simd-1.c: New test.
> > * gcc.dg/tree-ssa/bitops-1.c: Move testing from optimized to 
> > forwprop2.
> > * gcc.dg/tree-ssa/bitops-6.c: Likewise.
> > * gcc.dg/tree-ssa/cmpbit-4.c: Likewise.
> > * gcc.dg/tree-ssa/pr110637-2.c: Likewise.
> > * gcc.dg/tree-ssa/pr94880.c: Likewise.
> > * gcc.dg/tree-ssa/pr96671-1.c: Likewise.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/match.pd  | 17 ++
> >  gcc/testsuite/gcc.dg/tree-ssa/bitops-1.c  | 10 +++---
> >  gcc/testsuite/gcc.dg/tree-ssa/bitops-6.c  | 12 +++
> >  gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c  |  8 ++---
> >  gcc/testsuite/gcc.dg/tree-ssa/cmpbit-4.c  | 12 +++
> >  gcc/testsuite/gcc.dg/tree-ssa/pr110637-2.c|  8 ++---
> >  gcc/testsuite/gcc.dg/tree-ssa/pr94880.c   |  6 ++--
> >  gcc/testsuite/gcc.dg/tree-ssa/pr96671-1.c |  8 ++---
> >  gcc/testsuite/gcc.target/aarch64/bic-cst-1.c  | 31 ++
> >  gcc/testsuite/gcc.target/aarch64/bic_simd-1.c | 32 +++
> >  10 files changed, 112 insertions(+), 32 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/bic-cst-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/bic_simd-1.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index cf359b0ec0f..56f631dfeec 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -9979,6 +9979,23 @@ and,
> > (cond_op:s @1 @2 @3 @4 @5) @5)
> >(cond_op (bit_and @1 @0) @2 @3 @4 @5)))
> >
> > +#if GIMPLE
> > +/* Create bit_andc and bit_iorc internal functions. */
> > +(for bitop  (bit_and  bit_ior)
> > + bitopc (IFN_BIT_ANDN IFN_BIT_IORN)
> > + (simplify
> > +  (bitop:c (bit_not:s @0) @1)
> > +  (if (canonicalize_math_after_vectorization_p ()
> > +   && direct_internal_fn_supported_p (as_internal_fn (bitopc),
> > + type, OPTIMIZE_FOR_BOTH))
> > +   (bitopc @1 @0)))
> > + /* If the second operand is a constant, then reduce it to a & ~cst if
> > +the not simplifies. */
> > + (simplify
> > +  (bitopc @0 CONSTANT_CLASS_P@1)
> > +  (bitop (bit_not! @1) @0)))
> > +#endif
> > +
> >  /* For pointers @0 and @2 and nonnegative constant offset @1, look for
> > expressions like:
> >
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-1.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/bitops-1.c
> > index cf2823deb62..3a394b1f188 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/bitops-1.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-1.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do run } */
> > -/* { dg-options "-O -fdump-tree-optimized-raw" } */
> > +/* { dg-options "-O -fdump-tree-forwprop2-raw" } */
> >
> >  #define DECLS(n,VOL)   \
> >  __attribute__((noinline,noclone))  \
> > @@ -66,7 +66,7 @@ int main(){
> > }
> >  }
> >
> > -/* { dg-final { scan-tree-dump-times "bit_not_expr" 12 "optimized"} } */
> > -/* { dg-final { scan-tree-dump-times "bit_and_expr"  9 "optimized"} } */
> > -/* { dg-final { scan-tree-dump-times "bit_ior_expr" 10 "optimized"} } */
> > -/* { dg-final { scan-tree-dump-times "bit_xor_expr"  9 "optimized"} } */
> > +/* { dg-final { scan-tree-dump-times "bit_not_expr, " 12 "forwprop2"} } */
> > +/* { dg-final { scan-tree-dump-times "bit_and_expr, "  9 "forwprop2"} } */
> > +/* { dg-final { scan-tree

[committed] libstdc++: Reorder template params of std::optional comparisons (LWG 2945)

2024-07-25 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* include/std/optional: Reorder parameters in comparison
operators as per LWG 2945.
---
 libstdc++-v3/include/std/optional | 36 +++
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/libstdc++-v3/include/std/optional 
b/libstdc++-v3/include/std/optional
index 2cc0221865e..4694d594f98 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -1601,10 +1601,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return __lhs && *__lhs == __rhs; }
 
   template
-_REQUIRES_NOT_OPTIONAL(_Up)
+_REQUIRES_NOT_OPTIONAL(_Tp)
 constexpr auto
-operator==(const _Up& __lhs, const optional<_Tp>& __rhs)
--> __optional_eq_t<_Up, _Tp>
+operator== [[nodiscard]] (const _Tp& __lhs, const optional<_Up>& __rhs)
+-> __optional_eq_t<_Tp, _Up>
 { return __rhs && __lhs == *__rhs; }
 
   template
@@ -1615,10 +1615,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return !__lhs || *__lhs != __rhs; }
 
   template
-_REQUIRES_NOT_OPTIONAL(_Up)
+_REQUIRES_NOT_OPTIONAL(_Tp)
 constexpr auto
-operator!= [[nodiscard]] (const _Up& __lhs, const optional<_Tp>& __rhs)
--> __optional_ne_t<_Up, _Tp>
+operator!= [[nodiscard]] (const _Tp& __lhs, const optional<_Up>& __rhs)
+-> __optional_ne_t<_Tp, _Up>
 { return !__rhs || __lhs != *__rhs; }
 
   template
@@ -1629,10 +1629,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return !__lhs || *__lhs < __rhs; }
 
   template
-_REQUIRES_NOT_OPTIONAL(_Up)
+_REQUIRES_NOT_OPTIONAL(_Tp)
 constexpr auto
-operator< [[nodiscard]] (const _Up& __lhs, const optional<_Tp>& __rhs)
--> __optional_lt_t<_Up, _Tp>
+operator< [[nodiscard]] (const _Tp& __lhs, const optional<_Up>& __rhs)
+-> __optional_lt_t<_Tp, _Up>
 { return __rhs && __lhs < *__rhs; }
 
   template
@@ -1643,10 +1643,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return __lhs && *__lhs > __rhs; }
 
   template
-_REQUIRES_NOT_OPTIONAL(_Up)
+_REQUIRES_NOT_OPTIONAL(_Tp)
 constexpr auto
-operator> [[nodiscard]] (const _Up& __lhs, const optional<_Tp>& __rhs)
--> __optional_gt_t<_Up, _Tp>
+operator> [[nodiscard]] (const _Tp& __lhs, const optional<_Up>& __rhs)
+-> __optional_gt_t<_Tp, _Up>
 { return !__rhs || __lhs > *__rhs; }
 
   template
@@ -1657,10 +1657,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return !__lhs || *__lhs <= __rhs; }
 
   template
-_REQUIRES_NOT_OPTIONAL(_Up)
+_REQUIRES_NOT_OPTIONAL(_Tp)
 constexpr auto
-operator<= [[nodiscard]] (const _Up& __lhs, const optional<_Tp>& __rhs)
--> __optional_le_t<_Up, _Tp>
+operator<= [[nodiscard]] (const _Tp& __lhs, const optional<_Up>& __rhs)
+-> __optional_le_t<_Tp, _Up>
 { return __rhs && __lhs <= *__rhs; }
 
   template
@@ -1671,10 +1671,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return __lhs && *__lhs >= __rhs; }
 
   template
-_REQUIRES_NOT_OPTIONAL(_Up)
+_REQUIRES_NOT_OPTIONAL(_Tp)
 constexpr auto
-operator>= [[nodiscard]] (const _Up& __lhs, const optional<_Tp>& __rhs)
--> __optional_ge_t<_Up, _Tp>
+operator>= [[nodiscard]] (const _Tp& __lhs, const optional<_Up>& __rhs)
+-> __optional_ge_t<_Tp, _Up>
 { return !__rhs || __lhs >= *__rhs; }
 
 #ifdef __cpp_lib_three_way_comparison
-- 
2.45.2
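
For readers of the archive, a minimal sketch (not part of the patch) of the mixed comparisons this change touches: a plain value on the left-hand side and a std::optional on the right. The reordering only changes which template parameter names the left operand, so ordinary callers should see no difference. The helper names below are made up for illustration, and the sketch assumes a C++17 or later compiler.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helpers: each one forwards to the "value op optional"
// overload whose template parameters the patch reorders.

// Equivalent to: rhs has a value && lhs == *rhs
inline bool value_equals_optional(int lhs, const std::optional<long>& rhs)
{ return lhs == rhs; }

// Equivalent to: rhs has a value && lhs < *rhs
inline bool value_less_than_optional(int lhs, const std::optional<long>& rhs)
{ return lhs < rhs; }

// Equivalent to: rhs is empty || lhs > *rhs
inline bool value_greater_than_optional(int lhs, const std::optional<long>& rhs)
{ return lhs > rhs; }
```

An empty optional orders before every value, which is why the "greater than" helper is true when the right-hand side is empty.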



[committed] libstdc++: Add static_assert to std::expected for LWG 3843 and 3940

2024-07-25 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* include/std/expected (expected::value): Add assertions for LWG
3843 requirements.
(expected::value): Add assertions for LWG 3940
requirements.
---
 libstdc++-v3/include/std/expected | 8 
 1 file changed, 8 insertions(+)

diff --git a/libstdc++-v3/include/std/expected b/libstdc++-v3/include/std/expected
index 3c52f7db01e..515a1e6ab8f 100644
--- a/libstdc++-v3/include/std/expected
+++ b/libstdc++-v3/include/std/expected
@@ -754,6 +754,7 @@ namespace __expected
   constexpr const _Tp&
   value() const &
   {
+   static_assert( is_copy_constructible_v<_Er> );
if (_M_has_value) [[likely]]
  return _M_val;
_GLIBCXX_THROW_OR_ABORT(bad_expected_access<_Er>(_M_unex));
@@ -762,6 +763,7 @@ namespace __expected
   constexpr _Tp&
   value() &
   {
+   static_assert( is_copy_constructible_v<_Er> );
if (_M_has_value) [[likely]]
  return _M_val;
const auto& __unex = _M_unex;
@@ -771,6 +773,8 @@ namespace __expected
   constexpr const _Tp&&
   value() const &&
   {
+   static_assert( is_copy_constructible_v<_Er> );
+   static_assert( is_constructible_v<_Er, const _Er&&> );
if (_M_has_value) [[likely]]
  return std::move(_M_val);
_GLIBCXX_THROW_OR_ABORT(bad_expected_access<_Er>(std::move(_M_unex)));
@@ -779,6 +783,8 @@ namespace __expected
   constexpr _Tp&&
   value() &&
   {
+   static_assert( is_copy_constructible_v<_Er> );
+   static_assert( is_constructible_v<_Er, _Er&&> );
if (_M_has_value) [[likely]]
  return std::move(_M_val);
_GLIBCXX_THROW_OR_ABORT(bad_expected_access<_Er>(std::move(_M_unex)));
@@ -1510,6 +1516,7 @@ namespace __expected
   constexpr void
   value() const&
   {
+   static_assert( is_copy_constructible_v<_Er> );
if (_M_has_value) [[likely]]
  return;
_GLIBCXX_THROW_OR_ABORT(bad_expected_access<_Er>(_M_unex));
@@ -1518,6 +1525,7 @@ namespace __expected
   constexpr void
   value() &&
   {
+   static_assert( is_copy_constructible_v<_Er> );
if (_M_has_value) [[likely]]
  return;
_GLIBCXX_THROW_OR_ABORT(bad_expected_access<_Er>(std::move(_M_unex)));
-- 
2.45.2
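
A rough sketch (not the libstdc++ code) of why the assertions sit inside value() rather than at class scope: a static_assert in a member function body of a class template is only checked when that member is instantiated, so a type with a non-copyable error can still be used as long as value() is never called. toy_expected and the helper functions are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <type_traits>

// Hypothetical, heavily simplified stand-in for std::expected<T, E>.
template<typename T, typename E>
struct toy_expected
{
  bool has_value;
  T val;
  E err;

  const T& value() const&
  {
    // Checked only if value() is actually instantiated for this E.
    static_assert(std::is_copy_constructible<E>::value,
                  "E must be copy constructible so value() can throw it");
    if (has_value)
      return val;
    throw E(err);  // copies the stored error, hence the requirement
  }
};

// Returns true if calling value() throws the stored error type.
template<typename T, typename E>
bool value_throws(const toy_expected<T, E>& e)
{
  try { (void) e.value(); return false; }
  catch (const E&) { return true; }
}

// A move-only error type is fine as long as value() is never called.
inline bool move_only_error_is_usable()
{
  toy_expected<int, std::unique_ptr<int>> e{true, 7, nullptr};
  return e.has_value && e.val == 7;
}
```

Calling value() on the unique_ptr instantiation would trip the static_assert at compile time, which is the kind of early diagnostic the patch adds.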



[committed] libstdc++: Implement P2968R2 "Making std::ignore a first-class object"

2024-07-25 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

This was recently approved for C++26, but we can apply the changes for
all modes back to C++11. There's no reason not to make the assignment
usable in constant expressions for C++11 mode, and noexcept for all
modes.

Move the definitions to <bits/utility.h> so they're available in
<utility> as well as <tuple>.

libstdc++-v3/ChangeLog:

* include/bits/utility.h (_Swallow_assign): Make assignment
constexpr for C++11 as well, and add noexcept.
* include/std/tuple (_Swallow_assign, ignore): Move to
bits/utility.h.
* testsuite/20_util/headers/utility/ignore.cc: New test.
---
 libstdc++-v3/include/bits/utility.h   | 29 +
 libstdc++-v3/include/std/tuple| 31 ---
 .../20_util/headers/utility/ignore.cc | 29 +
 3 files changed, 58 insertions(+), 31 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/headers/utility/ignore.cc

diff --git a/libstdc++-v3/include/bits/utility.h b/libstdc++-v3/include/bits/utility.h
index 9f3b99231b3..44c74333e92 100644
--- a/libstdc++-v3/include/bits/utility.h
+++ b/libstdc++-v3/include/bits/utility.h
@@ -280,6 +280,35 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   } // namespace __detail
 #endif
 
+  // A class (and instance) which can be used in 'tie' when an element
+  // of a tuple is not required.
+  struct _Swallow_assign
+  {
+template
+  constexpr const _Swallow_assign&
+  operator=(const _Tp&) const noexcept
+  { return *this; }
+  };
+
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 2773. Making std::ignore constexpr
+  /** Used with `std::tie` to ignore an element of a tuple
+   *
+   * When using `std::tie` to assign the elements of a tuple to variables,
+   * unwanted elements can be ignored by using `std::ignore`. For example:
+   *
+   * ```
+   * int x, y;
+   * std::tie(x, std::ignore, y) = std::make_tuple(1, 2, 3);
+   * ```
+   *
+   * This assignment will perform `x=1; std::ignore=2; y=3;` which results
+   * in the second element being ignored.
+   *
+   * @since C++11
+   */
+  _GLIBCXX17_INLINE constexpr _Swallow_assign ignore{};
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
 
diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index df3f6e38eeb..93b649e7d21 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -2845,37 +2845,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 swap(tuple<_Elements...>&, tuple<_Elements...>&) = delete;
 #endif
 
-  // A class (and instance) which can be used in 'tie' when an element
-  // of a tuple is not required.
-  // _GLIBCXX14_CONSTEXPR
-  // 2933. PR for LWG 2773 could be clearer
-  struct _Swallow_assign
-  {
-template
-  _GLIBCXX14_CONSTEXPR const _Swallow_assign&
-  operator=(const _Tp&) const
-  { return *this; }
-  };
-
-  // _GLIBCXX_RESOLVE_LIB_DEFECTS
-  // 2773. Making std::ignore constexpr
-  /** Used with `std::tie` to ignore an element of a tuple
-   *
-   * When using `std::tie` to assign the elements of a tuple to variables,
-   * unwanted elements can be ignored by using `std::ignore`. For example:
-   *
-   * ```
-   * int x, y;
-   * std::tie(x, std::ignore, y) = std::make_tuple(1, 2, 3);
-   * ```
-   *
-   * This assignment will perform `x=1; std::ignore=2; y=3;` which results
-   * in the second element being ignored.
-   *
-   * @since C++11
-   */
-  _GLIBCXX17_INLINE constexpr _Swallow_assign ignore{};
-
   /// Partial specialization for tuples
   template
 struct uses_allocator, _Alloc> : true_type { };
diff --git a/libstdc++-v3/testsuite/20_util/headers/utility/ignore.cc b/libstdc++-v3/testsuite/20_util/headers/utility/ignore.cc
new file mode 100644
index 000..fc7a45dc55b
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/headers/utility/ignore.cc
@@ -0,0 +1,29 @@
+// { dg-do compile { target c++11 } }
+
+// P2968R2 Make std::ignore a first-class object.
+// This is a C++26 change, but we treat it as a DR against C++11.
+
+// C++26 [tuple.general]:
+// In addition to being available via inclusion of the <tuple> header,
+// ignore is available when <utility> is included.
+#include <utility>
+
+using ignore_type = std::remove_const::type;
+
+#ifdef __cpp_lib_is_aggregate
+static_assert( std::is_aggregate_v );
+#endif
+
+static_assert( std::is_nothrow_default_constructible::value, "" );
+static_assert( std::is_nothrow_copy_constructible::value, "" );
+static_assert( std::is_nothrow_copy_assignable::value, "" );
+
+static_assert( std::is_nothrow_assignable::value,
+"assignable from arbitrary types" );
+static_assert( std::is_nothrow_assignable::value,
+"assignable from arbitrary types" );
+
+constexpr ignore_type ignore;
+constexpr ignore_type ignore_more(ignore);
+constexpr ignore_type ignore_morer(ignore = ignore);
+constexpr ignore_type ignore_morest(ignore = "");
-- 
2.45.2
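
A small sketch of the pattern described above, written against plain C++11 so it does not depend on the patched headers. swallow_assign and toy_ignore are hypothetical re-creations of the library's internal type and object, and the std::tie usage includes <tuple> so that it also works with releases that predate this change.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical copy of the pattern: assignment from anything is a
// constexpr, noexcept no-op that returns the object itself.
struct swallow_assign
{
  template<typename T>
  constexpr const swallow_assign&
  operator=(const T&) const noexcept
  { return *this; }
};

constexpr swallow_assign toy_ignore{};

// The assignment is usable in a constant expression, even in C++11.
constexpr swallow_assign toy_copy(toy_ignore = 42);

static_assert(noexcept(toy_ignore = "anything"),
              "assigning to the ignore object never throws");

// Typical use of the real std::ignore: drop the middle tuple element.
inline int first_plus_third()
{
  int x = 0, y = 0;
  std::tie(x, std::ignore, y) = std::make_tuple(1, 2, 3);
  return x + y;
}
```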



[committed] libstdc++: Remove std::basic_format_args default constructor (LWG 4106)

2024-07-25 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

There's no valid use case for default constructing this type, so the
committee approved removing the default constructor.

libstdc++-v3/ChangeLog:

* include/std/format (basic_format_args): Remove default
constructor, as per LWG 4106.
* testsuite/std/format/arguments/args.cc: Check it isn't default
constructible.
---
 libstdc++-v3/include/std/format | 2 --
 libstdc++-v3/testsuite/std/format/arguments/args.cc | 4 
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 16cee0d3c74..8f6a82a1fd4 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -3667,8 +3667,6 @@ namespace __format
{ return {_Format_arg::template _S_to_enum<_Args>()...}; }
 
 public:
-  basic_format_args() noexcept = default;
-
   template
basic_format_args(const _Store<_Args...>& __store) noexcept;
 
diff --git a/libstdc++-v3/testsuite/std/format/arguments/args.cc b/libstdc++-v3/testsuite/std/format/arguments/args.cc
index eba129ff894..16ca71caecb 100644
--- a/libstdc++-v3/testsuite/std/format/arguments/args.cc
+++ b/libstdc++-v3/testsuite/std/format/arguments/args.cc
@@ -3,6 +3,10 @@
 #include 
 #include 
 
+// LWG 4106. basic_format_args should not be default-constructible
+static_assert( ! std::is_default_constructible_v );
+static_assert( ! std::is_default_constructible_v );
+
 template
 bool equals(std::basic_format_arg fmt_arg, T expected) {
   return std::visit_format_arg([=](auto arg_val) {
-- 
2.45.2
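
To illustrate the effect of the change without depending on <format>, here is a toy sketch: once the defaulted default constructor is removed and another constructor is user-declared, the class has no default constructor at all, and a type trait can check that. toy_store and toy_format_args are hypothetical stand-ins, not the library types.

```cpp
#include <cassert>
#include <type_traits>

struct toy_store { };  // hypothetical stand-in for the argument store

class toy_format_args
{
public:
  // No default constructor is declared.  Because this constructor is
  // user-declared, the compiler does not generate one implicitly.
  toy_format_args(const toy_store&) noexcept { }
};

static_assert(!std::is_default_constructible<toy_format_args>::value,
              "should not be default constructible");
static_assert(std::is_nothrow_constructible<toy_format_args,
                                            const toy_store&>::value,
              "still constructible from a store");

// Run-time view of the same trait, for use in ordinary assertions.
inline bool toy_args_default_constructible()
{ return std::is_default_constructible<toy_format_args>::value; }
```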



[PATCH v1 1/2] PR116080: Fix tail call dejagnu checks

2024-07-25 Thread Andi Kleen
From: Andi Kleen 

- Run the target_effective tail_call checks without optimization to
match the actual test cases.
- Add an extra check for external tail calls to handle targets like
powerpc that cannot tail call between different object files.
This one will also cover templates.

gcc/testsuite/ChangeLog:

PR testsuite/116080
* g++.dg/musttail10.C: Use external tail call target check.
* g++.dg/musttail6.C: Dito.
* lib/target-supports.exp: Add external_tail_call. Disable
optimization for tail call checks.
---
 gcc/testsuite/g++.dg/musttail10.C |  2 +-
 gcc/testsuite/g++.dg/musttail6.C  |  2 +-
 gcc/testsuite/lib/target-supports.exp | 14 +++---
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/g++.dg/musttail10.C b/gcc/testsuite/g++.dg/musttail10.C
index ff7fcc7d8755..bd75affa2220 100644
--- a/gcc/testsuite/g++.dg/musttail10.C
+++ b/gcc/testsuite/g++.dg/musttail10.C
@@ -8,7 +8,7 @@ double g() { [[gnu::musttail]] return f(); } /* { dg-error "cannot tail-cal
 
 template 
 __attribute__((noinline, noclone, noipa))
-T g1() { [[gnu::musttail]] return f(); } /* { dg-error "target is not able" "" { target powerpc*-*-* } } */
+T g1() { [[gnu::musttail]] return f(); } /* { dg-error "target is not able" "" { target { external_tail_call } } } */
 
 template 
 __attribute__((noinline, noclone, noipa))
diff --git a/gcc/testsuite/g++.dg/musttail6.C b/gcc/testsuite/g++.dg/musttail6.C
index 5c6f69407ddb..81f6d9f3ca77 100644
--- a/gcc/testsuite/g++.dg/musttail6.C
+++ b/gcc/testsuite/g++.dg/musttail6.C
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { struct_tail_call } } } */
+/* { dg-require-effective-target external_tail_call } */
 /* A lot of architectures will not build this due to PR115606 and PR115607 */
-/* { dg-skip-if "powerpc does not support sibcall to templates" { powerpc*-*-* } } */
 /* { dg-options "-std=gnu++11" } */
 /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
 
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index d368251ef9a4..0a3946e82d4b 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -12741,7 +12741,15 @@ proc check_effective_target_tail_call { } {
 return [check_no_messages_and_pattern tail_call ",SIBCALL" rtl-expand {
__attribute__((__noipa__)) void foo (void) { }
__attribute__((__noipa__)) void bar (void) { foo(); }
-} {-O2 -fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
+} {-fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
+}
+
+# Return 1 if the target can perform tail-calls for externals
+proc check_effective_target_external_tail_call { } {
+return [check_no_messages_and_pattern tail_call ",SIBCALL" rtl-expand {
+   extern __attribute__((__noipa__)) void foo (void);
+   __attribute__((__noipa__)) void bar (void) { foo(); }
+} {-fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
 }
 
 # Return 1 if the target can perform tail-call optimizations for structures
@@ -12751,9 +12759,9 @@ proc check_effective_target_struct_tail_call { } {
 return [check_no_messages_and_pattern tail_call ",SIBCALL" rtl-expand {
// C++
struct foo { int a, b; };
-   __attribute__((__noipa__)) struct foo foo (void) { return {}; }
+   extern __attribute__((__noipa__)) struct foo foo (void);
__attribute__((__noipa__)) struct foo bar (void) { return foo(); }
-} {-O2 -fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
+} {-fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
 }
 
 # Return 1 if the target's calling sequence or its ABI
-- 
2.45.2



[PATCH v1 2/2] PR116019: Improve tail call error message

2024-07-25 Thread Andi Kleen
From: Andi Kleen 

The "tail call must be same type" message is common on some
targets with C++, or without optimization. It is generated
when gcc believes there is an access of the return value
after the call. However, it usually does not actually correspond
to a type mismatch, and can have other causes.

Make it slightly more vague to be less misleading.

gcc/ChangeLog:

PR c++/116019
* tree-tailcall.cc (find_tail_calls): Change tail call
error message.
---
 gcc/tree-tailcall.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-tailcall.cc b/gcc/tree-tailcall.cc
index a68079d4f507..1901b1a13f99 100644
--- a/gcc/tree-tailcall.cc
+++ b/gcc/tree-tailcall.cc
@@ -632,7 +632,7 @@ find_tail_calls (basic_block bb, struct tailcall **ret, bool only_musttail,
   && may_be_aliased (result_decl)
   && ref_maybe_used_by_stmt_p (call, result_decl, false))
 {
-  maybe_error_musttail (call, _("tail call must be same type"));
+  maybe_error_musttail (call, _("return value used after call"));
   return;
 }
 
-- 
2.45.2



Re: [PATCH v1 1/2] PR116080: Fix tail call dejagnu checks

2024-07-25 Thread Sam James
Andi Kleen  writes:

> From: Andi Kleen 
>
> - Run the target_effective tail_call checks without optimization to
> match the actual test cases.
> - Add an extra check for external tail calls to handle targets like
> powerpc that cannot tail call between different object files.
> This one will also cover templates.

Two trivial comments below.

>
> gcc/testsuite/ChangeLog:
>
>   PR testsuite/116080
>   * g++.dg/musttail10.C: Use external tail call target check.
>   * g++.dg/musttail6.C: Dito.

s/Dito/Ditto/

>   * lib/target-supports.exp: Add external_tail_call. Disable
>   optimization for tail call checks.
> ---
>  gcc/testsuite/g++.dg/musttail10.C |  2 +-
>  gcc/testsuite/g++.dg/musttail6.C  |  2 +-
>  gcc/testsuite/lib/target-supports.exp | 14 +++---
>  3 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/testsuite/g++.dg/musttail10.C 
> b/gcc/testsuite/g++.dg/musttail10.C
> index ff7fcc7d8755..bd75affa2220 100644
> --- a/gcc/testsuite/g++.dg/musttail10.C
> +++ b/gcc/testsuite/g++.dg/musttail10.C
> @@ -8,7 +8,7 @@ double g() { [[gnu::musttail]] return f(); } /* { 
> dg-error "cannot tail-cal
>  
>  template 
>  __attribute__((noinline, noclone, noipa))
> -T g1() { [[gnu::musttail]] return f(); } /* { dg-error "target is not 
> able" "" { target powerpc*-*-* } } */
> +T g1() { [[gnu::musttail]] return f(); } /* { dg-error "target is not 
> able" "" { target { external_tail_call } } } */
>  
>  template 
>  __attribute__((noinline, noclone, noipa))
> diff --git a/gcc/testsuite/g++.dg/musttail6.C 
> b/gcc/testsuite/g++.dg/musttail6.C
> index 5c6f69407ddb..81f6d9f3ca77 100644
> --- a/gcc/testsuite/g++.dg/musttail6.C
> +++ b/gcc/testsuite/g++.dg/musttail6.C
> @@ -1,6 +1,6 @@
>  /* { dg-do compile { target { struct_tail_call } } } */
> +/* { dg-require-effective-target external_tail_call } */
>  /* A lot of architectures will not build this due to PR115606 and PR115607 */
> -/* { dg-skip-if "powerpc does not support sibcall to templates" { 
> powerpc*-*-* } } */
>  /* { dg-options "-std=gnu++11" } */
>  /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
>  
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index d368251ef9a4..0a3946e82d4b 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -12741,7 +12741,15 @@ proc check_effective_target_tail_call { } {
>  return [check_no_messages_and_pattern tail_call ",SIBCALL" rtl-expand {
>   __attribute__((__noipa__)) void foo (void) { }
>   __attribute__((__noipa__)) void bar (void) { foo(); }
> -} {-O2 -fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed 
> dump.
> +} {-fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed 
> dump.
> +}
> +
> +# Return 1 if the target can perform tail-calls for externals
> +proc check_effective_target_external_tail_call { } {
> +return [check_no_messages_and_pattern tail_call ",SIBCALL" rtl-expand {
> + extern __attribute__((__noipa__)) void foo (void);
> + __attribute__((__noipa__)) void bar (void) { foo(); }

There's far more instances of noipa in the testsuite than __noipa__.

> +} {-fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed 
> dump.
>  }
>  
>  # Return 1 if the target can perform tail-call optimizations for structures
> @@ -12751,9 +12759,9 @@ proc check_effective_target_struct_tail_call { } {
>  return [check_no_messages_and_pattern tail_call ",SIBCALL" rtl-expand {
>   // C++
>   struct foo { int a, b; };
> - __attribute__((__noipa__)) struct foo foo (void) { return {}; }
> + extern __attribute__((__noipa__)) struct foo foo (void);
>   __attribute__((__noipa__)) struct foo bar (void) { return foo(); }
> -} {-O2 -fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed 
> dump.
> +} {-fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed 
> dump.
>  }
>  
>  # Return 1 if the target's calling sequence or its ABI




Re: [PATCH v1 1/2] PR116080: Fix tail call dejagnu checks

2024-07-25 Thread Andrew Pinski
On Thu, Jul 25, 2024 at 4:09 PM Sam James  wrote:
>
> Andi Kleen  writes:
>
> > From: Andi Kleen 
> >
> > - Run the target_effective tail_call checks without optimization to
> > match the actual test cases.
> > - Add an extra check for external tail calls to handle targets like
> > powerpc that cannot tail call between different object files.
> > This one will also cover templates.
>
> Two trivial comments below.
>
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR testsuite/116080
> >   * g++.dg/musttail10.C: Use external tail call target check.
> >   * g++.dg/musttail6.C: Dito.
>
> s/Dito/Ditto/

One extra nit: it is much (~5x) more common to use "Likewise" rather
than "Ditto" in GCC's changelogs.
[apinski@xeond2 gcc]$ git grep Ditto *{,/}ChangeLog*|wc -l
41565
[apinski@xeond2 gcc]$ git grep Likewise *{,/}ChangeLog*|wc -l
196587

Thanks,
Andrew

>
> >   * lib/target-supports.exp: Add external_tail_call. Disable
> >   optimization for tail call checks.
> > ---
> >  gcc/testsuite/g++.dg/musttail10.C |  2 +-
> >  gcc/testsuite/g++.dg/musttail6.C  |  2 +-
> >  gcc/testsuite/lib/target-supports.exp | 14 +++---
> >  3 files changed, 13 insertions(+), 5 deletions(-)
> >
> > diff --git a/gcc/testsuite/g++.dg/musttail10.C 
> > b/gcc/testsuite/g++.dg/musttail10.C
> > index ff7fcc7d8755..bd75affa2220 100644
> > --- a/gcc/testsuite/g++.dg/musttail10.C
> > +++ b/gcc/testsuite/g++.dg/musttail10.C
> > @@ -8,7 +8,7 @@ double g() { [[gnu::musttail]] return f(); } /* { 
> > dg-error "cannot tail-cal
> >
> >  template 
> >  __attribute__((noinline, noclone, noipa))
> > -T g1() { [[gnu::musttail]] return f(); } /* { dg-error "target is not 
> > able" "" { target powerpc*-*-* } } */
> > +T g1() { [[gnu::musttail]] return f(); } /* { dg-error "target is not 
> > able" "" { target { external_tail_call } } } */
> >
> >  template 
> >  __attribute__((noinline, noclone, noipa))
> > diff --git a/gcc/testsuite/g++.dg/musttail6.C 
> > b/gcc/testsuite/g++.dg/musttail6.C
> > index 5c6f69407ddb..81f6d9f3ca77 100644
> > --- a/gcc/testsuite/g++.dg/musttail6.C
> > +++ b/gcc/testsuite/g++.dg/musttail6.C
> > @@ -1,6 +1,6 @@
> >  /* { dg-do compile { target { struct_tail_call } } } */
> > +/* { dg-require-effective-target external_tail_call } */
> >  /* A lot of architectures will not build this due to PR115606 and PR115607 
> > */
> > -/* { dg-skip-if "powerpc does not support sibcall to templates" { 
> > powerpc*-*-* } } */
> >  /* { dg-options "-std=gnu++11" } */
> >  /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
> >
> > diff --git a/gcc/testsuite/lib/target-supports.exp 
> > b/gcc/testsuite/lib/target-supports.exp
> > index d368251ef9a4..0a3946e82d4b 100644
> > --- a/gcc/testsuite/lib/target-supports.exp
> > +++ b/gcc/testsuite/lib/target-supports.exp
> > @@ -12741,7 +12741,15 @@ proc check_effective_target_tail_call { } {
> >  return [check_no_messages_and_pattern tail_call ",SIBCALL" rtl-expand {
> >   __attribute__((__noipa__)) void foo (void) { }
> >   __attribute__((__noipa__)) void bar (void) { foo(); }
> > -} {-O2 -fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a 
> > detailed dump.
> > +} {-fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed 
> > dump.
> > +}
> > +
> > +# Return 1 if the target can perform tail-calls for externals
> > +proc check_effective_target_external_tail_call { } {
> > +return [check_no_messages_and_pattern tail_call ",SIBCALL" rtl-expand {
> > + extern __attribute__((__noipa__)) void foo (void);
> > + __attribute__((__noipa__)) void bar (void) { foo(); }
>
> There's far more instances of noipa in the testsuite than __noipa__.
>
> > +} {-fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed 
> > dump.
> >  }
> >
> >  # Return 1 if the target can perform tail-call optimizations for structures
> > @@ -12751,9 +12759,9 @@ proc check_effective_target_struct_tail_call { } {
> >  return [check_no_messages_and_pattern tail_call ",SIBCALL" rtl-expand {
> >   // C++
> >   struct foo { int a, b; };
> > - __attribute__((__noipa__)) struct foo foo (void) { return {}; }
> > + extern __attribute__((__noipa__)) struct foo foo (void);
> >   __attribute__((__noipa__)) struct foo bar (void) { return foo(); }
> > -} {-O2 -fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a 
> > detailed dump.
> > +} {-fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed 
> > dump.
> >  }
> >
> >  # Return 1 if the target's calling sequence or its ABI


[PATCH] gimple-ssa-sprintf: Fix typo in range check

2024-07-25 Thread Siddhesh Poyarekar
The code to scale ranges for wide chars in format_string incorrectly
checks range.likely to scale range.unlikely, which is a copy-paste typo
from the immediate previous condition.

gcc/ChangeLog:

* gimple-ssa-sprintf.cc (format_string): Fix typo in range check
for UNLIKELY for wide chars.

Signed-off-by: Siddhesh Poyarekar 
---
Tested on x86_64, no new testsuite regressions due to this fix.

 gcc/gimple-ssa-sprintf.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gimple-ssa-sprintf.cc b/gcc/gimple-ssa-sprintf.cc
index 025b0fbff6f..0900710647c 100644
--- a/gcc/gimple-ssa-sprintf.cc
+++ b/gcc/gimple-ssa-sprintf.cc
@@ -2623,7 +2623,7 @@ format_string (const directive &dir, tree arg, pointer_query &ptr_qry)
  if (slen.range.likely < target_int_max ())
slen.range.likely *= 2;
 
- if (slen.range.likely < target_int_max ())
+ if (slen.range.unlikely < target_int_max ())
slen.range.unlikely *= target_mb_len_max ();
 
  /* A non-empty wide character conversion may fail.  */
-- 
2.45.1
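
A stand-alone sketch of the corrected logic, for illustration only. toy_range and scale_for_wide_chars are hypothetical names, and the bounds are passed in as plain integers rather than coming from target_int_max () and target_mb_len_max (). The point is that each member is guarded by its own bound check, so a value that has already reached the maximum is not scaled again.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the length range in the patch.
struct toy_range
{
  unsigned long long likely;
  unsigned long long unlikely;
};

// Scale a narrow-character length range for a wide-character
// conversion.  INT_MAX_BOUND and MB_LEN_MAX_BOUND are assumed inputs.
inline toy_range
scale_for_wide_chars(toy_range r,
                     unsigned long long int_max_bound,
                     unsigned long long mb_len_max_bound)
{
  if (r.likely < int_max_bound)
    r.likely *= 2;

  // The fix: test the member that is about to be scaled.  The old code
  // tested r.likely here, so a saturated r.unlikely could be scaled.
  if (r.unlikely < int_max_bound)
    r.unlikely *= mb_len_max_bound;

  return r;
}
```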



Question on SUBREG simplification for big endian target.

2024-07-25 Thread Jeff Law
We don't really have a good place for discussions anymore other than 
gcc-patches, but this really isn't a patch..  Oh well.



I'm debugging a failure of ext-dce on a big endian target (m68k) and I 
can't help but think it's actually exposed a latent bug in subreg 
handling.  Your input would be appreciated.



So we have this insn:


(insn 14 13 15 2 (set (reg:DI 55 [ _6 ])
        (zero_extend:DI (subreg:SI (reg:DI 35 [ _5 ]) 0))) "j.c":21:27 85 {*zero_extendsidi2}
     (expr_list:REG_DEAD (reg:DI 35 [ _5 ])
        (nil)))
So take bits 32..63 from (reg:DI 35) and shove them into bits 0..31 of 
(reg:DI 55) with zero extension.


ext-dce has correctly determined that we only need bits 0..31 of (reg:DI 
55).  It conceptually wants to transform that into:


(set (reg:DI 55) (subreg:DI (subreg:SI (reg:DI 35) 0) 0))

Of course we don't want nested subregs, so we actually use 
simplify_gen_subreg.


It's simplifying that to...

(set (reg:DI 55) (reg:DI 35))

Which seems wrong for the SET_SRC expression.

This code looks suspicious to me (simplify-rtx.cc):




  /* Changing mode twice with SUBREG => just change it once,
 or not at all if changing back op starting mode.  */
  if (GET_CODE (op) == SUBREG)
{
  machine_mode innermostmode = GET_MODE (SUBREG_REG (op));
  poly_uint64 innermostsize = GET_MODE_SIZE (innermostmode);
  rtx newx;

  /* Make sure that the relationship between the two subregs is
 known at compile time.  */
  if (!ordered_p (outersize, innermostsize))
return NULL_RTX;

  if (outermode == innermostmode
  && known_eq (byte, 0U)
  && known_eq (SUBREG_BYTE (op), 0))
return SUBREG_REG (op);



That test just seems wrong on a big endian target.  In fact if I take 
out that early exit and let the rest of the code run we get into this:



  /* Work out the memory offset of the final OUTERMODE value relative
 to the inner value of OP.  */
  poly_int64 mem_offset = subreg_memory_offset (outermode,
innermode, byte);
  poly_int64 op_mem_offset = subreg_memory_offset (op);
  poly_int64 final_offset = mem_offset + op_mem_offset;

  /* See whether resulting subreg will be paradoxical.  */
  if (!paradoxical_subreg_p (outermode, innermostmode))
{
  /* Bail out in case resulting subreg would be incorrect.  */
  if (maybe_lt (final_offset, 0)
  || maybe_ge (poly_uint64 (final_offset), innermostsize)
  || !multiple_p (final_offset, outersize))
return NULL_RTX;
}


OUTERMODE and INNERMOSTMODE are both DImode, so it's not paradoxical at 
this point.  But FINAL_OFFSET is -4, which triggers rejecting the 
simplification.


Am I totally off base here?   Or should we just not have called 
simplify_gen_subreg in the way we did?


Thanks in advance,
jeff
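
For anyone following the arithmetic, here is a toy model of the offset computation discussed above. It is only a sketch: the function names are hypothetical, sizes and offsets are plain byte counts, and it assumes byte and word endianness agree (as on m68k). It is not the simplify-rtx.cc code, just the same sums written out for the DImode/SImode case.

```cpp
#include <cassert>

// Byte offset of the low-order OUTER_SIZE bytes inside an
// INNER_SIZE-byte value.  On a big-endian target the low-order bytes
// sit at the highest addresses.
inline long lowpart_offset(long outer_size, long inner_size,
                           bool big_endian)
{
  if (outer_size >= inner_size)
    return 0;
  return big_endian ? inner_size - outer_size : 0;
}

// Memory offset of a subreg.  For a paradoxical subreg (outer wider
// than inner) this is the negated low-part offset of the inner value
// within the outer one.
inline long memory_offset(long outer_size, long inner_size, long byte,
                          bool big_endian)
{
  if (outer_size > inner_size)
    return -lowpart_offset(inner_size, outer_size, big_endian);
  return byte;
}

// Combined offset for (subreg:DI (subreg:SI (reg:DI 35) 0) 0).
inline long final_offset_for_nested_subreg(bool big_endian)
{
  const long di = 8, si = 4;
  long mem_offset = memory_offset(di, si, 0, big_endian);    // outer subreg
  long op_mem_offset = memory_offset(si, di, 0, big_endian); // inner subreg
  return mem_offset + op_mem_offset;
}
```

Under these assumptions the big-endian case gives -4, matching the FINAL_OFFSET mentioned above, while the little-endian case gives 0, which is where folding the two subregs away would be harmless.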




Re: [PATCH] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-25 Thread Carl Love

Kewen:

On 7/25/24 1:21 AM, Kewen.Lin wrote:

Hi Carl,

Some minor comments are inlined on top of Segher's and Peter's comments.

on 2024/7/20 04:04, Carl Love wrote:

GCC developers:

The following patch adds the int128 variants to the existing overloaded 
built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, vec_srl, 
vec_sro.  These variants were requested by Steve Munroe.

The patch has been tested on a Power 10 system with no regressions.

Please let me know if the patch is acceptable for mainline.

    Carl


---
  rs6000, Add new overloaded vector shift builtin int128 varients

Add the signed __int128 and unsigned __int128 argument types for the
overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
vec_srdb, vec_srl, vec_sro.  For each of the new argument types add a
testcase and update the documentation for the built-in.

Add the missing internal names for the float and double types for
overloaded builtin vec_sld for the float and double types.

This isn't needed, see below explanation.


OK, per comments below, removed the additional internal names.



diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index c4ecafc6f7e..302e0232533 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3396,9 +3396,13 @@
    vull __builtin_vec_sld (vull, vull, const int);
  VSLDOI_2DI  VSLDOI_VULL
    vf __builtin_vec_sld (vf, vf, const int);
-    VSLDOI_4SF
+    VSLDOI_4SF VSLDOI_VF
    vd __builtin_vec_sld (vd, vd, const int);
-    VSLDOI_2DF
+    VSLDOI_2DF VSLDOI_VD

The other instances for vector integer type have multiple uses of 1st token,
such as:

   vsll __builtin_vec_sld (vsll, vsll, const int);
 VSLDOI_2DI  VSLDOI_VSLL
   vbll __builtin_vec_sld (vbll, vbll, const int);
 VSLDOI_2DI  VSLDOI_VBLL
   vull __builtin_vec_sld (vull, vull, const int);
 VSLDOI_2DI  VSLDOI_VULL

, it's unable to use the 1st token VSLDOI_2DI for the overload id (otherwise
it can be ambiguous), but for vector float/double they don't have multiple
variants, VSLDOI_4SF and VSLDOI_2DF are used once respectively so they are
fine here.  I think the existing code is intentional so let's keep them
unchanged (creating more unnecessary ids is slightly worse than before).


OK, removed the additional tokens VSLDOI_VF and VSLDOI_VD.



+  vsq __builtin_vec_sld (vsq, vsq, const int);
+    VSLDOI_V1TI  VSLDOI_VSQ
+  vuq __builtin_vec_sld (vuq, vuq, const int);
+    VSLDOI_V1TI  VSLDOI_VUQ

  [VEC_SLDB, vec_sldb, __builtin_vec_sldb]
    vsc __builtin_vec_sldb (vsc, vsc, const int);
@@ -3417,6 +3421,10 @@
  VSLDB_V2DI  VSLDB_VSLL
    vull __builtin_vec_sldb (vull, vull, const int);
  VSLDB_V2DI  VSLDB_VULL
+  vsq __builtin_vec_sldb (vsq, vsq, const int);
+    VSLDB_V1TI  VSLDB_VSQ
+  vuq __builtin_vec_sldb (vuq, vuq, const int);
+    VSLDB_V1TI  VSLDB_VUQ

  [VEC_SLDW, vec_sldw, __builtin_vec_sldw]
    vsc __builtin_vec_sldw (vsc, vsc, const int);
@@ -3439,6 +3447,10 @@
  XXSLDWI_4SF  XXSLDWI_VF
    vd __builtin_vec_sldw (vd, vd, const int);
  XXSLDWI_2DF  XXSLDWI_VD
+  vsq __builtin_vec_sldw (vsq, vsq, const int);
+    XXSLDWI_Q  XXSLDWI_VSQ
+  vuq __builtin_vec_sldw (vuq, vuq, const int);
+    XXSLDWI_Q  XXSLDWI_VUQ

Nit: s/XXSLDWI_Q/XXSLDWI_1TI/ to keep consistent with the
other XXSLDWI_* as 1st token (XXSLDWI_16QI etc. are used
above rather than XXSLDWI_{SC,UC} etc.)


OK, changed to:

  vsq __builtin_vec_sldw (vsq, vsq, const int);
    XXSLDWI_1TI  XXSLDWI_VSQ
  vuq __builtin_vec_sldw (vuq, vuq, const int);
    XXSLDWI_1TI  XXSLDWI_VUQ




  [VEC_SRV, vec_srv, __builtin_vec_vsrv]
    vuc __builtin_vec_vsrv (vuc, vuc);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0b572afca72..5125a6d9def 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -23504,6 +23504,10 @@ const unsigned int);
  vector signed long long, const unsigned int);
  @exdent vector unsigned long long vec_sldb (vector unsigned long long,
  vector unsigned long long, const unsigned int);
+@exdent vector signed __int128 vec_sldb (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sldb (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
  @end smallexample

  Shift the combined input vectors left by the amount specified by the low-order
@@ -23531,6 +23535,10 @@ const unsigned int);
  vector signed long long, const unsigned int);
  @exdent vector unsigned long long vec_srdb (vector unsigned long long,
  vector unsigned long long, const unsigned int);
+@exdent vector signed __int128 vec_srdb (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_srdb (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
  @end smallexample

  Shift the

Re: [PATCH] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-25 Thread Carl Love

Peter, Segher:

On 7/23/24 2:26 PM, Peter Bergner wrote:

On 7/19/24 3:04 PM, Carl Love wrote:

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 5af9bf920a2..2a18ee44526 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
  (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])

  (define_insn "vsdb_"
- [(set (match_operand:VI2 0 "register_operand" "=v")
-  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
-   (match_operand:VI2 2 "register_operand" "v")
+ [(set (match_operand:VEC_IC 0 "register_operand" "=v")
+  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
+   (match_operand:VEC_IC 2 "register_operand" "v")
 (match_operand:QI 3 "const_0_to_12_operand" "n")]
VSHIFT_DBL_LR))]
"TARGET_POWER10"

I know the old code used the register_operand predicate for the vector
operands, but those really should be changed to altivec_register_operand.

Peter

Segher's response was:

register_operand is just fine usually.  The "v" constraint already makes
sure things end up in a VMX (a lower VSX) register, the predicate
doesn't help here.  register_operand is shorter (and thus, preferred),
and also more likely correct if the code changes later 🙂


To which Peter replied, and Segher responded:

On Wed, Jul 24, 2024 at 12:12:05PM -0500, Peter Bergner wrote:


On 7/24/24 12:06 PM, Segher Boessenkool wrote:
I thought we always wanted the predicate to match the constraint being used?


Predicates and constraints have different purposes, and are used at
different times (typically).  Everything before RA is predicates
only, and RA and everything after it use constraints (as well).

register_operand says it has to be a register.  It allows any
pseudo-register, so before RA, there is no real difference between
register_operand and altivec_register_operand (which allows all pseudos
as well).

The constraint should not demand things that weren't clear earlier,
because that will then cause reloading eventually, often with less
efficient code.  It still will *work* though.

But that is not the case here 🙂


So, I think the final word here is don't change it.

 Carl
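
The predicate/constraint distinction discussed above can be illustrated with a
toy model.  This is only a sketch: the types and function names below
(RegKind, Operand, toy_register_operand and so on) are hypothetical and are
not GCC's real data structures or predicate implementations.  It assumes, as
Segher describes, that both predicates accept any pseudo, and that the "v"
constraint only matters once a hard register has been chosen.

```cpp
#include <cassert>

// Toy model only: hypothetical types, not GCC internals.
enum class RegKind { Pseudo, HardGpr, HardAltivec };

struct Operand {
  bool is_register;
  RegKind kind;
};

// Models a predicate like register_operand: any register is accepted.
bool toy_register_operand(const Operand &op) { return op.is_register; }

// Models a predicate like altivec_register_operand: every pseudo is
// accepted, and hard registers only if they are Altivec registers.
bool toy_altivec_register_operand(const Operand &op) {
  return op.is_register
         && (op.kind == RegKind::Pseudo || op.kind == RegKind::HardAltivec);
}

// Models the "v" constraint as checked once registers are allocated:
// only a hard Altivec register satisfies it.
bool toy_constraint_v(const Operand &op) {
  return op.is_register && op.kind == RegKind::HardAltivec;
}
```

In this model the two predicates agree on every pseudo, which is the sense in
which there is "no real difference" before RA; the constraint is what steers
the allocator to a suitable hard register.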







[pushed] c++: #pragma target and deferred instantiation [PR115403]

2024-07-25 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

Also built highway to check.

-- 8< --

My patch for 109753 applies the current #pragma target/optimize to a
function when we compile it, which was a problem for a template
instantiation deferred until EOF, where different #pragmas are active.  So
let's only do this for artificial functions.

PR c++/115403
PR c++/109753

gcc/cp/ChangeLog:

* decl.cc (start_preparsed_function): Only call decl_attributes for
artificial functions.

gcc/testsuite/ChangeLog:

* g++.dg/ext/pragma-target1.C: New test.
---
 gcc/cp/decl.cc| 7 +--
 gcc/testsuite/g++.dg/ext/pragma-target1.C | 6 ++
 2 files changed, 11 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/pragma-target1.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 6b686d75a49..279af21eed0 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -17882,8 +17882,11 @@ start_preparsed_function (tree decl1, tree attrs, int flags)
doing_friend = true;
 }
 
-  /* Adjust for #pragma target/optimize.  */
-  decl_attributes (&decl1, NULL_TREE, 0);
+  /* Adjust for #pragma target/optimize if this is an artificial function that
+ (probably) didn't go through grokfndecl.  We particularly don't want this
+ for deferred instantiations, which should match their template.  */
+  if (DECL_ARTIFICIAL (decl1))
+decl_attributes (&decl1, NULL_TREE, 0);
 
   if (DECL_DECLARED_INLINE_P (decl1)
   && lookup_attribute ("noinline", attrs))
diff --git a/gcc/testsuite/g++.dg/ext/pragma-target1.C b/gcc/testsuite/g++.dg/ext/pragma-target1.C
new file mode 100644
index 000..0ce2438da2f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/pragma-target1.C
@@ -0,0 +1,6 @@
+// PR c++/115403
+// { dg-do compile { target x86_64-*-* } }
+
+template  __attribute__((always_inline)) inline void AssertEqual() {}
+void TestAllF16FromF32() { AssertEqual(); }
+#pragma GCC target "sse4.1"

base-commit: 29341f21ce1eb7cdb8cd468e4ceb0d07cf2775e0
-- 
2.45.2



[PATCH 1/3] isel: Move duplicate comparisons to its own function

2024-07-25 Thread Andrew Pinski
This is just a small cleanup to isel with no functional changes.
The loop inside pass_gimple_isel::execute was getting too
deep, so let's fix that by moving it to its own function.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* gimple-isel.cc (pass_gimple_isel::execute): Factor out
duplicate comparisons to ...
(duplicate_comparison): New function.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-isel.cc | 66 --
 1 file changed, 35 insertions(+), 31 deletions(-)

diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index 57f7281bb50..327a78ea408 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -395,6 +395,40 @@ gimple_expand_vec_cond_expr (struct function *fun, gimple_stmt_iterator *gsi,
 5, op0a, op0b, op1, op2, tcode_tree);
 }
 
+/* Duplicate COND_EXPR condition defs of STMT located in BB when they are
+   comparisons so RTL expansion with the help of TER
+   can perform better if conversion.  */
+static void
+duplicate_comparison (gassign *stmt, basic_block bb)
+{
+  imm_use_iterator imm_iter;
+  use_operand_p use_p;
+  auto_vec<gassign *> cond_exprs;
+  unsigned cnt = 0;
+  tree lhs = gimple_assign_lhs (stmt);
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
+{
+  if (is_gimple_debug (USE_STMT (use_p)))
+   continue;
+  cnt++;
+  if (gimple_bb (USE_STMT (use_p)) == bb
+ && is_gimple_assign (USE_STMT (use_p))
+ && gimple_assign_rhs1_ptr (USE_STMT (use_p)) == use_p->use
+ && gimple_assign_rhs_code (USE_STMT (use_p)) == COND_EXPR)
+   cond_exprs.safe_push (as_a <gassign *> (USE_STMT (use_p)));
+  }
+  for (unsigned i = cond_exprs.length () == cnt ? 1 : 0;
+   i < cond_exprs.length (); ++i)
+{
+  gassign *copy = as_a <gassign *> (gimple_copy (stmt));
+  tree new_def = duplicate_ssa_name (lhs, copy);
+  gimple_assign_set_lhs (copy, new_def);
+  auto gsi2 = gsi_for_stmt (cond_exprs[i]);
+  gsi_insert_before (&gsi2, copy, GSI_SAME_STMT);
+  gimple_assign_set_rhs1 (cond_exprs[i], new_def);
+  update_stmt (cond_exprs[i]);
+}
+}
 
 
 namespace {
@@ -469,37 +503,7 @@ pass_gimple_isel::execute (struct function *fun)
  tree lhs = gimple_assign_lhs (stmt);
  if (TREE_CODE_CLASS (code) == tcc_comparison
  && !has_single_use (lhs))
-   {
- /* Duplicate COND_EXPR condition defs when they are
-comparisons so RTL expansion with the help of TER
-can perform better if conversion.  */
- imm_use_iterator imm_iter;
- use_operand_p use_p;
- auto_vec<gassign *> cond_exprs;
- unsigned cnt = 0;
- FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
-   {
- if (is_gimple_debug (USE_STMT (use_p)))
-   continue;
- cnt++;
- if (gimple_bb (USE_STMT (use_p)) == bb
- && is_gimple_assign (USE_STMT (use_p))
- && gimple_assign_rhs1_ptr (USE_STMT (use_p)) == use_p->use
- && gimple_assign_rhs_code (USE_STMT (use_p)) == COND_EXPR)
-   cond_exprs.safe_push (as_a <gassign *> (USE_STMT (use_p)));
-   }
- for (unsigned i = cond_exprs.length () == cnt ? 1 : 0;
-  i < cond_exprs.length (); ++i)
-   {
- gassign *copy = as_a <gassign *> (gimple_copy (stmt));
- tree new_def = duplicate_ssa_name (lhs, copy);
- gimple_assign_set_lhs (copy, new_def);
- auto gsi2 = gsi_for_stmt (cond_exprs[i]);
- gsi_insert_before (&gsi2, copy, GSI_SAME_STMT);
- gimple_assign_set_rhs1 (cond_exprs[i], new_def);
- update_stmt (cond_exprs[i]);
-   }
-   }
+   duplicate_comparison (stmt, bb);
}
 }
 
-- 
2.43.0



[PATCH 3/3] isel: Don't duplicate comparisons for -O0 nor -fno-tree-ter [PR116101]

2024-07-25 Thread Andrew Pinski
While doing cleanups on this code I noticed that we duplicate
comparisons even at -O0. For C and C++ code this makes no difference, as
the gimplifier never produces COND_EXPR, but it could make a difference
for other front-ends.
For -fno-tree-ter, duplicating the comparison is simply wasted work,
as the copy is never used by expand.

I also decided to add a few testcases so this is checked in the future.
Even added one for the duplication itself.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/116101

gcc/ChangeLog:

* gimple-isel.cc (maybe_duplicate_comparison): Don't
do anything for -O0 or -fno-tree-ter.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/dup_compare_cond-1.c: New test.
* gcc.dg/tree-ssa/dup_compare_cond-2.c: New test.
* gcc.dg/tree-ssa/dup_compare_cond-3.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-isel.cc|  5 +
 .../gcc.dg/tree-ssa/dup_compare_cond-1.c  | 19 +++
 .../gcc.dg/tree-ssa/dup_compare_cond-2.c  | 19 +++
 .../gcc.dg/tree-ssa/dup_compare_cond-3.c  | 19 +++
 4 files changed, 62 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-3.c

diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index 99bfc937bd5..2817ab659af 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -407,6 +407,11 @@ maybe_duplicate_comparison (gassign *stmt, basic_block bb)
   tree lhs = gimple_assign_lhs (stmt);
   unsigned cnt = 0;
 
+  /* This should not be done at -O0, nor is it useful
+ when TER is turned off.  */
+  if (!optimize || !flag_tree_ter)
+return;
+
   FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
 {
   if (is_gimple_debug (USE_STMT (use_p)))
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-1.c b/gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-1.c
new file mode 100644
index 000..0321a60b34f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-fgimple -O0 -fdump-tree-optimized " } */
+/* PR tree-optimization/116101 */
+
+int __GIMPLE() f(int a, int b, int c, int d, int e)
+{
+  _Bool t;
+  int ff;
+  int gg;
+  int res;
+  t = a == b;
+  ff = t ? a : e;
+  gg = t ? d : b;
+  res = ff+gg;
+  return res;
+}
+
+/* At -O0 we should not duplicate the comparison. */
+/* { dg-final { scan-tree-dump-times " == " 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-2.c b/gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-2.c
new file mode 100644
index 000..07e2175c612
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-fgimple -O2 -fdump-tree-optimized " } */
+/* PR middle-end/105715 */
+
+int __GIMPLE() f(int a, int b, int c, int d, int e)
+{
+  _Bool t;
+  int ff;
+  int gg;
+  int res;
+  t = a == b;
+  ff = t ? a : e;
+  gg = t ? d : b;
+  res = ff+gg;
+  return res;
+}
+
+/* At -O2 we should have duplicated the comparison. */
+/* { dg-final { scan-tree-dump-times " == " 2 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-3.c b/gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-3.c
new file mode 100644
index 000..88bf19795e0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-3.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-fgimple -O2 -fno-tree-ter -fdump-tree-optimized " } */
+/* PR tree-optimization/116101 */
+
+int __GIMPLE() f(int a, int b, int c, int d, int e)
+{
+  _Bool t;
+  int ff;
+  int gg;
+  int res;
+  t = a == b;
+  ff = t ? a : e;
+  gg = t ? d : b;
+  res = ff+gg;
+  return res;
+}
+
+/* With -fno-tree-ter it is not useful to duplicate the comparison. */
+/* { dg-final { scan-tree-dump-times " == " 1 "optimized" } } */
-- 
2.43.0



[PATCH 2/3] isel: Small cleanup of duplicating comparisons

2024-07-25 Thread Andrew Pinski
This is a small cleanup of the comparison-duplicating code.
There is a code generation difference, but only for -O0 and -fno-tree-ter
(both of which will be fixed in a later patch).
The difference is that, when all uses of the comparison are in cond_exprs,
we now skip the last use instead of the first.
We also walk the use list in the opposite order.

The cleanups are the following:
* Don't call has_single_use as we will do the loop anyways
* Change the order of the checks slightly, it is better
  to check for cond_expr earlier
* Use cond_exprs as a stack and pop from it.
  Skipping the top if the use is only from cond_expr.
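
The counting logic described by these cleanups can be sketched outside of GCC
as follows.  This is a toy model, not the actual pass: ToyUse and
toy_count_copies are hypothetical names, and the model only counts how many
copies of the comparison would be inserted rather than touching any IL.  It
assumes debug uses have already been filtered out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one non-debug use of the comparison result.
struct ToyUse {
  // True if the use is the condition of a COND_EXPR in the same block.
  bool is_cond_expr_in_bb;
};

// Returns how many copies of the comparison the described scheme inserts.
unsigned toy_count_copies(const std::vector<ToyUse> &uses) {
  std::vector<std::size_t> cond_exprs;  // used as a stack of qualifying uses
  unsigned cnt = 0;
  for (std::size_t i = 0; i < uses.size(); ++i) {
    ++cnt;
    if (uses[i].is_cond_expr_in_bb)
      cond_exprs.push_back(i);
  }

  // With 0 or 1 uses there is nothing to duplicate.
  if (cnt <= 1)
    return 0;

  // If every use is a qualifying cond_expr, one of them can keep the
  // original comparison, so pop the top without copying for it.
  if (cond_exprs.size() == cnt)
    cond_exprs.pop_back();

  unsigned copies = 0;
  while (!cond_exprs.empty()) {
    cond_exprs.pop_back();  // the real pass would insert a copy here
    ++copies;
  }
  return copies;
}
```

For two cond_expr uses and nothing else, the model yields one copy, which is
consistent with a testcase expecting the comparison to appear twice after
duplication.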

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* gimple-isel.cc (duplicate_comparison): Rename to ...
(maybe_duplicate_comparison): This. Add check for use here
rather than in its caller.
(pass_gimple_isel::execute): Don't check how many uses the
comparison had and call maybe_duplicate_comparison instead of
duplicate_comparison.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-isel.cc | 38 --
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index 327a78ea408..99bfc937bd5 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -399,34 +399,46 @@ gimple_expand_vec_cond_expr (struct function *fun, gimple_stmt_iterator *gsi,
comparisons so RTL expansion with the help of TER
can perform better if conversion.  */
 static void
-duplicate_comparison (gassign *stmt, basic_block bb)
+maybe_duplicate_comparison (gassign *stmt, basic_block bb)
 {
   imm_use_iterator imm_iter;
   use_operand_p use_p;
   auto_vec<gassign *> cond_exprs;
-  unsigned cnt = 0;
   tree lhs = gimple_assign_lhs (stmt);
+  unsigned cnt = 0;
+
   FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
 {
   if (is_gimple_debug (USE_STMT (use_p)))
continue;
   cnt++;
+  /* Add the use statement if it was a cond_expr.  */
   if (gimple_bb (USE_STMT (use_p)) == bb
  && is_gimple_assign (USE_STMT (use_p))
- && gimple_assign_rhs1_ptr (USE_STMT (use_p)) == use_p->use
- && gimple_assign_rhs_code (USE_STMT (use_p)) == COND_EXPR)
+ && gimple_assign_rhs_code (USE_STMT (use_p)) == COND_EXPR
+ && gimple_assign_rhs1_ptr (USE_STMT (use_p)) == use_p->use)
cond_exprs.safe_push (as_a <gassign *> (USE_STMT (use_p)));
-  }
-  for (unsigned i = cond_exprs.length () == cnt ? 1 : 0;
-   i < cond_exprs.length (); ++i)
+}
+
+  /* If the comparison has 0 or 1 uses, no reason to do anything. */
+  if (cnt <= 1)
+return;
+
+  /* If we only use the expression inside cond_exprs in that BB, we don't
+ need to duplicate for one of them so pop the top. */
+  if (cond_exprs.length () == cnt)
+cond_exprs.pop();
+
+  while (!cond_exprs.is_empty())
 {
+  auto old_top = cond_exprs.pop();
   gassign *copy = as_a <gassign *> (gimple_copy (stmt));
   tree new_def = duplicate_ssa_name (lhs, copy);
   gimple_assign_set_lhs (copy, new_def);
-  auto gsi2 = gsi_for_stmt (cond_exprs[i]);
+  auto gsi2 = gsi_for_stmt (old_top);
   gsi_insert_before (&gsi2, copy, GSI_SAME_STMT);
-  gimple_assign_set_rhs1 (cond_exprs[i], new_def);
-  update_stmt (cond_exprs[i]);
+  gimple_assign_set_rhs1 (old_top, new_def);
+  update_stmt (old_top);
 }
 }
 
@@ -500,10 +512,8 @@ pass_gimple_isel::execute (struct function *fun)
continue;
 
  tree_code code = gimple_assign_rhs_code (stmt);
- tree lhs = gimple_assign_lhs (stmt);
- if (TREE_CODE_CLASS (code) == tcc_comparison
- && !has_single_use (lhs))
-   duplicate_comparison (stmt, bb);
+ if (TREE_CODE_CLASS (code) == tcc_comparison)
+   maybe_duplicate_comparison (stmt, bb);
}
 }
 
-- 
2.43.0



Re: [PATCH] c++/modules: Ensure deduction guides are always reachable [PR115231]

2024-07-25 Thread Nathaniel Shead
On Tue, Jul 23, 2024 at 04:17:22PM -0400, Jason Merrill wrote:
> On 6/15/24 10:29 PM, Nathaniel Shead wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > 
> > This probably isn't the most efficient approach, since we need to do
> > name lookup to find deduction guides for a type which will also
> > potentially do a bunch of pointless lazy loading from imported modules,
> > but I wasn't able to work out a better approach without completely
> > reworking how deduction guides are stored and represented.
> 
> Indeed.  We likely want to find them more directly from the template; it's
> not clear to me that DECL_INITIAL is used for TEMPLATE_DECL, or we could put
> them in an internal attribute or a separate hash table.
> 
> > -- >8 --
> > 
> > Deduction guides are represented as 'normal' functions currently, and
> > have no special handling in modules.  However, this causes some issues;
> > by [temp.deduct.guide] a deduction guide is not found by normal name
> > lookup and instead all reachable deduction guides for a class template
> > should be considered, but this does not happen currently.
> > 
> > To solve this, this patch ensures that all deduction guides are
> > considered exported to ensure that they are always visible to importers
> > if they are reachable.  Another alternative here would be to add a new
> > kind of "all reachable" flag to name lookup, but that is complicated by
> > some difficulties in handling GM entities; this may be a better way to
> > go if more kinds of entities end up needing this handling, however.
> > 
> > Another issue here is that because deduction guides are "unrelated"
> > functions, they will usually get discarded from the GMF, so this patch
> > ensures that when finding dependencies, GMF deduction guides will also
> > have bindings created.  We do this in find_dependencies so that we don't
> > unnecessarily create bindings for GMF deduction guides that are never
> > reached; for consistency we do this for *all* deduction guides, not just
> > GM ones.
> 
> If you fixed the dependency calculation, why do they also need to be
> exported?
> 
> Jason
> 

Deduction guides aren't found using normal name lookup, but any
reachable deduction guide must be considered.  This means that even if
the module interface exports no declarations whatsoever, a deduction
guide declared in the module purview must still be considered by
importers.
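
As a reminder of why ordinary name lookup is not involved, here is a minimal
non-modules sketch of a deduction guide being picked up by class template
argument deduction.  The names Box and toy_guide_selected are made up for
illustration, and the example assumes C++17 or later.

```cpp
#include <cassert>
#include <type_traits>

template <typename T>
struct Box {
  Box(int) {}
};

// User-declared deduction guide: an int argument deduces Box<double>.
// Nothing ever names or calls this guide directly.
Box(int) -> Box<double>;

// Returns true if deduction for "Box b(10)" used the guide above.
bool toy_guide_selected() {
  Box b(10);
  return std::is_same_v<decltype(b), Box<double>>;
}
```

The guide is consulted purely because it belongs to the class template, which
is why an importer needs to see it even when the module exports nothing by
name.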

The other option I've considered is adding a new "ANY_REACHABLE" flag to
name lookup which would also consider non-exported reachable decls.  On
further consideration I might actually go this way; I've been thinking
about how to resolve some issues adjacent to supporting textual
redefinitions that I believe this will be necessary for anyway, and we
can probably use this in tsubst_friend_class as well rather than the
current relatively ad-hoc solution.

That said, I've realised that this patch isn't completely sufficient
anyway; consider:

  // m.cpp
  module;
  template  struct S;
  export module M;
  S(int) -> S;

  // x.cpp
  template  struct S { S(int); };
  import M;
  int main() {
S s(10);  // should be S s;
  }

This patch doesn't correctly handle this case yet; we also need to
consider cases where only the deduction guide is in purview.

Nathaniel


Re: [PATCH] c++/modules: Stream warning suppressions [PR115757]

2024-07-25 Thread Nathaniel Shead
On Tue, Jul 23, 2024 at 05:14:30PM -0400, Patrick Palka wrote:
> On Tue, 23 Jul 2024, Jason Merrill wrote:
> 
> > On 7/7/24 12:39 AM, Nathaniel Shead wrote:
> > > Bootstrapped on x86_64-pc-linux-gnu, successfully regtested modules.exp;
> > > OK for trunk if full regtest passes?
> > 
> > Patrick, I assume this change won't mess with your streaming optimizations?
> 
> Should be fine, those optimizations are currently confined to
> tree_node_bools where we stream many consecutive bools (and so we want
> to avoid conditionally streaming a bit, so that the bit buffer position
> of each streamed bit is statically known).  I don't think the technique
> can really apply to core_vals since it streams mostly trees.
> 
> > 
> > OK with Patrick's approval or on Friday, whichever comes first.
> > 

Thanks, pushed.

> > > -- >8 --
> > > 
> > > Currently we don't stream the contents of 'nowarn_map'; this means that
> > > warning suppressions don't get applied in importers, which is
> > > particularly relevant for templates (as in the linked testcase).
> > > 
> > > Rather than streaming the whole contents of 'nowarn_map', this patch
> > > instead just streams the exported suppressions for each tree node
> > > individually, to not build up additional locations and suppressions for
> > > tree nodes that do not need to be streamed.
> > > 
> > >   PR c++/115757
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * module.cc (trees_out::core_vals): Write warning specs for
> > >   DECLs and EXPRs.
> > >   (trees_in::core_vals): Read warning specs.
> > > 
> > > gcc/ChangeLog:
> > > 
> > >   * tree.h (put_warning_spec_at): Declare new function.
> > >   (has_warning_spec): Likewise.
> > >   (get_warning_spec): Likewise.
> > >   (put_warning_spec): Likewise.
> > >   * diagnostic-spec.h (nowarn_spec_t::from_bits): New function.
> > >   * diagnostic-spec.cc (put_warning_spec_at): New function.
> > >   * warning-control.cc (has_warning_spec): New function.
> > >   (get_warning_spec): New function.
> > >   (put_warning_spec): New function.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * g++.dg/modules/warn-spec-1_a.C: New test.
> > >   * g++.dg/modules/warn-spec-1_b.C: New test.
> > > 
> > > Signed-off-by: Nathaniel Shead 
> > > ---
> > >   gcc/cp/module.cc | 12 +
> > >   gcc/diagnostic-spec.cc   | 21 
> > >   gcc/diagnostic-spec.h|  7 ++
> > >   gcc/testsuite/g++.dg/modules/warn-spec-1_a.C | 10 
> > >   gcc/testsuite/g++.dg/modules/warn-spec-1_b.C |  8 ++
> > >   gcc/tree.h   |  9 +++
> > >   gcc/warning-control.cc   | 26 
> > >   7 files changed, 93 insertions(+)
> > >   create mode 100644 gcc/testsuite/g++.dg/modules/warn-spec-1_a.C
> > >   create mode 100644 gcc/testsuite/g++.dg/modules/warn-spec-1_b.C
> > > 
> > > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> > > index dc5d046f04d..0f9a689dbec 100644
> > > --- a/gcc/cp/module.cc
> > > +++ b/gcc/cp/module.cc
> > > @@ -6000,6 +6000,10 @@ trees_out::core_vals (tree t)
> > >   if (state)
> > >   state->write_location (*this, t->decl_minimal.locus);
> > > +
> > > +  if (streaming_p ())
> > > + if (has_warning_spec (t))
> > > +   u (get_warning_spec (t));
> > >   }
> > >   if (CODE_CONTAINS_STRUCT (code, TS_TYPE_COMMON))
> > > @@ -6113,6 +6117,10 @@ trees_out::core_vals (tree t)
> > > if (state)
> > >   state->write_location (*this, t->exp.locus);
> > >   +  if (streaming_p ())
> > > + if (has_warning_spec (t))
> > > +   u (get_warning_spec (t));
> > > +
> > > /* Walk in forward order, as (for instance) REQUIRES_EXPR has a
> > >bunch of unscoped parms on its first operand.  It's safer to
> > >create those in order.  */
> > > @@ -6576,6 +6584,8 @@ trees_in::core_vals (tree t)
> > > /* Don't zap the locus just yet, we don't record it correctly
> > >and thus lose all location information.  */
> > > t->decl_minimal.locus = state->read_location (*this);
> > > +  if (has_warning_spec (t))
> > > + put_warning_spec (t, u ());
> > >   }
> > >   if (CODE_CONTAINS_STRUCT (code, TS_TYPE_COMMON))
> > > @@ -6654,6 +6664,8 @@ trees_in::core_vals (tree t)
> > > if (CODE_CONTAINS_STRUCT (code, TS_EXP))
> > >   {
> > > t->exp.locus = state->read_location (*this);
> > > +  if (has_warning_spec (t))
> > > + put_warning_spec (t, u ());
> > >   bool vl = TREE_CODE_CLASS (code) == tcc_vl_exp;
> > > for (unsigned limit = (vl ? VL_EXP_OPERAND_LENGTH (t)
> > > diff --git a/gcc/diagnostic-spec.cc b/gcc/diagnostic-spec.cc
> > > index 996ad6b273a..addaf089f03 100644
> > > --- a/gcc/diagnostic-spec.cc
> > > +++ b/gcc/diagnostic-spec.cc
> > > @@ -179,6 +179,27 @@ suppress_warning_at (location_t loc, opt_code opt /* = all_warnings */,
> > > return true;
> > 

[PATCH]AArch64: check for vector mode in get_mask_mode [PR116074]

2024-07-25 Thread Tamar Christina
Hi All,

For historical reasons AArch64 has TI mode vector types but does not consider
TImode a vector mode.

What's happening in the PR is that get_vectype_for_scalar_type is returning
vector(1) TImode for a TImode scalar.  This then fails when we call
targetm.vectorize.get_mask_mode (vecmode).exists (&) on the TYPE_MODE.

I've checked other usages of get_mask_mode and none of them have anything that
would prevent this same issue from happening.  It only happens that normally
the vectorizer rejects the vector(1) type early, but in this case we get
further because the COND_EXPR hasn't been analyzed yet for a type.

I believe get_mask_mode shouldn't fault, and so this adds the check for vector
mode in the hook and returns nothing if it's not.  I did not add this to the
generic function because I believe this is an AArch64 quirk.
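
The shape of the fix, a hook that returns "no mode" instead of faulting on an
input it cannot handle, can be sketched with std::optional.  This is a toy
model under stated assumptions: ToyMode, toy_vector_mode_p and
toy_get_mask_mode are hypothetical names and do not correspond to GCC's
machine_mode or opt_machine_mode types.

```cpp
#include <cassert>
#include <optional>

// Hypothetical, heavily simplified set of modes.
enum class ToyMode { ScalarTI, VecV4SI, VecV16QI, MaskV4SI, MaskV16QI };

// Stand-in for a VECTOR_MODE_P-style check in this toy model.
bool toy_vector_mode_p(ToyMode m) {
  return m == ToyMode::VecV4SI || m == ToyMode::VecV16QI;
}

// Returns the mask mode for a vector mode, or nothing if there is none.
std::optional<ToyMode> toy_get_mask_mode(ToyMode mode) {
  // Early exit: a non-vector mode simply has no mask mode.
  if (!toy_vector_mode_p(mode))
    return std::nullopt;

  switch (mode) {
    case ToyMode::VecV4SI:  return ToyMode::MaskV4SI;
    case ToyMode::VecV16QI: return ToyMode::MaskV16QI;
    default:                return std::nullopt;
  }
}
```

Callers that already test for an empty result then handle the scalar case
without any extra checks of their own.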

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/116074
* config/aarch64/aarch64.cc (aarch64_get_mask_mode): Check vector mode.

gcc/testsuite/ChangeLog:

PR target/116074
* g++.target/aarch64/pr116074.C: New test.

---
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 355ab97891cf0a7d487fa4c69ae23a5f75897851..045ac0e09b0eaa14935db3924798402c9dd1947c 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -1870,6 +1870,9 @@ aarch64_sve_pred_mode (machine_mode mode)
 static opt_machine_mode
 aarch64_get_mask_mode (machine_mode mode)
 {
+  if (!VECTOR_MODE_P (mode))
+return opt_machine_mode ();
+
   unsigned int vec_flags = aarch64_classify_vector_mode (mode);
   if (vec_flags & VEC_SVE_DATA)
 return aarch64_sve_pred_mode (mode);
diff --git a/gcc/testsuite/g++.target/aarch64/pr116074.C b/gcc/testsuite/g++.target/aarch64/pr116074.C
new file mode 100644
index ..54cf561510c460499a816ab6a84603fc20a5f1e5
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/pr116074.C
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+int m[40];
+
+template  struct j {
+  int length;
+  k *e;
+  void operator[](int) {
+if (length)
+  __builtin___memcpy_chk(m, m+3, sizeof (k), -1);
+  }
+};
+
+j> o;
+
+int *q;
+
+void ao(int i) {
+  for (; i > 0; i--) {
+o[1];
+*q = 1;
+  }
+}









[PATCH] i386: Fix AVX512 intrin macro typo

2024-07-25 Thread Haochen Jiang
Hi all,

There are several typos in the AVX512 intrinsic macro definitions. They
eventually result in errors at -O0. This patch fixes them.
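
To illustrate the class of bug, here is a small sketch of a macro whose
parameter list does not match the function form of the same operation.  The
background assumption is that the intrinsic headers use the macro forms only
when not optimizing, which is why such typos surface at -O0.  All names below
(toy_builtin, toy_mask_op, TOY_MASK_OP_*) are hypothetical and are not real
intrinsics or builtins.

```cpp
#include <cassert>

// Hypothetical stand-in for a masked builtin: returns value & imm when
// the low mask bit is set, and 0 otherwise.
inline int toy_builtin(int value, int imm, unsigned mask) {
  return (mask & 1u) ? (value & imm) : 0;
}

// Function form, taking the mask first: (U, X, C).
inline int toy_mask_op(unsigned u, int x, int c) {
  return toy_builtin(x, c, u);
}

// Macro form with the parameter list in the wrong order: (X, C, U).
#define TOY_MASK_OP_BROKEN(X, C, U) \
  (toy_builtin((int) (X), (int) (C), (unsigned) (U)))

// Macro form with the parameter list matching the function: (U, X, C).
#define TOY_MASK_OP_FIXED(U, X, C) \
  (toy_builtin((int) (X), (int) (C), (unsigned) (U)))
```

A caller writing the arguments in the documented (U, X, C) order gets the same
answer from the function and the fixed macro, while the broken macro silently
binds them to the wrong operands.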

Bootstrapped on x86-64-pc-linux-gnu. Ok for trunk and backport to GCC 14,
GCC 13 and GCC 12?

Thx,
Haochen

gcc/ChangeLog:

* config/i386/avx512dqintrin.h
(_mm_mask_fpclass_ss_mask): Correct operand order.
(_mm_mask_fpclass_sd_mask): Ditto.
(_mm_reduce_round_sd): Use -1 as mask since it is non-mask.
(_mm_reduce_round_ss): Ditto.
* config/i386/avx512vlbwintrin.h
(_mm256_mask_alignr_epi8): Correct operand usage.
(_mm_mask_alignr_epi8): Ditto.
* config/i386/avx512vlintrin.h (_mm_mask_alignr_epi64): Ditto.
---
 gcc/config/i386/avx512dqintrin.h   | 16 +---
 gcc/config/i386/avx512vlbwintrin.h |  4 ++--
 gcc/config/i386/avx512vlintrin.h   |  2 +-
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
index 3beed7e649a..d9890c6da1d 100644
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -572,11 +572,11 @@ _mm_mask_fpclass_sd_mask (__mmask8 __U, __m128d __A, const int __imm)
   ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X),   \
 (int) (C), (__mmask8) (-1))) \
 
-#define _mm_mask_fpclass_ss_mask(X, C, U)  \
+#define _mm_mask_fpclass_ss_mask(U, X, C)  \
   ((__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) (__m128) (X),\
 (int) (C), (__mmask8) (U)))
 
-#define _mm_mask_fpclass_sd_mask(X, C, U)  \
+#define _mm_mask_fpclass_sd_mask(U, X, C)  \
   ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X),   \
 (int) (C), (__mmask8) (U)))
 #define _mm_reduce_sd(A, B, C) \
@@ -594,8 +594,9 @@ _mm_mask_fpclass_sd_mask (__mmask8 __U, __m128d __A, const int __imm)
 (__mmask8)(U)))
 
 #define _mm_reduce_round_sd(A, B, C, R)   \
-  ((__m128d) __builtin_ia32_reducesd_round ((__v2df)(__m128d)(A),  \
-(__v2df)(__m128d)(B), (int)(C), (__mmask8)(U), (int)(R)))
+  ((__m128d) __builtin_ia32_reducesd_mask_round ((__v2df)(__m128d)(A), \
+(__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_avx512_setzero_pd (), \
+(__mmask8)(-1), (int)(R)))
 
 #define _mm_mask_reduce_round_sd(W, U, A, B, C, R)\
   ((__m128d) __builtin_ia32_reducesd_mask_round ((__v2df)(__m128d)(A), \
@@ -622,8 +623,9 @@ _mm_mask_fpclass_sd_mask (__mmask8 __U, __m128d __A, const int __imm)
 (__mmask8)(U)))
 
 #define _mm_reduce_round_ss(A, B, C, R)   \
-  ((__m128) __builtin_ia32_reducess_round ((__v4sf)(__m128)(A),   \
-(__v4sf)(__m128)(B), (int)(C), (__mmask8)(U), (int)(R)))
+  ((__m128) __builtin_ia32_reducess_mask_round ((__v4sf)(__m128)(A),   \
+(__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_avx512_setzero_ps (),  \
+(__mmask8)(-1), (int)(R)))
 
 #define _mm_mask_reduce_round_ss(W, U, A, B, C, R)\
   ((__m128) __builtin_ia32_reducess_mask_round ((__v4sf)(__m128)(A),   \
@@ -631,7 +633,7 @@ _mm_mask_fpclass_sd_mask (__mmask8 __U, __m128d __A, const int __imm)
 (__mmask8)(U), (int)(R)))
 
 #define _mm_maskz_reduce_round_ss(U, A, B, C, R)  \
-  ((__m128) __builtin_ia32_reducesd_mask_round ((__v4sf)(__m128)(A),   \
+  ((__m128) __builtin_ia32_reducess_mask_round ((__v4sf)(__m128)(A),   \
 (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_avx512_setzero_ps (), \
 (__mmask8)(U), (int)(R)))
 
diff --git a/gcc/config/i386/avx512vlbwintrin.h b/gcc/config/i386/avx512vlbwintrin.h
index 56740054aa1..98b9099e343 100644
--- a/gcc/config/i386/avx512vlbwintrin.h
+++ b/gcc/config/i386/avx512vlbwintrin.h
@@ -2089,7 +2089,7 @@ _mm_maskz_slli_epi16 (__mmask8 __U, __m128i __A, unsigned int __B)
 #define _mm256_mask_alignr_epi8(W, U, X, Y, N) \
   ((__m256i) __builtin_ia32_palignr256_mask ((__v4di)(__m256i)(X), \
 (__v4di)(__m256i)(Y), (int)((N) * 8),   \
-   (__v4di)(__m256i)(X), (__mmask32)(U)))
+   (__v4di)(__m256i)(W), (__mmask32)(U)))
 
 #define _mm256_mask_srli_epi16(W, U, A, B)  \
   ((__m256i) __builtin_ia32_psrlwi256_mask ((__v16hi)(__m256i)(A),  \
@@ -2172,7 +2172,7 @@ _mm_maskz_slli_epi16 (__mmask8 __U, __m128i __A, unsigned int __B)
 #define _mm_mask_alignr_epi8(W, U, X, Y, N) \
   ((__m128i) __builtin_ia32_palignr128_mask ((__v2di)(__m128i)(X), \

RE: [PATCH Ping] i386: Use BLKmode for {ld,st}tilecfg

2024-07-25 Thread Jiang, Haochen
Ping for this patch

Thx,
Haochen

> -Original Message-
> From: Haochen Jiang 
> Sent: Thursday, July 18, 2024 9:45 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; hjl.to...@gmail.com;
> ubiz...@gmail.com
> Subject: [PATCH] i386: Use BLKmode for {ld,st}tilecfg
> 
> Hi all,
> 
> For AMX instructions that access memory, we treat the memory size as
> unspecified, since there are no differing sizes that could cause
> confusion.
> 
> This changes the output in Intel syntax mode, which is currently broken
> when used with the assembler, and aligns it with current binutils behavior.
> 
> Bootstrapped and regtested on x86-64-pc-linux-gnu. Ok for trunk?
> 
> Thx,
> Haochen
> 
> gcc/ChangeLog:
> 
>   * config/i386/i386-expand.cc (ix86_expand_builtin): Change
>   from XImode to BLKmode.
>   * config/i386/i386.md (ldtilecfg): Change XI to BLK.
>   (sttilecfg): Ditto.
> ---
>  gcc/config/i386/i386-expand.cc |  2 +-
>  gcc/config/i386/i386.md| 12 +---
>  2 files changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 9a31e6df2aa..d9ad06264aa 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -14198,7 +14198,7 @@ ix86_expand_builtin (tree exp, rtx target, rtx
> subtarget,
> op0 = convert_memory_address (Pmode, op0);
> op0 = copy_addr_to_reg (op0);
>   }
> -  op0 = gen_rtx_MEM (XImode, op0);
> +  op0 = gen_rtx_MEM (BLKmode, op0);
>if (fcode == IX86_BUILTIN_LDTILECFG)
>   icode = CODE_FOR_ldtilecfg;
>else
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index de9f4ba0496..86989d4875a 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -28975,24 +28975,22 @@
> (set_attr "type" "other")])
> 
>  (define_insn "ldtilecfg"
> -  [(unspec_volatile [(match_operand:XI 0 "memory_operand" "m")]
> +  [(unspec_volatile [(match_operand:BLK 0 "memory_operand" "m")]
>  UNSPECV_LDTILECFG)]
>"TARGET_AMX_TILE"
>"ldtilecfg\t%0"
>[(set_attr "type" "other")
> (set_attr "prefix" "maybe_evex")
> -   (set_attr "memory" "load")
> -   (set_attr "mode" "XI")])
> +   (set_attr "memory" "load")])
> 
>  (define_insn "sttilecfg"
> -  [(set (match_operand:XI 0 "memory_operand" "=m")
> -(unspec_volatile:XI [(const_int 0)] UNSPECV_STTILECFG))]
> +  [(set (match_operand:BLK 0 "memory_operand" "=m")
> +(unspec_volatile:BLK [(const_int 0)] UNSPECV_STTILECFG))]
>"TARGET_AMX_TILE"
>"sttilecfg\t%0"
>[(set_attr "type" "other")
> (set_attr "prefix" "maybe_evex")
> -   (set_attr "memory" "store")
> -   (set_attr "mode" "XI")])
> +   (set_attr "memory" "store")])
> 
>  (include "mmx.md")
>  (include "sse.md")
> --
> 2.31.1



Re: [PATCH] aarch64: Fix target/optimize option handling with transiting between O1 to O2

2024-07-25 Thread Richard Biener
On Thu, Jul 25, 2024 at 10:25 PM Andrew Pinski  wrote:
>
> The problem here is that the aarch64 backend enables -mearly-ra at -O2 and
> above, but the option is not marked as an Optimization in the .opt file, so
> enabling it sometimes resets the target options when going from -O1 to -O2
> for the first time.
>
> Built and tested for aarch64-linux-gnu with no regressions.

OK.

Richard.

> PR target/116065
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.opt (mearly-ra=): Mark as Optimization rather
> than Save.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/sve/target_optimization-1.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/aarch64.opt   |  2 +-
>  .../aarch64/sve/target_optimization-1.c  | 16 
>  2 files changed, 17 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/target_optimization-1.c
>
> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> index 2f90f10352a..6229bcb371e 100644
> --- a/gcc/config/aarch64/aarch64.opt
> +++ b/gcc/config/aarch64/aarch64.opt
> @@ -256,7 +256,7 @@ EnumValue
>  Enum(early_ra_scope) String(none) Value(AARCH64_EARLY_RA_NONE)
>
>  mearly-ra=
> -Target RejectNegative Joined Enum(early_ra_scope) Var(aarch64_early_ra) Init(AARCH64_EARLY_RA_NONE) Save
> +Target RejectNegative Joined Enum(early_ra_scope) Var(aarch64_early_ra) Init(AARCH64_EARLY_RA_NONE) Optimization
>  Specify when to enable an early register allocation pass.  The possibilities
>  are: all functions, functions that have access to strided multi-register
>  instructions, and no functions.
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/target_optimization-1.c b/gcc/testsuite/gcc.target/aarch64/sve/target_optimization-1.c
> new file mode 100644
> index 000..3010f0c4189
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/target_optimization-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1" } */
> +
> +#include <arm_sve.h>
> +
> +/* Turn off SVE overall */
> +#pragma GCC target("+nosve")
> +
> +/* But the function turns it on again so it should work.
> +   Even if changing the optimization level from O1 to O2. */
> +int __attribute__((target ("+sve"), optimize(2)))
> +bar (void)
> +{
> +  svfloat32_t xseg;
> +  return svlen_f32(xseg);
> +}
> --
> 2.43.0
>


Re: [PATCH] i386: Fix AVX512 intrin macro typo

2024-07-25 Thread Jakub Jelinek
On Fri, Jul 26, 2024 at 02:25:22PM +0800, Haochen Jiang wrote:
> Hi all,
> 
> There are several typos in the AVX512 intrinsic macro definitions. They will
> eventually result in errors with -O0. This patch fixes that.

Add a testcase that verifies that?

> Bootstrapped on x86-64-pc-linux-gnu. Ok for trunk and backport to GCC14,
> GCC 13 and GCC 12?
> 
> Thx,
> Haochen
> 
> gcc/ChangeLog:
> 
>   * config/i386/avx512dqintrin.h
>   (_mm_mask_fpclass_ss_mask): Correct operand order.
>   (_mm_mask_fpclass_sd_mask): Ditto.
>   (_mm_reduce_round_sd): Use -1 as mask since it is non-mask.
>   (_mm_reduce_round_ss): Ditto.
>   * config/i386/avx512vlbwintrin.h
>   (_mm256_mask_alignr_epi8): Correct operand usage.
>   (_mm_mask_alignr_epi8): Ditto.
>   * config/i386/avx512vlintrin.h (_mm_mask_alignr_epi64): Ditto.

Jakub
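
To illustrate why the pass-through operand in the _mm256_mask_alignr_epi8
fix matters, here is a minimal scalar sketch in C. It is only a model under
stated assumptions, not GCC's implementation: LANES, masked_merge and
count_mismatches are hypothetical names, and the "computed" array stands in
for whatever the unmasked operation would produce.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 8 };

/* Hypothetical scalar model of a merge-masked operation: lane i takes the
   computed value when bit i of MASK is set, otherwise it keeps the
   pass-through value.  */
static void
masked_merge (uint8_t *dst, const uint8_t *passthru,
	      const uint8_t *computed, uint8_t mask)
{
  for (int i = 0; i < LANES; i++)
    dst[i] = ((mask >> i) & 1) ? computed[i] : passthru[i];
}

/* Count the lanes in which using X as the pass-through (the typo) differs
   from using W (the corrected macro).  */
static int
count_mismatches (const uint8_t *w, const uint8_t *x,
		  const uint8_t *computed, uint8_t mask)
{
  uint8_t buggy[LANES], fixed[LANES];
  int n = 0;

  masked_merge (buggy, x, computed, mask);
  masked_merge (fixed, w, computed, mask);
  for (int i = 0; i < LANES; i++)
    n += buggy[i] != fixed[i];
  return n;
}

/* Small self-check on illustrative data; returns 1 when all checks pass.  */
static int
alignr_model_self_check (void)
{
  const uint8_t w[LANES] = { 9, 9, 9, 9, 9, 9, 9, 9 };
  const uint8_t x[LANES] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  const uint8_t computed[LANES] = { 10, 20, 30, 40, 50, 60, 70, 80 };

  /* Only the unselected lanes (4..7) can expose the wrong pass-through.  */
  assert (count_mismatches (w, x, computed, 0x0f) == 4);
  /* With every lane selected the pass-through is never read.  */
  assert (count_mismatches (w, x, computed, 0xff) == 0);
  return 1;
}
```

In this model the wrong operand is invisible whenever all mask bits are set,
which is one reason a testcase would want a partial mask.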



[PATCH] i386: Add non-optimize prefetchi intrins

2024-07-25 Thread Haochen Jiang
Hi all,

Under -O0, with the newly introduced intrinsics, the argument is
expanded as a mem instead of the original symbol_ref. The compiler then
treats the operand as invalid and turns the operation into a nop, which
is not expected. Use macros when not optimizing to keep the argument as
a symbol_ref, just as the prefetch intrinsic does.

Bootstrapped and regtested on x86-64-pc-linux-gnu. Ok for trunk and backport
to GCC 14 and GCC 13?

Thx,
Haochen

---
 gcc/config/i386/prfchiintrin.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/config/i386/prfchiintrin.h b/gcc/config/i386/prfchiintrin.h
index dfca89c7d16..d6580e504c0 100644
--- a/gcc/config/i386/prfchiintrin.h
+++ b/gcc/config/i386/prfchiintrin.h
@@ -37,6 +37,7 @@
 #define __DISABLE_PREFETCHI__
 #endif /* __PREFETCHI__ */
 
+#ifdef __OPTIMIZE__
 extern __inline void
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _m_prefetchit0 (void* __P)
@@ -50,6 +51,14 @@ _m_prefetchit1 (void* __P)
 {
   __builtin_ia32_prefetchi (__P, 2);
 }
+#else
+#define _m_prefetchit0(P)  \
+  __builtin_ia32_prefetchi(P, 3);
+
+#define _m_prefetchit1(P)  \
+  __builtin_ia32_prefetchi(P, 2);
+
+#endif
 
 #ifdef __DISABLE_PREFETCHI__
 #undef __DISABLE_PREFETCHI__
-- 
2.31.1



RE: [PATCH] i386: Fix AVX512 intrin macro typo

2024-07-25 Thread Jiang, Haochen
> -Original Message-
> From: Jakub Jelinek 
> Sent: Friday, July 26, 2024 2:31 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: Re: [PATCH] i386: Fix AVX512 intrin macro typo
> 
> On Fri, Jul 26, 2024 at 02:25:22PM +0800, Haochen Jiang wrote:
> > Hi all,
> >
> > There are several typos in the AVX512 intrinsic macro definitions. They
> > will eventually result in errors with -O0. This patch fixes that.
> 
> Add a testcase that verifies that?

Ok, I will add testcases with -O0 for them.

Thx,
Haochen

> 
> > Bootstrapped on x86-64-pc-linux-gnu. Ok for trunk and backport to
> > GCC14, GCC 13 and GCC 12?
> >
> > Thx,
> > Haochen
> >
> > gcc/ChangeLog:
> >
> > * config/i386/avx512dqintrin.h
> > (_mm_mask_fpclass_ss_mask): Correct operand order.
> > (_mm_mask_fpclass_sd_mask): Ditto.
> > (_mm_reduce_round_sd): Use -1 as mask since it is non-mask.
> > (_mm_reduce_round_ss): Ditto.
> > * config/i386/avx512vlbwintrin.h
> > (_mm256_mask_alignr_epi8): Correct operand usage.
> > (_mm_mask_alignr_epi8): Ditto.
> > * config/i386/avx512vlintrin.h (_mm_mask_alignr_epi64): Ditto.
> 
>   Jakub



Re: [PATCH Ping] i386: Use BLKmode for {ld,st}tilecfg

2024-07-25 Thread Hongtao Liu
On Fri, Jul 26, 2024 at 2:28 PM Jiang, Haochen  wrote:
>
> Ping for this patch
>
> Thx,
> Haochen
>
> > -Original Message-
> > From: Haochen Jiang 
> > Sent: Thursday, July 18, 2024 9:45 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Liu, Hongtao ; hjl.to...@gmail.com;
> > ubiz...@gmail.com
> > Subject: [PATCH] i386: Use BLKmode for {ld,st}tilecfg
> >
> > Hi all,
> >
> > For AMX instructions that access memory, we treat the memory size as
> > unspecified, since there are no differing sizes that could cause
> > confusion.
> >
> > This changes the output in Intel syntax mode, which is currently broken
> > when used with the assembler, and aligns it with current binutils behavior.
> >
> > Bootstrapped and regtested on x86-64-pc-linux-gnu. Ok for trunk?
Ok.
> >
> > Thx,
> > Haochen
> >
> > gcc/ChangeLog:
> >
> >   * config/i386/i386-expand.cc (ix86_expand_builtin): Change
> >   from XImode to BLKmode.
> >   * config/i386/i386.md (ldtilecfg): Change XI to BLK.
> >   (sttilecfg): Ditto.
> > ---
> >  gcc/config/i386/i386-expand.cc |  2 +-
> >  gcc/config/i386/i386.md| 12 +---
> >  2 files changed, 6 insertions(+), 8 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> > index 9a31e6df2aa..d9ad06264aa 100644
> > --- a/gcc/config/i386/i386-expand.cc
> > +++ b/gcc/config/i386/i386-expand.cc
> > @@ -14198,7 +14198,7 @@ ix86_expand_builtin (tree exp, rtx target, rtx
> > subtarget,
> > op0 = convert_memory_address (Pmode, op0);
> > op0 = copy_addr_to_reg (op0);
> >   }
> > -  op0 = gen_rtx_MEM (XImode, op0);
> > +  op0 = gen_rtx_MEM (BLKmode, op0);
> >if (fcode == IX86_BUILTIN_LDTILECFG)
> >   icode = CODE_FOR_ldtilecfg;
> >else
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index de9f4ba0496..86989d4875a 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -28975,24 +28975,22 @@
> > (set_attr "type" "other")])
> >
> >  (define_insn "ldtilecfg"
> > -  [(unspec_volatile [(match_operand:XI 0 "memory_operand" "m")]
> > +  [(unspec_volatile [(match_operand:BLK 0 "memory_operand" "m")]
> >  UNSPECV_LDTILECFG)]
> >"TARGET_AMX_TILE"
> >"ldtilecfg\t%0"
> >[(set_attr "type" "other")
> > (set_attr "prefix" "maybe_evex")
> > -   (set_attr "memory" "load")
> > -   (set_attr "mode" "XI")])
> > +   (set_attr "memory" "load")])
> >
> >  (define_insn "sttilecfg"
> > -  [(set (match_operand:XI 0 "memory_operand" "=m")
> > -(unspec_volatile:XI [(const_int 0)] UNSPECV_STTILECFG))]
> > +  [(set (match_operand:BLK 0 "memory_operand" "=m")
> > +(unspec_volatile:BLK [(const_int 0)] UNSPECV_STTILECFG))]
> >"TARGET_AMX_TILE"
> >"sttilecfg\t%0"
> >[(set_attr "type" "other")
> > (set_attr "prefix" "maybe_evex")
> > -   (set_attr "memory" "store")
> > -   (set_attr "mode" "XI")])
> > +   (set_attr "memory" "store")])
> >
> >  (include "mmx.md")
> >  (include "sse.md")
> > --
> > 2.31.1
>


-- 
BR,
Hongtao


Re: [PATCH] MATCH: Optimize `VEC_SHL_INSERT (dup (A), A)` to just `dup (A) [PR116075]

2024-07-25 Thread Richard Biener
On Thu, Jul 25, 2024 at 10:57 PM Andrew Pinski  wrote:
>
> On Thu, Jul 25, 2024 at 10:20 AM Richard Biener
>  wrote:
> >
> >
> >
> > > Am 25.07.2024 um 17:56 schrieb Andrew Pinski :
> > >
> > > > It was noticed that if we have `.VEC_SHL_INSERT ({ 0, ... }, 0)`, it was
> > > > not being simplified to just `{ 0, ... }`. This was generated by the
> > > > autovectorizer (maybe even by accident, see PR tree-optimization/116081).
> > >
> > > This adds a few SVE testcases to see if this is optimized since the
> > > auto-vectorizer or intrinsics are the only two ways of getting this
> > > produced.
> > >
> > > > Built and tested for aarch64-linux-gnu with no regressions.
> >
> > > For the case in question, implementing fold_const_call would be better.  Also …
>
> Makes sense, I have a patch which I am testing that does this.
>
> >
> > >PR target/116075
> > >
> > > gcc/ChangeLog:
> > >
> > >* match.pd (`VEC_SHL_INSERT (dup (A), A)`): New pattern.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >* gcc.target/aarch64/sve/dup-insr-1.c: New test.
> > >* gcc.target/aarch64/sve/dup-insr-2.c: New test.
> > >
> > > Signed-off-by: Andrew Pinski 
> > > ---
> > > gcc/match.pd  | 17 
> > > .../gcc.target/aarch64/sve/dup-insr-1.c   | 26 +++
> > > .../gcc.target/aarch64/sve/dup-insr-2.c   | 26 +++
> > > 3 files changed, 69 insertions(+)
> > > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/dup-insr-1.c
> > > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/dup-insr-2.c
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index 680dfea523f..a3a64bd742e 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -10657,3 +10657,20 @@ and,
> > >   }
> > >   (if (full_perm_p)
> > >(vec_perm (op@3 @0 @1) @3 @2))
> > > +
> > > +/* vec shift left insert (dup(A), A) -> dup(A) */
> > > +(simplify
> > > + (IFN_VEC_SHL_INSERT vec_same_elem_p@0 @1)
> > > +  (with {
> > > +tree elem = uniform_vector_p (@0);
> > > +if (!elem && TREE_CODE (@0) == SSA_NAME)
> > > +  {
> > > +gimple *def = SSA_NAME_DEF_STMT (@0);
> > > +if (gimple_assign_rhs_code (def) == CONSTRUCTOR)
> > > +  elem = uniform_vector_p (gimple_assign_rhs1 (def));
> > > +else if (gimple_assign_rhs_code (def) == VEC_DUPLICATE_EXPR)
> > > +  elem = gimple_assign_rhs1 (def);
> > > +  }
> > > +   }
> > > +(if (elem && operand_equal_p (@1, elem))
> >
> > Ugh.  Two predicates involved and we still have to do this?
>
> vec_same_elem_p is not the best predicate, and its other uses do the
> same as above.

Yeah, there's a lack of a way to do (IFN_VEC_SHL_INSERT (vec_same_elem_p @1) @1)
aka give (match ...) predicates an argument - it only has results.  Or
alternatively to do

(match (vec_same_elem_p @0)
 @1
 (with
   {
  tree el = uniform_vector_p (@1);
  @0 = el;
   }
  (if (el)))

The above actually works, but only if you mention @0 in the match expression,
so s/@1/@0/ "works" but of course it's not really intended this way(?)
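
For readers following along, the identity under discussion can be checked on
a scalar model. The sketch below assumes the documented vec_shl_insert
behavior (shift every element one position away from element 0 and put the
scalar into element 0); model_vec_shl_insert, model_dup and model_is_dup are
hypothetical helper names, not GCC functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model: shift elements away from index 0 by one position and
   insert S at index 0.  The last element falls off the end.  */
static void
model_vec_shl_insert (int8_t *v, size_t n, int8_t s)
{
  assert (v != NULL || n == 0);
  if (n == 0)
    return;
  for (size_t i = n - 1; i > 0; i--)
    v[i] = v[i - 1];
  v[0] = s;
}

/* Fill V with N copies of A, like a vector duplicate.  */
static void
model_dup (int8_t *v, size_t n, int8_t a)
{
  for (size_t i = 0; i < n; i++)
    v[i] = a;
}

/* Return 1 if every element of V equals A.  */
static int
model_is_dup (const int8_t *v, size_t n, int8_t a)
{
  for (size_t i = 0; i < n; i++)
    if (v[i] != a)
      return 0;
  return 1;
}
```

Inserting A into dup (A) leaves the vector unchanged, while inserting any
other value does not, which matches the split between the dup-insr-1.c and
dup-insr-2.c tests quoted below.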

> Anyways I have simplified this down to just supporting vec_duplicate
> and it still fixes the f1 in dup-insr-1.c.
> Once my tests are finished, I will post the patch.
>
> Thanks,
> Andrew
>
> >
> > > + @0)))
> > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-1.c b/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-1.c
> > > new file mode 100644
> > > index 000..41dcbba45cf
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-1.c
> > > @@ -0,0 +1,26 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O -fdump-tree-optimized" } */
> > > +/* PR target/116075 */
> > > +
> > > > +#include <arm_sve.h>
> > > +
> > > +svint8_t f(void)
> > > +{
> > > +  svint8_t tt;
> > > +  tt = svdup_s8 (0);
> > > +  tt = svinsr (tt, 0);
> > > +  return tt;
> > > +}
> > > +
> > > +svint8_t f1(int8_t t)
> > > +{
> > > +  svint8_t tt;
> > > +  tt = svdup_s8 (t);
> > > +  tt = svinsr (tt, t);
> > > +  return tt;
> > > +}
> > > +
> > > +/* The above 2 functions should have removed the VEC_SHL_INSERT. */
> > > +
> > > +/* { dg-final { scan-tree-dump-not ".VEC_SHL_INSERT " "optimized" } } */
> > > +
> > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-2.c b/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-2.c
> > > new file mode 100644
> > > index 000..8eafe974624
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/dup-insr-2.c
> > > @@ -0,0 +1,26 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O -fdump-tree-optimized" } */
> > > +/* PR target/116075 */
> > > +
> > > > +#include <arm_sve.h>
> > > +
> > > +svint8_t f(int8_t t)
> > > +{
> > > +  svint8_t tt;
> > > +  tt = svdup_s8 (0);
> > > +  tt = svinsr (tt, t);
> > > +  return tt;
> > > +}
> > > +
> > > +svint8_t f1(int8_t t)
> > > +{
> > > +  svint8_t tt;
> > > +  tt = svdup_s8 (t);
> > > +  tt = svinsr (tt, 0);
> > > +  return tt;
> > > +}
> > > +
> > > +/* Th

Re: [PATCH 3/3] isel: Don't duplicate comparisons for -O0 nor -fno-tree-ter [PR116101]

2024-07-25 Thread Richard Biener
On Fri, Jul 26, 2024 at 6:37 AM Andrew Pinski  wrote:
>
> While doing cleanups on this code I noticed that we do the duplicate
> of comparisons at -O0. For C and C++ code this makes no difference as
> the gimplifier never produces COND_EXPR. But it could make a difference
> for other front-ends.
> Oh and for -fno-tree-ter, duplicating the comparison is just a waste
> as it is never used for expand.
>
> I also decided to add a few testcases so this is checked in the future.
> Even added one for the duplication itself.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK

> PR tree-optimization/116101
>
> gcc/ChangeLog:
>
> * gimple-isel.cc (maybe_duplicate_comparison): Don't
> do anything for -O0 or -fno-tree-ter.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/dup_compare_cond-1.c: New test.
> * gcc.dg/tree-ssa/dup_compare_cond-2.c: New test.
> * gcc.dg/tree-ssa/dup_compare_cond-3.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple-isel.cc|  5 +
>  .../gcc.dg/tree-ssa/dup_compare_cond-1.c  | 19 +++
>  .../gcc.dg/tree-ssa/dup_compare_cond-2.c  | 19 +++
>  .../gcc.dg/tree-ssa/dup_compare_cond-3.c  | 19 +++
>  4 files changed, 62 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-3.c
>
> diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
> index 99bfc937bd5..2817ab659af 100644
> --- a/gcc/gimple-isel.cc
> +++ b/gcc/gimple-isel.cc
> @@ -407,6 +407,11 @@ maybe_duplicate_comparison (gassign *stmt, basic_block bb)
>tree lhs = gimple_assign_lhs (stmt);
>unsigned cnt = 0;
>
> +  /* This should not be used at -O0, and it is not useful
> + when TER is turned off. */
> +  if (!optimize || !flag_tree_ter)
> +return;
> +
>FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
>  {
>if (is_gimple_debug (USE_STMT (use_p)))
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-1.c b/gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-1.c
> new file mode 100644
> index 000..0321a60b34f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-1.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fgimple -O0 -fdump-tree-optimized " } */
> +/* PR tree-optimization/116101 */
> +
> +int __GIMPLE() f(int a, int b, int c, int d, int e)
> +{
> +  _Bool t;
> +  int ff;
> +  int gg;
> +  int res;
> +  t = a == b;
> +  ff = t ? a : e;
> +  gg = t ? d : b;
> +  res = ff+gg;
> +  return res;
> +}
> +
> +/* At -O0 we should not duplicate the comparison. */
> +/* { dg-final { scan-tree-dump-times " == " 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-2.c b/gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-2.c
> new file mode 100644
> index 000..07e2175c612
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fgimple -O2 -fdump-tree-optimized " } */
> +/* PR middle-end/105715 */
> +
> +int __GIMPLE() f(int a, int b, int c, int d, int e)
> +{
> +  _Bool t;
> +  int ff;
> +  int gg;
> +  int res;
> +  t = a == b;
> +  ff = t ? a : e;
> +  gg = t ? d : b;
> +  res = ff+gg;
> +  return res;
> +}
> +
> +/* At -O2 we should have duplicated the comparison. */
> +/* { dg-final { scan-tree-dump-times " == " 2 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-3.c b/gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-3.c
> new file mode 100644
> index 000..88bf19795e0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/dup_compare_cond-3.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fgimple -O2 -fno-tree-ter -fdump-tree-optimized " } */
> +/* PR tree-optimization/116101 */
> +
> +int __GIMPLE() f(int a, int b, int c, int d, int e)
> +{
> +  _Bool t;
> +  int ff;
> +  int gg;
> +  int res;
> +  t = a == b;
> +  ff = t ? a : e;
> +  gg = t ? d : b;
> +  res = ff+gg;
> +  return res;
> +}
> +
> +/* With -fno-tree-ter it is not useful to duplicate the comparison. */
> +/* { dg-final { scan-tree-dump-times " == " 1 "optimized" } } */
> --
> 2.43.0
>


Re: [PATCH 1/3] isel: Move duplicate comparisons to its own function

2024-07-25 Thread Richard Biener
On Fri, Jul 26, 2024 at 6:37 AM Andrew Pinski  wrote:
>
> This is just a small cleanup to isel with no functional changes.
> The loop inside pass_gimple_isel::execute was getting too
> deep, so let's fix that by moving it to its own function.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK

> gcc/ChangeLog:
>
> * gimple-isel.cc (pass_gimple_isel::execute): Factor
> duplicate comparisons out to ...
> (duplicate_comparison): New function.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple-isel.cc | 66 --
>  1 file changed, 35 insertions(+), 31 deletions(-)
>
> diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
> index 57f7281bb50..327a78ea408 100644
> --- a/gcc/gimple-isel.cc
> +++ b/gcc/gimple-isel.cc
> @@ -395,6 +395,40 @@ gimple_expand_vec_cond_expr (struct function *fun, gimple_stmt_iterator *gsi,
>  5, op0a, op0b, op1, op2, tcode_tree);
>  }
>
> +/* Duplicate COND_EXPR condition defs of STMT located in BB when they are
> +   comparisons so RTL expansion with the help of TER
> +   can perform better if conversion.  */
> +static void
> +duplicate_comparison (gassign *stmt, basic_block bb)
> +{
> +  imm_use_iterator imm_iter;
> +  use_operand_p use_p;
> +  auto_vec<gassign *> cond_exprs;
> +  unsigned cnt = 0;
> +  tree lhs = gimple_assign_lhs (stmt);
> +  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
> +{
> +  if (is_gimple_debug (USE_STMT (use_p)))
> +   continue;
> +  cnt++;
> +  if (gimple_bb (USE_STMT (use_p)) == bb
> + && is_gimple_assign (USE_STMT (use_p))
> + && gimple_assign_rhs1_ptr (USE_STMT (use_p)) == use_p->use
> + && gimple_assign_rhs_code (USE_STMT (use_p)) == COND_EXPR)
> +   cond_exprs.safe_push (as_a <gassign *> (USE_STMT (use_p)));
> +  }
> +  for (unsigned i = cond_exprs.length () == cnt ? 1 : 0;
> +   i < cond_exprs.length (); ++i)
> +{
> +  gassign *copy = as_a <gassign *> (gimple_copy (stmt));
> +  tree new_def = duplicate_ssa_name (lhs, copy);
> +  gimple_assign_set_lhs (copy, new_def);
> +  auto gsi2 = gsi_for_stmt (cond_exprs[i]);
> +  gsi_insert_before (&gsi2, copy, GSI_SAME_STMT);
> +  gimple_assign_set_rhs1 (cond_exprs[i], new_def);
> +  update_stmt (cond_exprs[i]);
> +}
> +}
>
>
>  namespace {
> @@ -469,37 +503,7 @@ pass_gimple_isel::execute (struct function *fun)
>   tree lhs = gimple_assign_lhs (stmt);
>   if (TREE_CODE_CLASS (code) == tcc_comparison
>   && !has_single_use (lhs))
> -   {
> - /* Duplicate COND_EXPR condition defs when they are
> -comparisons so RTL expansion with the help of TER
> -can perform better if conversion.  */
> - imm_use_iterator imm_iter;
> - use_operand_p use_p;
> - auto_vec<gassign *> cond_exprs;
> - unsigned cnt = 0;
> - FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
> -   {
> - if (is_gimple_debug (USE_STMT (use_p)))
> -   continue;
> - cnt++;
> - if (gimple_bb (USE_STMT (use_p)) == bb
> - && is_gimple_assign (USE_STMT (use_p))
> - && gimple_assign_rhs1_ptr (USE_STMT (use_p)) == use_p->use
> - && gimple_assign_rhs_code (USE_STMT (use_p)) == COND_EXPR)
> -   cond_exprs.safe_push (as_a <gassign *> (USE_STMT (use_p)));
> -   }
> - for (unsigned i = cond_exprs.length () == cnt ? 1 : 0;
> -  i < cond_exprs.length (); ++i)
> -   {
> - gassign *copy = as_a <gassign *> (gimple_copy (stmt));
> - tree new_def = duplicate_ssa_name (lhs, copy);
> - gimple_assign_set_lhs (copy, new_def);
> - auto gsi2 = gsi_for_stmt (cond_exprs[i]);
> - gsi_insert_before (&gsi2, copy, GSI_SAME_STMT);
> - gimple_assign_set_rhs1 (cond_exprs[i], new_def);
> - update_stmt (cond_exprs[i]);
> -   }
> -   }
> +   duplicate_comparison (stmt, bb);
> }
>  }
>
> --
> 2.43.0
>


Re: [PATCH 2/3] isel: Small cleanup of duplicating comparisons

2024-07-25 Thread Richard Biener
On Fri, Jul 26, 2024 at 6:37 AM Andrew Pinski  wrote:
>
> This is a small cleanup of the duplicating comparison code.
> There is a code generation difference, but only for -O0 and -fno-tree-ter
> (both of which will be fixed in a later patch).
> The difference is that instead of skipping the first use if the
> comparison uses are only in cond_expr, we skip the last use.
> Also, we now go through the uses list in the opposite order.
>
> The cleanups are the following:
> * Don't call has_single_use as we will do the loop anyways
> * Change the order of the checks slightly, it is better
>   to check for cond_expr earlier
> * Use cond_exprs as a stack and pop from it.
>   Skipping the top if the use is only from cond_expr.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK
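
The counting logic described in the quoted cleanup can be summarized with a
small model. This is only a sketch of my reading of the patch: it takes the
number of non-debug uses and the number of those that are same-block
COND_EXPR conditions, and returns how many copies of the comparison would be
created; model_copies_needed is a hypothetical name.

```c
#include <assert.h>

/* N_USES: non-debug uses of the comparison result.
   N_COND_USES: how many of those are COND_EXPR conditions in the same
   basic block (the entries pushed onto cond_exprs).  */
static unsigned
model_copies_needed (unsigned n_uses, unsigned n_cond_uses)
{
  assert (n_cond_uses <= n_uses);

  /* With 0 or 1 uses there is nothing to duplicate.  */
  if (n_uses <= 1)
    return 0;

  /* If every use is a COND_EXPR, one of them can keep the original
     comparison, so pop one entry before duplicating.  */
  if (n_cond_uses == n_uses)
    n_cond_uses--;

  /* Each remaining COND_EXPR gets its own copy.  */
  return n_cond_uses;
}
```

For the dup_compare_cond tests in this series, two COND_EXPR uses and no
other use give one copy, so two comparisons are expected in the dump when
the duplication runs.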

> gcc/ChangeLog:
>
> * gimple-isel.cc (duplicate_comparison): Rename to ...
> (maybe_duplicate_comparison): This. Add check for use here
> rather than in its caller.
> (pass_gimple_isel::execute): Don't check how many uses the
> comparison had and call maybe_duplicate_comparison instead of
> duplicate_comparison.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple-isel.cc | 38 --
>  1 file changed, 24 insertions(+), 14 deletions(-)
>
> diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
> index 327a78ea408..99bfc937bd5 100644
> --- a/gcc/gimple-isel.cc
> +++ b/gcc/gimple-isel.cc
> @@ -399,34 +399,46 @@ gimple_expand_vec_cond_expr (struct function *fun, gimple_stmt_iterator *gsi,
> comparisons so RTL expansion with the help of TER
> can perform better if conversion.  */
>  static void
> -duplicate_comparison (gassign *stmt, basic_block bb)
> +maybe_duplicate_comparison (gassign *stmt, basic_block bb)
>  {
>imm_use_iterator imm_iter;
>use_operand_p use_p;
>auto_vec<gassign *> cond_exprs;
> -  unsigned cnt = 0;
>tree lhs = gimple_assign_lhs (stmt);
> +  unsigned cnt = 0;
> +
>FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
>  {
>if (is_gimple_debug (USE_STMT (use_p)))
> continue;
>cnt++;
> +  /* Add the use statement if it was a cond_expr.  */
>if (gimple_bb (USE_STMT (use_p)) == bb
>   && is_gimple_assign (USE_STMT (use_p))
> - && gimple_assign_rhs1_ptr (USE_STMT (use_p)) == use_p->use
> - && gimple_assign_rhs_code (USE_STMT (use_p)) == COND_EXPR)
> + && gimple_assign_rhs_code (USE_STMT (use_p)) == COND_EXPR
> + && gimple_assign_rhs1_ptr (USE_STMT (use_p)) == use_p->use)
> cond_exprs.safe_push (as_a <gassign *> (USE_STMT (use_p)));
> -  }
> -  for (unsigned i = cond_exprs.length () == cnt ? 1 : 0;
> -   i < cond_exprs.length (); ++i)
> +}
> +
> +  /* If the comparison has 0 or 1 uses, no reason to do anything. */
> +  if (cnt <= 1)
> +return;
> +
> +  /* If we only use the expression inside cond_exprs in that BB, we don't
> + need to duplicate for one of them so pop the top. */
> +  if (cond_exprs.length () == cnt)
> +cond_exprs.pop();
> +
> +  while (!cond_exprs.is_empty())
>  {
> +  auto old_top = cond_exprs.pop();
>gassign *copy = as_a <gassign *> (gimple_copy (stmt));
>tree new_def = duplicate_ssa_name (lhs, copy);
>gimple_assign_set_lhs (copy, new_def);
> -  auto gsi2 = gsi_for_stmt (cond_exprs[i]);
> +  auto gsi2 = gsi_for_stmt (old_top);
>gsi_insert_before (&gsi2, copy, GSI_SAME_STMT);
> -  gimple_assign_set_rhs1 (cond_exprs[i], new_def);
> -  update_stmt (cond_exprs[i]);
> +  gimple_assign_set_rhs1 (old_top, new_def);
> +  update_stmt (old_top);
>  }
>  }
>
> @@ -500,10 +512,8 @@ pass_gimple_isel::execute (struct function *fun)
> continue;
>
>   tree_code code = gimple_assign_rhs_code (stmt);
> - tree lhs = gimple_assign_lhs (stmt);
> - if (TREE_CODE_CLASS (code) == tcc_comparison
> - && !has_single_use (lhs))
> -   duplicate_comparison (stmt, bb);
> + if (TREE_CODE_CLASS (code) == tcc_comparison)
> +   maybe_duplicate_comparison (stmt, bb);
> }
>  }
>
> --
> 2.43.0
>


[PATCH] MIPS: Add some floating point instructions support for MIPSr6

2024-07-25 Thread Jie Mei
This patch adds some floating point instructions from mips32r6,
for instance, MINA/MAXA.fmt, RINT.fmt, CLASS.fmt etc.

Also add built-in functions to MIPSr6 to better handle tests
for MIPSr6.

gcc/ChangeLog:

* config/mips/i6400.md (i6400_fpu_minmax): Include
fclass type.
(i6400_fpu_fadd): Include frint type.
* config/mips/mips.cc (AVAIL_NON_MIPS16): Add an entry
for __builtin_mipsr6_xxx.
(MIPSR6_BUILTIN_PURE): Same as above.
(CODE_FOR_mipsr6_min_a_s, CODE_FOR_mipsr6_min_a_d)
(CODE_FOR_mipsr6_max_a_s, CODE_FOR_mipsr6_max_a_d)
(CODE_FOR_mipsr6_rint_s, CODE_FOR_mipsr6_rint_d)
(CODE_FOR_mipsr6_class_s, CODE_FOR_mipsr6_class_d):
New code_aliasing macros.
(mips_builtins): Add mips32r6 min_a_s, min_a_d, max_a_s,
max_a_d, rint_s, rint_d, class_s, class_d builtins.
* config/mips/mips.h (ISA_HAS_FRINT): Define a new macro.
(ISA_HAS_FCLASS): Same as above.
* config/mips/mips.md (UNSPEC_FRINT): New unspec.
(UNSPEC_FCLASS): Same as above.
(type): Add frint and fclass.
(fmin_a_<mode>): Generates MINA.fmt instructions.
(fmax_a_<mode>): Generates MAXA.fmt instructions.
(frint_<mode>): Generates RINT.fmt instructions.
(fclass_<mode>): Generates CLASS.fmt instructions.
* config/mips/p6600.md (p6600_fpu_fadd): Include
frint type.
(p6600_fpu_fabs): Include fclass type.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips-class.c: New tests for MIPSr6
* gcc.target/mips/mips-minamaxa.c: Same as above.
* gcc.target/mips/mips-rint.c: Same as above.
---
 gcc/config/mips/i6400.md  |  8 +--
 gcc/config/mips/mips.cc   | 28 ++
 gcc/config/mips/mips.h|  4 ++
 gcc/config/mips/mips.md   | 52 ++-
 gcc/config/mips/p6600.md  |  8 +--
 gcc/testsuite/gcc.target/mips/mips-class.c| 17 ++
 gcc/testsuite/gcc.target/mips/mips-minamaxa.c | 31 +++
 gcc/testsuite/gcc.target/mips/mips-rint.c | 17 ++
 8 files changed, 155 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips-class.c
 create mode 100644 gcc/testsuite/gcc.target/mips/mips-minamaxa.c
 create mode 100644 gcc/testsuite/gcc.target/mips/mips-rint.c

diff --git a/gcc/config/mips/i6400.md b/gcc/config/mips/i6400.md
index d6f691ee217..48ce980e1c2 100644
--- a/gcc/config/mips/i6400.md
+++ b/gcc/config/mips/i6400.md
@@ -219,16 +219,16 @@
(eq_attr "type" "fabs,fneg,fmove"))
   "i6400_fpu_short, i6400_fpu_apu")
 
-;; min, max
+;; min, max, fclass
 (define_insn_reservation "i6400_fpu_minmax" 2
   (and (eq_attr "cpu" "i6400")
-   (eq_attr "type" "fminmax"))
+   (eq_attr "type" "fminmax,fclass"))
   "i6400_fpu_short+i6400_fpu_logic")
 
-;; fadd, fsub, fcvt
+;; fadd, fsub, fcvt, frint
 (define_insn_reservation "i6400_fpu_fadd" 4
   (and (eq_attr "cpu" "i6400")
-   (eq_attr "type" "fadd,fcvt"))
+   (eq_attr "type" "fadd,fcvt,frint"))
   "i6400_fpu_long, i6400_fpu_apu")
 
 ;; fmul
diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 6c797b62164..14a1f23eb70 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -15775,6 +15775,7 @@ AVAIL_NON_MIPS16 (dspr2_32, !TARGET_64BIT && TARGET_DSPR2)
 AVAIL_NON_MIPS16 (loongson, TARGET_LOONGSON_MMI)
 AVAIL_MIPS16E2_OR_NON_MIPS16 (cache, TARGET_CACHE_BUILTIN)
 AVAIL_NON_MIPS16 (msa, TARGET_MSA)
+AVAIL_NON_MIPS16 (r6, mips_isa_rev >= 6)
 
 /* Construct a mips_builtin_description from the given arguments.
 
@@ -15940,6 +15941,14 @@ AVAIL_NON_MIPS16 (msa, TARGET_MSA)
 "__builtin_msa_" #INSN,  MIPS_BUILTIN_DIRECT_NO_TARGET,\
 FUNCTION_TYPE, mips_builtin_avail_msa, false }
 
+/* Define a MIPSr6 MIPS_BUILTIN_DIRECT pure function __builtin_mipsr6_<INSN>
+   for instruction CODE_FOR_mipsr6_<INSN>.  FUNCTION_TYPE is a builtin_description
+   field.  */
+#define MIPSR6_BUILTIN_PURE(INSN, FUNCTION_TYPE)   \
+{ CODE_FOR_mipsr6_ ## INSN, MIPS_FP_COND_f, \
+"__builtin_mipsr6_" #INSN,  MIPS_BUILTIN_DIRECT,   \
+FUNCTION_TYPE, mips_builtin_avail_r6, true }
+
 #define CODE_FOR_mips_sqrt_ps CODE_FOR_sqrtv2sf2
 #define CODE_FOR_mips_addq_ph CODE_FOR_addv2hi3
 #define CODE_FOR_mips_addu_qb CODE_FOR_addv4qi3
@@ -16177,6 +16186,15 @@ AVAIL_NON_MIPS16 (msa, TARGET_MSA)
 #define CODE_FOR_msa_ldi_w CODE_FOR_msa_ldiv4si
 #define CODE_FOR_msa_ldi_d CODE_FOR_msa_ldiv2di
 
+#define CODE_FOR_mipsr6_min_a_s CODE_FOR_fmin_a_sf
+#define CODE_FOR_mipsr6_min_a_d CODE_FOR_fmin_a_df
+#define CODE_FOR_mipsr6_max_a_s CODE_FOR_fmax_a_sf
+#define CODE_FOR_mipsr6_max_a_d CODE_FOR_fmax_a_df
+#define CODE_FOR_mipsr6_rint_s CODE_FOR_frint_sf
+#define CODE_FOR_mipsr6_rint_d CODE_FOR_frint_df
+#define CODE_FOR_mipsr6_class_s CODE_FOR_fclass_sf
+#define CODE_FOR_mipsr6_class_d CODE_FOR_fclass_df
+

[PATCH] Fix mismatch between constraint and predicate for ashl<mode>3_doubleword.

2024-07-25 Thread liuhongt
(insn 98 94 387 2 (parallel [
            (set (reg:TI 337 [ _32 ])
                (ashift:TI (reg:TI 329)
                    (reg:QI 521)))
            (clobber (reg:CC 17 flags))
        ]) "test.c":11:13 953 {ashlti3_doubleword}

is reloaded into

(insn 98 452 387 2 (parallel [
            (set (reg:TI 0 ax [orig:337 _32 ] [337])
                (ashift:TI (const_int 1671291085 [0x639de0cd])
                    (reg:QI 2 cx [521])))
            (clobber (reg:CC 17 flags))

because the "n" constraint in the pattern accepts any constant integer, even
one that the operand's predicate (reg_or_pm1_operand) rejects.
(Not sure why reload doesn't check the predicate.)

(define_insn "ashl<mode>3_doubleword"
  [(set (match_operand:DWI 0 "register_operand" "=&r,&r")
        (ashift:DWI (match_operand:DWI 1 "reg_or_pm1_operand" "0n,r")
                    (match_operand:QI 2 "nonmemory_operand" "c,c")))

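The patch fixes the mismatch between constraint and predicate by replacing
"n" with "BC" for operand 1.  To make BC usable there, the TARGET_SSE test
in BC is moved so that it only guards vector_all_ones_operand; constm1_rtx
now matches regardless of TARGET_SSE.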
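
As a rough illustration of the BC change listed in the ChangeLog below, the
old and new conditions can be modelled in plain C.  This is only a sketch,
not GCC code: struct operand_model, bc_before, bc_after and
bc_model_self_check are hypothetical names made up for this example, and the
two boolean fields merely stand in for "op == constm1_rtx" and
vector_all_ones_operand.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an operand: which of the two BC tests it passes.  */
struct operand_model {
  bool is_constm1;       /* stands in for: op == constm1_rtx */
  bool is_vec_all_ones;  /* stands in for: vector_all_ones_operand (op) */
};

/* Old shape of BC: (and TARGET_SSE (ior constm1 vector_all_ones)).  */
bool bc_before (bool target_sse, struct operand_model op)
{
  return target_sse && (op.is_constm1 || op.is_vec_all_ones);
}

/* New shape of BC: (ior constm1 (and TARGET_SSE vector_all_ones)).  */
bool bc_after (bool target_sse, struct operand_model op)
{
  return op.is_constm1 || (target_sse && op.is_vec_all_ones);
}

/* The two shapes differ only for a scalar -1 when SSE is disabled.  */
void bc_model_self_check (void)
{
  struct operand_model m1 = { true, false };
  assert (!bc_before (false, m1));
  assert (bc_after (false, m1));
  assert (bc_before (true, m1) == bc_after (true, m1));
}
```

In this model the only input on which the two forms disagree is a scalar -1
with SSE disabled, which is the case an integer shift pattern would need when
it uses BC in place of "n".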

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

PR target/116096
* config/i386/constraints.md (BC): Move TARGET_SSE to
vector_all_ones_operand.
* config/i386/i386.md (ashl<mode>3_doubleword): Refine
constraint with BC.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr116096.c: New test.
---
 gcc/config/i386/constraints.md   |  4 ++--
 gcc/config/i386/i386.md  |  2 +-
 gcc/testsuite/gcc.target/i386/pr116096.c | 26 
 3 files changed, 29 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr116096.c

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 7508d7a58bd..fd032c2b9f0 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -225,8 +225,8 @@ (define_constraint "Bz"
 
 (define_constraint "BC"
   "@internal integer SSE constant with all bits set operand."
-  (and (match_test "TARGET_SSE")
-   (ior (match_test "op == constm1_rtx")
+  (ior (match_test "op == constm1_rtx")
+   (and (match_test "TARGET_SSE")
             (match_operand 0 "vector_all_ones_operand"))))
 
 (define_constraint "BF"
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6207036a2a0..9c4e847fba1 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14774,7 +14774,7 @@ (define_insn_and_split "*ashl3_doubleword_mask_1"
 
 (define_insn "ashl<mode>3_doubleword"
   [(set (match_operand:DWI 0 "register_operand" "=&r,&r")
-   (ashift:DWI (match_operand:DWI 1 "reg_or_pm1_operand" "0n,r")
+   (ashift:DWI (match_operand:DWI 1 "reg_or_pm1_operand" "0BC,r")
(match_operand:QI 2 "nonmemory_operand" "c,c")))
(clobber (reg:CC FLAGS_REG))]
   ""
diff --git a/gcc/testsuite/gcc.target/i386/pr116096.c b/gcc/testsuite/gcc.target/i386/pr116096.c
new file mode 100644
index 000..5ef39805f58
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr116096.c
@@ -0,0 +1,26 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -flive-range-shrinkage -fno-peephole2 -mstackrealign -Wno-psabi" } */
+
+typedef char U __attribute__((vector_size (32)));
+typedef unsigned V __attribute__((vector_size (32)));
+typedef __int128 W __attribute__((vector_size (32)));
+U g;
+
+W baz ();
+
+static inline U
+bar (V x, W y)
+{
+  y = y | y << (W) x;
+  return (U)y;
+}
+
+void
+foo (W w)
+{
+  g = g <<
+bar ((V){baz ()[1], 3, 3, 5, 7},
+(W){w[0], ~(int) 2623676210}) >>
+bar ((V){baz ()[1]},
+(W){-w[0], ~(int) 2623676210});
+}
-- 
2.31.1