Re: [PATCH] c++/modules: Fix imported CNTTPs being considered non-constant [PR119938]

2025-04-30 Thread Jason Merrill

On 4/25/25 8:56 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/15?


OK.


-- >8 --

When importing a CNTTP object, since r15-3031-g0b7904e274fbd6 we
shortcut the processing of the generated NTTP so that we don't attempt
to recursively load pendings.  However, due to an oversight we do not
properly set TREE_CONSTANT or DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P
on the decl, which confuses later processing.  This patch ensures that
this happens correctly.

PR c++/119938

gcc/cp/ChangeLog:

* pt.cc (get_template_parm_object): When !check_init, add assert
that expr really is constant and mark decl as such.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-nttp-2_a.H: New test.
* g++.dg/modules/tpl-nttp-2_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/pt.cc|  7 ++-
  gcc/testsuite/g++.dg/modules/tpl-nttp-2_a.H | 14 ++
  gcc/testsuite/g++.dg/modules/tpl-nttp-2_b.C | 10 ++
  3 files changed, 30 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/tpl-nttp-2_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/tpl-nttp-2_b.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index a71705fd085..75d34532426 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -7492,8 +7492,13 @@ get_template_parm_object (tree expr, tree name, bool check_init/*=true*/)
  {
/* The EXPR is the already processed initializer, set it on the NTTP
 object now so that cp_finish_decl doesn't do it again later.  */
+  gcc_checking_assert (reduced_constant_expression_p (expr));
DECL_INITIAL (decl) = expr;
-  DECL_INITIALIZED_P (decl) = 1;
+  DECL_INITIALIZED_P (decl) = true;
+  DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl) = true;
+  /* FIXME setting TREE_CONSTANT on refs breaks the back end.  */
+  if (!TYPE_REF_P (type))
+   TREE_CONSTANT (decl) = true;
  }
  
pushdecl_top_level_and_finish (decl, expr);

diff --git a/gcc/testsuite/g++.dg/modules/tpl-nttp-2_a.H b/gcc/testsuite/g++.dg/modules/tpl-nttp-2_a.H
new file mode 100644
index 000..bfae11cd185
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/tpl-nttp-2_a.H
@@ -0,0 +1,14 @@
+// PR c++/119938
+// { dg-additional-options "-fmodules -std=c++20" }
+// { dg-module-cmi {} }
+
+struct A { int x; };
+
+template  struct B { static_assert(a.x == 1); };
+using C = B;
+
+template  void D() { static_assert(a.x == 2); };
+inline void E() { D(); }
+
+template  struct F { static constexpr int result = a.x; };
+template  constexpr int G() { return F::result; };
diff --git a/gcc/testsuite/g++.dg/modules/tpl-nttp-2_b.C b/gcc/testsuite/g++.dg/modules/tpl-nttp-2_b.C
new file mode 100644
index 000..7e661cbdef0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/tpl-nttp-2_b.C
@@ -0,0 +1,10 @@
+// PR c++/119938
+// { dg-additional-options "-fmodules -std=c++20" }
+
+import "tpl-nttp-2_a.H";
+
+int main() {
+  C c;
+  E();
+  static_assert(G() == 3);
+}




Re: [PATCH v2] c++: Fix OpenMP support with C++20 modules [PR119864]

2025-04-30 Thread Jason Merrill

On 4/28/25 7:06 AM, Nathaniel Shead wrote:

On Thu, Apr 24, 2025 at 12:06:52PM -0400, Jason Merrill wrote:

On 4/22/25 4:48 PM, Jason Merrill wrote:

On 4/22/25 1:21 PM, Tobias Burnus wrote:

Jason Merrill wrote:

On 4/22/25 11:04 AM, Tobias Burnus wrote:

The question is why does this code trigger at all, given
that there is OpenMP but no offload code at all? And how
to fix it in case there is offload code and modules are used.


This seems to be because of:


   if (module_global_init_needed ())
     {
       // Make sure there's a default priority entry.
       if (! static_init_fini_fns[true])
         static_init_fini_fns[true] = priority_map_t::create_ggc ();
       if (static_init_fini_fns[true]->get_or_insert (DEFAULT_INIT_PRIORITY))
         has_module_inits = true;

       if (flag_openmp)
         {
           if (!static_init_fini_fns[2 + true])
             static_init_fini_fns[2 + true] = priority_map_t::create_ggc ();
           static_init_fini_fns[2 + true]->get_or_insert (DEFAULT_INIT_PRIORITY);
         }
     }


Here we're forcing a target module init function as well as
host. If we remove the flag_openmp block, Nathaniel's patch is
unnecessary (but may still be desirable).


I currently do not see whether the code is needed in this case or
not, but I assume it is, if we want to support static
initializers?!?


I don't think so.  For the host, we force create a map with a single
entry because we always want to emit a module init function.  The openmp
block is saying we also always want a target init function in a module,
even if it's empty, which I don't think is correct.  Or if it is, we
need to specify how to mangle it and agree that that's part of the
module ABI.


So, for now in addition to Nathaniel's patch I'd remove this flag_openmp
block so we don't get an unneeded empty function, and maybe add something
back later after more discussion.

Jason



So something like this, perhaps?

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/15?


OK next week if no other comments.


-- >8 --

In r15-2799-gf1bfba3a9b3f31, a new kind of global constructor was added.
Unfortunately this broke C++20 modules, as both the host and target
constructors were given the same mangled name.  This patch ensures that
only the host constructor gets the module name mangling for now, and
stops forcing the creation of the target constructor even when no such
initialization is required.

PR c++/119864

gcc/cp/ChangeLog:

* decl2.cc (start_objects): Only use module initialized for
host.
(c_parse_final_cleanups): Don't always create an OMP offload
init function in modules.

gcc/testsuite/ChangeLog:

* g++.dg/modules/openmp-1.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/decl2.cc | 14 +++---
  gcc/testsuite/g++.dg/modules/openmp-1.C |  9 +
  2 files changed, 16 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/openmp-1.C

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 21156f1dd3b..d29d93af275 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -4184,7 +4184,11 @@ start_objects (bool initp, unsigned priority, bool has_body,
   bool omp_target = false)
  {
bool default_init = initp && priority == DEFAULT_INIT_PRIORITY;
-  bool is_module_init = default_init && module_global_init_needed ();
+  /* FIXME: We may eventually want to treat OpenMP offload initialisers
+ in modules specially as well.  */
+  bool is_module_init = (default_init
+&& !omp_target
+&& module_global_init_needed ());
tree name = NULL_TREE;
  
if (is_module_init)

@@ -5876,12 +5880,8 @@ c_parse_final_cleanups (void)
if (static_init_fini_fns[true]->get_or_insert (DEFAULT_INIT_PRIORITY))
has_module_inits = true;
  
-  if (flag_openmp)
-   {
- if (!static_init_fini_fns[2 + true])
-   static_init_fini_fns[2 + true] = priority_map_t::create_ggc ();
- static_init_fini_fns[2 + true]->get_or_insert (DEFAULT_INIT_PRIORITY);
-   }
+  /* FIXME: We need to work out what static constructors on OpenMP offload
+target in modules will look like.  */
  }
  
/* Generate initialization and destruction functions for all

diff --git a/gcc/testsuite/g++.dg/modules/openmp-1.C b/gcc/testsuite/g++.dg/modules/openmp-1.C
new file mode 100644
index 000..b5a30ad8c91
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/openmp-1.C
@@ -0,0 +1,9 @@
+// PR c++/119864
+// { dg-do assemble }
+// { dg-additional-options "-fmodules -fopenmp" }
+// { dg-require-effective-target "fopenmp" }
+
+export module M;
+
+int foo();
+int x = foo();
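
The naming decision changed in start_objects can be modelled in isolation. The following is an illustrative stand-alone sketch, not GCC source: `InitRequest`, `uses_module_init_name` and `kDefaultInitPriority` are hypothetical names, and the value 65535 is an assumed stand-in for DEFAULT_INIT_PRIORITY. It only restates the patched condition, namely that the module initializer name is used for the default-priority host constructor and never for the OpenMP target one.

```cpp
// Illustrative model only; not GCC source.  All names here are hypothetical.
constexpr unsigned kDefaultInitPriority = 65535;  // assumed stand-in for DEFAULT_INIT_PRIORITY

struct InitRequest {
  bool initp;         // true for a constructor, false for a destructor
  unsigned priority;  // init priority of the function being started
  bool omp_target;    // true for the OpenMP offload variant
};

// Mirrors the patched condition: only the default-priority *host*
// initializer gets the module initializer name.
constexpr bool uses_module_init_name(const InitRequest& r,
                                     bool module_global_init_needed) {
  const bool default_init = r.initp && r.priority == kDefaultInitPriority;
  return default_init && !r.omp_target && module_global_init_needed;
}
```

Before the patch the `!r.omp_target` term was absent, so the host and target requests both mapped to the same mangled name.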




Re: [PATCH] c++/modules: Ensure deduction guides for imported types are reachable [PR120023]

2025-04-30 Thread Jason Merrill

On 4/30/25 9:40 AM, Nathaniel Shead wrote:

Tested on x86_64-pc-linux-gnu (so far just modules.exp), OK for trunk
and maybe 15 if full regtest+bootstrap succeeds?


OK.


-- >8 --

In the linked PR, because the deduction guides depend on an imported
type, we never walk the type and so never call add_deduction_guides.
This patch ensures that we make bindings for deduction guides if we saw
any deduction guide at all.

PR c++/120023

gcc/cp/ChangeLog:

* module.cc (depset::hash::find_dependencies): Also call
add_deduction_guides when walking one.

gcc/testsuite/ChangeLog:

* g++.dg/modules/dguide-7_a.C: New test.
* g++.dg/modules/dguide-7_b.C: New test.
* g++.dg/modules/dguide-7_c.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc  |  7 +++
  gcc/testsuite/g++.dg/modules/dguide-7_a.C |  9 +
  gcc/testsuite/g++.dg/modules/dguide-7_b.C | 10 ++
  gcc/testsuite/g++.dg/modules/dguide-7_c.C | 12 
  4 files changed, 38 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/modules/dguide-7_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/dguide-7_b.C
  create mode 100644 gcc/testsuite/g++.dg/modules/dguide-7_c.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index a2e0d6d2571..a58614b8459 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -14822,9 +14822,16 @@ depset::hash::find_dependencies (module_state *module)
}
  walker.end ();
  
+	  /* If we see either a class template or a deduction guide, make

+sure to add all visible deduction guides.  We need to check
+both in case they have been added in separate modules, or
+one is in the GMF and would have otherwise been discarded.  */
  if (!is_key_order ()
  && DECL_CLASS_TEMPLATE_P (decl))
add_deduction_guides (decl);
+ if (!is_key_order ()
+ && deduction_guide_p (decl))
+   add_deduction_guides (TYPE_NAME (TREE_TYPE (TREE_TYPE (decl))));
  
  	  if (!is_key_order ()

  && TREE_CODE (decl) == TEMPLATE_DECL
diff --git a/gcc/testsuite/g++.dg/modules/dguide-7_a.C b/gcc/testsuite/g++.dg/modules/dguide-7_a.C
new file mode 100644
index 000..8d0eb808859
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/dguide-7_a.C
@@ -0,0 +1,9 @@
+// PR c++/120023
+// { dg-additional-options "-fmodules" }
+// { dg-module-cmi M.S }
+
+export module M.S;
+
+namespace ns {
+  export template  struct S;
+}
diff --git a/gcc/testsuite/g++.dg/modules/dguide-7_b.C b/gcc/testsuite/g++.dg/modules/dguide-7_b.C
new file mode 100644
index 000..85246b22dc3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/dguide-7_b.C
@@ -0,0 +1,10 @@
+// PR c++/120023
+// { dg-additional-options "-fmodules" }
+// { dg-module-cmi M.D }
+
+export module M.D;
+import M.S;
+
+namespace ns {
+  S(int) -> S;
+}
diff --git a/gcc/testsuite/g++.dg/modules/dguide-7_c.C b/gcc/testsuite/g++.dg/modules/dguide-7_c.C
new file mode 100644
index 000..9579d9d32b3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/dguide-7_c.C
@@ -0,0 +1,12 @@
+// PR c++/120023
+// { dg-additional-options "-fmodules" }
+
+import M.S;
+import M.D;
+
+template <> struct ns::S { S(int) {} };
+
+int main() {
+  ns::S s(123);
+  ns::S s2 = s;
+}
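
As a rough single-file analogue of the three test files, the sketch below declares a class template, adds a deduction guide for it in the same namespace, and only later supplies the specialization the guide names. The template parameter list and the `<int>` arguments are assumptions, since the archived copy lost the text inside angle brackets; `demo` and the `value` member are hypothetical additions so the result can be observed.

```cpp
#include <type_traits>

namespace ns {
  // Declared but never defined, as in dguide-7_a.C (parameter list assumed).
  template <typename T> struct S;

  // Deduction guide, as in dguide-7_b.C (target type assumed to be S<int>).
  S(int) -> S<int>;
}

// Explicit specialization supplied by the consumer, as in dguide-7_c.C.
template <> struct ns::S<int> {
  int value;
  S(int v) : value(v) {}
};

// Hypothetical helper: both declarations rely on class template argument
// deduction, the first through the guide, the second through copy deduction.
inline int demo() {
  ns::S s(123);
  ns::S s2 = s;
  static_assert(std::is_same_v<decltype(s), ns::S<int>>);
  static_assert(std::is_same_v<decltype(s2), ns::S<int>>);
  return s2.value;
}
```

In the PR the guide lives in a different module from the template, which is what left it unreachable; a single translation unit like this one does not hit the bug.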




Re: [PATCH RFA (fold)] c++: remove TREE_STATIC from constexpr heap vars [PR119162]

2025-04-30 Thread Jason Merrill

On 4/21/25 4:39 PM, Jason Merrill wrote:

Tested x86_64-pc-linux-gnu, OK for trunk?


Ping.


-- 8< --

While working on PR119162 it occurred to me that it would be simpler to
detect the problem of a value referring to a heap allocation if we stopped
setting TREE_STATIC on them so they naturally are not considered to have a
constant address.  With that change we no longer need to specifically avoid
caching a value that refers to a deleted pointer.

But with this change maybe_nonzero_address is not sure whether the variable
could have address zero.  I don't understand why it returns 1 only for
variables in the current function, rather than all non-symtab decls; an auto
variable from some other function also won't have address zero.  Maybe this
made more sense when it was in tree_single_nonzero_warnv_p before r7-5868?

But assuming there is some reason for the current behavior, this patch only
changes the handling of non-symtab decls when folding_cxx_constexpr.

PR c++/119162

gcc/cp/ChangeLog:

* constexpr.cc (find_deleted_heap_var): Remove.
(cxx_eval_call_expression): Don't call it.  Don't set TREE_STATIC on
heap vars.
(cxx_eval_outermost_constant_expr): Don't mess with varpool.

gcc/ChangeLog:

* fold-const.cc (maybe_nonzero_address): Return 1 for non-symtab
vars if folding_cxx_constexpr.
---
  gcc/cp/constexpr.cc | 29 -
  gcc/fold-const.cc   | 25 -
  2 files changed, 16 insertions(+), 38 deletions(-)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 8a11e6265f2..5b7b70f7e65 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -1550,7 +1550,6 @@ static tree cxx_eval_bare_aggregate (const constexpr_ctx *, tree,
  static tree cxx_fold_indirect_ref (const constexpr_ctx *, location_t, tree, tree,
   bool * = NULL);
  static tree find_heap_var_refs (tree *, int *, void *);
-static tree find_deleted_heap_var (tree *, int *, void *);
  
  /* Attempt to evaluate T which represents a call to a builtin function.

 We assume here that all builtin functions evaluate to scalar types
@@ -2975,14 +2974,6 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
 : heap_uninit_identifier,
 type);
  DECL_ARTIFICIAL (var) = 1;
- TREE_STATIC (var) = 1;
- // Temporarily register the artificial var in varpool,
- // so that comparisons of its address against NULL are folded
- // through nonzero_address even with
- // -fno-delete-null-pointer-checks or that comparison of
- // addresses of different heap artificial vars is folded too.
- // See PR98988 and PR99031.
- varpool_node::finalize_decl (var);
  ctx->global->heap_vars.safe_push (var);
  ctx->global->put_value (var, NULL_TREE);
  return fold_convert (ptr_type_node, build_address (var));
@@ -3454,11 +3445,6 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
  cacheable = false;
  break;
}
- /* And don't cache a ref to a deleted heap variable (119162).  */
- if (cacheable
- && (cp_walk_tree_without_duplicates
- (&result, find_deleted_heap_var, NULL)))
-   cacheable = false;
}
  
  	/* Rewrite all occurrences of the function's RESULT_DECL with the

@@ -9025,20 +9011,6 @@ find_heap_var_refs (tree *tp, int *walk_subtrees, void */*data*/)
return NULL_TREE;
  }
  
-/* Look for deleted heap variables in the expression *TP.  */

-
-static tree
-find_deleted_heap_var (tree *tp, int *walk_subtrees, void */*data*/)
-{
-  if (VAR_P (*tp)
-  && DECL_NAME (*tp) == heap_deleted_identifier)
-return *tp;
-
-  if (TYPE_P (*tp))
-*walk_subtrees = 0;
-  return NULL_TREE;
-}
-
  /* Find immediate function decls in *TP if any.  */
  
  static tree

@@ -9275,7 +9247,6 @@ cxx_eval_outermost_constant_expr (tree t, bool allow_non_constant,
  r = t;
  non_constant_p = true;
}
- varpool_node::get (heap_var)->remove ();
}
  }
  
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc

index c9471ea44b0..35fcf5087fb 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -9917,22 +9917,29 @@ pointer_may_wrap_p (tree base, tree offset, poly_int64 bitpos)
  static int
  maybe_nonzero_address (tree decl)
  {
+  if (!DECL_P (decl))
+return -1;
+
/* Normally, don't do anything for variables and functions before symtab is
   built; it is quite possible that DECL will be declared weak later.
   But if folding_initializer, we need a constant answer now, so create
   the symtab entry and prevent later weak declaration.  */
-  if (DECL_P (decl) && decl_in_symtab_p (decl))
-if (s

Re: [PATCH v6 2/2] RISC-V: Add intrinsics testcases for SiFive Xsfvcp extensions.

2025-04-30 Thread Kito Cheng
pushed to trunk

On Tue, Apr 29, 2025 at 9:16 PM Kito Cheng  wrote:
>
> From: yulong 
>
> This commit adds testcases for Xsfvcp.
>
> Co-Authored by: Jiawei Chen 
> Co-Authored by: Shihua Liao 
> Co-Authored by: Yixuan Chen 
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/xsfvector/sf_vc_f.c: New test.
> * gcc.target/riscv/rvv/xsfvector/sf_vc_i.c: New test.
> * gcc.target/riscv/rvv/xsfvector/sf_vc_v.c: New test.
> * gcc.target/riscv/rvv/xsfvector/sf_vc_x.c: New test.
> ---
>  .../gcc.target/riscv/rvv/xsfvector/sf_vc_f.c  |  88 +++
>  .../gcc.target/riscv/rvv/xsfvector/sf_vc_i.c  | 132 +
>  .../gcc.target/riscv/rvv/xsfvector/sf_vc_v.c  | 107 ++
>  .../gcc.target/riscv/rvv/xsfvector/sf_vc_x.c  | 138 ++
>  4 files changed, 465 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_f.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_i.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_v.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_x.c
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_f.c b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_f.c
> new file mode 100644
> index 000..7667e56a4c5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_f.c
> @@ -0,0 +1,88 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv_xsfvcp -mabi=lp64d -O3" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#include "sifive_vector.h"
> +
> +typedef _Float16 float16_t;
> +typedef float float32_t;
> +typedef double float64_t;
> +
> +/*
> +** test_sf_vc_v_fv_u16mf4:
> +** ...
> +** vsetivli\s+zero+,0+,e16+,mf4,ta,ma+
> +** sf\.vc\.v\.fv\t[0-9]+,v[0-9]+,v[0-9]+,fa[0-9]+
> +** ...
> +*/
> +vuint16mf4_t test_sf_vc_v_fv_u16mf4(vuint16mf4_t vs2, float16_t fs1, size_t vl) {
> +return __riscv_sf_vc_v_fv_u16mf4(1, vs2, fs1, vl);
> +}
> +
> +/*
> +** test_sf_vc_v_fv_se_u16mf4:
> +** ...
> +** vsetivli\s+zero+,0+,e16+,mf4,ta,ma+
> +** sf\.vc\.v\.fv\t[0-9]+,v[0-9]+,v[0-9]+,fa[0-9]+
> +** ...
> +*/
> +vuint16mf4_t test_sf_vc_v_fv_se_u16mf4(vuint16mf4_t vs2, float16_t fs1, size_t vl) {
> +return __riscv_sf_vc_v_fv_se_u16mf4(1, vs2, fs1, vl);
> +}
> +
> +/*
> +** test_sf_vc_fv_se_u16mf2:
> +** ...
> +** vsetivli\s+zero+,0+,e16+,mf2,ta,ma+
> +** sf\.vc\.fv\t[0-9]+,[0-9]+,v[0-9]+,fa[0-9]+
> +** ...
> +*/
> +void test_sf_vc_fv_se_u16mf2(vuint16mf2_t vs2, float16_t fs1, size_t vl) {
> +__riscv_sf_vc_fv_se_u16mf2(1, 3, vs2, fs1, vl);
> +}
> +
> +/*
> +** test_sf_vc_v_fvv_u16m1:
> +** ...
> +** vsetivli\s+zero+,0+,e16+,m1,ta,ma+
> +** sf\.vc\.v\.fvv\t[0-9]+,v[0-9]+,v[0-9]+,fa[0-9]+
> +** ...
> +*/
> +vuint16m1_t test_sf_vc_v_fvv_u16m1(vuint16m1_t vd, vuint16m1_t vs2, float16_t fs1, size_t vl) {
> +return __riscv_sf_vc_v_fvv_u16m1(1, vd, vs2, fs1, vl);
> +}
> +
> +/*
> +** test_sf_vc_v_fvv_se_u16m1:
> +** ...
> +** vsetivli\s+zero+,0+,e16+,m1,ta,ma+
> +** sf\.vc\.v\.fvv\t[0-9]+,v[0-9]+,v[0-9]+,fa[0-9]+
> +** ...
> +*/
> +vuint16m1_t test_sf_vc_v_fvv_se_u16m1(vuint16m1_t vd, vuint16m1_t vs2, float16_t fs1, size_t vl) {
> +return __riscv_sf_vc_v_fvv_se_u16m1(1, vd, vs2, fs1, vl);
> +}
> +
> +/*
> +** test_sf_vc_fvv_se_u32m8:
> +** ...
> +** vsetivli\s+zero+,0+,e32+,m8,ta,ma+
> +** sf\.vc\.fvv\t[0-9]+,v[0-9]+,v[0-9]+,fa[0-9]+
> +** ...
> +*/
> +void test_sf_vc_fvv_se_u32m8(vuint32m8_t vd, vuint32m8_t vs2, float32_t fs1, size_t vl) {
> +__riscv_sf_vc_fvv_se_u32m8(1, vd, vs2, fs1, vl);
> +}
> +
> +
> +/*
> +** test_sf_vc_fvw_se_u32m2:
> +** ...
> +** vsetivli\s+zero+,0+,e32+,m2,ta,ma+
> +** sf\.vc\.fvw\t[0-9]+,v[0-9]+,v[0-9]+,fa[0-9]+
> +** ...
> +*/
> +void test_sf_vc_fvw_se_u32m2(vuint64m4_t vd, vuint32m2_t vs2, float32_t fs1, size_t vl) {
> +__riscv_sf_vc_fvw_se_u32m2(1, vd, vs2, fs1, vl);
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_i.c b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_i.c
> new file mode 100644
> index 000..5528cc52ac7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_i.c
> @@ -0,0 +1,132 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv_xsfvcp -mabi=lp64d -O3" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#include "sifive_vector.h"
> +
> +
> +/*
> +** test_sf_vc_v_i_u16m4:
> +** ...
> +** vsetivli\s+zero+,0+,e16+,m4,ta,ma+
> +** sf\.vc\.v\.i\t[0-9]+,[0-9]+,v[0-9]+,[0-9]+
> +** ...
> +*/
> +vuint16m4_t test_sf_vc_v_i_u16m4(size_t vl) {
> +return __riscv_sf_vc_v_i_u16m4(1, 2, 4, vl);
> +}
> +
> +/*
> +** test_sf_vc_v_i_se_u16m4:
> +** ...
> +** vsetivli\s+zero+,0+,e16+,m4,ta,ma+
> +** sf\.vc\.v\.i\t[0-9]+,[0-9]+,v[0-9]+,[0-9]+
> +** ...
> +*/
> +vuint16m4_t test_sf_vc_v_i_se_u16m4(size_t vl) {
> +return __riscv_sf_vc_v_i_se_u16m4(1, 2, 4, vl);
> +}
> +
> +/*
> +** test_sf_vc_i_se_u16mf4:
> +** ...
> +** 

Re: [PATCH v5 02/10] libstdc++: Add header mdspan to the build-system.

2025-04-30 Thread Luc Grosheintz




On 4/29/25 3:11 PM, Jonathan Wakely wrote:

On Tue, 29 Apr 2025 at 13:59, Luc Grosheintz  wrote:


Creates a nearly empty header mdspan and adds it to the build-system and
Doxygen config file.

libstdc++-v3/ChangeLog:

 * doc/doxygen/user.cfg.in: Add .
 * include/Makefile.am: Ditto.
 * include/Makefile.in: Ditto.
 * include/precompiled/stdc++.h: Ditto.
 * include/std/mdspan: New file.

Signed-off-by: Luc Grosheintz 
---
  libstdc++-v3/doc/doxygen/user.cfg.in  |  1 +
  libstdc++-v3/include/Makefile.am  |  1 +
  libstdc++-v3/include/Makefile.in  |  1 +
  libstdc++-v3/include/precompiled/stdc++.h |  1 +
  libstdc++-v3/include/std/mdspan   | 48 +++
  5 files changed, 52 insertions(+)
  create mode 100644 libstdc++-v3/include/std/mdspan

diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in b/libstdc++-v3/doc/doxygen/user.cfg.in
index 19ae67a67ba..e926c6707f6 100644
--- a/libstdc++-v3/doc/doxygen/user.cfg.in
+++ b/libstdc++-v3/doc/doxygen/user.cfg.in
@@ -880,6 +880,7 @@ INPUT  = @srcdir@/doc/doxygen/doxygroups.cc \
   include/list \
   include/locale \
   include/map \
+ include/mdspan \
   include/memory \
   include/memory_resource \
   include/mutex \
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 537774c2668..1140fa0dffd 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -38,6 +38,7 @@ std_freestanding = \
 ${std_srcdir}/generator \
 ${std_srcdir}/iterator \
 ${std_srcdir}/limits \
+   ${std_srcdir}/mdspan \
 ${std_srcdir}/memory \
 ${std_srcdir}/numbers \
 ${std_srcdir}/numeric \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 7b96b2207f8..c96e981acd6 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -396,6 +396,7 @@ std_freestanding = \
 ${std_srcdir}/generator \
 ${std_srcdir}/iterator \
 ${std_srcdir}/limits \
+   ${std_srcdir}/mdspan \
 ${std_srcdir}/memory \
 ${std_srcdir}/numbers \
 ${std_srcdir}/numeric \
diff --git a/libstdc++-v3/include/precompiled/stdc++.h b/libstdc++-v3/include/precompiled/stdc++.h
index f4b312d9e47..e7d89c92704 100644
--- a/libstdc++-v3/include/precompiled/stdc++.h
+++ b/libstdc++-v3/include/precompiled/stdc++.h
@@ -228,6 +228,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
new file mode 100644
index 000..4094a416d1e
--- /dev/null
+++ b/libstdc++-v3/include/std/mdspan
@@ -0,0 +1,48 @@
+//  -*- C++ -*-
+
+// Copyright (C) 2025 Free Software Foundation, Inc.


I've just noticed that this file claims to be copyright FSF, but if
you're contributing under the https://gcc.gnu.org/dco.html terms
rather than via a copyright assignment to the FSF, then that's
incorrect.

Please see the  header for the DCO-compatible way to mention
that the header is covered by copyright without being overly precise.

Otherwise these patches look good and I'll start pushing them this
week - thanks!


That's exciting to hear! I'll fix the issues and strip anything layout
related from this series.

I'm slightly nervous because the first time I used  outside of
the test harness I was greeted with an error due to not including a
header inside ; and because I was using PCH, it didn't cause an
error during testing.

I've since reconfigured with `--disable-libstdcxx-pch` and also run
with `--target_board='unix/-Wall/-Wextra/-pedantic'`.

Is there more I can do to make sure the patches are correct?





+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file mdspan
+ *  This is a Standard C++ Library header.
+ */
+
+#ifndef _

Re: [PATCH v2] Change __builtin_unreachable to __builtin_trap if only thing in function [PR109267]

2025-04-30 Thread Richard Biener
On Wed, Apr 30, 2025 at 9:03 AM Andrew Pinski  wrote:
>
> On Tue, Apr 29, 2025 at 11:49 PM Richard Biener
>  wrote:
> >
> > On Tue, Apr 29, 2025 at 4:25 PM Andrew Pinski  
> > wrote:
> > >
> > > When we have an empty function, things can go wrong with
> > > cfi_startproc/cfi_endproc and a few other things like exceptions. So if
> > > the only thing the function does is a call to __builtin_unreachable,
> > > let's expand that to a __builtin_trap instead. For most targets that
> > > is one instruction wide so it won't hurt things that much and we get
> > > correct behavior for exceptions and some linkers will be better for it.
> > >
> > > The only thing I have a concern about is that some targets still
> > > don't define a trap instruction. I tried to emit a nop instead of
> > > an abort but that nop is removed during RTL DCE.
> > > Should we just push targets to define a trap instead?
> > > E.g. BPF, avr and sh are the 3 semi active targets which still don't
> > > have a trap defined.
> >
> > Do any of those targets have the cfi_startproc/cfi_endproc issue,
> > or are exceptions relevant on those?
>
> Yes, sh is even the one that can run full Linux. There is an open
> bug about sh not having a trap pattern implemented yet,
> https://gcc.gnu.org/PR70216; it has been open for 9 years now too.
>
> >
> > I'd say guard this with targetm.have_trap (), there's the chance that
> > say on avr the expansion to abort() might fail to link in a
> > freestanding environment.
>
> I was even thinking of that (I even accidentally left in the include for
> target.h :) )
>
> >
> > As for the nop, if you mark it volatile does it prevail?
>
> I don't even know how to mark the rtl insn as volatile.
> The volatil field for an INSN is documented as meaning that the insn
> has been deleted:
>  1 in an INSN, CALL_INSN, JUMP_INSN, CODE_LABEL, BARRIER, or NOTE
>  if it has been deleted.
> So that won't help.
>
> Now we could use the `used` field for this marking. I have not looked
> at what it would take to make sure it does not get deleted though.

I wonder if a general fallback for expanding a trap could be

label:
jmp label;

A nop in general wouldn't do (in this particular case it would, but then
not as an expansion for __builtin_unreachable_trap ()).

But yeah, we should possibly force targets to implement a trap
instruction, but also document more thoroughly what should happen
(the program should stop [making progress]).

Richard.

> Thanks,
> Andrew
>
>
> >
> > > The QOI idea for basic block reorder is recorded as PR 120004.
> > >
> > > Changes since v1:
> > > * v2: Move to final gimple cfg cleanup instead of expand and use
> > >   BUILT_IN_UNREACHABLE_TRAP.
> > >
> > > Bootstrapped and tested on x86_64-linux-gnu.
> > >
> > > PR middle-end/109267
> > >
> > > gcc/ChangeLog:
> > >
> > > > * tree-cfgcleanup.cc (execute_cleanup_cfg_post_optimizing): If the first
> > > non debug statement in the first (and only) basic block is a call
> > > to __builtin_unreachable change it to a call to __builtin_trap.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.dg/pr109267-1.c: New test.
> > > * gcc.dg/pr109267-2.c: New test.
> > >
> > > Signed-off-by: Andrew Pinski 
> > > ---
> > >  gcc/testsuite/gcc.dg/pr109267-1.c | 14 ++
> > >  gcc/testsuite/gcc.dg/pr109267-2.c | 14 ++
> > >  gcc/tree-cfgcleanup.cc| 14 ++
> > >  3 files changed, 42 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.dg/pr109267-1.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/pr109267-2.c
> > >
> > > > diff --git a/gcc/testsuite/gcc.dg/pr109267-1.c b/gcc/testsuite/gcc.dg/pr109267-1.c
> > > new file mode 100644
> > > index 000..d6df2c3b49a
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/pr109267-1.c
> > > @@ -0,0 +1,14 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > +
> > > +/* PR middle-end/109267 */
> > > +
> > > +int f(void)
> > > +{
> > > +  __builtin_unreachable();
> > > +}
> > > +
> > > +/* This unreachable should be changed to be a trap. */
> > > +
> > > > +/* { dg-final { scan-tree-dump-times "__builtin_unreachable trap \\\(" 1 "optimized"} } */
> > > > +/* { dg-final { scan-tree-dump-not "__builtin_unreachable \\\(" "optimized"} } */
> > > > diff --git a/gcc/testsuite/gcc.dg/pr109267-2.c b/gcc/testsuite/gcc.dg/pr109267-2.c
> > > new file mode 100644
> > > index 000..6cd1419a1e3
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/pr109267-2.c
> > > @@ -0,0 +1,14 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > +
> > > +/* PR middle-end/109267 */
> > > +void g(void);
> > > +int f(int *t)
> > > +{
> > > +  g();
> > > +  __builtin_unreachable();
> > > +}
> > > +
> > > +/* The unreachable should stay a unreachable. */
> > > > +/* { dg-final { scan-tree-dump-not "__builtin_unreachable trap \\\(" "optimized"} } */
> > > +/* { dg-final 

Re: [PATCH] libstdc++: Preserve the argument type in basic_format_args [PR119246]

2025-04-30 Thread Jonathan Wakely
On Wed, 30 Apr 2025 at 13:54, Tomasz Kaminski  wrote:
>
>
>
> On Wed, Apr 30, 2025 at 1:26 PM Tomasz Kamiński  wrote:
>>
>> This commit adjusts how arguments are stored in the _Arg_value (and
>> thus basic_format_args), by preserving the types of fixed-width
>> floating-point arguments, which were previously converted to float,
>> double, or long double.
>>
>> The _Arg_value union now contains alternatives for std::bfloat16_t,
>> std::float16_t, std::float32_t, and std::float64_t that use the
>> pre-existing _Arg_bf16, _Arg_f16, _Arg_f32, _Arg_f64 argument types.
>>
>> This does not affect formatting, as the formatter specializations for
>> these types format them by casting to the corresponding standard
>> floating-point type.
>>
>> For 128-bit floating-point types we need to handle the ppc64 architecture
>> (_GLIBCXX_LONG_DOUBLE_ALT128_COMPAT), for which long double may (on a
>> per-TU basis) designate either the __ibm128 or the __ieee128 type, so we
>> need to store both types in the _Arg_value and have two _Arg_types
>> (_Arg_ibm128, _Arg_ieee128). On other architectures we use an extra
>> enumerator value to store __float128, which is different from long double
>> and _Float128. This is consistent with ppc64, for which __float128 is the
>> same type as __ieee128 if present. We use the _Arg_float128 and _M_float128
>> names, which deviate from the _Arg_fN naming scheme, to emphasize that this
>> flag is not used for the std::float128_t (_Float128_t) type, which is
>> consistently formatted via handle.
>>
>> The __format::_float128_t type is renamed to __format::__flt128_t, to
>> mitigate visual confusion between this type and __float128. We also
>> introduce a __bflt16_t typedef instead of using decltype.
>>
>> We add new alternatives to the _Arg_value and allow them to be accessed
>> via _S_get when the types are available. However, we produce and handle
>> the corresponding _Arg_type only when we can format them. See also
>> r14-3329-g27d0cfcb2b33de.
>>
>> The formatter<_Float128, _CharT> that formats via __flt128_t is always
>> provided, when type is available. It is still correct __flt128_t is 
>> _Float128_t.
>>
>> We also provide formatter<__float128, _CharT> that formats via __flt128_t.
>> As this type may be disabled (-mno-float128), extra care needs to be taken,
>> for situation when __float128 is same as long double. If the formatter would 
>> be
>> defined in such case, the formatter would be generated 
>> from
>> different specializations, and have different mangling:
>>   * formatter<__float128, _CharT> if __float128 is present,
>>   * formatter<_format::__formattable_float, _CharT> otherwise.
>> To best of my knowledge this happens only on ppc64 for __ieee128 and 
>> __float128,
>> so the formatter is not defined in this case. static_assert is added to 
>> detect
>> other configurations like that. In such case we should replace it with 
>> constraint.
>>
>> PR libstdc++/119246
>>
>> libstdc++-v3/ChangeLog:
>>
>> * include/std/format (__format::__bflt16_t): Define.
>> (_GLIBCXX_FORMAT_F128): Separate value for cases where _Float128 is
>> used.
>> (__format::__float128_t): Renamed to __format::__flt128_t.
>> (std::formatter<_Float128, _CharT>): Define always if there is a
>> formattable 128-bit float.
>> (std::formatter<__float128, _CharT>): Define.
>> (_Arg_type::_Arg_f128): Rename to _Arg_float128 and adjust value.
>> (_Arg_type::_Arg_ibm128): Change value to _Arg_ldbl.
>> (_Arg_type::_Arg_ieee128): Define as alias to _Arg_float128.
>> (_Arg_value::_M_f128): Replaced with _M_ieee128 and _M_float128.
>> (_Arg_value::_M_ieee128, _Arg_value::_M_float128, _Arg_value::_M_bf16)
>> (_Arg_value::_M_f16, _Arg_value::_M_f32, _Arg_value::_M_f64): Define.
>> (_Arg_value::_S_get, basic_format_arg::_S_to_enum): Handle __bflt16_t,
>> _Float16, _Float32, _Float64, and __float128 types.
>> (basic_format_arg::_S_to_arg_type): Preserve __bflt16_t, _Float16,
>> _Float32, _Float64 and __float128 types.
>> (basic_format_arg::_M_visit): Handle _Arg_float128, _Arg_ieee128,
>> _Arg_bf16, _Arg_f16, _Arg_f32, _Arg_f64.
>> * testsuite/std/format/arguments/args.cc: Updated to illustrate that
>> extended floating point types use handles now. Added test for
>> __float128.
>> * testsuite/std/format/parse_ctx.cc: Extended test to cover calls to
>> check_dynamic_spec with floating point types and handles.
>> ---
>> Tested on x86_64-linux and powerpc64le-unknown-linux-gnu.
>> Running additional test on powerpc64le with
>> unix\{-mabi=ibmlongdouble,-mabi=ieeelongdouble,-mno-float128}.
>
> The -mabi=ibmlongdouble and -mabi=ieeelongdouble runs passed.
> -mno-float128 does not seem to be handled on trunk, due to the use of
> __float128 instead of __ieee128 here:
> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/include/std/charconv;h=dda49ce72d0b53c7a6e86c2e3fb510d0218fd5a6;hb=HEAD#l8
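
Editorial illustration of the storage scheme discussed in the quoted commit message: a minimal sketch of a tagged union that records each argument's own type instead of widening it. ArgType, ArgValue, visit_arg and TypeName are hypothetical names, and float/double merely stand in for the extended floating-point types; this is not the libstdc++ implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins for _Arg_type and _Arg_value.
enum class ArgType { f32, f64 };

struct ArgValue {
  ArgType type;
  union {
    float f32;
    double f64;
  };

  // Each constructor records the argument's own type instead of widening it.
  explicit ArgValue(float v) : type(ArgType::f32), f32(v) {}
  explicit ArgValue(double v) : type(ArgType::f64), f64(v) {}
};

// Dispatch on the stored tag, so the visitor sees the original type.
template <typename Visitor>
auto visit_arg(Visitor&& vis, const ArgValue& arg) {
  switch (arg.type) {
    case ArgType::f32: return vis(arg.f32);
    case ArgType::f64: return vis(arg.f64);
  }
  return vis(arg.f64);  // not reached for valid tags
}

// Example visitor that reports which alternative it was given.
struct TypeName {
  std::string operator()(float) const { return "float"; }
  std::string operator()(double) const { return "double"; }
};

inline void check_types_are_preserved() {
  assert(visit_arg(TypeName{}, ArgValue(1.5f)) == "float");
  assert(visit_arg(TypeName{}, ArgValue(1.5)) == "double");
}
```

Had the float constructor stored its value in the double alternative, the visitor would report "double" for both arguments, which is the kind of type loss the patch sets out to avoid.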

[PATCH 2/3] Remove non-SLP path from vectorizable_conversion

2025-04-30 Thread Richard Biener
Prunes code from the trivial true/false conditions.

* tree-vect-stmts.cc (vectorizable_conversion):
---
 gcc/tree-vect-stmts.cc | 63 +-
 1 file changed, 13 insertions(+), 50 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index cf986d030a1..21832d3e460 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5706,12 +5706,7 @@ vectorizable_conversion (vec_info *vinfo,
   /* Multiple types in SLP are handled by creating the appropriate number of
  vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
  case of SLP.  */
-  if (1)
-ncopies = 1;
-  else if (modifier == NARROW_DST)
-ncopies = vect_get_num_copies (loop_vinfo, vectype_out);
-  else
-ncopies = vect_get_num_copies (loop_vinfo, vectype_in);
+  ncopies = 1;
 
   /* Sanity check: make sure that at least one copy of the vectorized stmt
  needs to be generated.  */
@@ -5871,16 +5866,11 @@ vectorizable_conversion (vec_info *vinfo,
   else if (code == FLOAT_EXPR)
{
  wide_int op_min_value, op_max_value;
- if (1)
-   {
- tree def;
- /* ???  Merge ranges in case of more than one lane.  */
- if (SLP_TREE_LANES (slp_op0) != 1
- || !(def = vect_get_slp_scalar_def (slp_op0, 0))
- || !vect_get_range_info (def, &op_min_value, &op_max_value))
-   goto unsupported;
-   }
- else if (!vect_get_range_info (op0, &op_min_value, &op_max_value))
+ tree def;
+ /* ???  Merge ranges in case of more than one lane.  */
+ if (SLP_TREE_LANES (slp_op0) != 1
+ || !(def = vect_get_slp_scalar_def (slp_op0, 0))
+ || !vect_get_range_info (def, &op_min_value, &op_max_value))
goto unsupported;
 
  cvt_type
@@ -5916,9 +5906,8 @@ vectorizable_conversion (vec_info *vinfo,
 
   if (!vec_stmt)   /* transformation not required.  */
 {
-  if (1
- && (!vect_maybe_update_slp_op_vectype (slp_op0, vectype_in)
- || !vect_maybe_update_slp_op_vectype (slp_op1, vectype_in)))
+  if (!vect_maybe_update_slp_op_vectype (slp_op0, vectype_in)
+ || !vect_maybe_update_slp_op_vectype (slp_op1, vectype_in))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5937,8 +5926,7 @@ vectorizable_conversion (vec_info *vinfo,
{
  STMT_VINFO_TYPE (stmt_info) = type_demotion_vec_info_type;
  /* The final packing step produces one vector result per copy.  */
- unsigned int nvectors
-   = (1 ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) : ncopies);
+ unsigned int nvectors = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
  vect_model_promotion_demotion_cost (stmt_info, dt, nvectors,
  multi_step_cvt, cost_vec,
  widen_arith);
@@ -5950,9 +5938,7 @@ vectorizable_conversion (vec_info *vinfo,
 per copy.  MULTI_STEP_CVT is 0 for a single conversion,
 so >> MULTI_STEP_CVT divides by 2^(number of steps - 1).  */
  unsigned int nvectors
-   = (1
-  ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) >> multi_step_cvt
-  : ncopies * 2);
+   = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) >> multi_step_cvt;
  vect_model_promotion_demotion_cost (stmt_info, dt, nvectors,
  multi_step_cvt, cost_vec,
  widen_arith);
@@ -6004,18 +5990,6 @@ vectorizable_conversion (vec_info *vinfo,
? vectype_out : cvt_type);
 
   int ninputs = 1;
-  if (0)
-{
-  if (modifier == WIDEN)
-   ;
-  else if (modifier == NARROW_SRC || modifier == NARROW_DST)
-   {
- if (multi_step_cvt)
-   ninputs = vect_pow2 (multi_step_cvt);
- ninputs *= 2;
-   }
-}
-
   switch (modifier)
 {
 case NONE:
@@ -6046,10 +6020,7 @@ vectorizable_conversion (vec_info *vinfo,
  gimple_set_lhs (new_stmt, new_temp);
  vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
 
- if (1)
-   slp_node->push_vec_def (new_stmt);
- else
-   STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
+ slp_node->push_vec_def (new_stmt);
}
   break;
 
@@ -6102,10 +6073,7 @@ vectorizable_conversion (vec_info *vinfo,
  else
new_stmt = SSA_NAME_DEF_STMT (vop0);
 
- if (1)
-   slp_node->push_vec_def (new_stmt);
- else
-   STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
+ slp_node->push_vec_def (new_stmt);
}
   break;
 
@@ -6148,16 +6116,11 @@ vectorizable_conversion (vec_info *vinfo,
  /* This is the last step of the conversion se

[PATCH 1/3] Remove non-SLP path from vectorizable_conversion

2025-04-30 Thread Richard Biener
This replaces the slp_node conditions with constants that are trivial to fold.

* tree-vect-stmts.cc (vectorizable_conversion):
---
 gcc/tree-vect-stmts.cc | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 38612a16619..cf986d030a1 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5706,7 +5706,7 @@ vectorizable_conversion (vec_info *vinfo,
   /* Multiple types in SLP are handled by creating the appropriate number of
  vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
  case of SLP.  */
-  if (slp_node)
+  if (1)
 ncopies = 1;
   else if (modifier == NARROW_DST)
 ncopies = vect_get_num_copies (loop_vinfo, vectype_out);
@@ -5871,7 +5871,7 @@ vectorizable_conversion (vec_info *vinfo,
   else if (code == FLOAT_EXPR)
{
  wide_int op_min_value, op_max_value;
- if (slp_node)
+ if (1)
{
  tree def;
  /* ???  Merge ranges in case of more than one lane.  */
@@ -5916,7 +5916,7 @@ vectorizable_conversion (vec_info *vinfo,
 
   if (!vec_stmt)   /* transformation not required.  */
 {
-  if (slp_node
+  if (1
  && (!vect_maybe_update_slp_op_vectype (slp_op0, vectype_in)
  || !vect_maybe_update_slp_op_vectype (slp_op1, vectype_in)))
{
@@ -5938,7 +5938,7 @@ vectorizable_conversion (vec_info *vinfo,
  STMT_VINFO_TYPE (stmt_info) = type_demotion_vec_info_type;
  /* The final packing step produces one vector result per copy.  */
  unsigned int nvectors
-   = (slp_node ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) : ncopies);
+   = (1 ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) : ncopies);
  vect_model_promotion_demotion_cost (stmt_info, dt, nvectors,
  multi_step_cvt, cost_vec,
  widen_arith);
@@ -5950,7 +5950,7 @@ vectorizable_conversion (vec_info *vinfo,
 per copy.  MULTI_STEP_CVT is 0 for a single conversion,
 so >> MULTI_STEP_CVT divides by 2^(number of steps - 1).  */
  unsigned int nvectors
-   = (slp_node
+   = (1
   ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) >> multi_step_cvt
   : ncopies * 2);
  vect_model_promotion_demotion_cost (stmt_info, dt, nvectors,
@@ -6004,7 +6004,7 @@ vectorizable_conversion (vec_info *vinfo,
? vectype_out : cvt_type);
 
   int ninputs = 1;
-  if (!slp_node)
+  if (0)
 {
   if (modifier == WIDEN)
;
@@ -6046,7 +6046,7 @@ vectorizable_conversion (vec_info *vinfo,
  gimple_set_lhs (new_stmt, new_temp);
  vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
 
- if (slp_node)
+ if (1)
slp_node->push_vec_def (new_stmt);
  else
STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
@@ -6102,7 +6102,7 @@ vectorizable_conversion (vec_info *vinfo,
  else
new_stmt = SSA_NAME_DEF_STMT (vop0);
 
- if (slp_node)
+ if (1)
slp_node->push_vec_def (new_stmt);
  else
STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
@@ -6148,7 +6148,7 @@ vectorizable_conversion (vec_info *vinfo,
  /* This is the last step of the conversion sequence. Store the
 vectors in SLP_NODE or in vector info of the scalar statement
 (or in STMT_VINFO_RELATED_STMT chain).  */
- if (slp_node)
+ if (1)
slp_node->push_vec_def (new_stmt);
  else
STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
@@ -6156,7 +6156,7 @@ vectorizable_conversion (vec_info *vinfo,
}
   break;
 }
-  if (!slp_node)
+  if (0)
 *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
 
   vec_oprnds0.release ();
-- 
2.43.0
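
Editorial illustration of the staging used in this series: a small sketch, with hypothetical function names, showing one condition first replaced by a constant (as in 1/3) and then folded away (as in 2/3). It is not code from tree-vect-stmts.cc.

```cpp
#include <cassert>

// Original shape: the result depends on whether an SLP node is present.
inline int ncopies_original(bool have_slp_node, int non_slp_copies) {
  if (have_slp_node)
    return 1;
  else
    return non_slp_copies;
}

// Step 1: the condition is replaced by a constant; the dead branch is
// still spelled out, which keeps the diff mechanical and easy to review.
inline int ncopies_step1(int non_slp_copies) {
  if (1)
    return 1;
  else
    return non_slp_copies;
}

// Step 2: the constant condition is folded and the dead branch removed.
inline int ncopies_step2() {
  return 1;
}

inline void check_steps_agree() {
  assert(ncopies_original(true, 7) == ncopies_step1(7));
  assert(ncopies_step1(7) == ncopies_step2());
}
```

Splitting the change this way means each step can be checked in isolation: the first only swaps conditions, the second only deletes code that can no longer run.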



[PATCH 3/3] Remove non-SLP path from vectorizable_conversion

2025-04-30 Thread Richard Biener
This removes the non-SLP paths from vectorizable_conversion and
in the process eliminates uses of 'ncopies' and 'STMT_VINFO_VECTYPE'
from the function.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-vect-stmts.cc (vectorizable_conversion): Remove non-SLP
paths.
---
 gcc/tree-vect-stmts.cc | 27 +++
 1 file changed, 7 insertions(+), 20 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 21832d3e460..42b6059520a 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5528,7 +5528,6 @@ vectorizable_conversion (vec_info *vinfo,
   tree vec_dest, cvt_op = NULL_TREE;
   tree scalar_dest;
   tree op0, op1 = NULL_TREE;
-  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
   tree_code tc1;
   code_helper code, code1, code2;
   code_helper codecvt1 = ERROR_MARK, codecvt2 = ERROR_MARK;
@@ -5538,7 +5537,7 @@ vectorizable_conversion (vec_info *vinfo,
   poly_uint64 nunits_in;
   poly_uint64 nunits_out;
   tree vectype_out, vectype_in;
-  int ncopies, i;
+  int i;
   tree lhs_type, rhs_type;
   /* For conversions between floating point and integer, there're 2 NARROW
  cases. NARROW_SRC is for FLOAT_EXPR, means
@@ -5605,7 +5604,7 @@ vectorizable_conversion (vec_info *vinfo,
   /* Check types of lhs and rhs.  */
   scalar_dest = gimple_get_lhs (stmt);
   lhs_type = TREE_TYPE (scalar_dest);
-  vectype_out = STMT_VINFO_VECTYPE (stmt_info);
+  vectype_out = SLP_TREE_VECTYPE (slp_node);
 
   /* Check the operands of the operation.  */
   slp_tree slp_op0, slp_op1 = NULL;
@@ -5703,15 +5702,6 @@ vectorizable_conversion (vec_info *vinfo,
   modifier = WIDEN;
 }
 
-  /* Multiple types in SLP are handled by creating the appropriate number of
- vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
- case of SLP.  */
-  ncopies = 1;
-
-  /* Sanity check: make sure that at least one copy of the vectorized stmt
- needs to be generated.  */
-  gcc_assert (ncopies >= 1);
-
   bool found_mode = false;
   scalar_mode lhs_mode = SCALAR_TYPE_MODE (lhs_type);
   scalar_mode rhs_mode = SCALAR_TYPE_MODE (rhs_type);
@@ -5918,8 +5908,7 @@ vectorizable_conversion (vec_info *vinfo,
   if (modifier == NONE)
 {
  STMT_VINFO_TYPE (stmt_info) = type_conversion_vec_info_type;
- vect_model_simple_cost (vinfo, stmt_info,
- ncopies * (1 + multi_step_cvt),
+ vect_model_simple_cost (vinfo, stmt_info, (1 + multi_step_cvt),
  dt, ndts, slp_node, cost_vec);
}
   else if (modifier == NARROW_SRC || modifier == NARROW_DST)
@@ -5949,8 +5938,7 @@ vectorizable_conversion (vec_info *vinfo,
 
   /* Transform.  */
   if (dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location,
- "transform conversion. ncopies = %d.\n", ncopies);
+dump_printf_loc (MSG_NOTE, vect_location, "transform conversion.\n");
 
   if (op_type == binary_op)
 {
@@ -5989,11 +5977,10 @@ vectorizable_conversion (vec_info *vinfo,
widen_or_narrow_float_p
? vectype_out : cvt_type);
 
-  int ninputs = 1;
   switch (modifier)
 {
 case NONE:
-  vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
+  vect_get_vec_defs (vinfo, stmt_info, slp_node, 1,
 op0, vectype_in, &vec_oprnds0);
   /* vec_dest is intermediate type operand when multi_step_cvt.  */
   if (multi_step_cvt)
@@ -6029,7 +6016,7 @@ vectorizable_conversion (vec_info *vinfo,
 of elements that we can fit in a vectype (nunits), we have to
 generate more than one vector stmt - i.e - we need to "unroll"
 the vector stmt by a factor VF/nunits.  */
-  vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies * ninputs,
+  vect_get_vec_defs (vinfo, stmt_info, slp_node, 1,
 op0, vectype_in, &vec_oprnds0,
 code == WIDEN_LSHIFT_EXPR ? NULL_TREE : op1,
 vectype_in, &vec_oprnds1);
@@ -6083,7 +6070,7 @@ vectorizable_conversion (vec_info *vinfo,
 of elements that we can fit in a vectype (nunits), we have to
 generate more than one vector stmt - i.e - we need to "unroll"
 the vector stmt by a factor VF/nunits.  */
-  vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies * ninputs,
+  vect_get_vec_defs (vinfo, stmt_info, slp_node, 1,
 op0, vectype_in, &vec_oprnds0);
   /* Arguments are ready.  Create the new vector stmts.  */
   if (cvt_type && modifier == NARROW_DST)
-- 
2.43.0


[PATCH] c++/modules: Ensure deduction guides for imported types are reachable [PR120023]

2025-04-30 Thread Nathaniel Shead
Tested on x86_64-pc-linux-gnu (so far just modules.exp), OK for trunk
and maybe 15 if full regtest+bootstrap succeeds?

-- >8 --

In the linked PR, because the deduction guides depend on an imported
type, we never walk the type and so never call add_deduction_guides.
This patch ensures that we make bindings for deduction guides if we saw
any deduction guide at all.

PR c++/120023

gcc/cp/ChangeLog:

* module.cc (depset::hash::find_dependencies): Also call
add_deduction_guides when walking one.

gcc/testsuite/ChangeLog:

* g++.dg/modules/dguide-7_a.C: New test.
* g++.dg/modules/dguide-7_b.C: New test.
* g++.dg/modules/dguide-7_c.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc  |  7 +++
 gcc/testsuite/g++.dg/modules/dguide-7_a.C |  9 +
 gcc/testsuite/g++.dg/modules/dguide-7_b.C | 10 ++
 gcc/testsuite/g++.dg/modules/dguide-7_c.C | 12 
 4 files changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/dguide-7_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/dguide-7_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/dguide-7_c.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index a2e0d6d2571..a58614b8459 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -14822,9 +14822,16 @@ depset::hash::find_dependencies (module_state *module)
}
  walker.end ();
 
+ /* If we see either a class template or a deduction guide, make
+sure to add all visible deduction guides.  We need to check
+both in case they have been added in separate modules, or
+one is in the GMF and would have otherwise been discarded.  */
  if (!is_key_order ()
  && DECL_CLASS_TEMPLATE_P (decl))
add_deduction_guides (decl);
+ if (!is_key_order ()
+ && deduction_guide_p (decl))
+   add_deduction_guides (TYPE_NAME (TREE_TYPE (TREE_TYPE (decl))));
 
  if (!is_key_order ()
  && TREE_CODE (decl) == TEMPLATE_DECL
diff --git a/gcc/testsuite/g++.dg/modules/dguide-7_a.C b/gcc/testsuite/g++.dg/modules/dguide-7_a.C
new file mode 100644
index 000..8d0eb808859
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/dguide-7_a.C
@@ -0,0 +1,9 @@
+// PR c++/120023
+// { dg-additional-options "-fmodules" }
+// { dg-module-cmi M.S }
+
+export module M.S;
+
+namespace ns {
+  export template  struct S;
+}
diff --git a/gcc/testsuite/g++.dg/modules/dguide-7_b.C b/gcc/testsuite/g++.dg/modules/dguide-7_b.C
new file mode 100644
index 000..85246b22dc3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/dguide-7_b.C
@@ -0,0 +1,10 @@
+// PR c++/120023
+// { dg-additional-options "-fmodules" }
+// { dg-module-cmi M.D }
+
+export module M.D;
+import M.S;
+
+namespace ns {
+  S(int) -> S;
+}
diff --git a/gcc/testsuite/g++.dg/modules/dguide-7_c.C b/gcc/testsuite/g++.dg/modules/dguide-7_c.C
new file mode 100644
index 000..9579d9d32b3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/dguide-7_c.C
@@ -0,0 +1,12 @@
+// PR c++/120023
+// { dg-additional-options "-fmodules" }
+
+import M.S;
+import M.D;
+
+template <> struct ns::S { S(int) {} };
+
+int main() {
+  ns::S s(123);
+  ns::S s2 = s;
+}
-- 
2.47.0
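
Editorial illustration of what the new tests exercise: a single-translation-unit sketch without modules. The template parameter list (typename T) and the <int> arguments are assumptions, since the archived test text above does not show them; guides_pick_int_specialization is a hypothetical helper. Requires C++17.

```cpp
#include <type_traits>

namespace ns {
  // The class template is only declared here (compare dguide-7_a.C).
  template <typename T> struct S;

  // The deduction guide is declared separately (compare dguide-7_b.C).
  S(int) -> S<int>;
}

// The specialization is supplied by the user (compare dguide-7_c.C).
template <> struct ns::S<int> {
  S(int) {}
};

// Both declarations rely on class template argument deduction: the first
// needs the user-declared guide, the second the copy deduction candidate.
inline bool guides_pick_int_specialization() {
  ns::S s(123);
  ns::S s2 = s;
  return std::is_same<decltype(s), ns::S<int>>::value
         && std::is_same<decltype(s2), ns::S<int>>::value;
}
```

In the modules version the guide lives in a different module from the template it refers to, which is the situation where, per the description above, add_deduction_guides was never called.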



Re: [RFC 0/3] Use automatic make dependencies in aarch64

2025-04-30 Thread Alice Carlotti
On Wed, Apr 30, 2025 at 01:29:25PM +0100, Richard Sandiford wrote:
> Alice Carlotti  writes:
> > On Tue, Apr 29, 2025 at 02:47:21PM +0100, Alice Carlotti wrote:
> >> This demonstrates a clear benefit to make the makefile rules automatic. I
> >> thought this might be quite tricky, but it turns out to be fairly
> >> straightforward.
> >
> > Actually, it turns out I missed at least one more thing that's needed, so
> > the first two patches combined don't even build cleanly.  The issue is that
> > dependencies on generated files need some mechanism to ensure that the
> > generated files are available before their dependants are built during a
> > clean build.  This means that I can't just delete the dependency
> >
> > aarch64-builtins.o: aarch64-builtin-iterators.h
> >
> >
> > Many other generated files are currently specified as prerequisites via the
> > rule:
> > $(ALL_HOST_OBJS) : | $(generated_files)
> >
> > I think it would make sense to include the backend generated files into this
> > variable.  Currently some backend files are included, but this is done using
> > the TM_H, TM_P_H, TM_D_H and TM_RUST_H variables, which looks like a misuse
> > of these variables.
> >
> > The intended meaning/use of the TM_* variables is also unclear.  As far as I
> > can tell, it looks like they should list the dependencies of the
> > corresponding files generated by mkconfig, of which the direct includes are
> > added automatically, but this isn't quite consistent with the current values
> > in t-aarch64.
> 
> Which files are you thinking of when you say that those macros are being
> misused, and that the current values in t-aarch64 aren't consistent with
> the intended usage?  It looks from a quick glance at:
> 
> TM_H += $(srcdir)/config/aarch64/aarch64-fusion-pairs.def \
>   $(srcdir)/config/aarch64/aarch64-tuning-flags.def \
>   $(srcdir)/config/aarch64/aarch64-option-extensions.def \
>   $(srcdir)/config/aarch64/aarch64-cores.def \
>   $(srcdir)/config/aarch64/aarch64-isa-modes.def \
>   $(srcdir)/config/aarch64/aarch64-arches.def
> 
> that aarch64-fusion-pairs.def and aarch64-tuning-flags.def could be put
> in TM_P_H instead of TM_H, since they are included via aarch64-protos.h
> but not (apparently) via aarch64.h.  But the others look correct.

I think config/arm/aarch-common.h is missing from TM_P_H and bbitmap.h is
missing from both.  The latter omission was introduced by me last year, but the
former has been missing since 2021.

(I haven't actually dumped the full value of TM_H to check, so I might have
missed some mechanism in the maze of macro definitions.  Similarly I think
input.h is covered via some chain of macros, but I haven't verified this.)

I think it's possible to do some programmatic verification of these macros; I
might try that at some point.

> > Another related observation is that aarch64-builtin-iterators.h is missing
> > from MOSTLYCLEANFILES, so it isn't removed in a clean build.  It ought to be
> > included, and I think it would be good if we could use the same list (or
> > mostly the same list) of generated files for both the order-only
> > prerequisite rule and in MOSTLYCLEANFILES.
> 
> I agree that it would be good for the common subset to be specified once
> rather than twice.  But generated_files includes things generated by
> configure, which shouldn't be removed by "make mostlyclean".

I couldn't spot any at a glance - do you have a specific example?  And why
would these need an explicit order dependency?  I might be missing some detail
of the configure/build flow here.

> And MOSTLYCLEANFILES includes .cc files, which shouldn't be
> included in the ordering dependency.  And I think some targets
> use tools to generate installed headers, which also shouldn't be
> included in the ordering dependency.
> 
> So I suppose this amounts to a macro that means "generated header files
> that are included by host code".  I'm not going to suggest a name for that :)
> 
> Richard


Re: [PATCH] RISC-V: Minimal support for ssnpm, smnpm and smmpm extensions.

2025-04-30 Thread Dongyan Chen

The patch has been modified as follows:

This patch adds support for the ssnpm, smnpm, smmpm, sspm and supm
extensions [1], enabling GCC to recognize and process them correctly at
compile time.


[1]https://github.com/riscv/riscv-j-extension/blob/master/zjpm/instructions.adoc

Changes for v3:
- Fix the error messages in gcc/testsuite/gcc.target/riscv/arch-46.c
Changes for v2:
- Add the sspm and supm extensions.
- Add the check_conflict_ext function to check the compatibility of 
ssnpm, smnpm, smmpm, sspm and supm extensions.

- Add the test cases for ssnpm, smnpm, smmpm, sspm and supm extensions.
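
Editorial illustration of the compatibility check listed above: a minimal sketch, assuming the requested extensions are available as a set of strings. rv64_only_violations is a hypothetical helper and does not use the riscv_subset_list API.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns the pointer-masking extensions that were requested although the
// target is 32-bit; an empty result means there is no conflict.
inline std::vector<std::string>
rv64_only_violations(const std::set<std::string>& requested, int xlen) {
  static const char* const rv64_only[] = {"ssnpm", "smnpm", "smmpm",
                                          "sspm", "supm"};
  std::vector<std::string> violations;
  for (const char* ext : rv64_only)
    if (xlen == 32 && requested.count(ext) != 0)
      violations.push_back(ext);
  return violations;
}
```

A caller would report one diagnostic per returned name; keeping the names in one table avoids repeating the same test once per extension.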

gcc/ChangeLog:

 * common/config/riscv/riscv-common.cc
(riscv_subset_list::check_conflict_ext): New extension.
 * config/riscv/riscv.opt: Ditto.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/arch-45.c: New test.
 * gcc.target/riscv/arch-46.c: New test.

---
 gcc/common/config/riscv/riscv-common.cc  | 36 
 gcc/config/riscv/riscv.opt   | 19 +
 gcc/testsuite/gcc.target/riscv/arch-45.c |  5 
 gcc/testsuite/gcc.target/riscv/arch-46.c | 10 +++
 4 files changed, 70 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-45.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-46.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 15df22d53770..619bf9059caf 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -258,6 +258,10 @@ static const riscv_implied_info_t riscv_implied_info[] =

   {"ssstateen", "zicsr"},
   {"sstc", "zicsr"},

+  {"ssnpm", "zicsr"},
+  {"smnpm", "zicsr"},
+  {"smmpm", "zicsr"},
+
   {"xsfvcp", "zve32x"},

   {NULL, NULL}
@@ -440,6 +444,12 @@ static const struct riscv_ext_version riscv_ext_version_table[] =

   {"ssstateen", ISA_SPEC_CLASS_NONE, 1, 0},
   {"sstc",  ISA_SPEC_CLASS_NONE, 1, 0},

+  {"ssnpm", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"smnpm", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"smmpm", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"sspm",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"supm",  ISA_SPEC_CLASS_NONE, 1, 0},
+
   {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svpbmt",  ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1340,6 +1350,26 @@ riscv_subset_list::check_conflict_ext ()
 error_at (m_loc, "%<-march=%s%>: zcf extension supports in rv32 only",
   m_arch);

+  if (lookup ("ssnpm") && m_xlen == 32)
+    error_at (m_loc, "%<-march=%s%>: ssnpm extension supports in rv64 only",
+	      m_arch);
+
+  if (lookup ("smnpm") && m_xlen == 32)
+    error_at (m_loc, "%<-march=%s%>: smnpm extension supports in rv64 only",
+	      m_arch);
+
+  if (lookup ("smmpm") && m_xlen == 32)
+    error_at (m_loc, "%<-march=%s%>: smmpm extension supports in rv64 only",
+	      m_arch);
+
+  if (lookup ("sspm") && m_xlen == 32)
+    error_at (m_loc, "%<-march=%s%>: sspm extension supports in rv64 only",
+	      m_arch);
+
+  if (lookup ("supm") && m_xlen == 32)
+    error_at (m_loc, "%<-march=%s%>: supm extension supports in rv64 only",
+	      m_arch);
+
+
   if (lookup ("zfinx") && lookup ("f"))
 error_at (m_loc,
   "%<-march=%s%>: z*inx conflicts with floating-point "
@@ -1767,6 +1797,12 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =

   RISCV_EXT_FLAG_ENTRY ("svnapot", x_riscv_sv_subext, MASK_SVNAPOT),
   RISCV_EXT_FLAG_ENTRY ("svvptc", x_riscv_sv_subext, MASK_SVVPTC),

+  RISCV_EXT_FLAG_ENTRY ("ssnpm", x_riscv_ss_subext, MASK_SSNPM),
+  RISCV_EXT_FLAG_ENTRY ("smnpm", x_riscv_sm_subext, MASK_SMNPM),
+  RISCV_EXT_FLAG_ENTRY ("smmpm", x_riscv_sm_subext, MASK_SMMPM),
+  RISCV_EXT_FLAG_ENTRY ("sspm", x_riscv_ss_subext, MASK_SSPM),
+  RISCV_EXT_FLAG_ENTRY ("supm", x_riscv_su_subext, MASK_SUPM),
+
   RISCV_EXT_FLAG_ENTRY ("ztso", x_riscv_ztso_subext, MASK_ZTSO),

   RISCV_EXT_FLAG_ENTRY ("xcvmac",  x_riscv_xcv_subext, MASK_XCVMAC),
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 7515c8ea13dd..cec866350b64 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -472,6 +472,25 @@ Mask(SVNAPOT) Var(riscv_sv_subext)

 Mask(SVVPTC) Var(riscv_sv_subext)

+TargetVariable
+int riscv_ss_subext
+
+Mask(SSNPM) Var(riscv_ss_subext)
+
+Mask(SSPM) Var(riscv_ss_subext)
+
+TargetVariable
+int riscv_sm_subext
+
+Mask(SMNPM) Var(riscv_sm_subext)
+
+Mask(SMMPM) Var(riscv_sm_subext)
+
+TargetVariable
+int riscv_su_subext
+
+Mask(SUPM) Var(riscv_su_subext)
+
 TargetVariable
 int riscv_ztso_subext

diff --git a/gcc/testsuite/gcc.target/riscv/arch-45.c b/gcc/testsuite/gcc.target/riscv/arch-45.c
new file mode 100644
index ..8f95737b248f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-45.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_ssnpm_smnpm_smmpm_sspm_supm -mabi=lp64" } */
+int foo()
+{
+}
diff --git a/gcc/testsuite/gcc.target/riscv/a

[PATCH v4] RISC-V: Fix missing implied Zicsr from Zve32x

2025-04-30 Thread Jerry Zhang Jian
The Zve32x extension depends on the Zicsr extension.
Currently, enabling Zve32x alone does not automatically imply Zicsr in GCC.

gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add Zve32x depends on Zicsr

gcc/testsuite/ChangeLog:
* gcc.target/riscv/predef-19.c: Set -march to rv64im_zve32x
  instead of rv64gc_zve32x to avoid Zicsr being implied by g. The extra
  m is added because the 'V' extension currently requires the 'M' extension.

Signed-off-by: Jerry Zhang Jian 
---
 gcc/common/config/riscv/riscv-common.cc|  1 +
 gcc/testsuite/gcc.target/riscv/predef-19.c | 34 +-
 2 files changed, 8 insertions(+), 27 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 15df22d5377..145a0f2bd95 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -137,6 +137,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zve64f", "f"},
   {"zve64d", "d"},
 
+  {"zve32x", "zicsr"},
   {"zve32x", "zvl32b"},
   {"zve32f", "zve32x"},
   {"zve32f", "zvl32b"},
diff --git a/gcc/testsuite/gcc.target/riscv/predef-19.c b/gcc/testsuite/gcc.target/riscv/predef-19.c
index 2b90702192b..ca3d57abca9 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-19.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-19.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=rv64gc_zve32x -mabi=lp64d -mcmodel=medlow -misa-spec=2.2" } */
+/* { dg-options "-O2 -march=rv64im_zve32x -mabi=lp64 -mcmodel=medlow -misa-spec=2.2" } */
 
 int main () {
 
@@ -15,50 +15,30 @@ int main () {
 #error "__riscv_i"
 #endif
 
-#if !defined(__riscv_c)
-#error "__riscv_c"
-#endif
-
 #if defined(__riscv_e)
 #error "__riscv_e"
 #endif
 
-#if !defined(__riscv_a)
-#error "__riscv_a"
-#endif
-
 #if !defined(__riscv_m)
 #error "__riscv_m"
 #endif
 
-#if !defined(__riscv_f)
-#error "__riscv_f"
-#endif
-
-#if !defined(__riscv_d)
-#error "__riscv_d"
-#endif
-
-#if defined(__riscv_v)
-#error "__riscv_v"
+#if !defined(__riscv_zicsr)
+#error "__riscv_zicsr"
 #endif
 
-#if defined(__riscv_zvl128b)
-#error "__riscv_zvl128b"
+#if !defined(__riscv_zmmul)
+#error "__riscv_zmmul"
 #endif
 
-#if defined(__riscv_zvl64b)
-#error "__riscv_zvl64b"
+#if !defined(__riscv_zve32x)
+#error "__riscv_zve32x"
 #endif
 
 #if !defined(__riscv_zvl32b)
 #error "__riscv_zvl32b"
 #endif
 
-#if !defined(__riscv_zve32x)
-#error "__riscv_zve32x"
-#endif
-
 #if !defined(__riscv_vector)
 #error "__riscv_vector"
 #endif
-- 
2.49.0
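
Editorial illustration of how an implied-extension table such as riscv_implied_info can be applied: a minimal sketch that expands a set of extensions until no table entry adds anything new. ImpliedTable and expand_implied are hypothetical names; this is not the algorithm used in riscv-common.cc.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Each entry reads "first implies second", mirroring the {"zve32x", "zicsr"}
// style of the table in the patch.
using ImpliedTable = std::vector<std::pair<std::string, std::string>>;

// Repeatedly apply the table until a full pass adds no new extension, so
// chains such as zve32f -> zve32x -> zicsr are followed to the end.
inline std::set<std::string>
expand_implied(std::set<std::string> exts, const ImpliedTable& table) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (const auto& entry : table)
      if (exts.count(entry.first) != 0 && exts.insert(entry.second).second)
        changed = true;
  }
  return exts;
}
```

With the new {"zve32x", "zicsr"} entry present, expanding a set that contains only zve32f or zve32x yields zicsr as well; without it, zicsr would have to come from somewhere else, such as g.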



Re: [PATCH] AArch64: Fold LD1/ST1 with ptrue to LDR/STR for 128-bit VLS

2025-04-30 Thread Jennifer Schmitz


> On 29 Apr 2025, at 18:21, Richard Sandiford  wrote:
> 
> 
> Jennifer Schmitz  writes:
>> If -msve-vector-bits=128, SVE loads and stores (LD1 and ST1) with a
>> ptrue predicate can be replaced by neon instructions (LDR and STR),
>> thus avoiding the predicate altogether. This also enables formation of
>> LDP/STP pairs.
>> 
>> For example, the test cases
>> 
>> svfloat64_t
>> ptrue_load (float64_t *x)
>> {
>>  svbool_t pg = svptrue_b64 ();
>>  return svld1_f64 (pg, x);
>> }
>> void
>> ptrue_store (float64_t *x, svfloat64_t data)
>> {
>>  svbool_t pg = svptrue_b64 ();
>>  return svst1_f64 (pg, x, data);
>> }
>> 
>> were previously compiled to
>> (with -O2 -march=armv8.2-a+sve -msve-vector-bits=128):
>> 
> >> ptrue_load:
> >>        ptrue   p3.b, vl16
> >>        ld1d    z0.d, p3/z, [x0]
> >>        ret
> >> ptrue_store:
> >>        ptrue   p3.b, vl16
> >>        st1d    z0.d, p3, [x0]
> >>        ret
> >> 
> >> Now they are compiled to:
> >> 
> >> ptrue_load:
> >>        ldr     q0, [x0]
> >>        ret
> >> ptrue_store:
> >>        str     q0, [x0]
> >>        ret
>> 
>> The implementation includes the if-statement
>> if (known_eq (GET_MODE_SIZE (mode), 16)
>>&& aarch64_classify_vector_mode (mode) == VEC_SVE_DATA)
>> which checks for 128-bit VLS and excludes partial modes with a
>> mode size < 128 (e.g. VNx2QI).
>> 
>> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz 
>> 
>> gcc/
>>  * config/aarch64/aarch64.cc (aarch64_emit_sve_pred_move):
>>  Fold LD1/ST1 with ptrue to LDR/STR for 128-bit VLS.
>> 
>> gcc/testsuite/
>>  * gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c: New test.
>>  * gcc.target/aarch64/sve/cond_arith_6.c: Adjust expected outcome.
> >>  * gcc.target/aarch64/sve/pcs/return_4_128.c: Likewise.
> >>  * gcc.target/aarch64/sve/pcs/return_5_128.c: Likewise.
> >>  * gcc.target/aarch64/sve/pcs/struct_3_128.c: Likewise.
>> ---
>> gcc/config/aarch64/aarch64.cc | 29 --
>> .../gcc.target/aarch64/sve/cond_arith_6.c |  3 +-
>> .../aarch64/sve/ldst_ptrue_128_to_neon.c  | 48 
>> .../gcc.target/aarch64/sve/pcs/return_4_128.c | 39 +
>> .../gcc.target/aarch64/sve/pcs/return_5_128.c | 39 +
>> .../gcc.target/aarch64/sve/pcs/struct_3_128.c | 56 +++
>> 6 files changed, 118 insertions(+), 96 deletions(-)
>> create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c
> 
> OK, thanks.
Thanks, pushed to trunk: 83bb288faa39a0bf5ce2d62e21a090a130d8dda4
Jennifer
> 
> Richard





Re: [PATCH v5 03/10] libstdc++: Implement std::extents [PR107761].

2025-04-30 Thread Jonathan Wakely
On Wed, 30 Apr 2025 at 09:55, Luc Grosheintz  wrote:
> On 4/30/25 4:37 AM, Tomasz Kaminski wrote:
> > On Tue, Apr 29, 2025 at 11:52 PM Jonathan Wakely  wrote:
> >> On Tue, 29 Apr 2025 at 14:55, Tomasz Kaminski  wrote:
> >>> On Tue, Apr 29, 2025 at 2:55 PM Luc Grosheintz 
> >> wrote:
>  +  template
>  +   friend constexpr bool
>  +   operator==(const extents& __self,
>  +  const extents<_OIndexType, _OExtents...>& __other)
> >> noexcept
>  +   {
>  + if constexpr (!_S_is_compatible_extents<_OExtents...>())
>  +   return false;
> >>>
> >>> We can add here:
> >>> // N.B. if extents are compatible and static, it implies
> >> that they are equal
> >>> else if constexpr ((_Extents != dynamic_extent) && ...)
> >>> return true;
> >>
> >> Can't we have a case where _OExtents==dynamic_extent and
> >> _Extents!=dynamic_extent, which would mean they're compatible, but
> >> not equal?
> >>
> > Ah, indeed the condition needs to be:
> > (_Extents != dynamic_extent && _Extents == _OExtents) && ...
> >
>
> The reason I didn't follow through with this is because when looking
> at the generated code I saw no advantage of the additional constexpr
> branch. Therefore, I preferred the simplicity of Tomasz's first proposal,
> i.e. the current implementation.

Thanks for checking it. So the runtime 'for' loop is able to optimize
to a constant, because the .extent(i) calls are constants in the cases
that the constexpr-if branch would handle.

> Would you still like me to add the additional `else if constexpr`
> branch?

No, we can always add it later if it turns out to help (e.g. when
sizeof...(_Extents) is large, if the optimizer gives up on the 'for'
loop).
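
To make the condition discussed above concrete, here is a minimal sketch
of the three-way comparison. It is not the libstdc++ code: toy_extents,
dyn, compatible, all_static_and_equal and extents_equal are hypothetical
names made up for illustration, and the middle branch uses the corrected
condition (every extent static on both sides and equal).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for std::dynamic_extent.
inline constexpr std::size_t dyn = std::numeric_limits<std::size_t>::max();

// Toy model of std::extents (not the libstdc++ class): static extents are
// template arguments, dynamic ones are read from dyn_ext at run time.
template<std::size_t... Es>
struct toy_extents
{
  static constexpr std::size_t rank = sizeof...(Es);
  static constexpr std::array<std::size_t, rank> static_ext{Es...};
  std::array<std::size_t, rank> dyn_ext{};  // only used where Es == dyn

  constexpr std::size_t extent(std::size_t i) const
  { return static_ext[i] == dyn ? dyn_ext[i] : static_ext[i]; }
};

// Compatible: same rank, and no position where two static extents differ.
template<class A, class B>
constexpr bool compatible()
{
  if (A::rank != B::rank)
    return false;
  for (std::size_t i = 0; i < A::rank; ++i)
    if (A::static_ext[i] != dyn && B::static_ext[i] != dyn
        && A::static_ext[i] != B::static_ext[i])
      return false;
  return true;
}

// Corrected shortcut condition: every extent is static and equal.
template<class A, class B>
constexpr bool all_static_and_equal()
{
  if (A::rank != B::rank)
    return false;
  for (std::size_t i = 0; i < A::rank; ++i)
    if (A::static_ext[i] == dyn || A::static_ext[i] != B::static_ext[i])
      return false;
  return true;
}

template<std::size_t... Es, std::size_t... Os>
constexpr bool
extents_equal(const toy_extents<Es...>& a, const toy_extents<Os...>& b)
{
  using A = toy_extents<Es...>;
  using B = toy_extents<Os...>;
  if constexpr (!compatible<A, B>())
    return false;                 // decided at compile time
  else if constexpr (all_static_and_equal<A, B>())
    return true;                  // the optional extra branch
  else
    {
      // Run-time loop; needed whenever a dynamic extent is involved.
      for (std::size_t i = 0; i < A::rank; ++i)
        if (a.extent(i) != b.extent(i))
          return false;
      return true;
    }
}

// Static vs. static cases are resolved without touching any object state.
static_assert(extents_equal(toy_extents<2, 3>{}, toy_extents<2, 3>{}));
static_assert(!extents_equal(toy_extents<2, 3>{}, toy_extents<2, 4>{}));
```

The static-versus-dynamic case is the one the first proposed condition
got wrong: toy_extents<2, 3> and toy_extents<2, dyn> are compatible, but
whether they compare equal depends on the run-time value.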


> Here's the output of objdump when compiling with `-O2`:
>
> bool same1(const std::extents& e1,
> const std::extents& e2)
> { return e1 == e2; }
> 0:  b8 01 00 00 00  mov$0x1,%eax
> 5:  c3  ret
>
> bool same2(const std::extents& e1,
> const std::extents& e2)
> { return e1 == e2; }
> 0:  31 c0   xor%eax,%eax
> 2:  c3  ret
>
> bool same3(const std::extents& e1,
> const std::extents& e2)
> { return e1 == e2; }
> 0:  31 c0   xor%eax,%eax
> 2:  c3  ret
>
> bool same4(const std::extents& e1,
> const std::extents& e2)
> { return e1 == e2; }
> 0:  31 c0   xor%eax,%eax
> 2:  c3  ret
>
> bool same5(const std::extents& e1,
> const std::extents& e2)
> { return e1 == e2; }
> 0:  31 c0   xor%eax,%eax
> 2:  c3  ret
>
>
> >>
> >>
> 
>  + else
>  +   {
>  + for (size_t __i = 0; __i < __self.rank(); ++__i)
>  +   if (!cmp_equal(__self.extent(__i), __other.extent(__i)))
>  + return false;
>  + return true;
>  +   }
>  +   }
>  +
>  +private:
>  +  using _S_storage = __mdspan::_ExtentsStorage<
>  +   _IndexType, array{_Extents...}>;
>  +  [[no_unique_address]] _S_storage _M_dynamic_extents;
>  +
>  +  template
>  +   friend class extents;
>  +};
>  +
>  +  namespace __mdspan
>  +  {
>  +template
>  +  auto __build_dextents_type(integer_sequence)
>  +   -> extents<_IndexType, ((void) _Counts, dynamic_extent)...>;
>  +
>  +template
>  +  consteval size_t
>  +  __dynamic_extent() { return dynamic_extent; }
>  +  }
>  +
>  +  template
>  +using dextents =
> >> decltype(__mdspan::__build_dextents_type<_IndexType>(
>  +   make_index_sequence<_Rank>()));
>  +
>  +  template
>  +requires (is_convertible_v<_Integrals, size_t> && ...)
>  +explicit extents(_Integrals...) ->
>  +  extents()...>;
> 
>    _GLIBCXX_END_NAMESPACE_VERSION
>    }
>  diff --git a/libstdc++-v3/src/c++23/std.cc.in b/libstdc++-v3/src/c++23/
> >> std.cc.in
>  index 930a489ff44..0df27cd7e7d 100644
>  --- a/libstdc++-v3/src/c++23/std.cc.in
>  +++ b/libstdc++-v3/src/c++23/std.cc.in
>  @@ -1833,7 +1833,11 @@ export namespace std
>  }
>    }
> 
>  -// FIXME 
>  +// 
>  +{
>  +  using std::extents;
>  +  // FIXME layout_*, default_accessor and mdspan
>  +}
> 
>    // 20.2 
>    export namespace std
>  --
>  2.49.0
> 
> >>
> >>
> >
>



[Ada] Fix PR ada/112958

2025-04-30 Thread Eric Botcazou
This fixes the long-standing build failure of GNAT for x86/FreeBSD.

Applied on all active branches.


2025-04-30  Eric Botcazou  

PR ada/112958
* Makefile.rtl (LIBGNAT_TARGET_PAIRS) [x86 FreeBSD]: Add specific
version of s-dorepr.adb.
* libgnat/s-dorepr__freebsd.adb: New file.

-- 
Eric Botcazou
diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl
index 61600adf1f3..cb41e6887cd 100644
--- a/gcc/ada/Makefile.rtl
+++ b/gcc/ada/Makefile.rtl
@@ -1900,6 +1900,7 @@ ifeq ($(strip $(filter-out %86 freebsd%,$(target_cpu) $(target_os))),)
   $(TRASYM_DWARF_UNIX_PAIRS) \
   $(ATOMICS_TARGET_PAIRS) \
   $(X86_TARGET_PAIRS) \
+  s-dorepr.adbhttp://www.gnu.org/licenses/>.  --
+--  --
+-- GNAT was originally developed  by the GNAT team at  New York University. --
+-- Extensive contributions were provided by Ada Core Technologies Inc.  --
+--  --
+--
+
+--  This is the x86/FreeBSD version of the separate package body
+
+with Interfaces; use Interfaces;
+
+separate (System.Double_Real)
+
+package body Product is
+
+   procedure Split (N : Num; Hi : out Num; Lo : out Num);
+   --  Compute high part and low part of N
+
+   ---
+   -- Split --
+   ---
+
+   --  We use a bit manipulation algorithm instead of Veltkamp's splitting
+   --  because it is faster and has the property that the magnitude of the
+   --  high part is never larger than that of the input number, which will
+   --  avoid spurious overflows in the Two_Prod algorithm.
+
+   --  See the recent paper by Claude-Pierre Jeannerod, Jean-Michel Muller
+   --  and Paul Zimmermann: On various ways to split a floating-point number
+   --  ARITH 2018 - 25th IEEE Symposium on Computer Arithmetic, Jun 2018,
+   --  Amherst (MA), United States, pages 53-60.
+
+   procedure Split (N : Num; Hi : out Num; Lo : out Num) is
+  X : Num;
+
+   begin
+  --  Spill the input into the appropriate (maybe larger) bit container,
+  --  mask out the low bits and reload the modified value.
+
+  case Num'Machine_Mantissa is
+ when 24 =>
+declare
+   Rep32 : aliased Interfaces.Unsigned_32;
+   Temp  : Num := N with Address => Rep32'Address;
+   pragma Annotate (CodePeer, Modified, Rep32);
+
+begin
+   --  Mask out the low 12 bits
+
+   Rep32 := Rep32 and 16#F000#;
+
+   X := Temp;
+end;
+
+ when 53 =>
+declare
+   Rep64 : aliased array (1 .. 2) of Interfaces.Unsigned_64;
+   Temp  : Num := N with Address => Rep64'Address;
+   pragma Annotate (CodePeer, Modified, Rep64);
+
+begin
+   --  Mask out the low 27 bits
+
+   Rep64 (1) := Rep64 (1) and 16#F800#;
+
+   X := Temp;
+end;
+
+ when 64 =>
+declare
+   Rep80 : aliased array (1 .. 2) of Interfaces.Unsigned_64;
+   Temp  : Num := N with Address => Rep80'Address;
+   pragma Annotate (CodePeer, Modified, Rep80);
+
+begin
+   --  Mask out the low 32 bits
+
+   if System.Default_Bit_Order = High_Order_First then
+  Rep80 (1) := Rep80 (1) and 16##;
+  Rep80 (2) := Rep80 (2) and 16##;
+   else
+  Rep80 (1) := Rep80 (1) and 16##;
+   end if;
+
+   X := Temp;
+end;
+
+ when others =>
+raise Program_Error;
+  end case;
+
+  --  Deal with denormalized numbers
+
+  if X = 0.0 then
+ Hi := N;
+ Lo := 0.0;
+  else
+ Hi := X;
+ Lo := N - X;
+  end if;
+   end Split;
+
+   --
+   -- Two_Prod --
+   --
+
+   function Two_Prod (A, B : Num) return Double_T is
+  P : constant Num := A * B;
+
+  Ahi, Alo, Bhi, Blo, E : Num;
+
+   begin
+  if Is_Infinity (P) or else Is_Zero (P) then
+ return (P, 0.0);
+
+  else
+ Split (A, Ahi, Alo);
+ Split (B, Bhi, Blo);
+
+ E := ((Ahi * Bhi - P) + Ahi * Blo + Alo * Bhi) + Alo * Blo;
+
+ return (P, E);
+  end if;
+   end Two_Prod;
+
+   -
+   -- Two_Sqr --
+   -
+
+   function Two_Sqr (A : Num) return Double_T is
+  Q : constant Num := A * A;
+
+  Hi, Lo, E : Num;
+
+   begin
+  if Is_Infinity (Q) or else Is_Zero (Q) then
+ return (Q, 0.0);
+
+  else
+ Split (A, Hi, Lo);
+
+ E := ((Hi * Hi - Q) + 2.0 * Hi * Lo) + Lo * Lo;
+
+ return (Q, E);
+  end if;
+   end Two_Sqr;
+
+end Product;
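
For readers who do not read Ada, the following is a rough C++ sketch of
the 53-bit-mantissa path of Split and of Two_Prod from the file above.
It is an illustration under assumptions, not a translation of the
committed code: the names DoubleT, split and two_prod are hypothetical,
and the mask constant is written out here from the "low 27 bits"
comment, since the literal in the quoted diff is not legible.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical counterpart of Double_T: a value and its rounding error.
struct DoubleT { double hi; double lo; };

// Split n into a high part with the low 27 mantissa bits cleared and a
// low part holding the remainder, so that hi + lo == n exactly.
inline void split(double n, double& hi, double& lo)
{
  std::uint64_t bits;
  std::memcpy(&bits, &n, sizeof bits);   // spill into a 64-bit container
  bits &= 0xFFFFFFFFF8000000ull;         // assumed mask: clear low 27 bits
  double x;
  std::memcpy(&x, &bits, sizeof x);      // reload the modified value

  if (x == 0.0)                          // denormal input: keep it whole
    {
      hi = n;
      lo = 0.0;
    }
  else
    {
      hi = x;
      lo = n - x;
    }
}

// Return the rounded product and the error term from the split halves.
inline DoubleT two_prod(double a, double b)
{
  const double p = a * b;
  if (std::isinf(p) || p == 0.0)
    return {p, 0.0};

  double ahi, alo, bhi, blo;
  split(a, ahi, alo);
  split(b, bhi, blo);
  const double e = ((ahi * bhi - p) + ahi * blo + alo * bhi) + alo * blo;
  return {p, e};
}
```

Because the high part is obtained by truncation rather than rounding,
its magnitude never exceeds that of the input, which is the property the
comment in the Ada source relies on to avoid spurious overflow.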


Re: [PATCH] libstdc++: Rewrite atomic builtin checks [PR70560]

2025-04-30 Thread Tomasz Kaminski
On Tue, Apr 29, 2025 at 10:11 PM Jonathan Wakely  wrote:

> Currently the GLIBCXX_ENABLE_ATOMIC_BUILTINS macro checks for a variety
> of __atomic built-ins for bool, short and int. If all those checks pass,
> then it defines _GLIBCXX_ATOMIC_BUILTINS and uses the definitions from
> config/cpu/generic/atomicity_builtins/atomicity.h for the non-inline
> versions of __exchange_and_add and __atomic_add that get compiled into
> libsupc++.
>
> However, the config/cpu/generic/atomicity_builtins/atomicity.h
> definitions only depend on __atomic_fetch_add not on
> __atomic_test_and_set or __atomic_compare_exchange. And they only
> operate on a variable of type _Atomic_word, which is not necessarily one
> of bool, short or int (e.g. for sparcv9 _Atomic_word is 64-bit long).
>
> This means that for a target where _Atomic_word is int but there are no
> 1-byte or 2-byte atomic instructions, GLIBCXX_ENABLE_ATOMIC_BUILTINS
> will fail the checks for bool and short and not define the macro
> _GLIBCXX_ATOMIC_BUILTINS. That means that we will use a single global
> mutex for reference counting in the COW std::string and std::locale,
> even though we could use __atomic_fetch_add to do it lock-free.
>
> This commit removes most of the GLIBCXX_ENABLE_ATOMIC_BUILTINS checks,
> so that it only checks __atomic_fetch_add on _Atomic_word. This will
> enable the atomic versions of __exchange_and_add and __atomic_add for
> more targets. This is not an ABI change, because for targets which
> didn't previously use the atomic definitions of those functions, they
> always make a non-inlined call to the functions in the library. If the
> definitions of those functions now start using atomics, that doesn't
> change the semantics for the code calling those functions.
>
> On affected targets, new code compiled after this change will see the
> _GLIBCXX_ATOMIC_BUILTINS macro and so will use the always-inline
> versions of __exchange_and_add and __atomic_add, which use
> __atomic_fetch_add directly. That is also compatible with older code
> which calls the non-inline definitions, because those non-inline
> definitions now also use __atomic_fetch_add.
>
> The only configuration where this could be an ABI change is for a target
> which currently defines _GLIBCXX_ATOMIC_BUILTINS (because all the atomic
> built-ins for bool, short and int are supported), but which defines
> _Atomic_word to some other type for which __atomic_fetch_add is _not_
> supported. For such a target, we would previously have used inline calls
> to __atomic_fetch_add, which would have depended on libatomic. After
> this commit, we would make non-inline calls into the library where
> __exchange_and_add and __atomic_add would use the global mutex. That
> would be an ABI break. I don't consider that a realistic scenario,
> because it wouldn't have made any sense to define _Atomic_word to a
> wider type than int, when doing so would have required libatomic to make
> libstdc++.so work. Surely such a target would have just used int for its
> _Atomic_word type.
>
> The GLIBCXX_ENABLE_BACKTRACE macro currently uses the
> glibcxx_ac_atomic_int macro defined by the checks that this commit
> removes from GLIBCXX_ENABLE_ATOMIC_BUILTINS. That wasn't a good check
> anyway, because libbacktrace actually depends on atomic loads+stores for
> pointers as well as int, and for atomic stores for size_t. This commit
> replaces the glibcxx_ac_atomic_int check with a proper test for all the
> required atomic operations on all three of int, void* and size_t. This
> ensures that the libbacktrace code used for std::stacktrace will either
> use native atomics, or implement those loads and stores only in terms of
> __sync_bool_compare_and_swap (possibly requiring that to come from
> libatomic or elsewhere).
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/70560
> PR libstdc++/119667
> * acinclude.m4 (GLIBCXX_ENABLE_ATOMIC_BUILTINS): Only check for
> __atomic_fetch_add on _Atomic_word.
> (GLIBCXX_ENABLE_BACKTRACE): Check for __atomic_load_n and
> __atomic_store_n on int, void* and size_t.
> * config.h.in: Regenerate.
> * configure: Regenerate.
> * configure.host: Fix typo in comment.
> ---
>
> Tested x86_64-linux, no changes to the c++config.h results.
> I need to do more testing on other targets.
>
I would rename _GLIBCXX_ATOMIC_BUILTINS to _GLIBCXX_ATOMIC_WORLD_BUILTINS,
to better reflect the new reality. I have checked, and it seems to be a
one-file change, in libstdc++-v3/include/ext/atomicity.h.
And if I am mistaken, the rename will show all affected files in the
commit.

>
>  libstdc++-v3/acinclude.m4   | 301 +-
>  libstdc++-v3/config.h.in|   2 +-
>  libstdc++-v3/configure  | 418 
>  libstdc++-v3/configure.host |   2 +-
>  4 files changed, 239 insertions(+), 484 deletions(-)
>
> diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
> index a0094c2dd95..8dddcada071 100
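
As a minimal sketch of why only __atomic_fetch_add matters for the two
functions named in the patch description: both can be written purely in
terms of that one built-in on the word type. The names atomic_word,
exchange_and_add and atomic_add below are hypothetical stand-ins, the
choice of int and of __ATOMIC_ACQ_REL is an assumption for illustration,
and the built-in itself requires GCC or a compatible compiler.

```cpp
#include <cassert>

// Hypothetical stand-in for _Atomic_word; int on many targets, but the
// thread notes it can be wider (e.g. 64-bit long on sparcv9).
using atomic_word = int;

// Add val to *mem atomically and return the previous value.
inline atomic_word
exchange_and_add(volatile atomic_word* mem, int val)
{ return __atomic_fetch_add(mem, val, __ATOMIC_ACQ_REL); }

// Add val to *mem atomically, discarding the previous value.
inline void
atomic_add(volatile atomic_word* mem, int val)
{ __atomic_fetch_add(mem, val, __ATOMIC_ACQ_REL); }
```

Nothing here needs atomic operations on bool or short, so a target
lacking 1-byte or 2-byte atomics can still do reference counting this
way.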

Re: [RFC 0/3] Use automatic make dependencies in aarch64

2025-04-30 Thread Richard Sandiford
Alice Carlotti  writes:
> On Wed, Apr 30, 2025 at 01:29:25PM +0100, Richard Sandiford wrote:
>> Alice Carlotti  writes:
>> > On Tue, Apr 29, 2025 at 02:47:21PM +0100, Alice Carlotti wrote:
>> >> This demonstrates a clear benefit to make the makefile rules automatic. I
>> >> thought this might be quite tricky, but it turns out to be fairly
>> >> straightforward.
>> >
>> > Actually, it turns out I missed at least one more thing that's needed, so 
>> > the
>> > first two patches combined don't even build cleanly.  The issue is that
>> > dependencies on generated files need some mechanism to ensure that the
>> > generated files are available before their dependants are built during a 
>> > clean
>> > build.  This means that I can't just delete the dependency
>> >
>> > aarch64-builtins.o: aarch64-builtin-iterators.h
>> >
>> >
>> > Many other generated files are currently specified as prerequisites via the
>> > rule:
>> > $(ALL_HOST_OBJS) : | $(generated_files)
>> >
>> > I think it would make sense to include the backend generated files into 
>> > this
>> > variable.  Currently some backend files are included, but this is done 
>> > using
>> > the variables TM_H, TM_P_H, TM_D_H and TM_RUST_H variables, which looks 
>> > like a
>> > misuse of these variables.
>> >
>> > The intended meaning/use of the TM_* variables is also unclear.  As far as 
>> > I
>> > can tell, it looks like they should list the dependencies of the 
>> > corresponding
>> > files generated by mkconfig, of which the direct includes are added
>> > automatically, but this isn't quite consistent with the current values in
>> > t-aarch64.
>> 
>> Which files are you thinking of when you say that those macros are being
>> misused, and that the current values in t-aarch64 aren't consistent with
>> the intended usage?  It looks from a quick glance at:
>> 
>> TM_H += $(srcdir)/config/aarch64/aarch64-fusion-pairs.def \
>>  $(srcdir)/config/aarch64/aarch64-tuning-flags.def \
>>  $(srcdir)/config/aarch64/aarch64-option-extensions.def \
>>  $(srcdir)/config/aarch64/aarch64-cores.def \
>>  $(srcdir)/config/aarch64/aarch64-isa-modes.def \
>>  $(srcdir)/config/aarch64/aarch64-arches.def
>> 
>> that aarch64-fusion-pairs.def and aarch64-tuning-flags.def could be put
>> in TM_P_H instead of TM_H, since they are included via aarch64-protos.h
>> but not (apparently) via aarch64.h.  But the others look correct.
>
> I think config/arm/aarch-common.h is missing from TM_P_H and bbitmap.h is
> missing from both.  The latter omission was introduced by me last year, but 
> the
> former has been missing since 2021.

Ah, ok, so it was more about missing entries?

> (I haven't actually dumped the full value of TM_H to check, so I might have
> missed some mechanism in the maze of macro definitions.  Similarly I think
> input.h is covered via some chain of macros, but I haven't verified this.)
>
> I think it's possible to do some programmatic verification of these macros; I
> might try that at some point.
>
>> > Another related observation is that aarch64-builtin-iterators.h is missing 
>> > from
>> > MOSTLYCLEANFILES, so it isn't removed in a clean build.  It ought to be
>> > included, and I think it would be good if we could use the same list (or 
>> > mostly
>> > the same list) of generated files for both the order-only prerequisite rule
>> > and
>> > in MOSTLYCLEANFILES.
>> 
>> I agree that it would be good for the common subset to be specified once
>> rather than twice.  But generated_files includes things generated by
>> configure, which shouldn't be removed by "make mostlyclean".
>
> I couldn't spot any at a glance - do you have a specific example?  And why
> would these need an explicit order dependency?  I might be missing some detail
> of the configure/build flow here.

I was thinking of config.h, tm.h and tm_p.h, which are removed by clean
(of course!) but not by mostlyclean.  I'm not 100% sure why they're in
the ordering rule, but perhaps it's in case config.status is rerun due
to an update to the configuration files.

Thanks,
Richard


Re: [PATCH] c++: UNBOUND_CLASS_TEMPLATE context substitution [PR119981]

2025-04-30 Thread Jason Merrill

On 4/29/25 8:50 AM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk/15/14?


OK.


-- >8 --

In r15-123 and r14-11434 we unconditionally set processing_template_decl
when substituting the context of an UNBOUND_CLASS_TEMPLATE, in order to
handle instantiation of the dependently scoped friend declaration

   template
   template
   friend class A::B;

where the scope A remains dependent after instantiation.  But this
turns out to misbehave for the UNBOUND_CLASS_TEMPLATE in the below
testcase

   g<[]{}>::template fn

since with the flag set substituting the args of test3 into the lambda
causes us to defer the substitution and yield a lambda that still looks
dependent, which in turn makes g<[]{}> still dependent and not suitable
for qualified name lookup.

This patch restricts setting processing_template_decl during
UNBOUND_CLASS_TEMPLATE substitution to the case where there are multiple
levels of captured template parameters, as in the friend declaration.
(This means we need to substitute the template parameter list(s) first,
which makes sense since they lexically appear first.)

PR c++/119981
PR c++/119378

gcc/cp/ChangeLog:

* pt.cc (tsubst) : Substitute
into template parameter list first.  When substituting the
context, only set processing_template_decl if there's more
than one level of template parameters.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ15.C: New test.
---
  gcc/cp/pt.cc   | 20 +---
  gcc/testsuite/g++.dg/cpp2a/lambda-targ15.C | 17 +
  2 files changed, 30 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ15.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index e8d342f99f6d..26ed9de430c0 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -17181,18 +17181,24 @@ tsubst (tree t, tree args, tsubst_flags_t complain, 
tree in_decl)
  
  case UNBOUND_CLASS_TEMPLATE:

{
-   ++processing_template_decl;
-   tree ctx = tsubst_entering_scope (TYPE_CONTEXT (t), args,
- complain, in_decl);
-   --processing_template_decl;
tree name = TYPE_IDENTIFIER (t);
+   if (name == error_mark_node)
+ return error_mark_node;
+
tree parm_list = DECL_TEMPLATE_PARMS (TYPE_NAME (t));
+   parm_list = tsubst_template_parms (parm_list, args, complain);
+   if (parm_list == error_mark_node)
+ return error_mark_node;
  
-	if (ctx == error_mark_node || name == error_mark_node)

+   if (parm_list && TMPL_PARMS_DEPTH (parm_list) > 1)
+ ++processing_template_decl;
+   tree ctx = tsubst_entering_scope (TYPE_CONTEXT (t), args,
+ complain, in_decl);
+   if (parm_list && TMPL_PARMS_DEPTH (parm_list) > 1)
+ --processing_template_decl;
+   if (ctx == error_mark_node)
  return error_mark_node;
  
-	if (parm_list)

- parm_list = tsubst_template_parms (parm_list, args, complain);
return make_unbound_class_template (ctx, name, parm_list, complain);
}
  
diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-targ15.C b/gcc/testsuite/g++.dg/cpp2a/lambda-targ15.C

new file mode 100644
index ..90160a52a6ef
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/lambda-targ15.C
@@ -0,0 +1,17 @@
+// PR c++/119981
+// { dg-do compile { target c++20 } }
+
+template class P>
+struct mp_copy_if{};
+
+template
+struct g {
+  template struct fn{};
+};
+
+template
+void test3() {
+  mp_copy_if::template fn> b;
+}
+
+template void test3();




Re: [PATCH] libstdc++: Preserve the argument type in basic_format_args [PR119246]

2025-04-30 Thread Tomasz Kaminski
On Wed, Apr 30, 2025 at 3:06 PM Jonathan Wakely  wrote:

> On Wed, 30 Apr 2025 at 13:54, Tomasz Kaminski  wrote:
> >
> >
> >
> > On Wed, Apr 30, 2025 at 1:26 PM Tomasz Kamiński 
> wrote:
> >>
> >> This commit adjusts how the arguments are stored in the
> _Arg_value
> >> (and thus basic_format_args), by preserving the types of fixed width
> >> floating-point types, that were previously converted to float, double,
> >> long double.
> >>
> >> The _Arg_value union now contains alternatives with std::bfloat16_t,
> >> std::float16_t, std::float32_t, std::float64_t that use pre-existing
> >> _Arg_bf16, _Arg_f16, _Arg_f32, _Arg_f64 argument types.
> >>
> >> This does not affect formatting, as specialization of formatters for
> >> formats them by casting to the corresponding standard floating point
> >> type.
> >>
> >> For the 128bit floating we need to handle the ppc64 architecture,
> >> (_GLIBCXX_LONG_DOUBLE_ALT128_COMPAT) for which the long double may (per
> TU
> >> basis) designate either __ibm128 or __ieee128 type, we need to store
> both
> >> types in the _Arg_value and have two _Arg_types (_Arg_ibm128,
> _Arg_ieee128).
> >> On other architectures we use extra enumerator value to store
> __float128,
> >> that is different from long double and _Float128. This is consistent
> with ppc64,
> >> for which __float128 is same type as __ieee128 if present. We use
> _Arg_float128
> >> _M_float128 names that deviate from _Arg_fN naming scheme, to emphasize
> that
> >> this flag is not used for std::float128_t (_Float128_t) type, that is
> consistently
> >> formatted via handle.
> >>
> >> The __format::_float128_t type is renamed to __format::__flt128_t, to
> mitigate
> >> visual confusion between this type and __float128. We also introduce
> __bflt16_t
> >> typedef instead of using of decltype.
> >>
> >> We add new alternative for the _Arg_value and allow them to be accessed
> via _S_get,
> >> when the types are available. However, we produce and handle
> corresponding _Arg_type,
> >> only when we can format them. See also r14-3329-g27d0cfcb2b33de.
> >>
> >> The formatter<_Float128, _CharT> that formats via __flt128_t is always
> >> provided, when type is available. It is still correct __flt128_t is
> _Float128_t.
> >>
> >> We also provide formatter<__float128, _CharT> that formats via
> __flt128_t.
> >> As this type may be disabled (-mno-float128), extra care needs to be
> taken,
> >> for situation when __float128 is same as long double. If the formatter
> would be
> >> defined in such case, the formatter would be
> generated from
> >> different specializations, and have different mangling:
> >>   * formatter<__float128, _CharT> if __float128 is present,
> >>   * formatter<_format::__formattable_float, _CharT> otherwise.
> >> To best of my knowledge this happens only on ppc64 for __ieee128 and
> __float128,
> >> so the formatter is not defined in this case. static_assert is added to
> detect
> >> other configurations like that. In such case we should replace it with
> constraint.
> >>
> >> PR libstdc++/119246
> >>
> >> libstdc++-v3/ChangeLog:
> >>
> >> * include/std/format (__format::__bflt16_t): Define.
> >> (_GLIBCXX_FORMAT_F128): Separate value for cases where
> _Float128 is used.
> >> (__format::__float128_t): Renamed to __format::_flt128_t.
> >> (std::formatter<_Float128, _CharT>): Define always if there is
> formattable
> >> 128bit float.
> >> (std::formatter<__float128, _CharT>): Define.
> >> (_Arg_type::_Arg_f128): Rename to _Arg_float128 and adjust
> value.
> >> (_Arg_type::_Arg_ibm128): Change value to _Arg_ldbl.
> >> (_Arg_type::_Arg_ieee128): Define as alias to _Arg_float128.
> >> (_Arg_value::_M_f128): Replaced with _M_ieee128 and _M_float128.
> >> (_Arg_value::_M_ieee128, _Arg_value::_M_float128,
> _Arg_value::_M_bf16)
> >> (_Arg_value::_M_f16, _Arg_value::_M_f32, _Arg_value::_M_f64):
> Define.
> >> (_Arg_value::_S_get, basic_format_arg::_S_to_enum): Handle
> __bflt16,
> >> _Float16, _Float32, _Float64, and __float128 types.
> >> (basic_format_arg::_S_to_arg_type): Preserve _bflt16, _Float16,
> >> _Float32, _Float64 and __float128 types.
> >> (basic_format_arg::_M_visit): Handle _Arg_float128,
> _Arg_ieee128,
> >> _Arg_bf16, _Arg_f16, _Arg_f32, _Arg_f64.
> >> * testsuite/std/format/arguments/args.cc: Updated to
> illustrate  that
> >> extended floating point types use handles now. Added test for
> __float128.
> >> * testsuite/std/format/parse_ctx.cc: Extended test to cover
> class to
> >> check_dynamic_spec with floating point types and handles.
> >> ---
> >> Tested on x86_64-linux and powerpc64le-unknown-linux-gnu.
> >> Running additional test on powerpc64le with
> >> unix\{-mabi=ibmlongdouble,-mabi=ieeelongdouble,-mno-float128}.
> >
> > The -mabi=ibmlongdouble and -mabi=ieeelongdouble passed.
> > The  -mno-float12

Re: [PATCH] libstdc++: Preserve the argument type in basic_format_args [PR119246]

2025-04-30 Thread Tomasz Kaminski
On Wed, Apr 30, 2025 at 1:26 PM Tomasz Kamiński  wrote:

> This commit adjusts how the arguments are stored in the _Arg_value
> (and thus basic_format_args), by preserving the types of fixed width
> floating-point types, that were previously converted to float, double,
> long double.
>
> The _Arg_value union now contains alternatives with std::bfloat16_t,
> std::float16_t, std::float32_t, std::float64_t that use pre-existing
> _Arg_bf16, _Arg_f16, _Arg_f32, _Arg_f64 argument types.
>
> This does not affect formatting, as specialization of formatters for
> formats them by casting to the corresponding standard floating point
> type.
>
> For the 128bit floating we need to handle the ppc64 architecture,
> (_GLIBCXX_LONG_DOUBLE_ALT128_COMPAT) for which the long double may (per TU
> basis) designate either __ibm128 or __ieee128 type, we need to store both
> types in the _Arg_value and have two _Arg_types (_Arg_ibm128,
> _Arg_ieee128).
> On other architectures we use extra enumerator value to store __float128,
> that is different from long double and _Float128. This is consistent with
> ppc64,
> for which __float128 is same type as __ieee128 if present. We use
> _Arg_float128
> _M_float128 names that deviate from _Arg_fN naming scheme, to emphasize
> that
> this flag is not used for std::float128_t (_Float128_t) type, that is
> consistently
> formatted via handle.
>
> The __format::_float128_t type is renamed to __format::__flt128_t, to
> mitigate
> visual confusion between this type and __float128. We also introduce
> __bflt16_t
> typedef instead of using of decltype.
>
> We add new alternative for the _Arg_value and allow them to be accessed
> via _S_get,
> when the types are available. However, we produce and handle corresponding
> _Arg_type,
> only when we can format them. See also r14-3329-g27d0cfcb2b33de.
>
> The formatter<_Float128, _CharT> that formats via __flt128_t is always
> provided, when type is available. It is still correct __flt128_t is
> _Float128_t.
>
> We also provide formatter<__float128, _CharT> that formats via __flt128_t.
> As this type may be disabled (-mno-float128), extra care needs to be taken,
> for situation when __float128 is same as long double. If the formatter
> would be
> defined in such case, the formatter would be generated
> from
> different specializations, and have different mangling:
>   * formatter<__float128, _CharT> if __float128 is present,
>   * formatter<_format::__formattable_float, _CharT> otherwise.
> To best of my knowledge this happens only on ppc64 for __ieee128 and
> __float128,
> so the formatter is not defined in this case. static_assert is added to
> detect
> other configurations like that. In such case we should replace it with
> constraint.
>
> PR libstdc++/119246
>
> libstdc++-v3/ChangeLog:
>
> * include/std/format (__format::__bflt16_t): Define.
> (_GLIBCXX_FORMAT_F128): Separate value for cases where _Float128
> is used.
> (__format::__float128_t): Renamed to __format::_flt128_t.
> (std::formatter<_Float128, _CharT>): Define always if there is
> formattable
> 128bit float.
> (std::formatter<__float128, _CharT>): Define.
> (_Arg_type::_Arg_f128): Rename to _Arg_float128 and adjust value.
> (_Arg_type::_Arg_ibm128): Change value to _Arg_ldbl.
> (_Arg_type::_Arg_ieee128): Define as alias to _Arg_float128.
> (_Arg_value::_M_f128): Replaced with _M_ieee128 and _M_float128.
> (_Arg_value::_M_ieee128, _Arg_value::_M_float128,
> _Arg_value::_M_bf16)
> (_Arg_value::_M_f16, _Arg_value::_M_f32, _Arg_value::_M_f64):
> Define.
> (_Arg_value::_S_get, basic_format_arg::_S_to_enum): Handle
> __bflt16,
> _Float16, _Float32, _Float64, and __float128 types.
> (basic_format_arg::_S_to_arg_type): Preserve _bflt16, _Float16,
> _Float32, _Float64 and __float128 types.
> (basic_format_arg::_M_visit): Handle _Arg_float128, _Arg_ieee128,
> _Arg_bf16, _Arg_f16, _Arg_f32, _Arg_f64.
> * testsuite/std/format/arguments/args.cc: Updated to illustrate
> that
> extended floating point types use handles now. Added test for
> __float128.
> * testsuite/std/format/parse_ctx.cc: Extended test to cover class
> to
> check_dynamic_spec with floating point types and handles.
> ---
> Tested on x86_64-linux and powerpc64le-unknown-linux-gnu.
> Running additional test on powerpc64le with
> unix\{-mabi=ibmlongdouble,-mabi=ieeelongdouble,-mno-float128}.
>
> OK for trunk?
>
>  libstdc++-v3/include/std/format   | 217 --
>  .../testsuite/std/format/arguments/args.cc|  45 ++--
>  .../testsuite/std/format/parse_ctx.cc |  72 +-
>  3 files changed, 227 insertions(+), 107 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/format
> b/libstdc++-v3/include/std/format
> index 054ce350440..73819f52f50 100644
> --- a/libstdc++-v3/include/std/format
> +++ b/libstdc++-v3/include/

[PATCH 0/3] Remove non-SLP path from vectorizable_conversion

2025-04-30 Thread Richard Biener


This is an example of how I'd like to see cleanup for SLP happening
in the vectorizable_* and related functions.  While this example,
vectorizable_conversion, is quite straight-forward it helps to
isolate errors.  I've done this in 3 steps:

 1) fold trivially true/false conditions based on the slp_node argument
without code block removal/reindent, etc.
 2) do trivial dead code elimination
 3) cleanup simple things - it's expected that the 'ncopies' variable
vanishes (a vec_num one might remain), the function should no
longer access STMT_VINFO_VECTYPE (but SLP_TREE_VECTYPE), the
callers then no longer need to swap those in.

Before committing the steps should be squashed into a single commit,
I've put the actual changelog into [3/3].

Richard.


Re: [PATCH] libstdc++: Preserve the argument type in basic_format_args [PR119246]

2025-04-30 Thread Tomasz Kaminski
On Wed, Apr 30, 2025 at 3:09 PM Tomasz Kaminski  wrote:

>
>
> On Wed, Apr 30, 2025 at 1:26 PM Tomasz Kamiński 
> wrote:
>
>> This commit adjusts how the arguments are stored in the _Arg_value
>> (and thus basic_format_args), by preserving the types of fixed width
>> floating-point types, that were previously converted to float, double,
>> long double.
>>
>> The _Arg_value union now contains alternatives with std::bfloat16_t,
>> std::float16_t, std::float32_t, std::float64_t that use pre-existing
>> _Arg_bf16, _Arg_f16, _Arg_f32, _Arg_f64 argument types.
>>
>> This does not affect formatting, as specialization of formatters for
>> formats them by casting to the corresponding standard floating point
>> type.
>>
>> For the 128-bit floating-point types we need to handle the ppc64
>> architecture (_GLIBCXX_LONG_DOUBLE_ALT128_COMPAT), for which long double
>> may (on a per-TU basis) designate either the __ibm128 or the __ieee128
>> type, so we need to store both types in the _Arg_value and have two
>> _Arg_types (_Arg_ibm128, _Arg_ieee128).
>> On other architectures we use an extra enumerator value to store
>> __float128, which is different from long double and _Float128. This is
>> consistent with ppc64, for which __float128 is the same type as __ieee128
>> if present. We use the _Arg_float128 and _M_float128 names, which deviate
>> from the _Arg_fN naming scheme, to emphasize that this flag is not used
>> for the std::float128_t (_Float128_t) type, which is consistently
>> formatted via handle.
>>
>> The __format::_float128_t type is renamed to __format::__flt128_t, to
>> mitigate visual confusion between this type and __float128. We also
>> introduce the __bflt16_t typedef instead of using decltype.
>>
>> We add new alternatives to the _Arg_value and allow them to be accessed
>> via _S_get when the types are available. However, we produce and handle
>> the corresponding _Arg_type only when we can format them. See also
>> r14-3329-g27d0cfcb2b33de.
>>
>> The formatter<_Float128, _CharT> that formats via __flt128_t is always
>> provided, when type is available. It is still correct __flt128_t is
>> _Float128_t.
>>
>> We also provide formatter<__float128, _CharT> that formats via __flt128_t.
>> As this type may be disabled (-mno-float128), extra care needs to be
>> taken,
>> for situation when __float128 is same as long double. If the formatter
>> would be
>> defined in such case, the formatter would be
>> generated from
>> different specializations, and have different mangling:
>>   * formatter<__float128, _CharT> if __float128 is present,
>>   * formatter<_format::__formattable_float, _CharT> otherwise.
>> To the best of my knowledge this happens only on ppc64 for __ieee128 and
>> __float128, so the formatter is not defined in this case. A static_assert
>> is added to detect other configurations like that. In such a case we
>> should replace it with a constraint.
>>
>> PR libstdc++/119246
>>
>> libstdc++-v3/ChangeLog:
>>
>> * include/std/format (__format::__bflt16_t): Define.
>> (_GLIBCXX_FORMAT_F128): Separate value for cases where _Float128
>> is used.
>> (__format::__float128_t): Renamed to __format::__flt128_t.
>> (std::formatter<_Float128, _CharT>): Define always if there is
>> formattable
>> 128bit float.
>> (std::formatter<__float128, _CharT>): Define.
>> (_Arg_type::_Arg_f128): Rename to _Arg_float128 and adjust value.
>> (_Arg_type::_Arg_ibm128): Change value to _Arg_ldbl.
>> (_Arg_type::_Arg_ieee128): Define as alias to _Arg_float128.
>> (_Arg_value::_M_f128): Replaced with _M_ieee128 and _M_float128.
>> (_Arg_value::_M_ieee128, _Arg_value::_M_float128,
>> _Arg_value::_M_bf16)
>> (_Arg_value::_M_f16, _Arg_value::_M_f32, _Arg_value::_M_f64):
>> Define.
>> (_Arg_value::_S_get, basic_format_arg::_S_to_enum): Handle
>> __bflt16_t, _Float16, _Float32, _Float64, and __float128 types.
>> (basic_format_arg::_S_to_arg_type): Preserve __bflt16_t, _Float16,
>> _Float32, _Float64 and __float128 types.
>> (basic_format_arg::_M_visit): Handle _Arg_float128, _Arg_ieee128,
>> _Arg_bf16, _Arg_f16, _Arg_f32, _Arg_f64.
>> * testsuite/std/format/arguments/args.cc: Updated to illustrate
>> that
>> extended floating point types use handles now. Added test for
>> __float128.
>> * testsuite/std/format/parse_ctx.cc: Extended test to cover calls
>> to check_dynamic_spec with floating point types and handles.
>> ---
>> Tested on x86_64-linux and powerpc64le-unknown-linux-gnu.
>> Running additional test on powerpc64le with
>> unix\{-mabi=ibmlongdouble,-mabi=ieeelongdouble,-mno-float128}.
>>
>> OK for trunk?
>>
>>  libstdc++-v3/include/std/format   | 217 --
>>  .../testsuite/std/format/arguments/args.cc|  45 ++--
>>  .../testsuite/std/format/parse_ctx.cc |  72 +-
>>  3 files changed, 227 insertions(+), 107 deletions(-)
>>
>> diff --git a/

Re: [PATCH] RISC-V: Implment H modifier for printing the next register name

2025-04-30 Thread Jeff Law




On 4/27/25 1:28 AM, Jin Ma wrote:

For RV32 inline assembly, when handling 64-bit integer data, it is
often necessary to process the lower and upper 32 bits separately.
Unfortunately, we can only output the current register name
(lower 32 bits) but not the next register name (upper 32 bits).

To address this, the modifier 'H' has been added to allow users
to handle the upper 32 bits of the data. While I believe the
modifier 'N' (representing the next register name) might be more
suitable for this functionality, 'N' is already in use.
Therefore, 'H' (representing the high register) was chosen instead.

Co-Authored-By: Dimitar Dimitrov 

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): Add H.
* doc/extend.texi: Document for H.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/modifier-H-error-1.c: New test.
* gcc.target/riscv/modifier-H-error-2.c: New test.
* gcc.target/riscv/modifier-H.c: New test.

OK for the trunk.  Thanks!

jeff
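
For readers following along, here is a minimal sketch of how the 'H'
modifier described above might be used.  It assumes an RV32 target and a
compiler that already contains this patch.  HAVE_ASM_MODIFIER_H is a
hypothetical macro (not defined by GCC) that a build system would have to
set, and combine_halves is an illustrative name; on any other
configuration the function falls back to plain C.

```c
#include <stdint.h>

/* Build a 64-bit value from two 32-bit halves.
   HAVE_ASM_MODIFIER_H is a hypothetical, user-provided macro; nothing in
   GCC defines it.  */
static uint64_t
combine_halves (uint32_t lo, uint32_t hi)
{
#if defined (__riscv) && __riscv_xlen == 32 && defined (HAVE_ASM_MODIFIER_H)
  uint64_t result;
  /* %0 names the register holding the lower 32 bits of RESULT and
     %H0 names the next register, which holds the upper 32 bits.
     The early-clobber keeps RESULT from overlapping the inputs.  */
  __asm__ ("mv\t%0, %1\n\t"
           "mv\t%H0, %2"
           : "=&r" (result)
           : "r" (lo), "r" (hi));
  return result;
#else
  /* Portable fallback with the same result.  */
  return ((uint64_t) hi << 32) | lo;
#endif
}
```

Without the modifier, the upper register of the pair cannot be named
from within the asm template, which is the limitation the patch
description points out.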



Re: [PATCH] c++/modules: Catch exposures of TU-local values through inline references [PR119996]

2025-04-30 Thread Jason Merrill

On 4/29/25 3:59 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu (so far just
modules.exp), OK for trunk and 15 if full regtest succeeds?


OK.


-- >8 --

In r15-9136-g0210bedf481a9f we started erroring for inline variables
that exposed TU-local entities in their definition, as such variables
would need to have their definitions emitted in importers but would not
know about the TU-local entities they referenced.

A case we missed was potentially-constant references, which disable
streaming of their definitions in make_dependency so as to comply with
[expr.const] p9.2.  This meant that we didn't see the definition
referencing a TU-local entity, leading to nonsensical results.

PR c++/119551
PR c++/119996

gcc/cp/ChangeLog:

* module.cc (depset::hash::make_dependency): Also mark inline
variables referencing TU-local values as exposures here.
(depset::hash::finalize_dependencies): Add error message for
inline variables.

gcc/testsuite/ChangeLog:

* g++.dg/modules/internal-13.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc   | 27 +-
  gcc/testsuite/g++.dg/modules/internal-13.C | 33 ++
  2 files changed, 53 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/internal-13.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index a2e0d6d2571..7e3b24e2e42 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -14062,9 +14062,10 @@ depset::hash::make_dependency (tree decl, entity_kind 
ek)
 streaming the definition in such cases.  */
  dep->clear_flag_bit ();
  
-		  if (DECL_DECLARED_CONSTEXPR_P (decl))

-   /* Also, a constexpr variable initialized to a TU-local
-  value is an exposure.  */
+ if (DECL_DECLARED_CONSTEXPR_P (decl)
+ || DECL_INLINE_VAR_P (decl))
+   /* A constexpr variable initialized to a TU-local value,
+  or an inline value (PR c++/119996), is an exposure.  */
dep->set_flag_bit ();
}
}
@@ -15025,12 +15026,24 @@ depset::hash::finalize_dependencies ()
break;
  }
  
-	  if (!explained && VAR_P (decl) && DECL_DECLARED_CONSTEXPR_P (decl))

+ if (!explained
+ && VAR_P (decl)
+ && (DECL_DECLARED_CONSTEXPR_P (decl)
+ || DECL_INLINE_VAR_P (decl)))
{
  auto_diagnostic_group d;
- error_at (DECL_SOURCE_LOCATION (decl),
-   "%qD is declared %<constexpr%> and is initialized to "
-   "a TU-local value", decl);
+ if (DECL_DECLARED_CONSTEXPR_P (decl))
+   error_at (DECL_SOURCE_LOCATION (decl),
+ "%qD is declared %<constexpr%> and is initialized to "
+ "a TU-local value", decl);
+ else
+   {
+ /* This can only occur with references.  */
+ gcc_checking_assert (TYPE_REF_P (TREE_TYPE (decl)));
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "%qD is a reference declared %<inline%> and is "
+   "constant-initialized to a TU-local value", decl);
+   }
  bool informed = is_tu_local_value (decl, DECL_INITIAL (decl),
 /*explain=*/true);
  gcc_checking_assert (informed);
diff --git a/gcc/testsuite/g++.dg/modules/internal-13.C 
b/gcc/testsuite/g++.dg/modules/internal-13.C
new file mode 100644
index 000..ce1454e17bc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/internal-13.C
@@ -0,0 +1,33 @@
+// PR c++/119996
+// { dg-additional-options "-fmodules" }
+// { dg-module-cmi !M }
+// Similar to internal-11.C, but for potentially-constant variables.
+
+export module M;
+
+static int tu_local = 5;
+static int& foo() { return tu_local; }
+
+// For implementation reasons, we adjust [basic.link] p14.2 to restrict ignored
+// exposures to non-inline variables, since for inline variables without
+// dynamic initialisation we need to emit their initialiser for importer use.
+
+int& a = tu_local;  // OK
+inline int& b = tu_local;  // { dg-error "initialized to a TU-local value" }
+inline auto& bf = foo;  // { dg-error "initialized to a TU-local value" }
+
+// But dynamic initialisers are fine, importers will just treat them as 
external.
+inline int& c = foo();  // OK
+
+// For consistency, we follow the same rules with templates, noting that
+// we still need to emit definitions with dynamic initializers so we error.
+template  int& d = tu_local;  // OK
+template  inline int& e = tu_local;  // { dg-error "exposes TU-local 
entity" }
+template  inline int& f = foo();  // { dg-error "exposes TU-local 
entity" }
+template  inline auto& ff = foo;  // { dg-error "expos

[pushed] testsuite: Force -mcmodel=small for gcc.target/aarch64/pr115258.c

2025-04-30 Thread Richard Sandiford
The test implicitly assumed the default code model and so failed
for -mcmodel=tiny.

Tested on aarch64-linux-gnu & pushed to trunk, gcc-15 and gcc-14.

Richard


gcc/testsuite/
* gcc.target/aarch64/pr115258.c: Add -mcmodel=small.
---
 gcc/testsuite/gcc.target/aarch64/pr115258.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/pr115258.c 
b/gcc/testsuite/gcc.target/aarch64/pr115258.c
index 9a489d4604c..f60b50a0a3c 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr115258.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr115258.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -mcmodel=small" } */
 /* { dg-final { check-function-bodies "**" "" "" } } */
 
 /*
-- 
2.43.0



[patch, avr, applied] PR119989: Missing clobbers in xload_<mode>_libgcc

2025-04-30 Thread Georg-Johann Lay

libgcc's __xload_1...4 clobbers Z (and also R21 in some cases),
but avr.md had clobbers of the respective GPRs only up to reload.
The outcome was that code reading from the same __memx address twice
could be wrong.  This patch adds the respective clobbers.

Applied as obvious.

Johann

--

AVR: target/119989 - Add missing clobbers to xload_<mode>_libgcc.

libgcc's __xload_1...4 clobbers Z (and also R21 in some cases),
but avr.md had clobbers of the respective GPRs only up to reload.
The outcome was that code reading from the same __memx address twice
could be wrong.  This patch adds the respective clobbers.

PR target/119989
gcc/
* config/avr/avr.md (xload_<mode>_libgcc): Clobber R21, Z.

gcc/testsuite/
* gcc.target/avr/torture/pr119989.h: New file.
* gcc.target/avr/torture/pr119989-memx-1.c: New test.
* gcc.target/avr/torture/pr119989-memx-2.c: New test.
* gcc.target/avr/torture/pr119989-memx-3.c: New test.
* gcc.target/avr/torture/pr119989-memx-4.c: New test.

diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index 823fc716f2c..e07626dc109 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -674,12 +674,16 @@ (define_insn_and_split "xload_<mode>_libgcc"
   [(parallel [(set (reg:MOVMODE 22)
                    (mem:MOVMODE (lo_sum:PSI (reg:QI 21)
                                             (reg:HI REG_Z))))
+              (clobber (reg:QI 21))
+              (clobber (reg:HI REG_Z))
               (clobber (reg:CC REG_CC))])])
 
 (define_insn "*xload_<mode>_libgcc"
   [(set (reg:MOVMODE 22)
         (mem:MOVMODE (lo_sum:PSI (reg:QI 21)
                                  (reg:HI REG_Z))))
+   (clobber (reg:QI 21))
+   (clobber (reg:HI REG_Z))
    (clobber (reg:CC REG_CC))]
   "avr_xload_libgcc_p (<MODE>mode)
    && reload_completed"
diff --git a/gcc/testsuite/gcc.target/avr/torture/pr119989-memx-1.c b/gcc/testsuite/gcc.target/avr/torture/pr119989-memx-1.c
new file mode 100644
index 000..27b89e437bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/torture/pr119989-memx-1.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { ! avr_tiny } } } */
+/* { dg-additional-options "-std=gnu99" } */
+
+typedef __UINT8_TYPE__ TYP;
+
+#include "pr119989.h"
diff --git a/gcc/testsuite/gcc.target/avr/torture/pr119989-memx-2.c b/gcc/testsuite/gcc.target/avr/torture/pr119989-memx-2.c
new file mode 100644
index 000..a8011a25b83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/torture/pr119989-memx-2.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { ! avr_tiny } } } */
+/* { dg-additional-options "-std=gnu99" } */
+
+typedef __UINT16_TYPE__ TYP;
+
+#include "pr119989.h"
diff --git a/gcc/testsuite/gcc.target/avr/torture/pr119989-memx-3.c b/gcc/testsuite/gcc.target/avr/torture/pr119989-memx-3.c
new file mode 100644
index 000..ea1c4b62c04
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/torture/pr119989-memx-3.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { ! avr_tiny } } } */
+/* { dg-additional-options "-std=gnu99" } */
+
+__extension__ typedef __uint24 TYP;
+
+#include "pr119989.h"
diff --git a/gcc/testsuite/gcc.target/avr/torture/pr119989-memx-4.c b/gcc/testsuite/gcc.target/avr/torture/pr119989-memx-4.c
new file mode 100644
index 000..32b5cd3b55f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/torture/pr119989-memx-4.c
@@ -0,0 +1,6 @@
+/* { dg-do run { target { ! avr_tiny } } } */
+/* { dg-additional-options "-std=gnu99" } */
+
+typedef __UINT32_TYPE__ TYP;
+
+#include "pr119989.h"
diff --git a/gcc/testsuite/gcc.target/avr/torture/pr119989.h b/gcc/testsuite/gcc.target/avr/torture/pr119989.h
new file mode 100644
index 000..ab9d14a208b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/torture/pr119989.h
@@ -0,0 +1,37 @@
+const __memx TYP some_data[] = { 1, 2, 3, 4, 5 };
+const __memx TYP *IP;
+
+TYP DT, a, b;
+
+__attribute__((noipa))
+void do_test1 (void)
+{
+DT = *IP;
+DT = *IP--;
+}
+
+__attribute__((noipa))
+void do_test2 (void)
+{
+DT = *IP;
+__asm volatile ("" ::: "memory"); // Prevents unwanted optimization
+DT = *IP--;
+}
+
+TYP difference(void)
+{
+IP = &some_data[3];
+do_test1();
+a = DT;
+IP = &some_data[3];
+do_test2();
+b = DT;
+return a - b; // Expected

Re: [PATCH v2] Change __builtin_unreachable to __builtin_trap if only thing in function [PR109267]

2025-04-30 Thread Andrew Pinski
On Tue, Apr 29, 2025 at 11:49 PM Richard Biener
 wrote:
>
> On Tue, Apr 29, 2025 at 4:25 PM Andrew Pinski  
> wrote:
> >
> > When we have an empty function, things can go wrong with
> > cfi_startproc/cfi_endproc and a few other things like exceptions. So if
> > the only thing the function does is a call to __builtin_unreachable,
> > let's expand that to a __builtin_trap instead. For most targets that
> > is one instruction wide so it won't hurt things that much and we get
> > correct behavior for exceptions and some linkers will be better for it.
> >
> > The only thing I have a concern about is that some targets still
> > don't define a trap instruction. I tried to emit a nop instead of
> > an abort but that nop is removed during RTL DCE.
> > Should we just push targets to define a trap instead?
> > E.g. BPF, avr and sh are the 3 semi active targets which still don't
> > have a trap defined.
>
> Do any of those targets have the cfi_startproc/cfi_endproc issue
> or exceptions are relevant on those?

Yes, the sh target is the one that can even run full Linux. There is
an open bug about sh not having a trap pattern implemented yet,
https://gcc.gnu.org/PR70216; it has been open for 9 years now.

>
> I'd say guard this with targetm.have_trap (), there's the chance that
> say on avr the expansion to abort() might fail to link in a
> freestanding environment.

I was even thinking of that (I even accidentally left in the include for
target.h :) )

>
> As for the nop, if you mark it volatile does it prevail?

I don't even know how to mark the rtl insn as volatile.
The volatil field for an INSN is documented as meaning it was deleted:
 1 in an INSN, CALL_INSN, JUMP_INSN, CODE_LABEL, BARRIER, or NOTE
 if it has been deleted.
So that won't help.

Now we could use the `used` field for this marking. I have not looked
at what it could take to make sure it does not get deleted though.

Thanks,
Andrew


>
> > The QOI idea for basic block reorder is recorded as PR 120004.
> >
> > Changes since v1:
> > * v2: Move to final gimple cfg cleanup instead of expand and use
> >   BUILT_IN_UNREACHABLE_TRAP.
> >
> > Bootstrapped and tested on x86_64-linux-gnu.
> >
> > PR middle-end/109267
> >
> > gcc/ChangeLog:
> >
> > * tree-cfgcleanup.cc (execute_cleanup_cfg_post_optimizing): If the 
> > first
> > non debug statement in the first (and only) basic block is a call
> > to __builtin_unreachable change it to a call to __builtin_trap.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/pr109267-1.c: New test.
> > * gcc.dg/pr109267-2.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/testsuite/gcc.dg/pr109267-1.c | 14 ++
> >  gcc/testsuite/gcc.dg/pr109267-2.c | 14 ++
> >  gcc/tree-cfgcleanup.cc| 14 ++
> >  3 files changed, 42 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/pr109267-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/pr109267-2.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/pr109267-1.c 
> > b/gcc/testsuite/gcc.dg/pr109267-1.c
> > new file mode 100644
> > index 000..d6df2c3b49a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr109267-1.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +/* PR middle-end/109267 */
> > +
> > +int f(void)
> > +{
> > +  __builtin_unreachable();
> > +}
> > +
> > +/* This unreachable should be changed to be a trap. */
> > +
> > +/* { dg-final { scan-tree-dump-times "__builtin_unreachable trap \\\(" 1 
> > "optimized"} } */
> > +/* { dg-final { scan-tree-dump-not "__builtin_unreachable \\\(" 
> > "optimized"} } */
> > diff --git a/gcc/testsuite/gcc.dg/pr109267-2.c 
> > b/gcc/testsuite/gcc.dg/pr109267-2.c
> > new file mode 100644
> > index 000..6cd1419a1e3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr109267-2.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +/* PR middle-end/109267 */
> > +void g(void);
> > +int f(int *t)
> > +{
> > +  g();
> > +  __builtin_unreachable();
> > +}
> > +
> > +/* The unreachable should stay a unreachable. */
> > +/* { dg-final { scan-tree-dump-not "__builtin_unreachable trap \\\(" 
> > "optimized"} } */
> > +/* { dg-final { scan-tree-dump-times "__builtin_unreachable \\\(" 1 
> > "optimized"} } */
> > diff --git a/gcc/tree-cfgcleanup.cc b/gcc/tree-cfgcleanup.cc
> > index 9a8a668e12b..38a62499f93 100644
> > --- a/gcc/tree-cfgcleanup.cc
> > +++ b/gcc/tree-cfgcleanup.cc
> > @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "cgraph.h"
> >  #include "tree-into-ssa.h"
> >  #include "tree-cfgcleanup.h"
> > +#include "target.h"
> >
> >
> >  /* The set of blocks in that at least one of the following changes 
> > happened:
> > @@ -1530,6 +1531,19 @@ execute_cleanup_cfg_post_optimizing (void)
> >cleanup_dead_labels ();
> >if (group_case_labels ())
> >  todo |= TODO_cle

[PATCH v3 2/6] dwarf: create annotation DIEs for btf tags

2025-04-30 Thread David Faust
The btf_decl_tag and btf_type_tag attributes provide a means to annotate
declarations and types respectively with arbitrary user provided
strings.  These strings are recorded in debug information for
post-compilation uses, and despite the name they are meant to be
recorded in DWARF as well as BTF.  New DWARF extensions
DW_TAG_GNU_annotation and DW_AT_GNU_annotation are used to represent
these user annotations in DWARF.

This patch introduces the new DWARF extension DIE and attribute, and
generates them as necessary to represent user annotations from
btf_decl_tag and btf_type_tag.

The format of the new DIE is as follows:

DW_TAG_GNU_annotation
DW_AT_name: "btf_decl_tag" or "btf_type_tag"
DW_AT_const_value: 
DW_AT_GNU_annotation: 

DW_AT_GNU_annotation is a new attribute extension used to refer to these
new annotation DIEs.  If non-null in any given declaration or type DIE,
it is a reference to a DW_TAG_GNU_annotation DIE holding an annotation
for that declaration or type.  In addition, the DW_TAG_GNU_annotation
DIEs may also have a non-null DW_AT_GNU_annotation, referring to another
annotation DIE.  This allows chains of annotation DIEs to be formed,
such as in the case where a single declaration has multiple instances of
btf_decl_tag with different string annotations.
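
The chaining described above can be pictured with a small stand-alone
model.  This is only a sketch for illustration: struct annotation_die and
the two helper functions are hypothetical names, not the data structures
used in dwarf2out.cc, and the assumption that DW_AT_const_value carries
the user string is taken from the description of the attributes rather
than from the DIE format listing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one DW_TAG_GNU_annotation DIE.  */
struct annotation_die
{
  const char *name;                   /* "btf_decl_tag" or "btf_type_tag".  */
  const char *value;                  /* Assumed: the user-provided string.  */
  const struct annotation_die *next;  /* Models DW_AT_GNU_annotation; NULL
                                         terminates the chain.  */
};

/* Count the annotations reachable from HEAD.  */
static size_t
annotation_count (const struct annotation_die *head)
{
  size_t n = 0;
  for (; head != NULL; head = head->next)
    n++;
  return n;
}

/* Return nonzero if the chain starting at HEAD contains an annotation
   with the given NAME and VALUE.  */
static int
annotation_has (const struct annotation_die *head,
                const char *name, const char *value)
{
  for (; head != NULL; head = head->next)
    if (strcmp (head->name, name) == 0 && strcmp (head->value, value) == 0)
      return 1;
  return 0;
}
```

A declaration carrying two btf_decl_tag attributes would then refer to
the first node, whose next pointer leads to the second; a consumer walks
the chain until it reaches a node without a further reference.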

gcc/
* dwarf2out.cc (struct annotation_node, struct annotation_node_hasher)
(btf_tag_htab): New ancillary structures and hash table.
(annotation_node_hasher::hash, annotation_node_hasher::equal): New.
(hash_btf_tag, gen_btf_tag_dies, gen_btf_type_tag_dies)
(maybe_gen_btf_type_tag_dies, gen_btf_decl_tag_dies): New functions.
(modified_type_die): Handle btf_type_tag attribute.
(gen_array_type_die): Call maybe_gen_btf_type_tag_dies for the type.
(gen_formal_parameter_die): Call gen_btf_decl_tag_dies for the parameter.
(gen_decl_die): Call gen_btf_decl_tag_dies for the decl.
(gen_tagged_type_die): Call maybe_gen_btf_type_tag_dies for the type.
(dwarf2out_early_finish): Empty btf_tag_htab hash table.
(dwarf2out_cc_finalize): Delete btf_tag_htab hash table.

include/
* dwarf2.def (DW_TAG_GNU_annotation): New DWARF extension.
(DW_AT_GNU_annotation): Likewise.

gcc/testsuite/
* gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-1.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-2.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-3.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-type-tag-1.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-type-tag-2.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-type-tag-3.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-type-tag-4.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-type-tag-5.c: New test.
---
 gcc/dwarf2out.cc  | 270 +-
 .../debug/dwarf2/dwarf-btf-decl-tag-1.c   |  11 +
 .../debug/dwarf2/dwarf-btf-decl-tag-2.c   |  25 ++
 .../debug/dwarf2/dwarf-btf-decl-tag-3.c   |  21 ++
 .../debug/dwarf2/dwarf-btf-type-tag-1.c   |  10 +
 .../debug/dwarf2/dwarf-btf-type-tag-2.c   |  31 ++
 .../debug/dwarf2/dwarf-btf-type-tag-3.c   |  15 +
 .../debug/dwarf2/dwarf-btf-type-tag-4.c   |  33 +++
 .../debug/dwarf2/dwarf-btf-type-tag-5.c   |  10 +
 include/dwarf2.def|   4 +
 10 files changed, 426 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-4.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-5.c

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 34ffeed86ff..1ec3b24e773 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -3696,6 +3696,32 @@ static bool frame_pointer_fb_offset_valid;
 
 static vec<dw_die_ref> base_types;
 
+/* A cached btf_type_tag or btf_decl_tag user annotation.  */
+struct GTY ((for_user)) annotation_node
+{
+  const char *name;
+  const char *value;
+  hashval_t hash;
+  dw_die_ref die;
+  struct annotation_node *next;
+};
+
+struct annotation_node_hasher : ggc_ptr_hash<annotation_node>
+{
+  typedef const struct annotation_node *compare_type;
+
+  static hashval_t hash (struct annotation_node *);
+  static bool equal (const struct annotation_node *,
+const struct annotation_node *);
+};
+
+/* A hash table of tag annotation nodes for btf_type_tag and btf_decl_tag C
+   attributes.  DIEs for these user annotations may be reused if they are
+   structurally equivalent; this hash table is used to ensure the

[PATCH v3 1/6] c-family: add btf_type_tag and btf_decl_tag attributes

2025-04-30 Thread David Faust
Add two new c-family attributes, "btf_type_tag" and "btf_decl_tag"
along with a simple shared handler for them.

gcc/c-family/
* c-attribs.cc (c_common_attribute_table): Add btf_decl_tag and
btf_type_tag attributes.
(handle_btf_tag_attribute): New handler for both new attributes.
---
 gcc/c-family/c-attribs.cc | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 5a0e3d328ba..cc1efaeaaec 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -189,6 +189,8 @@ static tree handle_fd_arg_attribute (tree *, tree, tree, 
int, bool *);
 static tree handle_flag_enum_attribute (tree *, tree, tree, int, bool *);
 static tree handle_null_terminated_string_arg_attribute (tree *, tree, tree, 
int, bool *);
 
+static tree handle_btf_tag_attribute (tree *, tree, tree, int, bool *);
+
 /* Helper to define attribute exclusions.  */
 #define ATTR_EXCL(name, function, type, variable)  \
   { name, function, type, variable }
@@ -640,7 +642,11 @@ const struct attribute_spec c_common_gnu_attributes[] =
   { "flag_enum", 0, 0, false, true, false, false,
  handle_flag_enum_attribute, NULL },
   { "null_terminated_string_arg", 1, 1, false, true, true, false,
- handle_null_terminated_string_arg_attribute, NULL}
+ handle_null_terminated_string_arg_attribute, 
NULL},
+  { "btf_type_tag",  1, 1, false, true, false, false,
+ handle_btf_tag_attribute, NULL},
+  { "btf_decl_tag",  1, 1, true, false, false, false,
+ handle_btf_tag_attribute, NULL}
 };
 
 const struct scoped_attribute_specs c_common_gnu_attribute_table =
@@ -5101,6 +5107,23 @@ handle_null_terminated_string_arg_attribute (tree *node, 
tree name, tree args,
   return NULL_TREE;
 }
 
+/* Handle the "btf_decl_tag" and "btf_type_tag" attributes.  */
+
+static tree
+handle_btf_tag_attribute (tree * ARG_UNUSED (node), tree name, tree args,
+ int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  if (!args)
+*no_add_attrs = true;
+  else if (TREE_CODE (TREE_VALUE (args)) != STRING_CST)
+{
+  error ("%qE attribute requires a string", name);
+  *no_add_attrs = true;
+}
+
+  return NULL_TREE;
+}
+
 /* Handle the "nonstring" variable attribute.  */
 
 static tree
-- 
2.47.2



[PATCH v3 6/6] bpf: add tests for CO-RE and BTF tag interaction

2025-04-30 Thread David Faust
Add a couple of tests to ensure that BTF type/decl tags do not interfere
with generation of BPF CO-RE relocations.

gcc/testsuite/
* gcc.target/bpf/core-btf-tag-1.c: New test.
* gcc.target/bpf/core-btf-tag-2.c: New test.
---
 gcc/testsuite/gcc.target/bpf/core-btf-tag-1.c | 23 +++
 gcc/testsuite/gcc.target/bpf/core-btf-tag-2.c | 23 +++
 2 files changed, 46 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-btf-tag-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-btf-tag-2.c

diff --git a/gcc/testsuite/gcc.target/bpf/core-btf-tag-1.c 
b/gcc/testsuite/gcc.target/bpf/core-btf-tag-1.c
new file mode 100644
index 000..bd0fb3e40be
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/core-btf-tag-1.c
@@ -0,0 +1,23 @@
+/* Test that BTF type tags do not interfere with CO-RE relocations.  */
+
+/* { dg-do compile } */
+/* { dg-options "-gbtf -dA -mco-re" } */
+
+struct bpf_cpumask {
+  int i;
+  char c;
+} __attribute__((preserve_access_index));
+
+struct kptr_nested {
+   struct bpf_cpumask * __attribute__((btf_type_tag("kptr"))) mask;
+} __attribute__((preserve_access_index));
+
+void foo (struct kptr_nested *nested)
+{
+  if (nested && nested->mask)
+nested->mask->i = 5;
+}
+
+/* { dg-final { scan-assembler-times "bpfcr_insn" 3 } } */
+/* { dg-final { scan-assembler-times "bpfcr_type \\(struct" 3 } } */
+/* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"0:0\"\\)" 3 } } */
diff --git a/gcc/testsuite/gcc.target/bpf/core-btf-tag-2.c 
b/gcc/testsuite/gcc.target/bpf/core-btf-tag-2.c
new file mode 100644
index 000..6654ffe3ae0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/core-btf-tag-2.c
@@ -0,0 +1,23 @@
+/* Test that BTF decl tags do not interfere with CO-RE relocations.  */
+
+/* { dg-do compile } */
+/* { dg-options "-gbtf -dA -mco-re" } */
+
+struct bpf_cpumask {
+  int i;
+  char c;
+} __attribute__((preserve_access_index));
+
+struct kptr_nested {
+   struct bpf_cpumask * mask __attribute__((btf_decl_tag ("decltag")));
+} __attribute__((preserve_access_index));
+
+void foo (struct kptr_nested *nested __attribute__((btf_decl_tag ("foo"))))
+{
+  if (nested && nested->mask)
+nested->mask->i = 5;
+}
+
+/* { dg-final { scan-assembler-times "bpfcr_insn" 3 } } */
+/* { dg-final { scan-assembler-times "bpfcr_type \\(struct" 3 } } */
+/* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"0:0\"\\)" 3 } } */
-- 
2.47.2



[PATCH v3 0/6] c, dwarf, btf: Add btf_decl_tag and btf_type_tag C attributes

2025-04-30 Thread David Faust
[v2: https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675241.html
 Changes from v2:
 - Change BTF format to match what is currently in use by clang, pahole and
   the linux kernel.  The format in prior versions of this series was a new
   format meant to address issues with the existing one.  However, during
   discussion at LSFMM/BPF in March, it was decided that it is not desirable
   to change the BTF format at this time, and the issues are not problematic
   in practice for current use cases.  Therefore this version of the series
   reverts to the 'old' BTF format, where type_tag can only be represented
   on pointer types.  This 'old' format is described below.
 - Address review comments on v2, including new patch 6 with tests for some
   BPF-target specific interactions.  ]

This patch series adds support for the btf_decl_tag and btf_type_tag attributes
to GCC. This entails:

- Two new C-family attributes that allow associating ("tagging") particular
  declarations and types with arbitrary strings. As explained below, this is
  intended to be used, for example, to characterize certain pointer types.  A
  single declaration or type may have multiple occurrences of these attributes.

- The conveyance of that information in the DWARF output in the form of a new
  DIE: DW_TAG_GNU_annotation, and a new attribute: DW_AT_GNU_annotation.

- The conveyance of that information in the BTF output in the form of two new
  kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG. These BTF
  kinds are already supported by LLVM and other tools in the BPF ecosystem.

Both of these attributes are already supported by clang, and are already being
used in various ways by BPF support inside the Linux kernel.

It is worth noting that while the Linux kernel and BPF/BTF is the motivating use
case of this feature, the format of the new DWARF extension is generic.  This
work could be easily adapted to provide a general way for program authors to
annotate types and declarations with arbitrary information for any
post-compilation analysis needs, not just the Linux kernel BPF verifier.  For
example, these annotations could be used to aid in ABI analysis.
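
As a rough illustration of how the two attributes might appear in user
code, here is a sketch that compiles with or without compiler support.
The macro names TYPE_TAG and DECL_TAG, struct request and request_len
are hypothetical and chosen only for this example; on a compiler that
does not know the attributes the macros expand to nothing.

```c
#include <stddef.h>

/* Use the attributes only when the compiler reports support for them.
   TYPE_TAG and DECL_TAG are hypothetical helper macros.  */
#if defined (__has_attribute)
# if __has_attribute (btf_type_tag)
#  define TYPE_TAG(s) __attribute__ ((btf_type_tag (s)))
# endif
# if __has_attribute (btf_decl_tag)
#  define DECL_TAG(s) __attribute__ ((btf_decl_tag (s)))
# endif
#endif
#ifndef TYPE_TAG
# define TYPE_TAG(s)
#endif
#ifndef DECL_TAG
# define DECL_TAG(s)
#endif

struct request
{
  /* A type tag, placed like the kernel's __user in the cover letter.  */
  const char TYPE_TAG ("user") *buf;
  /* A decl tag on a struct field.  */
  size_t len DECL_TAG ("length_of_buf");
};

/* A decl tag on a function parameter.  The tags are only meant to be
   recorded in debug information, so the function behaves the same
   with or without them.  */
static size_t
request_len (const struct request *req DECL_TAG ("may_be_null"))
{
  return req != NULL ? req->len : 0;
}
```

Post-compilation tools would then read these strings from the DWARF or
BTF output.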

Purpose
===

1)  Addition of C-family language constructs (attributes) to specify free-text
tags on certain language elements, such as struct fields.

The purpose of these annotations is to provide additional information about
types, variables, and function parameters of interest to the kernel. A
driving use case is to tag pointer types within the Linux kernel and BPF
programs with additional semantic information, such as '__user' or '__rcu'.

For example, consider the Linux kernel function do_execve with the
following declaration:

  static int do_execve(struct filename *filename,
 const char __user *const __user *__argv,
 const char __user *const __user *__envp);

Here, __user could be defined with these annotations to record semantic
information about the pointer parameters (e.g., they are user-provided) in
DWARF and BTF information. Other kernel facilities such as the BPF verifier
can read the tags and make use of the information.
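
A minimal sketch of how such a tag could be defined, under stated assumptions:
USER_TAG stands in for the kernel's __user, the "user" tag string and the
count_args helper are invented for illustration, and the __has_attribute
guard is only there so the example still builds where btf_type_tag is not
implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's __user; hypothetical name and tag string.  */
#if defined(__has_attribute)
# if __has_attribute(btf_type_tag)
#  define USER_TAG __attribute__((btf_type_tag("user")))
# endif
#endif
#ifndef USER_TAG
# define USER_TAG
#endif

/* Same shape as the __argv parameter of do_execve: a tagged pointer to
   tagged pointers to char.  Returns the number of entries before the
   terminating NULL.  */
size_t
count_args (const char USER_TAG *const USER_TAG *argv)
{
  size_t n = 0;
  while (argv[n] != NULL)
    n++;
  return n;
}

/* Small self-check: the tag does not change how the pointer is used.  */
void
count_args_demo (void)
{
  static const char *const args[] = { "prog", "-v", NULL };
  assert (count_args (args) == 2);
}
```

The tag is not expected to affect the generated code; it only matters to
consumers of the DWARF or BTF output.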

2)  Conveying the tags in the generated DWARF debug info.

The main motivation for emitting the tags in DWARF is that the Linux kernel
generates its BTF information via pahole, using DWARF as a source:

        +--------+  BTF                 BTF  +----------+
        | pahole |------> vmlinux.btf ------>| verifier |
        +--------+                           +----------+
            ^                                     ^
            |                                     |
      DWARF |                                 BTF |
            |                                     |
         vmlinux                           +-------------+
         module1.ko                        | BPF program |
         module2.ko                        +-------------+
           ...

This is because:

a)  Unlike GCC, LLVM will only generate BTF for BPF programs.

b)  GCC can generate BTF for whatever target with -gbtf, but there is no
support for linking/deduplicating BTF in the linker.

c)  pahole injects additional BTF information based on specific knowledge
of kernel objects which is not available to the compiler.

In the scenario above, the verifier needs access to the pointer tags of
both the kernel types/declarations (conveyed in the DWARF and translated
to BTF by pahole) and those of the BPF program (available directly in BTF).

Another motivation for having the tag information in DWARF, unrelated to
BPF and BTF, is that the drgn project (another DWARF consumer) also wants
to benefit from these tags in order to differentiate between different
kinds of pointers in the kernel.

3)  Conveying the tags in the generated BTF debug info.

This is easy: the main purpose of having this inf

Re: [PATCH RFA (fold)] c++: remove TREE_STATIC from constexpr heap vars [PR119162]

2025-04-30 Thread Jakub Jelinek
On Mon, Apr 21, 2025 at 04:39:55PM -0400, Jason Merrill wrote:
> Tested x86_64-pc-linux-gnu, OK for trunk?
> 
> -- 8< --
> 
> While working on PR119162 it occurred to me that it would be simpler to
> detect the problem of a value referring to a heap allocation if we stopped
> setting TREE_STATIC on them so they naturally are not considered to have a
> constant address.  With that change we no longer need to specifically avoid
> caching a value that refers to a deleted pointer.
> 
> But with this change maybe_nonzero_address is not sure whether the variable
> could have address zero.  I don't understand why it returns 1 only for
> variables in the current function, rather than all non-symtab decls; an auto
> variable from some other function also won't have address zero.  Maybe this
> made more sense when it was in tree_single_nonzero_warnv_p before r7-5868?
> 
> But assuming there is some reason for the current behavior, this patch only
> changes the handling of non-symtab decls when folding_cxx_constexpr.
> 
>   PR c++/119162
> 
> gcc/cp/ChangeLog:
> 
>   * constexpr.cc (find_deleted_heap_var): Remove.
>   (cxx_eval_call_expression): Don't call it.  Don't set TREE_STATIC on
>   heap vars.
>   (cxx_eval_outermost_constant_expr): Don't mess with varpool.
> 
> gcc/ChangeLog:
> 
>   * fold-const.cc (maybe_nonzero_address): Return 1 for non-symtab
>   vars if folding_cxx_constexpr.

LGTM.

Jakub



Re: [PATCH 3/5] ipa: Dump cgraph_node UID instead of order into ipa-clones dump file

2025-04-30 Thread Michal Jires
On Mon, 2025-04-28 at 16:10:58 +0200, Martin Jambor wrote:
> Hi,
> 
> starting with GCC 15 the order is not unique for any symtab_nodes but
> m_uid is, I believe we ought to dump the latter in the ipa-clones dump,
> if only so that people can reliably match entries about new clones to
> those about removed nodes (if any).
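
To make the matching problem concrete, here is a toy sketch.  It does not use
GCC's symtab_node; the struct, the field values and the helper names are all
invented for illustration.  It only shows that a key shared by several nodes
cannot identify one of them, while a unique key can.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a symbol table node; not GCC's symtab_node.  */
struct toy_node
{
  const char *name;
  int order;	/* May be shared by several nodes.  */
  int uid;	/* Unique per node.  */
};

/* Count the nodes whose order equals KEY.  */
size_t
count_by_order (const struct toy_node *nodes, size_t n, int key)
{
  size_t hits = 0;
  for (size_t i = 0; i < n; i++)
    if (nodes[i].order == key)
      hits++;
  return hits;
}

/* Count the nodes whose uid equals KEY.  */
size_t
count_by_uid (const struct toy_node *nodes, size_t n, int key)
{
  size_t hits = 0;
  for (size_t i = 0; i < n; i++)
    if (nodes[i].uid == key)
      hits++;
  return hits;
}

/* Made-up data: two nodes share order 7, every uid is distinct.  */
void
toy_lookup_demo (void)
{
  static const struct toy_node nodes[] = {
    { "foo", 7, 101 },
    { "foo.constprop.0", 7, 102 },
    { "bar", 8, 103 },
  };
  assert (count_by_order (nodes, 3, 7) == 2);	/* Ambiguous.  */
  assert (count_by_uid (nodes, 3, 102) == 1);	/* Unambiguous.  */
}
```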
> 
> Bootstrapped and tested on x86_64-linux. OK for master and gcc 15?
> 
> Thanks,
> 
> Martin
> 

We probably want the following changes as well.
These should cover all dumps affected by the order/uid change.

Not sure whether as part of this patch or a separate one.

Michal

---
 gcc/ipa-cp.cc  | 2 +-
 gcc/ipa-sra.cc | 2 +-
 gcc/symtab.cc  | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index f7e5aa9bfd5..16ab608e82b 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -288,7 +288,7 @@ ipcp_lattice::print (FILE * f, bool dump_sources, bool dump_benefits)
  else
fprintf (f, " [scc: %i, from:", val->scc_no);
  for (s = val->sources; s; s = s->next)
-   fprintf (f, " %i(%f)", s->cs->caller->order,
+   fprintf (f, " %i(%f)", s->cs->caller->get_uid (),
 s->cs->sreal_frequency ().to_double ());
  fprintf (f, "]");
}
diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
index 1331ba49b50..88bfae9502c 100644
--- a/gcc/ipa-sra.cc
+++ b/gcc/ipa-sra.cc
@@ -4644,7 +4644,7 @@ ipa_sra_summarize_function (cgraph_node *node)
 {
   if (dump_file)
 fprintf (dump_file, "Creating summary for %s/%i:\n", node->name (),
-node->order);
+node->get_uid ());
   gcc_obstack_init (&gensum_obstack);
   loaded_decls = new hash_set<tree>;
 
diff --git a/gcc/symtab.cc b/gcc/symtab.cc
index fe9c031247f..fc1155f4696 100644
--- a/gcc/symtab.cc
+++ b/gcc/symtab.cc
@@ -989,10 +989,10 @@ symtab_node::dump_base (FILE *f)
 same_comdat_group->dump_asm_name ());
   if (next_sharing_asm_name)
 fprintf (f, "  next sharing asm name: %i\n",
-next_sharing_asm_name->order);
+next_sharing_asm_name->get_uid ());
   if (previous_sharing_asm_name)
 fprintf (f, "  previous sharing asm name: %i\n",
-previous_sharing_asm_name->order);
+previous_sharing_asm_name->get_uid ());
 
   if (address_taken)
 fprintf (f, "  Address is taken.\n");
-- 
2.49.0



[PATCH] OpenMP: need_device_ptr and need_device_addr support for adjust_args

2025-04-30 Thread Sandra Loosemore
The attached patch adds some more missing pieces of support for the 
OpenMP "declare variant" directive -- handling for the "need_device_ptr" 
and "need_device_addr" modifiers to the "adjust_args" clause.  It 
depends on the patch waffl3x posted last week here:


https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681806.html

I've already pushed a very lightly modified backport of the current 
patch to the OG14 branch, but this is the mainline version and we would 
naturally like to get all of this stuff approved for mainline.  :-)


To give credit where it is due, this is mostly Tobias's work; he wrote 
the testcases and some patch fragments, and I just made it all work.


-Sandra

From 0cd1f6905336404f5af3c2ed4a7577e657500260 Mon Sep 17 00:00:00 2001
From: Sandra Loosemore 
Date: Sat, 26 Apr 2025 02:22:39 +
Subject: [PATCH] OpenMP: need_device_ptr and need_device_addr support for
 adjust_args

This patch adds support for the "need_device_addr" modifier to the
"adjust_args" clause for the "declare variant" directive, and
extends/re-works the support for "need_device_ptr" as well.

This patch builds on waffl3x's recently posted patch, "OpenMP: C/C++
adjust-args numeric ranges", here:

https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681806.html

In C++, "need_device_addr" supports mapping reference arguments to
device pointers.  In Fortran, it similarly supports arguments passed
by reference, the default for the language, in contrast to
"need_device_ptr" which is used to map arguments of c_ptr type.  The
C++ support is straightforward, but Fortran has some additional
wrinkles involving arrays passed by descriptor (a new descriptor must
be constructed with a pointer to the array data which is the only part
mapped to the device), plus special cases for passing optional
arguments and a whole array instead of a reference to its first element.
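
For readers less familiar with the clause, a minimal C sketch of how
"need_device_ptr" is meant to be used follows.  It is not one of the new
testcases: the function names and data are made up, and it shows only
need_device_ptr, since need_device_addr applies to C++ reference and Fortran
by-reference arguments.  It assumes that, when OpenMP is disabled, the
pragmas are ignored and the code runs as plain serial C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variant: expects DATA to already be a device pointer.  */
void
scale_device (float *data, size_t n, float factor)
{
  #pragma omp target is_device_ptr(data)
  for (size_t i = 0; i < n; i++)
    data[i] *= factor;
}

/* Base function.  Inside a dispatch construct the call is redirected to
   scale_device, and need_device_ptr asks for DATA to be translated to the
   corresponding device pointer first.  */
#pragma omp declare variant(scale_device) match(construct = {dispatch}) \
  adjust_args(need_device_ptr : data)
void
scale (float *data, size_t n, float factor)
{
  for (size_t i = 0; i < n; i++)
    data[i] *= factor;
}

/* Map the array, then call the base function under dispatch.  */
void
scale_demo (void)
{
  float v[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
  float *p = v;

  #pragma omp target data map(tofrom : p[0:4])
  {
    #pragma omp dispatch
    scale (p, 4, 2.0f);
  }

  assert (v[0] == 2.0f && v[3] == 8.0f);
}
```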

gcc/cp/ChangeLog
	* parser.cc (cp_finish_omp_declare_variant): Adjust error messages.

gcc/fortran/ChangeLog
	* trans-openmp.cc (gfc_trans_omp_declare_variant): Disallow
	polymorphic and optional arguments with need_device_addr for now, but
	don't reject need_device_addr entirely.

gcc/ChangeLog
	* gimplify.cc (modify_call_for_omp_dispatch): Rework logic for
	need_device_ptr and need_device_addr adjustments.

gcc/testsuite/ChangeLog
	* c-c++-common/gomp/adjust-args-10.c: Ignore the new sorry since the
	lack of proper diagnostic is already xfail'ed.
	* g++.dg/gomp/adjust-args-1.C: Adjust output patterns.
	* g++.dg/gomp/adjust-args-17.C: New.
	* gcc.dg/gomp/adjust-args-3.c: New.
	* gfortran.dg/gomp/adjust-args-14.f90: Don't expect this to fail now.

libgomp/ChangeLog
	* libgomp.texi: Mark need_device_addr as supported.
	* testsuite/libgomp.c++/need-device-ptr.C: New.
	* testsuite/libgomp.c-c++-common/dispatch-3.c: New.
	* testsuite/libgomp.fortran/adjust-args-array-descriptor.f90: New.
	* testsuite/libgomp.fortran/need-device-ptr.f90: New.

Co-Authored-By: Tobias Burnus 
---
 gcc/cp/parser.cc  |   7 +-
 gcc/fortran/trans-openmp.cc   |  44 +++--
 gcc/gimplify.cc   |  88 +++--
 .../c-c++-common/gomp/adjust-args-10.c|   2 +
 gcc/testsuite/g++.dg/gomp/adjust-args-1.C |   6 +-
 gcc/testsuite/g++.dg/gomp/adjust-args-17.C|  44 +
 gcc/testsuite/gcc.dg/gomp/adjust-args-3.c |  47 +
 .../gfortran.dg/gomp/adjust-args-14.f90   |   2 +-
 libgomp/libgomp.texi  |   2 +-
 .../testsuite/libgomp.c++/need-device-ptr.C   | 175 ++
 .../libgomp.c-c++-common/dispatch-3.c |  35 
 .../adjust-args-array-descriptor.f90  |  89 +
 .../libgomp.fortran/need-device-ptr.f90   | 132 +
 13 files changed, 633 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gomp/adjust-args-17.C
 create mode 100644 gcc/testsuite/gcc.dg/gomp/adjust-args-3.c
 create mode 100644 libgomp/testsuite/libgomp.c++/need-device-ptr.C
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/dispatch-3.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/adjust-args-array-descriptor.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/need-device-ptr.f90

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 92f30d63b7e..8709b0c6181 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -50967,7 +50967,8 @@ cp_finish_omp_declare_variant (cp_parser *parser, cp_token *pragma_tok,
 	  else
 		{
 		  error_at (adjust_op_tok->location,
-			"expected %<nothing%> or %<need_device_ptr%>");
+			"expected %<nothing%>, %<need_device_ptr%> or "
+			"%<need_device_addr%>");
 		  /* We should be trying to recover here instead of immediately
 		 failing, skipping to close paren and continuing.  */
 		  goto fail;
@@ -50978,8 +50979,8 @@ cp_finish_omp_declare_variant (cp_parser *parser, cp_token *pragma_tok,
 	  /* We should be trying to recover here instead of immediately
 		 failing, skipping to close paren and continuing.  */
 	  error_at (adjust_op_tok->location,

Re: [PATCH v1][GCC16-Stage-1] RISC-V: Remove unnecessary frm restore volatile define_insn

2025-04-30 Thread Vineet Gupta
Hi Pan,

On 4/27/25 18:33, Li, Pan2 wrote:
> Kindly ping.

Sorry, this got backed up as I'm working on an FRM overhaul. If this is not super
urgent, could you please wait a few weeks for my work to be posted?
If you'd still prefer this to go in, that's fine by me as well.

Thx,
-Vineet


>
> Pan
>
> -Original Message-
> From: Li, Pan2  
> Sent: Wednesday, April 16, 2025 10:57 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
> rdapp@gmail.com; Chen, Ken ; Li, Pan2 
> 
> Subject: [PATCH v1][GCC16-Stage-1] RISC-V: Remove unnecessary frm restore volatile define_insn
>
> From: Pan Li 
>
> After we add the frm register to global_regs, we may no longer need a
> volatile define_insn to emit the frm restore insns.  The
> cooperatively-managed global register handles this, so there is no need
> to emit the volatile define_insn explicitly.
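
The decision the refactored riscv_emit_frm_mode_set arrives at in the quoted
diff can be modelled as a pure function.  This is only a sketch: the toy_
enums and the function name are invented, the ordering of the mode values is
an assumption, the STATIC_FRM_P (cfun) test is passed in as a plain flag, and
the frm backup emitted when prev_mode is DYN_CALL is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy mode values; names and ordering are assumptions of this sketch.  */
enum toy_frm_mode
{
  TOY_FRM_RNE, TOY_FRM_RTZ, TOY_FRM_RDN, TOY_FRM_RUP, TOY_FRM_RMM,
  TOY_FRM_DYN, TOY_FRM_DYN_EXIT, TOY_FRM_DYN_CALL
};

/* What would be emitted for a mode switch.  */
enum toy_frm_action
{
  TOY_ACT_NONE,			/* Nothing to emit.  */
  TOY_ACT_SET_STATIC,		/* Write the static rounding mode.  */
  TOY_ACT_RESTORE_BACKUP	/* Restore frm from the backup register.  */
};

static bool
toy_static_mode_p (enum toy_frm_mode mode)
{
  return mode <= TOY_FRM_RMM;
}

/* Mirror of the control flow in the quoted hunk.  */
enum toy_frm_action
toy_frm_action (enum toy_frm_mode mode, enum toy_frm_mode prev,
		bool static_frm_p)
{
  if (mode == prev)
    return TOY_ACT_NONE;

  if (toy_static_mode_p (mode))
    return TOY_ACT_SET_STATIC;

  bool restore_p
    = (static_frm_p && mode == TOY_FRM_DYN_CALL && prev != TOY_FRM_DYN)
      || (static_frm_p && mode == TOY_FRM_DYN_EXIT
	  && prev != TOY_FRM_DYN && prev != TOY_FRM_DYN_CALL)
      || (mode == TOY_FRM_DYN && prev != TOY_FRM_DYN_CALL);

  return restore_p ? TOY_ACT_RESTORE_BACKUP : TOY_ACT_NONE;
}

/* A couple of spot checks.  */
void
toy_frm_demo (void)
{
  assert (toy_frm_action (TOY_FRM_RTZ, TOY_FRM_DYN, true)
	  == TOY_ACT_SET_STATIC);
  assert (toy_frm_action (TOY_FRM_DYN, TOY_FRM_RTZ, false)
	  == TOY_ACT_RESTORE_BACKUP);
}
```

Every restore path now goes through the same non-volatile fsrmsi_restore
pattern, which is the point of the patch.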
>
> gcc/ChangeLog:
>
>   * config/riscv/riscv.cc (riscv_emit_frm_mode_set): Refactor
>   the frm mode set by removing fsrmsi_restore_volatile.
>   * config/riscv/vector-iterators.md (unspecv): Remove as
>   unnecessary.
>   * config/riscv/vector.md (fsrmsi_restore_volatile): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/riscv/rvv/base/float-point-dynamic-frm-49.c: Adjust
>   the asm dump check times.
>   * gcc.target/riscv/rvv/base/float-point-dynamic-frm-50.c: Ditto.
>   * gcc.target/riscv/rvv/base/float-point-dynamic-frm-52.c: Ditto.
>   * gcc.target/riscv/rvv/base/float-point-dynamic-frm-74.c: Ditto.
>   * gcc.target/riscv/rvv/base/float-point-dynamic-frm-75.c: Ditto.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/riscv.cc | 43 ++-
>  gcc/config/riscv/vector-iterators.md  |  4 --
>  gcc/config/riscv/vector.md| 13 --
>  .../rvv/base/float-point-dynamic-frm-49.c |  2 +-
>  .../rvv/base/float-point-dynamic-frm-50.c |  2 +-
>  .../rvv/base/float-point-dynamic-frm-52.c |  2 +-
>  .../rvv/base/float-point-dynamic-frm-74.c |  2 +-
>  .../rvv/base/float-point-dynamic-frm-75.c |  2 +-
>  8 files changed, 28 insertions(+), 42 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 38f3ae7cd84..3878702e3a1 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -12047,27 +12047,30 @@ riscv_emit_frm_mode_set (int mode, int prev_mode)
>if (prev_mode == riscv_vector::FRM_DYN_CALL)
>  emit_insn (gen_frrmsi (backup_reg)); /* Backup frm when DYN_CALL.  */
>  
> -  if (mode != prev_mode)
> -{
> -  rtx frm = gen_int_mode (mode, SImode);
> -
> -  if (mode == riscv_vector::FRM_DYN_CALL
> - && prev_mode != riscv_vector::FRM_DYN && STATIC_FRM_P (cfun))
> - /* No need to emit when prev mode is DYN already.  */
> - emit_insn (gen_fsrmsi_restore_volatile (backup_reg));
> -  else if (mode == riscv_vector::FRM_DYN_EXIT && STATIC_FRM_P (cfun)
> - && prev_mode != riscv_vector::FRM_DYN
> - && prev_mode != riscv_vector::FRM_DYN_CALL)
> - /* No need to emit when prev mode is DYN or DYN_CALL already.  */
> - emit_insn (gen_fsrmsi_restore_volatile (backup_reg));
> -  else if (mode == riscv_vector::FRM_DYN
> - && prev_mode != riscv_vector::FRM_DYN_CALL)
> - /* Restore frm value from backup when switch to DYN mode.  */
> - emit_insn (gen_fsrmsi_restore (backup_reg));
> -  else if (riscv_static_frm_mode_p (mode))
> - /* Set frm value when switch to static mode.  */
> - emit_insn (gen_fsrmsi_restore (frm));
> +  if (mode == prev_mode)
> +return;
> +
> +  if (riscv_static_frm_mode_p (mode))
> +{
> +  /* Set frm value when switch to static mode.  */
> +  emit_insn (gen_fsrmsi_restore (gen_int_mode (mode, SImode)));
> +  return;
>  }
> +
> +  bool restore_p
> += /* No need to emit when prev mode is DYN.  */
> +  (STATIC_FRM_P (cfun) && mode == riscv_vector::FRM_DYN_CALL
> +   && prev_mode != riscv_vector::FRM_DYN)
> +  /* No need to emit if prev mode is DYN or DYN_CALL.  */
> +  || (STATIC_FRM_P (cfun) && mode == riscv_vector::FRM_DYN_EXIT
> +   && prev_mode != riscv_vector::FRM_DYN
> +   && prev_mode != riscv_vector::FRM_DYN_CALL)
> +  /* Restore frm value when switch to DYN mode.  */
> +  || (mode == riscv_vector::FRM_DYN
> +   && prev_mode != riscv_vector::FRM_DYN_CALL);
> +
> +  if (restore_p)
> +emit_insn (gen_fsrmsi_restore (backup_reg));
>  }
>  
>  /* Implement Mode switching.  */
> diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
> index f8da71b1d65..28f52481952 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -122,10 +122,6 @@ (define_c_enum "unspec" [
>UNSPEC_SF_VFNRCLIPU
>  ])
>  
> -(define_c_enum "unspecv" [
> -  UNSPECV_FRM_RESTORE_EXIT
> -])
> -
>  ;; Subset of VI with fractional LMUL types
>  (d

Re: Patch ping (Re: [PATCH] combine: Special case set_noop_p in two spots)

2025-04-30 Thread Jeff Law




On 4/28/25 1:26 AM, Jakub Jelinek wrote:

On Fri, Mar 28, 2025 at 12:20:18PM +0100, Jakub Jelinek wrote:

Here is the incremental patch I was talking about.
For noop sets, we don't need to test much, they can go to i2
unless that would violate i3 JUMP condition.

With this the try_combine on the pr119291.c testcase doesn't fail,
but succeeds and we get
(insn 22 21 23 4 (set (pc)
 (pc)) "pr119291.c":27:15 2147483647 {NOOP_MOVE}
  (nil))
(insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
 (reg/v:SI 116 [ e ])) 96 {*movsi_internal}
  (expr_list:REG_DEAD (reg/v:SI 116 [ e ])
 (nil)))
(note 24 23 25 4 NOTE_INSN_DELETED)
(insn 25 24 26 4 (set (reg/v:SI 116 [ e ])
 (const_int 0 [0])) "pr119291.c":28:13 96 {*movsi_internal}
  (nil))
(note 26 25 27 4 NOTE_INSN_DELETED)
(insn 27 26 28 4 (set (reg:DI 128 [ _9 ])
 (const_int 0 [0])) "pr119291.c":28:13 95 {*movdi_internal}
  (nil))
after it.


I'd like to ping the 
https://gcc.gnu.org/pipermail/gcc-patches/2025-March/679573.html
patch.

Ok for trunk?

OK
jeff



Re: [COMMITTED] PR tree-optimization/119712 - Always reflect lower bits from mask in subranges.

2025-04-30 Thread Andrew MacLeod

On 4/30/25 02:56, Richard Biener wrote:

On Wed, Apr 30, 2025 at 12:00 AM Andrew MacLeod  wrote:


On 4/28/25 17:26, Andrew MacLeod wrote:

I have committed this patch to trunk after bootstrap/regression
testing again on trunk.

I'll get to gcc14/15 once I flush the current queue.

Andrew


On 4/17/25 06:44, Richard Biener wrote:

On Wed, Apr 16, 2025 at 10:55 PM Andrew MacLeod 
wrote:

This was a fun one!   An actual bug, and it took a while to sort out.
After chasing down some red herrings, this turns out to be an issue of
interaction between the range and value masks and intervening
calculations.

The original patch from 11/2023 adjusts intersection so that it can
enhance subranges based on the value mask, i.e. in this testcase:

[irange] int [-INF, 2147483644] MASK 0xfffc VALUE 0x1

If adjust_range() were called on this, it would eliminate the trailing
mask/value bit ranges that are invalid and turn it into:

[-INF, -3][1, 1][4, 2147483626] MASK 0xfffc VALUE 0x1

reflecting the lower bits into the range.   The problem develops because
we only apply adjust_range () during intersection in an attempt to
avoid expensive work when it isn't needed.

Unfortunately, that is what triggers this infinite loop. Ranger's cache
propagates changes, and the algorithm is designed to always improve the
range.  In this case, the first iteration through, _11 receives the
above value, [irange] int [-INF, 2147483644] MASK 0xfffc VALUE 0x1
which via the mask, excludes 0, 2 and 3.

The ensuing calculations in block 7 do not trigger a successful
intersection operation, and thus the range pairs are never expanded to
eliminate the lower ranges, and it triggers another change in values
which leads to the next iteration being less precise, but not obviously
so. [irange] int [-INF, 2147483644] MASK 0xfffd VALUE 0x0 is a
result of the calculation.   As ranges are supposed to always get better
with this algorithm, we simply compare for difference, and this range
is different, and thus we replace it. It only excludes 2 and 3.
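
The "excludes 0, 2 and 3" versus "only excludes 2 and 3" claims can be checked
with a small sketch.  It assumes the usual reading of the pair (bits set in
MASK are unknown, and VALUE gives the known bits), and it assumes full-width
32-bit masks 0xfffffffc and 0xfffffffd for the masks printed above as 0xfffc
and 0xfffd.  The function name is made up and this is not the irange
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if V agrees with every known bit.  Bits set in MASK are unknown;
   for the cleared bits, VALUE holds the known bit values.  */
bool
bits_allow (uint32_t v, uint32_t mask, uint32_t value)
{
  return (v & ~mask) == (value & ~mask);
}

void
bits_allow_demo (void)
{
  /* Low two bits known to be 01: of 0..3, only 1 remains.  */
  assert (!bits_allow (0, 0xfffffffcu, 0x1));
  assert (bits_allow (1, 0xfffffffcu, 0x1));
  assert (!bits_allow (2, 0xfffffffcu, 0x1));
  assert (!bits_allow (3, 0xfffffffcu, 0x1));

  /* Only bit 1 known to be 0: 0 and 1 remain, 2 and 3 are excluded.  */
  assert (bits_allow (0, 0xfffffffdu, 0x0));
  assert (bits_allow (1, 0xfffffffdu, 0x0));
  assert (!bits_allow (2, 0xfffffffdu, 0x0));
  assert (!bits_allow (3, 0xfffffffdu, 0x0));
}
```

The second pair admits a strict superset of the first for these low values,
which is why replacing the first with the second loses precision.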

Next iteration through the less precise range DOES trigger an
intersection operation in block 7, and when that is expanded to
[irange] int [-INF, 1][4, 2147483644] MASK 0xfffd VALUE 0x0, using that we
can again create the more precise range for _11 that started the cycle, and
we go on and on and on.

If we fix this so that we always expand subranges to reflect the lower
bits in a bitmask, the initial value starts with

[irange] int [-INF, -3][1, 1][4, 2147483644] MASK 0xfffc VALUE 0x1

And everything falls into place as it should.  The fix is to be
consistent about expanding those lower subranges.

I also added a couple of minor performance tweaks to avoid unnecessary
work, along with merging adjust_range () directly into
set_range_from_bitmask () .

I started at a 0.2% overall compilation increase (1.8% in VRP). In the
end, this patch is down to 0.6% in VRP, and only 0.08% overall, so
manageable for all the extra work.

It also causes a few ripples in the testsuite so 3 test cases also
needed adjustment:

* gcc.dg/pr83072-2.c :  With the addition of the expanded ranges, CCP
used to export a global:
   Global Exported: c_3 = [irange] int [-INF, +INF] MASK 0xfffe VALUE 0x1
and now
   Global Exported: c_3 = [irange] int [-INF, -1][1, +INF] MASK 0xfffe VALUE 0x1
Which in turn enables forwprop to collapse part of the testcase much
earlier. So I turned off forwprop for the testcase

* gcc.dg/tree-ssa/phi-opt-value-5.c  : With the expanded ranges, the CCP2
pass used to export:
  Global Exported: d_3 = [irange] int [-INF, +INF] MASK 0xfffe VALUE 0x1
and now
  Global Exported: d_3 = [irange] int [-INF, -1][1, +INF] MASK 0xfffe VALUE 0x1
which in turn makes the following comment obsolete as the optimization
does happen earlier:
/* fdiv1 requires until later than phiopt2 to be able to detect that
  d is non-zero. to be able to remove the conditional.  */
Adjusted the testcase to expect everything to be taken care of by
phi-opt2 pass.

* gcc.dg/tree-ssa/vrp122.c : Previously, CCP exported:
  Global Exported: g_4 = [irange] unsigned int [0, +INF] MASK 0xfff0 VALUE 0x0
and then EVRP refined that and stored it, then the testcase tested for:
  Global Exported: g_4 = [irange] unsigned int [0, 0][16, +INF] MASK 0xfff0 VALUE 0x0
Now, CCP itself exports the expanded range, so there is nothing for VRP
to do.
Adjusted the testcase to look for the expanded range in CCP.

Now we never get into this situation where the bitmask is explicitly
applied in some places and not others.

Bootstraps on x86_64-pc-linux-gnu with no regressions. Finally.   Is
this OK for trunk, or should I hold off a little bit?

Please wait a little bit until after 15.1 is out.  It's then OK for
trunk in
stage1 and backports when no issues show up.

Thanks,
Richard.


Andrew

This is now in trunk.   Attached are the patches for gcc15 and gcc14.

Bootstrapped with no regressions o

[PATCH v3 3/6] ctf: translate annotation DIEs to internal ctf

2025-04-30 Thread David Faust
Translate DW_TAG_GNU_annotation DIEs created for C attributes
btf_decl_tag and btf_type_tag into an in-memory representation in the
CTF/BTF container.  They will be output in BTF as BTF_KIND_DECL_TAG and
BTF_KIND_TYPE_TAG records.

The new CTF kinds used to represent these annotations, CTF_K_DECL_TAG
and CTF_K_TYPE_TAG, are expected to be formalized in the next version of
the CTF specification.  For now they only exist in memory as a
translation step to BTF, and are not emitted when generating CTF
information.

gcc/
* ctfc.cc (ctf_dtu_d_union_selector): Handle CTF_K_DECL_TAG and
CTF_K_TYPE_TAG.
(ctf_add_type_tag, ctf_add_decl_tag): New.
(ctf_add_variable): Return the new ctf_dvdef_ref rather than zero.
(new_ctf_container): Initialize new members.
(ctfc_delete_container): Deallocate new members.
* ctfc.h (ctf_dvdef, ctf_dvdef_t, ctf_dvdef_ref): Move forward
declarations earlier in file.
(ctf_decl_tag_t): New typedef.
(ctf_dtdef): Add ctf_decl_tag_t member to dtd_u union.
(ctf_dtu_d_union_enum): Add new CTF_DTU_D_TAG enumerator.
(ctf_container): Add ctfc_tags vector and ctfc_type_tags_map hash_map
members.
(ctf_add_type_tag, ctf_add_decl_tag): New function protos.
(ctf_add_variable): Change prototype return type to ctf_dvdef_ref.
* dwarf2ctf.cc (gen_ctf_type_tags, gen_ctf_decl_tags)
(gen_ctf_decl_tags_for_var): New static functions.
(gen_ctf_pointer_type): Handle type tags.
(gen_ctf_sou_type): Handle decl tags.
(gen_ctf_function_type): Likewise.
(gen_ctf_variable): Likewise.
(gen_ctf_function): Likewise.
(gen_ctf_type): Handle TAG_GNU_annotation DIEs.

gcc/testsuite
* gcc.dg/debug/ctf/ctf-decl-tag-1.c: New test.
* gcc.dg/debug/ctf/ctf-type-tag-1.c: New test.

include/
* ctf.h (CTF_K_DECL_TAG, CTF_K_TYPE_TAG): New defines.
---
 gcc/ctfc.cc   |  80 ++-
 gcc/ctfc.h|  43 +-
 gcc/dwarf2ctf.cc  | 135 +-
 .../gcc.dg/debug/ctf/ctf-decl-tag-1.c |  31 
 .../gcc.dg/debug/ctf/ctf-type-tag-1.c |  19 +++
 include/ctf.h |   4 +
 6 files changed, 299 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-decl-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-type-tag-1.c

diff --git a/gcc/ctfc.cc b/gcc/ctfc.cc
index 51511d69baa..49251489ae1 100644
--- a/gcc/ctfc.cc
+++ b/gcc/ctfc.cc
@@ -107,6 +107,9 @@ ctf_dtu_d_union_selector (ctf_dtdef_ref ctftype)
   return CTF_DTU_D_ARGUMENTS;
 case CTF_K_SLICE:
   return CTF_DTU_D_SLICE;
+case CTF_K_DECL_TAG:
+case CTF_K_TYPE_TAG:
+  return CTF_DTU_D_TAG;
 default:
   /* The largest member as default.  */
   return CTF_DTU_D_ARRAY;
@@ -445,6 +448,68 @@ ctf_add_reftype (ctf_container_ref ctfc, uint32_t flag, ctf_dtdef_ref ref,
   return dtd;
 }
 
+ctf_dtdef_ref
+ctf_add_type_tag (ctf_container_ref ctfc, uint32_t flag, const char *value,
+ ctf_dtdef_ref ref_dtd)
+{
+  ctf_dtdef_ref dtd;
+   /* Create a DTD for the tag, but do not place it in the regular types list;
+  CTF format does not (yet) encode tags.  */
+  dtd = ggc_cleared_alloc<ctf_dtdef_t> ();
+
+  dtd->dtd_name = ctf_add_string (ctfc, value, &(dtd->dtd_data.ctti_name),
+ CTF_AUX_STRTAB);
+  /* A single DW_TAG_GNU_annotation DIE may be referenced by multiple DIEs,
+ e.g. when multiple distinct types specify the same type tag.  We will
+ synthesize multiple CTF DTD records in that case, so we cannot tie them
+ all to the same key (the DW_TAG_GNU_annotation DIE) in ctfc_types.  */
+  dtd->dtd_key = NULL;
+  dtd->ref_type = ref_dtd;
+  dtd->dtd_data.ctti_info = CTF_TYPE_INFO (CTF_K_TYPE_TAG, flag, 0);
+  dtd->dtd_u.dtu_tag.ref_var = NULL; /* Not used for type tags.  */
+  dtd->dtd_u.dtu_tag.component_idx = 0; /* Not used for type tags.  */
+
+  /* Insert tag directly into the tag list.  Type ID will be assigned later.  */
+  vec_safe_push (ctfc->ctfc_tags, dtd);
+
+  /* Keep ctfc_aux_strlen updated.  */
+  if ((value != NULL) && strcmp (value, ""))
+ctfc->ctfc_aux_strlen += strlen (value) + 1;
+
+  return dtd;
+}
+
+ctf_dtdef_ref
+ctf_add_decl_tag (ctf_container_ref ctfc, uint32_t flag, const char *value,
+ ctf_dtdef_ref ref_dtd, uint32_t comp_idx)
+{
+   ctf_dtdef_ref dtd;
+   /* Create a DTD for the tag, but do not place it in the regular types list;
+  ctf format does not (yet) encode tags.  */
+  dtd = ggc_cleared_alloc<ctf_dtdef_t> ();
+
+  dtd->dtd_name = ctf_add_string (ctfc, value, &(dtd->dtd_data.ctti_name),
+ CTF_AUX_STRTAB);
+  /* A single DW_TAG_GNU_annotation DIE may be referenced by multiple DIEs,
+ e.g. when multiple distinct declarations specify the same decl ta

[PATCH v3 4/6] btf: generate and output DECL_TAG and TYPE_TAG records

2025-04-30 Thread David Faust
Support the btf_decl_tag and btf_type_tag attributes in BTF by creating
and emitting BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG records,
respectively, for them.

Some care is required when -gprune-btf is in effect to avoid emitting
decl or type tags for declarations or types which have been pruned and
will not be emitted in BTF.

gcc/
* btfout.cc (get_btf_kind): Handle DECL_TAG and TYPE_TAG kinds.
(btf_calc_num_vbytes): Likewise.
(btf_asm_type): Likewise.
(output_asm_btf_vlen_bytes): Likewise.
(output_btf_tags): New.
(btf_output): Call it here.
(btf_add_used_type): Replace with simple wrapper around...
(btf_add_used_type_1): ...the implementation.  Handle
BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
(btf_add_vars): Update btf_add_used_type call.
(btf_assign_tag_ids): New.
(btf_mark_type_used): Update btf_add_used_type call.
(btf_collect_pruned_types): Likewise.  Handle type and decl tags.
(btf_finish): Call btf_assign_tag_ids.

gcc/testsuite/
* gcc.dg/debug/btf/btf-decl-tag-1.c: New test.
* gcc.dg/debug/btf/btf-decl-tag-2.c: New test.
* gcc.dg/debug/btf/btf-decl-tag-3.c: New test.
* gcc.dg/debug/btf/btf-decl-tag-4.c: New test.
* gcc.dg/debug/btf/btf-type-tag-1.c: New test.
* gcc.dg/debug/btf/btf-type-tag-2.c: New test.
* gcc.dg/debug/btf/btf-type-tag-3.c: New test.
* gcc.dg/debug/btf/btf-type-tag-4.c: New test.
* gcc.dg/debug/btf/btf-type-tag-c2x-1.c: New test.

include/
* btf.h (BTF_KIND_DECL_TAG, BTF_KIND_TYPE_TAG): New defines.
(struct btf_decl_tag): New.
---
 gcc/btfout.cc | 171 +++---
 .../gcc.dg/debug/btf/btf-decl-tag-1.c |  14 ++
 .../gcc.dg/debug/btf/btf-decl-tag-2.c |  22 +++
 .../gcc.dg/debug/btf/btf-decl-tag-3.c |  22 +++
 .../gcc.dg/debug/btf/btf-decl-tag-4.c |  34 
 .../gcc.dg/debug/btf/btf-type-tag-1.c |  26 +++
 .../gcc.dg/debug/btf/btf-type-tag-2.c |  13 ++
 .../gcc.dg/debug/btf/btf-type-tag-3.c |  28 +++
 .../gcc.dg/debug/btf/btf-type-tag-4.c |  24 +++
 .../gcc.dg/debug/btf/btf-type-tag-c2x-1.c |  22 +++
 include/btf.h |  14 ++
 11 files changed, 366 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-4.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-4.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-c2x-1.c

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index ff7ea42a961..c00e0c98015 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -141,6 +141,8 @@ get_btf_kind (uint32_t ctf_kind)
 case CTF_K_VOLATILE: return BTF_KIND_VOLATILE;
 case CTF_K_CONST:return BTF_KIND_CONST;
 case CTF_K_RESTRICT: return BTF_KIND_RESTRICT;
+case CTF_K_DECL_TAG: return BTF_KIND_DECL_TAG;
+case CTF_K_TYPE_TAG: return BTF_KIND_TYPE_TAG;
 default:;
 }
   return BTF_KIND_UNKN;
@@ -217,6 +219,7 @@ btf_calc_num_vbytes (ctf_dtdef_ref dtd)
 case BTF_KIND_CONST:
 case BTF_KIND_RESTRICT:
 case BTF_KIND_FUNC:
+case BTF_KIND_TYPE_TAG:
 /* These kinds have no vlen data.  */
   break;
 
@@ -256,6 +259,10 @@ btf_calc_num_vbytes (ctf_dtdef_ref dtd)
   vlen_bytes += vlen * sizeof (struct btf_var_secinfo);
   break;
 
+case BTF_KIND_DECL_TAG:
+  vlen_bytes += sizeof (struct btf_decl_tag);
+  break;
+
 default:
   break;
 }
@@ -452,6 +459,20 @@ btf_asm_type (ctf_dtdef_ref dtd)
 and should write 0.  */
   dw2_asm_output_data (4, 0, "(unused)");
   return;
+case BTF_KIND_DECL_TAG:
+  {
+   if (dtd->ref_type)
+ break;
+   else if (dtd->dtd_u.dtu_tag.ref_var)
+ {
+   /* ref_type is NULL for decl tag attached to a variable.  */
+   ctf_dvdef_ref dvd = dtd->dtd_u.dtu_tag.ref_var;
+   dw2_asm_output_data (4, dvd->dvd_id,
+"btt_type: (BTF_KIND_VAR '%s')",
+dvd->dvd_name);
+   return;
+ }
+  }
 default:
   break;
 }
@@ -801,6 +822,12 @@ output_asm_btf_vlen_bytes (ctf_container_ref ctfc, ctf_dtdef_ref dtd)
 at this point.  */
   gcc_unreachable ();
 
+case BTF_KIND_DECL_TAG:
+  dw2_asm_output_data (4, dtd->dtd_u.dtu_tag.component_idx,
+  "component_idx=%d",
+  dtd->dtd_u.dtu_tag.component_idx);
+  break;
+
 

[PATCH v3 5/6] doc: document btf_type_tag and btf_decl_tag attributes

2025-04-30 Thread David Faust
gcc/
* doc/extend.texi (Common Function Attributes)
(Common Variable Attributes): Document btf_decl_tag attribute.
(Common Type Attributes): Document btf_type_tag attribute.
---
 gcc/doc/extend.texi | 79 +
 1 file changed, 79 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0978c4c41b2..365fe179f19 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1970,6 +1970,13 @@ declares that @code{my_alloc1} returns 16-byte aligned pointers and
 that @code{my_alloc2} returns a pointer whose value modulo 32 is equal
 to 8.
 
+@cindex @code{btf_decl_tag} function attribute
+@item btf_decl_tag
+The @code{btf_decl_tag} attribute may be used to associate function
+declarations with arbitrary strings by recording those strings in DWARF
+and/or BTF information in the same way that it is used for variables.
+See @ref{Common Variable Attributes}.
+
 @cindex @code{cold} function attribute
 @item cold
 The @code{cold} attribute on functions is used to inform the compiler that
@@ -7113,6 +7120,41 @@ align them on any target.
 The @code{aligned} attribute can also be used for functions
 (@pxref{Common Function Attributes}.)
 
+@cindex @code{btf_decl_tag} variable attribute
+@item btf_decl_tag (@var{argument})
+The @code{btf_decl_tag} attribute may be used to associate variable
+declarations, struct or union member declarations, function
+declarations, and function parameter declarations with arbitrary strings.
+These strings are not interpreted by the compiler in any way, and have
+no effect on code generation.  Instead, these user-provided strings
+are recorded in DWARF (via @code{DW_AT_GNU_annotation} and
+@code{DW_TAG_GNU_annotation} extensions) and BTF information (via
+@code{BTF_KIND_DECL_TAG} records), and associated with the attributed
+declaration.  If neither DWARF nor BTF information is generated, the
+attribute has no effect.
+
+The argument is treated as an ordinary string in the source language
+with no additional special rules.
+
+The attribute may be supplied multiple times for a single declaration,
+in which case each distinct argument string will be recorded in a
+separate DIE or BTF record, each associated with the declaration.  For
+a single declaration with multiple @code{btf_decl_tag} attributes,
+the order of the @code{DW_TAG_GNU_annotation} DIEs produced is not
+guaranteed to maintain the order of attributes in the source code.
+
+For example:
+
+@smallexample
+int *foo __attribute__ ((btf_decl_tag ("__percpu")));
+@end smallexample
+
+@noindent
+when compiled with @code{-gbtf} causes an additional
+@code{BTF_KIND_DECL_TAG} BTF record to be emitted in the BTF info,
+associating the string ``__percpu'' with the @code{BTF_KIND_VAR}
+record for the variable ``foo''.
+
 @cindex @code{counted_by} variable attribute
 @item counted_by (@var{count})
 The @code{counted_by} attribute may be attached to the C99 flexible array
@@ -8302,6 +8344,43 @@ is given by the product of arguments 1 and 2, and that
 @code{malloc_type}, like the standard C function @code{malloc},
 returns an object whose size is given by argument 1 to the function.
 
+@cindex @code{btf_type_tag} type attribute
+@item btf_type_tag (@var{argument})
+The @code{btf_type_tag} attribute may be used to associate (to ``tag'')
+particular types with arbitrary string annotations.  These annotations
+are recorded in debugging info by supported debug formats, currently
+DWARF (via @code{DW_AT_GNU_annotation} and @code{DW_TAG_GNU_annotation}
+extensions) and BTF (via @code{BTF_KIND_TYPE_TAG} records).  These
+annotation strings are not interpreted by the compiler in any way, and
+have no effect on code generation.  If neither DWARF nor BTF
+information is generated, the attribute has no effect.
+
+The argument is treated as an ordinary string in the source language
+with no additional special rules.
+
+The attribute may be supplied multiple times for a single type, in
+which case each distinct argument string will be recorded in a
+separate DIE or BTF record, each associated with the type.  For a single
+type with multiple @code{btf_type_tag} attributes, the order of the
+@code{DW_TAG_GNU_annotation} DIEs produced is not guaranteed to
+maintain the order of attributes in the source code.
+
+For example the following code:
+
+@smallexample
+int * __attribute__ ((btf_type_tag ("__user"))) foo;
+@end smallexample
+
+@noindent
+when compiled with @code{-gbtf} causes an additional
+@code{BTF_KIND_TYPE_TAG} BTF record to be emitted in the BTF info,
+associating the string ``__user'' with the normal @code{BTF_KIND_PTR}
+record for the pointer-to-integer type used in the declaration.
+
+Note that the BTF format currently only has a representation for type
+tags associated with pointer types.  Type tags on non-pointer types
+may be silently skipped when generating BTF.
+
 @cindex @code{copy} type attribute
 @item copy
 @itemx copy (@var{expression})
-- 
2.47.2
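
As a rough illustration of the two attributes documented above, the following C sketch tags a variable, a struct member, a pointer type and a function declaration. The DECL_TAG and TYPE_TAG macros, every declaration name, and the tag strings other than "__percpu" and "__user" are assumptions made for this example. The __has_attribute guard makes the macros expand to nothing on compilers that do not provide the attributes, so the sketch should still build there.

```c
#include <assert.h>
#include <stddef.h>

/* Fall back to empty macros where the attributes are unavailable, so the
   sketch still compiles on compilers that predate this patch series.  */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

#if __has_attribute (btf_decl_tag)
# define DECL_TAG(s) __attribute__ ((btf_decl_tag (s)))
#else
# define DECL_TAG(s)
#endif

#if __has_attribute (btf_type_tag)
# define TYPE_TAG(s) __attribute__ ((btf_type_tag (s)))
#else
# define TYPE_TAG(s)
#endif

/* Declaration tag on a variable, mirroring the example in the patch.  */
int *percpu_counter DECL_TAG ("__percpu");

struct request
{
  /* A single declaration may carry several declaration tags.  */
  int id DECL_TAG ("key") DECL_TAG ("indexed");
  /* Type tag on a pointer type, mirroring the example in the patch.  */
  int *TYPE_TAG ("__user") buf;
};

/* Declaration tag on a function declaration.  */
int handle_request (struct request *req) DECL_TAG ("entry_point");

int
handle_request (struct request *req)
{
  assert (req != NULL);
  /* The tags are only recorded in debug info; this logic is unaffected.  */
  return req->buf != NULL ? req->id : -1;
}
```

Per the documentation text, the tags should only show up in the output when DWARF or BTF is generated (for example with -gbtf); otherwise they have no effect.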


Re: [PATCH 3/5] ipa: Dump cgraph_node UID instead of order into ipa-clones dump file

2025-04-30 Thread Martin Jambor
Hi,

On Wed, Apr 30 2025, Michal Jires wrote:
> On Mon, 2025-04-28 at 16:10:58 +0200, Martin Jambor wrote:
>> Hi,
>> 
>> starting with GCC 15 the order is not unique for any symtab_nodes but
>> m_uid is, I believe we ought to dump the latter in the ipa-clones dump,
>> if only so that people can reliably match entries about new clones to
>> those about removed nodes (if any).
>> 
>> Bootstrapped and tested on x86_64-linux. OK for master and gcc 15?
>> 
>> Thanks,
>> 
>> Martin
>> 
>
> We probably want the following changes as well.
> These should cover all dumps affected by the order/uid change.
>
> Not sure whether as part of this patch or a separate one.

Assuming this patch does not require making the get_uid member function
const, so that we don't conflict, please commit it separately (with a
proper changelog).

Thanks!

Martin


>
> Michal
>
> ---
>  gcc/ipa-cp.cc  | 2 +-
>  gcc/ipa-sra.cc | 2 +-
>  gcc/symtab.cc  | 4 ++--
>  3 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index f7e5aa9bfd5..16ab608e82b 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -288,7 +288,7 @@ ipcp_lattice::print (FILE * f, bool 
> dump_sources, bool dump_benefits)
> else
>   fprintf (f, " [scc: %i, from:", val->scc_no);
> for (s = val->sources; s; s = s->next)
> - fprintf (f, " %i(%f)", s->cs->caller->order,
> + fprintf (f, " %i(%f)", s->cs->caller->get_uid (),
>s->cs->sreal_frequency ().to_double ());
> fprintf (f, "]");
>   }
> diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
> index 1331ba49b50..88bfae9502c 100644
> --- a/gcc/ipa-sra.cc
> +++ b/gcc/ipa-sra.cc
> @@ -4644,7 +4644,7 @@ ipa_sra_summarize_function (cgraph_node *node)
>  {
>if (dump_file)
>  fprintf (dump_file, "Creating summary for %s/%i:\n", node->name (),
> -  node->order);
> +  node->get_uid ());
>gcc_obstack_init (&gensum_obstack);
>loaded_decls = new hash_set;
>  
> diff --git a/gcc/symtab.cc b/gcc/symtab.cc
> index fe9c031247f..fc1155f4696 100644
> --- a/gcc/symtab.cc
> +++ b/gcc/symtab.cc
> @@ -989,10 +989,10 @@ symtab_node::dump_base (FILE *f)
>same_comdat_group->dump_asm_name ());
>if (next_sharing_asm_name)
>  fprintf (f, "  next sharing asm name: %i\n",
> -  next_sharing_asm_name->order);
> +  next_sharing_asm_name->get_uid ());
>if (previous_sharing_asm_name)
>  fprintf (f, "  previous sharing asm name: %i\n",
> -  previous_sharing_asm_name->order);
> +  previous_sharing_asm_name->get_uid ());
>  
>if (address_taken)
>  fprintf (f, "  Address is taken.\n");
> -- 
> 2.49.0


Re: [PATCH v5 02/10] libstdc++: Add header mdspan to the build-system.

2025-04-30 Thread Jonathan Wakely
On Wed, 30 Apr 2025 at 10:34, Luc Grosheintz  wrote:
>
>
>
> On 4/29/25 3:11 PM, Jonathan Wakely wrote:
> > On Tue, 29 Apr 2025 at 13:59, Luc Grosheintz  
> > wrote:
> >>
> >> Creates a nearly empty header mdspan and adds it to the build-system and
> >> Doxygen config file.
> >>
> >> libstdc++-v3/ChangeLog:
> >>
> >>  * doc/doxygen/user.cfg.in: Add <mdspan>.
> >>  * include/Makefile.am: Ditto.
> >>  * include/Makefile.in: Ditto.
> >>  * include/precompiled/stdc++.h: Ditto.
> >>  * include/std/mdspan: New file.
> >>
> >> Signed-off-by: Luc Grosheintz 
> >> ---
> >>   libstdc++-v3/doc/doxygen/user.cfg.in  |  1 +
> >>   libstdc++-v3/include/Makefile.am  |  1 +
> >>   libstdc++-v3/include/Makefile.in  |  1 +
> >>   libstdc++-v3/include/precompiled/stdc++.h |  1 +
> >>   libstdc++-v3/include/std/mdspan   | 48 +++
> >>   5 files changed, 52 insertions(+)
> >>   create mode 100644 libstdc++-v3/include/std/mdspan
> >>
> >> diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in 
> >> b/libstdc++-v3/doc/doxygen/user.cfg.in
> >> index 19ae67a67ba..e926c6707f6 100644
> >> --- a/libstdc++-v3/doc/doxygen/user.cfg.in
> >> +++ b/libstdc++-v3/doc/doxygen/user.cfg.in
> >> @@ -880,6 +880,7 @@ INPUT  = 
> >> @srcdir@/doc/doxygen/doxygroups.cc \
> >>include/list \
> >>include/locale \
> >>include/map \
> >> + include/mdspan \
> >>include/memory \
> >>include/memory_resource \
> >>include/mutex \
> >> diff --git a/libstdc++-v3/include/Makefile.am 
> >> b/libstdc++-v3/include/Makefile.am
> >> index 537774c2668..1140fa0dffd 100644
> >> --- a/libstdc++-v3/include/Makefile.am
> >> +++ b/libstdc++-v3/include/Makefile.am
> >> @@ -38,6 +38,7 @@ std_freestanding = \
> >>  ${std_srcdir}/generator \
> >>  ${std_srcdir}/iterator \
> >>  ${std_srcdir}/limits \
> >> +   ${std_srcdir}/mdspan \
> >>  ${std_srcdir}/memory \
> >>  ${std_srcdir}/numbers \
> >>  ${std_srcdir}/numeric \
> >> diff --git a/libstdc++-v3/include/Makefile.in 
> >> b/libstdc++-v3/include/Makefile.in
> >> index 7b96b2207f8..c96e981acd6 100644
> >> --- a/libstdc++-v3/include/Makefile.in
> >> +++ b/libstdc++-v3/include/Makefile.in
> >> @@ -396,6 +396,7 @@ std_freestanding = \
> >>  ${std_srcdir}/generator \
> >>  ${std_srcdir}/iterator \
> >>  ${std_srcdir}/limits \
> >> +   ${std_srcdir}/mdspan \
> >>  ${std_srcdir}/memory \
> >>  ${std_srcdir}/numbers \
> >>  ${std_srcdir}/numeric \
> >> diff --git a/libstdc++-v3/include/precompiled/stdc++.h 
> >> b/libstdc++-v3/include/precompiled/stdc++.h
> >> index f4b312d9e47..e7d89c92704 100644
> >> --- a/libstdc++-v3/include/precompiled/stdc++.h
> >> +++ b/libstdc++-v3/include/precompiled/stdc++.h
> >> @@ -228,6 +228,7 @@
> >>   #include 
> >>   #include 
> >>   #include 
> >> +#include 
> >>   #include 
> >>   #include 
> >>   #include 
> >> diff --git a/libstdc++-v3/include/std/mdspan 
> >> b/libstdc++-v3/include/std/mdspan
> >> new file mode 100644
> >> index 000..4094a416d1e
> >> --- /dev/null
> >> +++ b/libstdc++-v3/include/std/mdspan
> >> @@ -0,0 +1,48 @@
> >> +//  -*- C++ -*-
> >> +
> >> +// Copyright (C) 2025 Free Software Foundation, Inc.
> >
> > I've just noticed that this file claims to be copyright FSF, but if
> > you're contributing under the https://gcc.gnu.org/dco.html terms
> > rather than via a copyright assignment to the FSF, then that's
> > incorrect.
> >
> > Please see the  header for the DCO-compatible way to mention
> > that the header is covered by copyright without being overly precise.
> >
> > Otherwise these patches look good and I'll start pushing them this
> > week - thanks!
>
> That's exciting to hear! I'll fix the issues and strip anything layout
> related from this series.
>
> I'm slightly nervous because the first time I used <mdspan> outside of
> the test harness I was greeted with an error due to not including a
> header inside <mdspan>; and because I was using PCH, it didn't cause an
> error during testing.

Yeah, that's annoying. I do all my dev work and local testing in a
build with --disable-libstdcxx-pch (which also has the benefit of not
recompiling the PCH every time you touch any header) and then I do
full testing on another system with PCH enabled.

>
> I've since reconfigured with `--disable-libstdcxx-pch` and also run
> with `--target_board='unix/-Wall/-Wextra/-pedantic'`.
>
> Is there more I can do to make sure the patches are correct?

Testing with -fsanitize=undefined can be useful, but tricky to get the
testsuite to use that (you need to bodge some things so that
libubsan.so can be found). It's probably not needed here, because
everything is constexpr and your tests are already being constant
evaluated so most U

Re: [PATCH] x86: Remove BREG from ix86_class_likely_spilled_p

2025-04-30 Thread Uros Bizjak
On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu  wrote:
>
> AREG, DREG, CREG and AD_REGS are kept in ix86_class_likely_spilled_p to
> avoid the following regressions with
>
> $ make check RUNTESTFLAGS="--target_board='unix{-m32,}'"
>
> FAIL: gcc.dg/pr105911.c (internal compiler error: in lra_split_hard_reg_for, at lra-assigns.cc:1863)
> FAIL: gcc.dg/pr105911.c (test for excess errors)
> FAIL: gcc.target/i386/avx512vl-stv-rotatedi-1.c scan-assembler-times vpro[lr]q 29
> FAIL: gcc.target/i386/bt-7.c scan-assembler-not and[lq][ \t]
> FAIL: gcc.target/i386/naked-4.c scan-assembler-not %[re]bp
> FAIL: gcc.target/i386/pr107548-1.c scan-assembler-not addl
> FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times \tv?movd\t 3
> FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times v?paddd 6
> FAIL: gcc.target/i386/pr107548-2.c scan-assembler-not \taddq\t
> FAIL: gcc.target/i386/pr107548-2.c scan-assembler-times v?paddq 2
> FAIL: gcc.target/i386/pr119171-1.c (test for excess errors)
> FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
> FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
> FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]andb
> FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]orb
> FAIL: gcc.target/i386/pr78904-7b.c scan-assembler-not movzbl
> FAIL: gcc.target/i386/pr78904-7b.c scan-assembler [ \t]orb
> FAIL: gcc.target/i386/pr91188-2c.c scan-assembler [ \t]andw
>
> Tested with glibc master branch at
>
> commit ccdb68e829a31e4cda8339ea0d2dc3e51fb81ba5
> Author: Samuel Thibault 
> Date:   Sun Mar 2 15:16:45 2025 +0100
>
> htl: move pthread_once into libc
>
> and built Linux kernel 6.13.5 on x86-64.
>
> PR target/119083
> * config/i386/i386.cc (ix86_class_likely_spilled_p): Remove CREG
> and BREG.

The commit message doesn't reflect what the patch does.

OTOH, this is a very delicate part of the compiler. You are risking RA
failures, the risk/benefit ratio is very high, so I wouldn't touch it
without clear benefits. Do you have a concrete example where declaring
BREG as spilled hurts?

Uros.


Re: [PATCH] x86-64: Don't expand UNSPEC_TLS_LD_BASE to a call

2025-04-30 Thread Uros Bizjak
On Tue, Apr 29, 2025 at 12:22 PM H.J. Lu  wrote:
>
> On Tue, Apr 29, 2025 at 5:30 PM Uros Bizjak  wrote:
> >
> > On Tue, Apr 29, 2025 at 9:56 AM H.J. Lu  wrote:
> > >
> > > Don't expand UNSPEC_TLS_LD_BASE to a call so that the RTL local copy
> > > propagation pass can eliminate multiple __tls_get_addr calls.
> >
> > __tls_get_addr needs to be called with 16-byte aligned stack, I don't
> > think the compiler will correctly handle required call alignment if
> > you emit the call without emit_libcall_block.
>
> ix86_split_tls_local_dynamic_base_64 generates the same sequence
> as emit_libcall_block.  stack alignment is handled by
>
> (define_expand "@tls_local_dynamic_base_64_"
>   [(set (match_operand:P 0 "register_operand")
> (unspec:P
>  [(match_operand 1 "constant_call_address_operand")
>   (reg:P SP_REG)]
>  UNSPEC_TLS_LD_BASE))]
>   "TARGET_64BIT"
>   "ix86_tls_descriptor_calls_expanded_in_cfun = true;")

The above is to align the initial %rsp at the beginning of the
function. When PUSH instructions in the function misalign %rsp, there
will be nothing to keep %rsp aligned before the call to
__tls_get_addr.

We have been bitten by this in the past.

Uros.
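
The alignment concern can be illustrated with a small C model of the stack pointer. This is only arithmetic on an integer that stands in for %rsp; the helper names and the starting address used with them are assumptions for illustration, not anything taken from the i386 back end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when SP is a multiple of 16, the alignment that the thread says
   __tls_get_addr needs at the point of the call.  */
static bool
toy_sp_aligned_16 (uint64_t sp)
{
  return (sp & 15) == 0;
}

/* Model a 64-bit PUSH: the stack grows down by 8 bytes.  */
static uint64_t
toy_push8 (uint64_t sp)
{
  assert (sp >= 8);
  return sp - 8;
}
```

Starting from an aligned value, one modelled PUSH leaves the pointer 8 bytes off a 16-byte boundary and a second one restores alignment, so aligning %rsp at function entry alone says nothing about its alignment at a later call site.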


Re: [PATCH] x86: Update TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P

2025-04-30 Thread Uros Bizjak
On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu  wrote:
>
> SMALL_REGISTER_CLASSES was added by
>
> commit c98f874233428d7e6ba83def7842fd703ac0ddf1
> Author: James Van Artsdalen 
> Date:   Sun Feb 9 13:28:48 1992 +
>
> Initial revision
>
> which became TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P.  It is false from
> day 1 for i386.  Since x86-64 doubles the number of registers, change
> TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P to return false for x86-64
> and update decrease_live_ranges_number to skip hard register if
> targetm.class_likely_spilled_p returns true.  These extend the live
> range of rbp, r8-r31 and xmm1-xmm31 registers.
>
> PR target/118996
> * ira.cc (decrease_live_ranges_number): Skip hard register if
> targetm.class_likely_spilled_p returns true.
> * config/i386/i386.cc (ix86_small_register_classes_for_mode_p):
> New.
> (TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P): Use it.

Redeclaring X86_64 is too risky. We can perhaps enable integer
registers for APX because it is a relatively new development. I'm
undecided about AVX-512, perhaps this is worth a try.

In the case of AVX-512, can you provide some performance numbers to
assess the risk/benefit ratio?

Uros.


[PATCH] libgcc: Update FMV features to latest ACLE spec 2024Q4

2025-04-30 Thread Wilco Dijkstra

Update FMV features to latest ACLE spec of 2024Q4 - several features have been
removed or merged.  Add FMV support for CSSC and MOPS.  Preserve the ordering in
enum CPUFeatures.

gcc:
* common/config/aarch64/cpuinfo.h: Remove unused features, add FEAT_CSSC
and FEAT_MOPS. 
* config/aarch64/aarch64-option-extensions.def: Remove FMV support
for RPRES, use PMULL rather than AES, add FMV support for CSSC and MOPS.

libgcc:
* config/aarch64/cpuinfo.c (__init_cpu_features_constructor):
Remove unused features, add support for CSSC and MOPS.

---

diff --git a/gcc/common/config/aarch64/cpuinfo.h 
b/gcc/common/config/aarch64/cpuinfo.h
index 
cd3c2b20c5315b035870528fa39246bbc780f369..d329d861bf73fbb03436643d09553d7eabecdfb8
 100644
--- a/gcc/common/config/aarch64/cpuinfo.h
+++ b/gcc/common/config/aarch64/cpuinfo.h
@@ -39,10 +39,10 @@ enum CPUFeatures {
   FEAT_FP,
   FEAT_SIMD,
   FEAT_CRC,
-  FEAT_SHA1,
+  FEAT_CSSC,
   FEAT_SHA2,
   FEAT_SHA3,
-  FEAT_AES,
+  FEAT_unused5,
   FEAT_PMULL,
   FEAT_FP16,
   FEAT_DIT,
@@ -53,30 +53,30 @@ enum CPUFeatures {
   FEAT_RCPC,
   FEAT_RCPC2,
   FEAT_FRINTTS,
-  FEAT_DGH,
+  FEAT_unused6,
   FEAT_I8MM,
   FEAT_BF16,
-  FEAT_EBF16,
-  FEAT_RPRES,
+  FEAT_unused7,
+  FEAT_unused8,
   FEAT_SVE,
-  FEAT_SVE_BF16,
-  FEAT_SVE_EBF16,
-  FEAT_SVE_I8MM,
+  FEAT_unused9,
+  FEAT_unused10,
+  FEAT_unused11,
   FEAT_SVE_F32MM,
   FEAT_SVE_F64MM,
   FEAT_SVE2,
-  FEAT_SVE_AES,
+  FEAT_unused12,
   FEAT_SVE_PMULL128,
   FEAT_SVE_BITPERM,
   FEAT_SVE_SHA3,
   FEAT_SVE_SM4,
   FEAT_SME,
-  FEAT_MEMTAG,
+  FEAT_unused13,
   FEAT_MEMTAG2,
-  FEAT_MEMTAG3,
+  FEAT_unused14,
   FEAT_SB,
   FEAT_unused1,
-  FEAT_SSBS,
+  FEAT_unused15,
   FEAT_SSBS2,
   FEAT_BTI,
   FEAT_unused2,
@@ -87,6 +87,7 @@ enum CPUFeatures {
   FEAT_SME_I64,
   FEAT_SME2,
   FEAT_RCPC3,
+  FEAT_MOPS,
   FEAT_MAX,
   FEAT_EXT = 62, /* Reserved to indicate presence of additional features field
in __aarch64_cpu_features.  */
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 
79b79358c5d4a9e23c7601f7a1ba742dddadb778..b111b33d9bc75c8b85faf672fba051e0e417b796
 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -128,7 +128,9 @@ AARCH64_OPT_FMV_EXTENSION("sha2", SHA2, (SIMD), (), (), 
"sha1 sha2")
 
 AARCH64_FMV_FEATURE("sha3", SHA3, (SHA3))
 
-AARCH64_OPT_FMV_EXTENSION("aes", AES, (SIMD), (), (), "aes")
+AARCH64_OPT_EXTENSION("aes", AES, (SIMD), (), (), "aes")
+
+AARCH64_FMV_FEATURE("aes", PMULL, (AES))
 
 /* +nocrypto disables AES, SHA2 and SM4, and anything that depends on them
(such as SHA3 and the SVE2 crypto extensions).  */
@@ -171,8 +173,6 @@ AARCH64_OPT_FMV_EXTENSION("i8mm", I8MM, (SIMD), (), (), 
"i8mm")
instructions.  */
 AARCH64_OPT_FMV_EXTENSION("bf16", BF16, (FP), (SIMD), (), "bf16")
 
-AARCH64_FMV_FEATURE("rpres", RPRES, ())
-
 AARCH64_OPT_FMV_EXTENSION("sve", SVE, (SIMD, F16, FCMA), (), (), "sve")
 
 /* This specifically does not imply +sve.  */
@@ -190,7 +190,7 @@ AARCH64_OPT_FMV_EXTENSION("sve2", SVE2, (SVE), (), (), 
"sve2")
 
 AARCH64_OPT_EXTENSION("sve2-aes", SVE2_AES, (SVE2, AES), (), (), "sveaes")
 
-AARCH64_FMV_FEATURE("sve2-aes", SVE_AES, (SVE2_AES))
+AARCH64_FMV_FEATURE("sve2-aes", SVE_PMULL128, (SVE2_AES))
 
 AARCH64_OPT_EXTENSION("sve2-bitperm", SVE2_BITPERM, (SVE2), (), (),
  "svebitperm")
@@ -245,9 +245,9 @@ AARCH64_OPT_EXTENSION("sme-b16b16", SME_B16B16, (SME2, 
SVE_B16B16), (), (), "sme
 
 AARCH64_OPT_EXTENSION("sme-f16f16", SME_F16F16, (SME2), (), (), "smef16f16")
 
-AARCH64_OPT_EXTENSION("mops", MOPS, (), (), (), "mops")
+AARCH64_OPT_FMV_EXTENSION("mops", MOPS, (), (), (), "mops")
 
-AARCH64_OPT_EXTENSION("cssc", CSSC, (), (), (), "cssc")
+AARCH64_OPT_FMV_EXTENSION("cssc", CSSC, (), (), (), "cssc")
 
 AARCH64_OPT_EXTENSION("lse128", LSE128, (LSE), (), (), "lse128")
 
diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
index 
dda9dc696893cd392dd1e15d03672053cc481c6f..14877f5d8410a51fe47f94cb1a2ff5399f95c253
 100644
--- a/libgcc/config/aarch64/cpuinfo.c
+++ b/libgcc/config/aarch64/cpuinfo.c
@@ -230,9 +230,15 @@ struct {
 #ifndef HWCAP2_SVE_EBF16
 #define HWCAP2_SVE_EBF16 (1UL << 33)
 #endif
+#ifndef HWCAP2_CSSC
+#define HWCAP2_CSSC (1UL << 34)
+#endif
 #ifndef HWCAP2_SME2
 #define HWCAP2_SME2 (1UL << 37)
 #endif
+#ifndef HWCAP2_MOPS
+#define HWCAP2_MOPS (1UL << 43)
+#endif
 #ifndef HWCAP2_LRCPC3
 #define HWCAP2_LRCPC3  (1UL << 46)
 #endif
@@ -269,10 +275,6 @@ __init_cpu_features_constructor (unsigned long hwcap,
 setCPUFeature(FEAT_DIT);
   if (hwcap & HWCAP_ASIMDRDM)
 setCPUFeature(FEAT_RDM);
-  if (hwcap & HWCAP_AES)
-setCPUFeature(FEAT_AES);
-  if (hwcap & HWCAP_SHA1)
-setCPUFeature(FEAT_SHA1);
   if (hwcap & HWCAP_SHA2)
 setCPUFeature(FEAT_SHA2);
   if (hwcap & HWCAP_JSCVT)
@@ -282,19 +284,9 @@ __init_cpu_features_c
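
The remainder of this hunk is cut off in the archive. As a hedged sketch of the general pattern only, the following C fragment tests the two HWCAP2 bits defined earlier in the patch (bit 34 for CSSC, bit 43 for MOPS) and records them in a feature mask. The toy_ names and the feature index values are hypothetical and do not correspond to the enum CPUFeatures values in cpuinfo.h or to the actual body of __init_cpu_features_constructor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HWCAP2 bit positions as given in the patch; 64-bit constants are used
   here so the shifts are valid regardless of the width of long.  */
#define TOY_HWCAP2_CSSC (UINT64_C (1) << 34)
#define TOY_HWCAP2_MOPS (UINT64_C (1) << 43)

/* Hypothetical feature indices, for illustration only.  */
enum toy_feature
{
  TOY_FEAT_CSSC = 0,
  TOY_FEAT_MOPS = 1
};

/* Translate kernel-reported capability bits into a feature mask.  */
static uint64_t
toy_init_features (uint64_t hwcap2)
{
  uint64_t features = 0;
  if (hwcap2 & TOY_HWCAP2_CSSC)
    features |= UINT64_C (1) << TOY_FEAT_CSSC;
  if (hwcap2 & TOY_HWCAP2_MOPS)
    features |= UINT64_C (1) << TOY_FEAT_MOPS;
  return features;
}

/* Query one feature bit in a mask produced by toy_init_features.  */
static bool
toy_has_feature (uint64_t features, enum toy_feature feat)
{
  assert ((int) feat >= 0 && (int) feat < 64);
  return (features >> feat) & 1;
}
```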

Re: [PATCH v5 03/10] libstdc++: Implement std::extents [PR107761].

2025-04-30 Thread Luc Grosheintz




On 4/30/25 4:37 AM, Tomasz Kaminski wrote:

On Tue, Apr 29, 2025 at 11:52 PM Jonathan Wakely  wrote:


On Tue, 29 Apr 2025 at 14:55, Tomasz Kaminski  wrote:




On Tue, Apr 29, 2025 at 2:55 PM Luc Grosheintz 

wrote:


This implements std::extents from <mdspan> according to N4950 and
contains partial progress towards PR107761.

If an extent changes its type, there's a precondition in the standard,
that the value is representable in the target integer type. This
precondition is not checked at runtime.

The precondition for 'extents::{static_,}extent' is that '__r < rank()'.
For extents of rank zero this precondition is always violated and results in
calling __builtin_trap. For all other specializations it's checked via
__glibcxx_assert.

 PR libstdc++/107761

libstdc++-v3/ChangeLog:

 * include/std/mdspan (extents): New class.
 * src/c++23/std.cc.in: Add 'using std::extents'.

Signed-off-by: Luc Grosheintz 
---
  libstdc++-v3/include/std/mdspan  | 262 +++
  libstdc++-v3/src/c++23/std.cc.in |   6 +-
  2 files changed, 267 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/mdspan

b/libstdc++-v3/include/std/mdspan

index 4094a416d1e..39ced1d6301 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -33,6 +33,12 @@
  #pragma GCC system_header
  #endif

+#include 
+#include 
+#include 
+#include 
+#include 
+
  #define __glibcxx_want_mdspan
  #include 

@@ -41,6 +47,262 @@
  namespace std _GLIBCXX_VISIBILITY(default)
  {
  _GLIBCXX_BEGIN_NAMESPACE_VERSION
+  namespace __mdspan
+  {
+template
+  class _ExtentsStorage
+  {
+  public:
+   static consteval bool
+   _S_is_dyn(size_t __ext) noexcept
+   { return __ext == dynamic_extent; }
+
+   template
+ static constexpr _IndexType
+ _S_int_cast(const _OIndexType& __other) noexcept
+ { return _IndexType(__other); }
+
+   static constexpr size_t _S_rank = _Extents.size();
+
+   // For __r in [0, _S_rank], _S_dynamic_index[__r] is the number
+   // of dynamic extents up to (and not including) __r.
+   //
+   // If __r is the index of a dynamic extent, then
+   // _S_dynamic_index[__r] is the index of that extent in
+   // _M_dynamic_extents.
+   static constexpr auto _S_dynamic_index = [] consteval
+   {
+ array __ret;
+ size_t __dyn = 0;
+ for(size_t __i = 0; __i < _S_rank; ++__i)
+   {
+ __ret[__i] = __dyn;
+ __dyn += _S_is_dyn(_Extents[__i]);
+   }
+ __ret[_S_rank] = __dyn;
+ return __ret;
+   }();
+
+   static constexpr size_t _S_rank_dynamic =

_S_dynamic_index[_S_rank];

+
+   // For __r in [0, _S_rank_dynamic), _S_dynamic_index_inv[__r]

is the

+   // index of the __r-th dynamic extent in _Extents.
+   static constexpr auto _S_dynamic_index_inv = [] consteval
+   {
+ array __ret;
+ for (size_t __i = 0, __r = 0; __i < _S_rank; ++__i)
+   if (_S_is_dyn(_Extents[__i]))
+ __ret[__r++] = __i;
+ return __ret;
+   }();
+
+   static constexpr size_t
+   _S_static_extent(size_t __r) noexcept
+   { return _Extents[__r]; }
+
+   constexpr _IndexType
+   _M_extent(size_t __r) const noexcept
+   {
+ auto __se = _Extents[__r];
+ if (__se == dynamic_extent)
+   return _M_dynamic_extents[_S_dynamic_index[__r]];
+ else
+   return __se;
+   }
+
+   template
+ constexpr void
+ _M_init_dynamic_extents(_GetOtherExtent __get_extent) noexcept
+ {
+   for(size_t __i = 0; __i < _S_rank_dynamic; ++__i)
+ {
+   size_t __di = __i;
+   if constexpr (_OtherRank != _S_rank_dynamic)
+ __di = _S_dynamic_index_inv[__i];
+   _M_dynamic_extents[__i] =

_S_int_cast(__get_extent(__di));

+ }
+ }
+
+   constexpr
+   _ExtentsStorage() noexcept = default;
+
+   template
+ constexpr
+ _ExtentsStorage(const _ExtentsStorage<_OIndexType, _OExtents>&
+ __other) noexcept
+ {
+   _M_init_dynamic_extents<_S_rank>([&__other](size_t __i)
+ { return __other._M_extent(__i); });
+ }
+
+   template
+ constexpr
+ _ExtentsStorage(span __exts) noexcept
+ {
+   _M_init_dynamic_extents<_Nm>(
+ [&__exts](size_t __i) -> const _OIndexType&
+ { return __exts[__i]; });
+ }
+
+  private:
+   using _S_storage = __array_traits<_IndexType,

_S_rank_dynamic>::_Type;

+   [[no_unique_address]] _S_storage _M_dynamic_extents;
+  };
+
+template
+  concept __valid_index_type =
+   is_convertible_v<_OIndexType, _SIndexType> &&
+   is_nothrow_constructible_v<_SIndexType, _OIndexType>;
+
+template
+  concept
+  __valid_static_extent =

Re: [PATCH v4] RISC-V: Fix missing implied Zicsr from Zve32x

2025-04-30 Thread Kito Cheng
Thanks, pushed to trunk :)

On Wed, Apr 30, 2025 at 3:35 PM Jerry Zhang Jian
 wrote:
>
> The Zve32x extension depends on the Zicsr extension.
> Currently, enabling Zve32x alone does not automatically imply Zicsr in GCC.
>
> gcc/ChangeLog:
> * common/config/riscv/riscv-common.cc: Add Zve32x depends on Zicsr
>
> gcc/testsuite/ChangeLog:
> * gcc.target/riscv/predef-19.c: set the march to rv64im_zve32x
>   instead of rv64gc_zve32x to avoid Zicsr implied by g. Extra m is
>   added to avoid current 'V' extension requires 'M' extension
>
> Signed-off-by: Jerry Zhang Jian 
> ---
>  gcc/common/config/riscv/riscv-common.cc|  1 +
>  gcc/testsuite/gcc.target/riscv/predef-19.c | 34 +-
>  2 files changed, 8 insertions(+), 27 deletions(-)
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index 15df22d5377..145a0f2bd95 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -137,6 +137,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
>{"zve64f", "f"},
>{"zve64d", "d"},
>
> +  {"zve32x", "zicsr"},
>{"zve32x", "zvl32b"},
>{"zve32f", "zve32x"},
>{"zve32f", "zvl32b"},
> diff --git a/gcc/testsuite/gcc.target/riscv/predef-19.c 
> b/gcc/testsuite/gcc.target/riscv/predef-19.c
> index 2b90702192b..ca3d57abca9 100644
> --- a/gcc/testsuite/gcc.target/riscv/predef-19.c
> +++ b/gcc/testsuite/gcc.target/riscv/predef-19.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -march=rv64gc_zve32x -mabi=lp64d -mcmodel=medlow 
> -misa-spec=2.2" } */
> +/* { dg-options "-O2 -march=rv64im_zve32x -mabi=lp64 -mcmodel=medlow 
> -misa-spec=2.2" } */
>
>  int main () {
>
> @@ -15,50 +15,30 @@ int main () {
>  #error "__riscv_i"
>  #endif
>
> -#if !defined(__riscv_c)
> -#error "__riscv_c"
> -#endif
> -
>  #if defined(__riscv_e)
>  #error "__riscv_e"
>  #endif
>
> -#if !defined(__riscv_a)
> -#error "__riscv_a"
> -#endif
> -
>  #if !defined(__riscv_m)
>  #error "__riscv_m"
>  #endif
>
> -#if !defined(__riscv_f)
> -#error "__riscv_f"
> -#endif
> -
> -#if !defined(__riscv_d)
> -#error "__riscv_d"
> -#endif
> -
> -#if defined(__riscv_v)
> -#error "__riscv_v"
> +#if !defined(__riscv_zicsr)
> +#error "__riscv_zicsr"
>  #endif
>
> -#if defined(__riscv_zvl128b)
> -#error "__riscv_zvl128b"
> +#if !defined(__riscv_zmmul)
> +#error "__riscv_zmmul"
>  #endif
>
> -#if defined(__riscv_zvl64b)
> -#error "__riscv_zvl64b"
> +#if !defined(__riscv_zve32x)
> +#error "__riscv_zve32x"
>  #endif
>
>  #if !defined(__riscv_zvl32b)
>  #error "__riscv_zvl32b"
>  #endif
>
> -#if !defined(__riscv_zve32x)
> -#error "__riscv_zve32x"
> -#endif
> -
>  #if !defined(__riscv_vector)
>  #error "__riscv_vector"
>  #endif
> --
> 2.49.0
>


Re: [PATCH v6 1/2] RISC-V: Add intrinsics support for SiFive Xsfvcp extensions.

2025-04-30 Thread Kito Cheng
pushed to trunk

On Tue, Apr 29, 2025 at 9:14 PM Kito Cheng  wrote:
>
> From: yulong 
>
> This version is the same as v5, but rebased to trunk and sent out to trigger CI.
>
> This commit adds intrinsics support for Xsfvcp extension.
> Diff with V4: Delete the sifive_vector.h file.
>
> Co-Authored by: Jiawei Chen 
> Co-Authored by: Shihua Liao 
> Co-Authored by: Yixuan Chen 
>
> gcc/ChangeLog:
>
> * config/riscv/constraints.md (Ou01): New constraint.
> (Ou02): Ditto.
> * config/riscv/generic-vector-ooo.md (vec_sf_vcp): New reservation.
> * config/riscv/genrvv-type-indexer.cc (main): New type.
> * config/riscv/riscv-c.cc (riscv_pragma_intrinsic): Add xsfvcp 
> strings.
> * config/riscv/riscv-vector-builtins-shapes.cc (struct 
> sf_vcix_se_def): New function.
> (struct sf_vcix_def): Ditto.
> (SHAPE): Ditto.
> * config/riscv/riscv-vector-builtins-shapes.h: Ditto.
> * config/riscv/riscv-vector-builtins-types.def (DEF_RVV_X2_U_OPS): 
> New type.
> (DEF_RVV_X2_WU_OPS): Ditto.
> (vuint8mf8_t): Ditto.
> (vuint8mf4_t): Ditto.
> (vuint8mf2_t): Ditto.
> (vuint8m1_t): Ditto.
> (vuint8m2_t): Ditto.
> (vuint8m4_t): Ditto.
> (vuint16mf4_t): Ditto.
> (vuint16mf2_t): Ditto.
> (vuint16m1_t): Ditto.
> (vuint16m2_t): Ditto.
> (vuint16m4_t): Ditto.
> (vuint32mf2_t): Ditto.
> (vuint32m1_t): Ditto.
> (vuint32m2_t): Ditto.
> (vuint32m4_t): Ditto.
> * config/riscv/riscv-vector-builtins.cc (DEF_RVV_X2_U_OPS): New 
> builtins def.
> (DEF_RVV_X2_WU_OPS): Ditto.
> (rvv_arg_type_info::get_scalar_float_type): Ditto.
> (function_instance::modifies_global_state_p): Ditto.
> * config/riscv/riscv-vector-builtins.def (v_x): New base type.
> (i): Ditto.
> (v_i): Ditto.
> (xv): Ditto.
> (iv): Ditto.
> (fv): Ditto.
> (vvv): Ditto.
> (xvv): Ditto.
> (ivv): Ditto.
> (fvv): Ditto.
> (vvw): Ditto.
> (xvw): Ditto.
> (ivw): Ditto.
> (fvw): Ditto.
> (v_vv): Ditto.
> (v_xv): Ditto.
> (v_iv): Ditto.
> (v_fv): Ditto.
> (v_vvv): Ditto.
> (v_xvv): Ditto.
> (v_ivv): Ditto.
> (v_fvv): Ditto.
> (v_vvw): Ditto.
> (v_xvw): Ditto.
> (v_ivw): Ditto.
> (v_fvw): Ditto.
> (x2_vector): Ditto.
> (scalar_float): Ditto.
> * config/riscv/riscv-vector-builtins.h (enum required_ext): New 
> extension.
> (required_ext_to_isa_name): Ditto.
> (required_extensions_specified): Ditto.
> (struct rvv_arg_type_info): Ditto.
> (struct function_group_info): Ditto.
> * config/riscv/riscv.md: New attr.
> * config/riscv/sifive-vector-builtins-bases.cc (class sf_vc): New 
> function.
> (BASE): New base_name.
> * config/riscv/sifive-vector-builtins-bases.h: New function_base.
> * config/riscv/sifive-vector-builtins-functions.def 
> (REQUIRED_EXTENSIONS): New intrinsics def.
> (sf_vc): Ditto.
> * config/riscv/sifive-vector.md (@sf_vc_x_se): New RTL mode.
> (@sf_vc_v_x_se): Ditto.
> (@sf_vc_v_x): Ditto.
> (@sf_vc_i_se): Ditto.
> (@sf_vc_v_i_se): Ditto.
> (@sf_vc_v_i): Ditto.
> (@sf_vc_vv_se): Ditto.
> (@sf_vc_v_vv_se): Ditto.
> (@sf_vc_v_vv): Ditto.
> (@sf_vc_xv_se): Ditto.
> (@sf_vc_v_xv_se): Ditto.
> (@sf_vc_v_xv): Ditto.
> (@sf_vc_iv_se): Ditto.
> (@sf_vc_v_iv_se): Ditto.
> (@sf_vc_v_iv): Ditto.
> (@sf_vc_fv_se): Ditto.
> (@sf_vc_v_fv_se): Ditto.
> (@sf_vc_v_fv): Ditto.
> (@sf_vc_vvv_se): Ditto.
> (@sf_vc_v_vvv_se): Ditto.
> (@sf_vc_v_vvv): Ditto.
> (@sf_vc_xvv_se): Ditto.
> (@sf_vc_v_xvv_se): Ditto.
> (@sf_vc_v_xvv): Ditto.
> (@sf_vc_ivv_se): Ditto.
> (@sf_vc_v_ivv_se): Ditto.
> (@sf_vc_v_ivv): Ditto.
> (@sf_vc_fvv_se): Ditto.
> (@sf_vc_v_fvv_se): Ditto.
> (@sf_vc_v_fvv): Ditto.
> (@sf_vc_vvw_se): Ditto.
> (@sf_vc_v_vvw_se): Ditto.
> (@sf_vc_v_vvw): Ditto.
> (@sf_vc_xvw_se): Ditto.
> (@sf_vc_v_xvw_se): Ditto.
> (@sf_vc_v_xvw): Ditto.
> (@sf_vc_ivw_se): Ditto.
> (@sf_vc_v_ivw_se): Ditto.
> (@sf_vc_v_ivw): Ditto.
> (@sf_vc_fvw_se): Ditto.
> (@sf_vc_v_fvw_se): Ditto.
> (@sf_vc_v_fvw): Ditto.
> * config/riscv/vector-iterators.md: New iterator.
> * config/riscv/vector.md: New vtype.
>
> ---
>  gcc/config/riscv/constraints.md   |  10 +
>  gcc/config/riscv/generic-vector-ooo.md|   4 +
>  gcc/config/riscv/genrvv-type-indexer.cc   |   9 +
>  gcc/config/riscv/riscv-c.cc   |   3 +

Re: [PATCH] RISC-V: Allow different dynamic floating point mode to be merged [PR119832]

2025-04-30 Thread Kito Cheng
pushed to the trunk :)

Vineet: Feel free to drop TARGET_MODE_CONFLUENCE once you have a
better solution :)

On Wed, Apr 30, 2025 at 12:50 PM Robin Dapp  wrote:
>
> > Although we already try to set the mode needed to FRM_DYN after a function 
> > call,
> > there are still some corner cases where both FRM_DYN and FRM_DYN_CALL may 
> > appear
> > on incoming edges.
> >
> > Therefore, we use TARGET_MODE_CONFLUENCE to tell GCC that FRM_DYN, 
> > FRM_DYN_CALL,
> > and FRM_DYN_EXIT modes are compatible.
>
> Just a note: Vineet is working on similar issues right now and mentioned that
> this patch/hook might not be necessary.  But it's going to take some more 
> time
> until his patches are ready.  So we can go ahead here or wait a bit.
>
> --
> Regards
>  Robin
>


[PATCH] libstdc++: Preserve the argument type in basic_format_args [PR119246]

2025-04-30 Thread Tomasz Kamiński
This commit adjusts how the arguments are stored in the _Arg_value
(and thus basic_format_args), by preserving the types of fixed-width
floating-point types that were previously converted to float, double,
or long double.

The _Arg_value union now contains alternatives for std::bfloat16_t,
std::float16_t, std::float32_t, std::float64_t that use the pre-existing
_Arg_bf16, _Arg_f16, _Arg_f32, _Arg_f64 argument types.

This does not affect formatting, as the formatter specializations for
these types format them by casting to the corresponding standard
floating-point type.

For 128-bit floating-point types we need to handle the ppc64 architecture
(_GLIBCXX_LONG_DOUBLE_ALT128_COMPAT), for which long double may (on a per-TU
basis) designate either the __ibm128 or the __ieee128 type, so we need to store
both types in the _Arg_value and have two _Arg_types (_Arg_ibm128, _Arg_ieee128).
On other architectures we use an extra enumerator value to store __float128,
which is different from long double and _Float128. This is consistent with ppc64,
for which __float128 is the same type as __ieee128 if present. We use the
_Arg_float128 and _M_float128 names, which deviate from the _Arg_fN naming
scheme, to emphasize that this flag is not used for the std::float128_t
(_Float128_t) type, which is consistently formatted via handle.
The __format::__float128_t type is renamed to __format::__flt128_t, to mitigate
visual confusion between this type and __float128. We also introduce the
__bflt16_t typedef instead of using decltype.

We add new alternatives to the _Arg_value and allow them to be accessed via
_S_get when the types are available. However, we produce and handle the
corresponding _Arg_type only when we can format them.
See also r14-3329-g27d0cfcb2b33de.

The formatter<_Float128, _CharT> that formats via __flt128_t is always
provided when the type is available. It is still correct when __flt128_t is
_Float128_t.

We also provide formatter<__float128, _CharT> that formats via __flt128_t.
As this type may be disabled (-mno-float128), extra care needs to be taken
for the situation when __float128 is the same as long double. If the formatter
were defined in such a case, the formatter would be generated from
different specializations, and have different mangling:
  * formatter<__float128, _CharT> if __float128 is present,
  * formatter<__format::__formattable_float, _CharT> otherwise.
To the best of my knowledge this happens only on ppc64 for __ieee128 and
__float128, so the formatter is not defined in this case. A static_assert is
added to detect other configurations like that. In such a case we should
replace it with a constraint.

PR libstdc++/119246

libstdc++-v3/ChangeLog:

* include/std/format (__format::__bflt16_t): Define.
(_GLIBCXX_FORMAT_F128): Separate value for cases where _Float128 is 
used.
(__format::__float128_t): Renamed to __format::__flt128_t.
(std::formatter<_Float128, _CharT>): Define always if there is 
formattable
128bit float.
(std::formatter<__float128, _CharT>): Define.
(_Arg_type::_Arg_f128): Rename to _Arg_float128 and adjust value.
(_Arg_type::_Arg_ibm128): Change value to _Arg_ldbl.
(_Arg_type::_Arg_ieee128): Define as alias to _Arg_float128.
(_Arg_value::_M_f128): Replaced with _M_ieee128 and _M_float128.
(_Arg_value::_M_ieee128, _Arg_value::_M_float128, _Arg_value::_M_bf16)
(_Arg_value::_M_f16, _Arg_value::_M_f32, _Arg_value::_M_f64): Define.
(_Arg_value::_S_get, basic_format_arg::_S_to_enum): Handle __bflt16_t,
_Float16, _Float32, _Float64, and __float128 types.
(basic_format_arg::_S_to_arg_type): Preserve __bflt16_t, _Float16,
_Float32, _Float64 and __float128 types.
(basic_format_arg::_M_visit): Handle _Arg_float128, _Arg_ieee128,
_Arg_bf16, _Arg_f16, _Arg_f32, _Arg_f64.
* testsuite/std/format/arguments/args.cc: Updated to illustrate that
extended floating point types use handles now. Added test for
__float128.
* testsuite/std/format/parse_ctx.cc: Extended test to cover calls to
check_dynamic_spec with floating point types and handles.
---
Tested on x86_64-linux and powerpc64le-unknown-linux-gnu.
Running additional test on powerpc64le with
unix\{-mabi=ibmlongdouble,-mabi=ieeelongdouble,-mno-float128}.

OK for trunk?

 libstdc++-v3/include/std/format   | 217 --
 .../testsuite/std/format/arguments/args.cc|  45 ++--
 .../testsuite/std/format/parse_ctx.cc |  72 +-
 3 files changed, 227 insertions(+), 107 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 054ce350440..73819f52f50 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -1863,20 +1863,24 @@ namespace __format
   _Spec<_CharT> _M_spec{};
 };
 
+#ifdef __BFLT16_DIG__
+   using __bflt16_t = decltype(0.0bf16);
+#endif
+
   // Decide how 128-bit floating-point types should be formatted (or not).
-  // When su

[PATCH] tree-optimization/120003 - missed jump threading

2025-04-30 Thread Richard Biener
The following allows the entry and exit block of a jump thread path
to be equal, which can easily happen when there isn't a forwarder
on the interesting edge for an FSM thread conditional.  We just
don't want to enlarge the path from such a block.
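
As a rough illustration only (this is not the GCC implementation), the
effect of the change can be modelled on a toy CFG.  The struct cfg type,
count_candidate_paths and the allow_reuse flag are invented for this
sketch; the real back_threader also tracks interesting names,
profitability and the search-space limit, all of which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 8
#define MAX_PREDS 4

/* Toy CFG: for each block, the list of its predecessor blocks.  */
struct cfg {
  int n_preds[MAX_BLOCKS];
  int preds[MAX_BLOCKS][MAX_PREDS];
};

/* Walk backwards from BB and count every path that would be offered
   for resolution.  With ALLOW_REUSE false, a block already on the path
   is rejected outright (the old behaviour).  With ALLOW_REUSE true, a
   path ending in such a block is still counted, but the search does not
   enlarge the path beyond it (the behaviour described above).  */
static int
count_candidate_paths (const struct cfg *g, int bb, bool *visited,
		       bool allow_reuse)
{
  assert (bb >= 0 && bb < MAX_BLOCKS);

  bool seen = visited[bb];
  if (seen && !allow_reuse)
    return 0;

  int count = 1;	/* The path ending at BB is a candidate.  */
  if (seen)
    return count;	/* Re-used block: keep the path, do not extend.  */

  visited[bb] = true;
  for (int i = 0; i < g->n_preds[bb]; i++)
    count += count_candidate_paths (g, g->preds[bb][i], visited,
				    allow_reuse);
  visited[bb] = false;	/* Restore the state when backtracking.  */
  return count;
}
```

For a block 1 with predecessors 0 and 1 (a self-loop), starting the
search at block 1 yields the paths [1] and [1, 0] in the old scheme, and
additionally [1, 1] once block re-use is allowed.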

Bootstrapped and tested on x86_64-unknown-linux-gnu, tested the
gcc.dg/tree-ssa/ssa-dom-thread-7.c adjustment on aarch64-linux with
a cross, pushed to trunk.

Richard.

PR tree-optimization/120003
* tree-ssa-threadbackward.cc (back_threader::find_paths_to_names):
Allow block re-use but do not enlarge the path beyond such a
re-use.

* gcc.dg/tree-ssa/ssa-thread-23.c: New testcase.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
---
 .../gcc.dg/tree-ssa/ssa-dom-thread-7.c|  4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-23.c | 19 +++
 gcc/tree-ssa-threadbackward.cc|  8 +++-
 3 files changed, 24 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-23.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index d84aceebc5d..8be9878e0cf 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -11,8 +11,8 @@
to change decisions in switch expansion which in turn can expose new
jump threading opportunities.  Skip the later tests on aarch64.  */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom3" { target { ! 
aarch64*-*-* } } } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 9"  "thread2" { target { ! 
aarch64*-*-* } } } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 17"  "thread2" { target { 
aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 10"  "thread2" { target { ! 
aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 14"  "thread2" { target { 
aarch64*-*-* } } } } */
 
 enum STATE {
   S0=0,
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-23.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-23.c
new file mode 100644
index 000..930360a33b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-23.c
@@ -0,0 +1,19 @@
+/* PR120003 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cddce3-details" } */
+
+extern _Bool g(int);
+
+_Bool f()
+{
+  _Bool retval = 0;
+  for(int i=0; i<100; ++i)
+retval = retval || g(i);
+  return retval;
+}
+
+/* Jump threading after loop optimization should get the counting loop
+   separated from the loop until retval is true and CD-DCE elide it.
+   It's difficult to check for the fact that a true retval terminates
+   the loop so check CD-DCE eliminates one loop instead.  */
+/* { dg-final { scan-tree-dump "fix_loop_structure: removing loop" "cddce3" } 
} */
diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 23bfc14c8f0..ce765cb5ded 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -349,9 +349,6 @@ back_threader::find_paths_to_names (basic_block bb, bitmap 
interesting,
unsigned overall_paths,
back_threader_profitability &profit)
 {
-  if (m_visited_bbs.add (bb))
-return;
-
   m_path.safe_push (bb);
 
   // Try to resolve the path without looking back.  Avoid resolving paths
@@ -377,7 +374,8 @@ back_threader::find_paths_to_names (basic_block bb, bitmap 
interesting,
   // Continue looking for ways to extend the path but limit the
   // search space along a branch
   else if ((overall_paths = overall_paths * EDGE_COUNT (bb->preds))
-  <= (unsigned)param_max_jump_thread_paths)
+  <= (unsigned)param_max_jump_thread_paths
+  && !m_visited_bbs.add (bb))
 {
   // For further greedy searching we want to remove interesting
   // names defined in BB but add ones on the PHI edges for the
@@ -489,6 +487,7 @@ back_threader::find_paths_to_names (basic_block bb, bitmap 
interesting,
 backtracking we have to restore it.  */
   for (int j : new_imports)
bitmap_clear_bit (m_imports, j);
+  m_visited_bbs.remove (bb);
 }
   else if (dump_file && (dump_flags & TDF_DETAILS))
 fprintf (dump_file, "  FAIL: Search space limit %d reached.\n",
@@ -496,7 +495,6 @@ back_threader::find_paths_to_names (basic_block bb, bitmap 
interesting,
 
   // Reset things to their original state.
   m_path.pop ();
-  m_visited_bbs.remove (bb);
 }
 
 // Search backwards from BB looking for paths where the final
-- 
2.43.0


[GCC16 stage1][PATCH v3 1/3] Extend "counted_by" attribute to pointer fields of structures.

2025-04-30 Thread Qing Zhao
For example:
struct PP {
  size_t count2;
  char other1;
  char *array2 __attribute__ ((counted_by (count2)));
  int other2;
} *pp;

specifies that "array2" is an array that is pointed to by the
pointer field, and its number of elements is given by the field
"count2" in the same structure.

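A minimal usage sketch, assuming a compiler that includes this patch.
HAVE_POINTER_COUNTED_BY, COUNTED_BY and pp_store are hypothetical names
introduced here; the macro expands to nothing unless the user opts in,
because compilers without the patch reject counted_by on a pointer
field.  The explicit check in pp_store only mirrors, by hand, the
relationship the attribute states to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opt-in: define HAVE_POINTER_COUNTED_BY only when the
   compiler accepts counted_by on pointer fields.  */
#ifdef HAVE_POINTER_COUNTED_BY
# define COUNTED_BY(member) __attribute__ ((counted_by (member)))
#else
# define COUNTED_BY(member)
#endif

struct PP {
  size_t count2;
  char other1;
  char *array2 COUNTED_BY (count2);
  int other2;
};

/* Store VALUE at index I of PP->array2 if I is below PP->count2.
   Return 1 on success and 0 if the index is rejected.  */
static int
pp_store (struct PP *pp, size_t i, char value)
{
  assert (pp != NULL);
  if (i >= pp->count2)
    return 0;
  pp->array2[i] = value;
  return 1;
}
```
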
gcc/c-family/ChangeLog:

* c-attribs.cc (handle_counted_by_attribute): Accept counted_by
attribute for pointer fields.

gcc/c/ChangeLog:

* c-decl.cc (verify_counted_by_attribute): Change the 2nd argument
to a vector of fields with counted_by attribute. Verify all fields
in this vector.
(finish_struct): Collect all the fields with counted_by attribute
to a vector and pass this vector to verify_counted_by_attribute.

gcc/ChangeLog:

* doc/extend.texi: Extend counted_by attribute to pointer fields in
structures. Add one more requirement to pointers with counted_by
attribute.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by.c: Update test.
* gcc.dg/pointer-counted-by-2.c: New test.
* gcc.dg/pointer-counted-by-3.c: New test.
* gcc.dg/pointer-counted-by.c: New test.
---
 gcc/c-family/c-attribs.cc|  15 ++-
 gcc/c/c-decl.cc  |  91 +++--
 gcc/doc/extend.texi  |  38 +-
 gcc/testsuite/gcc.dg/flex-array-counted-by.c |   2 +-
 gcc/testsuite/gcc.dg/pointer-counted-by-2.c  |  10 ++
 gcc/testsuite/gcc.dg/pointer-counted-by-3.c  | 127 +++
 gcc/testsuite/gcc.dg/pointer-counted-by.c|  73 +++
 7 files changed, 305 insertions(+), 51 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-3.c
 create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 5a0e3d328ba..51d42999578 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -2906,16 +2906,18 @@ handle_counted_by_attribute (tree *node, tree name,
" declaration %q+D", name, decl);
   *no_add_attrs = true;
 }
-  /* This attribute only applies to field with array type.  */
-  else if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE)
+  /* This attribute only applies to field with array type or pointer type.  */
+  else if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE
+  && TREE_CODE (TREE_TYPE (decl)) != POINTER_TYPE)
 {
   error_at (DECL_SOURCE_LOCATION (decl),
-   "%qE attribute is not allowed for a non-array field",
-   name);
+   "%qE attribute is not allowed for a non-array"
+   " or non-pointer field", name);
   *no_add_attrs = true;
 }
   /* This attribute only applies to a C99 flexible array member type.  */
-  else if (! c_flexible_array_member_type_p (TREE_TYPE (decl)))
+  else if (TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE
+  && !c_flexible_array_member_type_p (TREE_TYPE (decl)))
 {
   error_at (DECL_SOURCE_LOCATION (decl),
"%qE attribute is not allowed for a non-flexible"
@@ -2930,7 +2932,8 @@ handle_counted_by_attribute (tree *node, tree name,
   *no_add_attrs = true;
 }
   /* Issue error when there is a counted_by attribute with a different
- field as the argument for the same flexible array member field.  */
+ field as the argument for the same flexible array member or
+ pointer field.  */
   else if (old_counted_by != NULL_TREE)
 {
   tree old_fieldname = TREE_VALUE (TREE_VALUE (old_counted_by));
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 8c420f22976..53e7b726ee6 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9448,56 +9448,62 @@ c_update_type_canonical (tree t)
 }
 }
 
-/* Verify the argument of the counted_by attribute of the flexible array
-   member FIELD_DECL is a valid field of the containing structure,
-   STRUCT_TYPE, Report error and remove this attribute when it's not.  */
+/* Verify the argument of the counted_by attribute of each of the
+   FIELDS_WITH_COUNTED_BY is a valid field of the containing structure,
+   STRUCT_TYPE, Report error and remove the corresponding attribute
+   when it's not.  */
 
 static void
-verify_counted_by_attribute (tree struct_type, tree field_decl)
+verify_counted_by_attribute (tree struct_type,
+auto_vec<tree> *fields_with_counted_by)
 {
-  tree attr_counted_by = lookup_attribute ("counted_by",
-  DECL_ATTRIBUTES (field_decl));
-
-  if (!attr_counted_by)
-return;
+  for (tree field_decl : *fields_with_counted_by)
+{
+  tree attr_counted_by = lookup_attribute ("counted_by",
+   DECL_ATTRIBUTES (field_decl));
 
-  /* If there is an counted_by attribute attached to the field,
- verify it.  */
+  if (!attr_counted_by)
+   continue;
 
-  tr

[GCC16 stage1][PATCH v3 0/3] extend "counted_by" attribute to pointer fields of structures

2025-04-30 Thread Qing Zhao
Hi,

This is the 3rd version of the patch set to extend the "counted_by" attribute
to pointer fields of structures.

Compared to the 2nd version:

https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681727.html
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681728.html
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681729.html
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681730.html

The major change is:

"The counted_by attribute is allowed for a void pointer field, the element
size of such pointer array is assumed as size 1."

Both __builtin_dynamic_object_size and the bounds sanitizer handle this.

This patch set includes 3 parts:

1. Extend "counted_by" attribute to pointer fields of structures.
2. Convert a pointer reference with counted_by attribute to .ACCESS_WITH_SIZE
   and use it in builtin-object-size.
3. Use the counted_by attribute of pointers in array bound checker.

Of these, patches 1 and 2 are simple and straightforward; patch 3, however,
is a little complicated for the following reason:

The current array bound checker only instruments ARRAY_REF, and the INDEX
information is the 2nd operand of the ARRAY_REF.

When extending the array bound checker to pointer references with
counted_by attributes, the hardest part is to get the INDEX of the
corresponding array ref from the offset computation expression of
the pointer ref. 

So patch #3 is an RFC: I do need some comments and suggestions on it.
I also wonder about accesses to pointer arrays such as:

struct annotated {
  int b;
  int *c __attribute__ ((counted_by (b)));
} *p_array_annotated;

p_array_annotated->c[annotated_index] = 2;

Is it possible to generate ARRAY_REF instead of INDIRECT_REF for the above
p_array_annotated->c[annotated_index]
in the C FE? Then we could keep the INDEX info in the IR and avoid all the
hacks to get the index from the OFFSET computation expression.

The whole patch set has been rebased on the latest trunk, bootstrapped 
and regression tested on both aarch64 and x86.

Please let me know whether patch 1 and patch 2 are ready for trunk,
and whether you have any suggestions or comments on patch 3.
 
Thanks a lot.

Qing




The first version was submitted 3 months ago on 1/16/2025, and triggered
a lot of discussion on whether we need a new syntax for the counted_by
attribute.

https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673837.html

After a long discussion since then: 
(https://gcc.gnu.org/pipermail/gcc-patches/2025-March/677024.html)

We agreed to the following compromise solution:

1. Keep the current syntax of counted_by for lone identifier;
2. Add a new attribute "counted_by_exp" for expressions.

Although there is still some discussion going on about the new
counted_by_exp attribute (in the Clang community)
https://discourse.llvm.org/t/rfc-bounds-safety-in-c-syntax-compatibility-with-gcc/85885

The syntax for the lone identifier is kept the same as before.

So, I'd like to resubmit my previous patch of extending "counted_by"
to pointer fields of structures. 

The whole patch set has been rebased on the latest trunk, with some test case
adjustments, and bootstrapped and regression tested on both aarch64 and x86.

There will be a separate patch set for the new "counted_by_exp"
attribute later to cover the expression cases.

The following are more details on this patch set:

For example:

struct PP {
  size_t count2;
  char other1;
  char *array2 __attribute__ ((counted_by (count2)));
  int other2;
} *pp;

specifies that "array2" is an array that is pointed to by the
pointer field, and its number of elements is given by the field
"count2" in the same structure.

There are the following important facts about "counted_by" on pointer
fields compared to "counted_by" on FAM fields:

1. one more new requirement for pointer fields with "counted_by" attribute:
   pp->array2 and pp->count2 can ONLY be changed by changing the whole structure
   at the same time.

2. the following feature for FAM field with "counted_by" attribute is NOT
   valid for the pointer field any more:

" One important feature of the attribute is, a reference to the
 flexible array member field uses the latest value assigned to the
 field that represents the number of the elements before that
 reference.  For example,

p->count = val1;
p->array[20] = 0;  // ref1 to p->array
p->count = val2;
p->array[30] = 0;  // ref2 to p->array

 in the above, 'ref1' uses 'val1' as the number of the elements in
 'p->array', and 'ref2' uses 'val2' as the number of elements in
 'p->array'. "
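
One way to read requirement 1 above, sketched under assumptions: the new
pointer and the new count are installed with a single whole-structure
assignment, so they never disagree.  HAVE_POINTER_COUNTED_BY, COUNTED_BY
and pp_resize are hypothetical names for this sketch; whether the final
documentation accepts exactly this pattern is not stated in this
message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical opt-in: define HAVE_POINTER_COUNTED_BY only when the
   compiler accepts counted_by on pointer fields.  */
#ifdef HAVE_POINTER_COUNTED_BY
# define COUNTED_BY(member) __attribute__ ((counted_by (member)))
#else
# define COUNTED_BY(member)
#endif

struct PP {
  size_t count2;
  char other1;
  char *array2 COUNTED_BY (count2);
  int other2;
};

/* Give PP a zero-filled buffer of NEW_COUNT chars, keeping the common
   prefix of the old contents.  Return 0 on success, or -1 if the
   allocation fails (PP is then left unchanged).  */
static int
pp_resize (struct PP *pp, size_t new_count)
{
  assert (pp != NULL);

  char *buf = calloc (new_count ? new_count : 1, 1);
  if (!buf)
    return -1;

  size_t keep = pp->count2 < new_count ? pp->count2 : new_count;
  if (keep)
    memcpy (buf, pp->array2, keep);

  char *old = pp->array2;
  /* Install pointer and count together by assigning the whole struct.  */
  *pp = (struct PP) { .count2 = new_count, .other1 = pp->other1,
		      .array2 = buf, .other2 = pp->other2 };
  free (old);
  return 0;
}
```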


[RFC][GCC16 stage1][PATCH v3 3/3] Use the counted_by attribute of pointers in array bound checker.

2025-04-30 Thread Qing Zhao
The current array bound checker only instruments ARRAY_REF, and the INDEX
information is the 2nd operand of the ARRAY_REF.

When extending the array bound checker to pointer references with
counted_by attributes, the hardest part is to get the INDEX of the
corresponding array ref from the offset computation expression of
the pointer ref.  I.e.

Given an OFFSET expression, and the ELEMENT_SIZE,
get the index expression from the OFFSET.
For example:
  OFFSET:
   ((long unsigned int) m * (long unsigned int) SAVE_EXPR ) * 4
  ELEMENT_SIZE:
   (sizetype) SAVE_EXPR  * 4
get the index as (long unsigned int) m.
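
The idea can be sketched on a toy expression type.  This is only an
illustration under assumptions, not the code in c-ubsan.cc: struct expr,
collect_factors and index_from_offset are invented names, leaves are
compared by a symbolic name, and the leaf "n" merely stands in for the
SAVE_EXPR operand, which is not visible in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum kind { LEAF, MUL };

struct expr {
  enum kind kind;
  const char *name;		/* LEAF: symbolic name, e.g. "m" or "4".  */
  const struct expr *lhs, *rhs;	/* MUL: the two operands.  */
};

#define MAX_FACTORS 16

/* Flatten nested MUL nodes of E into FACTORS, starting at slot N.
   Return the new factor count, or -1 if there are too many factors.  */
static int
collect_factors (const struct expr *e, const struct expr **factors, int n)
{
  if (n < 0)
    return -1;
  if (e->kind == MUL)
    {
      n = collect_factors (e->lhs, factors, n);
      return collect_factors (e->rhs, factors, n);
    }
  if (n >= MAX_FACTORS)
    return -1;
  factors[n] = e;
  return n + 1;
}

/* Cancel every factor of ELEMENT_SIZE once from the factors of OFFSET.
   Return the single factor that remains (the index), or NULL if a size
   factor is missing from OFFSET or not exactly one factor is left.  */
static const struct expr *
index_from_offset (const struct expr *offset,
		   const struct expr *element_size)
{
  const struct expr *off[MAX_FACTORS], *size[MAX_FACTORS];
  int n_off = collect_factors (offset, off, 0);
  int n_size = collect_factors (element_size, size, 0);
  if (n_off < 0 || n_size < 0)
    return NULL;

  for (int i = 0; i < n_size; i++)
    {
      int found = 0;
      for (int j = 0; j < n_off && !found; j++)
	if (off[j] && strcmp (off[j]->name, size[i]->name) == 0)
	  {
	    off[j] = NULL;	/* Cancel this factor exactly once.  */
	    found = 1;
	  }
      if (!found)
	return NULL;
    }

  const struct expr *index = NULL;
  for (int j = 0; j < n_off; j++)
    if (off[j])
      {
	if (index)
	  return NULL;	/* More than one factor left over.  */
	index = off[j];
      }
  return index;
}
```

With OFFSET modelled as (m * n) * 4 and ELEMENT_SIZE as n * 4, the
factors n and 4 cancel and the remaining factor m is the index.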

gcc/c-family/ChangeLog:

* c-gimplify.cc (ubsan_walk_array_refs_r): Instrument INDIRECT_REF
with .ACCESS_WITH_SIZE in its address computation.
* c-ubsan.cc (ubsan_instrument_bounds): Format change.
(ubsan_instrument_bounds_pointer): New function.
(get_factors_from_mul_expr): New function.
(get_index_from_offset): New function.
(get_index_from_pointer_addr_expr): New function.
(is_instrumentable_pointer_array): New function.
(ubsan_array_ref_instrumented_p): Handle INDIRECT_REF.
(ubsan_maybe_instrument_array_ref): Handle INDIRECT_REF.

gcc/testsuite/ChangeLog:

* gcc.dg/ubsan/pointer-counted-by-bounds-2.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-3.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-4.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-5.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-6.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds.c: New test.
---
 gcc/c-family/c-gimplify.cc|  28 ++
 gcc/c-family/c-ubsan.cc   | 289 +-
 .../ubsan/pointer-counted-by-bounds-2.c   |  47 +++
 .../ubsan/pointer-counted-by-bounds-3.c   |  35 +++
 .../ubsan/pointer-counted-by-bounds-4.c   |  35 +++
 .../ubsan/pointer-counted-by-bounds-5.c   |  33 ++
 .../ubsan/pointer-counted-by-bounds-6.c   |  40 +++
 .../gcc.dg/ubsan/pointer-counted-by-bounds.c  |  46 +++
 8 files changed, 537 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/pointer-counted-by-bounds-2.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/pointer-counted-by-bounds-3.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/pointer-counted-by-bounds-4.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/pointer-counted-by-bounds-5.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/pointer-counted-by-bounds-6.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/pointer-counted-by-bounds.c

diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
index c6fb7646567..e905059708f 100644
--- a/gcc/c-family/c-gimplify.cc
+++ b/gcc/c-family/c-gimplify.cc
@@ -66,6 +66,20 @@ along with GCC; see the file COPYING3.  If not see
 walk back up, we check that they fit our constraints, and copy them
 into temporaries if not.  */
 
+
+/* Check whether TP is an address computation whose base is a call to
+   .ACCESS_WITH_SIZE.  */
+
+static bool
+is_address_with_access_with_size (tree tp)
+{
+  if (TREE_CODE (tp) == POINTER_PLUS_EXPR
+  && (TREE_CODE (TREE_OPERAND (tp, 0)) == INDIRECT_REF)
+  && (is_access_with_size_p (TREE_OPERAND (TREE_OPERAND (tp, 0), 0))))
+   return true;
+  return false;
+}
+
 /* Callback for c_genericize.  */
 
 static tree
@@ -121,6 +135,20 @@ ubsan_walk_array_refs_r (tree *tp, int *walk_subtrees, 
void *data)
   walk_tree (&TREE_OPERAND (*tp, 1), ubsan_walk_array_refs_r, pset, pset);
   walk_tree (&TREE_OPERAND (*tp, 0), ubsan_walk_array_refs_r, pset, pset);
 }
+  else if (TREE_CODE (*tp) == INDIRECT_REF
+  && is_address_with_access_with_size (TREE_OPERAND (*tp, 0)))
+{
+  ubsan_maybe_instrument_array_ref (&TREE_OPERAND (*tp, 0), false);
+  /* Make sure ubsan_maybe_instrument_array_ref is not called again on
+the POINTER_PLUS_EXPR, so ensure it is not walked again and walk
+its subtrees manually.  */
+  tree aref = TREE_OPERAND (*tp, 0);
+  pset->add (aref);
+  *walk_subtrees = 0;
+  walk_tree (&TREE_OPERAND (aref, 0), ubsan_walk_array_refs_r, pset, pset);
+}
+  else if (is_address_with_access_with_size (*tp))
+ubsan_maybe_instrument_array_ref (tp, true);
   return NULL_TREE;
 }
 
diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index 78b78685469..e8b0cce91d3 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -554,38 +554,295 @@ ubsan_instrument_bounds (location_t loc, tree array, 
tree *index,
   *index, bound);
 }
 
-/* Return true iff T is an array that was instrumented by SANITIZE_BOUNDS.  */
+
+/* Instrument array bounds for the pointer array address which is
+   an INDIRECT_REF to the call to .ACCESS_WITH_SIZE.  We create special
+   builtin, that gets expanded in the sanopt pass, and make an array
+   dimention of it.  POINTER_ADDR is

[GCC16 stage1][PATCH v3 2/3] Convert a pointer reference with counted_by attribute to .ACCESS_WITH_SIZE and use it in builtinin-object-size.

2025-04-30 Thread Qing Zhao
gcc/c/ChangeLog:

* c-typeck.cc (build_counted_by_ref): Handle pointers with counted_by.
(build_access_with_size_for_counted_by): Likewise.

gcc/ChangeLog:

* tree-object-size.cc (access_with_size_object_size): Handle pointers
with counted_by.
(collect_object_sizes_for): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/pointer-counted-by-4.c: New test.
* gcc.dg/pointer-counted-by-5.c: New test.
* gcc.dg/pointer-counted-by-6.c: New test.
* gcc.dg/pointer-counted-by-7.c: New test.
* gcc.dg/pointer-counted-by-8.c: New test.
---
 gcc/c/c-typeck.cc   | 42 --
 gcc/testsuite/gcc.dg/pointer-counted-by-4.c | 63 +
 gcc/testsuite/gcc.dg/pointer-counted-by-5.c | 48 
 gcc/testsuite/gcc.dg/pointer-counted-by-6.c | 47 +++
 gcc/testsuite/gcc.dg/pointer-counted-by-7.c | 30 ++
 gcc/testsuite/gcc.dg/pointer-counted-by-8.c | 30 ++
 gcc/tree-object-size.cc | 17 --
 7 files changed, 256 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-4.c
 create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-5.c
 create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-6.c
 create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-7.c
 create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-8.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 55d896e02df..7cb19f4a239 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -2928,8 +2928,8 @@ should_suggest_deref_p (tree datum_type)
 
 /* For a SUBDATUM field of a structure or union DATUM, generate a REF to
the object that represents its counted_by per the attribute counted_by
-   attached to this field if it's a flexible array member field, otherwise
-   return NULL_TREE.
+   attached to this field if it's a flexible array member or a pointer
+   field, otherwise return NULL_TREE.
Set COUNTED_BY_TYPE to the TYPE of the counted_by field.
For example, if:
 
@@ -2950,7 +2950,9 @@ static tree
 build_counted_by_ref (tree datum, tree subdatum, tree *counted_by_type)
 {
   tree type = TREE_TYPE (datum);
-  if (!c_flexible_array_member_type_p (TREE_TYPE (subdatum)))
+  tree sub_type = TREE_TYPE (subdatum);
+  if (!c_flexible_array_member_type_p (sub_type)
+  && TREE_CODE (sub_type) != POINTER_TYPE)
 return NULL_TREE;
 
   tree attr_counted_by = lookup_attribute ("counted_by",
@@ -2981,8 +2983,11 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
*counted_by_type)
 }
 
 /* Given a COMPONENT_REF REF with the location LOC, the corresponding
-   COUNTED_BY_REF, and the COUNTED_BY_TYPE, generate an INDIRECT_REF
-   to a call to the internal function .ACCESS_WITH_SIZE.
+   COUNTED_BY_REF, and the COUNTED_BY_TYPE, generate the corresponding
+   call to the internal function .ACCESS_WITH_SIZE.
+
+   Generate an INDIRECT_REF to a call to the internal function
+   .ACCESS_WITH_SIZE.
 
REF
 
@@ -2992,17 +2997,15 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
*counted_by_type)
(TYPE_OF_ARRAY *)0))
 
NOTE: The return type of this function is the POINTER type pointing
-   to the original flexible array type.
-   Then the type of the INDIRECT_REF is the original flexible array type.
-
-   The type of the first argument of this function is a POINTER type
-   to the original flexible array type.
+   to the original flexible array type or the original pointer type.
+   Then the type of the INDIRECT_REF is the original flexible array type
+   or the original pointer type.
 
The 4th argument of the call is a constant 0 with the TYPE of the
object pointed by COUNTED_BY_REF.
 
-   The 6th argument of the call is a constant 0 with the pointer TYPE
-   to the original flexible array type.
+   The 6th argument of the call is a constant 0 of the same TYPE as
+   the return type of the call.
 
   */
 static tree
@@ -3010,11 +3013,16 @@ build_access_with_size_for_counted_by (location_t loc, 
tree ref,
   tree counted_by_ref,
   tree counted_by_type)
 {
-  gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref)));
-  /* The result type of the call is a pointer to the flexible array type.  */
+  gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref))
+ || TREE_CODE (TREE_TYPE (ref)) == POINTER_TYPE);
+  bool is_fam = c_flexible_array_member_type_p (TREE_TYPE (ref));
+  tree first_param = is_fam ? array_to_pointer_conversion (loc, ref)
+: build_unary_op (loc, ADDR_EXPR, ref, false);
+
+  /* The result type of the call is a pointer to the original type
+ of the ref.  */
   tree result_type = c_build_pointer_type (TREE_TYPE (ref));
-  tree first_param
-= c_fully_fold (array_to_pointer_conversion (loc, ref), false, NULL);
+  first_param = c_fully_fold (first_param, false, NULL);
 

Re: [PATCH] libgcc: Update FMV features to latest ACLE spec 2024Q4

2025-04-30 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Update FMV features to latest ACLE spec of 2024Q4 - several features have 
> been removed
> or merged.  Add FMV support for CSSC and MOPS.  Preserve the ordering in enum 
> CPUFeatures.
>
> gcc:
> * common/config/aarch64/cpuinfo.h: Remove unused features, add 
> FEAT_CSSC
> and FEAT_MOPS. 
> * config/aarch64/aarch64-option-extensions.def: Remove FMV support
> for RPRES, use PMULL rather than AES, add FMV support for CSSC and 
> MOPS.
>
> libgcc:
> * config/aarch64/cpuinfo.c (__init_cpu_features_constructor):
> Remove unused features, add support for CSSC and MOPS.

OK, thanks.

Not your fault, but I was initially confused that, when two
architectural features are combined into a single FMV feature, the
FEAT_* macro names followed the "more specific" architectural feature,
whereas the user-visible FMV feature names followed the "least specific"
architectural feature.  E.g. architectural FEAT_SSBS + FEAT_SSBS2 is
represented by the macro FEAT_SSBS2 and the feature string "ssbs".
Similarly FEAT_AES + FEAT_PMULL is represented by the macro FEAT_PMULL
and the feature string "aes".  But I can see why the macro naming
makes sense in the context of hwcaps checks, and in any case we should
maintain compatibility with compiler-rt rather than pick our own naming.

Richard
>
> ---
>
> diff --git a/gcc/common/config/aarch64/cpuinfo.h 
> b/gcc/common/config/aarch64/cpuinfo.h
> index 
> cd3c2b20c5315b035870528fa39246bbc780f369..d329d861bf73fbb03436643d09553d7eabecdfb8
>  100644
> --- a/gcc/common/config/aarch64/cpuinfo.h
> +++ b/gcc/common/config/aarch64/cpuinfo.h
> @@ -39,10 +39,10 @@ enum CPUFeatures {
>FEAT_FP,
>FEAT_SIMD,
>FEAT_CRC,
> -  FEAT_SHA1,
> +  FEAT_CSSC,
>FEAT_SHA2,
>FEAT_SHA3,
> -  FEAT_AES,
> +  FEAT_unused5,
>FEAT_PMULL,
>FEAT_FP16,
>FEAT_DIT,
> @@ -53,30 +53,30 @@ enum CPUFeatures {
>FEAT_RCPC,
>FEAT_RCPC2,
>FEAT_FRINTTS,
> -  FEAT_DGH,
> +  FEAT_unused6,
>FEAT_I8MM,
>FEAT_BF16,
> -  FEAT_EBF16,
> -  FEAT_RPRES,
> +  FEAT_unused7,
> +  FEAT_unused8,
>FEAT_SVE,
> -  FEAT_SVE_BF16,
> -  FEAT_SVE_EBF16,
> -  FEAT_SVE_I8MM,
> +  FEAT_unused9,
> +  FEAT_unused10,
> +  FEAT_unused11,
>FEAT_SVE_F32MM,
>FEAT_SVE_F64MM,
>FEAT_SVE2,
> -  FEAT_SVE_AES,
> +  FEAT_unused12,
>FEAT_SVE_PMULL128,
>FEAT_SVE_BITPERM,
>FEAT_SVE_SHA3,
>FEAT_SVE_SM4,
>FEAT_SME,
> -  FEAT_MEMTAG,
> +  FEAT_unused13,
>FEAT_MEMTAG2,
> -  FEAT_MEMTAG3,
> +  FEAT_unused14,
>FEAT_SB,
>FEAT_unused1,
> -  FEAT_SSBS,
> +  FEAT_unused15,
>FEAT_SSBS2,
>FEAT_BTI,
>FEAT_unused2,
> @@ -87,6 +87,7 @@ enum CPUFeatures {
>FEAT_SME_I64,
>FEAT_SME2,
>FEAT_RCPC3,
> +  FEAT_MOPS,
>FEAT_MAX,
>FEAT_EXT = 62, /* Reserved to indicate presence of additional features 
> field
>   in __aarch64_cpu_features.  */
> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
> b/gcc/config/aarch64/aarch64-option-extensions.def
> index 
> 79b79358c5d4a9e23c7601f7a1ba742dddadb778..b111b33d9bc75c8b85faf672fba051e0e417b796
>  100644
> --- a/gcc/config/aarch64/aarch64-option-extensions.def
> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> @@ -128,7 +128,9 @@ AARCH64_OPT_FMV_EXTENSION("sha2", SHA2, (SIMD), (), (), 
> "sha1 sha2")
>  
>  AARCH64_FMV_FEATURE("sha3", SHA3, (SHA3))
>  
> -AARCH64_OPT_FMV_EXTENSION("aes", AES, (SIMD), (), (), "aes")
> +AARCH64_OPT_EXTENSION("aes", AES, (SIMD), (), (), "aes")
> +
> +AARCH64_FMV_FEATURE("aes", PMULL, (AES))
>  
>  /* +nocrypto disables AES, SHA2 and SM4, and anything that depends on them
> (such as SHA3 and the SVE2 crypto extensions).  */
> @@ -171,8 +173,6 @@ AARCH64_OPT_FMV_EXTENSION("i8mm", I8MM, (SIMD), (), (), 
> "i8mm")
> instructions.  */
>  AARCH64_OPT_FMV_EXTENSION("bf16", BF16, (FP), (SIMD), (), "bf16")
>  
> -AARCH64_FMV_FEATURE("rpres", RPRES, ())
> -
>  AARCH64_OPT_FMV_EXTENSION("sve", SVE, (SIMD, F16, FCMA), (), (), "sve")
>  
>  /* This specifically does not imply +sve.  */
> @@ -190,7 +190,7 @@ AARCH64_OPT_FMV_EXTENSION("sve2", SVE2, (SVE), (), (), 
> "sve2")
>  
>  AARCH64_OPT_EXTENSION("sve2-aes", SVE2_AES, (SVE2, AES), (), (), "sveaes")
>  
> -AARCH64_FMV_FEATURE("sve2-aes", SVE_AES, (SVE2_AES))
> +AARCH64_FMV_FEATURE("sve2-aes", SVE_PMULL128, (SVE2_AES))
>  
>  AARCH64_OPT_EXTENSION("sve2-bitperm", SVE2_BITPERM, (SVE2), (), (),
> "svebitperm")
> @@ -245,9 +245,9 @@ AARCH64_OPT_EXTENSION("sme-b16b16", SME_B16B16, (SME2, 
> SVE_B16B16), (), (), "sme
>  
>  AARCH64_OPT_EXTENSION("sme-f16f16", SME_F16F16, (SME2), (), (), "smef16f16")
>  
> -AARCH64_OPT_EXTENSION("mops", MOPS, (), (), (), "mops")
> +AARCH64_OPT_FMV_EXTENSION("mops", MOPS, (), (), (), "mops")
>  
> -AARCH64_OPT_EXTENSION("cssc", CSSC, (), (), (), "cssc")
> +AARCH64_OPT_FMV_EXTENSION("cssc", CSSC, (), (), (), "cssc")
>  
>  AARCH64_OPT_EXTENSION("lse128", LSE128, (LSE), (), (), "lse128")
>  
> dif

Re: [PATCH] libstdc++: Preserve the argument type in basic_format_args [PR119246]

2025-04-30 Thread Tomasz Kaminski
On Wed, Apr 30, 2025 at 1:26 PM Tomasz Kamiński  wrote:

> This commit adjusts the way the arguments are stored in the _Arg_value
> (and thus basic_format_args), by preserving the types of fixed width
> floating-point types, that were previously converted to float, double,
> long double.
>
> The _Arg_value union now contains alternatives with std::bfloat16_t,
> std::float16_t, std::float32_t, std::float64_t that use pre-existing
> _Arg_bf16, _Arg_f16, _Arg_f32, _Arg_f64 argument types.
>
> This does not affect formatting, as the specializations of formatters for
> these types format them by casting to the corresponding standard floating
> point type.
>
> For the 128bit floating we need to handle the ppc64 architecture,
> (_GLIBCXX_LONG_DOUBLE_ALT128_COMPAT) for which the long double may (per TU
> basis) designate either __ibm128 or __ieee128 type, we need to store both
> types in the _Arg_value and have two _Arg_types (_Arg_ibm128,
> _Arg_ieee128).
> On other architectures we use extra enumerator value to store __float128,
> that is different from long double and _Float128. This is consistent with
> ppc64,
> for which __float128 is same type as __ieee128 if present. We use
> _Arg_float128
> _M_float128 names that deviate from _Arg_fN naming scheme, to emphasize
> that
> this flag is not used for std::float128_t (_Float128_t) type, that is
> consistently
> formatted via handle.
>
> The __format::_float128_t type is renamed to __format::__flt128_t, to
> mitigate
> visual confusion between this type and __float128. We also introduce
> __bflt16_t
> typedef instead of using decltype.
>
> We add new alternatives to the _Arg_value and allow them to be accessed
> via _S_get,
> when the types are available. However, we produce and handle corresponding
> _Arg_type,
> only when we can format them. See also r14-3329-g27d0cfcb2b33de.
>
> The formatter<_Float128, _CharT> that formats via __flt128_t is always
> provided, when type is available. It is still correct __flt128_t is
> _Float128_t.
>
> We also provide formatter<__float128, _CharT> that formats via __flt128_t.
> As this type may be disabled (-mno-float128), extra care needs to be taken,
> for situation when __float128 is same as long double. If the formatter
> would be
> defined in such case, the formatter would be generated
> from
> different specializations, and have different mangling:
>   * formatter<__float128, _CharT> if __float128 is present,
>   * formatter<_format::__formattable_float, _CharT> otherwise.
> To the best of my knowledge this happens only on ppc64 for __ieee128 and
> __float128,
> so the formatter is not defined in this case. static_assert is added to
> detect
> other configurations like that. In such case we should replace it with
> constraint.
>
> PR libstdc++/119246
>
> libstdc++-v3/ChangeLog:
>
> * include/std/format (__format::__bflt16_t): Define.
> (_GLIBCXX_FORMAT_F128): Separate value for cases where _Float128
> is used.
> (__format::__float128_t): Renamed to __format::_flt128_t.
> (std::formatter<_Float128, _CharT>): Define always if there is
> formattable
> 128bit float.
> (std::formatter<__float128, _CharT>): Define.
> (_Arg_type::_Arg_f128): Rename to _Arg_float128 and adjust value.
> (_Arg_type::_Arg_ibm128): Change value to _Arg_ldbl.
> (_Arg_type::_Arg_ieee128): Define as alias to _Arg_float128.
> (_Arg_value::_M_f128): Replaced with _M_ieee128 and _M_float128.
> (_Arg_value::_M_ieee128, _Arg_value::_M_float128,
> _Arg_value::_M_bf16)
> (_Arg_value::_M_f16, _Arg_value::_M_f32, _Arg_value::_M_f64):
> Define.
> (_Arg_value::_S_get, basic_format_arg::_S_to_enum): Handle
> __bflt16,
> _Float16, _Float32, _Float64, and __float128 types.
> (basic_format_arg::_S_to_arg_type): Preserve _bflt16, _Float16,
> _Float32, _Float64 and __float128 types.
> (basic_format_arg::_M_visit): Handle _Arg_float128, _Arg_ieee128,
> _Arg_bf16, _Arg_f16, _Arg_f32, _Arg_f64.
> * testsuite/std/format/arguments/args.cc: Updated to illustrate
> that
> extended floating point types use handles now. Added test for
> __float128.
> * testsuite/std/format/parse_ctx.cc: Extended test to cover class
> to
> check_dynamic_spec with floating point types and handles.
> ---
> Tested on x86_64-linux and powerpc64le-unknown-linux-gnu.
> Running additional test on powerpc64le with
> unix\{-mabi=ibmlongdouble,-mabi=ieeelongdouble,-mno-float128}.
>
The -mabi=ibmlongdouble and -mabi=ieeelongdouble runs passed.
-mno-float128 does not seem to be handled on trunk, due to the use of
__float128 instead of __ieee128 here:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/include/std/charconv;h=dda49ce72d0b53c7a6e86c2e3fb510d0218fd5a6;hb=HEAD#l878
Interestingly, from_chars uses __ieee128:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/include/std/charconv;h=dda49ce72d0b53c7a6e86c2e3fb510d0218fd5a6;hb=H
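
As a rough illustration of the type-preserving storage described in the quoted
commit message, the sketch below keeps a tag next to a union so that a float
stays a float instead of being widened when stored.  All names here (ArgType,
ArgValue, as_double) are hypothetical stand-ins, not the libstdc++ internals,
and plain float/double stand in for the fixed-width types so the sketch
compiles without C++23 extended floating-point support.

```cpp
#include <cassert>

// Hypothetical tag, loosely analogous to an _Arg_type enumerator.
enum class ArgType { f32, f64 };

// Hypothetical tagged storage: the alternative that is active records
// the original argument type instead of converting everything to double.
struct ArgValue
{
  ArgType type;
  union
  {
    float f32;
    double f64;
  };

  explicit ArgValue (float v) : type (ArgType::f32), f32 (v) {}
  explicit ArgValue (double v) : type (ArgType::f64), f64 (v) {}
};

// Formatting can still go through a standard floating-point type by
// casting at the point of use, so the output is unaffected.
inline double
as_double (const ArgValue &a)
{
  return a.type == ArgType::f32 ? static_cast<double> (a.f32) : a.f64;
}

// Small self-check: the stored type is preserved, the value is unchanged.
inline void
arg_value_demo ()
{
  ArgValue a (1.5f);
  assert (a.type == ArgType::f32);
  assert (as_double (a) == 1.5);
}
```

The point of the sketch is only that the visitor sees the original type, while
the value that reaches the formatting code is the same as before.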

Revert 51ba233fe2db562390a6e0a3618420889761bc77

2025-04-30 Thread Richard Biener


I accidentally pushed the wrong commit, promptly reverted.

Richard.


Re: [PATCH] libgcc: Cleanup HWCAP defines in cpuinfo.c

2025-04-30 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Cleanup HWCAP defines - rather than including hwcap.h and then repeating it 
> using
> #ifndef, just define the HWCAPs we need exactly as in hwcap.h.
>
> libgcc:
> * config/aarch64/cpuinfo.c: Cleanup HWCAP defines.

Thanks for doing this.  I agree it's a significant improvement over
the status quo.  It will be a while before we can reasonably require
a glibc that provides asm/hwcap.h.  And even once we do, new hwcaps
will be added on a regular basis, so we can never rely totally on the
installed header file.  And if, as now, glibc needs to extend the
structure, we can't rely on the installed header file having the
most up-to-date definition.

So in practice, if we stuck to the current approach of trying to
include asm/hwcap.h, we'd always need to maintain a fallback definition,
and would need increasingly complex code to decide whether the installed
header file is new enough.  Having two versions doubles the amount of
testing needed.  It also increases rather than decreases the amount of
code.

OK for trunk from my POV, but please leave 24 hrs for others to object.

Richard
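
To make the approach concrete, here is a minimal sketch of the "define exactly
what we need" style.  The two bit positions are copied from the lines removed
in the quoted diff; the helper function is hypothetical and takes the hwcap
word as a parameter so that the sketch does not depend on any installed
header or on getauxval being available.

```cpp
#include <cassert>

// Defined unconditionally, with no #ifndef guard and no dependency on an
// installed hwcap header.  Values as in the quoted diff.
#define HWCAP_AES (1 << 3)
#define HWCAP_PMULL (1 << 4)

// Hypothetical helper: true if both AES and PMULL are reported in the
// given AT_HWCAP word.
inline bool
has_aes_and_pmull (unsigned long hwcap)
{
  return (hwcap & HWCAP_AES) != 0 && (hwcap & HWCAP_PMULL) != 0;
}

// Small self-check with hand-built hwcap words.
inline void
hwcap_demo ()
{
  assert (has_aes_and_pmull (HWCAP_AES | HWCAP_PMULL));
  assert (!has_aes_and_pmull (HWCAP_AES));
}
```

With a single set of definitions there is only one configuration to test,
which is the maintenance argument made above.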

>
> ---
>
> diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
> index 
> 14877f5d8410a51fe47f94cb1a2ff5399f95c253..a4958a00fb53d77a0c2291337543324951ca20df
>  100644
> --- a/libgcc/config/aarch64/cpuinfo.c
> +++ b/libgcc/config/aarch64/cpuinfo.c
> @@ -37,211 +37,64 @@ typedef struct __ifunc_arg_t {
>  } __ifunc_arg_t;
>  #endif
>  
> -#if __has_include()
> -#include 
> -
>  /* Architecture features used in Function Multi Versioning.  */
>  struct {
>unsigned long long features;
>/* As features grows new fields could be added.  */
>  } __aarch64_cpu_features __attribute__((visibility("hidden"), nocommon));
>  
> -#ifndef _IFUNC_ARG_HWCAP
>  #define _IFUNC_ARG_HWCAP (1ULL << 62)
> -#endif
> -#ifndef AT_HWCAP
>  #define AT_HWCAP 16
> -#endif
> -#ifndef HWCAP_FP
> -#define HWCAP_FP (1 << 0)
> -#endif
> -#ifndef HWCAP_ASIMD
> -#define HWCAP_ASIMD (1 << 1)
> -#endif
> -#ifndef HWCAP_EVTSTRM
> -#define HWCAP_EVTSTRM (1 << 2)
> -#endif
> -#ifndef HWCAP_AES
> -#define HWCAP_AES (1 << 3)
> -#endif
> -#ifndef HWCAP_PMULL
> -#define HWCAP_PMULL (1 << 4)
> -#endif
> -#ifndef HWCAP_SHA1
> -#define HWCAP_SHA1 (1 << 5)
> -#endif
> -#ifndef HWCAP_SHA2
> -#define HWCAP_SHA2 (1 << 6)
> -#endif
> -#ifndef HWCAP_CRC32
> -#define HWCAP_CRC32 (1 << 7)
> -#endif
> -#ifndef HWCAP_ATOMICS
> -#define HWCAP_ATOMICS (1 << 8)
> -#endif
> -#ifndef HWCAP_FPHP
> -#define HWCAP_FPHP (1 << 9)
> -#endif
> -#ifndef HWCAP_ASIMDHP
> -#define HWCAP_ASIMDHP (1 << 10)
> -#endif
> -#ifndef HWCAP_CPUID
> -#define HWCAP_CPUID (1 << 11)
> -#endif
> -#ifndef HWCAP_ASIMDRDM
> -#define HWCAP_ASIMDRDM (1 << 12)
> -#endif
> -#ifndef HWCAP_JSCVT
> -#define HWCAP_JSCVT (1 << 13)
> -#endif
> -#ifndef HWCAP_FCMA
> -#define HWCAP_FCMA (1 << 14)
> -#endif
> -#ifndef HWCAP_LRCPC
> -#define HWCAP_LRCPC (1 << 15)
> -#endif
> -#ifndef HWCAP_DCPOP
> -#define HWCAP_DCPOP (1 << 16)
> -#endif
> -#ifndef HWCAP_SHA3
> -#define HWCAP_SHA3 (1 << 17)
> -#endif
> -#ifndef HWCAP_SM3
> -#define HWCAP_SM3 (1 << 18)
> -#endif
> -#ifndef HWCAP_SM4
> -#define HWCAP_SM4 (1 << 19)
> -#endif
> -#ifndef HWCAP_ASIMDDP
> -#define HWCAP_ASIMDDP (1 << 20)
> -#endif
> -#ifndef HWCAP_SHA512
> -#define HWCAP_SHA512 (1 << 21)
> -#endif
> -#ifndef HWCAP_SVE
> -#define HWCAP_SVE (1 << 22)
> -#endif
> -#ifndef HWCAP_ASIMDFHM
> -#define HWCAP_ASIMDFHM (1 << 23)
> -#endif
> -#ifndef HWCAP_DIT
> -#define HWCAP_DIT (1 << 24)
> -#endif
> -#ifndef HWCAP_ILRCPC
> -#define HWCAP_ILRCPC (1 << 26)
> -#endif
> -#ifndef HWCAP_FLAGM
> -#define HWCAP_FLAGM (1 << 27)
> -#endif
> -#ifndef HWCAP_SSBS
> -#define HWCAP_SSBS (1 << 28)
> -#endif
> -#ifndef HWCAP_SB
> -#define HWCAP_SB (1 << 29)
> -#endif
> -#ifndef HWCAP_PACA
> -#define HWCAP_PACA (1 << 30)
> -#endif
> -#ifndef HWCAP_PACG
> -#define HWCAP_PACG (1UL << 31)
> -#endif
> -
> -#ifndef AT_HWCAP2
>  #define AT_HWCAP2 26
> -#endif
> -#ifndef HWCAP2_DCPODP
> -#define HWCAP2_DCPODP (1 << 0)
> -#endif
> -#ifndef HWCAP2_SVE2
> -#define HWCAP2_SVE2 (1 << 1)
> -#endif
> -#ifndef HWCAP2_SVEAES
> -#define HWCAP2_SVEAES (1 << 2)
> -#endif
> -#ifndef HWCAP2_SVEPMULL
> -#define HWCAP2_SVEPMULL (1 << 3)
> -#endif
> -#ifndef HWCAP2_SVEBITPERM
> -#define HWCAP2_SVEBITPERM (1 << 4)
> -#endif
> -#ifndef HWCAP2_SVESHA3
> -#define HWCAP2_SVESHA3 (1 << 5)
> -#endif
> -#ifndef HWCAP2_SVESM4
> -#define HWCAP2_SVESM4 (1 << 6)
> -#endif
> -#ifndef HWCAP2_FLAGM2
> -#define HWCAP2_FLAGM2 (1 << 7)
> -#endif
> -#ifndef HWCAP2_FRINT
> -#define HWCAP2_FRINT (1 << 8)
> -#endif
> -#ifndef HWCAP2_SVEI8MM
> -#define HWCAP2_SVEI8MM (1 << 9)
> -#endif
> -#ifndef HWCAP2_SVEF32MM
> -#define HWCAP2_SVEF32MM (1 << 10)
> -#endif
> -#ifndef HWCAP2_SVEF64MM
> -#define HWCAP2_SVEF64MM (1 << 11)
> -#endif
> -#ifndef HWCAP2_SVEBF16
> -#define HWCAP2_SVEBF16 (1 << 12)
> -#endif
> -#ifndef HWCAP2_I8MM
> -#define HWCAP2_I8MM (1 << 13)
> -#endif
> -#ifndef HWCAP2_B

Re: [PATCH v2] Change __builtin_unreachable to __builtin_trap if only thing in function [PR109267]

2025-04-30 Thread Iain Sandoe



> On 30 Apr 2025, at 09:26, Richard Biener  wrote:
> 
> On Wed, Apr 30, 2025 at 9:03 AM Andrew Pinski  wrote:
>> 
>> On Tue, Apr 29, 2025 at 11:49 PM Richard Biener
>>  wrote:
>>> 
>>> On Tue, Apr 29, 2025 at 4:25 PM Andrew Pinski  
>>> wrote:
 
 When we have an empty function, things can go wrong with
 cfi_startproc/cfi_endproc and a few other things like exceptions. So if
 the only thing the function does is a call to __builtin_unreachable,
 let's expand that to a __builtin_trap instead. For most targets that
 is one instruction wide so it won't hurt things that much and we get
 correct behavior for exceptions and some linkers will be better for it.
 
 The only thing I have a concern about is that some targets still
 don't define a trap instruction. I tried to emit a nop instead of
 an abort but that nop is removed during RTL DCE.
 Should we just push targets to define a trap instead?
 E.g. BPF, avr and sh are the 3 semi active targets which still don't
 have a trap defined.
>>> 
>>> Do any of those targets have the cfi_startproc/cfi_endproc issue
>>> or exceptions are relevant on those?
>> 
>> Yes, the sh target is the one which can run fully Linux even. There is
>> an open bug about sh not having trap pattern implemented yet;
>> https://gcc.gnu.org/PR70216; been open for 9 years now too.
>> 
>>> 
>>> I'd say guard this with targetm.have_trap (), there's the chance that
>>> say on avr the expansion to abort() might fail to link in a
>>> freestanding environment.
>> 
>> I was thinking of that even (I even accidentally left in the include for
>> target.h :) )
>> 
>>> 
>>> As for the nop, if you mark it volatile does it prevail?
>> 
>> I don't even know how to mark the rtl insn as volatile.
>> the volatil field for INSN is listed as being if it was deleted:
>> 1 in an INSN, CALL_INSN, JUMP_INSN, CODE_LABEL, BARRIER, or NOTE
>> if it has been deleted.
>> So that won't help.
>> 
>> Now we could use the `used` field for this marking. I have not looked
>> at what it could take to make sure it does not get deleted though.
> 
> I wonder if a general fallback for expanding a trap could be
> 
> label:
>jmp label;
> 
> a nop in general wouldn't do (in this particular case it would, but then
> not as expansion for __builtin_unreachable_trap ()).
> 
> But yeah, we should possibly force targets to implement a trap
> instruction, but more thoroughly document what should happen
> (the program should stop [making progress]).

We have nearly reached the same overall design as my original patch.

In that case, I provided a target hook so that targets could opt in or out
of the treatment (and, presumably, could even make that conditional on
some -munreachable-is-trap).

(FAOD, I don’t have any specific attachment to any solution, so long as
 we eventually get one)

Iain
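
For anyone skimming the thread, the guarded variant suggested in the quoted
discussion can be modelled as a tiny decision function.  This is a toy model
only: the enum, the function and its parameters are hypothetical, and it is
not the code in tree-cfgcleanup.cc.  It assumes the "guard with
targetm.have_trap ()" option rather than the unconditional v2 behaviour.

```cpp
#include <cassert>

// Hypothetical outcome of the cleanup step for one function body.
enum class Expansion { keep_unreachable, trap };

// Toy model: replace the call with a trap only when the function consists
// of nothing but __builtin_unreachable and the target has a trap pattern.
constexpr Expansion
choose_expansion (bool body_is_only_unreachable, bool target_has_trap)
{
  return (body_is_only_unreachable && target_has_trap)
	 ? Expansion::trap
	 : Expansion::keep_unreachable;
}

// Small self-check of the two interesting cases.
inline void
expansion_demo ()
{
  assert (choose_expansion (true, true) == Expansion::trap);
  assert (choose_expansion (true, false) == Expansion::keep_unreachable);
}
```

A target hook, as in the original patch mentioned above, would amount to
replacing the second parameter with a per-target decision.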

> 
> Richard.
> 
>> Thanks,
>> Andrew
>> 
>> 
>>> 
 The QOI idea for basic block reorder is recorded as PR 120004.
 
 Changes since v1:
 * v2: Move to final gimple cfg cleanup instead of expand and use
  BUILT_IN_UNREACHABLE_TRAP.
 
 Bootstrapped and tested on x86_64-linux-gnu.
 
PR middle-end/109267
 
 gcc/ChangeLog:
 
* tree-cfgcleanup.cc (execute_cleanup_cfg_post_optimizing): If the 
 first
non debug statement in the first (and only) basic block is a call
to __builtin_unreachable change it to a call to __builtin_trap.
 
 gcc/testsuite/ChangeLog:
 
* gcc.dg/pr109267-1.c: New test.
* gcc.dg/pr109267-2.c: New test.
 
 Signed-off-by: Andrew Pinski 
 ---
 gcc/testsuite/gcc.dg/pr109267-1.c | 14 ++
 gcc/testsuite/gcc.dg/pr109267-2.c | 14 ++
 gcc/tree-cfgcleanup.cc| 14 ++
 3 files changed, 42 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr109267-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr109267-2.c
 
 diff --git a/gcc/testsuite/gcc.dg/pr109267-1.c 
 b/gcc/testsuite/gcc.dg/pr109267-1.c
 new file mode 100644
 index 000..d6df2c3b49a
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/pr109267-1.c
 @@ -0,0 +1,14 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O2 -fdump-tree-optimized" } */
 +
 +/* PR middle-end/109267 */
 +
 +int f(void)
 +{
 +  __builtin_unreachable();
 +}
 +
 +/* This unreachable should be changed to be a trap. */
 +
 +/* { dg-final { scan-tree-dump-times "__builtin_unreachable trap \\\(" 1 
 "optimized"} } */
 +/* { dg-final { scan-tree-dump-not "__builtin_unreachable \\\(" 
 "optimized"} } */
 diff --git a/gcc/testsuite/gcc.dg/pr109267-2.c 
 b/gcc/testsuite/gcc.dg/pr109267-2.c
 new file mode 100644
 index 000..6cd1419a1e3
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/pr109267-2.c
 @@ -0,0 +1,14 @@
 +/* { dg-do compile }

Re: [PATCH] x86: Remove SSE_FIRST_REG from ix86_class_likely_spilled_p

2025-04-30 Thread Uros Bizjak
On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu  wrote:
>
> SSE_FIRST_REG was added to CLASS_LIKELY_SPILLED_P, which became
> TARGET_CLASS_LIKELY_SPILLED_P, for
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40470
>
> Since RA has been improved and xmm0 is a commonly used register, remove
> SSE_FIRST_REG from ix86_class_likely_spilled_p to improve xmm0 codegen:

While the AVX version of pblendvb doesn't use XMM0 as an architectural
register, there are still plenty of other cases, e.g. PCMPESTRM and
PCMPISTRM, SHA and KEYLOCKER insns, so you are risking RA failures
with these insns if the life of XMM0 is extended.

OTOH, the (unrelated?) ternlog change changes the expander, where the
expander does subreg tricks on a memory operand, which doesn't look
correct to me. adjust_address should be used instead.


Uros.


[PATCH] ipa/120006 - wrong code with IPA PTA

2025-04-30 Thread Richard Biener
When PTA gets support for special-handling more builtins in
find_func_aliases the corresponding code in find_func_clobbers
needs updating as well since for unhandled cases it assumes
the former will populate ESCAPED accordingly.  The following
fixes a few omissions; the testcase runs into the missing strdup
handling.  I believe the more advanced handling using modref
results and fnspecs opened a larger gap; the proper fix is to
merge both functions, gating the clobber/use part on a parameter,
so that they cannot diverge.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk.

PR ipa/120006
* tree-ssa-structalias.cc (find_func_clobbers): Handle
strdup, strndup, realloc, index, strchr, strrchr, memchr,
strstr, strpbrk builtins like find_func_aliases does.

* gcc.dg/torture/pr120006.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr120006.c | 31 +
 gcc/tree-ssa-structalias.cc | 36 +
 2 files changed, 67 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr120006.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr120006.c 
b/gcc/testsuite/gcc.dg/torture/pr120006.c
new file mode 100644
index 000..c067f0ef9ca
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr120006.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-additional-options "-fipa-pta" } */
+
+char *b;
+int f = 1;
+
+char *xstrdup(char *i) {
+  char *c = __builtin_strdup(i);
+  if (!c)
+__builtin_exit(1);
+  return c;
+}
+
+int main() {
+  char g;
+  char h[8];
+
+  for (int i = 0; i < 2; i++) {
+char c = *__builtin_strdup("");
+b = &g;
+
+if (f) {
+  h[0] = '-';
+  h[1] = 'a';
+  h[2] = '\0';
+  b = xstrdup(h);
+   }
+  }
+  if (__builtin_strcmp(b, "-a") != 0)
+__builtin_abort();
+}
diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index f79b54284c6..3ad0c69930c 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -5583,6 +5583,42 @@ find_func_clobbers (struct function *fn, gimple *origt)
  process_ipa_clobber (fi, gimple_call_arg (t, 2));
  return;
}
+ /* The following functions use what their first argument
+points to.  */
+ case BUILT_IN_STRDUP:
+ case BUILT_IN_STRNDUP:
+ case BUILT_IN_REALLOC:
+ case BUILT_IN_INDEX:
+ case BUILT_IN_STRCHR:
+ case BUILT_IN_STRRCHR:
+ case BUILT_IN_MEMCHR:
+   {
+ tree src = gimple_call_arg (t, 0);
+ get_constraint_for_ptr_offset (src, NULL_TREE, &rhsc);
+ lhs = get_function_part_constraint (fi, fi_uses);
+ struct constraint_expr *rhsp;
+ FOR_EACH_VEC_ELT (rhsc, i, rhsp)
+   process_constraint (new_constraint (lhs, *rhsp));
+ return;
+   }
+ /* The following functions use what their first and second argument
+point to.  */
+ case BUILT_IN_STRSTR:
+ case BUILT_IN_STRPBRK:
+   {
+ tree src = gimple_call_arg (t, 0);
+ get_constraint_for_ptr_offset (src, NULL_TREE, &rhsc);
+ lhs = get_function_part_constraint (fi, fi_uses);
+ struct constraint_expr *rhsp;
+ FOR_EACH_VEC_ELT (rhsc, i, rhsp)
+   process_constraint (new_constraint (lhs, *rhsp));
+ rhsc.truncate (0);
+ src = gimple_call_arg (t, 1);
+ get_constraint_for_ptr_offset (src, NULL_TREE, &rhsc);
+ FOR_EACH_VEC_ELT (rhsc, i, rhsp)
+   process_constraint (new_constraint (lhs, *rhsp));
+ return;
+   }
  /* The following functions neither read nor clobber memory.  */
  case BUILT_IN_ASSUME_ALIGNED:
  case BUILT_IN_FREE:
-- 
2.43.0


[PATCH v6 1/4] libstdc++: Setup internal FTM for mdspan.

2025-04-30 Thread Luc Grosheintz
Uses the FTM infrastructure to create an internal feature testing macro
for partial availability of mdspan, which is then used to hide the
contents of the header mdspan when compiling against a standard prior to
C++23.

libstdc++-v3/ChangeLog:

* include/bits/version.def: Add internal feature testing macro
__glibcxx_mdspan.
* include/bits/version.h: Regenerate.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/bits/version.def | 9 +
 libstdc++-v3/include/bits/version.h   | 9 +
 2 files changed, 18 insertions(+)

diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index 737b3f421bf..a0b5553ed04 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -998,6 +998,15 @@ ftms = {
   };
 };
 
+ftms = {
+  name = mdspan;
+  no_stdname = true; // FIXME: remove
+  values = {
+v = 1; // FIXME: 202207
+cxxmin = 23;
+  };
+};
+
 ftms = {
   name = ssize;
   values = {
diff --git a/libstdc++-v3/include/bits/version.h 
b/libstdc++-v3/include/bits/version.h
index 59ff0cee043..1bb97847ce6 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -1115,6 +1115,15 @@
 #endif /* !defined(__cpp_lib_span) && defined(__glibcxx_want_span) */
 #undef __glibcxx_want_span
 
+#if !defined(__cpp_lib_mdspan)
+# if (__cplusplus >= 202100L)
+#  define __glibcxx_mdspan 1L
+#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_mdspan)
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_mdspan) && defined(__glibcxx_want_mdspan) */
+#undef __glibcxx_want_mdspan
+
 #if !defined(__cpp_lib_ssize)
 # if (__cplusplus >= 202002L)
 #  define __glibcxx_ssize 201902L
-- 
2.49.0



[PATCH v6 2/4] libstdc++: Add header mdspan to the build-system.

2025-04-30 Thread Luc Grosheintz
Creates a nearly empty header mdspan and adds it to the build-system and
Doxygen config file.

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in: Add .
* include/Makefile.am: Ditto.
* include/Makefile.in: Ditto.
* include/precompiled/stdc++.h: Ditto.
* include/std/mdspan: New file.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/doc/doxygen/user.cfg.in  |  1 +
 libstdc++-v3/include/Makefile.am  |  1 +
 libstdc++-v3/include/Makefile.in  |  1 +
 libstdc++-v3/include/precompiled/stdc++.h |  1 +
 libstdc++-v3/include/std/mdspan   | 48 +++
 5 files changed, 52 insertions(+)
 create mode 100644 libstdc++-v3/include/std/mdspan

diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in 
b/libstdc++-v3/doc/doxygen/user.cfg.in
index 19ae67a67ba..e926c6707f6 100644
--- a/libstdc++-v3/doc/doxygen/user.cfg.in
+++ b/libstdc++-v3/doc/doxygen/user.cfg.in
@@ -880,6 +880,7 @@ INPUT  = @srcdir@/doc/doxygen/doxygroups.cc 
\
  include/list \
  include/locale \
  include/map \
+ include/mdspan \
  include/memory \
  include/memory_resource \
  include/mutex \
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 537774c2668..1140fa0dffd 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -38,6 +38,7 @@ std_freestanding = \
${std_srcdir}/generator \
${std_srcdir}/iterator \
${std_srcdir}/limits \
+   ${std_srcdir}/mdspan \
${std_srcdir}/memory \
${std_srcdir}/numbers \
${std_srcdir}/numeric \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 7b96b2207f8..c96e981acd6 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -396,6 +396,7 @@ std_freestanding = \
${std_srcdir}/generator \
${std_srcdir}/iterator \
${std_srcdir}/limits \
+   ${std_srcdir}/mdspan \
${std_srcdir}/memory \
${std_srcdir}/numbers \
${std_srcdir}/numeric \
diff --git a/libstdc++-v3/include/precompiled/stdc++.h 
b/libstdc++-v3/include/precompiled/stdc++.h
index f4b312d9e47..e7d89c92704 100644
--- a/libstdc++-v3/include/precompiled/stdc++.h
+++ b/libstdc++-v3/include/precompiled/stdc++.h
@@ -228,6 +228,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
new file mode 100644
index 000..78a00a5aa52
--- /dev/null
+++ b/libstdc++-v3/include/std/mdspan
@@ -0,0 +1,48 @@
+//  -*- C++ -*-
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file mdspan
+ *  This is a Standard C++ Library header.
+ */
+
+#ifndef _GLIBCXX_MDSPAN
+#define _GLIBCXX_MDSPAN 1
+
+#ifdef _GLIBCXX_SYSHDR
+#pragma GCC system_header
+#endif
+
+#define __glibcxx_want_mdspan
+#include 
+
+#ifdef __glibcxx_mdspan
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+_GLIBCXX_END_NAMESPACE_VERSION
+}
+#endif
+#endif
-- 
2.49.0



[PATCH v6 4/4] libstdc++: Add tests for std::extents.

2025-04-30 Thread Luc Grosheintz
A prior commit added std::extents, this commit adds the tests. The bulk
is focussed on testing the constructors. These are split into three
groups:

1. the ctor from other extents and the copy ctor,
2. the ctor from a pack of integer-like objects,
3. the ctor from shapes, i.e. span and array.

For each group check that the ctor:
* produces an object with the expected values for extent,
* is implicit if and only if required,
* is constexpr,
* doesn't change the rank of the extent.
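
The properties listed above can be sketched in a few lines.  This is not taken
from the new test files; it is a minimal illustration that assumes a standard
library which defines __cpp_lib_mdspan, and it falls back to a stub otherwise
so that it still compiles on older toolchains.  The names dyn and
check_extents are chosen here for illustration.

```cpp
#include <cassert>
#if __has_include(<version>)
# include <version>
#endif

#ifdef __cpp_lib_mdspan
# include <mdspan>
# include <span>
# include <type_traits>

constexpr auto dyn = std::dynamic_extent;

// Implicit if and only if required: static -> dynamic converts
// implicitly, dynamic -> static needs an explicit construction.
static_assert (std::is_convertible_v<std::extents<int, 2>,
				     std::extents<int, dyn>>);
static_assert (!std::is_convertible_v<std::extents<int, dyn>,
				      std::extents<int, 2>>);
static_assert (std::is_constructible_v<std::extents<int, 2>,
				       std::extents<int, dyn>>);

// Expected values and unchanged rank, checked in a constexpr context.
constexpr bool
check_extents ()
{
  std::extents<int, 2, dyn> e (5);  // only the dynamic extent is supplied
  return e.rank () == 2 && e.rank_dynamic () == 1
	 && e.extent (0) == 2 && e.extent (1) == 5;
}
static_assert (check_extents ());
#else
// No <mdspan> support detected: nothing to check in this configuration.
constexpr bool
check_extents ()
{ return true; }
#endif

// Runtime counterpart of the compile-time check.
inline void
extents_demo ()
{ assert (check_extents ()); }
```

Running the same function both at run time and inside a static_assert mirrors
the pattern used by test_all in the ctor_copy.cc test quoted below.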

libstdc++-v3/ChangeLog:

* testsuite/23_containers/mdspan/extents/class_mandates_neg.cc: New 
test.
* testsuite/23_containers/mdspan/extents/ctor_copy.cc: New test.
* testsuite/23_containers/mdspan/extents/ctor_ints.cc: New test.
* testsuite/23_containers/mdspan/extents/ctor_shape.cc: New test.
* testsuite/23_containers/mdspan/extents/custom_integer.cc: New test.
* testsuite/23_containers/mdspan/extents/misc.cc: New test.

Signed-off-by: Luc Grosheintz 
---
 .../mdspan/extents/class_mandates_neg.cc  |   8 +
 .../23_containers/mdspan/extents/ctor_copy.cc |  82 +++
 .../23_containers/mdspan/extents/ctor_ints.cc |  62 +
 .../mdspan/extents/ctor_shape.cc  | 160 +
 .../mdspan/extents/custom_integer.cc  |  87 +++
 .../23_containers/mdspan/extents/misc.cc  | 224 ++
 6 files changed, 623 insertions(+)
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_ints.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_shape.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/custom_integer.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/misc.cc

diff --git 
a/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc
new file mode 100644
index 000..b654e3920a8
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc
@@ -0,0 +1,8 @@
+// { dg-do compile { target c++23 } }
+#include
+
+std::extents e1; // { dg-error "from here" }
+std::extents e2;// { dg-error "from here" }
+// { dg-prune-output "dynamic or representable as _IndexType" }
+// { dg-prune-output "must be integral" }
+// { dg-prune-output "invalid use of incomplete type" }
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy.cc
new file mode 100644
index 000..a7b3a169301
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy.cc
@@ -0,0 +1,82 @@
+// { dg-do run { target c++23 } }
+#include 
+
+#include 
+
+// Test the copy ctor and the ctor from other extents.
+
+constexpr auto dyn = std::dynamic_extent;
+
+// Not constructible
+static_assert(!std::is_constructible_v,
+  std::extents>);
+
+static_assert(!std::is_constructible_v,
+  std::extents>);
+
+static_assert(!std::is_constructible_v,
+  std::extents>);
+
+static_assert(!std::is_constructible_v,
+  std::extents>);
+
+// Nothrow constructible
+static_assert(std::is_nothrow_constructible_v,
+ std::extents>);
+static_assert(std::is_nothrow_constructible_v,
+ std::extents>);
+
+// Implicit conversion
+static_assert(!std::is_convertible_v,
+std::extents>);
+static_assert(std::is_convertible_v,
+   std::extents>);
+
+static_assert(!std::is_convertible_v,
+std::extents>);
+static_assert(std::is_convertible_v,
+   std::extents>);
+
+static_assert(!std::is_convertible_v,
+std::extents>);
+static_assert(std::is_convertible_v,
+   std::extents>);
+
+static_assert(!std::is_convertible_v,
+std::extents>);
+static_assert(std::is_convertible_v,
+   std::extents>);
+
+template
+  constexpr void
+  test_ctor(const Other& other)
+  {
+auto e = std::extents(other);
+VERIFY(e == other);
+  }
+
+constexpr int
+test_all()
+{
+  auto e0 = std::extents();
+  test_ctor(e0);
+
+  auto e1 = std::extents();
+  test_ctor(e1);
+  test_ctor(e1);
+  test_ctor(e1);
+
+  auto e2 = std::extents{1, 2, 3};
+  test_ctor(e2);
+  test_ctor(e2);
+  test_ctor(e2);
+  return true;
+}
+
+int
+main()
+{
+  test_all();
+  static_assert(test_all());
+  return 0;
+}
diff --git a/libstdc++-v3/testsuite/23_containers

[PATCH v6 3/4] libstdc++: Implement std::extents [PR107761].

2025-04-30 Thread Luc Grosheintz
This implements std::extents from  according to N4950 and
contains partial progress towards PR107761.

If an extent changes its type, there's a precondition in the standard,
that the value is representable in the target integer type. This
precondition is not checked at runtime.

The precondition for 'extents::{static_,}extent' is that '__r < rank()'.
For extents of rank zero this precondition is always violated and results
in a call to __builtin_trap. For all other specializations it's checked via
__glibcxx_assert.
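
To illustrate the bookkeeping behind 'extent(__r)', here is a minimal standalone sketch of the prefix-count table that the patch calls _S_dynamic_index. It is not the libstdc++ code: the names dyn_ext, dynamic_index and extent_at are hypothetical, dyn_ext is only a stand-in for std::dynamic_extent, and plain std::array arguments replace the template parameter pack.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for std::dynamic_extent (assumption: same sentinel value).
constexpr std::size_t dyn_ext = static_cast<std::size_t>(-1);

// For each r in [0, N], count the dynamic extents strictly before r.
// The last entry is therefore the total number of dynamic extents.
template<std::size_t N>
constexpr std::array<std::size_t, N + 1>
dynamic_index(const std::array<std::size_t, N>& exts)
{
  std::array<std::size_t, N + 1> ret{};
  std::size_t dyn = 0;
  for (std::size_t i = 0; i < N; ++i)
    {
      ret[i] = dyn;
      dyn += (exts[i] == dyn_ext) ? 1 : 0;
    }
  ret[N] = dyn;
  return ret;
}

// Look up extent r: static extents are read from the table itself,
// dynamic ones from the compact storage holding only the dynamic values.
template<std::size_t N, std::size_t D>
constexpr std::size_t
extent_at(const std::array<std::size_t, N>& exts,
          const std::array<std::size_t, D>& dyn_storage, std::size_t r)
{
  assert(r < N);  // mirrors the '__r < rank()' precondition
  if (exts[r] == dyn_ext)
    return dyn_storage[dynamic_index(exts)[r]];
  return exts[r];
}
```

With the static table {3, dyn_ext, dyn_ext} and dynamic storage {5, 7}, the lookups for r = 0, 1, 2 yield 3, 5 and 7. For a rank-zero table no value of r can satisfy the assertion, which is the case described above.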

PR libstdc++/107761

libstdc++-v3/ChangeLog:

* include/std/mdspan (extents): New class.
* src/c++23/std.cc.in: Add 'using std::extents'.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan  | 261 +++
 libstdc++-v3/src/c++23/std.cc.in |   9 +-
 2 files changed, 269 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 78a00a5aa52..aee96dda7cd 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -33,6 +33,12 @@
 #pragma GCC system_header
 #endif
 
+#include 
+#include 
+#include 
+#include 
+#include 
+
 #define __glibcxx_want_mdspan
 #include 
 
@@ -41,6 +47,261 @@
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
+  namespace __mdspan
+  {
+template
+  class _ExtentsStorage
+  {
+  public:
+   static consteval bool
+   _S_is_dyn(size_t __ext) noexcept
+   { return __ext == dynamic_extent; }
+
+   template
+ static constexpr _IndexType
+ _S_int_cast(const _OIndexType& __other) noexcept
+ { return _IndexType(__other); }
+
+   static constexpr size_t _S_rank = _Extents.size();
+
+   // For __r in [0, _S_rank], _S_dynamic_index[__r] is the number
+   // of dynamic extents up to (and not including) __r.
+   //
+   // If __r is the index of a dynamic extent, then
+   // _S_dynamic_index[__r] is the index of that extent in
+   // _M_dynamic_extents.
+   static constexpr auto _S_dynamic_index = [] consteval
+   {
+ array __ret;
+ size_t __dyn = 0;
+ for(size_t __i = 0; __i < _S_rank; ++__i)
+   {
+ __ret[__i] = __dyn;
+ __dyn += _S_is_dyn(_Extents[__i]);
+   }
+ __ret[_S_rank] = __dyn;
+ return __ret;
+   }();
+
+   static constexpr size_t _S_rank_dynamic = _S_dynamic_index[_S_rank];
+
+   // For __r in [0, _S_rank_dynamic), _S_dynamic_index_inv[__r] is the
+   // index of the __r-th dynamic extent in _Extents.
+   static constexpr auto _S_dynamic_index_inv = [] consteval
+   {
+ array __ret;
+ for (size_t __i = 0, __r = 0; __i < _S_rank; ++__i)
+   if (_S_is_dyn(_Extents[__i]))
+ __ret[__r++] = __i;
+ return __ret;
+   }();
+
+   static constexpr size_t
+   _S_static_extent(size_t __r) noexcept
+   { return _Extents[__r]; }
+
+   constexpr _IndexType
+   _M_extent(size_t __r) const noexcept
+   {
+ auto __se = _Extents[__r];
+ if (__se == dynamic_extent)
+   return _M_dynamic_extents[_S_dynamic_index[__r]];
+ else
+   return __se;
+   }
+
+   template
+ constexpr void
+ _M_init_dynamic_extents(_GetOtherExtent __get_extent) noexcept
+ {
+   for(size_t __i = 0; __i < _S_rank_dynamic; ++__i)
+ {
+   size_t __di = __i;
+   if constexpr (_OtherRank != _S_rank_dynamic)
+ __di = _S_dynamic_index_inv[__i];
+   _M_dynamic_extents[__i] = _S_int_cast(__get_extent(__di));
+ }
+ }
+
+   constexpr
+   _ExtentsStorage() noexcept = default;
+
+   template
+ constexpr
+ _ExtentsStorage(const _ExtentsStorage<_OIndexType, _OExtents>&
+ __other) noexcept
+ {
+   _M_init_dynamic_extents<_S_rank>([&__other](size_t __i)
+ { return __other._M_extent(__i); });
+ }
+
+   template
+ constexpr
+ _ExtentsStorage(span __exts) noexcept
+ {
+   _M_init_dynamic_extents<_Nm>(
+ [&__exts](size_t __i) -> const _OIndexType&
+ { return __exts[__i]; });
+ }
+
+  private:
+   using _S_storage = __array_traits<_IndexType, _S_rank_dynamic>::_Type;
+   [[no_unique_address]] _S_storage _M_dynamic_extents;
+  };
+
+template
+  concept __valid_index_type =
+   is_convertible_v<_OIndexType, _SIndexType> &&
+   is_nothrow_constructible_v<_SIndexType, _OIndexType>;
+
+template
+  concept
+  __valid_static_extent = _Extent == dynamic_extent
+   || _Extent <= numeric_limits<_IndexType>::max();
+  }
+
+  template
+class extents
+{
+  static_assert(is_integral_v<_IndexType>, "_IndexType must be integral.");
+  static_assert(
+ (_

[PATCH v6 0/4] Implement extents from the mdspan header.

2025-04-30 Thread Luc Grosheintz
This is the sixth iteration and replaces:
https://gcc.gnu.org/pipermail/libstdc++/2025-April/061190.html

Changes since v5:
* Removed superfluous braces.
* Fixed std.cc.in
* Fixed Copyright statement.

Any layout related code has been removed from this patch
series.

Luc Grosheintz (4):
  libstdc++: Setup internal FTM for mdspan.
  libstdc++: Add header mdspan to the build-system.
  libstdc++: Implement std::extents [PR107761].
  libstdc++: Add tests for std::extents.

 libstdc++-v3/doc/doxygen/user.cfg.in  |   1 +
 libstdc++-v3/include/Makefile.am  |   1 +
 libstdc++-v3/include/Makefile.in  |   1 +
 libstdc++-v3/include/bits/version.def |   9 +
 libstdc++-v3/include/bits/version.h   |   9 +
 libstdc++-v3/include/precompiled/stdc++.h |   1 +
 libstdc++-v3/include/std/mdspan   | 309 ++
 libstdc++-v3/src/c++23/std.cc.in  |   9 +-
 .../mdspan/extents/class_mandates_neg.cc  |   8 +
 .../23_containers/mdspan/extents/ctor_copy.cc |  82 +
 .../23_containers/mdspan/extents/ctor_ints.cc |  62 
 .../mdspan/extents/ctor_shape.cc  | 160 +
 .../mdspan/extents/custom_integer.cc  |  87 +
 .../23_containers/mdspan/extents/misc.cc  | 224 +
 14 files changed, 962 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/include/std/mdspan
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_ints.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_shape.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/custom_integer.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/misc.cc

-- 
2.49.0



[PATCH] libgcc: Cleanup HWCAP defines in cpuinfo.c

2025-04-30 Thread Wilco Dijkstra

Clean up HWCAP defines - rather than including hwcap.h and then repeating it
using #ifndef, just define the HWCAPs we need exactly as in hwcap.h.

libgcc:
* config/aarch64/cpuinfo.c: Cleanup HWCAP defines.

---

diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
index 
14877f5d8410a51fe47f94cb1a2ff5399f95c253..a4958a00fb53d77a0c2291337543324951ca20df
 100644
--- a/libgcc/config/aarch64/cpuinfo.c
+++ b/libgcc/config/aarch64/cpuinfo.c
@@ -37,211 +37,64 @@ typedef struct __ifunc_arg_t {
 } __ifunc_arg_t;
 #endif
 
-#if __has_include()
-#include 
-
 /* Architecture features used in Function Multi Versioning.  */
 struct {
   unsigned long long features;
   /* As features grows new fields could be added.  */
 } __aarch64_cpu_features __attribute__((visibility("hidden"), nocommon));
 
-#ifndef _IFUNC_ARG_HWCAP
 #define _IFUNC_ARG_HWCAP (1ULL << 62)
-#endif
-#ifndef AT_HWCAP
 #define AT_HWCAP 16
-#endif
-#ifndef HWCAP_FP
-#define HWCAP_FP (1 << 0)
-#endif
-#ifndef HWCAP_ASIMD
-#define HWCAP_ASIMD (1 << 1)
-#endif
-#ifndef HWCAP_EVTSTRM
-#define HWCAP_EVTSTRM (1 << 2)
-#endif
-#ifndef HWCAP_AES
-#define HWCAP_AES (1 << 3)
-#endif
-#ifndef HWCAP_PMULL
-#define HWCAP_PMULL (1 << 4)
-#endif
-#ifndef HWCAP_SHA1
-#define HWCAP_SHA1 (1 << 5)
-#endif
-#ifndef HWCAP_SHA2
-#define HWCAP_SHA2 (1 << 6)
-#endif
-#ifndef HWCAP_CRC32
-#define HWCAP_CRC32 (1 << 7)
-#endif
-#ifndef HWCAP_ATOMICS
-#define HWCAP_ATOMICS (1 << 8)
-#endif
-#ifndef HWCAP_FPHP
-#define HWCAP_FPHP (1 << 9)
-#endif
-#ifndef HWCAP_ASIMDHP
-#define HWCAP_ASIMDHP (1 << 10)
-#endif
-#ifndef HWCAP_CPUID
-#define HWCAP_CPUID (1 << 11)
-#endif
-#ifndef HWCAP_ASIMDRDM
-#define HWCAP_ASIMDRDM (1 << 12)
-#endif
-#ifndef HWCAP_JSCVT
-#define HWCAP_JSCVT (1 << 13)
-#endif
-#ifndef HWCAP_FCMA
-#define HWCAP_FCMA (1 << 14)
-#endif
-#ifndef HWCAP_LRCPC
-#define HWCAP_LRCPC (1 << 15)
-#endif
-#ifndef HWCAP_DCPOP
-#define HWCAP_DCPOP (1 << 16)
-#endif
-#ifndef HWCAP_SHA3
-#define HWCAP_SHA3 (1 << 17)
-#endif
-#ifndef HWCAP_SM3
-#define HWCAP_SM3 (1 << 18)
-#endif
-#ifndef HWCAP_SM4
-#define HWCAP_SM4 (1 << 19)
-#endif
-#ifndef HWCAP_ASIMDDP
-#define HWCAP_ASIMDDP (1 << 20)
-#endif
-#ifndef HWCAP_SHA512
-#define HWCAP_SHA512 (1 << 21)
-#endif
-#ifndef HWCAP_SVE
-#define HWCAP_SVE (1 << 22)
-#endif
-#ifndef HWCAP_ASIMDFHM
-#define HWCAP_ASIMDFHM (1 << 23)
-#endif
-#ifndef HWCAP_DIT
-#define HWCAP_DIT (1 << 24)
-#endif
-#ifndef HWCAP_ILRCPC
-#define HWCAP_ILRCPC (1 << 26)
-#endif
-#ifndef HWCAP_FLAGM
-#define HWCAP_FLAGM (1 << 27)
-#endif
-#ifndef HWCAP_SSBS
-#define HWCAP_SSBS (1 << 28)
-#endif
-#ifndef HWCAP_SB
-#define HWCAP_SB (1 << 29)
-#endif
-#ifndef HWCAP_PACA
-#define HWCAP_PACA (1 << 30)
-#endif
-#ifndef HWCAP_PACG
-#define HWCAP_PACG (1UL << 31)
-#endif
-
-#ifndef AT_HWCAP2
 #define AT_HWCAP2 26
-#endif
-#ifndef HWCAP2_DCPODP
-#define HWCAP2_DCPODP (1 << 0)
-#endif
-#ifndef HWCAP2_SVE2
-#define HWCAP2_SVE2 (1 << 1)
-#endif
-#ifndef HWCAP2_SVEAES
-#define HWCAP2_SVEAES (1 << 2)
-#endif
-#ifndef HWCAP2_SVEPMULL
-#define HWCAP2_SVEPMULL (1 << 3)
-#endif
-#ifndef HWCAP2_SVEBITPERM
-#define HWCAP2_SVEBITPERM (1 << 4)
-#endif
-#ifndef HWCAP2_SVESHA3
-#define HWCAP2_SVESHA3 (1 << 5)
-#endif
-#ifndef HWCAP2_SVESM4
-#define HWCAP2_SVESM4 (1 << 6)
-#endif
-#ifndef HWCAP2_FLAGM2
-#define HWCAP2_FLAGM2 (1 << 7)
-#endif
-#ifndef HWCAP2_FRINT
-#define HWCAP2_FRINT (1 << 8)
-#endif
-#ifndef HWCAP2_SVEI8MM
-#define HWCAP2_SVEI8MM (1 << 9)
-#endif
-#ifndef HWCAP2_SVEF32MM
-#define HWCAP2_SVEF32MM (1 << 10)
-#endif
-#ifndef HWCAP2_SVEF64MM
-#define HWCAP2_SVEF64MM (1 << 11)
-#endif
-#ifndef HWCAP2_SVEBF16
-#define HWCAP2_SVEBF16 (1 << 12)
-#endif
-#ifndef HWCAP2_I8MM
-#define HWCAP2_I8MM (1 << 13)
-#endif
-#ifndef HWCAP2_BF16
-#define HWCAP2_BF16 (1 << 14)
-#endif
-#ifndef HWCAP2_DGH
-#define HWCAP2_DGH (1 << 15)
-#endif
-#ifndef HWCAP2_RNG
-#define HWCAP2_RNG (1 << 16)
-#endif
-#ifndef HWCAP2_BTI
-#define HWCAP2_BTI (1 << 17)
-#endif
-#ifndef HWCAP2_MTE
-#define HWCAP2_MTE (1 << 18)
-#endif
-#ifndef HWCAP2_RPRES
-#define HWCAP2_RPRES (1 << 21)
-#endif
-#ifndef HWCAP2_MTE3
-#define HWCAP2_MTE3 (1 << 22)
-#endif
-#ifndef HWCAP2_SME
-#define HWCAP2_SME (1 << 23)
-#endif
-#ifndef HWCAP2_SME_I16I64
-#define HWCAP2_SME_I16I64 (1 << 24)
-#endif
-#ifndef HWCAP2_SME_F64F64
-#define HWCAP2_SME_F64F64 (1 << 25)
-#endif
-#ifndef HWCAP2_WFXT
-#define HWCAP2_WFXT (1UL << 31)
-#endif
-#ifndef HWCAP2_EBF16
-#define HWCAP2_EBF16 (1UL << 32)
-#endif
-#ifndef HWCAP2_SVE_EBF16
-#define HWCAP2_SVE_EBF16 (1UL << 33)
-#endif
-#ifndef HWCAP2_CSSC
-#define HWCAP2_CSSC (1UL << 34)
-#endif
-#ifndef HWCAP2_SME2
-#define HWCAP2_SME2 (1UL << 37)
-#endif
-#ifndef HWCAP2_MOPS
-#define HWCAP2_MOPS (1UL << 43)
-#endif
-#ifndef HWCAP2_LRCPC3
-#define HWCAP2_LRCPC3  (1UL << 46)
-#endif
+
+#define HWCAP_FP   (1 << 0)
+#define HWCAP_ASIMD(1 << 1)
+#define HWCAP_PMULL(1 << 4)
+#define HWCAP_SHA2 (1 << 6)
+#define HWCAP_CRC32  

Re: [RFC 0/3] Use automatic make dependencies in aarch64

2025-04-30 Thread Richard Sandiford
Alice Carlotti  writes:
> On Tue, Apr 29, 2025 at 02:47:21PM +0100, Alice Carlotti wrote:
>> This demonstrates a clear benefit to making the makefile rules automatic. I
>> thought this might be quite tricky, but it turns out to be fairly
>> straightforward.
>
> Actually, it turns out I missed at least one more thing that's needed, so the
> first two patches combined don't even build cleanly.  The issue is that
> dependencies on generated files need some mechanism to ensure that the
> generated files are available before their dependants are built during a clean
> build.  This means that I can't just delete the dependency
>
> aarch64-builtins.o: aarch64-builtin-iterators.h
>
>
> Many other generated files are currently specified as prerequisites via the
> rule:
> $(ALL_HOST_OBJS) : | $(generated_files)
>
> I think it would make sense to include the backend generated files into this
> variable.  Currently some backend files are included, but this is done using
> the TM_H, TM_P_H, TM_D_H and TM_RUST_H variables, which looks like a
> misuse of these variables.
>
> The intended meaning/use of the TM_* variables is also unclear.  As far as I
> can tell, it looks like they should list the dependencies of the corresponding
> files generated by mkconfig, of which the direct includes are added
> automatically, but this isn't quite consistent with the current values in
> t-aarch64.

Which files are you thinking of when you say that those macros are being
misused, and that the current values in t-aarch64 aren't consistent with
the intended usage?  It looks from a quick glance at:

TM_H += $(srcdir)/config/aarch64/aarch64-fusion-pairs.def \
$(srcdir)/config/aarch64/aarch64-tuning-flags.def \
$(srcdir)/config/aarch64/aarch64-option-extensions.def \
$(srcdir)/config/aarch64/aarch64-cores.def \
$(srcdir)/config/aarch64/aarch64-isa-modes.def \
$(srcdir)/config/aarch64/aarch64-arches.def

that aarch64-fusion-pairs.def and aarch64-tuning-flags.def could be put
in TM_P_H instead of TM_H, since they are included via aarch64-protos.h
but not (apparently) via aarch64.h.  But the others look correct.

> Another related observation is that aarch64-builtin-iterators.h is missing
> from MOSTLYCLEANFILES, so it isn't removed in a clean build.  It ought to be
> included, and I think it would be good if we could use the same list (or
> mostly the same list) of generated files for both the order-only prerequisite
> rule and in MOSTLYCLEANFILES.

I agree that it would be good for the common subset to be specified once
rather than twice.  But generated_files includes things generated by
configure, which shouldn't be removed by "make mostlyclean".
And MOSTLYCLEANFILES includes .cc files, which shouldn't be
included in the ordering dependency.  And I think some targets
use tools to generate installed headers, which also shouldn't be
included in the ordering dependency.

So I suppose this amounts to a macro that means "generated header files
that are included by host code".  I'm not going to suggest a name for that :)

Richard


Re: [PATCH] libstdc++: Rewrite atomic builtin checks [PR70560]

2025-04-30 Thread Jonathan Wakely
On Wed, 30 Apr 2025 at 12:12, Tomasz Kaminski  wrote:
>
>
>
> On Tue, Apr 29, 2025 at 10:11 PM Jonathan Wakely  wrote:
>>
>> Currently the GLIBCXX_ENABLE_ATOMIC_BUILTINS macro checks for a variety
>> of __atomic built-ins for bool, short and int. If all those checks pass,
>> then it defines _GLIBCXX_ATOMIC_BUILTINS and uses the definitions from
>> config/cpu/generic/atomicity_builtins/atomicity.h for the non-inline
>> versions of __exchange_and_add and __atomic_add that get compiled into
>> libsupc++.
>>
>> However, the config/cpu/generic/atomicity_builtins/atomicity.h
>> definitions only depend on __atomic_fetch_add not on
>> __atomic_test_and_set or __atomic_compare_exchange. And they only
>> operate on a variable of type _Atomic_word, which is not necessarily one
>> of bool, short or int (e.g. for sparcv9 _Atomic_word is 64-bit long).
>>
>> This means that for a target where _Atomic_word is int but there are no
>> 1-byte or 2-byte atomic instructions, GLIBCXX_ENABLE_ATOMIC_BUILTINS
>> will fail the checks for bool and short and not define the macro
>> _GLIBCXX_ATOMIC_BUILTINS. That means that we will use a single global
>> mutex for reference counting in the COW std::string and std::locale,
>> even though we could use __atomic_fetch_add to do it lock-free.
>>
>> This commit removes most of the GLIBCXX_ENABLE_ATOMIC_BUILTINS checks,
>> so that it only checks __atomic_fetch_add on _Atomic_word. This will
>> enable the atomic versions of __exchange_and_add and __atomic_add for
>> more targets. This is not an ABI change, because for targets which
>> didn't previously use the atomic definitions of those functions, they
>> always make a non-inlined call to the functions in the library. If the
>> definitions of those functions now start using atomics, that doesn't
>> change the semantics for the code calling those functions.
>>
>> On affected targets, new code compiled after this change will see the
>> _GLIBCXX_ATOMIC_BUILTINS macro and so will use the always-inline
>> versions of __exchange_and_add and __atomic_add, which use
>> __atomic_fetch_add directly. That is also compatible with older code
>> which calls the non-inline definitions, because those non-inline
>> definitions now also use __atomic_fetch_add.
>>
>> The only configuration where this could be an ABI change is for a target
>> which currently defines _GLIBCXX_ATOMIC_BUILTINS (because all the atomic
>> built-ins for bool, short and int are supported), but which defines
>> _Atomic_word to some other type for which __atomic_fetch_add is _not_
>> supported. For such a target, we would previously have used inline calls
>> to __atomic_fetch_add, which would have depended on libatomic. After
>> this commit, we would make non-inline calls into the library where
>> __exchange_and_add and __atomic_add would use the global mutex. That
>> would be an ABI break. I don't consider that a realistic scenario,
>> because it wouldn't have made any sense to define _Atomic_word to a
>> wider type than int, when doing so would have required libatomic to make
>> libstdc++.so work. Surely such a target would have just used int for its
>> _Atomic_word type.
>>
>> The GLIBCXX_ENABLE_BACKTRACE macro currently uses the
>> glibcxx_ac_atomic_int macro defined by the checks that this commit
>> removes from GLIBCXX_ENABLE_ATOMIC_BUILTINS. That wasn't a good check
>> anyway, because libbacktrace actually depends on atomic loads+stores for
>> pointers as well as int, and for atomic stores for size_t. This commit
>> replaces the glibcxx_ac_atomic_int check with a proper test for all the
>> required atomic operations on all three of int, void* and size_t. This
>> ensures that the libbacktrace code used for std::stacktrace will either
>> use native atomics, or implement those loads and stores only in terms of
>> __sync_bool_compare_and_swap (possibly requiring that to come from
>> libatomic or elsewhere).
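
As a rough illustration of why only __atomic_fetch_add on the word type matters for the reference-counting helpers, here is a minimal sketch. It is not the libstdc++ source: atomic_word is a stand-in for _Atomic_word (assumed to be int here), the function names are simplified, and the memory order is an assumption.

```cpp
#include <cassert>

// Stand-in for _Atomic_word; on some targets the real type is wider than int.
using atomic_word = int;

// Adds val to *mem and returns the value *mem held before the addition.
// Only the __atomic_fetch_add built-in is needed, on this one type.
inline atomic_word
exchange_and_add(volatile atomic_word* mem, int val)
{ return __atomic_fetch_add(mem, val, __ATOMIC_ACQ_REL); }

// Same operation when the caller does not need the previous value.
inline void
atomic_add(volatile atomic_word* mem, int val)
{ __atomic_fetch_add(mem, val, __ATOMIC_ACQ_REL); }
```

Neither helper touches bool or short, nor uses test-and-set or compare-exchange, so a configure check covering those operations is stricter than this code needs.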
>>
>> libstdc++-v3/ChangeLog:
>>
>> PR libstdc++/70560
>> PR libstdc++/119667
>> * acinclude.m4 (GLIBCXX_ENABLE_ATOMIC_BUILTINS): Only check for
>> __atomic_fetch_add on _Atomic_word.
>> (GLIBCXX_ENABLE_BACKTRACE): Check for __atomic_load_n and
>> __atomic_store_n on int, void* and size_t.
>> * config.h.in: Regenerate.
>> * configure: Regenerate.
>> * configure.host: Fix typo in comment.
>> ---
>>
>> Tested x86_64-linux, no changes to the c++config.h results.
>> I need to do more testing on other targets.
>
> I would rename _GLIBCXX_ATOMIC_BUILTINS to _GLIBCXX_ATOMIC_WORLD_BUILTINS,
> to better reflect the new reality. I have checked, and it seems to be a change
> to one file, libstdc++-v3/include/ext/atomicity.h. And if I am mistaken, then
> the rename will show all affected files in the commit.

That seems like a good suggestion. There's a small chance of it
breaking user code that is being naughty and referring to that macro,
e.g. this advises to unset it:

https://sources.debian.org/src/guitarix/0.46.0+dfsg-1/src/NAM/Ne

Re: [PUSHED] vectorizer: Fix riscv build [PR120042]

2025-04-30 Thread Jeff Law




On 4/30/25 4:13 PM, Andrew Pinski wrote:

r15-9859-ga6cfde60d8c added a call to dominated_by_p to tree-vectorizer.h,
but dominance.h is not always included, so you get a build failure on riscv
when building riscv-vector-costs.cc.

Let's add the include of dominance.h to tree-vectorizer.h

Pushed as obvious after builds for riscv and x86_64.

gcc/ChangeLog:

* tree-vectorizer.h: Include dominance.h.

Thanks.  I had that in my local tree, but hadn't pushed it yet.  Caught
some bug that's had me knocked out of commission for the better part of
a week now :(


jeff



Re: [PATCH v1][GCC16-Stage-1] RISC-V: Remove unnecessary frm restore volatile define_insn

2025-04-30 Thread Jeff Law




On 4/30/25 6:03 PM, Li, Pan2 wrote:

Hi Vineet,


Sorry this got backed up as I'm working on FRM overhaul - if this is not super
urgent can you please wait for a few weeks for my work to be posted.
If you prefer this go in still, fine by me as well.


Sure thing, feel free to ping me if there is something I can help.

I put Pan's patch onto the deferred list.

Jeff



Re: [PATCH v1][GCC16-Stage-1] RISC-V: Remove unnecessary frm restore volatile define_insn

2025-04-30 Thread Vineet Gupta


On 4/30/25 20:44, Jeff Law wrote:
> On 4/30/25 6:03 PM, Li, Pan2 wrote:
>>> Sorry this got backed up as I'm working on FRM overhaul - if this is not
>>> super urgent can you please wait for a few weeks for my work to be posted.
>>> If you prefer this go in still, fine by me as well.
>> Sure thing, feel free to ping me if there is something I can help.
> I put Pan's patch onto the deferred list.

Thx Jeff and Pan.

-Vineet




[PATCH] phiopt: Remove special case for a sequence after match and simplify for early phiopt

2025-04-30 Thread Andrew Pinski
r16-189-g99aa410f5e0a72 fixed the case where match-and-simplify produced an
extra assignment inside the returned sequence. phiopt_early_allow had code to
work around that issue; that code can now be removed, which simplifies the
check down to only allowing a sequence with a single MIN/MAX statement when
the outer code is also MIN/MAX.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (phiopt_early_allow): Only allow a sequence
with one statement for MIN/MAX and the op was MIN/MAX.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-phiopt.cc | 16 ++--
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index e27166c55a5..54ecd93495a 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -549,8 +549,7 @@ phiopt_early_allow (gimple_seq &seq, gimple_match_op &op)
   tree_code code = (tree_code)op.code;
 
   /* For non-empty sequence, only allow one statement
- except for MIN/MAX, allow max 2 statements,
- each with MIN/MAX.  */
+ a MIN/MAX and an original MIN/MAX.  */
   if (!gimple_seq_empty_p (seq))
 {
   if (code == MIN_EXPR || code == MAX_EXPR)
@@ -565,18 +564,7 @@ phiopt_early_allow (gimple_seq &seq, gimple_match_op &op)
  code = gimple_assign_rhs_code (stmt);
  return code == MIN_EXPR || code == MAX_EXPR;
}
-  /* Check to make sure op was already a SSA_NAME.  */
-  if (code != SSA_NAME)
-   return false;
-  if (!gimple_seq_singleton_p (seq))
-   return false;
-  gimple *stmt = gimple_seq_first_stmt (seq);
-  /* Only allow assignments.  */
-  if (!is_gimple_assign (stmt))
-   return false;
-  if (gimple_assign_lhs (stmt) != op.ops[0])
-   return false;
-  code = gimple_assign_rhs_code (stmt);
+  return false;
 }
 
   switch (code)
-- 
2.43.0



[pushed: r16-314] sarif output: introduce sarif_serialization_format

2025-04-30 Thread David Malcolm
The SARIF 2.1.0 spec says that although a "SARIF log file SHALL contain
a serialization of the SARIF object model into the JSON format ... in the
future, other serializations might be defined." (§3.1)

I've been experimenting with alternative serializations of SARIF (CBOR
and JSON5 for now).  To help with these experiments, this patch adds a
new param "serialization" to -fdiagnostics-add-output='s "sarif" scheme.

For now this must have value "json", but will be helpful for any
followup patches.
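
The shape of the change can be sketched as follows. This is a simplified, hypothetical illustration, not the GCC code: the real classes operate on json::value and live in diagnostic-format-sarif.{h,cc}, whereas here a std::string stands in for the top-level JSON object and the class names are shortened.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>
#include <string>
#include <utility>

// Abstract interface: one virtual function per way of writing the log.
class serialization_format
{
public:
  virtual ~serialization_format() = default;
  virtual void write_to_file(std::FILE* outf, const std::string& top) = 0;
};

// JSON flavour; 'formatted' only toggles a leading indent in this toy version.
class serialization_format_json : public serialization_format
{
public:
  explicit serialization_format_json(bool formatted)
  : m_formatted(formatted) {}

  void write_to_file(std::FILE* outf, const std::string& top) override
  { std::fprintf(outf, "%s%s\n", m_formatted ? "  " : "", top.c_str()); }

private:
  bool m_formatted;
};

// The builder owns a format object instead of a bare 'formatted' flag,
// so other serializations can be plugged in without touching it.
class builder
{
public:
  explicit builder(std::unique_ptr<serialization_format> fmt)
  : m_fmt(std::move(fmt)) {}

  void flush_to_file(std::FILE* outf)
  { m_fmt->write_to_file(outf, "{\"version\": \"2.1.0\"}"); }

private:
  std::unique_ptr<serialization_format> m_fmt;
};
```

Replacing the bool with an owned polymorphic object means a later CBOR or JSON5 writer would only need a new subclass.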

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r16-314-ge504a59bd149f8.

gcc/ChangeLog:
* diagnostic-format-sarif.cc
(sarif_serialization_format_json::write_to_file): New.
(sarif_builder::m_formatted): Replace field with...
(sarif_builder::m_serialization_format): ...this.
(sarif_builder::sarif_builder): Update for field change.
(sarif_builder::flush_to_file): Call m_serialization_format's
write_to_file vfunc.
(sarif_output_format::sarif_output_format): Replace param
"formatted" with "serialization_format".
(sarif_stream_output_format::sarif_output_format): Likewise.
(sarif_file_output_format::sarif_file_output_format): Likewise.
(diagnostic_output_format_init_sarif_stderr): Make a
sarif_serialization_format_json and pass it to
diagnostic_output_format_init_sarif.
(diagnostic_output_format_open_sarif_file): Split out into...
(diagnostic_output_file::try_to_open): ...this, adding
"serialization_kind" param.
(diagnostic_output_format_init_sarif_file): Update for new param
to diagnostic_output_format_open_sarif_file.  Make a
sarif_serialization_format_json and pass it to
diagnostic_output_format_init_sarif.
(diagnostic_output_format_init_sarif_stream): Make a
sarif_serialization_format_json and pass it to
diagnostic_output_format_init_sarif.
(make_sarif_sink): Replace param "formatted" with "serialization".
(selftest::test_make_location_object): Update for changes to
sarif_builder ctor.
* diagnostic-format-sarif.h (enum class sarif_serialization): New.
(diagnostic_output_format_open_sarif_file): Add param
"serialization_kind".
(class sarif_serialization_format): New.
(class sarif_serialization_format_json): New.
(make_sarif_sink): Replace param "formatted" with
"serialization_format".
* diagnostic-output-file.h (diagnostic_output_file::try_to_open):
New decl.
* diagnostic.h (enum diagnostics_output_format): Tweak comments.
* doc/invoke.texi (-fdiagnostics-add-output): Add "serialization"
param to sarif scheme.
* libgdiagnostics.cc (sarif_sink::sarif_sink): Update for change
to make_sarif_sink.
* opts-diagnostic.cc (sarif_scheme_handler::make_sink): Add
"serialization" param and pass it on to make_sarif_sink.
---
 gcc/diagnostic-format-sarif.cc | 115 -
 gcc/diagnostic-format-sarif.h  |  42 +++-
 gcc/diagnostic-output-file.h   |   7 ++
 gcc/diagnostic.h   |   4 +-
 gcc/doc/invoke.texi|   5 ++
 gcc/libgdiagnostics.cc |   3 +-
 gcc/opts-diagnostic.cc |  33 +-
 7 files changed, 173 insertions(+), 36 deletions(-)

diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index f322991ab2e..bc6abdff5e4 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -634,6 +634,18 @@ private:
   std::vector> m_results;
 };
 
+/* Classes for abstracting away JSON vs other serialization formats.  */
+
+// class sarif_serialization_format_json : public sarif_serialization_format
+
+void
+sarif_serialization_format_json::write_to_file (FILE *outf,
+   const json::value &top)
+{
+  top.dump (outf, m_formatted);
+  fprintf (outf, "\n");
+}
+
 /* A class for managing SARIF output (for -fdiagnostics-format=sarif-stderr
and -fdiagnostics-format=sarif-file).
 
@@ -687,7 +699,7 @@ public:
 pretty_printer &printer,
 const line_maps *line_maps,
 const char *main_input_filename_,
-bool formatted,
+std::unique_ptr 
serialization_format,
 const sarif_generation_options &sarif_gen_opts);
   ~sarif_builder ();
 
@@ -891,7 +903,7 @@ private:
 
   int m_tabstop;
 
-  bool m_formatted;
+  std::unique_ptr m_serialization_format;
   const sarif_generation_options m_sarif_gen_opts;
 
   unsigned m_next_result_idx;
@@ -1561,7 +1573,7 @@ sarif_builder::sarif_builder (diagnostic_context &context,
  pretty_printer &printer,
  const line_maps *line_maps,
  const char *main_input_filename_,
- bool formatted

[pushed: r16-312] analyzer: avoid saying "'0' is NULL"

2025-04-30 Thread David Malcolm
Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r16-312-g039ba6580f5328.

gcc/analyzer/ChangeLog:
* sm-malloc.cc (malloc_diagnostic::describe_state_change): Tweak
the "EXPR is NULL" message for the case where EXPR is a null
pointer.

gcc/testsuite/ChangeLog:
* c-c++-common/analyzer/data-model-path-1.c: Check for
"using NULL here" message.
* c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early.c:
Likewise.  Check for "return of NULL" message.
* c-c++-common/analyzer/null-deref-pr108400-SoftEtherVPN-WebUi.c:
Likewise.
* gcc.dg/analyzer/data-model-5.c: Likewise.
* gcc.dg/analyzer/data-model-5b.c: Likewise.
* gcc.dg/analyzer/data-model-5c.c: Likewise.
* gcc.dg/analyzer/torture/pr93647.c: Likewise.
---
 gcc/analyzer/sm-malloc.cc| 9 +++--
 gcc/testsuite/c-c++-common/analyzer/data-model-path-1.c  | 2 +-
 .../null-deref-pr108251-smp_fetch_ssl_fc_has_early.c | 4 ++--
 .../analyzer/null-deref-pr108400-SoftEtherVPN-WebUi.c| 2 +-
 gcc/testsuite/gcc.dg/analyzer/data-model-5.c | 2 +-
 gcc/testsuite/gcc.dg/analyzer/data-model-5b.c| 2 +-
 gcc/testsuite/gcc.dg/analyzer/data-model-5c.c| 2 +-
 gcc/testsuite/gcc.dg/analyzer/torture/pr93647.c  | 2 +-
 8 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/gcc/analyzer/sm-malloc.cc b/gcc/analyzer/sm-malloc.cc
index 01862686d58..333dfea47d3 100644
--- a/gcc/analyzer/sm-malloc.cc
+++ b/gcc/analyzer/sm-malloc.cc
@@ -785,8 +785,13 @@ public:
else
  {
if (change.m_expr)
- pp_printf (&pp, "%qE is NULL",
-change.m_expr);
+ {
+   if (zerop (change.m_expr))
+ pp_printf (&pp, "using NULL here");
+   else
+ pp_printf (&pp, "%qE is NULL",
+change.m_expr);
+ }
else
  pp_printf (&pp, "%qs is NULL",
 "");
diff --git a/gcc/testsuite/c-c++-common/analyzer/data-model-path-1.c 
b/gcc/testsuite/c-c++-common/analyzer/data-model-path-1.c
index d7058ea18e0..0609dc8cb4f 100644
--- a/gcc/testsuite/c-c++-common/analyzer/data-model-path-1.c
+++ b/gcc/testsuite/c-c++-common/analyzer/data-model-path-1.c
@@ -3,7 +3,7 @@
 static int *__attribute__((noinline))
 callee (void)
 {
-  return NULL;
+  return NULL; /* { dg-message "using NULL here" } */
 }
 
 void test_1 (void)
diff --git 
a/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early.c
 
b/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early.c
index c5f1fa42e6f..4f04e46695e 100644
--- 
a/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early.c
+++ 
b/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early.c
@@ -66,7 +66,7 @@ static inline struct connection *__objt_conn(enum obj_type *t)
 static inline struct connection *objt_conn(enum obj_type *t)
 {
  if (!t || *t != OBJ_TYPE_CONN)
-   return (struct connection *) ((void *)0);
+   return (struct connection *) ((void *)0); /* { dg-message "using NULL here" 
} */
  return __objt_conn(t);
 }
 struct session {
@@ -85,7 +85,7 @@ smp_fetch_ssl_fc_has_early(const struct arg *args, struct 
sample *smp, const cha
  SSL *ssl;
  struct connection *conn;
 
- conn = objt_conn(smp->sess->origin);
+ conn = objt_conn(smp->sess->origin); /* { dg-message "return of NULL" } */
  ssl = ssl_sock_get_ssl_object(conn);
  if (!ssl)
   return 0;
diff --git 
a/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108400-SoftEtherVPN-WebUi.c 
b/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108400-SoftEtherVPN-WebUi.c
index 9dcf7aa31f1..0ebeeff8348 100644
--- 
a/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108400-SoftEtherVPN-WebUi.c
+++ 
b/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108400-SoftEtherVPN-WebUi.c
@@ -60,7 +60,7 @@ void WuExpireSessionKey(WEBUI *wu)
 
for(i=0; iContexts); i++)
{
-   STRMAP_ENTRY *entry = (STRMAP_ENTRY*)LIST_DATA(wu->Contexts, i);
+   STRMAP_ENTRY *entry = (STRMAP_ENTRY*)LIST_DATA(wu->Contexts, 
i); /* { dg-message "'entry' is NULL" } */
WU_CONTEXT *context = (WU_CONTEXT*)entry->Value; /* { dg-bogus 
"dereference of NULL 'entry'" "PR analyzer/108400" { xfail *-*-* } } */
if(context->ExpireDate < Tick64())
{
diff --git a/gcc/testsuite/gcc.dg/analyzer/data-model-5.c 
b/gcc/testsuite/gcc.dg/analyzer/data-model-5.c
index b71bad757a1..78e27521980 100644
--- a/gcc/testsuite/gcc.dg/analyzer/data-model-5.c
+++ b/gcc/testsuite/gcc.dg/analyzer/data-model-5.c
@@ -60,7 +60,7 @@ base_obj *alloc_obj (type_obj *ob_type, size_t sz)
 {
   base_obj *obj = (base_obj *)malloc (sz);
   if (!obj)
-return NULL;
+return NULL; /* { dg-message "using

[pushed: r16-313] analyzer: add more test coverage for sprintf

2025-04-30 Thread David Malcolm
Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r16-313-g8c80fc106482dd.

gcc/testsuite/ChangeLog:
PR analyzer/107017
* c-c++-common/analyzer/sprintf-3.c: New test, covering use of
sprintf with specific format strings.  Doesn't yet find problems
as the analyzer doesn't yet understand the format strings.
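
For reference, the buffer sizes such format strings require can be computed with snprintf and a null destination, which returns the number of characters that would have been written. The sketch below is only an illustration of that calculation, with hypothetical helper names; it is unrelated to how the analyzer would eventually model sprintf.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Bytes sprintf would store for "%s" with the given string,
// including the terminating NUL. Returns 0 on an encoding error.
inline std::size_t
bytes_needed_for_str(const char* s)
{
  int n = std::snprintf(nullptr, 0, "%s", s);
  return n < 0 ? 0 : static_cast<std::size_t>(n) + 1;
}

// Same calculation for an int formatted with "%i".
inline std::size_t
bytes_needed_for_int(int value)
{
  int n = std::snprintf(nullptr, 0, "%i", value);
  return n < 0 ? 0 : static_cast<std::size_t>(n) + 1;
}
```

"hello world" needs 12 bytes and "foo" needs 4, so neither fits in a 3-byte buffer; the four digits of 1066 plus the terminator need 5 bytes, one more than a 4-byte buffer holds.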
---
 .../c-c++-common/analyzer/sprintf-3.c | 44 +++
 1 file changed, 44 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/analyzer/sprintf-3.c

diff --git a/gcc/testsuite/c-c++-common/analyzer/sprintf-3.c 
b/gcc/testsuite/c-c++-common/analyzer/sprintf-3.c
new file mode 100644
index 00000000000..ac5169e71b8
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/analyzer/sprintf-3.c
@@ -0,0 +1,44 @@
+/* See e.g. https://en.cppreference.com/w/c/io/fprintf
+   and https://www.man7.org/linux/man-pages/man3/sprintf.3.html */
+
+extern int
+sprintf(char* dst, const char* fmt, ...)
+  __attribute__((__nothrow__));
+
+#include "../../gcc.dg/analyzer/analyzer-decls.h"
+
+void test_text_ok (void)
+{
+  char buf[16];
+  sprintf (buf, "hello world");
+}
+
+void test_text_oob (void)
+{
+  char buf[3];
+  sprintf (buf, "hello world"); /* { dg-warning "out-of-bounds" "PR 
analyzer/107017" { xfail *-*-* } } */
+}
+
+void test_percent_s_ok (void)
+{
+  char buf[16];
+  sprintf (buf, "%s", "foo");
+}
+
+void test_percent_s_oob (void)
+{
+  char buf[3];
+  sprintf (buf, "%s", "foo"); /* { dg-warning "out-of-bounds" "PR 
analyzer/107017" { xfail *-*-* } } */
+}
+
+void test_percent_i_ok (void)
+{
+  char buf[16];
+  sprintf (buf, "%i", "42");
+}
+
+void test_percent_i_oob (void)
+{
+  char buf[4];
+  sprintf (buf, "%i", "1066"); /* { dg-warning "out-of-bounds" "PR 
analyzer/107017" { xfail *-*-* } } */
+}
-- 
2.26.3
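
As a side note on the sprintf-3.c cases above, here is a minimal sketch (not
part of the pushed patch) of why the xfail'd "%s" and plain-text calls are
expected to overflow.  Calling snprintf with a null buffer and size 0 only
computes the output length, and one more byte is needed for the terminating
NUL.  The helper name bytes_needed is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: number of bytes sprintf would write for a "%s"
   conversion of STR, including the terminating NUL.  */
static int
bytes_needed (const char *str)
{
  /* With a null buffer and size 0, snprintf writes nothing and returns
     the number of characters the conversion would produce.  */
  return snprintf (NULL, 0, "%s", str) + 1;
}
```

Under this sketch "foo" needs 4 bytes, so char buf[3] is too small while
char buf[16] is fine, and "hello world" needs 12 bytes.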



Re: [PATCH] x86: Remove SSE_FIRST_REG from ix86_class_likely_spilled_p

2025-04-30 Thread H.J. Lu
On Wed, Apr 30, 2025 at 8:12 PM Uros Bizjak  wrote:
>
> On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu  wrote:
> >
> > SSE_FIRST_REG was added to CLASS_LIKELY_SPILLED_P, which became
> > TARGET_CLASS_LIKELY_SPILLED_P, for
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40470
> >
> > Since RA has been improved and xmm0 is a commonly used register, remove
> > SSE_FIRST_REG from ix86_class_likely_spilled_p to improve xmm0 codegen:
>
> While the AVX version of pblendvb doesn't use XMM0 as an architectural
> register, there are still plenty of other cases, e.g. PCMPESTRM and
> PCMPISTRM, SHA and KEYLOCKER insns, so you are risking RA failures
> with these insns if the life of XMM0 is extended.

This is no different from

void
foo1 (double x)
{
  register double xmm1 __asm ("xmm1") = x;
  asm volatile ("# %0" : "+v" (xmm1));
}

We don't add XMM1 to ix86_class_likely_spilled_p.

> OTOH, the (unrelated?) ternlog change changes the expander, where the
> expander does subreg tricks on memory operand, which doesn't look
> correct to me. adjust_address should be used instead.
>
>

Should I drop the ternlog change and submit a different patch?

Thanks.

-- 
H.J.


Re: [PATCH 1/1] Fix BZ 119317: named loops (C2y) with debug info

2025-04-30 Thread Joseph Myers
On Tue, 29 Apr 2025, Christopher Bazley wrote:

> Named loops (C2y) could not previously be compiled with
> -O1 and -ggdb2 or higher because the label preceding
> a loop (or switch) could not be found when using such
> command lines.
> 
> This could be observed by compiling
> gcc/gcc/testsuite/gcc.dg/c2y-named-loops-1.c with
> the provoking command line (or any minimal example such
> as that cited in the bug report).
> 
> The fix was simply to ignore the tree nodes inserted
> for debugging information.
> 
> Base commit is ae4c22ab05501940e345ee799be3aa36ffa7269a

There should be a testcase added to the testsuite (possibly one that 
#includes c2y-named-loops-1.c with appropriate dg-*, in particular 
dg-options that includes the options required to show the bug).

> -  else if (TREE_CODE (stmt) != CASE_LABEL_EXPR)
> +  else if (TREE_CODE (stmt) != CASE_LABEL_EXPR &&
> +TREE_CODE (stmt) != DEBUG_BEGIN_STMT)

The line should be broken before &&, not after.
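
For illustration only, a minimal standalone sketch of that layout, with the
operator starting the continuation line.  The enum values below are
stand-ins that merely reuse the tree code names; is_ordinary_stmt is a
made-up helper, not something from the patch.

```c
#include <stdbool.h>

/* Stand-ins for tree codes; the real ones are defined inside GCC.  */
enum stmt_code { CASE_LABEL_EXPR, DEBUG_BEGIN_STMT, LABEL_EXPR };

/* Return true if CODE is neither a case label nor a debug marker.  */
static bool
is_ordinary_stmt (enum stmt_code code)
{
  /* GNU style: break before the binary operator, so "&&" leads the
     continuation line and lines up under the first operand.  */
  return (code != CASE_LABEL_EXPR
          && code != DEBUG_BEGIN_STMT);
}
```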

-- 
Joseph S. Myers
josmy...@redhat.com



[PUSHED] vectorizer: Fix riscv build [PR120042]

2025-04-30 Thread Andrew Pinski
r15-9859-ga6cfde60d8c added a call to dominated_by_p to tree-vectorizer.h
but dominance.h is not always included, so you get a build failure on riscv
when building riscv-vector-costs.cc.

Let's add the include of dominance.h to tree-vectorizer.h.

Pushed as obvious after builds for riscv and x86_64.

gcc/ChangeLog:

* tree-vectorizer.h: Include dominance.h.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-vectorizer.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 94cbfde6c9a..63991c3d977 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -30,6 +30,7 @@ typedef struct _slp_tree *slp_tree;
 #include "internal-fn.h"
 #include "tree-ssa-operands.h"
 #include "gimple-match.h"
+#include "dominance.h"
 
 /* Used for naming of new temporaries.  */
 enum vect_var_kind {
-- 
2.43.0



RE: [PATCH v1][GCC16-Stage-1] RISC-V: Remove unnecessary frm restore volatile define_insn

2025-04-30 Thread Li, Pan2
Hi Vineet,

> Sorry this got backed up as I'm working on FRM overhaul - if this is not super
> urgent can you please wait for a few weeks for my work to be posted.
> If you prefer this go in still, fine by me as well.

Sure thing, feel free to ping me if there is something I can help with.

Pan


-----Original Message-----
From: Vineet Gupta  
Sent: Thursday, May 1, 2025 2:08 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; Chen, Ken 
Subject: Re: [PATCH v1][GCC16-Stage-1] RISC-V: Remove unnecessary frm restore 
volatile define_insn

Hi Pan,

On 4/27/25 18:33, Li, Pan2 wrote:
> Kindly ping.

Sorry this got backed up as I'm working on FRM overhaul - if this is not super
urgent can you please wait for a few weeks for my work to be posted.
If you prefer this go in still, fine by me as well.

Thx,
-Vineet


>
> Pan
>
> -----Original Message-----
> From: Li, Pan2  
> Sent: Wednesday, April 16, 2025 10:57 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
> rdapp@gmail.com; Chen, Ken ; Li, Pan2 
> 
> Subject: [PATCH v1][GCC16-Stage-1] RISC-V: Remove unnecessary frm restore 
> volatile define_insn
>
> From: Pan Li 
>
> After we add the frm register to the global_regs, we may no longer need
> the volatile define_insn to emit the frm restore insns.  The
> cooperatively-managed global register will handle this, instead of
> emitting the volatile define_insn explicitly.
>
> gcc/ChangeLog:
>
>   * config/riscv/riscv.cc (riscv_emit_frm_mode_set): Refactor
>   the frm mode set by removing fsrmsi_restore_volatile.
>   * config/riscv/vector-iterators.md (unspecv): Remove as
>   unnecessary.
>   * config/riscv/vector.md (fsrmsi_restore_volatile): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/riscv/rvv/base/float-point-dynamic-frm-49.c: Adjust
>   the asm dump check times.
>   * gcc.target/riscv/rvv/base/float-point-dynamic-frm-50.c: Ditto.
>   * gcc.target/riscv/rvv/base/float-point-dynamic-frm-52.c: Ditto.
>   * gcc.target/riscv/rvv/base/float-point-dynamic-frm-74.c: Ditto.
>   * gcc.target/riscv/rvv/base/float-point-dynamic-frm-75.c: Ditto.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/riscv.cc | 43 ++-
>  gcc/config/riscv/vector-iterators.md  |  4 --
>  gcc/config/riscv/vector.md| 13 --
>  .../rvv/base/float-point-dynamic-frm-49.c |  2 +-
>  .../rvv/base/float-point-dynamic-frm-50.c |  2 +-
>  .../rvv/base/float-point-dynamic-frm-52.c |  2 +-
>  .../rvv/base/float-point-dynamic-frm-74.c |  2 +-
>  .../rvv/base/float-point-dynamic-frm-75.c |  2 +-
>  8 files changed, 28 insertions(+), 42 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 38f3ae7cd84..3878702e3a1 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -12047,27 +12047,30 @@ riscv_emit_frm_mode_set (int mode, int prev_mode)
>if (prev_mode == riscv_vector::FRM_DYN_CALL)
>  emit_insn (gen_frrmsi (backup_reg)); /* Backup frm when DYN_CALL.  */
>  
> -  if (mode != prev_mode)
> -{
> -  rtx frm = gen_int_mode (mode, SImode);
> -
> -  if (mode == riscv_vector::FRM_DYN_CALL
> - && prev_mode != riscv_vector::FRM_DYN && STATIC_FRM_P (cfun))
> - /* No need to emit when prev mode is DYN already.  */
> - emit_insn (gen_fsrmsi_restore_volatile (backup_reg));
> -  else if (mode == riscv_vector::FRM_DYN_EXIT && STATIC_FRM_P (cfun)
> - && prev_mode != riscv_vector::FRM_DYN
> - && prev_mode != riscv_vector::FRM_DYN_CALL)
> - /* No need to emit when prev mode is DYN or DYN_CALL already.  */
> - emit_insn (gen_fsrmsi_restore_volatile (backup_reg));
> -  else if (mode == riscv_vector::FRM_DYN
> - && prev_mode != riscv_vector::FRM_DYN_CALL)
> - /* Restore frm value from backup when switch to DYN mode.  */
> - emit_insn (gen_fsrmsi_restore (backup_reg));
> -  else if (riscv_static_frm_mode_p (mode))
> - /* Set frm value when switch to static mode.  */
> - emit_insn (gen_fsrmsi_restore (frm));
> +  if (mode == prev_mode)
> +return;
> +
> +  if (riscv_static_frm_mode_p (mode))
> +{
> +  /* Set frm value when switch to static mode.  */
> +  emit_insn (gen_fsrmsi_restore (gen_int_mode (mode, SImode)));
> +  return;
>  }
> +
> +  bool restore_p
> += /* No need to emit when prev mode is DYN.  */
> +  (STATIC_FRM_P (cfun) && mode == riscv_vector::FRM_DYN_CALL
> +   && prev_mode != riscv_vector::FRM_DYN)
> +  /* No need to emit if prev mode is DYN or DYN_CALL.  */
> +  || (STATIC_FRM_P (cfun) && mode == riscv_vector::FRM_DYN_EXIT
> +   && prev_mode != riscv_vector::FRM_DYN
> +   && prev_mode != riscv_vector::FRM_DYN_CALL)
> +  /* Restore frm value when switch to DYN mode.  */
> +  || (mode == riscv_

Re: [PATCH] c: Suppress -Wdeprecated-non-prototype warnings for builtins

2025-04-30 Thread Joseph Myers
On Sat, 26 Apr 2025, Florian Weimer wrote:

> Builtins defined with BT_FN_INT_VAR etc. show as functions without
> a prototype and trigger the warning.
> 
> gcc/c/
> 
>   PR c/119950
>   * c-typeck.cc (convert_arguments): Check for built-in
>   function declaration before warning.
> 
> gcc/testsuite/
> 
>   * gcc.dg/Wdeprecated-non-prototype-5.c: New test.

OK.

-- 
Joseph S. Myers
josmy...@redhat.com



[pushed: r16-315] prime-paths.cc: remove redundant semicolons

2025-04-30 Thread David Malcolm
Fixes a couple of pedantic warnings.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r16-315-g49d2c6ced2c894.

gcc/ChangeLog:
* prime-paths.cc (limit_checked_add): Remove redundant trailing
';'.
(enters_through_p): Likewise.
---
 gcc/prime-paths.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/prime-paths.cc b/gcc/prime-paths.cc
index cde630cb5e0..838343c8427 100644
--- a/gcc/prime-paths.cc
+++ b/gcc/prime-paths.cc
@@ -53,7 +53,7 @@ limit_checked_add (size_t approx)
 {
   approx_limit -= approx < approx_limit ? approx : approx_limit;
   return approx_limit == 0;
-};
+}
 
 /* Check if adding APPROX would exceed the path limit.  This is necessary when
(pessimistically counted) trie insertions would exceed the limit and yields
@@ -1061,7 +1061,7 @@ enters_through_p (const struct graph *cfg, const vec 
&path, int vertex)
   if (cfg->vertices[last].component == cfg->vertices[vertex].component)
 return false;
   return edge_p (cfg, last, vertex);
-};
+}
 
 /* Worker for scc_entry_prime_paths.  CFG is the CFG for the function,
SCC_ENTRY_PATHS the accumulated scc_entry_paths for all the SCCs, 
PRIME_PATHS
-- 
2.26.3
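
For readers skimming the diff, a standalone C reduction of the first hunk (a
sketch, not the code in prime-paths.cc): the function body ends at the
closing brace, and a ';' after it is an extra empty declaration that
compilers tend to diagnose under pedantic settings.  The file-scope variable
and its starting budget of 10 are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed remaining budget; the value 10 is arbitrary.  */
static size_t approx_limit = 10;

/* Subtract APPROX from the remaining budget, saturating at zero, and
   return true once the budget is used up.  */
static bool
limit_checked_add (size_t approx)
{
  approx_limit -= approx < approx_limit ? approx : approx_limit;
  return approx_limit == 0;
} /* No ';' here: the closing brace already ends the definition.  */
```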



[PATCH] x86: Skip if the mode size is smaller than its natural size

2025-04-30 Thread H.J. Lu
When generating a SUBREG from V16QI to V2HF, validate_subreg fails since
the V2HF size (4 bytes) is smaller than its natural size (word size).
Update remove_redundant_vector_load to skip if the mode size is smaller
than its natural size.

gcc/

PR target/120036
* config/i386/i386-features.cc (remove_redundant_vector_load):
Also skip if the mode size is smaller than its natural size.

gcc/testsuite/

PR target/120036
* g++.target/i386/pr120036.C: New test.

-- 
H.J.
From 6bfacf6014965d3ec498620dd9951efca9ad6015 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 1 May 2025 06:30:41 +0800
Subject: [PATCH] x86: Skip if the mode size is smaller than its natural size

When generating a SUBREG from V16QI to V2HF, validate_subreg fails since
the V2HF size (4 bytes) is smaller than its natural size (word size).
Update remove_redundant_vector_load to skip if the mode size is smaller
than its natural size.

gcc/

	PR target/120036
	* config/i386/i386-features.cc (remove_redundant_vector_load):
	Also skip if the mode size is smaller than its natural size.

gcc/testsuite/

	PR target/120036
	* g++.target/i386/pr120036.C: New test.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386-features.cc |   7 +-
 gcc/testsuite/g++.target/i386/pr120036.C | 113 +++
 2 files changed, 118 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr120036.C

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 31f3ee2ef17..8e12ca88f7a 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -3395,8 +3395,11 @@ remove_redundant_vector_load (void)
 
 	  rtx dest = SET_DEST (set);
 	  machine_mode mode = GET_MODE (dest);
-	  /* Skip non-vector instruction.  */
-	  if (!VECTOR_MODE_P (mode))
+	  /* Skip non-vector instruction. Also skip if the mode size is
+	 smaller than its natural size to avoid validate_subreg
+	 failure.  */
+	  if (!VECTOR_MODE_P (mode)
+	  || GET_MODE_SIZE (mode) < ix86_regmode_natural_size (mode))
 	continue;
 
 	  rtx src = SET_SRC (set);
diff --git a/gcc/testsuite/g++.target/i386/pr120036.C b/gcc/testsuite/g++.target/i386/pr120036.C
new file mode 100644
index 00000000000..a2fc24f1286
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr120036.C
@@ -0,0 +1,113 @@
+/* { dg-do compile { target fpic } } */
+/* { dg-options "-O2 -std=c++11 -march=sapphirerapids -fPIC" } */
+
+typedef _Float16 Native;
+struct float16_t
+{
+  Native native;
+  float16_t ();
+  float16_t (Native arg) : native (arg) {}
+  operator Native ();
+  float16_t
+  operator+ (float16_t rhs)
+  {
+return native + rhs.native;
+  }
+  float16_t
+  operator* (float16_t)
+  {
+return native * native;
+  }
+};
+template  struct Simd
+{
+  static constexpr int kPrivateLanes = N;
+};
+template  struct ClampNAndPow2
+{
+  using type = Simd;
+};
+template  struct CappedTagChecker
+{
+  static constexpr int N = sizeof (int) ? kLimit : 0;
+  using type = typename ClampNAndPow2::type;
+};
+template 
+using CappedTag = typename CappedTagChecker::type;
+template 
+int
+Lanes (D)
+{
+  return D::kPrivateLanes;
+}
+template  int Zero (D);
+template  using VFromD = decltype (Zero (D ()));
+struct Vec512
+{
+  __attribute__ ((__vector_size__ (16))) _Float16 raw;
+};
+Vec512 Zero (Simd<2>);
+template  void ReduceSum (D, VFromD);
+struct Dot
+{
+  template 
+  static T
+  Compute (D d, T *pa, int num_elements)
+  {
+T *pb;
+int N = Lanes (d), i = 0;
+if (__builtin_expect (num_elements < N, 0))
+  {
+T sum0 = 0, sum1 = 0;
+for (; i + 2 <= num_elements; i += 2)
+  {
+float16_t __trans_tmp_6 = pa[i] * pb[i],
+  __trans_tmp_5 = sum0 + __trans_tmp_6,
+  __trans_tmp_8 = pa[i + 1] * pb[1],
+  __trans_tmp_7 = sum1 + __trans_tmp_8;
+sum0 = __trans_tmp_5;
+sum1 = __trans_tmp_7;
+  }
+float16_t __trans_tmp_9 = sum0 + sum1;
+return __trans_tmp_9;
+  }
+decltype (Zero (d)) sum0;
+ReduceSum (d, sum0);
+__builtin_trap ();
+  }
+};
+template  struct ForeachCappedR
+{
+  static void
+  Do (int min_lanes, int max_lanes)
+  {
+CappedTag d;
+Test () (int (), d);
+ForeachCappedR::Do (min_lanes, max_lanes);
+  }
+};
+template  struct ForeachCappedR<0, Test, kPow2>
+{
+  static void Do (int, int);
+};
+struct TestDot
+{
+  template 
+  void
+  operator() (T, D d)
+  {
+int counts[]{ 1, 3 };
+for (int num : counts)
+  {
+float16_t a;
+T __trans_tmp_4 = Dot::Compute<0> (d, &a, num);
+  }
+  }
+};
+int DotTest_TestAllDot_TestTestBody_max_lanes;
+void
+DotTest_TestAllDot_TestTestBody ()
+{
+  ForeachCappedR<64, TestDot, 0>::Do (
+  1, DotTest_TestAllDot_TestTestBody_max_lanes);
+}
-- 
2.49.0
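
To make the new condition in remove_redundant_vector_load easier to follow,
here is a toy model of it in plain C.  This is only a sketch: struct
toy_mode and skip_mode_p are made-up names, and the 8-byte natural size is
an assumption standing in for the word size on x86-64, not a value read
from ix86_regmode_natural_size.

```c
#include <stdbool.h>

/* Toy description of a machine mode; not GCC's machine_mode.  */
struct toy_mode
{
  bool is_vector;
  unsigned size;         /* Mode size in bytes.  */
  unsigned natural_size; /* Assumed natural register size in bytes.  */
};

/* Same shape as the patched test: skip non-vector modes, and also skip
   vector modes that are narrower than their natural size.  */
static bool
skip_mode_p (struct toy_mode m)
{
  return !m.is_vector || m.size < m.natural_size;
}
```

With these assumptions a 4-byte V2HF-like mode is skipped, while a 16-byte
V16QI-like mode is still processed.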



Re: [PATCH] x86: Update TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P

2025-04-30 Thread H.J. Lu
On Wed, Apr 30, 2025 at 7:48 PM Uros Bizjak  wrote:
>
> On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu  wrote:
> >
> > SMALL_REGISTER_CLASSES was added by
> >
> > commit c98f874233428d7e6ba83def7842fd703ac0ddf1
> > Author: James Van Artsdalen 
> > Date:   Sun Feb 9 13:28:48 1992 +
> >
> > Initial revision
> >
> > which became TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P.  It is false from
> > day 1 for i386.  Since x86-64 doubles the number of registers, change
> > TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P to return false for x86-64
> > and update decrease_live_ranges_number to skip hard register if
> > targetm.class_likely_spilled_p returns true.  These extend the live
> > range of rbp, r8-r31 and xmm1-xmm31 registers.
> >
> > PR target/118996
> > * ira.cc (decrease_live_ranges_number): Skip hard register if
> > targetm.class_likely_spilled_p returns true.
> > * config/i386/i386.cc (ix86_small_register_classes_for_mode_p):
> > New.
> > (TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P): Use it.
>
> Redeclaring X86_64 is too risky. We can perhaps enable integer
> registers for APX because it is a relatively new development. I'm
> undecided about AVX-512, perhaps this is worth a try.

There are many GCC targets with 16 GPRs which don't return false
for TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P.

> In the case of AVX-512, can you provide some performance numbers to
> assess the risk/benefit ratio?
>

SPEC CPU impact is within noise range.  This usually improves applications
with very high register pressure.  This isn't an intrusive change.  We can
always change it back if needed.

-- 
H.J.


Re: [PATCH] x86: Remove BREG from ix86_class_likely_spilled_p

2025-04-30 Thread H.J. Lu
On Wed, Apr 30, 2025 at 7:37 PM Uros Bizjak  wrote:
>
> On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu  wrote:
> >
> > AREG, DREG, CREG and AD_REGS are kept in ix86_class_likely_spilled_p to
> > avoid the following regressions with
> >
> > $ make check RUNTESTFLAGS="--target_board='unix{-m32,}'"
> >
> > FAIL: gcc.dg/pr105911.c (internal compiler error: in 
> > lra_split_hard_reg_for, at lra-assigns.cc:1863)
> > FAIL: gcc.dg/pr105911.c (test for excess errors)
> > FAIL: gcc.target/i386/avx512vl-stv-rotatedi-1.c scan-assembler-times 
> > vpro[lr]q 29
> > FAIL: gcc.target/i386/bt-7.c scan-assembler-not and[lq][ \t]
> > FAIL: gcc.target/i386/naked-4.c scan-assembler-not %[re]bp
> > FAIL: gcc.target/i386/pr107548-1.c scan-assembler-not addl
> > FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times \tv?movd\t 3
> > FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times v?paddd 6
> > FAIL: gcc.target/i386/pr107548-2.c scan-assembler-not \taddq\t
> > FAIL: gcc.target/i386/pr107548-2.c scan-assembler-times v?paddq 2
> > FAIL: gcc.target/i386/pr119171-1.c (test for excess errors)
> > FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
> > FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
> > FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]andb
> > FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]orb
> > FAIL: gcc.target/i386/pr78904-7b.c scan-assembler-not movzbl
> > FAIL: gcc.target/i386/pr78904-7b.c scan-assembler [ \t]orb
> > FAIL: gcc.target/i386/pr91188-2c.c scan-assembler [ \t]andw
> >
> > Tested with glibc master branch at
> >
> > commit ccdb68e829a31e4cda8339ea0d2dc3e51fb81ba5
> > Author: Samuel Thibault 
> > Date:   Sun Mar 2 15:16:45 2025 +0100
> >
> > htl: move pthread_once into libc
> >
> > and built Linux kernel 6.13.5 on x86-64.
> >
> > PR target/119083
> > * config/i386/i386.cc (ix86_class_likely_spilled_p): Remove CREG
> > and BREG.
>
> The commit message doesn't reflect what the patch does.
>
> OTOH, this is a very delicate part of the compiler. You are risking RA
> failures, the risk/benefit ratio is very high, so I wouldn't touch it
> without clear benefits. Do you have a concrete example where declaring
> BREG as spilled hurts?
>

I can find a testcase to show the improvement.  But I am not sure if
it is what you were asking for.  ix86_class_likely_spilled_p was
CLASS_LIKELY_SPILLED_P which was added by

commit f5316dfe88b8d1b8d3012c1f75349edf2ba1bdde
Author: Michael Meissner 
Date:   Thu Sep 8 17:59:18 1994 +

Add support for -mreg-alloc=

This option is long gone and there is no test in GCC testsuite to show
that BX should be in ix86_class_likely_spilled_p.  On x86-64, BX is just
another callee-saved register, not different from R12.  On i386, BX is
used as the PIC register.  But today RA may pick a different register if
PLT isn't involved.  This patch gives RA a little bit more freedom.

-- 
H.J.
From 4fdcd7fc7e0dad6cb595e8807823cbfde49aab06 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 3 Mar 2025 12:42:31 +0800
Subject: [PATCH v2] x86: Remove BREG from ix86_class_likely_spilled_p

AREG, DREG, CREG and AD_REGS are kept in ix86_class_likely_spilled_p to
avoid the following regressions with

$ make check RUNTESTFLAGS="--target_board='unix{-m32,}'"

FAIL: gcc.dg/pr105911.c (internal compiler error: in lra_split_hard_reg_for, at lra-assigns.cc:1863)
FAIL: gcc.dg/pr105911.c (test for excess errors)
FAIL: gcc.target/i386/avx512vl-stv-rotatedi-1.c scan-assembler-times vpro[lr]q 29
FAIL: gcc.target/i386/bt-7.c scan-assembler-not and[lq][ \t]
FAIL: gcc.target/i386/naked-4.c scan-assembler-not %[re]bp
FAIL: gcc.target/i386/pr107548-1.c scan-assembler-not addl
FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times \tv?movd\t 3
FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times v?paddd 6
FAIL: gcc.target/i386/pr107548-2.c scan-assembler-not \taddq\t
FAIL: gcc.target/i386/pr107548-2.c scan-assembler-times v?paddq 2
FAIL: gcc.target/i386/pr119171-1.c (test for excess errors)
FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]andb
FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]orb
FAIL: gcc.target/i386/pr78904-7b.c scan-assembler-not movzbl
FAIL: gcc.target/i386/pr78904-7b.c scan-assembler [ \t]orb
FAIL: gcc.target/i386/pr91188-2c.c scan-assembler [ \t]andw

Tested with glibc master branch at

commit ccdb68e829a31e4cda8339ea0d2dc3e51fb81ba5
Author: Samuel Thibault 
Date:   Sun Mar 2 15:16:45 2025 +0100

htl: move pthread_once into libc

and built Linux kernel 6.13.5 on x86-64.

	PR target/119083
	* config/i386/i386.cc (ix86_class_likely_spilled_p): Remove BREG.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index c3c2a465b3c..c8df3af803b 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.

Re: [PATCH] x86: Remove BREG from ix86_class_likely_spilled_p

2025-04-30 Thread Uros Bizjak
On Wed, Apr 30, 2025 at 11:31 PM H.J. Lu  wrote:
>
> On Wed, Apr 30, 2025 at 7:37 PM Uros Bizjak  wrote:
> >
> > On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu  wrote:
> > >
> > > AREG, DREG, CREG and AD_REGS are kept in ix86_class_likely_spilled_p to
> > > avoid the following regressions with
> > >
> > > $ make check RUNTESTFLAGS="--target_board='unix{-m32,}'"
> > >
> > > FAIL: gcc.dg/pr105911.c (internal compiler error: in 
> > > lra_split_hard_reg_for, at lra-assigns.cc:1863)
> > > FAIL: gcc.dg/pr105911.c (test for excess errors)
> > > FAIL: gcc.target/i386/avx512vl-stv-rotatedi-1.c scan-assembler-times 
> > > vpro[lr]q 29
> > > FAIL: gcc.target/i386/bt-7.c scan-assembler-not and[lq][ \t]
> > > FAIL: gcc.target/i386/naked-4.c scan-assembler-not %[re]bp
> > > FAIL: gcc.target/i386/pr107548-1.c scan-assembler-not addl
> > > FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times \tv?movd\t 3
> > > FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times v?paddd 6
> > > FAIL: gcc.target/i386/pr107548-2.c scan-assembler-not \taddq\t
> > > FAIL: gcc.target/i386/pr107548-2.c scan-assembler-times v?paddq 2
> > > FAIL: gcc.target/i386/pr119171-1.c (test for excess errors)
> > > FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
> > > FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
> > > FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]andb
> > > FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]orb
> > > FAIL: gcc.target/i386/pr78904-7b.c scan-assembler-not movzbl
> > > FAIL: gcc.target/i386/pr78904-7b.c scan-assembler [ \t]orb
> > > FAIL: gcc.target/i386/pr91188-2c.c scan-assembler [ \t]andw
> > >
> > > Tested with glibc master branch at
> > >
> > > commit ccdb68e829a31e4cda8339ea0d2dc3e51fb81ba5
> > > Author: Samuel Thibault 
> > > Date:   Sun Mar 2 15:16:45 2025 +0100
> > >
> > > htl: move pthread_once into libc
> > >
> > > and built Linux kernel 6.13.5 on x86-64.
> > >
> > > PR target/119083
> > > * config/i386/i386.cc (ix86_class_likely_spilled_p): Remove CREG
> > > and BREG.
> >
> > The commit message doesn't reflect what the patch does.
> >
> > OTOH, this is a very delicate part of the compiler. You are risking RA
> > failures, the risk/benefit ratio is very high, so I wouldn't touch it
> > without clear benefits. Do you have a concrete example where declaring
> > BREG as spilled hurts?
> >
>
> I can find a testcase to show the improvement.  But I am not sure if
> it is what you were asking for.  ix86_class_likely_spilled_p was
> CLASS_LIKELY_SPILLED_P which was added by
>
> commit f5316dfe88b8d1b8d3012c1f75349edf2ba1bdde
> Author: Michael Meissner 
> Date:   Thu Sep 8 17:59:18 1994 +
>
> Add support for -mreg-alloc=
>
> This option is long gone and there is no test in GCC testsuite to show
> that BX should be in ix86_class_likely_spilled_p.  On x86-64, BX is just
> another callee-saved register, not different from R12.  On i386, BX is
> used as the PIC register.  But today RA may pick a different register if
> PLT isn't involved.  This patch gives RA a little bit more freedom.

In the past *decades* CLASS_LIKELY_SPILLED_P was repurposed to signal to
the compiler that some extra care is needed with the listed classes. On
i386 and x86_64 these include single register classes that represent
"architectural" registers (registers with an assigned role). The compiler
takes care not to extend the lifetimes of CLASS_LIKELY_SPILLED_P
classes too much, to avoid reload failures in cases where an instruction
with a C_L_S_P class (e.g. shifts with the %cl register) is emitted between
an unrelated register def and use.

Registers in these classes won't disappear from the pool of available
registers and RA can still use them, but with some extra care. So,
without clear and noticeable benefits, single register classes remain
declared as CLASS_LIKELY_SPILLED_P to avoid possible reload failures.
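
A small C sketch of the situation described above (the function name is
made up, and what RA actually does with it depends on options and target):
the value t is defined before a variable shift and used after it, so its
live range spans an instruction that, without BMI2, takes its count from
%cl.

```c
/* T is live across the shift.  If its live range had been extended in
   %ecx, the shift's need for %cl would force a spill or a move.  */
static unsigned
shift_between_def_and_use (unsigned x, unsigned n, unsigned y)
{
  unsigned t = y * 3;   /* Def of an unrelated value.  */
  unsigned s = x << n;  /* Variable shift; N must be less than 32.  */
  return s + t;         /* Use of the unrelated value.  */
}
```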

Uros.