[PATCH] libstdc++: Suppress GDB output from new 'skip' commands [PR118260]

2025-05-09 Thread Jonathan Wakely
I added some gdb.execute('skip -rfu ...') commands to the Python hook
loaded with libstdc++.so, but this makes GDB print output like:

Function(s) ^std::(move|forward|as_const|(__)?addressof) will be skipped when stepping.

These messages probably aren't interesting to users, so this change
suppresses that output by capturing it in the gdb.execute return value
(which is then ignored). gdb.execute still raises an exception if the
command fails, so this doesn't suppress any errors that might be
meaningful to users or libstdc++ developers.

libstdc++-v3/ChangeLog:

PR libstdc++/118260
* python/hook.in: Suppress output from gdb.execute calls to
register skips.
---

Tested manually with GDB.

 libstdc++-v3/python/hook.in | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/python/hook.in b/libstdc++-v3/python/hook.in
index d63909d2af4c..74a097cd0a00 100644
--- a/libstdc++-v3/python/hook.in
+++ b/libstdc++-v3/python/hook.in
@@ -55,10 +55,14 @@ if gdb.current_objfile () is not None:
     if not dir_ in sys.path:
         sys.path.insert(0, dir_)
 
-gdb.execute('skip -rfu ^std::(move|forward|as_const|(__)?addressof)')
-gdb.execute('skip -rfu ^std::(shared|unique)_ptr<.*>::(get|operator)')
-gdb.execute('skip -rfu ^std::(basic_string|vector|array|deque|(forward_)?list|(unordered_|flat_)?(multi)?(map|set)|span)<.*>::(c?r?(begin|end)|front|back|data|size|empty)')
-gdb.execute('skip -rfu ^std::(basic_string|vector|array|deque|span)<.*>::operator.]')
+gdb.execute('skip -rfu ^std::(move|forward|as_const|(__)?addressof)',
+            to_string=True)
+gdb.execute('skip -rfu ^std::(shared|unique)_ptr<.*>::(get|operator)',
+            to_string=True)
+gdb.execute('skip -rfu ^std::(basic_string|vector|array|deque|(forward_)?list|(unordered_|flat_)?(multi)?(map|set)|span)<.*>::(c?r?(begin|end)|front|back|data|size|empty)',
+            to_string=True)
+gdb.execute('skip -rfu ^std::(basic_string|vector|array|deque|span)<.*>::operator.]',
+            to_string=True)
 
 # Call a function as a plain import would not execute body of the included file
 # on repeated reloads of this object file.
-- 
2.49.0



Re: [PATCH 8/9] AArch64: rules for CMPBR instructions

2025-05-09 Thread Kyrylo Tkachov


> On 8 May 2025, at 21:10, Karl Meakin  wrote:
> 
> Add rules for lowering `cbranch4` to CBB/CBH/CB when
> CMPBR extension is enabled.
> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64.md (cbranch4): Emit CMPBR
> instructions if possible.
> (BRANCH_LEN_P_1Kib): New constant.
> (BRANCH_LEN_N_1Kib): Likewise.
> (cbranch4): New expand rule.
> (aarch64_cb): Likewise.
> (aarch64_cb): Likewise.
> * config/aarch64/iterators.md (cmpbr_suffix): New mode attr.
> * config/aarch64/predicates.md (const_0_to_63_operand): New
> predicate.
> (aarch64_cb_immediate): Likewise.
> (aarch64_cb_operand): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/aarch64/cmpbr.c: update tests.
> ---
> gcc/config/aarch64/aarch64.md|  87 +++-
> gcc/config/aarch64/iterators.md  |   5 +
> gcc/config/aarch64/predicates.md |  17 +
> gcc/testsuite/gcc.target/aarch64/cmpbr.c | 484 ---
> 4 files changed, 275 insertions(+), 318 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 248b0e8644f..641c3653a40 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -697,37 +697,60 @@ (define_insn "jump"
> ;; Maximum PC-relative positive/negative displacements for various branching
> ;; instructions.
> (define_constants
>   [
> ;; +/- 128MiB.  Used by B, BL.
> (BRANCH_LEN_P_128MiB  134217724)
> (BRANCH_LEN_N_128MiB -134217728)
> 
> ;; +/- 1MiB.  Used by B., CBZ, CBNZ.
> (BRANCH_LEN_P_1MiB  1048572)
> (BRANCH_LEN_N_1MiB -1048576)
> 
> ;; +/- 32KiB.  Used by TBZ, TBNZ.
> (BRANCH_LEN_P_32KiB  32764)
> (BRANCH_LEN_N_32KiB -32768)
> +
> +;; +/- 1KiB.  Used by CBB, CBH, CB.
> +(BRANCH_LEN_P_1Kib  1020)
> +(BRANCH_LEN_N_1Kib -1024)
>   ]
> )
> 
> ;; ---
> ;; Conditional jumps
> ;; ---
> 
> -(define_expand "cbranch4"
> +(define_expand "cbranch4"
>   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
>[(match_operand:GPI 1 "register_operand")
> (match_operand:GPI 2 "aarch64_plus_operand")])
>   (label_ref (match_operand 3))
>   (pc)))]
>   ""
> -  "
> -  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
> - operands[2]);
> -  operands[2] = const0_rtx;
> -  "
> +  {
> +  if (TARGET_CMPBR && aarch64_cb_operand (operands[2], mode))
> +{
> +  emit_jump_insn (gen_aarch64_cb (operands[0], operands[1],
> +operands[2], operands[3]));
> +  DONE;
> +}
> +  else
> +{
> +  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]),
> + operands[1], operands[2]);
> +  operands[2] = const0_rtx;
> +}
> +  }
> +)
> +
> +(define_expand "cbranch4"
> +  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
> +[(match_operand:SHORT 1 "register_operand")
> + (match_operand:SHORT 2 "aarch64_cb_short_operand")])
> +   (label_ref (match_operand 3))
> +   (pc)))]
> +  "TARGET_CMPBR"
> +  ""
> )
> 
> (define_expand "cbranch4"
> @@ -747,13 +770,65 @@ (define_expand "cbranch4"
> (define_expand "cbranchcc4"
>   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
>[(match_operand 1 "cc_register")
> (match_operand 2 "const0_operand")])
>   (label_ref (match_operand 3))
>   (pc)))]
>   ""
>   ""
> )
> 
> +;; Emit a `CB (register)` or `CB (immediate)` instruction.
> +(define_insn "aarch64_cb"
> +  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
> +[(match_operand:GPI 1 "register_operand")
> + (match_operand:GPI 2 "aarch64_cb_operand")])
> +   (label_ref (match_operand 3))
> +   (pc)))]
> +  "TARGET_CMPBR"
> +  "cb%m0\\t%1, %2, %l3";
> +  [(set_attr "type" "branch")
> +   (set (attr "length")
> + (if_then_else (and (ge (minus (match_dup 3) (pc))
> +   (const_int BRANCH_LEN_N_1Kib))
> +   (lt (minus (match_dup 3) (pc))
> +   (const_int BRANCH_LEN_P_1Kib)))
> +  (const_int 4)
> +  (const_int 8)))
> +   (set (attr "far_branch")
> + (if_then_else (and (ge (minus (match_dup 3) (pc))
> +   (const_int BRANCH_LEN_N_1Kib))
> +   (lt (minus (match_dup 3) (pc))
> +   (const_int BRANCH_LEN_P_1Kib)))
> +  (const_string "no")
> +  (const_string "yes")))]
> +)
> +
> +;; Emit a `CBB (register)` or `CBH (register)` instruction.
> +(define_insn "aarch64_cb"
> +  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
> +[(match_operand:SHORT 1 "register_operand")
> + (match_operand:SHORT 2 "aarch64_cb_short_operand")])
> +   (label_ref (match_operand 3))
> +   (pc)))]

As per the review of the previous version, the define_insn operands need a 
constraint string.


> +  "TARGET_CMPBR"
> +  "cb%m0\\t%1, %2, %l3";

The ‘;’ at the end is unnecessary; better to remove it.

> +  [(set_attr "type" "branch")
> +   (set (attr "length")
> + (if_then_else (and (ge (minus (matc
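The length and far_branch attributes of the new aarch64_cb patterns choose between a 4-byte short form and an 8-byte far form depending on whether the label displacement lies in [BRANCH_LEN_N_1Kib, BRANCH_LEN_P_1Kib). A minimal C++ model of that selection, reusing only the two constants from the patch (-1024 = -256 * 4 and 1020 = 255 * 4); the function names are hypothetical and this only mirrors the RTL condition, it is not GCC code:

```cpp
#include <cstdint>

// Values copied from the patch's define_constants.
constexpr std::int64_t kBranchLenN1Kib = -1024;  // BRANCH_LEN_N_1Kib
constexpr std::int64_t kBranchLenP1Kib = 1020;   // BRANCH_LEN_P_1Kib

// Hypothetical model of the far_branch attribute: "no" when
// N <= (label - pc) < P, "yes" otherwise.
constexpr bool cb_is_far_branch(std::int64_t disp) {
  return !(disp >= kBranchLenN1Kib && disp < kBranchLenP1Kib);
}

// Hypothetical model of the length attribute: 4 bytes for the short CB
// form, 8 bytes when the branch is treated as far.
constexpr int cb_length(std::int64_t disp) {
  return cb_is_far_branch(disp) ? 8 : 4;
}
```

Because the patch uses `lt` for the upper bound, a displacement of exactly 1020 is sized as a far branch in this model, while -1024 is still short.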

RE: [PATCH v2] match.pd: Fold (x + y) >> 1 into IFN_AVG_FLOOR (x, y) for vectors

2025-05-09 Thread Richard Biener
On Fri, 9 May 2025, Tamar Christina wrote:

> 
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, May 9, 2025 8:31 AM
> > To: Pengfei Li 
> > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford 
> > Subject: Re: [PATCH v2] match.pd: Fold (x + y) >> 1 into IFN_AVG_FLOOR (x, y) for vectors
> > 
> > On Thu, 8 May 2025, Pengfei Li wrote:
> > 
> > > This patch folds vector expressions of the form (x + y) >> 1 into
> > > IFN_AVG_FLOOR (x, y), reducing instruction count on platforms that
> > > support averaging operations. For example, it can help improve the
> > > codegen on AArch64 from:
> > >   add v0.4s, v0.4s, v31.4s
> > >   ushr    v0.4s, v0.4s, 1
> > > to:
> > >   uhadd   v0.4s, v0.4s, v31.4s
> > >
> > > As this folding is only valid when the most significant bit of each
> > > element in both x and y is known to be zero, this patch checks leading
> > > zero bits of elements in x and y, and extends get_nonzero_bits_1() to
> > > handle uniform vectors. When the input is a uniform vector, the function
> > > now returns the nonzero bits of its element.
> > >
> > > Additionally, this patch adds more checks to reject vector types in bit
> > > constant propagation (tree-bit-ccp), since tree-bit-ccp was designed for
> > > scalar values only, and the new vector logic in get_nonzero_bits_1()
> > > could lead to incorrect propagation results.
> > >
> > > Bootstrapped and tested on aarch64-linux-gnu and x86_64-linux-gnu.
> > >
> > > gcc/ChangeLog:
> > >
> > >   * match.pd: Add folding rule for vector average.
> > >   * tree-ssa-ccp.cc (get_default_value): Reject vector types.
> > >   (evaluate_stmt): Reject vector types.
> > >   * tree-ssanames.cc (get_nonzero_bits_1): Extend to handle
> > >   uniform vectors.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/aarch64/acle/uhadd_1.c: New test.
> > > ---
> > >  gcc/match.pd  |  9 +
> > >  .../gcc.target/aarch64/acle/uhadd_1.c | 34 +++
> > >  gcc/tree-ssa-ccp.cc   |  8 ++---
> > >  gcc/tree-ssanames.cc  |  8 +
> > >  4 files changed, 55 insertions(+), 4 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index ab496d923cc..ddd16a10944 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -2177,6 +2177,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >  (view_convert (rshift (view_convert:ntype @0) @1))
> > >  (convert (rshift (convert:ntype @0) @1))
> > >
> > > + /* Fold ((x + y) >> 1 into IFN_AVG_FLOOR (x, y) if x and y are vectors in
> > > +which each element is known to have at least one leading zero bit.  */
> > > +(simplify
> > > + (rshift (plus:cs @0 @1) integer_onep)
> > > + (if (VECTOR_TYPE_P (type)
> > > +  && wi::clz (get_nonzero_bits (@0)) > 0
> > > +  && wi::clz (get_nonzero_bits (@1)) > 0)
> > > +  (IFN_AVG_FLOOR @0 @1)))
> > 
> > You need to check that IFN_AVG_FLOOR is supported using
> > direct_internal_fn_supported_p here.
> > 
> 
> Is this actually needed? The match.pd machinery already rejects it
> if not supported.
> 
> For gimple you end up in maybe_push_res_to_seq in gimple-match-exports.cc
> which calls build_call_internal, which refuses to build the call and returns
> NULL, stopping the simplification.

Ah, yeah - I forgot about this.

> For generic you end up in maybe_build_call_expr_loc in tree.cc which also
> fails with NULL_TREE if the IFN isn't supported.
> 
> I think the other usages of direct_internal_fn_supported_p are there because
> they predate these additions.  Or am I missing something?

Some are there because we decide between simplification variants I think.

Also consider you'd have the above pattern and a following

(simplify
 (rshift (plus:cs @0 @1) integer_onep)
 (if (VECTOR_TYPE_P (type)
 && wi::clz (get_nonzero_bits (@0)) > 0
 && wi::clz (get_nonzero_bits (@1)) > 0)
  (SOMETHING_ELSE @0 @1)))

then the first would match but ultimately be rejected, and the second,
also-matching pattern would not be tried.  That is unlikely in the case in
question, but in general I think this could happen when the
maybe_push_res_to_seq that rejects the simplification is called
from the caller of the simplification (the outermost expression
is open-coded in res_ops).

Richard.

> Regards,
> Tamar
> 
> > Otherwise this is OK with me.
> > 
> > Richard.
> > 
> > > +
> > >  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
> > > when profitable.
> > > For bitwise binary operations apply operand conversions to the
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c
> > b/gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c
> > > new file mode 100644
> > > index 000..f1748a199ad
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c
> > > @@ -0,0 +1,34 @@
> > > +/* Test if SIMD 
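The validity condition for the (x + y) >> 1 to IFN_AVG_FLOOR fold discussed in this thread can be checked on a single lane with scalar arithmetic. A minimal C++ sketch, assuming unsigned 32-bit lanes; the function names are hypothetical. avg_floor computes the floor average without overflow (the per-lane result of a halving add such as UHADD), while add_then_shift computes the original expression with a wrapping add. The two agree whenever both top bits are clear, which is what the leading-zero checks on @0 and @1 guarantee, and can differ otherwise:

```cpp
#include <cstdint>

// Floor average of one unsigned 32-bit lane, computed in 64 bits so the sum
// cannot wrap (the per-lane result a halving add such as UHADD produces).
constexpr std::uint32_t avg_floor(std::uint32_t x, std::uint32_t y) {
  return static_cast<std::uint32_t>((std::uint64_t{x} + y) >> 1);
}

// The unfolded expression (x + y) >> 1 on the same lane: the addition wraps
// modulo 2^32 before the shift.
constexpr std::uint32_t add_then_shift(std::uint32_t x, std::uint32_t y) {
  return static_cast<std::uint32_t>(x + y) >> 1;
}
```

With both inputs below 2^31 the sum is below 2^32, so no wrap occurs; with x = y = 0x80000000 the wrapping add yields 0 while the floor average is 0x80000000.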

[PATCH] Restore lrealpath() fallback scenario

2025-05-09 Thread Rink Springer
Hi all!
 
Git commit e2bb55ec3b70cf45088bb73ba9ca990129328d60 (pr/108350) removes the
fallback scenario for lrealpath() used when none of the #ifdefs match (in which
case the function body is effectively empty).
In that situation there is no return statement, so control falls off the end of
the function and the caller receives an indeterminate pointer value. This is bad,
as make_relative_prefix_1() uses this pointer and passes it to free().
 
This crashes in my hobby OS (https://github.com/zhmu/dogfood) as I do not 
implement any of the proper lrealpath() scenarios  ;-) This is likely not an 
issue for any real use case, but I'm submitting this to prevent others from 
having to chase this.
 
I've attached a simple patch that restores the fallback behaviour (return 
strdup(filename);) which it did prior to pr/108350.
 
Regards,
Rink

diff -rubB gcc-15.1.0-base/libiberty/lrealpath.c gcc-15.1.0/libiberty/lrealpath.c
--- gcc-15.1.0-base/libiberty/lrealpath.c	2025-04-25 10:18:04.0 +0200
+++ gcc-15.1.0/libiberty/lrealpath.c	2025-05-09 16:49:35.228340555 +0200
@@ -303,4 +303,7 @@
 return res;
   }
 #endif // _WIN32
+
+  /* This system is a lost cause, just duplicate the filename.  */
+  return strdup (filename);
 }
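The restored fallback re-establishes the contract that every path out of lrealpath() returns a heap-allocated string the caller may pass to free(). A minimal sketch of that contract in C++; the name resolve_or_copy is hypothetical and this is not libiberty's implementation. It assumes a POSIX system for realpath() with a null buffer, and otherwise (or on failure) falls back to strdup(), so the caller never sees an indeterminate pointer:

```cpp
#include <cstdlib>  // realpath (POSIX), free
#include <cstring>  // strdup (POSIX)

// Hypothetical helper: always returns a malloc'd string that the caller must
// free(), or nullptr only if memory allocation itself fails.
char *resolve_or_copy(const char *filename) {
#if defined(__unix__) || defined(__APPLE__)
  // POSIX.1-2008: a null second argument makes realpath allocate the result.
  if (char *res = realpath(filename, nullptr))
    return res;
#endif
  // Fallback: duplicate the input so the caller can still free() it.
  return strdup(filename);
}
```

The key design point is that there is no path that reaches the closing brace without a return statement.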


[PATCH] c++/modules: Revert "Remove unnecessary lazy_load_pendings"

2025-05-09 Thread Nathaniel Shead
On Fri, May 09, 2025 at 08:18:58AM -0400, Jason Merrill wrote:
> On 4/21/25 6:22 AM, Nathaniel Shead wrote:
> > This call is not necessary, as we don't access the bodies of any classes
> > that we instantiate here.
> 
> This turns out to break
> 
> 20_util/function_objects/mem_fn/constexpr.cc
> std/ranges/view.cc
> 
> when modified to use import std (as attached).  For the former, I see
> 
> > In file included from 
> > /home/jason/s/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/stdc++.h:55,
> >  from 
> > /home/jason/s/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/std.cc:30,
> > of module std, imported at 
> > /home/jason/gt/libstdc++-v3/testsuite/20_util/function_objects/mem_fn/constexpr.cc:21:
> > /home/jason/s/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/functional: In 
> > instantiation of ‘class std::_Mem_fn_base’:
> > /home/jason/s/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/functional:211:12:
> >required from ‘struct std::_Mem_fn’
> >   211 | struct _Mem_fn<_Res _Class::*>
> >   |^~~
> > /home/jason/gt/libstdc++-v3/testsuite/20_util/function_objects/mem_fn/constexpr.cc:36:21:
> >required from here
> >36 |   return std::mem_fn(&F::i)(f);
> >   |  ~~~^~~
> > /home/jason/s/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/functional:190:23:
> >  error: conflicting declaration of template ‘template > ... _BoundArgs> struct std::_Bind_check_arity’
> >   190 | friend struct _Bind_check_arity;
> >   |   ^
> > /home/jason/s/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/functional:834:12:
> >  note: previous declaration ‘template 
> > struct std::_Bind_check_arity’
> >   834 | struct _Bind_check_arity { };
> >   |^
> 
> lookup_imported_hidden_friend is failing without the lazy_load_pendings, so
> we try and fail to push the instantiation.  Reverting this patch makes them
> pass.
> 
> Jason

Here's a patch which reverts the change with an additional comment and
testcase.  OK for trunk if full bootstrap+regtest passes?

-- >8 --

This reverts commit r16-63-g241157eb0858b3.  It turns out that the
lazy_load_pendings call is necessary if we haven't seen a binding for the
given template name at all in the current TU, as it is also used to find
template instantiations with that name.

gcc/cp/ChangeLog:

* name-lookup.cc (lookup_imported_hidden_friend): Add back
lazy_load_pendings with comment.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-friend-19_a.C: New test.
* g++.dg/modules/tpl-friend-19_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/name-lookup.cc  |  3 +++
 gcc/testsuite/g++.dg/modules/tpl-friend-19_a.C | 16 
 gcc/testsuite/g++.dg/modules/tpl-friend-19_b.C |  6 ++
 3 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-19_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-19_b.C

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 9b317c44669..84b5e673a6d 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -4556,6 +4556,9 @@ lookup_imported_hidden_friend (tree friend_tmpl)
   || !DECL_MODULE_ENTITY_P (inner))
 return NULL_TREE;
 
+  /* Load any templates matching FRIEND_TMPL from importers.  */
+  lazy_load_pendings (friend_tmpl);
+
   tree name = DECL_NAME (inner);
   tree *slot = find_namespace_slot (current_namespace, name, false);
   if (!slot || !*slot || TREE_CODE (*slot) != BINDING_VECTOR)
diff --git a/gcc/testsuite/g++.dg/modules/tpl-friend-19_a.C 
b/gcc/testsuite/g++.dg/modules/tpl-friend-19_a.C
new file mode 100644
index 000..59f0175693c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/tpl-friend-19_a.C
@@ -0,0 +1,16 @@
+// { dg-additional-options "-fmodules -Wno-global-module" }
+// { dg-module-cmi M }
+
+module;
+
+template 
+class _Mem_fn_base {
+  template  friend struct _Bind_check_arity;
+};
+
+template  struct _Bind_check_arity {};
+
+export module M;
+
+template struct _Bind_check_arity;
+export _Mem_fn_base mem_fn();
diff --git a/gcc/testsuite/g++.dg/modules/tpl-friend-19_b.C 
b/gcc/testsuite/g++.dg/modules/tpl-friend-19_b.C
new file mode 100644
index 000..ce99647b9a8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/tpl-friend-19_b.C
@@ -0,0 +1,6 @@
+// { dg-additional-options "-fmodules" }
+
+import M;
+int main() {
+  mem_fn();
+}
-- 
2.47.0



[PATCH] c++: Take downgraded errors into account in seen_error [PR118388]

2025-05-09 Thread Simon Martin
Several gcc_asserts throughout the C++ front-end involve seen_error (), which
does not take into account errors that were turned into warnings due to
-fpermissive or -Wtemplate-body.

Running the full C++ testsuite with -fpermissive forced on leads to ICEs
for 6 tests (see the list in the ticket); one could consider those
reject-valid cases.

This patch keeps track of whether we tried to emit an error (whether it
was eventually output as such or not) and uses this in seen_error.

Successfully tested on x86_64-pc-linux-gnu.
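The core of the change is to record that an error was attempted before the permissive downgrade rewrites its kind. A simplified C++ model of that bookkeeping; the Diagnostics type and its members are hypothetical and are not GCC's diagnostic API. In this model every error is downgraded when permissive is set, whereas the patch only downgrades errors issued inside templates:

```cpp
// Hypothetical model, not GCC code.
enum class Kind { Error, Warning };

struct Diagnostics {
  bool permissive = false;
  int error_count = 0;          // errors actually emitted as errors
  bool seen_error_raw = false;  // any error attempted, even if downgraded

  // Returns the kind the diagnostic is finally emitted as.
  Kind report(Kind k) {
    if (k == Kind::Error) {
      seen_error_raw = true;  // record the attempt before any downgrade
      if (permissive)
        k = Kind::Warning;
    }
    if (k == Kind::Error)
      ++error_count;
    return k;
  }

  bool seen_error() const { return error_count > 0; }
  bool cp_seen_error() const { return seen_error() || seen_error_raw; }
};
```

Checking seen_error_raw rather than the error count lets assertions that only hold for valid code stay quiet once any error was attempted.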

PR c++/118388

gcc/cp/ChangeLog:

* error.cc (seen_error_raw): New counter to keep track of errors
including those downgraded to warnings.
(cp_seen_error): Take downgraded errors into account.
* typeck2.cc (merge_exception_specifiers): Use seen_error
instead of errorcount.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-ice5-fpermissive.C: New test.
* g++.dg/cpp0x/noexcept128-fpermissive.C: New test.

---
 gcc/cp/error.cc   | 54 +--
 gcc/cp/typeck2.cc |  2 +-
 .../cpp0x/lambda/lambda-ice5-fpermissive.C| 14 +
 .../g++.dg/cpp0x/noexcept128-fpermissive.C| 21 
 4 files changed, 63 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice5-fpermissive.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept128-fpermissive.C

diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
index 75bf7dcef62..78ecafb0e02 100644
--- a/gcc/cp/error.cc
+++ b/gcc/cp/error.cc
@@ -215,6 +215,11 @@ get_current_template ()
 
 erroneous_templates_t *erroneous_templates;
 
+/* SEEN_ERROR_RAW will be true if we tried to emit an error message, regardless
+   of whether it was actually output or downgraded to a warning.  */
+
+bool seen_error_raw = false;
+
 /* Callback function diagnostic_context::m_adjust_diagnostic_info.
 
Errors issued when parsing a template are automatically treated like
@@ -227,40 +232,35 @@ cp_adjust_diagnostic_info (diagnostic_context *context,
   diagnostic_info *diagnostic)
 {
   if (diagnostic->kind == DK_ERROR)
-if (tree tmpl = get_current_template ())
-  {
-   diagnostic->option_id = OPT_Wtemplate_body;
-
-   if (context->m_permissive)
- diagnostic->kind = DK_WARNING;
-
-   bool existed;
-   location_t &error_loc
- = hash_map_safe_get_or_insert (erroneous_templates,
-  tmpl, &existed);
-   if (!existed)
- /* Remember that this template had a parse-time error so
-that we'll ensure a hard error has been issued upon
-its instantiation.  */
- error_loc = diagnostic->richloc->get_loc ();
-  }
+{
+  seen_error_raw = true;
+  if (tree tmpl = get_current_template ())
+   {
+ diagnostic->option_id = OPT_Wtemplate_body;
+
+ if (context->m_permissive)
+   diagnostic->kind = DK_WARNING;
+
+ bool existed;
+ location_t &error_loc
+   = hash_map_safe_get_or_insert (erroneous_templates,
+tmpl, &existed);
+ if (!existed)
+   /* Remember that this template had a parse-time error so
+  that we'll ensure a hard error has been issued upon
+  its instantiation.  */
+   error_loc = diagnostic->richloc->get_loc ();
+   }
+}
 }
 
 /* A generalization of seen_error which also returns true if we've
-   permissively downgraded an error to a warning inside a template.  */
+   permissively downgraded an error to a warning.  */
 
 bool
 cp_seen_error ()
 {
-  if ((seen_error) ())
-return true;
-
-  if (erroneous_templates)
-if (tree tmpl = get_current_template ())
-  if (erroneous_templates->get (tmpl))
-   return true;
-
-  return false;
+  return (seen_error) () || seen_error_raw;
 }
 
 /* CONTEXT->printer is a basic pretty printer that was constructed
diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index 45edd180173..a2d230461c4 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -2726,7 +2726,7 @@ merge_exception_specifiers (tree list, tree add)
 return add;
   noex = TREE_PURPOSE (list);
   gcc_checking_assert (!TREE_PURPOSE (add)
-  || errorcount || !flag_exceptions
+  || seen_error () || !flag_exceptions
   || cp_tree_equal (noex, TREE_PURPOSE (add)));
 
   /* Combine the dynamic-exception-specifiers, if any.  */
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice5-fpermissive.C 
b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice5-fpermissive.C
new file mode 100644
index 000..11300ad23e0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice5-fpermissive.C
@@ -0,0 +1,14 @@
+// PR c++/118388
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-fpermissive" }
+
+template int f

[PATCH] c++/modules: Fix handling of -fdeclone-ctor-dtor with explicit instantiations [PR120125]

2025-05-09 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/15?

One slight concern I have is why we end up in 'maybe_thunk_body' to
start with: the imported constructor isn't DECL_ONE_ONLY (as it's
external) and so 'can_alias_cdtor' returns false.  The change in
write_function_def (which I believe is necessary regardless) hides this
because we never actually emit the function definitions, but I worry
that this might somehow affect LTO, though I haven't been able to
construct a testcase which fails.  To avoid the potential of that we
could do something like this to mark such functions as DECL_ONE_ONLY
just in case; thoughts?

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index e7782627a49..c96e81aceef 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -16738,9 +16738,15 @@ module_state::read_cluster (unsigned snum)
 
   /* Make sure we emit explicit instantiations.
  FIXME do we want to do this in expand_or_defer_fn instead?  */
-  if (DECL_EXPLICIT_INSTANTIATION (decl)
-  && !DECL_EXTERNAL (decl))
-setup_explicit_instantiation_definition_linkage (decl);
+  if (DECL_EXPLICIT_INSTANTIATION (decl))
+{
+  if (DECL_DECLARED_INLINE_P (decl)
+  && TREE_PUBLIC (decl)
+  && DECL_MAYBE_IN_CHARGE_CDTOR_P (decl))
+maybe_make_one_only (decl);
+  if (!DECL_EXTERNAL (decl))
+setup_explicit_instantiation_definition_linkage (decl);
+}
 
   if (abstract)
 ;


This bug also affects 14 after r14-11743-g6d5a6a26e28d15; a minimal fix
for the ICE that seems straightforwardly correct is to just do the
optimize.cc hunk, and I think that might be more appropriate given we
didn't handle explicit instantiations specially then anyway.  Would such
a change (with appropriate xfailed tests) be OK?  Maybe this would even
be better for 15 also?

-- >8 --

The attached testcase ICEs in maybe_thunk_body because we haven't
created a node in the cgraph for an imported explicit instantiation yet.

We in fact really shouldn't be emitting calls at all, since an imported
explicit instantiation always exists in the TU we imported it from.  So
this patch adjusts DECL_NOT_REALLY_EXTERN handling to account for this.

PR c++/120125

gcc/cp/ChangeLog:

* module.cc (trees_out::write_function_def): Only set
DECL_NOT_REALLY_EXTERN if the importer might need to emit it.
* optimize.cc (maybe_thunk_body): Don't assume 'fn' has a cgraph
node created.

gcc/testsuite/ChangeLog:

* g++.dg/modules/clone-4_a.C: New test.
* g++.dg/modules/clone-4_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc |  6 +-
 gcc/cp/optimize.cc   |  4 ++--
 gcc/testsuite/g++.dg/modules/clone-4_a.C | 12 
 gcc/testsuite/g++.dg/modules/clone-4_b.C | 12 
 4 files changed, 31 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/clone-4_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/clone-4_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index f562bf8cd91..e7782627a49 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -12638,7 +12638,11 @@ trees_out::write_function_def (tree decl)
 {
   unsigned flags = 0;
 
-  flags |= 1 * DECL_NOT_REALLY_EXTERN (decl);
+  /* Whether the importer should emit this definition, if used.  */
+  flags |= 1 * (DECL_NOT_REALLY_EXTERN (decl)
+   && (get_importer_interface (decl)
+   != importer_interface::always_import));
+
   if (f)
{
  flags |= 2;
diff --git a/gcc/cp/optimize.cc b/gcc/cp/optimize.cc
index 6f9a77f407a..fc4d6c2e351 100644
--- a/gcc/cp/optimize.cc
+++ b/gcc/cp/optimize.cc
@@ -309,8 +309,8 @@ maybe_thunk_body (tree fn, bool force)
   defer_mangling_aliases = save_defer_mangling_aliases;
   cgraph_node::get_create (fns[0])->set_comdat_group (comdat_group);
   cgraph_node::get_create (fns[1])->add_to_same_comdat_group
-   (cgraph_node::get_create (fns[0]));
-  symtab_node::get (fn)->add_to_same_comdat_group
+   (cgraph_node::get (fns[0]));
+  symtab_node::get_create (fn)->add_to_same_comdat_group
(symtab_node::get (fns[0]));
   if (fns[2])
/* If *[CD][12]* dtors go into the *[CD]5* comdat group and dtor is
diff --git a/gcc/testsuite/g++.dg/modules/clone-4_a.C 
b/gcc/testsuite/g++.dg/modules/clone-4_a.C
new file mode 100644
index 000..3ee6109f23e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/clone-4_a.C
@@ -0,0 +1,12 @@
+// PR c++/120125
+// { dg-additional-options "-fmodules -fdeclone-ctor-dtor" }
+// { dg-module-cmi M }
+
+export module M;
+
+void foo();
+export template  struct __shared_ptr {
+  inline __shared_ptr() { foo(); }
+};
+
+template class __shared_ptr;
diff --git a/gcc/testsuite/g++.dg/modules/clone-4_b.C 
b/gcc/testsuite/g++.dg/modules/clone-4_b.C
new file mode 100644
index 

Re: [PATCH] libstdc++: Use scope guard for deallocating nodes in deque.

2025-05-09 Thread Tomasz Kaminski
On Fri, May 9, 2025 at 5:25 PM Jonathan Wakely  wrote:

> On Fri, 9 May 2025 at 16:13, Tomasz Kaminski  wrote:
> >
> >
> >
> > On Thu, May 8, 2025 at 7:46 PM Jonathan Wakely 
> wrote:
> >>
> >> On Fri, 18 Apr 2025 at 10:03, Tomasz Kamiński 
> wrote:
> >> >
> >> > This patch adds a _Guard_nodes scope guard nested to the _Deque_base,
> >> > that deallocates the range of nodes, and replaces __try/__catch block
> >> > with approparietly constructed guard object.
> >>
> >> "appropriately"
> >>
> >> >
> >> > libstdc++-v3/ChangeLog:
> >> >
> >> > * include/bits/deque.tcc (_Deque_base<_Tp,
> _Alloc>::_Guard_nodes): Define.
> >>
> >> There's no need for the template argument list here, just
> >> "_Deque_base" is unambiguous (there's no partial or explicit
> >> specialization that could be disambiguated with template argument
> >> lists). And just "deque" below.
> >>
> >> > (_Deque_base<_Tp, _Alloc>::_M_create_nodes): Moved definition
> from stl_deque.h
> >> > and replace __try/__catch with _Guard_nodes scope object.
> >> > (deque<_Tp, _Alloc>::_M_fill_insert, deque<_Tp,
> _Alloc>::_M_default_append)
> >> > (deque<_Tp, _Alloc>::_M_push_back_aux, deque<_Tp,
> _Alloc>::_M_push_front_aux)
> >> > (deque<_Tp, _Alloc>::_M_range_prepend, deque<_Tp,
> _Alloc>::_M_range_append)
> >> > (deque<_Tp, _Alloc>::_M_insert_aux): Replace __try/__catch
> with _Guard_nodes
> >> > scope object.
> >> > (deque<_Tp, _Alloc>::_M_new_elements_at_back)
> >> > (deque<_Tp, _Alloc>::_M_new_elements_at_back): Use
> _M_create_nodes.
> >> > * include/bits/stl_deque.h (_Deque_base<_Tp,
> _Alloc>::_Guard_nodes): Declare.
> >> > (_Deque_base<_Tp, _Alloc)::_M_create_nodes): Move definition
> to deque.tcc.
> >> > (deque<_Tp, _Alloc>::_Guard_nodes): Add typedef, so name is
> found by lookup.
> >> > ---
> >> > Testing x86_64-linux, default test configuration passed.
> >> > OK for trunk?
> >> >
> >> >  libstdc++-v3/include/bits/deque.tcc   | 424
> --
> >> >  libstdc++-v3/include/bits/stl_deque.h |  20 +-
> >> >  2 files changed, 196 insertions(+), 248 deletions(-)
> >> >
> >> > diff --git a/libstdc++-v3/include/bits/deque.tcc
> b/libstdc++-v3/include/bits/deque.tcc
> >> > index dabb6ec5365..b70eed69294 100644
> >> > --- a/libstdc++-v3/include/bits/deque.tcc
> >> > +++ b/libstdc++-v3/include/bits/deque.tcc
> >> > @@ -63,6 +63,40 @@ namespace std _GLIBCXX_VISIBILITY(default)
> >> >  _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >> >  _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> >> >
> >> > +  template
> >> > +struct
> >>
> >> No new line here, just "struct _Deque_base...".
> >>
> >> > +_Deque_base<_Tp, _Alloc>::_Guard_nodes
> >> > +  {
> >> > +   _Guard_nodes(_Deque_base& __self,
> >> > +_Map_pointer __first, _Map_pointer __last)
> >> > +   : _M_self(__self), _M_first(__first), _M_last(__last)
> >> > +   { }
> >> > +
> >> > +   ~_Guard_nodes()
> >> > +   { _M_self._M_destroy_nodes(_M_first, _M_last); }
> >> > +
> >> > +   void _M_disarm()
> >> > +   { _M_first = _M_last; }
> >> > +
> >> > +   _Deque_base& _M_self;
> >> > +   _Map_pointer _M_first;
> >> > +   _Map_pointer _M_last;
> >> > +
> >> > +  private:
> >> > +   _Guard_nodes(_Guard_nodes const&);
> >> > +  };
> >> > +
> >> > +  template
> >> > +void
> >> > +_Deque_base<_Tp, _Alloc>::
> >> > +_M_create_nodes(_Map_pointer __nstart, _Map_pointer __nfinish)
> >> > +{
> >> > +  _Guard_nodes __guard(*this, __nstart, __nstart);
> >> > +  for (_Map_pointer& __cur = __guard._M_last; __cur < __nfinish;
> ++__cur)
> >> > +   *__cur = this->_M_allocate_node();
> >> > +  __guard._M_disarm();
> >> > +}
> >> > +
> >> >  #if __cplusplus >= 201103L
> >> >template 
> >> >  void
> >> > @@ -310,35 +344,25 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> >> >if (__pos._M_cur == this->_M_impl._M_start._M_cur)
> >> > {
> >> >   iterator __new_start = _M_reserve_elements_at_front(__n);
> >> > - __try
> >> > -   {
> >> > - std::__uninitialized_fill_a(__new_start,
> this->_M_impl._M_start,
> >> > - __x, _M_get_Tp_allocator());
> >> > - this->_M_impl._M_start = __new_start;
> >> > -   }
> >> > - __catch(...)
> >> > -   {
> >> > - _M_destroy_nodes(__new_start._M_node,
> >> > -  this->_M_impl._M_start._M_node);
> >> > - __throw_exception_again;
> >> > -   }
> >> > + _Guard_nodes __guard(*this, __new_start._M_node,
> >> > + this->_M_impl._M_start._M_node);
> >> > +
> >> > + std::__uninitialized_fill_a(__new_start,
> this->_M_impl._M_start,
> >> > + __x, _M_get_Tp_allocator());
> >> > + __guard._M_disarm();
> >> > + this->_M
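The _Guard_nodes idiom replaces a __try/__catch that frees nodes on failure with an object whose destructor frees them unless it has been disarmed. A self-contained C++ sketch of the same pattern; NodePool, GuardNodes and create_nodes are hypothetical stand-ins, not libstdc++ code. As in the patch's _M_create_nodes, the loop variable is a reference to the guard's end pointer, so the guarded range always covers exactly the nodes allocated so far:

```cpp
#include <new>  // std::bad_alloc

// Hypothetical allocator stand-in that can be told to fail.
struct NodePool {
  int live = 0;         // nodes currently allocated
  int fail_after = -1;  // successful allocations before throwing (-1: never)

  int *allocate_node() {
    if (fail_after == 0)
      throw std::bad_alloc();
    if (fail_after > 0)
      --fail_after;
    ++live;
    return new int(0);
  }
  void destroy_nodes(int **first, int **last) {
    for (; first < last; ++first) { delete *first; --live; }
  }
};

// Scope guard: frees [first, last) on scope exit unless disarm() was called.
struct GuardNodes {
  NodePool &pool;
  int **first;
  int **last;
  GuardNodes(NodePool &p, int **f, int **l) : pool(p), first(f), last(l) {}
  GuardNodes(const GuardNodes &) = delete;
  GuardNodes &operator=(const GuardNodes &) = delete;
  ~GuardNodes() { pool.destroy_nodes(first, last); }
  void disarm() { first = last; }
};

// The loop advances guard.last itself, so an exception from allocate_node()
// leaves the guard covering only the nodes that were actually allocated.
void create_nodes(NodePool &pool, int **nstart, int **nfinish) {
  GuardNodes guard(pool, nstart, nstart);
  for (int **&cur = guard.last; cur < nfinish; ++cur)
    *cur = pool.allocate_node();
  guard.disarm();
}
```

On success, disarm() empties the range so the destructor frees nothing; on failure, unwinding runs the destructor without any explicit rethrow.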

RE: [PATCH v2] match.pd: Fold (x + y) >> 1 into IFN_AVG_FLOOR (x, y) for vectors

2025-05-09 Thread Tamar Christina



> -Original Message-
> From: Richard Biener 
> Sent: Friday, May 9, 2025 8:31 AM
> To: Pengfei Li 
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford 
> Subject: Re: [PATCH v2] match.pd: Fold (x + y) >> 1 into IFN_AVG_FLOOR (x, y) for vectors
> 
> On Thu, 8 May 2025, Pengfei Li wrote:
> 
> > This patch folds vector expressions of the form (x + y) >> 1 into
> > IFN_AVG_FLOOR (x, y), reducing instruction count on platforms that
> > support averaging operations. For example, it can help improve the
> > codegen on AArch64 from:
> > add     v0.4s, v0.4s, v31.4s
> > ushr    v0.4s, v0.4s, 1
> > to:
> > uhadd   v0.4s, v0.4s, v31.4s
> >
> > As this folding is only valid when the most significant bit of each
> > element in both x and y is known to be zero, this patch checks leading
> > zero bits of elements in x and y, and extends get_nonzero_bits_1() to
> > handle uniform vectors. When the input is a uniform vector, the function
> > now returns the nonzero bits of its element.
> >
> > Additionally, this patch adds more checks to reject vector types in bit
> > constant propagation (tree-bit-ccp), since tree-bit-ccp was designed for
> > scalar values only, and the new vector logic in get_nonzero_bits_1()
> > could lead to incorrect propagation results.
> >
> > Bootstrapped and tested on aarch64-linux-gnu and x86_64-linux-gnu.
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Add folding rule for vector average.
> > * tree-ssa-ccp.cc (get_default_value): Reject vector types.
> > (evaluate_stmt): Reject vector types.
> > * tree-ssanames.cc (get_nonzero_bits_1): Extend to handle
> > uniform vectors.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/acle/uhadd_1.c: New test.
> > ---
> >  gcc/match.pd  |  9 +
> >  .../gcc.target/aarch64/acle/uhadd_1.c | 34 +++
> >  gcc/tree-ssa-ccp.cc   |  8 ++---
> >  gcc/tree-ssanames.cc  |  8 +
> >  4 files changed, 55 insertions(+), 4 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index ab496d923cc..ddd16a10944 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -2177,6 +2177,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  (view_convert (rshift (view_convert:ntype @0) @1))
> >  (convert (rshift (convert:ntype @0) @1))
> >
> > + /* Fold (x + y) >> 1 into IFN_AVG_FLOOR (x, y) if x and y are vectors in
> > +which each element is known to have at least one leading zero bit.  */
> > +(simplify
> > + (rshift (plus:cs @0 @1) integer_onep)
> > + (if (VECTOR_TYPE_P (type)
> > +  && wi::clz (get_nonzero_bits (@0)) > 0
> > +  && wi::clz (get_nonzero_bits (@1)) > 0)
> > +  (IFN_AVG_FLOOR @0 @1)))
> 
> You need to check that IFN_AVG_FLOOR is supported using
> direct_internal_fn_supported_p here.
> 

Is this actually needed? The match.pd machinery already rejects it
if it is not supported.

For gimple you end up in maybe_push_res_to_seq in gimple-match-exports.cc,
which calls build_call_internal, which would refuse to build the call and
return NULL, stopping the simplification.

For generic you end up in maybe_build_call_expr_loc in tree.cc which also
fails with NULL_TREE if the IFN isn't supported.

I think the other usages of direct_internal_fn_supported_p are there because
they predate these additions.  Or am I missing something?

Regards,
Tamar

> Otherwise this is OK with me.
> 
> Richard.
> 
> > +
> >  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
> > when profitable.
> > For bitwise binary operations apply operand conversions to the
> > diff --git a/gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c b/gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c
> > new file mode 100644
> > index 000..f1748a199ad
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c
> > @@ -0,0 +1,34 @@
> > +/* Test if SIMD fused unsigned halving adds are generated */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +
> > +#include <arm_neon.h>
> > +
> > +#define FUSED_SIMD_UHADD(vectype, q, ts, mask) \
> > +  vectype simd_uhadd ## q ## _ ## ts ## _1 (vectype a) \
> > +  { \
> > +vectype v1 = vand ## q ## _ ## ts (a, vdup ## q ## _n_ ## ts (mask)); \
> > +vectype v2 = vdup ## q ## _n_ ## ts (mask); \
> > +return vshr ## q ## _n_ ## ts (vadd ## q ## _ ## ts (v1, v2), 1); \
> > +  } \
> > +  \
> > +  vectype simd_uhadd ## q ## _ ## ts ## _2 (vectype a, vectype b) \
> > +  { \
> > +vectype v1 = vand ## q ## _ ## ts (a, vdup ## q ## _n_ ## ts (mask)); \
> > +vectype v2 = vand ## q ## _ ## ts (b, vdup ## q ## _n_ ## ts (mask)); \
> > +return vshr ## q ## _n_ ## ts (vadd ## q ## _ ## ts (v1, v2), 1); \
> > +  }
> > +
> > +FUSED_SIMD_UHADD (uint8x8_t, , u8, 0x7f)
> > +FUSED_SIMD_UHADD (uint8x16_t, q, u8, 0x7f)
> > +FUSED_SIMD_UHADD (uint16x4_t, , u16, 0x7fff)
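
The following is a minimal C++ sketch of why the match.pd fold above needs the leading-zero check. It is not part of the patch, and the helper names are hypothetical. On 8-bit unsigned lanes, computing (x + y) >> 1 at lane width matches the floor average that a halving add such as UHADD produces only when the addition cannot wrap. If the top bit of both x and y is clear, then x + y <= 0x7f + 0x7f = 0xfe, so the sum never wraps.

```cpp
#include <cassert>
#include <cstdint>

// Lane-width computation: the sum wraps modulo 256 before the shift,
// as a vector add followed by a vector shift would on uint8 lanes.
std::uint8_t shift_of_wrapped_sum(std::uint8_t x, std::uint8_t y) {
  std::uint8_t sum = static_cast<std::uint8_t>(x + y);
  return static_cast<std::uint8_t>(sum >> 1);
}

// Reference floor average computed in a wider type, so nothing overflows
// (the per-lane result of a halving add).
std::uint8_t avg_floor(std::uint8_t x, std::uint8_t y) {
  return static_cast<std::uint8_t>((unsigned{x} + unsigned{y}) >> 1);
}

// Exhaustively check that the two agree whenever both top bits are clear.
bool fold_valid_when_msb_clear() {
  for (unsigned x = 0; x < 0x80; ++x)
    for (unsigned y = 0; y < 0x80; ++y) {
      std::uint8_t a = static_cast<std::uint8_t>(x);
      std::uint8_t b = static_cast<std::uint8_t>(y);
      if (shift_of_wrapped_sum(a, b) != avg_floor(a, b))
        return false;
    }
  return true;
}
```

Once a top bit is set, the two results diverge. shift_of_wrapped_sum (0x80, 0x80) is 0, because 0x100 wraps to 0, while avg_floor (0x80, 0x80) is 0x80.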

Re: [PATCH 2/2] c++/modules: Remove unnecessary lazy_load_pendings

2025-05-09 Thread Nathaniel Shead
On Fri, May 09, 2025 at 08:18:58AM -0400, Jason Merrill wrote:
> On 4/21/25 6:22 AM, Nathaniel Shead wrote:
> > This call is not necessary, as we don't access the bodies of any classes
> > that we instantiate here.
> 
> This turns out to break
> 
> 20_util/function_objects/mem_fn/constexpr.cc
> std/ranges/view.cc
> 
> when modified to use import std (as attached).  For the former, I see
> 
> > In file included from 
> > /home/jason/s/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/stdc++.h:55,
> >  from 
> > /home/jason/s/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/std.cc:30,
> > of module std, imported at 
> > /home/jason/gt/libstdc++-v3/testsuite/20_util/function_objects/mem_fn/constexpr.cc:21:
> > /home/jason/s/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/functional: In 
> > instantiation of ‘class std::_Mem_fn_base’:
> > /home/jason/s/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/functional:211:12:
> >required from ‘struct std::_Mem_fn’
> >   211 | struct _Mem_fn<_Res _Class::*>
> >   |^~~
> > /home/jason/gt/libstdc++-v3/testsuite/20_util/function_objects/mem_fn/constexpr.cc:36:21:
> >required from here
> >36 |   return std::mem_fn(&F::i)(f);
> >   |  ~~~^~~
> > /home/jason/s/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/functional:190:23:
> >  error: conflicting declaration of template ‘template > ... _BoundArgs> struct std::_Bind_check_arity’
> >   190 | friend struct _Bind_check_arity;
> >   |   ^
> > /home/jason/s/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/functional:834:12:
> >  note: previous declaration ‘template 
> > struct std::_Bind_check_arity’
> >   834 | struct _Bind_check_arity { };
> >   |^
> 
> lookup_imported_hidden_friend is failing without the lazy_load_pendings, so
> we try and fail to push the instantiation.  Reverting this patch makes them
> pass.
> 
> Jason

Interesting, apparently I hadn't tested sufficiently: thanks for this.
I'll reduce a testcase.

Nathaniel


RE: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors

2025-05-09 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Friday, May 9, 2025 11:08 AM
> To: Richard Sandiford 
> Cc: Pengfei Li ; gcc-patches@gcc.gnu.org;
> ktkac...@nvidia.com
> Subject: Re: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors
> 
> On Fri, 9 May 2025, Richard Sandiford wrote:
> 
> > Richard Biener  writes:
> > > On Thu, 8 May 2025, Pengfei Li wrote:
> > >
> > >> This patch improves the auto-vectorization for loops with known small
> > >> trip counts by enabling the use of subvectors - bit fields of original
> > >> wider vectors. A subvector must have the same vector element type as the
> > >> original vector and enough bits for all vector elements to be processed
> > >> in the loop. Using subvectors is beneficial because machine instructions
> > >> operating on narrower vectors usually show better performance.
> > >>
> > >> To enable this optimization, this patch introduces a new target hook.
> > >> This hook allows the vectorizer to query the backend for a suitable
> > >> subvector type given the original vector type and the number of elements
> > >> to be processed in the small-trip-count loop. The target hook also has a
> > >> could_trap parameter to say if the subvector is allowed to have more
> > >> bits than needed.
> > >>
> > >> This optimization is currently enabled for AArch64 only. The example below
> > >> shows how it uses AdvSIMD vectors as subvectors of SVE vectors for
> > >> higher instruction throughput.
> > >>
> > >> Consider this loop operating on an array of 16-bit integers:
> > >>
> > >>  for (int i = 0; i < 5; i++) {
> > >>a[i] = a[i] < 0 ? -a[i] : a[i];
> > >>  }
> > >>
> > >> Before this patch, the generated AArch64 code would be:
> > >>
> > >>  ptrue   p7.h, vl5
> > >>  ptrue   p6.b, all
> > >>  ld1h    z31.h, p7/z, [x0]
> > >>  abs     z31.h, p6/m, z31.h
> > >>  st1h    z31.h, p7, [x0]
> > >
> > > p6.b has all lanes active - why is the abs then not
> > > simply unmasked?
> >
> > There is no unpredicated abs for SVE.  The predicate has to be there,
> > and so expand introduces one even when the gimple stmt is unconditional.
> >
> > >> After this patch, it is optimized to:
> > >>
> > >>  ptrue   p7.h, vl5
> > >>  ld1h    z31.h, p7/z, [x0]
> > >>  abs     v31.8h, v31.8h
> > >>  st1h    z31.h, p7, [x0]
> > >
> > > Help me decipher this - I suppose z31 and v31 "overlap" in the
> > > register file?  And z31 is a variable-length vector but
> > > z31.8h is an 8-element fixed-length vector?  How can we
> >
> > v31.8h, but otherwise yes.
> >
> > > end up with just 8 elements here?  From the upper iteration
> > > bound?
> >
> > Yeah.
> >
> > > I'm not sure why you need any target hook here.  It seems you
> > > do already have suitable vector modes so why not just ask
> > > for a suitable vector?  Is it because you need to have
> > > that register overlap guarantee (otherwise you'd get
> > > a move)?
> >
> > Yeah, the optimisation only makes sense for overlaid vector registers.
> >
> > > Why do we not simply use fixed-length SVE here in the first place?
> >
> > Fixed-length SVE is restricted to cases where the exact runtime length
> > is known: the compile-time length is both a minimum and a maximum.
> > In contrast, the code above would work even for 256-bit SVE.
> >
> > > To me doing this in this way in the vectorizer looks
> > > somewhat out-of-place.
> > >
> > > That said, we already have unmasked ABS in the IL:
> > >
> > >   vect__1.6_15 = .MASK_LOAD (&a, 16B, { -1, -1, -1, -1, -1, 0, 0, 0, 0, 0,
> > > 0, 0, 0, 0, 0, 0, ... }, { 0, ... });
> > >   vect__2.7_16 = ABSU_EXPR ;
> > >   vect__3.8_17 = VIEW_CONVERT_EXPR int>(vect__2.7_16);
> > >   .MASK_STORE (&a, 16B, { -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > > 0, 0, ... }, vect__3.8_17); [tail call]
> > >
> > > so what's missing here?  I suppose having a constant masked ABSU here
> > > would allow RTL expansion to select a fixed-size mode?
> > >
> > > And the vectorizer could simply use the existing
> > > related_vector_mode hook instead?
> >
> > I agree it's a bit awkward.  The problem is that we want conflicting
> > things.  On the one hand, it would make conceptual sense to use SVE
> > instructions to provide conditional optabs for Advanced SIMD vector modes.
> > E.g. SVE's LD1W could act as a predicated load for an Advanced SIMD
> > int32x4_t vector.  The main problem with that is that Advanced SIMD's
> > native boolean vector type is an integer vector of 0s and -1s, rather
> > than an SVE predicate.  For some (native Advanced SIMD) operations we'd
> > want one type of boolean, for some (SVE emulating Advanced SIMD)
> > operations we'd want the other type of boolean.
> >
> > The patch goes the other way and treats using Advanced SIMD as an
> > optimisation for SVE loops.
> >
> > related_vector_mode suffers from the same problem.  If we ask for a
> > vector mode of >=5 halfwords for a load or store, we want the SVE mode,
> > since that can be conditional on an SVE predicate.  
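
A scalar C++ model of the sequence discussed above may help show why the Advanced SIMD abs is safe. This is only a sketch. The lanes are modelled with a std::array rather than real SVE or Advanced SIMD intrinsics, and the function names are made up.

* The zeroing predicated load (ld1h with p7/z) leaves lanes 5 to 7 of the 8-lane register at zero.
* The unpredicated 8-lane abs is harmless on those zero lanes.
* The predicated store writes back only the 5 active lanes.

Memory therefore ends up the same as after the scalar loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 8;   // one 128-bit register of int16_t lanes
constexpr std::size_t kActive = 5;  // loop trip count, i.e. the "vl5" predicate

// Models: zeroing predicated load, unpredicated 8-lane abs, predicated store.
void abs_first5_subvector(std::int16_t* a) {
  assert(a != nullptr);
  std::array<std::int16_t, kLanes> reg{};  // inactive lanes read as zero
  for (std::size_t i = 0; i < kActive; ++i)
    reg[i] = a[i];
  for (auto& lane : reg)                   // abs on all 8 lanes
    lane = static_cast<std::int16_t>(lane < 0 ? -lane : lane);
  for (std::size_t i = 0; i < kActive; ++i)
    a[i] = reg[i];                         // only active lanes reach memory
}

// The original scalar loop from the patch description.
void abs_first5_scalar(std::int16_t* a) {
  for (int i = 0; i < 5; i++)
    a[i] = static_cast<std::int16_t>(a[i] < 0 ? -a[i] : a[i]);
}
```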

Re: [PATCH] c++: Only reject virtual base data member access in __builtin_offsetof [PR118346]

2025-05-09 Thread Simon Martin
Hi Jason,

On Tue May 6, 2025 at 10:36 PM CEST, Jason Merrill wrote:
> On 5/6/25 3:05 PM, Simon Martin wrote:
>> The following test case highlights two issues - see
>> https://godbolt.org/z/7E1KGYreh:
>>   1. We error out at both L4 and L5, while (at least) clang, EDG and MSVC
>>  only reject L5
>>   2. Our error message for L5 incorrectly mentions using a null pointer
>> 
>> === cut here ===
>> struct A { int i; };
>> struct C : virtual public A { };
>> void foo () {
>>auto res = ((C*)0)->i;  // L4
>
> I assume you meant to include a & in there.
I did not, but I agree it'd make the code a bit less nonsensical. I'll
make sure to check that the error/warnings we might emit with a & make
sense in the next iteration.

>>__builtin_offsetof (C, i);  // L5
>> }
>> === cut here ===
>> 
>> I am not aware of anything in the standard that'd make L4 invalid,
>
> I think it's undefined under https://eel.is/c++draft/expr#ref-9 and/or 
> https://eel.is/c++draft/expr#unary.op-1
>
> There's also no reasonable answer, because the offset of a virtual base 
> depends on the most-derived type of the object.  This is a bad question 
> to ask, which is why it's ill-formed for offsetof.  I wonder why the OP 
> wants to write this code?
It's definitely weird code; surprisingly enough none of clang, EDG or
MSVC emit any warning for it (https://godbolt.org/z/z7rYjzEWo).

> But I guess pedantically we shouldn't error out on code with undefined 
> behavior, and the diagnostic for L4 should only be a warning.
The fact that we should not error out was my thinking as well, but you
are right that we should warn the user about what they're doing.

I'll rework the patch to warn for L4, and error out for L5 without
talking about any null pointer.

Thanks,
  Simon
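
Jason's point that the offset of a virtual base depends on the most-derived type can be observed at run time. The sketch below is illustrative only: D and vbase_offset are hypothetical names, and the exact offsets are ABI-specific. It measures the distance from a C subobject to its virtual A base, once for a complete C and once for a C inside a D. A constant answer to __builtin_offsetof (C, i) would have to cover both cases.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct A { int i; };
struct C : virtual A { };
// D adds data members of its own. On common ABIs (e.g. Itanium) the
// virtual A subobject is laid out after them, so it ends up at a different
// distance from the C subobject than in a complete C object.
struct D : C { long pad[4]; };

// Distance in bytes from a C subobject to its virtual A base. For an
// arbitrary C*, this is typically read from the vtable at run time.
std::ptrdiff_t vbase_offset(C* pc) {
  A* pa = pc;  // derived-to-virtual-base conversion
  return static_cast<std::ptrdiff_t>(reinterpret_cast<std::uintptr_t>(pa) -
                                     reinterpret_cast<std::uintptr_t>(pc));
}
```

Member access such as pc->i stays well-defined for both objects. Only the notion of a single compile-time offset breaks down.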


Re: [PATCH] Fix wrong optimization of complex boolean expression

2025-05-09 Thread Richard Biener
On Fri, May 9, 2025 at 11:49 AM Eric Botcazou  wrote:
>
> Hi,
>
> this is a regression introduced on the mainline, 15 and 14 branches by:
>   https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639303.html
> although one may consider that the problem was latent before.
>
> The VRP2 pass turns:
>
>   # prephitmp_3 = PHI <0(4)>
>   _1 = prephitmp_3 == 0;
>   _5 = stretch_14(D) ^ 1;
>   _39 = _1 & _5;
>   _40 = _39 | last_20(D);
>
> into
>
>   _5 = stretch_14(D) ^ 1;
>   _42 = ~stretch_14(D);
>   _39 = _42;
>   _40 = last_20(D) | _39;
>
> using the following step:
>
> Folding statement: _1 = prephitmp_3 == 0;
> Queued stmt for removal.  Folds to: 1
> Folding statement: _5 = stretch_14(D) ^ 1;
> Not folded
> Folding statement: _39 = _1 & _5;
> gimple_simplified to _42 = ~stretch_14(D);
> _39 = _42 & 1;
> Folded into: _39 = _42;
>
> Folding statement: _40 = _39 | last_20(D);
> Folded into: _40 = last_20(D) | _39;
>
> but stretch_14 is an 8-bit boolean so the two forms are not equivalent, that is
> to say dropping the "& 1" is wrong.  It's another instance of the issue at:
>   https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558537.html
>
> "The problem is the bitwise/logical dichotomy for operators and the transition
> from the former to the latter for boolean types: if they are 1-bit, that's
> straightforward but, if they are larger, then you need to be careful because
> you cannot, on the one hand, turn a bitwise AND into a logical AND and, on the
> other hand, *not* turn e.g. a bitwise NOT into a logical NOT if they occur in
> the same computation, as the first change will drop the masking that may need
> to be applied after the bitwise NOT if it is not also changed."
>
> Here it's the reverse case: the bitwise NOT (~) is treated as logical by the
> machinery in range-op.cc but the bitwise OR (|) is *not* treated as logical by
> that of vr-values.cc, leading to the same problematic outcome.
>
> Tested on x86-64/Linux, OK for the mainline, 15 and 14 branches?

OK.

Thanks,
Richard.

>
>
> 2025-05-09  Eric Botcazou  
>
> * vr-values.cc (simplify_using_ranges::simplify) :
> Do not call simplify_bit_ops_using_ranges for boolean types whose
> precision is not 1.
>
>
> 2025-05-09  Eric Botcazou  
>
> * gnat.dg/opt106.adb: New test.
> * gnat.dg/opt106_pkg1.ads, gnat.dg/opt106_pkg1.adb: New helper.
> * gnat.dg/opt106_pkg2.ads, gnat.dg/opt106_pkg2.adb: Likewise.
>
> --
> Eric Botcazou
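
The masking issue described above can be reproduced with plain 8-bit arithmetic. In the sketch below, an unsigned 8-bit value stands in for an 8-bit boolean that only ever holds 0 or 1. It shows that stretch ^ 1 equals ~stretch & 1, but not ~stretch alone. That is why turning _39 = _42 & 1 into _39 = _42 is wrong. This models only the arithmetic, not the VRP code, and the function names are made up.

```cpp
#include <cassert>
#include <cstdint>

using bool8 = std::uint8_t;  // 8-bit boolean: only 0 and 1 are valid values

// What the source computes: logical negation of an 8-bit boolean.
bool8 xor_one(bool8 b) { return static_cast<bool8>(b ^ 1); }

// Bitwise NOT followed by the masking that must be kept.
bool8 not_and_one(bool8 b) { return static_cast<bool8>(~b & 1); }

// The incorrect fold: bitwise NOT with the "& 1" dropped.
bool8 not_only(bool8 b) { return static_cast<bool8>(~b); }
```

For both valid inputs, not_only yields a value (0xFF or 0xFE) that is not a valid boolean. A subsequent bitwise OR with last_20 then propagates the stray high bits.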


[PATCH 2/2] Move vector lowering to before vectorization

2025-05-09 Thread Richard Biener
The following moves vector lowering to before vectorization - in fact
to before DCE/forwprop and CSE.  This gives us the chance to re-vectorize
the lowered form eventually.  Note that once the vectorizer learns to
handle vector code, vector lowering should likely be integrated with
vectorization itself.

The main goal is to avoid the gap between vectorization and
vector lowering where optimize_vectors_before_lowering_p is
still false, because in that gap vector code emitted by the
vectorizer can be transformed into a form that requires lowering,
which can be quite inefficient.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Any comments on this part?  Note canonicalize_math_after_vectorization_p
says "Return true if math operations that are beneficial only after
vectorization should be canonicalized." but is keyed on vector lowering
having run.  That might need adjustment now, possibly adding another
property?

Thanks,
Richard.

* passes.def (pass_lower_vector_ssa): Move from after loop
optimizations to after the initial reassoc.
---
 gcc/passes.def | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index 3b251052e53..886023e57a0 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -254,6 +254,7 @@ along with GCC; see the file COPYING3.  If not see
 program and isolate those paths.  */
   NEXT_PASS (pass_isolate_erroneous_paths);
   NEXT_PASS (pass_reassoc, true /* early_p */);
+  NEXT_PASS (pass_lower_vector_ssa);
   NEXT_PASS (pass_dce);
   NEXT_PASS (pass_forwprop, /*last=*/false);
   NEXT_PASS (pass_phiopt, false /* early_p */);
@@ -334,7 +335,6 @@ along with GCC; see the file COPYING3.  If not see
  NEXT_PASS (pass_slp_vectorize);
   POP_INSERT_PASSES ()
   NEXT_PASS (pass_simduid_cleanup);
-  NEXT_PASS (pass_lower_vector_ssa);
   NEXT_PASS (pass_lower_switch);
   NEXT_PASS (pass_cse_sincos);
   NEXT_PASS (pass_cse_reciprocals);
-- 
2.43.0


Re: [PATCH] libstdc++: Use scope guard for deallocating nodes in deque.

2025-05-09 Thread Tomasz Kaminski
On Thu, May 8, 2025 at 7:46 PM Jonathan Wakely  wrote:

> On Fri, 18 Apr 2025 at 10:03, Tomasz Kamiński  wrote:
> >
> > This patch adds a _Guard_nodes scope guard nested to the _Deque_base,
> > that deallocates the range of nodes, and replaces __try/__catch block
> > with approparietly constructed guard object.
>
> "appropriately"
>
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/bits/deque.tcc (_Deque_base<_Tp,
> _Alloc>::_Guard_nodes): Define.
>
> There's no need for the template argument list here, just
> "_Deque_base" is unambiguous (there's no partial or explicit
> specialization that could be disambiguated with template argument
> lists). And just "deque" below.
>
> > (_Deque_base<_Tp, _Alloc>::_M_create_nodes): Moved defintion
> from stl_deque.h
> > and replace __try/__catch with _Guard_nodes scope object.
> > (deque<_Tp, _Alloc>::_M_fill_insert, deque<_Tp,
> _Alloc>::_M_default_append)
> > (deque<_Tp, _Alloc>::_M_push_back_aux, deque<_Tp,
> _Alloc>::_M_push_front_aux)
> > (deque<_Tp, _Alloc>::_M_range_prepend, deque<_Tp,
> _Alloc>::_M_range_append)
> > (deque<_Tp, _Alloc>::_M_insert_aux): Replace __try/__catch with
> _Guard_nodes
> > scope object.
> > (deque<_Tp, _Alloc>::_M_new_elements_at_back)
> > (deque<_Tp, _Alloc>::_M_new_elements_at_back): Use
> _M_create_nodes.
> > * include/bits/stl_deque.h (_Deque_base<_Tp,
> _Alloc>::_Guard_nodes): Declare.
> > (_Deque_base<_Tp, _Alloc)::_M_create_nodes): Move defintion to
> deque.tcc.
> > (deque<_Tp, _Alloc>::_Guard_nodes): Add typedef, so name is
> found by lookup.
> > ---
> > Testing x86_64-linux, default test configuration passed.
> > OK for trunk?
> >
> >  libstdc++-v3/include/bits/deque.tcc   | 424 --
> >  libstdc++-v3/include/bits/stl_deque.h |  20 +-
> >  2 files changed, 196 insertions(+), 248 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/bits/deque.tcc b/libstdc++-v3/include/bits/deque.tcc
> > index dabb6ec5365..b70eed69294 100644
> > --- a/libstdc++-v3/include/bits/deque.tcc
> > +++ b/libstdc++-v3/include/bits/deque.tcc
> > @@ -63,6 +63,40 @@ namespace std _GLIBCXX_VISIBILITY(default)
> >  _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >  _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> >
> > +  template<typename _Tp, typename _Alloc>
> > +struct
>
> No new line here, just "struct _Deque_base...".
>
> > +_Deque_base<_Tp, _Alloc>::_Guard_nodes
> > +  {
> > +   _Guard_nodes(_Deque_base& __self,
> > +_Map_pointer __first, _Map_pointer __last)
> > +   : _M_self(__self), _M_first(__first), _M_last(__last)
> > +   { }
> > +
> > +   ~_Guard_nodes()
> > +   { _M_self._M_destroy_nodes(_M_first, _M_last); }
> > +
> > +   void _M_disarm()
> > +   { _M_first = _M_last; }
> > +
> > +   _Deque_base& _M_self;
> > +   _Map_pointer _M_first;
> > +   _Map_pointer _M_last;
> > +
> > +  private:
> > +   _Guard_nodes(_Guard_nodes const&);
> > +  };
> > +
> > +  template<typename _Tp, typename _Alloc>
> > +void
> > +_Deque_base<_Tp, _Alloc>::
> > +_M_create_nodes(_Map_pointer __nstart, _Map_pointer __nfinish)
> > +{
> > +  _Guard_nodes __guard(*this, __nstart, __nstart);
> > +  for (_Map_pointer& __cur = __guard._M_last; __cur < __nfinish; ++__cur)
> > +   *__cur = this->_M_allocate_node();
> > +  __guard._M_disarm();
> > +}
> > +
> >  #if __cplusplus >= 201103L
> >template <typename _Tp, typename _Alloc>
> >  void
> > @@ -310,35 +344,25 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> >if (__pos._M_cur == this->_M_impl._M_start._M_cur)
> > {
> >   iterator __new_start = _M_reserve_elements_at_front(__n);
> > - __try
> > -   {
> > - std::__uninitialized_fill_a(__new_start, this->_M_impl._M_start,
> > - __x, _M_get_Tp_allocator());
> > - this->_M_impl._M_start = __new_start;
> > -   }
> > - __catch(...)
> > -   {
> > - _M_destroy_nodes(__new_start._M_node,
> > -  this->_M_impl._M_start._M_node);
> > - __throw_exception_again;
> > -   }
> > + _Guard_nodes __guard(*this, __new_start._M_node,
> > + this->_M_impl._M_start._M_node);
> > +
> > + std::__uninitialized_fill_a(__new_start, this->_M_impl._M_start,
> > + __x, _M_get_Tp_allocator());
> > + __guard._M_disarm();
> > + this->_M_impl._M_start = __new_start;
> > }
> >else if (__pos._M_cur == this->_M_impl._M_finish._M_cur)
> > {
> >   iterator __new_finish = _M_reserve_elements_at_back(__n);
> > - __try
> > -   {
> > - std::__uninitialized_fill_a(this->_M_impl._M_finish,
> > - __new_finish, __x,
> > - _M_get_Tp_allocator());
> > - this

[PATCH v2] libstdc++: Use scope guard for deallocating nodes in deque.

2025-05-09 Thread Tomasz Kamiński
This patch adds a _Guard_nodes scope guard, nested in _Deque_base,
that deallocates a range of nodes, and replaces the __try/__catch blocks
with appropriately constructed guard objects.
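
The guard idiom described above can be shown outside libstdc++. In this standalone sketch, Node, NodeGuard, create_nodes and the throwing allocator are hypothetical stand-ins, not libstdc++ names. The guard owns the half-open range [first, last) of already-allocated nodes and frees it in its destructor unless disarmed. As in the patch's _M_create_nodes, the loop variable is a reference to the guard's last pointer, so the protected range grows with each successful allocation.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-ins for deque nodes and the node allocator.
struct Node { int payload[4]; };

int live_nodes = 0;   // nodes currently allocated
int fail_after = -1;  // succeed this many more times, then throw; -1 = never

Node* allocate_node() {
  if (fail_after == 0) throw std::bad_alloc();
  if (fail_after > 0) --fail_after;
  Node* n = new Node();
  ++live_nodes;
  return n;
}

void deallocate_node(Node* n) {
  delete n;
  --live_nodes;
}

// Scope guard owning the half-open range [first, last) of node pointers.
struct NodeGuard {
  NodeGuard(Node** f, Node** l) : first(f), last(l) { }
  ~NodeGuard() {
    for (Node** p = first; p < last; ++p) deallocate_node(*p);
  }
  NodeGuard(const NodeGuard&) = delete;  // non-copyable, like the private
  NodeGuard& operator=(const NodeGuard&) = delete;  // copy ctor in the patch
  void disarm() { first = last; }  // empty range: the destructor frees nothing
  Node** first;
  Node** last;
};

// Fill [start, finish) with new nodes; on failure, free the ones already made.
void create_nodes(Node** start, Node** finish) {
  assert(start <= finish);
  NodeGuard guard(start, start);  // owns nothing yet
  // cur aliases guard.last, so each ++cur extends the guarded range.
  // If allocate_node throws, the guard frees exactly [start, cur).
  for (Node**& cur = guard.last; cur < finish; ++cur)
    *cur = allocate_node();
  guard.disarm();  // success: ownership passes to the caller
}
```

Compared with a __try/__catch block, the cleanup lives in one place. The success path only needs a single disarm call before returning.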

libstdc++-v3/ChangeLog:

* include/bits/deque.tcc (_Deque_base::_Guard_nodes): Define.
(_Deque_base::_M_create_nodes): Move definition from stl_deque.h
and replace __try/__catch with _Guard_nodes scope object.
(deque::_M_fill_insert, deque::_M_default_append)
(deque::_M_push_back_aux, deque::_M_push_front_aux)
(deque::_M_range_prepend, deque::_M_range_append, deque::_M_insert_aux):
Replace __try/__catch with _Guard_nodes scope object.
(deque::_M_new_elements_at_front, deque::_M_new_elements_at_back): Use
_M_create_nodes.
* include/bits/stl_deque.h (_Deque_base::_Guard_nodes): Declare.
(_Deque_base::_M_create_nodes): Move definition to deque.tcc.
(deque::_Guard_nodes): Add typedef, so name is found by lookup.
* testsuite/23_containers/deque/modifiers/push_back/throw.cc: New test.
---
Fixed the off-by-one error in _M_push_back_aux and added a test that reliably
detects it.
Updated the description and removed the new line before struct.
Tested on x86_64 linux. Separately tested push_back/throw.cc with all standards.
OK for trunk?


 libstdc++-v3/include/bits/deque.tcc   | 423 --
 libstdc++-v3/include/bits/stl_deque.h |  20 +-
 .../deque/modifiers/push_back/throw.cc|  56 +++
 3 files changed, 251 insertions(+), 248 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/23_containers/deque/modifiers/push_back/throw.cc

diff --git a/libstdc++-v3/include/bits/deque.tcc b/libstdc++-v3/include/bits/deque.tcc
index dabb6ec5365..71c4f13170a 100644
--- a/libstdc++-v3/include/bits/deque.tcc
+++ b/libstdc++-v3/include/bits/deque.tcc
@@ -63,6 +63,39 @@ namespace std _GLIBCXX_VISIBILITY(default)
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
+  template<typename _Tp, typename _Alloc>
+struct _Deque_base<_Tp, _Alloc>::_Guard_nodes
+{
+  _Guard_nodes(_Deque_base& __self,
+  _Map_pointer __first, _Map_pointer __last)
+  : _M_self(__self), _M_first(__first), _M_last(__last)
+  { }
+
+  ~_Guard_nodes()
+  { _M_self._M_destroy_nodes(_M_first, _M_last); }
+
+  void _M_disarm()
+  { _M_first = _M_last; }
+
+  _Deque_base& _M_self;
+  _Map_pointer _M_first;
+  _Map_pointer _M_last;
+
+ private:
+   _Guard_nodes(_Guard_nodes const&);
+ };
+
+  template<typename _Tp, typename _Alloc>
+void
+_Deque_base<_Tp, _Alloc>::
+_M_create_nodes(_Map_pointer __nstart, _Map_pointer __nfinish)
+{
+  _Guard_nodes __guard(*this, __nstart, __nstart);
+  for (_Map_pointer& __cur = __guard._M_last; __cur < __nfinish; ++__cur)
+   *__cur = this->_M_allocate_node();
+  __guard._M_disarm();
+}
+
 #if __cplusplus >= 201103L
   template <typename _Tp, typename _Alloc>
 void
@@ -310,35 +343,25 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   if (__pos._M_cur == this->_M_impl._M_start._M_cur)
{
  iterator __new_start = _M_reserve_elements_at_front(__n);
- __try
-   {
- std::__uninitialized_fill_a(__new_start, this->_M_impl._M_start,
- __x, _M_get_Tp_allocator());
- this->_M_impl._M_start = __new_start;
-   }
- __catch(...)
-   {
- _M_destroy_nodes(__new_start._M_node,
-  this->_M_impl._M_start._M_node);
- __throw_exception_again;
-   }
+ _Guard_nodes __guard(*this, __new_start._M_node,
+ this->_M_impl._M_start._M_node);
+
+ std::__uninitialized_fill_a(__new_start, this->_M_impl._M_start,
+ __x, _M_get_Tp_allocator());
+ __guard._M_disarm();
+ this->_M_impl._M_start = __new_start;
}
   else if (__pos._M_cur == this->_M_impl._M_finish._M_cur)
{
  iterator __new_finish = _M_reserve_elements_at_back(__n);
- __try
-   {
- std::__uninitialized_fill_a(this->_M_impl._M_finish,
- __new_finish, __x,
- _M_get_Tp_allocator());
- this->_M_impl._M_finish = __new_finish;
-   }
- __catch(...)
-   {
- _M_destroy_nodes(this->_M_impl._M_finish._M_node + 1,
-  __new_finish._M_node + 1);
- __throw_exception_again;
-   }
+ _Guard_nodes __guard(*this, this->_M_impl._M_finish._M_node + 1,
+ __new_finish._M_node + 1);
+
+ std::__uninitialized_fill_a(this->_M_impl._M_finish,
+ __new_finish, __x,
+ _M_get_Tp_allocator());
+ __guard._M_disarm();
+ this->_M_impl._M_finis

Re: [PATCH] rtl-optimization/120182 - wrong-code with RTL DSE and constant addresses

2025-05-09 Thread Jakub Jelinek
On Fri, May 09, 2025 at 09:17:23AM +0200, Richard Biener wrote:
> RTL DSE forms store groups from unique invariant bases but that is
> confused when presented with constant addresses where it assigns
> one store group per unique address.  That causes it to not consider
> 0x101:QI to alias 0x100:SI.  Constant accesses can really alias
> to every object, in practice they appear for I/O and for access
> to objects fixed via linker scripts for example.  So simply avoid
> registering a store group for them.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> OK?
> 
> Thanks,
> Richard.
> 
>   PR rtl-optimization/120182
>   * dse.cc (canon_address): Constant addresses have no
>   separate store group.
> 
>   * gcc.dg/torture/pr120182.c: New testcase.
> ---
>  gcc/dse.cc  |  5 ++-
>  gcc/testsuite/gcc.dg/torture/pr120182.c | 42 +
>  2 files changed, 46 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr120182.c
> 
> diff --git a/gcc/dse.cc b/gcc/dse.cc
> index ffc86ffabe5..2b99c660a9a 100644
> --- a/gcc/dse.cc
> +++ b/gcc/dse.cc
> @@ -1190,7 +1190,10 @@ canon_address (rtx mem,
>address = strip_offset_and_add (address, offset);
>  
>if (ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (mem))
> -   && const_or_frame_p (address))
> +   && const_or_frame_p (address)
> +   /* Literal addresses can alias any base, avoid creating a
> +  group for them.  */
> +   && ! CONST_INT_P (address))

Perhaps better CONST_SCALAR_INT_P instead of CONST_INT_P?
Otherwise LGTM.

Jakub
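
The aliasing problem in the DSE patch above comes down to byte ranges. Two accesses may alias whenever their ranges overlap, and different literal addresses can still overlap. The following minimal C++ sketch checks this for the example in the patch description: a 4-byte SImode store at 0x100 and a 1-byte QImode access at 0x101. The functions are hypothetical and are not dse.cc code.

```cpp
#include <cassert>
#include <cstdint>

// Accesses [a, a + a_size) and [b, b + b_size) may alias iff they overlap.
bool ranges_overlap(std::uint64_t a, std::uint64_t a_size,
                    std::uint64_t b, std::uint64_t b_size) {
  return a < b + b_size && b < a + a_size;
}

// Grouping keyed on the exact address, which is effectively what one store
// group per unique constant address amounts to.
bool same_store_group(std::uint64_t a, std::uint64_t b) { return a == b; }
```

The two accesses land in different groups yet overlap, so per-group reasoning would miss the alias. Not creating a group for constant addresses avoids that.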



[PATCH] fortran: Fix up minloc/maxloc lowering [PR120191]

2025-05-09 Thread Jakub Jelinek
On Fri, May 09, 2025 at 06:18:40PM +0300, Daniil Kochergin wrote:
> PR fortran/120191
> 
> * trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc):
> 
> Call strip_kind_from_actual unconditionally.
> 
> 
> * gfortran.dg/pr120191.f90: New test.

This unfortunately only fixes some of the cases in the new testcase.

We indeed should drop the kind argument from what is passed to the
library, but need to do it not only when one uses the argument name
for it (so kind=4 etc.) but also when one passes all the arguments
to the intrinsics.

The following patch uses the same approach as gfc_conv_intrinsic_findloc,
which looks more efficient and cleaner: we already set automatic variables
to point to the kind and back actual arguments, so we can just free and
clear expr on the former and set name to "%VAL" on the latter.
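
The difference between the two approaches can be modelled in a few lines. This is a hypothetical C++ sketch: Arg and the slot layout are simplified stand-ins for gfc_actual_arglist, not gfortran code. It assumes, as the use of kind_arg suggests, that the resolved list holds one entry per MINLOC/MAXLOC argument in order: ARRAY, DIM, MASK, KIND, BACK. An entry carries a name only if the user wrote one. Stripping by name therefore misses a positional KIND such as in maxloc (a1, 1, l1, 4, .false.), whereas addressing the slot directly does not.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Simplified model of one resolved actual argument; name is empty when
// the argument was passed positionally.
struct Arg {
  std::string name;
  std::optional<int> expr;
};

// Assumed slot order after argument resolution: ARRAY, DIM, MASK, KIND, BACK.
constexpr std::size_t kKindSlot = 3;

// Old approach: only finds KIND when it was written as kind=...
void strip_kind_by_name(std::vector<Arg>& args) {
  for (Arg& a : args)
    if (a.name == "kind")
      a.expr.reset();
}

// New approach: clear the KIND slot however it was passed.
void strip_kind_by_slot(std::vector<Arg>& args) {
  assert(args.size() == 5);
  args[kKindSlot].expr.reset();
}
```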

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

PR fortran/120191
* trans-intrinsic.cc (strip_kind_from_actual): Remove.
(gfc_conv_intrinsic_minmaxloc): Don't call strip_kind_from_actual.
Free and clear kind_arg->expr if non-NULL.  Set back_arg->name to
"%VAL" instead of a loop looking for last argument.

* gfortran.dg/pr120191.f90: New test.

Co-Authored-By: Daniil Kochergin 

--- gcc/fortran/trans-intrinsic.cc.jj   2025-04-22 21:26:15.772920190 +0200
+++ gcc/fortran/trans-intrinsic.cc  2025-05-09 17:41:27.323962631 +0200
@@ -4715,22 +4715,6 @@ maybe_absent_optional_variable (gfc_expr
 }
 
 
-/* Remove unneeded kind= argument from actual argument list when the
-   result conversion is dealt with in a different place.  */
-
-static void
-strip_kind_from_actual (gfc_actual_arglist * actual)
-{
-  for (gfc_actual_arglist *a = actual; a; a = a->next)
-{
-  if (a && a->name && strcmp (a->name, "kind") == 0)
-   {
- gfc_free_expr (a->expr);
- a->expr = NULL;
-   }
-}
-}
-
 /* Emit code for minloc or maxloc intrinsic.  There are many different cases
we need to handle.  For performance reasons we sometimes create two
loops instead of one, where the second one is much simpler.
@@ -4954,14 +4938,16 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * s
   bool dim_present = dim_arg->expr != nullptr;
   bool nested_loop = dim_present && expr->rank > 0;
 
-  /* The last argument, BACK, is passed by value. Ensure that
- by setting its name to %VAL. */
-  for (gfc_actual_arglist *a = actual; a; a = a->next)
+  /* Remove kind.  */
+  if (kind_arg->expr)
 {
-  if (a->next == NULL)
-   a->name = "%VAL";
+  gfc_free_expr (kind_arg->expr);
+  kind_arg->expr = NULL;
 }
 
+  /* Pass BACK argument by value.  */
+  back_arg->name = "%VAL";
+
   if (se->ss)
 {
   if (se->ss->info->useflags)
@@ -4993,7 +4979,6 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * s
   gcc_assert (expr->rank == 0);
 
   gfc_actual_arglist *a = actual;
-  strip_kind_from_actual (a);
   while (a)
{
  if (a->name && strcmp (a->name, "dim") == 0)
--- gcc/testsuite/gfortran.dg/pr120191.f90.jj   2025-05-09 17:19:07.905018604 +0200
+++ gcc/testsuite/gfortran.dg/pr120191.f90  2025-05-09 17:19:07.905018604 +0200
@@ -0,0 +1,614 @@
+! PR fortran/120191
+! { dg-do run }
+
+  integer(kind=1) :: a1(10, 10, 10), b1(10)
+  integer(kind=2) :: a2(10, 10, 10), b2(10)
+  integer(kind=4) :: a4(10, 10, 10), b4(10)
+  integer(kind=8) :: a8(10, 10, 10), b8(10)
+  real(kind=4) :: r4(10, 10, 10), s4(10)
+  real(kind=8) :: r8(10, 10, 10), s8(10)
+  logical :: l1(10, 10, 10), l2(10), l3
+  l1 = .true.
+  l2 = .true.
+  l3 = .true.
+  a1 = 0
+  if (any (maxloc (a1) .ne. 1)) stop 1
+  if (any (maxloc (a1, back=.false.) .ne. 1)) stop 2
+  if (any (maxloc (a1, back=.true.) .ne. 10)) stop 3
+  if (any (maxloc (a1, kind=2) .ne. 1)) stop 4
+  if (any (maxloc (a1, kind=4, back=.false.) .ne. 1)) stop 5
+  if (any (maxloc (a1, kind=8, back=.true.) .ne. 10)) stop 6
+  if (any (maxloc (a1, 1) .ne. 1)) stop 7
+  if (any (maxloc (a1, 1, back=.false.) .ne. 1)) stop 8
+  if (any (maxloc (a1, 1, back=.true.) .ne. 10)) stop 9
+  if (any (maxloc (a1, 1, kind=1) .ne. 1)) stop 10
+  if (any (maxloc (a1, 1, kind=2, back=.false.) .ne. 1)) stop 11
+  if (any (maxloc (a1, 1, kind=4, back=.true.) .ne. 10)) stop 12
+  if (any (maxloc (a1, 1, l1) .ne. 1)) stop 13
+  if (any (maxloc (a1, 1, l1, back=.false.) .ne. 1)) stop 14
+  if (any (maxloc (a1, 1, l1, back=.true.) .ne. 10)) stop 15
+  if (any (maxloc (a1, 1, l1, kind=8) .ne. 1)) stop 16
+  if (any (maxloc (a1, 1, l1, 4, .false.) .ne. 1)) stop 17
+  if (any (maxloc (a1, 1, l1, 2, .true.) .ne. 10)) stop 18
+  if (any (maxloc (a1, 1, l3) .ne. 1)) stop 19
+  if (any (maxloc (a1, 1, l3, back=.false.) .ne. 1)) stop 20
+  if (any (maxloc (a1, 1, l3, back=.true.) .ne. 10)) stop 21
+  if (any (maxloc (a1, 1, l3, kind=8) .ne. 1)) stop 22
+  if (any (maxloc (a1, 1, l3, 4, .false.) .ne. 1)) stop 23
+  if (any (maxloc (a1, 1, l3, 2, .true.) .ne. 10)) stop 24
+  b1 = 0
+  if (any (maxloc (b1) .ne. 1)) stop 1
+  if 

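For reference, the values the test expects follow from MAXLOC's optional BACK argument: when BACK is false (the default), the first position of the maximum is reported, and when it is true, the last one. KIND only selects the integer kind of the result, not its value. Because `a1 = 0` makes every element equal, every expected result is either 1 or 10. Below is a minimal C sketch of that rule for a one-dimensional array, using Fortran's 1-based positions. `maxloc_1d` is a hypothetical helper for illustration only, not a libgfortran entry point.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (illustration only): 1-based position of the
   maximum of A[0..N-1], or 0 if N is 0.  BACK = true picks the last
   occurrence of the maximum, BACK = false the first, as MAXLOC does.  */
static size_t
maxloc_1d (const int *a, size_t n, bool back)
{
  size_t loc = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* On a tie, '>=' moves LOC forward (last occurrence wins);
         '>' leaves it alone (first occurrence wins).  */
      if (loc == 0 || (back ? a[i] >= a[loc - 1] : a[i] > a[loc - 1]))
        loc = i + 1;
    }
  return loc;
}
```

With `int b1[10] = { 0 };`, `maxloc_1d (b1, 10, false)` is 1 and `maxloc_1d (b1, 10, true)` is 10, which is the same pattern the `maxloc (b1)` checks rely on.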
Re: [PATCH] match: Remove (ne (cmp) 0) and (eq (cmp) 1) patterns

2025-05-09 Thread Andrew Pinski
On Fri, May 9, 2025 at 4:34 AM Andrew Pinski  wrote:
>
> On Fri, May 9, 2025 at 1:21 AM Richard Biener
>  wrote:
> >
> > On Fri, May 9, 2025 at 4:51 AM Andrew Pinski wrote:
> > >
> > > These patterns are not needed any more. There were already
> > > 2 patterns which did `(ne bool_var 0)` into `bool_var` and
> > > `(eq bool_var 1)` into `bool_var`. Just they were after the
> > > pattern that did `(cmp (cond @0 @1 @2) @3)` simplification but
> > > that pattern is now after the ones.
> > > Also these patterns will cause in some cases a new statement to
> > > be created for the comparison. In the case of floating point comparison
> > > with non-call exceptions (and trapping math), this can cause a new statement
> > > every time fold_stmt is called.
> >
> > Hmm, but do we still fold
> >
> >   _1 = _2 < 1;
> >   if (_1 != 0)
> >
> > to
> >
> >   if (_2 < 1)
> >
> > or does that now again rely on forwprops explicit forwarding into
> > gcond?  I wanted
> > to get rid of the latter eventually.
>
> Oh. Yes this does rely on forwprop explicitly now.

On the subject of removing that part of forwprop: right now, if you
remove it, you will end up with an infinite loop in forwprop.
This is because the return value of
forward_propagate_into_gimple_cond currently overwrites the changed variable,
and if you remove the forward_propagate_into_gimple_cond call,
fold_stmt will return true every time for a comparison that can throw
under non-call exceptions.
This patch (and the few other gimple-fold.cc patches, which have all
been approved and pushed) moves towards fixing that
infinite loop too.
The one piece of forward_propagate_into_gimple_cond that might be easy
to move to fold_stmt is the `Canonicalize _Bool == 0 and _Bool != 1 to
_Bool != 0` step.
Then, after my new patch for the non-call exception issue (which
does not remove the above pattern), I am going to see if I can remove
forward_propagate_into_gimple_cond too.

Thanks,
Andrew Pinski


>
> >
> > I agree that the trapping math thing is bad - I wonder if we can catch that
> > more intelligently (not sure how without following SSA use-def of gconds on
> > bools and see whether they can trap and then not simplifying)
>
> I think I know the way to fix the trapping issue without fully
> removing this. I am going to give it a go later today.
> Since trapping only depends on the code and the type it should be easy
> to add an extra condition here and the latter patterns catch the
> trapping case of removing `bool!=0` already.
>
> Thanks,
> Andrew Pinski
>
> >
> > > gcc.dg/tree-ssa/vrp24.c needed to be adjusted to before r13-322-g7f04b0d786e13f.
> > > gcc.dg/analyzer/null-deref-pr102671-2.c needs an increased analyzer-max-svalue-depth
> > > not to get an extra warning.
> > >
> > > gcc/ChangeLog:
> > >
> > > * match.pd (`(ne (cmp) 0)`, `(eq (cmp) 1)`): Remove.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.dg/tree-ssa/vrp24.c: Adjust.
> > > * gcc.dg/analyzer/null-deref-pr102671-2.c: Increase analyzer-max-svalue-depth.
> > >
> > > Signed-off-by: Andrew Pinski 
> > > ---
> > >  gcc/match.pd  | 8 
> > >  gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c | 2 +-
> > >  gcc/testsuite/gcc.dg/tree-ssa/vrp24.c | 2 +-
> > >  3 files changed, 2 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index ab496d923cc..418efc4230a 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -6898,14 +6898,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >  (if (ic == ncmp)
> > >   (ncmp @0 @1)
> > >   /* The following bits are handled by fold_binary_op_with_conditional_arg.  */
> > > - (simplify
> > > -  (ne (cmp@2 @0 @1) integer_zerop)
> > > -  (if (types_match (type, TREE_TYPE (@2)))
> > > -   (cmp @0 @1)))
> > > - (simplify
> > > -  (eq (cmp@2 @0 @1) integer_truep)
> > > -  (if (types_match (type, TREE_TYPE (@2)))
> > > -   (cmp @0 @1)))
> > >   (simplify
> > >(ne (cmp@2 @0 @1) integer_truep)
> > >(if (types_match (type, TREE_TYPE (@2)))
> > > diff --git a/gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c b/gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c
> > > index 298e4839b98..bc141d5c028 100644
> > > --- a/gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c
> > > +++ b/gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c
> > > @@ -1,5 +1,5 @@
> > >  /* { dg-require-effective-target ptr_eq_long } */
> > > -/* { dg-additional-options "-O2 -Wno-shift-count-overflow" } */
> > > +/* { dg-additional-options "-O2 -Wno-shift-count-overflow --param=analyzer-max-svalue-depth=19" } */
> > >
> > >  struct lisp;
> > >  union vectorlike_header { long size; };
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp24.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp24.c
> > > index c28ca473fc6..f237b7741ec 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/vrp24.c
> > > +++ b/gcc/testsuite/

RE: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors

2025-05-09 Thread Tamar Christina
> -----Original Message-----
> From: Richard Biener 
> Sent: Friday, May 9, 2025 2:44 PM
> To: Tamar Christina 
> Cc: Richard Sandiford ; Pengfei Li
> ; gcc-patches@gcc.gnu.org; ktkac...@nvidia.com
> Subject: RE: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors
> 
> On Fri, 9 May 2025, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener 
> > > Sent: Friday, May 9, 2025 11:08 AM
> > > To: Richard Sandiford 
> > > Cc: Pengfei Li ; gcc-patches@gcc.gnu.org;
> > > ktkac...@nvidia.com
> > > Subject: Re: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors
> > >
> > > On Fri, 9 May 2025, Richard Sandiford wrote:
> > >
> > > > Richard Biener  writes:
> > > > > On Thu, 8 May 2025, Pengfei Li wrote:
> > > > >
> > > > >> This patch improves the auto-vectorization for loops with known small
> > > > >> trip counts by enabling the use of subvectors - bit fields of original
> > > > >> wider vectors. A subvector must have the same vector element type as the
> > > > >> original vector and enough bits for all vector elements to be processed
> > > > >> in the loop. Using subvectors is beneficial because machine instructions
> > > > >> operating on narrower vectors usually show better performance.
> > > > >>
> > > > >> To enable this optimization, this patch introduces a new target hook.
> > > > >> This hook allows the vectorizer to query the backend for a suitable
> > > > >> subvector type given the original vector type and the number of elements
> > > > >> to be processed in the small-trip-count loop. The target hook also has a
> > > > >> could_trap parameter to say if the subvector is allowed to have more
> > > > >> bits than needed.
> > > > >>
> > > > >> This optimization is currently enabled for AArch64 only. Below example
> > > > >> shows how it uses AdvSIMD vectors as subvectors of SVE vectors for
> > > > >> higher instruction throughput.
> > > > >>
> > > > >> Consider this loop operating on an array of 16-bit integers:
> > > > >>
> > > > >>  for (int i = 0; i < 5; i++) {
> > > > >>a[i] = a[i] < 0 ? -a[i] : a[i];
> > > > >>  }
> > > > >>
> > > > >> Before this patch, the generated AArch64 code would be:
> > > > >>
> > > > >>  ptrue   p7.h, vl5
> > > > >>  ptrue   p6.b, all
> > > > >>  ld1h    z31.h, p7/z, [x0]
> > > > >>  abs     z31.h, p6/m, z31.h
> > > > >>  st1h    z31.h, p7, [x0]
> > > > >
> > > > > p6.b has all lanes active - why is the abs then not
> > > > > simply unmasked?
> > > >
> > > > There is no unpredicated abs for SVE.  The predicate has to be there,
> > > > and so expand introduces one even when the gimple stmt is unconditional.
> > > >
> > > > >> After this patch, it is optimized to:
> > > > >>
> > > > >>  ptrue   p7.h, vl5
> > > > >>  ld1h    z31.h, p7/z, [x0]
> > > > >>  abs     v31.8h, v31.8h
> > > > >>  st1h    z31.h, p7, [x0]
> > > > >
> > > > > Help me decipher this - I suppose z31 and v31 "overlap" in the
> > > > > register file?  And z31 is a variable-length vector but
> > > > > z31.8h is a 8 element fixed length vector?  How can we
> > > >
> > > > v31.8h, but otherwise yes.
> > > >
> > > > > end up with just 8 elements here?  From the upper iteration
> > > > > bound?
> > > >
> > > > Yeah.
> > > >
> > > > > I'm not sure why you need any target hook here.  It seems you
> > > > > do already have suitable vector modes so why not just ask
> > > > > for a suitable vector?  Is it because you need to have
> > > > > that register overlap guarantee (otherwise you'd get
> > > > > a move)?
> > > >
> > > > Yeah, the optimisation only makes sense for overlaid vector registers.
> > > >
> > > > > Why do we not simply use fixed-length SVE here in the first place?
> > > >
> > > > Fixed-length SVE is restricted to cases where the exact runtime length
> > > > is known: the compile-time length is both a minimum and a maximum.
> > > > In contrast, the code above would work even for 256-bit SVE.
> > > >
> > > > > To me doing this in this way in the vectorizer looks
> > > > > somewhat out-of-place.
> > > > >
> > > > > That said, we already have unmasked ABS in the IL:
> > > > >
> > > > >   vect__1.6_15 = .MASK_LOAD (&a, 16B, { -1, -1, -1, -1, -1, 0, 0, 0, 0, 0,
> > > > > 0, 0, 0, 0, 0, 0, ... }, { 0, ... });
> > > > >   vect__2.7_16 = ABSU_EXPR ;
> > > > >   vect__3.8_17 = VIEW_CONVERT_EXPR > > int>(vect__2.7_16);
> > > > >   .MASK_STORE (&a, 16B, { -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > > > > 0, 0, ... }, vect__3.8_17); [tail call]
> > > > >
> > > > > so what's missing here?  I suppose having a constant masked ABSU here
> > > > > would allow RTL expansion to select a fixed-size mode?
> > > > >
> > > > > And the vectorizer could simply use the existing
> > > > > related_vector_mode hook instead?
> > > >
> > > > I agree it's a bit awkward. 

Re: [PATCH] match: Remove (ne (cmp) 0) and (eq (cmp) 1) patterns

2025-05-09 Thread Andrew Pinski
On Fri, May 9, 2025, 6:27 AM Richard Biener 
wrote:

> On Fri, May 9, 2025 at 1:34 PM Andrew Pinski  wrote:
> >
> > On Fri, May 9, 2025 at 1:21 AM Richard Biener
> >  wrote:
> > >
> > > On Fri, May 9, 2025 at 4:51 AM Andrew Pinski wrote:
> > > >
> > > > These patterns are not needed any more. There were already
> > > > 2 patterns which did `(ne bool_var 0)` into `bool_var` and
> > > > `(eq bool_var 1)` into `bool_var`. Just they were after the
> > > > pattern that did `(cmp (cond @0 @1 @2) @3)` simplification but
> > > > that pattern is now after the ones.
> > > > Also these patterns will cause in some cases a new statement to
> > > > be created for the comparison. In the case of floating point comparison
> > > > with non-call exceptions (and trapping math), this can cause a new statement
> > > > every time fold_stmt is called.
> > >
> > > Hmm, but do we still fold
> > >
> > >   _1 = _2 < 1;
> > >   if (_1 != 0)
> > >
> > > to
> > >
> > >   if (_2 < 1)
> > >
> > > or does that now again rely on forwprops explicit forwarding into
> > > gcond?  I wanted
> > > to get rid of the latter eventually.
> >
> > Oh. Yes this does rely on forwprop explicitly now.
> >
> > >
> > > I agree that the trapping math thing is bad - I wonder if we can catch that more
> > > intelligently (not sure how without following SSA use-def of gconds on bools
> > > and see whether they can trap and then not simplifying)
> >
> > I think I know the way to fix the trapping issue without fully
> > removing this. I am going to give it a go later today.
> > Since trapping only depends on the code and the type it should be easy
> > to add an extra condition here and the latter patterns catch the
> > trapping case of removing `bool!=0` already.
>
> Note it's really depending on context.
>
> _1 = _2 < 1.;
> _3 = _1 != 0;
>
> would be OK to fold to
>
> _3 = _2 < 1;
>


Right, but there is already a pattern below that turns `bool_name != 0` into
`bool_name`, which would catch that case. And with the recent patch to
gimple-fold.cc, not folding `(a cmp b) != 0` into `a cmp b` for the trapping case
means fold_stmt will not keep on changing the gcond. The gassign case
will just work: the _3 assignment above will become just `_3 = _1;` (yes,
it will then depend on copy prop, but I think that is ok).

Thanks,
Andrew



> but not with the _1 != 0 in the gcond.  That's because gconds can't
> throw (and I think rightfully so).  In principle we should go full steam
> ahead to have single-operand gconds, just the boolean value.  Like we now
> do for COND_EXPRs.  But this unfortunately has very large fallout :/
>
> Thus the "workaround" for non-call-EH.  I believe any mitigation should be
> in the match-and-simplify plumbing that handles the gcond - which we already
> do, but the side-effect is the ping-pong you are observing.  Maybe we can do
> better in replace_stmt_with_simplification where we should hit(?)
>
>   else if (!inplace)
>     {
>       tree res = maybe_push_res_to_seq (res_op, seq);
>       if (!res)
>         return false;
>       gimple_cond_set_condition (cond_stmt, NE_EXPR, res,
>                                  build_zero_cst (TREE_TYPE (res)));
>
> and detect when the cond_stmt is SSA != 0 (or the reverse canonical form)
> and refuse to simplify if the simplification in 'res_op' is the same as the
> current definition of SSA?
>
> >
> > Thanks,
> > Andrew Pinski
> >
> > >
> > > > gcc.dg/tree-ssa/vrp24.c needed to be adjusted to before r13-322-g7f04b0d786e13f.
> > > > gcc.dg/analyzer/null-deref-pr102671-2.c needs an increased analyzer-max-svalue-depth
> > > > not to get an extra warning.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * match.pd (`(ne (cmp) 0)`, `(eq (cmp) 1)`): Remove.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.dg/tree-ssa/vrp24.c: Adjust.
> > > > * gcc.dg/analyzer/null-deref-pr102671-2.c: Increase analyzer-max-svalue-depth.
> > > >
> > > > Signed-off-by: Andrew Pinski 
> > > > ---
> > > >  gcc/match.pd  | 8 
> > > >  gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c | 2 +-
> > > >  gcc/testsuite/gcc.dg/tree-ssa/vrp24.c | 2 +-
> > > >  3 files changed, 2 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index ab496d923cc..418efc4230a 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -6898,14 +6898,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > >  (if (ic == ncmp)
> > > >   (ncmp @0 @1)
> > > >   /* The following bits are handled by fold_binary_op_with_conditional_arg.  */
> > > > - (simplify
> > > > -  (ne (cmp@2 @0 @1) integer_zerop)
> > > > -  (if (types_match (type, TREE_TYPE (@2)))
> > > > -   (cmp @0 @1)))
> > > > - (simplify
> > > > -  (eq (cmp@2 @0 @1) integer_truep)
> > > > -  (if (types_match (type, TREE_TYPE (@2)))
> > > > -   (cmp @0 @1)))
> > > >   (simplify
> > > >(ne (cmp@2 

[PATCH v2] RISC-V: Use vclmul for CRC expansion if available

2025-05-09 Thread Anton Blanchard
If the vector version of clmul (vclmul) is available and the scalar
one is not, use it for CRC expansion.
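For readers new to these instructions: clmul (Zbc/Zbkc) and vclmul (Zvbc) perform carry-less multiplication, i.e. polynomial multiplication over GF(2), where partial products are combined with XOR instead of addition. CRC computation is polynomial arithmetic over GF(2), which is why either form can drive the CRC expansion. Below is a minimal portable C reference for 64-bit operands. `clmul64_lo` and `clmul64_hi` are illustrative names, not GCC internals or intrinsics. The low half corresponds to what clmul/vclmul return and the high half to clmulh/vclmulh, assuming 64-bit XLEN or 64-bit vector elements (as with the RVVM1DImode operations in the patch).

```c
#include <stdint.h>

/* Low 64 bits of the 128-bit carry-less product of A and B.  */
static uint64_t
clmul64_lo (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  for (int i = 0; i < 64; i++)
    if ((b >> i) & 1)
      r ^= a << i;          /* XOR instead of ADD: no carries propagate.  */
  return r;
}

/* High 64 bits of the same product: the bits of A << I that overflow
   past bit 63, i.e. A >> (64 - I), for each set bit I >= 1 of B.  */
static uint64_t
clmul64_hi (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  for (int i = 1; i < 64; i++)
    if ((b >> i) & 1)
      r ^= a >> (64 - i);
  return r;
}
```

For example, `clmul64_lo (3, 3)` is 5 (0b11 XOR 0b110), whereas ordinary multiplication gives 9.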

gcc/ChangeLog:

* config/riscv/bitmanip.md (crc_rev4): Check
TARGET_ZVBC.
(crc4): Likewise.
* config/riscv/riscv.cc (expand_crc_using_clmul): Emit code using
vclmul if TARGET_ZVBC.
(expand_reversed_crc_using_clmul): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/crc-builtin-zvbc.c: New test.

Signed-off-by: Anton Blanchard 
---
 gcc/config/riscv/bitmanip.md  |   5 +-
 gcc/config/riscv/riscv.cc | 110 +++---
 .../riscv/rvv/base/crc-builtin-zvbc.c |  66 +++
 3 files changed, 160 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/crc-builtin-zvbc.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index d0919ece31f..86a8c5d5ed9 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -1221,7 +1221,7 @@
  we can't keep it in 64 bit variable.)
  then use clmul instruction to implement the CRC,
  otherwise (TARGET_ZBKB) generate table based using brev.  */
-  if ((TARGET_ZBKC || TARGET_ZBC) && mode < word_mode)
+  if ((TARGET_ZBKC || TARGET_ZBC || TARGET_ZVBC) && mode < word_mode)
 expand_reversed_crc_using_clmul (mode, mode,
 operands);
   else if (TARGET_ZBKB)
@@ -1253,7 +1253,8 @@
  (match_operand:SUBX 3)]
  UNSPEC_CRC))]
   /* We don't support the case when data's size is bigger than CRC's size.  */
-  "(TARGET_ZBKC || TARGET_ZBC) && mode >= mode"
+  "(TARGET_ZBKC || TARGET_ZBC || TARGET_ZVBC)
+   && mode >= mode"
 {
   /* If we have the ZBC or ZBKC extension (ie, clmul) and
  it is possible to store the quotient within a single variable
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a0657323f65..13d6e157448 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -13987,17 +13987,53 @@ expand_crc_using_clmul (scalar_mode crc_mode, scalar_mode data_mode,
   rtx data = gen_rtx_ZERO_EXTEND (word_mode, operands[2]);
   riscv_expand_op (XOR, word_mode, a0, crc, data);
 
-  if (TARGET_64BIT)
-    emit_insn (gen_riscv_clmul_di (a0, a0, t0));
-  else
-    emit_insn (gen_riscv_clmul_si (a0, a0, t0));
+  if (TARGET_ZBKC || TARGET_ZBC)
+    {
+      if (TARGET_64BIT)
+        emit_insn (gen_riscv_clmul_di (a0, a0, t0));
+      else
+        emit_insn (gen_riscv_clmul_si (a0, a0, t0));
 
-  riscv_expand_op (LSHIFTRT, word_mode, a0, a0,
-                   gen_int_mode (crc_size, word_mode));
-  if (TARGET_64BIT)
-    emit_insn (gen_riscv_clmul_di (a0, a0, t1));
+      riscv_expand_op (LSHIFTRT, word_mode, a0, a0,
+                       gen_int_mode (crc_size, word_mode));
+      if (TARGET_64BIT)
+        emit_insn (gen_riscv_clmul_di (a0, a0, t1));
+      else
+        emit_insn (gen_riscv_clmul_si (a0, a0, t1));
+    }
   else
-    emit_insn (gen_riscv_clmul_si (a0, a0, t1));
+    {
+      machine_mode vmode;
+      if (!riscv_vector::get_vector_mode (DImode, 1).exists (&vmode))
+        gcc_unreachable ();
+
+      rtx vec = gen_reg_rtx (vmode);
+
+      insn_code icode1 = code_for_pred_broadcast (vmode);
+      rtx ops1[] = {vec, a0};
+      emit_nonvlmax_insn (icode1, UNARY_OP, ops1, CONST1_RTX (Pmode));
+
+      rtx rvv1di_reg = gen_rtx_SUBREG (RVVM1DImode, vec, 0);
+      insn_code icode2 = code_for_pred_vclmul_scalar (UNSPEC_VCLMUL,
+                                                      E_RVVM1DImode);
+      rtx ops2[] = {rvv1di_reg, rvv1di_reg, t0};
+      emit_nonvlmax_insn (icode2, riscv_vector::BINARY_OP, ops2, CONST1_RTX
+                          (Pmode));
+
+      rtx shift_amount = gen_int_mode (data_size, Pmode);
+      insn_code icode3 = code_for_pred_scalar (LSHIFTRT, vmode);
+      rtx ops3[] = {vec, vec, shift_amount};
+      emit_nonvlmax_insn (icode3, BINARY_OP, ops3, CONST1_RTX (Pmode));
+
+      insn_code icode4 = code_for_pred_vclmul_scalar (UNSPEC_VCLMULH,
+                                                      E_RVVM1DImode);
+      rtx ops4[] = {rvv1di_reg, rvv1di_reg, t1};
+      emit_nonvlmax_insn (icode4, riscv_vector::BINARY_OP, ops4, CONST1_RTX
+                          (Pmode));
+
+      rtx vec_low_lane = gen_lowpart (DImode, vec);
+      riscv_emit_move (a0, vec_low_lane);
+    }
 
   if (crc_size > data_size)
 {
@@ -14046,19 +14082,55 @@ expand_reversed_crc_using_clmul (scalar_mode crc_mode, scalar_mode data_mode,
   rtx a0 = gen_reg_rtx (word_mode);
   riscv_expand_op (XOR, word_mode, a0, crc, data);
 
-  if (TARGET_64BIT)
-    emit_insn (gen_riscv_clmul_di (a0, a0, t0));
-  else
-    emit_insn (gen_riscv_clmul_si (a0, a0, t0));
+  if (TARGET_ZBKC || TARGET_ZBC)
+    {
+      if (TARGET_64BIT)
+        emit_insn (gen_riscv_clmul_di (a0, a0, t0));
+      else
+        emit_insn (gen_riscv_clmul_si (a0, a0, t0));
 
-  rtx num

Re: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors

2025-05-09 Thread Pengfei Li
Hi Richard Biener,

As Richard Sandiford has already addressed your questions in another email, I
just want to add a few points below.

> That said, we already have unmasked ABS in the IL:
> 
>   vect__1.6_15 = .MASK_LOAD (&a, 16B, { -1, -1, -1, -1, -1, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, ... }, { 0, ... });
>   vect__2.7_16 = ABSU_EXPR ;
>   vect__3.8_17 = VIEW_CONVERT_EXPR(vect__2.7_16);
>   .MASK_STORE (&a, 16B, { -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, ... }, vect__3.8_17); [tail call]
> 
> so what's missing here?  I suppose having a constant masked ABSU here
> would allow RTL expansion to select a fixed-size mode?

Before implementing this patch, I had tried the approach you suggested, but I
eventually decided not to pursue it, for two reasons:

1) Constant masked operations do indicate the inactive lanes, but they don't
model whether we need to care about those inactive lanes. For some operations
(mainly floating-point) that may trap, we can't simply use the upper iteration
bound for the fixed-size mode. This is why I added a `could_trap` parameter to
the target hook I implemented. The `could_trap` information is available in
GIMPLE, but so far I haven't figured out how, or whether, we can get it from RTL.

2) Transforming unmasked operations into masked operations for this seems to add
unnecessary complexity to GIMPLE. I'm not sure whether it has side effects or
could lead to unexpected performance regressions in some cases.

> And the vectorizer could simply use the existing
> related_vector_mode hook instead?

Thanks for pointing it out. I'm not familiar with that hook but I'll take a
look to see if there's anything I can reuse or build upon.

Thanks,
Pengfei

[PATCH v19 0/3] c: Add _Countof and

2025-05-09 Thread Alejandro Colomar
Hi!

Here's a revision of this patch, rebased onto all the changes in master
over these 6 months.  This time, the name is _Countof, as the C Committee
has finally settled on that name.  It also includes the lowercase macro
and header in a separate patch, as specified by ISO C.

Here's the change list compared to v18:

-  Rename __countof__ => _Countof
-  Make the tests more robust.
-  Add countof and  (patch 3/3)
-  Rebase after 6 months of changes in master.
-  Add links in commit message to the new changes to this operator in
   ISO C.
-  gcc/c-family/c-common.cc: Use D_CONLY.

The range-diff is at the bottom.  The hashes don't match the range-diff
I presented in v18 because I lost the old commits, so I had to reapply
them from the v18 emails.

I have done a `make bootstrap` and manually run the tests, but I
haven't run `make check`.  I'll try to do that in the following days,
since I'll be traveling.

I haven't yet implemented the pedantic diagnostic for old C versions.
This is in my TODO list.  I think that's the only thing I'm missing.


Have a lovely night!
Alex

Alejandro Colomar (3):
  contrib/: Add support for Cc: and Link: tags
  c: Add _Countof operator
  c: Add 

 contrib/gcc-changelog/git_commit.py|   5 +-
 gcc/Makefile.in|   1 +
 gcc/c-family/c-common.cc   |  26 +
 gcc/c-family/c-common.def  |   3 +
 gcc/c-family/c-common.h|   2 +
 gcc/c/c-decl.cc|  22 +++-
 gcc/c/c-parser.cc  |  59 +++---
 gcc/c/c-tree.h |   4 +
 gcc/c/c-typeck.cc  | 115 +-
 gcc/doc/extend.texi|  30 +
 gcc/ginclude/stdcountof.h  |  31 +
 gcc/testsuite/gcc.dg/countof-compile.c | 130 +
 gcc/testsuite/gcc.dg/countof-vla.c |  51 
 gcc/testsuite/gcc.dg/countof.c | 154 +
 14 files changed, 608 insertions(+), 25 deletions(-)
 create mode 100644 gcc/ginclude/stdcountof.h
 create mode 100644 gcc/testsuite/gcc.dg/countof-compile.c
 create mode 100644 gcc/testsuite/gcc.dg/countof-vla.c
 create mode 100644 gcc/testsuite/gcc.dg/countof.c

Range-diff against v18:
1:  f7787bae38cb = 1:  796c82b0cba1 contrib/: Add support for Cc: and Link: tags
2:  f1a3df94b52c ! 2:  ae4691c8b451 c: Add __countof__ operator
@@ Metadata
 Author: Alejandro Colomar 
 
  ## Commit message ##
-c: Add __countof__ operator
+c: Add _Countof operator
 
 This operator is similar to sizeof but can only be applied to an array,
 and returns its number of elements.
@@ Commit message
 
 gcc/ChangeLog:
 
-* doc/extend.texi: Document __countof__ operator.
+* doc/extend.texi: Document _Countof operator.
 
 gcc/c-family/ChangeLog:
 
 * c-common.h
 * c-common.def
-* c-common.cc (c_countof_type): Add __countof__ operator.
+* c-common.cc (c_countof_type): Add _Countof operator.
 
 gcc/c/ChangeLog:
 
@@ Commit message
 (pop_maybe_used)
 (is_top_array_vla)
 (c_expr_countof_expr, c_expr_countof_type):
-Add __countof__ operator.
+Add _Countof operator.
 
 gcc/testsuite/ChangeLog:
 
 * gcc.dg/countof-compile.c
 * gcc.dg/countof-vla.c
-* gcc.dg/countof.c: Add tests for __countof__ operator.
+* gcc.dg/countof.c: Add tests for _Countof operator.
 
+Link: 
 Link: 
 Link: 
 Link: 

 Link: 
 Link: 
 Link: 
+Link: 
 Link: 
+Link: 
+Link: 
 Link: 
 Suggested-by: Xavier Del Campo Romero 
 Co-authored-by: Martin Uecker 
@@ Commit message
 
  ## gcc/c-family/c-common.cc ##
 @@ gcc/c-family/c-common.cc: const struct c_common_resword c_common_reswords[] =
-   { "__inline",   RID_INLINE, 0 },
-   { "__inline__", RID_INLINE, 0 },
-   { "__label__",  RID_LABEL,  0 },
-+  { "__countof__",   

[PATCH v19 2/3] c: Add _Countof operator

2025-05-09 Thread Alejandro Colomar
This operator is similar to sizeof but can only be applied to an array,
and returns its number of elements.
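Until a compiler with this operator is at hand, the number it is meant to produce for a complete array can be illustrated with the traditional `sizeof` idiom. The sketch below shows that idiom, not the patch's implementation, and `COUNTOF_FALLBACK` is a hypothetical name. As with `_Countof` (the patch's `c_countof_type` returns `array_type_nelts_top (type)`), only the outermost dimension is counted. Unlike `_Countof`, the idiom does not reject a pointer operand: it quietly computes `sizeof (pointer) / sizeof (element)`, which is the mistake the `ARRAY_TYPE` check in `c_countof_type` turns into an error.

```c
#include <stddef.h>

/* Hypothetical fallback macro: element count of a complete array object.
   Only meaningful when A is an array; with a pointer it still compiles
   but yields sizeof (pointer) / sizeof (element).  */
#define COUNTOF_FALLBACK(a) (sizeof (a) / sizeof ((a)[0]))

static int grid[3][7];

/* Only the outermost dimension is counted.  */
static const size_t grid_rows = COUNTOF_FALLBACK (grid);     /* 3 */
static const size_t grid_cols = COUNTOF_FALLBACK (grid[0]);  /* 7 */
```

With the operator from this patch, the first initializer would read `_Countof (grid)`.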

FUTURE DIRECTIONS:

-  We should make it work with array parameters to functions,
   and somehow magically return the number of elements of the array,
   regardless of it being really a pointer.

gcc/ChangeLog:

* doc/extend.texi: Document _Countof operator.

gcc/c-family/ChangeLog:

* c-common.h
* c-common.def
* c-common.cc (c_countof_type): Add _Countof operator.

gcc/c/ChangeLog:

* c-tree.h
(c_expr_countof_expr, c_expr_countof_type)
* c-decl.cc
(start_struct, finish_struct)
(start_enum, finish_enum)
* c-parser.cc
(c_parser_sizeof_expression)
(c_parser_countof_expression)
(c_parser_sizeof_or_countof_expression)
(c_parser_unary_expression)
* c-typeck.cc
(build_external_ref)
(record_maybe_used_decl)
(pop_maybe_used)
(is_top_array_vla)
(c_expr_countof_expr, c_expr_countof_type):
Add _Countof operator.

gcc/testsuite/ChangeLog:

* gcc.dg/countof-compile.c
* gcc.dg/countof-vla.c
* gcc.dg/countof.c: Add tests for _Countof operator.

Link: 
Link: 
Link: 
Link: 

Link: 
Link: 
Link: 
Link: 
Link: 
Link: 
Link: 
Link: 
Suggested-by: Xavier Del Campo Romero 
Co-authored-by: Martin Uecker 
Acked-by: "James K. Lowden" 
Cc: Joseph Myers 
Cc: Gabriel Ravier 
Cc: Jakub Jelinek 
Cc: Kees Cook 
Cc: Qing Zhao 
Cc: Jens Gustedt 
Cc: David Brown 
Cc: Florian Weimer 
Cc: Andreas Schwab 
Cc: Timm Baeder 
Cc: Daniel Plakosh 
Cc: "A. Jiang" 
Cc: Eugene Zelenko 
Cc: Aaron Ballman 
Cc: Paul Koning 
Cc: Daniel Lundin 
Cc: Nikolaos Strimpas 
Cc: JeanHeyd Meneide 
Cc: Fernando Borretti 
Cc: Jonathan Protzenko 
Cc: Chris Bazley 
Cc: Ville Voutilainen 
Cc: Alex Celeste 
Cc: Jakub Łukasiewicz 
Cc: Douglas McIlroy 
Cc: Jason Merrill 
Cc: "Gustavo A. R. Silva" 
Cc: Patrizia Kaye 
Cc: Ori Bernstein 
Cc: Robert Seacord 
Cc: Marek Polacek 
Cc: Sam James 
Cc: Richard Biener 
Signed-off-by: Alejandro Colomar 
---
 gcc/c-family/c-common.cc   |  26 +
 gcc/c-family/c-common.def  |   3 +
 gcc/c-family/c-common.h|   2 +
 gcc/c/c-decl.cc|  22 +++-
 gcc/c/c-parser.cc  |  59 +++---
 gcc/c/c-tree.h |   4 +
 gcc/c/c-typeck.cc  | 115 +-
 gcc/doc/extend.texi|  30 +
 gcc/testsuite/gcc.dg/countof-compile.c | 130 +
 gcc/testsuite/gcc.dg/countof-vla.c |  51 
 gcc/testsuite/gcc.dg/countof.c | 154 +
 11 files changed, 572 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/countof-compile.c
 create mode 100644 gcc/testsuite/gcc.dg/countof-vla.c
 create mode 100644 gcc/testsuite/gcc.dg/countof.c

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 587d76461e9e..f71cb2652d5a 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -394,6 +394,7 @@ const struct c_common_resword c_common_reswords[] =
 {
   { "_Alignas",RID_ALIGNAS,   D_CONLY },
   { "_Alignof",RID_ALIGNOF,   D_CONLY },
+  { "_Countof",RID_COUNTOF,   D_CONLY },
   { "_Atomic", RID_ATOMIC,D_CONLY },
   { "_BitInt", RID_BITINT,D_CONLY },
   { "_Bool",   RID_BOOL,  D_CONLY },
@@ -4080,6 +4081,31 @@ c_alignof_expr (location_t loc, tree expr)
 
   return fold_convert_loc (loc, size_type_node, t);
 }
+
+/* Implement the _Countof keyword:
+   Return the number of elements of an array.  */
+
+tree
+c_countof_type (location_t loc, tree type)
+{
+  enum tree_code type_code;
+
+  type_code = TREE_CODE (type);
+  if (type_code != ARRAY_TYPE)
+    {
+      error_at (loc, "invalid application of %<_Countof%> to type %qT", type);
+      return error_mark_node;
+    }
+  if (!COMPLETE_TYPE_P (type))
+    {
+      error_at (loc,
+                "invalid application of %<_Countof%> to incomplete type %qT",
+                type);
+      return error_mark_node;
+    }
+
+  return array_type_nelts_top (type);
+}
 
 /* Handle C and C++ default att

[PATCH v19 1/3] contrib/: Add support for Cc: and Link: tags

2025-05-09 Thread Alejandro Colomar
contrib/ChangeLog:

* gcc-changelog/git_commit.py (GitCommit):
Add support for 'Cc: ' and 'Link: ' tags.

Cc: Jason Merrill 
Signed-off-by: Alejandro Colomar 
---
 contrib/gcc-changelog/git_commit.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
index 5645f80ebb9b..068ec5cbcf28 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -187,7 +187,8 @@ CO_AUTHORED_BY_PREFIX = 'co-authored-by: '
 
 REVIEW_PREFIXES = ('reviewed-by: ', 'reviewed-on: ', 'signed-off-by: ',
'acked-by: ', 'tested-by: ', 'reported-by: ',
-   'suggested-by: ')
+   'suggested-by: ', 'cc: ')
+LINK_PREFIXES = ('link: ')
 DATE_FORMAT = '%Y-%m-%d'
 
 
@@ -529,6 +530,8 @@ class GitCommit:
 continue
 elif lowered_line.startswith(REVIEW_PREFIXES):
 continue
+elif lowered_line.startswith(LINK_PREFIXES):
+continue
 else:
 m = cherry_pick_regex.search(line)
 if m:
-- 
2.49.0



[PATCH v19 3/3] c: Add

2025-05-09 Thread Alejandro Colomar
gcc/ChangeLog:

* Makefile.in
* ginclude/stdcountof.h: Add countof macro.

Signed-off-by: Alejandro Colomar 
---
 gcc/Makefile.in   |  1 +
 gcc/ginclude/stdcountof.h | 31 +++
 2 files changed, 32 insertions(+)
 create mode 100644 gcc/ginclude/stdcountof.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index e3af923e0e04..8d5d357632e1 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -481,6 +481,7 @@ USER_H = $(srcdir)/ginclude/float.h \
 $(srcdir)/ginclude/stdalign.h \
 $(srcdir)/ginclude/stdatomic.h \
 $(srcdir)/ginclude/stdckdint.h \
+$(srcdir)/ginclude/stdcountof.h \
 $(EXTRA_HEADERS)
 
 USER_H_INC_NEXT_PRE = @user_headers_inc_next_pre@
diff --git a/gcc/ginclude/stdcountof.h b/gcc/ginclude/stdcountof.h
new file mode 100644
index ..b56097376275
--- /dev/null
+++ b/gcc/ginclude/stdcountof.h
@@ -0,0 +1,31 @@
+/* Copyright (C) 2025 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+/* ISO C2Y: 7.21 Array count .  */
+
+#ifndef _STDCOUNTOF_H
+#define _STDCOUNTOF_H
+
+#define countof _Countof
+
+#endif /* stdcountof.h */
-- 
2.49.0
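A minimal usage sketch, assuming a C compiler that ships this header. The `__has_include` guard and the sizeof-based fallback macro are illustrative additions, not part of the patch. Note that `_Countof` is specified to accept only array operands, while the classic sizeof idiom does not reject a pointer:

```c
#include <assert.h>
#include <stddef.h>

/* Use the C2Y <stdcountof.h> header when the compiler provides it.  */
#if defined __has_include
#  if __has_include (<stdcountof.h>)
#    include <stdcountof.h>
#  endif
#endif

/* Illustrative fallback for older compilers (not from the patch): the
   classic sizeof idiom, which does not reject pointer operands.  */
#ifndef countof
#  define countof(a) (sizeof (a) / sizeof ((a)[0]))
#endif

/* The array length bounds the loop without repeating the constant 5.  */
static int
sum_ints (void)
{
  static const int primes[] = { 2, 3, 5, 7, 11 };
  int sum = 0;
  for (size_t i = 0; i < countof (primes); i++)
    sum += primes[i];
  return sum;
}
```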



[WWWDOCS] readings: add links to CTF and BTF format specifications

2025-05-09 Thread Indu Bhagat
Fix PR web/114649
---
 htdocs/readings.html | 4 
 1 file changed, 4 insertions(+)

diff --git a/htdocs/readings.html b/htdocs/readings.html
index 56398317..3b0556e6 100644
--- a/htdocs/readings.html
+++ b/htdocs/readings.html
@@ -598,6 +598,10 @@ names.
 
   https://dwarfstd.org";>DWARF Workgroup
 
+  https://sourceware.org/binutils/docs/ctf-spec.html";>Compact C Type Format (CTF)
+
+  https://www.kernel.org/doc/Documentation/bpf/btf.rst";>BPF Type Format (BTF)
+
 
 
   http://compilerconnection.com";>Links related to many
-- 
2.43.0



[PATCH 2/6] RISC-V: frm/mode-switch: remove TARGET_MODE_CONFLUENCE

2025-05-09 Thread Vineet Gupta
This is effectively reverting e5d1f538bb7d
"(RISC-V: Allow different dynamic floating point mode to be merged)"
while retaining the testcase.

The change itself is valid; however, it masks deficiencies in the
current FRM mode-switching code.

Also, for a SPEC2017 -Ofast -march=rv64gcv build, it ends up generating
a net increase in FRM restores (writes) compared with the rest of this
changeset.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_dynamic_frm_mode_p): Remove.
(riscv_mode_confluence): Ditto.
(TARGET_MODE_CONFLUENCE): Ditto.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.cc | 37 -
 1 file changed, 37 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3ee88db24fa5..62ec95d3b885 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12331,41 +12331,6 @@ riscv_mode_needed (int entity, rtx_insn *insn, HARD_REG_SET)
 }
 }
 
-/* Return TRUE if the rouding mode is dynamic.  */
-
-static bool
-riscv_dynamic_frm_mode_p (int mode)
-{
-  return mode == riscv_vector::FRM_DYN
-|| mode == riscv_vector::FRM_DYN_CALL
-|| mode == riscv_vector::FRM_DYN_EXIT;
-}
-
-/* Implement TARGET_MODE_CONFLUENCE.  */
-
-static int
-riscv_mode_confluence (int entity, int mode1, int mode2)
-{
-  switch (entity)
-{
-case RISCV_VXRM:
-  return VXRM_MODE_NONE;
-case RISCV_FRM:
-  {
-   /* FRM_DYN, FRM_DYN_CALL and FRM_DYN_EXIT are all compatible.
-  Although we already try to set the mode needed to FRM_DYN after a
-  function call, there are still some corner cases where both FRM_DYN
-  and FRM_DYN_CALL may appear on incoming edges.  */
-   if (riscv_dynamic_frm_mode_p (mode1)
-   && riscv_dynamic_frm_mode_p (mode2))
- return riscv_vector::FRM_DYN;
-   return riscv_vector::FRM_NONE;
-  }
-default:
-  gcc_unreachable ();
-}
-}
-
 /* Return TRUE that an insn is asm.  */
 
 static bool
@@ -14464,8 +14429,6 @@ bool need_shadow_stack_push_pop_p ()
 #define TARGET_MODE_EMIT riscv_emit_mode_set
 #undef TARGET_MODE_NEEDED
 #define TARGET_MODE_NEEDED riscv_mode_needed
-#undef TARGET_MODE_CONFLUENCE
-#define TARGET_MODE_CONFLUENCE riscv_mode_confluence
 #undef TARGET_MODE_AFTER
 #define TARGET_MODE_AFTER riscv_mode_after
 #undef TARGET_MODE_ENTRY
-- 
2.43.0



[PATCH 4/6] RISC-V: frm/mode-switch: TARGET_MODE_AFTER not needed for frm switching

2025-05-09 Thread Vineet Gupta
I stumbled upon this while trying to rewrite the FRM switching code
wholesale and checking which pieces of the current implementation
needed to be retained.

My interpretation of how this hook worked, for the following case:

fsrmi 3
  fsrm a4
call
  frrm a4
fsrmi 1

TARGET_MODE_NEEDED(call_insn) returns DYN_EXIT (to generate fsrm) and
TARGET_MODE_AFTER(call_insn) returns DYN (to generate frrm). However,
for a given insn, if the two hooks return different values, the final
state machine doesn't switch as expected above; instead, NEEDED and
AFTER need to return the same mode in most cases.

Anyhow, it turns out that making this a no-op (returning last_mode
unchanged) doesn't change any testcase outcomes. There's no change to the
total number of FRM reads/writes emitted (static count) for a SPEC2017
-Ofast -march=rv64gcv build, but we again win on reduced complexity and
maintenance.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_frm_mode_needed): Move static
state update here.
(frm_unknown_dynamic_p): Delete.
(riscv_frm_mode_after): Delete.
(riscv_mode_after): Remove call to riscv_frm_mode_after ().

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.cc | 43 +++
 1 file changed, 7 insertions(+), 36 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a500b046cd9a..f1b4b20499fc 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12193,6 +12193,8 @@ riscv_frm_mode_needed (rtx_insn *cur_insn, int code)
the emit mode set.
*/
 mode = riscv_frm_adjust_mode_after_call (cur_insn, mode);
+  else if (riscv_static_frm_mode_p (mode))
+STATIC_FRM_P (cfun) = true;
 
   return mode;
 }
@@ -12317,18 +12319,6 @@ vxrm_unknown_p (rtx_insn *insn)
   return false;
 }
 
-/* Return TRUE that an insn is unknown dynamic for FRM.  */
-
-static bool
-frm_unknown_dynamic_p (rtx_insn *insn)
-{
-  /* Return true if there is a definition of FRM.  */
-  if (reg_set_p (gen_rtx_REG (SImode, FRM_REGNUM), insn))
-return true;
-
-  return false;
-}
-
 /* Return the mode that an insn results in for VXRM.  */
 
 static int
@@ -12346,29 +12336,8 @@ riscv_vxrm_mode_after (rtx_insn *insn, int mode)
 return mode;
 }
 
-/* Return the mode that an insn results in for FRM.  */
-
-static int
-riscv_frm_mode_after (rtx_insn *insn, int mode)
-{
-  STATIC_FRM_P (cfun) = STATIC_FRM_P (cfun) || riscv_static_frm_mode_p (mode);
-
-  if (CALL_P (insn))
-return mode;
-
-  if (frm_unknown_dynamic_p (insn))
-return riscv_vector::FRM_DYN;
-
-  if (recog_memoized (insn) < 0)
-return mode;
-
-  if (reg_mentioned_p (gen_rtx_REG (SImode, FRM_REGNUM), PATTERN (insn)))
-return get_attr_frm_mode (insn);
-  else
-return mode;
-}
-
-/* Return the mode that an insn results in.  */
+/* Implement TARGET_MODE_AFTER.
+   Return the mode that an insn results in.  */
 
 static int
 riscv_mode_after (int entity, int mode, rtx_insn *insn, HARD_REG_SET)
@@ -12377,8 +12346,10 @@ riscv_mode_after (int entity, int mode, rtx_insn *insn, HARD_REG_SET)
 {
 case RISCV_VXRM:
   return riscv_vxrm_mode_after (insn, mode);
+
+/* FRM state machine doesn't need after insn handling.  */
 case RISCV_FRM:
-  return riscv_frm_mode_after (insn, mode);
+  return mode;
 default:
   gcc_unreachable ();
 }
-- 
2.43.0



[PATCH 5/6] RISC-V: frm/mode-switch: Reduce FRM restores on DYN transition

2025-05-09 Thread Vineet Gupta
The FRM mode-switching state machine has DYN as its default state, which
it also falls back to after transitioning to other states such as
DYN_CALL. Currently TARGET_MODE_EMIT generates an FRM restore on any
transition to DYN, leading to spurious FRM restores.

Only do this if an interim static rounding mode was observed in the state
machine.

This significantly reduces the number of FRM reads (frrm) and writes
(fsrm) in a SPEC2017 -Ofast -march=rv64gcv build.
                    Before                 After
                frrm  fsrmi  fsrm     frrm  fsrmi  fsrm
                ----  -----  ----     ----  -----  ----
perlbench_r       42      0     4       17      0     1
cpugcc_r         167      0    17       11      0     0
bwaves_r          16      0     1       16      0     1
mcf_r             11      0     0       11      0     0
cactusBSSN_r      76      0    27       19      0     1
namd_r           119      0    63       14      0     1
parest_r         168      0   114       24      0     1
povray_r         123      1    17       26      1     6
lbm_r              6      0     0        6      0     0
omnetpp_r         17      0     1       17      0     1
wrf_r           2287     13  1956     1268     13  1603
cpuxalan_r        17      0     1       17      0     1
ldecod_r          11      0     0       11      0     0
x264_r            14      0     1       11      0     0
blender_r        724     12   182       61     12    42
cam4_r           324     13   169       45     13    20
deepsjeng_r       11      0     0       11      0     0
imagick_r        265     16    34      132     16    25
leela_r           12      0     0       12      0     0
nab_r             13      0     1       13      0     1
exchange2_r       16      0     1       16      0     1
fotonik3d_r       20      0    11       19      0     1
roms_r            33      0    23       21      0     1
xz_r               6      0     0        6      0     0
                ----  -----  ----     ----  -----  ----
Total           4498     55  2623     1804     55  1707

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_emit_frm_mode_set): Check
STATIC_FRM_P for transition to DYN.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f1b4b20499fc..37f3ace49a8b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12121,7 +12121,7 @@ riscv_emit_frm_mode_set (int mode, int prev_mode)
  && prev_mode != riscv_vector::FRM_DYN
  && prev_mode != riscv_vector::FRM_DYN_CALL)
   /* Restore frm value when switch to DYN mode.  */
-  || (mode == riscv_vector::FRM_DYN
+  || (STATIC_FRM_P (cfun) && mode == riscv_vector::FRM_DYN
  && prev_mode != riscv_vector::FRM_DYN_CALL);
 
   if (restore_p)
-- 
2.43.0
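To make the new condition concrete, here is a minimal C model of only the DYN-transition term of restore_p as changed in the hunk above. The enum values are hypothetical stand-ins for the riscv_vector::FRM_* modes (FRM_STATIC represents any static rounding mode), and the bool parameter plays the role of STATIC_FRM_P (cfun):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for riscv_vector::FRM_* modes; FRM_STATIC is an
   illustrative placeholder for any static rounding mode.  */
enum frm_mode { FRM_DYN, FRM_DYN_CALL, FRM_STATIC };

/* Models only the second disjunct of restore_p after this patch: a
   transition into DYN restores FRM only if the function has used a
   static rounding mode and the previous mode is not DYN_CALL.  */
static bool
restore_on_dyn_transition_p (bool static_frm_p, enum frm_mode mode,
                             enum frm_mode prev_mode)
{
  return static_frm_p && mode == FRM_DYN && prev_mode != FRM_DYN_CALL;
}
```

Before the patch the static_frm_p operand was absent, so such transitions restored FRM even in functions that never set a static rounding mode.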



Re: [PATCH] match: Remove (ne (cmp) 0) and (eq (cmp) 1) patterns

2025-05-09 Thread Andrew Pinski
On Fri, May 9, 2025 at 1:21 AM Richard Biener
 wrote:
>
> On Fri, May 9, 2025 at 4:51 AM Andrew Pinski  wrote:
> >
> > These patterns are not needed any more. There were already
> > 2 patterns which did `(ne bool_var 0)` into `bool_var` and
> > `(eq bool_var 1)` into `bool_var`. They were just after the
> > pattern that did `(cmp (cond @0 @1 @2) @3)` simplification, but
> > that pattern now comes after them.
> > Also, these patterns will in some cases cause a new statement to
> > be created for the comparison. In the case of a floating-point comparison
> > with non-call exceptions (and trapping math), this can cause a new
> > statement to be created every time fold_stmt is called.
>
> Hmm, but do we still fold
>
>   _1 = _2 < 1;
>   if (_1 != 0)
>
> to
>
>   if (_2 < 1)
>
> or does that now again rely on forwprops explicit forwarding into
> gcond?  I wanted
> to get rid of the latter eventually.

Oh. Yes this does rely on forwprop explicitly now.

>
> I agree that the trapping math thing is bad - I wonder if we can catch that more
> intelligently (not sure how without following SSA use-def of gconds on bools
> and see whether they can trap and then not simplifying)

I think I know a way to fix the trapping issue without fully
removing this. I am going to give it a go later today.
Since trapping depends only on the code and the type, it should be easy
to add an extra condition here, and the later patterns already catch the
trapping case of removing `bool!=0`.

Thanks,
Andrew Pinski

>
> > gcc.dg/tree-ssa/vrp24.c needed to be adjusted back to how it was before
> > r13-322-g7f04b0d786e13f.
> > gcc.dg/analyzer/null-deref-pr102671-2.c needs an increased
> > analyzer-max-svalue-depth to avoid an extra warning.
> >
> > gcc/ChangeLog:
> >
> > * match.pd (`(ne (cmp) 0)`, `(eq (cmp) 1)`): Remove.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/vrp24.c: Adjust.
> > * gcc.dg/analyzer/null-deref-pr102671-2.c: Increase 
> > analyzer-max-svalue-depth.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/match.pd  | 8 
> >  gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c | 2 +-
> >  gcc/testsuite/gcc.dg/tree-ssa/vrp24.c | 2 +-
> >  3 files changed, 2 insertions(+), 10 deletions(-)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index ab496d923cc..418efc4230a 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -6898,14 +6898,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  (if (ic == ncmp)
> >   (ncmp @0 @1)
> >   /* The following bits are handled by fold_binary_op_with_conditional_arg.  */
> > - (simplify
> > -  (ne (cmp@2 @0 @1) integer_zerop)
> > -  (if (types_match (type, TREE_TYPE (@2)))
> > -   (cmp @0 @1)))
> > - (simplify
> > -  (eq (cmp@2 @0 @1) integer_truep)
> > -  (if (types_match (type, TREE_TYPE (@2)))
> > -   (cmp @0 @1)))
> >   (simplify
> >(ne (cmp@2 @0 @1) integer_truep)
> >(if (types_match (type, TREE_TYPE (@2)))
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c b/gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c
> > index 298e4839b98..bc141d5c028 100644
> > --- a/gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c
> > +++ b/gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-require-effective-target ptr_eq_long } */
> > -/* { dg-additional-options "-O2 -Wno-shift-count-overflow" } */
> > +/* { dg-additional-options "-O2 -Wno-shift-count-overflow --param=analyzer-max-svalue-depth=19" } */
> >
> >  struct lisp;
> >  union vectorlike_header { long size; };
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp24.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp24.c
> > index c28ca473fc6..f237b7741ec 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/vrp24.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp24.c
> > @@ -89,5 +89,5 @@ L7:
> > boolean operation.  */
> >
> >  /* { dg-final { scan-tree-dump-times "Simplified relational" 2 "evrp" } } */
> > -/* { dg-final { scan-tree-dump-times "if " 3 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "if " 4 "optimized" } } */
> >
> > --
> > 2.43.0
> >


Re: [PATCH RFC] libstdc++: run testsuite with -Wabi

2025-05-09 Thread Jonathan Wakely
On Thu, 8 May 2025 at 20:56, Jason Merrill  wrote:
>
> Tested x86_64-pc-linux-gnu.  Does this make sense for trunk?

Yes, it looks useful. I'm going to test it with my "every -std and -m32
and old-string ABI" test settings to be sure it doesn't cause any
problems.


> -- 8< --
>
> I added this locally to check whether the PR120012 fix affects libstdc++ (it
> doesn't) but it seems generally useful to catch whether compiler ABI
> changes have library impact.
>
> libstdc++-v3/ChangeLog:
>
> * testsuite/lib/libstdc++.exp: Add -Wabi.
> ---
>  libstdc++-v3/testsuite/lib/libstdc++.exp | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp b/libstdc++-v3/testsuite/lib/libstdc++.exp
> index 5e958d159de..74e7e5e98eb 100644
> --- a/libstdc++-v3/testsuite/lib/libstdc++.exp
> +++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
> @@ -586,6 +586,7 @@ proc v3_target_compile { source dest type options } {
>  global tool
>
>  lappend options "additional_flags=-fdiagnostics-plain-output"
> +lappend options "additional_flags=-Wabi=20";
>
> if { [target_info needs_status_wrapper] != "" && [info exists gluefile] } {
> lappend options "libs=${gluefile}"
>
> base-commit: abab79397ef97acf7c689c43e27d58d8d7d5c599
> --
> 2.49.0
>



[PATCH] gimple: Don't assert that switch has nondefault cases during lowering [PR120080]

2025-05-09 Thread Filip Kastl
Hi,

bootstrapped and regtested on x86_64 linux.  Ok to push?

Filip Kastl


-- 8< ---


I had mistakenly assumed that switch lowering cannot encounter a switch
with zero clusters.  This patch removes the relevant assert and instead
gives up on bit-test lowering when this happens.

PR tree-optimization/120080

gcc/ChangeLog:

* tree-switch-conversion.cc (bit_test_cluster::find_bit_tests):
Replace assert with return.

Signed-off-by: Filip Kastl 
---
 gcc/tree-switch-conversion.cc | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index dea217a01ef..bd4de966892 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -1793,12 +1793,14 @@ bit_test_cluster::find_bit_tests (vec &clusters, int max_c)
  end up with as few clusters as possible.  */
 
   unsigned l = clusters.length ();
-  auto_vec min;
-  min.reserve (l + 1);
 
-  gcc_checking_assert (l > 0);
+  if (l == 0)
+return clusters.copy ();
   gcc_checking_assert (l <= INT_MAX);
 
+  auto_vec min;
+  min.reserve (l + 1);
+
   int bits_in_word = GET_MODE_BITSIZE (word_mode);
 
   /* First phase: Compute the minimum number of clusters for each prefix of the
-- 
2.49.0
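The first phase mentioned in the context lines computes, for each prefix of the sorted clusters, the minimum number of bit-test clusters. A simplified C sketch of that kind of prefix DP follows; fits_in_word_p() is a hypothetical stand-in for GCC's real checks (which also limit the number of distinct targets), and plain sorted case values replace GCC's cluster objects. It shows where the new zero-length guard sits:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical check: can values lo..hi be tested with one word mask?  */
static bool
fits_in_word_p (const long *vals, int lo, int hi, int bits_in_word)
{
  return vals[hi] - vals[lo] < bits_in_word;
}

/* Minimum number of word-sized clusters covering N sorted case values.  */
static int
min_bit_test_clusters (const long *vals, int n, int bits_in_word)
{
  /* The case the patch stops asserting on: nothing to cluster.  */
  if (n == 0)
    return 0;

  /* Allocated only after the guard, like the patch's "min" vector.
     min[i] is the best cluster count for the first i values.  */
  int min[n + 1];
  min[0] = 0;
  for (int i = 1; i <= n; i++)
    {
      min[i] = INT_MAX;
      /* Try every last cluster vals[j..i-1].  */
      for (int j = 0; j < i; j++)
        if (min[j] != INT_MAX
            && fits_in_word_p (vals, j, i - 1, bits_in_word)
            && min[j] + 1 < min[i])
          min[i] = min[j] + 1;
    }
  return min[n];
}
```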



Re: [PATCH] gimple: Don't assert that switch has nondefault cases during lowering [PR120080]

2025-05-09 Thread Sam James
Filip Kastl  writes:

> Hi,
>
> bootstrapped and regtested on x86_64 linux.  Ok to push?
>
> Filip Kastl
>

No testcase? I think pinskia's reduced testcase from the bug should be
fine. I can handle adding that later if needed though.


PR fortran/120191

2025-05-09 Thread Daniil Kochergin
PR fortran/120191

* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Call
strip_kind_from_actual unconditionally.


* gfortran.dg/pr120191.f90: New test.


patch_minmaxloc_120191.patch


gcc16-pr120191-test.patch


[PATCH 1/2] tree-optimization/114166 - vectorize to lowered form with word_mode

2025-05-09 Thread Richard Biener
The following adjusts the non-PLUS/MINUS/NEGATE_EXPR vectorizations
of "word_mode" vectors to emit the form vector lowering will later use.
This allows us to move the vector lowering pass before vectorization,
specifically closing the gap between vectorization and lowering,
so we can eventually assert the vectorizer doesn't emit any code
that's not directly supported by the target.
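The emulated lowering being generalized here treats a word_mode vector as one plain integer. For PLUS_EXPR, the low_bits and high_bits masks in the diff set up the standard SWAR addition trick. A minimal C sketch for four 8-bit lanes in a 32-bit word (lane width and word size are assumptions chosen for illustration) is below. The final combining statements are cut off in this excerpt, so this shows the textbook form rather than GCC's exact statement sequence:

```c
#include <assert.h>
#include <stdint.h>

/* Add four 8-bit lanes packed into a 32-bit word, wrapping within each
   lane, using only word-sized integer operations.  */
static uint32_t
swar_add_u8x4 (uint32_t a, uint32_t b)
{
  const uint32_t low_bits = 0x7f7f7f7fu;   /* max >> 1, replicated.  */
  const uint32_t high_bits = 0x80808080u;  /* max & ~(max >> 1), replicated.  */

  /* Adding only the low 7 bits of each lane may carry into bit 7 of
     that lane, but never into the next lane.  */
  uint32_t result_low = (a & low_bits) + (b & low_bits);

  /* The true top bit of each lane is a7 ^ b7 ^ carry; fold in a7 ^ b7.  */
  uint32_t signs = (a ^ b) & high_bits;
  return result_low ^ signs;
}
```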

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/114166
* tree-vect-stmts.cc (vectorizable_operation): Lower also
bitwise operations on word-mode vectors.
---
 gcc/tree-vect-stmts.cc | 139 -
 1 file changed, 80 insertions(+), 59 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index ae9644ad278..efe6a2c9c42 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -7025,9 +7025,10 @@ vectorizable_operation (vec_info *vinfo,
 ops we have to lower the lowering code assumes we are
 dealing with word_mode.  */
   if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype))
+ || !GET_MODE_SIZE (vec_mode).is_constant ()
  || (((code == PLUS_EXPR || code == MINUS_EXPR || code == NEGATE_EXPR)
-   || !target_support_p)
-  && maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD))
+  || !target_support_p)
+ && maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD))
  /* Check only during analysis.  */
  || (!vec_stmt && !vect_can_vectorize_without_simd_p (code)))
{
@@ -7167,88 +7168,108 @@ vectorizable_operation (vec_info *vinfo,
   vop1 = ((op_type == binary_op || op_type == ternary_op)
  ? vec_oprnds1[i] : NULL_TREE);
   vop2 = ((op_type == ternary_op) ? vec_oprnds2[i] : NULL_TREE);
-  if (using_emulated_vectors_p
- && (code == PLUS_EXPR || code == MINUS_EXPR || code == NEGATE_EXPR))
+  if (using_emulated_vectors_p)
{
  /* Lower the operation.  This follows vector lowering.  */
- unsigned int width = vector_element_bits (vectype);
- tree inner_type = TREE_TYPE (vectype);
- tree word_type
-   = build_nonstandard_integer_type (GET_MODE_BITSIZE (word_mode), 1);
- HOST_WIDE_INT max = GET_MODE_MASK (TYPE_MODE (inner_type));
- tree low_bits = build_replicated_int_cst (word_type, width, max >> 1);
- tree high_bits
-   = build_replicated_int_cst (word_type, width, max & ~(max >> 1));
+ tree word_type = build_nonstandard_integer_type
+(GET_MODE_BITSIZE (vec_mode).to_constant (), 1);
  tree wvop0 = make_ssa_name (word_type);
  new_stmt = gimple_build_assign (wvop0, VIEW_CONVERT_EXPR,
  build1 (VIEW_CONVERT_EXPR,
  word_type, vop0));
  vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
- tree result_low, signs;
- if (code == PLUS_EXPR || code == MINUS_EXPR)
+ tree wvop1 = NULL_TREE;
+ if (vop1)
{
- tree wvop1 = make_ssa_name (word_type);
+ wvop1 = make_ssa_name (word_type);
  new_stmt = gimple_build_assign (wvop1, VIEW_CONVERT_EXPR,
  build1 (VIEW_CONVERT_EXPR,
  word_type, vop1));
  vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
- signs = make_ssa_name (word_type);
- new_stmt = gimple_build_assign (signs,
- BIT_XOR_EXPR, wvop0, wvop1);
- vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
- tree b_low = make_ssa_name (word_type);
- new_stmt = gimple_build_assign (b_low,
- BIT_AND_EXPR, wvop1, low_bits);
- vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
- tree a_low = make_ssa_name (word_type);
- if (code == PLUS_EXPR)
-   new_stmt = gimple_build_assign (a_low,
-   BIT_AND_EXPR, wvop0, low_bits);
- else
-   new_stmt = gimple_build_assign (a_low,
-   BIT_IOR_EXPR, wvop0, high_bits);
- vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
- if (code == MINUS_EXPR)
+   }
+
+ tree result_low;
+ if (code == PLUS_EXPR || code == MINUS_EXPR || code == NEGATE_EXPR)
+   {
+ unsigned int width = vector_element_bits (vectype);
+ tree inner_type = TREE_TYPE (vectype);
+ HOST_WIDE_INT max = GET_MODE_MASK (TYPE_MODE (inner_type));
+ tree low_bits
+   = build_replicated_int_cst (word_type, width, max >> 1);
+ tree high_bit

Re: [GCC16 stage1][PATCH v3 0/3] extend "counted_by" attribute to pointer fields of structures

2025-05-09 Thread Qing Zhao
Hi, 

Some update on the v3 of the patch after more study and discussion.

1.  There was more discussion among Kees, Bill, me, and also Apple engineers
on whether we should support “counted_by” on pointers with type “void *”.
We finally agreed to:

  A. Do not support “counted_by” on pointers with type “void *”; issue errors
     when “counted_by” is attached to a pointer with type “void *”;
  B. For such pointers with type “void *”, provide a new “sized_by”
     attribute in the future.

  The main reason for this decision is (from Yeoul Na, an Apple engineer):

  "By definition, with `__counted_by(N)`, the pointer points to memory that
   contains N elements of pointee type (see
   https://clang.llvm.org/docs/BoundsSafety.html#external-bounds-annotations).
   And there’s no such thing as elements of void; that’s why, we can’t do
   `void arr[10]`. So `void *__counted_by(N)` feels just wrong”

  I agree with her argument: it’s better to provide a “sized_by” attribute
  for “void *” pointers.

Let me know if you have a different opinion on this decision.

2. I also looked further into the implementation of the array bounds
sanitizer for pointers with the counted_by attribute.

I tried the other approach, which I thought might be better than my current
implementation:

For the following:

struct annotated {
 int b;
 int *c __attribute__ ((counted_by (b)));
} *p_array_annotated;

p_array_annotated->c[annotated_index] = 2;

generate an ARRAY_REF instead of an INDIRECT_REF for the above
p_array_annotated->c[annotated_index] in the C FE. Then the INDEX info was
kept nicely in the IR up to the bounds sanitizer instrumentation phase, and
all the hacks that try to get the index from the OFFSET computation
expression were avoided.

However, there are two major issues with this approach:
  A. The ARRAY_REF might now have an operand with POINTER_TYPE (it was
     assumed to be ARRAY_TYPE by default), so all the routines that make
     this assumption need to be adjusted (for example, debug_generic_expr
     no longer works, and I believe more routines need to be adjusted).

  B. When converting the ARRAY_REF back to an INDIRECT_REF after the bounds
     instrumentation for the pointer array, the current routines cannot
     correctly handle the “COMPOUND_EXPR” that wraps the index and the
     instrumented code. All the corresponding routines would need to be
     adjusted to specially handle the following COMPOUND_EXPR:

(.UBSAN_BOUNDS (0B, SAVE_EXPR , (sizetype) MAX_EXPR  [(void *)&*p_array_annotated], 0>), SAVE_EXPR )

So, I think that this approach is even worse than my current one. 

I’d like to keep the current implementation for the bounds sanitizer.

Let me know if you have a different opinion on this.


I will modify my v3 patch to include the above two changes and send out v4
after that.

thanks.

Qing
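For reference, a plain-C sketch (an illustration, not the patch) of the semantics under discussion for the struct above. The attribute is omitted so the sketch builds with compilers that do not yet accept counted_by on pointer fields; annotated_index_ok() and annotated_c_bytes() are hypothetical helpers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The struct from the mail; with the proposed extension, c would carry
   __attribute__ ((counted_by (b))).  */
struct annotated {
  int b;
  int *c;
};

/* What the bounds sanitizer conceptually checks for p->c[index]:
   b counts elements of the pointee type, so valid indices are [0, b).  */
static bool
annotated_index_ok (const struct annotated *p, long index)
{
  return index >= 0 && index < p->b;
}

/* The object size such a pointer implies: b elements of sizeof (int).
   For void * there is no element size, which is why a byte-count
   "sized_by" attribute is planned instead.  */
static size_t
annotated_c_bytes (const struct annotated *p)
{
  return (size_t) p->b * sizeof *p->c;
}
```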
  



> On Apr 30, 2025, at 08:49, Qing Zhao  wrote:
> 
> Hi,
> 
> This is the 3rd version of the patch set to extend "counted_by" attribute
> to pointer fields of structures.
> 
> compared to the 2nd version:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681727.html
> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681728.html
> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681729.html
> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681730.html
> 
> The major change is:
> 
> "The counted_by attribute is allowed for a void pointer field, the element
> size of such pointer array is assumed as size 1."
> 
> both __builtin_dynamic_object_size and bounds sanitizer handle this.
> 
> This patch set includes 3 parts:
> 
> 1. Extend "counted_by" attribute to pointer fields of structures.
> 2. Convert a pointer reference with counted_by attribute to .ACCESS_WITH_SIZE
>    and use it in builtin-object-size.
> 3. Use the counted_by attribute of pointers in array bound checker.
> 
> Of these, patches 1 and 2 are simple and straightforward; however, patch 3
> is a little complicated for the following reason:
> 
>Current array bound checker only instruments ARRAY_REF, and the INDEX
>information is the 2nd operand of the ARRAY_REF.
> 
>When extending the array bound checker to pointer references with
>counted_by attributes, the hardest part is to get the INDEX of the
>corresponding array ref from the offset computation expression of
>the pointer ref. 
> 
> So, patch #3 is an RFC: I do need some comments and suggestions on it.
> And I do wonder for the access to pointer arrays:
> 
> struct annotated {
>  int b;
>  int *c __attribute__ ((counted_by (b)));
> } *p_array_annotated;
> 
> p_array_annotated->c[annotated_index] = 2;
> 
> Is it possible to generate ARRAY_REF instead of INDIRECT_REF for the above
> p_array_annotated->c[annotated_index]
> in C FE? then we can keep the INDEX info

[PATCH 2/3] Remove non-SLP path from vectorizable_operation

2025-05-09 Thread Richard Biener
Step 2, fold trivial conditions
---
 gcc/tree-vect-stmts.cc | 30 +++---
 1 file changed, 7 insertions(+), 23 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 7a6a10ee0e6..bfbece56464 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6985,16 +6985,8 @@ vectorizable_operation (vec_info *vinfo,
   /* Multiple types in SLP are handled by creating the appropriate number of
  vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
  case of SLP.  */
-  if (1)
-{
-  ncopies = 1;
-  vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
-}
-  else
-{
-  ncopies = vect_get_num_copies (loop_vinfo, vectype);
-  vec_num = 1;
-}
+  ncopies = 1;
+  vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
 
   gcc_assert (ncopies >= 1);
 
@@ -7098,10 +7090,9 @@ vectorizable_operation (vec_info *vinfo,
}
 
   /* Put types on constant and invariant SLP children.  */
-  if (1
- && (!vect_maybe_update_slp_op_vectype (slp_op0, vectype)
- || !vect_maybe_update_slp_op_vectype (slp_op1, vectype)
- || !vect_maybe_update_slp_op_vectype (slp_op2, vectype)))
+  if (!vect_maybe_update_slp_op_vectype (slp_op0, vectype)
+ || !vect_maybe_update_slp_op_vectype (slp_op1, vectype)
+ || !vect_maybe_update_slp_op_vectype (slp_op2, vectype))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -7119,8 +7110,7 @@ vectorizable_operation (vec_info *vinfo,
 in the prologue and (mis-)costs one of the stmts as
 vector stmt.  See below for the actual lowering that will
 be applied.  */
- unsigned n
-   = 1 ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) : ncopies;
+ unsigned n = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
  switch (code)
{
case PLUS_EXPR:
@@ -7428,15 +7418,9 @@ vectorizable_operation (vec_info *vinfo,
   new_stmt, gsi);
}
 
-  if (1)
-   slp_node->push_vec_def (new_stmt);
-  else
-   STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
+  slp_node->push_vec_def (new_stmt);
 }
 
-  if (0)
-*vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
-
   vec_oprnds0.release ();
   vec_oprnds1.release ();
   vec_oprnds2.release ();
-- 
2.43.0



Re: [PATCH] ltmain.in: don't suppress output for PIC compilations

2025-05-09 Thread Sam James
Sam James  writes:

> When working on xz, I set `-Werror=suggest-attribute=returns_nonnull`, and
> the build failed (as I expected it to), but with no visible error from
> the compiler. There's a mysterious '>/dev/null 2>&1' on the second line where
> liblzma_la-common.o is built without PIC.
>
> With -fPIC, IPA doesn't end up doing attribute discovery. Without it,
> it does. This behaviour is IMO undesirable and we should patch it out.
>
> This also affects building GCC and is particularly unfortunate if we hit
> an ICE.
>
> See https://bugs.gentoo.org/135865 and 
> https://lists.gnu.org/archive/html/libtool-patches/2024-08/msg0.html.
>
>   * ltmain.sh (func_mode_compile): Don't suppress stdout/stderr for no-PIC
>   build.
> ---
> I've proposed this upstream and it ended up being bikeshedded a bit. I've
> been running this locally for nearly a year now and on IRC, others
> expressed interest in this, as it causes pain when debugging.

ping

>
>  ltmain.sh | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/ltmain.sh b/ltmain.sh
> index 1a710177fb26..79cd7c57f42e 100644
> --- a/ltmain.sh
> +++ b/ltmain.sh
> @@ -1576,8 +1576,6 @@ compiler."
>   command="$command -o $obj"
>fi
>  
> -  # Suppress compiler output if we already did a PIC compilation.
> -  command="$command$suppress_output"
>func_show_eval_locale "$command" \
>  '$opt_dry_run || $RM $removelist; exit $EXIT_FAILURE'


RE: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors

2025-05-09 Thread Richard Biener
On Fri, 9 May 2025, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, May 9, 2025 11:08 AM
> > To: Richard Sandiford 
> > Cc: Pengfei Li ; gcc-patches@gcc.gnu.org;
> > ktkac...@nvidia.com
> > Subject: Re: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors
> > 
> > On Fri, 9 May 2025, Richard Sandiford wrote:
> > 
> > > Richard Biener  writes:
> > > > On Thu, 8 May 2025, Pengfei Li wrote:
> > > >
> > > >> This patch improves the auto-vectorization for loops with known small
> > > >> trip counts by enabling the use of subvectors - bit fields of original
> > > >> wider vectors. A subvector must have the same vector element type as the
> > > >> original vector and enough bits for all vector elements to be processed
> > > >> in the loop. Using subvectors is beneficial because machine instructions
> > > >> operating on narrower vectors usually show better performance.
> > > >>
> > > >> To enable this optimization, this patch introduces a new target hook.
> > > >> This hook allows the vectorizer to query the backend for a suitable
> > > >> subvector type given the original vector type and the number of elements
> > > >> to be processed in the small-trip-count loop. The target hook also has a
> > > >> could_trap parameter to say if the subvector is allowed to have more
> > > >> bits than needed.
> > > >>
> > > >> This optimization is currently enabled for AArch64 only. Below example
> > > >> shows how it uses AdvSIMD vectors as subvectors of SVE vectors for
> > > >> higher instruction throughput.
> > > >>
> > > >> Consider this loop operating on an array of 16-bit integers:
> > > >>
> > > >>for (int i = 0; i < 5; i++) {
> > > >>  a[i] = a[i] < 0 ? -a[i] : a[i];
> > > >>}
> > > >>
> > > >> Before this patch, the generated AArch64 code would be:
> > > >>
> > > >>ptrue   p7.h, vl5
> > > >>ptrue   p6.b, all
> > > >>ld1h    z31.h, p7/z, [x0]
> > > >>abs z31.h, p6/m, z31.h
> > > >>st1h    z31.h, p7, [x0]
> > > >
> > > > p6.b has all lanes active - why is the abs then not
> > > > simply unmasked?
> > >
> > > There is no unpredicated abs for SVE.  The predicate has to be there,
> > > and so expand introduces one even when the gimple stmt is unconditional.
> > >
> > > >> After this patch, it is optimized to:
> > > >>
> > > >>ptrue   p7.h, vl5
> > > >>ld1h    z31.h, p7/z, [x0]
> > > >>abs v31.8h, v31.8h
> > > >>st1h    z31.h, p7, [x0]
> > > >
> > > > Help me decipher this - I suppose z31 and v31 "overlap" in the
> > > > register file?  And z31 is a variable-length vector but
> > > > z31.8h is a 8 element fixed length vector?  How can we
> > >
> > > v31.8h, but otherwise yes.
> > >
> > > > end up with just 8 elements here?  From the upper iteration
> > > > bound?
> > >
> > > Yeah.
> > >
> > > > I'm not sure why you need any target hook here.  It seems you
> > > > do already have suitable vector modes so why not just ask
> > > > for a suitable vector?  Is it because you need to have
> > > > that register overlap guarantee (otherwise you'd get
> > > > a move)?
> > >
> > > Yeah, the optimisation only makes sense for overlaid vector registers.
> > >
> > > > Why do we not simply use fixed-length SVE here in the first place?
> > >
> > > Fixed-length SVE is restricted to cases where the exact runtime length
> > > is known: the compile-time length is both a minimum and a maximum.
> > > In contrast, the code above would work even for 256-bit SVE.
> > >
> > > > To me doing this in this way in the vectorizer looks
> > > > somewhat out-of-place.
> > > >
> > > > That said, we already have unmasked ABS in the IL:
> > > >
> > > >   vect__1.6_15 = .MASK_LOAD (&a, 16B, { -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... }, { 0, ... });
> > > >   vect__2.7_16 = ABSU_EXPR ;
> > > >   vect__3.8_17 = VIEW_CONVERT_EXPR > int>(vect__2.7_16);
> > > >   .MASK_STORE (&a, 16B, { -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... }, vect__3.8_17); [tail call]
> > > >
> > > > so what's missing here?  I suppose having a constant masked ABSU here
> > > > would allow RTL expansion to select a fixed-size mode?
> > > >
> > > > And the vectorizer could simply use the existing
> > > > related_vector_mode hook instead?
> > >
> > > I agree it's a bit awkward.  The problem is that we want conflicting
> > > things.  On the one hand, it would make conceptual sense to use SVE
> > > instructions to provide conditional optabs for Advanced SIMD vector modes.
> > > E.g. SVE's LD1W could act as a predicated load for an Advanced SIMD
> > > int32x4_t vector.  The main problem with that is that Advanced SIMD's
> > > native boolean vector type is an integer vector of 0s and -1s, rather
> > > than an SVE predicate.  For some (native Advanced SIMD) operations we'd
> > > want one type of boolean, for some (SVE em

Re: [PATCH RFC] libstdc++: run testsuite with -Wabi

2025-05-09 Thread Jonathan Wakely
On Fri, 9 May 2025 at 18:13, Jonathan Wakely  wrote:
>
> On Fri, 9 May 2025 at 11:19, Jonathan Wakely  wrote:
> >
> > On Thu, 8 May 2025 at 20:56, Jason Merrill  wrote:
> > >
> > > Tested x86_64-pc-linux-gnu.  Does this make sense for trunk?
> >
> > Yes, it looks useful. I'm going to test it with my "very -std and -m32
> > and old-string ABI" test settings to be sure it doesn't cause any
> > problems.
>
> There are a few failures when using GLIBCXX_TESTSUITE_STDS=20 to run
> tests as C++20 or later:
>
> FAIL: experimental/net/internet/resolver/ops/lookup.cc  -std=gnu++23
> (test for excess errors)
> Excess errors:
> /tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/experimental/internet:2100:
> warning: offset of
> 'std::experimental::net::v1::ip::basic_resolver::_M_ctx'
> for '-std=c++20' and up changes in '-fabi-version=21' (GCC 16) [-Wabi]

We have code like this in the networking TS headers:

struct Base {
protected:
Base() = default;
~Base() = default;
};

struct Derived : Base {
void* ptr;
};

Is the warning wrong?

>
> FAIL: experimental/optional/requirements.cc  -std=gnu++20 (test for
> excess errors)
> Excess errors:
> /home/test/src/gcc/libstdc++-v3/testsuite/experimental/optional/requirements.cc:80:
> warning: offset of 'no_copy_assignment::__as_base ' base class for
> '-std=c++20' and up changes in '-fabi-version=21' (GCC 16) [-Wabi]
> /home/test/src/gcc/libstdc++-v3/testsuite/experimental/optional/requirements.cc:81:
> warning: offset of 'no_move_assignment::__as_base ' base class for
> '-std=c++20' and up changes in '-fabi-version=21' (GCC 16) [-Wabi]

This is just test code, which looks like:

struct no_move_constructor
{
  no_move_constructor() = default;
  no_move_constructor(no_move_constructor const&) = default;
  no_move_constructor& operator=(no_move_constructor const&) = default;
  no_move_constructor(no_move_constructor&&) = delete;
  no_move_constructor& operator=(no_move_constructor&&) = default;
};

struct no_move_assignment
{
  no_move_assignment() = default;
  no_move_assignment(no_move_assignment const&) = default;
  no_move_assignment& operator=(no_move_assignment const&) = default;
  no_move_assignment(no_move_assignment&&) = default;
  no_move_assignment& operator=(no_move_assignment&&) = delete;
};

struct no_move : no_move_constructor, no_move_assignment { };



Re: [PATCH] libstdc++: Use scope guard for deallocating nodes in deque.

2025-05-09 Thread Jonathan Wakely
On Fri, 9 May 2025 at 16:13, Tomasz Kaminski  wrote:
>
>
>
> On Thu, May 8, 2025 at 7:46 PM Jonathan Wakely  wrote:
>>
>> On Fri, 18 Apr 2025 at 10:03, Tomasz Kamiński  wrote:
>> >
>> > This patch adds a _Guard_nodes scope guard nested to the _Deque_base,
>> > that deallocates the range of nodes, and replaces __try/__catch block
>> > with approparietly constructed guard object.
>>
>> "appropriately"
>>
>> >
>> > libstdc++-v3/ChangeLog:
>> >
>> > * include/bits/deque.tcc (_Deque_base<_Tp, _Alloc>::_Guard_nodes): Define.
>>
>> There's no need for the template argument list here, just
>> "_Deque_base" is unambiguous (there's no partial or explicit
>> specialization that could be disambiguated with template argument
>> lists). And just "deque" below.
>>
>> > (_Deque_base<_Tp, _Alloc>::_M_create_nodes): Move definition from
>> > stl_deque.h and replace __try/__catch with _Guard_nodes scope object.
>> > (deque<_Tp, _Alloc>::_M_fill_insert, deque<_Tp, _Alloc>::_M_default_append)
>> > (deque<_Tp, _Alloc>::_M_push_back_aux, deque<_Tp, _Alloc>::_M_push_front_aux)
>> > (deque<_Tp, _Alloc>::_M_range_prepend, deque<_Tp, _Alloc>::_M_range_append)
>> > (deque<_Tp, _Alloc>::_M_insert_aux): Replace __try/__catch with
>> > _Guard_nodes scope object.
>> > (deque<_Tp, _Alloc>::_M_new_elements_at_front)
>> > (deque<_Tp, _Alloc>::_M_new_elements_at_back): Use _M_create_nodes.
>> > * include/bits/stl_deque.h (_Deque_base<_Tp, _Alloc>::_Guard_nodes):
>> > Declare.
>> > (_Deque_base<_Tp, _Alloc>::_M_create_nodes): Move definition to
>> > deque.tcc.
>> > (deque<_Tp, _Alloc>::_Guard_nodes): Add typedef, so name is found
>> > by lookup.
>> > ---
>> > Testing x86_64-linux, default test configuration passed.
>> > OK for trunk?
>> >
>> >  libstdc++-v3/include/bits/deque.tcc   | 424 --
>> >  libstdc++-v3/include/bits/stl_deque.h |  20 +-
>> >  2 files changed, 196 insertions(+), 248 deletions(-)
>> >
>> > diff --git a/libstdc++-v3/include/bits/deque.tcc 
>> > b/libstdc++-v3/include/bits/deque.tcc
>> > index dabb6ec5365..b70eed69294 100644
>> > --- a/libstdc++-v3/include/bits/deque.tcc
>> > +++ b/libstdc++-v3/include/bits/deque.tcc
>> > @@ -63,6 +63,40 @@ namespace std _GLIBCXX_VISIBILITY(default)
>> >  _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> >  _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>> >
>> > +  template
>> > +struct
>>
>> No new line here, just "struct _Deque_base...".
>>
>> > +_Deque_base<_Tp, _Alloc>::_Guard_nodes
>> > +  {
>> > +   _Guard_nodes(_Deque_base& __self,
>> > +_Map_pointer __first, _Map_pointer __last)
>> > +   : _M_self(__self), _M_first(__first), _M_last(__last)
>> > +   { }
>> > +
>> > +   ~_Guard_nodes()
>> > +   { _M_self._M_destroy_nodes(_M_first, _M_last); }
>> > +
>> > +   void _M_disarm()
>> > +   { _M_first = _M_last; }
>> > +
>> > +   _Deque_base& _M_self;
>> > +   _Map_pointer _M_first;
>> > +   _Map_pointer _M_last;
>> > +
>> > +  private:
>> > +   _Guard_nodes(_Guard_nodes const&);
>> > +  };
>> > +
>> > +  template
>> > +void
>> > +_Deque_base<_Tp, _Alloc>::
>> > +_M_create_nodes(_Map_pointer __nstart, _Map_pointer __nfinish)
>> > +{
>> > +  _Guard_nodes __guard(*this, __nstart, __nstart);
>> > +  for (_Map_pointer& __cur = __guard._M_last; __cur < __nfinish; ++__cur)
>> > +   *__cur = this->_M_allocate_node();
>> > +  __guard._M_disarm();
>> > +}
>> > +
>> >  #if __cplusplus >= 201103L
>> >template 
>> >  void
>> > @@ -310,35 +344,25 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>> >if (__pos._M_cur == this->_M_impl._M_start._M_cur)
>> > {
>> >   iterator __new_start = _M_reserve_elements_at_front(__n);
>> > - __try
>> > -   {
>> > - std::__uninitialized_fill_a(__new_start, this->_M_impl._M_start,
>> > - __x, _M_get_Tp_allocator());
>> > - this->_M_impl._M_start = __new_start;
>> > -   }
>> > - __catch(...)
>> > -   {
>> > - _M_destroy_nodes(__new_start._M_node,
>> > -  this->_M_impl._M_start._M_node);
>> > - __throw_exception_again;
>> > -   }
>> > + _Guard_nodes __guard(*this, __new_start._M_node,
>> > + this->_M_impl._M_start._M_node);
>> > +
>> > + std::__uninitialized_fill_a(__new_start, this->_M_impl._M_start,
>> > + __x, _M_get_Tp_allocator());
>> > + __guard._M_disarm();
>> > + this->_M_impl._M_start = __new_start;
>> > }
>> >else if (__pos._M_cur == this->_M_impl._M_finish._M_cur)
>> > {
>> >   iterator __new_finish = _M_reserve_elements_at_back(__n);
>> > - __try
>> > -   {
>> > -
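The _Guard_nodes pattern in this patch (arm a guard over a node range, let its destructor free the range during unwinding, disarm it on success) can be sketched independently of libstdc++. Everything below (`NodeAllocator`, `NodeRangeGuard`, `create_nodes`) is hypothetical; `create_nodes` mirrors the shape of the proposed `_M_create_nodes`, including advancing the guard's end pointer by reference so the guarded range always covers exactly the nodes allocated so far.

```cpp
#include <new>

// Hypothetical test allocator (not part of libstdc++): it fails once
// `limit` nodes are live, so the unwinding path can be exercised.
struct NodeAllocator {
    int limit;
    int live = 0;
    int* allocate() {
        if (live >= limit)
            throw std::bad_alloc();
        int* p = static_cast<int*>(::operator new(sizeof(int)));
        ++live;
        return p;
    }
    void deallocate(int* p) noexcept {
        ::operator delete(p);
        --live;
    }
};

// Disarmable scope guard, loosely modelled on _Guard_nodes.
class NodeRangeGuard {
public:
    NodeRangeGuard(NodeAllocator& alloc, int** first, int** last)
        : alloc_(alloc), first_(first), last_(last) {}
    NodeRangeGuard(const NodeRangeGuard&) = delete;  // like the private copy ctor
    NodeRangeGuard& operator=(const NodeRangeGuard&) = delete;
    ~NodeRangeGuard() {
        // Frees [first_, last_); the range is empty after disarm().
        for (int** p = first_; p < last_; ++p)
            alloc_.deallocate(*p);
    }
    void disarm() noexcept { first_ = last_; }
    int**& last() noexcept { return last_; }  // lets the loop grow the range
private:
    NodeAllocator& alloc_;
    int** first_;
    int** last_;
};

// Fill [start, finish) with nodes; on failure, free the ones already made.
void create_nodes(NodeAllocator& alloc, int** start, int** finish) {
    NodeRangeGuard guard(alloc, start, start);  // initially guards nothing
    for (int**& cur = guard.last(); cur < finish; ++cur)
        *cur = alloc.allocate();  // if this throws, [start, cur) is freed
    guard.disarm();               // success: keep every node
}
```

Letting the loop variable alias the guard's end pointer avoids keeping a separate counter in sync: when `allocate()` throws, the destructor frees exactly the nodes already stored, with no `__try`/`__catch` in the caller.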

Re: [PATCH] Restore lrealpath() fallback scenario

2025-05-09 Thread Rink Springer
diff -rubB gcc-15.1.0-base/libiberty/lrealpath.c gcc-15.1.0/libiberty/lrealpath.c
--- gcc-15.1.0-base/libiberty/lrealpath.c 2025-04-25 10:18:04.0 +0200
+++ gcc-15.1.0/libiberty/lrealpath.c 2025-05-09 16:49:35.228340555 +0200
@@ -303,4 +303,7 @@
return res;
}
#endif // _WIN32
+
+ /* This system is a lost cause, just duplicate the filename. */
+ return strdup (filename);
}
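The restored control flow amounts to "try the platform-specific ways, otherwise return a copy", so every configuration hands back something the caller can `free()`. A hypothetical C++ sketch of that shape follows; `resolve_path_or_copy` and `HYPOTHETICAL_HAVE_REALPATH` are illustrative names, not libiberty's, while `strdup` is the POSIX function the patch itself uses.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical sketch of the shape the patch restores; not libiberty code.
// Every configuration, including one where no platform branch is compiled
// in, returns a heap-allocated string that the caller may pass to free().
char* resolve_path_or_copy(const char* filename) {
#if defined(HYPOTHETICAL_HAVE_REALPATH)
    // POSIX.1-2008 form of realpath() that allocates its result.
    if (char* resolved = realpath(filename, nullptr))
        return resolved;
#endif
    // Fallback: no way to canonicalize here, so just duplicate the name.
    // Without this return, the caller would receive an indeterminate
    // pointer and pass it to free().
    return strdup(filename);
}
```

With the hypothetical macro undefined, as on a system matching none of lrealpath()'s `#ifdef` branches, the function still returns a fresh copy that can safely be freed, which is what make_relative_prefix_1() relies on.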

> On 05/09/2025 5:04 PM CEST Rink Springer  wrote:
>  
>  
> Hi all!
>  
> Git commit e2bb55ec3b70cf45088bb73ba9ca990129328d60 (pr/108350) removes the
> fallback scenario for lrealpath() when none of the #ifdefs match (in which
> case the function body is empty).
> In this situation, there is no return statement, and hence an uninitialized
> pointer value is returned. This is bad, as make_relative_prefix_1() uses this
> pointer and passes it to free().
>  
> This crashes in my hobby OS (https://github.com/zhmu/dogfood) as I do not 
> implement any of the proper lrealpath() scenarios  ;-) This is likely not an 
> issue for any real use case, but I'm submitting this to prevent others from 
> having to chase this.
>  
> I've attached a simple patch that restores the fallback behaviour (return
> strdup(filename);) that existed prior to pr/108350.
>  
> Regards,
> Rink
> 


[PATCH 3/6] RISC-V: frm/mode-switch: remove dubious frm edge insertion before call_insn

2025-05-09 Thread Vineet Gupta
This showed up when debugging the testcase for PR119164.

The RISC-V FRM mode-switching state machine has special handling for
transitions to and from a call_insn, as FRM needs to be saved/restored
around calls (any call is considered potentially FRM-clobbering). Consider
the following canonical example, where insns 2, 4 and 6 come from user
code, while the remaining FRM save/restore insns 1, 3, 5 and 7 need to be
generated for ABI semantics.

test_float_point_frm_static:
1:  frrm    a5   <--
 2: fsrmi   2
3:  fsrm    a5   <--
 4: call    normalize_vl
5:  frrm    a5   <--
 6: fsrmi   3
7:  fsrm    a5   <--

The current implementation of the RISC-V TARGET_MODE_NEEDED hook has special
handling for when the call_insn is the last insn of a BB, to ensure FRM
saves/reads are emitted on all the outgoing edges. However, it doesn't work
as intended and is borderline bogus for the following reasons:

 - It fails to detect a call_insn as the last insn of its BB (PR119164 test)
   if the next BB starts with a code label (say, because the call is
   conditional). Granted, this is a deficiency of the API
   next_nonnote_nondebug_insn_bb (), which incorrectly returns the next BB's
   code_label as opposed to returning NULL (and this behavior is kind of
   relied upon by much of GCC). This causes a missed/delayed state
   transition to DYN.

 - If the code is tightened to actually detect the above, e.g.:

 -  rtx_insn *insn = next_nonnote_nondebug_insn_bb (cur_insn);
 -  if (!insn)
 +  if (BB_END (BLOCK_FOR_INSN (cur_insn)) == cur_insn)

   edge insertion happens, but it splits the BB, which generic
   mode-switching doesn't expect, and ends up hitting an ICE.

 - TARGET_MODE_NEEDED hooks typically don't modify the CFG.

 - For abnormal edges, insert_insn_end_basic_block () is called, which,
   by design, when the last insn of the BB is a call_insn, inserts the new
   insn BEFORE the call, not after.

So this is just all wrong and ripe for removal. Moreover, there seems to
be no testsuite coverage for this code path at all: testsuite results
don't change if it is removed.

The total number of FRM reads/writes emitted (static count) across all
benchmarks of a SPEC2017 -Ofast -march=rv64gcv build decreases slightly,
so it's a net win, even if minimal; the real gain, though, is reduced
complexity and maintenance burden.

                   Before Patch           After Patch
                -------------------   -------------------
                 frrm  fsrmi   fsrm    frrm  fsrmi   fsrm
   perlbench_r     42      0      4      42      0      4
      cpugcc_r    167      0     17     167      0     17
      bwaves_r     16      0      1      16      0      1
         mcf_r     11      0      0      11      0      0
  cactusBSSN_r     79      0     27      76      0     27
        namd_r    119      0     63     119      0     63
      parest_r    218      0    114     168      0    114  <--
      povray_r    123      1     17     123      1     17
         lbm_r      6      0      0       6      0      0
     omnetpp_r     17      0      1      17      0      1
         wrf_r   2287     13   1956    2287     13   1956
    cpuxalan_r     17      0      1      17      0      1
      ldecod_r     11      0      0      11      0      0
        x264_r     14      0      1      14      0      1
     blender_r    724     12    182     724     12    182
        cam4_r    324     13    169     324     13    169
   deepsjeng_r     11      0      0      11      0      0
     imagick_r    265     16     34     265     16     34
       leela_r     12      0      0      12      0      0
         nab_r     13      0      1      13      0      1
   exchange2_r     16      0      1      16      0      1
   fotonik3d_r     20      0     11      20      0     11
        roms_r     33      0     23      33      0     23
          xz_r      6      0      0       6      0      0
                -------------------   -------------------
                 4551     55   2623    4498     55   2623

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_frm_emit_after_bb_end): Delete.
(riscv_frm_mode_needed): Remove call to riscv_frm_emit_after_bb_end.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.cc | 46 ---
 1 file changed, 46 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 62ec95d3b885..a500b046cd9a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12162,45 +12162,6 @@ riscv_frm_adjust_mode_after_call (rtx_insn *cur_insn, 
int mode)
   return mode;
 }
 
-/* Insert the backup frm insn to the end of the bb if and only if the call
-   is the last insn of this bb.  */
-
-static void
-riscv_frm_emit_after_bb_end (rtx_insn *cur_insn)
-{
-  edge eg;
-  bool abnormal_edge_p = false;
-  edge_iterator eg_iterator;
-  basic_block bb = BLOCK_FOR_INSN (cur_insn);
-
-  FOR_EACH_EDGE (eg, eg_iterator, bb->succs)
-{
-  if (eg->flags & EDGE_ABNORMAL)
-   abnormal_edge_p = true;
-  else
-   {
- start_sequence ();
- emit_insn (gen_frrmsi (DYNAMIC_FRM_RTL (cfun)));
- rtx_insn *backup_insn = get_insns ();
- end_sequence ();
-
- insert_insn_on_edge (backup_insn, eg);
-   }
-}
-
-  if (abnormal_edge_p

Re: [PATCH v19 3/3] c: Add

2025-05-09 Thread Alejandro Colomar
Hi Jakub,

On Fri, May 09, 2025 at 10:02:17PM +0200, Jakub Jelinek wrote:
> On Fri, May 09, 2025 at 09:32:50PM +0200, Alejandro Colomar wrote:
> > gcc/ChangeLog:
> > 
> > * Makefile.in
> 
> Missing (USER_H): Add stdcountof.h.

Thanks!

> > --- /dev/null
> > +++ b/gcc/ginclude/stdcountof.h
> > @@ -0,0 +1,31 @@
> > +/* Copyright (C) 2025 Free Software Foundation, Inc.
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify
> > +it under the terms of the GNU General Public License as published by
> > +the Free Software Foundation; either version 3, or (at your option)
> > +any later version.
> > +
> > +GCC is distributed in the hope that it will be useful,
> > +but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +GNU General Public License for more details.
> > +
> > +Under Section 7 of GPL version 3, you are granted additional
> > +permissions described in the GCC Runtime Library Exception, version
> > +3.1, as published by the Free Software Foundation.
> > +
> > +You should have received a copy of the GNU General Public License and
> > +a copy of the GCC Runtime Library Exception along with this program;
> > +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> > +.  */
> > +
> > +/* ISO C2Y: 7.21 Array count .  */
> > +
> > +#ifndef _STDCOUNTOF_H
> > +#define _STDCOUNTOF_H
> 
> This should define also
> __STDC_VERSION_STDCOUNTOF_H__
> macro (guess to 202502L or when the paper has been approved for now
> or maybe 202500L as something clearly before that).

Hmmm, I was wondering what value I should put there, and eventually I
thought I should just remove it because this has not yet been released
by ISO, so technically this is an experimental extension that we think
will match a future ISO C release.

Please confirm whether I should add some value there.

> > +
> > +#define countof _Countof
> 
> N3550 says
> The macro
> countof(...)
> expands to _Countof(__VA_ARGS__).

This is an editorial mistake in the standard.  I reported it this
morning, and the next draft will be fixed.  See my paper below (but it
won't be voted nor officially presented, because the changes will be
accepted editorially).

> 
> I believe that means it should be function-like macro and
> #include 
> int countof = 42;
> IMHO is valid C2Y source.
> countof doesn't match any of the J.6.2 patterns, it is listed in J.6.3
> instead.
> 
>   Jakub


Have a lovely night!
Alex

---
Name
alx-0016r1 - countof shouldn't be function-like

Category
Operators; consistency

Author
Alejandro Colomar 

History


r0 (2025-05-09):
-  Initial draft.

r1 (2025-05-09):
-  Fix link.

Description
The countof() macro wasn't discussed much, since most of the
discussion went to the operator, _Countof, and its name.  It
seems we accidentally merged a macro that requires parentheses,
which is inconsistent with the real operator, which we agreed
should not require parentheses.

This proposal is consistent with how the macro 'alignof' was
specified in C17.



Proposed wording
Based on N3550.

7.21  Array count 
@@ p2
 The macro
-   countof(...)
+   countof
 expands to
-_Countof(__VA_ARGS__).
+_Countof.
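The practical difference between the two macro forms discussed in this thread can be shown with a small C++ sketch. C++ has no `_Countof`, so both macro bodies below are hypothetical stand-ins (a sizeof-based element count and a plain rename); only the preprocessor rules, which C and C++ share, carry over: a function-like macro name that is not followed by `(` is left alone, while an object-like macro name is replaced wherever it appears.

```cpp
#include <cstddef>

// Hypothetical function-like stand-in for countof(...): it only expands
// when its name is immediately followed by '('.
#define COUNTOF_FN(...) (sizeof(__VA_ARGS__) / sizeof((__VA_ARGS__)[0]))

// Hypothetical object-like macro: every later occurrence of the name is
// replaced, like '#define countof _Countof'.
#define COUNTOF_OBJ renamed_by_macro

inline std::size_t function_like_demo() {
    int arr[7] = {};
    std::size_t n = COUNTOF_FN(arr);  // followed by '(' -> expands to 7
    int COUNTOF_FN = 42;              // not followed by '(' -> plain identifier
    return n + static_cast<std::size_t>(COUNTOF_FN);  // 7 + 42
}

inline int object_like_demo() {
    int COUNTOF_OBJ = 5;              // actually declares 'renamed_by_macro'
    return renamed_by_macro;          // proves the declaration was rewritten
}
```

Under the object-like definition in the patch (`#define countof _Countof`), `int countof = 42;` would become `int _Countof = 42;`, which is ill-formed because `_Countof` is a keyword. With a function-like definition the same declaration stays valid, which is Jakub's point; alx-0016 argues the object-like form is the intended one.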



[PATCH 0/6] RISC-V: frm state-machine improvements

2025-05-09 Thread Vineet Gupta
Hi,

This came out of Rivos perf team reporting (shoutout to Siavash) that
some of the SPEC2017 workloads had unnecessary FRM wiggles, when
none were needed. The writes in particular could be expensive.

I started with a reduced test for PR/119164 from blender:node_texture_util.c.

However, in trying to understand it (and after a botched rewrite of the whole
thing), it turned out that a lot of the code was simply unnecessary, leading
to more complexity than warranted. As a result, there are more deletions here,
and the actual improvements come from just a few lines of changes.

I've verified each patch incrementally with:
 - Testsuite run (unchanged, 1 unexpected pass:
   gcc.target/riscv/rvv/autovec/pr119114.c)
 - SPEC build
 - Static analysis of FRM read/write insns emitted in all SPEC binaries.
 - There's BPI data for some of this too, but the delta there is not
   significant, as this could really be uarch-specific.

Here are the results of the static analysis.


                1-revert-confluence   2-remove-edge-insert  4-fewer-frm-restore   5-call-backtrack
                                      3-remove-mode-after
                -------------------   -------------------   -------------------   -------------------
                 frrm  fsrmi   fsrm    frrm  fsrmi   fsrm    frrm  fsrmi   fsrm    frrm  fsrmi   fsrm
   perlbench_r     42      0      4      42      0      4      17      0      1      17      0      1
      cpugcc_r    167      0     17     167      0     17      11      0      0      11      0      0
      bwaves_r     16      0      1      16      0      1      16      0      1      16      0      1
         mcf_r     11      0      0      11      0      0      11      0      0      11      0      0
  cactusBSSN_r     79      0     27      76      0     27      19      0      1      19      0      1
        namd_r    119      0     63     119      0     63      14      0      1      14      0      1
      parest_r    218      0    114     168      0    114      24      0      1      24      0      1
      povray_r    123      1     17     123      1     17      26      1      6      26      1      6
         lbm_r      6      0      0       6      0      0       6      0      0       6      0      0
     omnetpp_r     17      0      1      17      0      1      17      0      1      17      0      1
         wrf_r   2287     13   1956    2287     13   1956    1268     13   1603     613     13     82
    cpuxalan_r     17      0      1      17      0      1      17      0      1      17      0      1
      ldecod_r     11      0      0      11      0      0      11      0      0      11      0      0
        x264_r     14      0      1      14      0      1      11      0      0      11      0      0
     blender_r    724     12    182     724     12    182      61     12     42      39     12     16
        cam4_r    324     13    169     324     13    169      45     13     20      40     13     17
   deepsjeng_r     11      0      0      11      0      0      11      0      0      11      0      0
     imagick_r    265     16     34     265     16     34     132     16     25      33     16     18
       leela_r     12      0      0      12      0      0      12      0      0      12      0      0
         nab_r     13      0      1      13      0      1      13      0      1      13      0      1
   exchange2_r     16      0      1      16      0      1      16      0      1      16      0      1
   fotonik3d_r     20      0     11      20      0     11      19      0      1      19      0      1
        roms_r     33      0     23      33      0     23      21      0      1      21      0      1
          xz_r      6      0      0       6      0      0       6      0      0       6      0      0
                -------------------   -------------------   -------------------   -------------------
                 4551     55   2623    4498     55   2623    1804     55   1707    1023     55    150
                -------------------   -------------------   -------------------   -------------------
                               7229                  7176                  3566                  1228


It seems wrf still has half of all the FRM reads/writes:

                 frrm  fsrmi   fsrm
         wrf_r    613     13     82

with one function having the bulk of them:

     solve_em_    555      1     50

That is a single static rounding mode (1 fsrmi), so ideally it needs only
1 save and 1 restore.

I have a feeling this has to do with the following:
https://godbolt.org/z/Px9es7j1r

The function call code path need not bother with frm save/restore at
all. This is currently being investigated but could take more time.

Please review.

Thx,
-Vineet

Vineet Gupta (6):
  emit-rtl: document next_nonnote_nondebug_insn_bb () can breach into
next BB
  RISC-V: frm/mode-switch: remove TARGET_MODE_CONFLUENCE
  RISC-V: frm/mode-switch: remove dubious frm edge insertion before
call_insn
  RISC-V: frm/mode-switch: TARGET_MODE_AFTER not needed for frm
switching
  RISC-V: frm/mode-switch: Reduce FRM restores on DYN transition
  RISC-V: frm/mode-switch: robustify call_insn backtracking
[PR119

Re: [PATCH v19 3/3] c: Add

2025-05-09 Thread Jakub Jelinek
On Fri, May 09, 2025 at 09:32:50PM +0200, Alejandro Colomar wrote:
> gcc/ChangeLog:
> 
>   * Makefile.in

Missing (USER_H): Add stdcountof.h.

> --- /dev/null
> +++ b/gcc/ginclude/stdcountof.h
> @@ -0,0 +1,31 @@
> +/* Copyright (C) 2025 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +Under Section 7 of GPL version 3, you are granted additional
> +permissions described in the GCC Runtime Library Exception, version
> +3.1, as published by the Free Software Foundation.
> +
> +You should have received a copy of the GNU General Public License and
> +a copy of the GCC Runtime Library Exception along with this program;
> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +.  */
> +
> +/* ISO C2Y: 7.21 Array count .  */
> +
> +#ifndef _STDCOUNTOF_H
> +#define _STDCOUNTOF_H

This should define also
__STDC_VERSION_STDCOUNTOF_H__
macro (guess to 202502L or when the paper has been approved for now
or maybe 202500L as something clearly before that).

> +
> +#define countof _Countof

N3550 says
The macro
countof(...)
expands to _Countof(__VA_ARGS__).

I believe that means it should be function-like macro and
#include 
int countof = 42;
IMHO is valid C2Y source.
countof doesn't match any of the J.6.2 patterns, it is listed in J.6.3
instead.

Jakub



Re: [WWWDOCS] readings: add links to CTF and BTF format specifications

2025-05-09 Thread Andrew Pinski
On Fri, May 9, 2025 at 12:47 PM Indu Bhagat  wrote:
>
> Fix PR web/114649

I think this counts as obvious. And LGTM too.

> ---
>  htdocs/readings.html | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/htdocs/readings.html b/htdocs/readings.html
> index 56398317..3b0556e6 100644
> --- a/htdocs/readings.html
> +++ b/htdocs/readings.html
> @@ -598,6 +598,10 @@ names.
>
>https://dwarfstd.org";>DWARF Workgroup
>
> +  https://sourceware.org/binutils/docs/ctf-spec.html";>Compact C 
> Type Format (CTF)
> +
> +  https://www.kernel.org/doc/Documentation/bpf/btf.rst";>BPF 
> Type Format (BTF)
> +
>  
>
>http://compilerconnection.com";>Links related to many
> --
> 2.43.0
>


[pushed] c++: visibility of instantiated template friends

2025-05-09 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

In 20_util/variant/visit_member.cc, instantiation of the variant friend
declaration of __get for variant was being marked as internal
because that variant specialization is itself internal.  And therefore
check_module_override didn't try to merge it with the non-exported
namespace-scope declaration of __get.

But the template parms of variant are not part of the friend template's
identity, so they should not affect its visibility.  If they are substituted
into the friend declaration, we'll handle that when looking at the
declaration itself.

This change no longer seems necessary to fix the testcase, but does still
seem correct.  We definitely still get here during tsubst_friend_function.

gcc/cp/ChangeLog:

* decl2.cc (determine_visibility): Ignore args for friend templates.
---
 gcc/cp/decl2.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 21156f1dd3b..15db1d65734 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -3160,7 +3160,9 @@ determine_visibility (tree decl)
  && !attr)
{
  int depth = TMPL_ARGS_DEPTH (args);
- if (DECL_VISIBILITY_SPECIFIED (decl))
+ if (DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P (TI_TEMPLATE (tinfo)))
+   /* Class template args don't affect template friends.  */;
+ else if (DECL_VISIBILITY_SPECIFIED (decl))
{
  /* A class template member with explicit visibility
 overrides the class visibility, so we need to apply

base-commit: 21842fe301caa5dbc69a69033cdc17bb29b8c399
-- 
2.49.0



Re: [PATCH v19 3/3] c: Add

2025-05-09 Thread Joseph Myers
On Fri, 9 May 2025, Alejandro Colomar wrote:

> > This should define also
> > __STDC_VERSION_STDCOUNTOF_H__
> > macro (guess to 202502L or when the paper has been approved for now
> > or maybe 202500L as something clearly before that).
> 
> Hmmm, I was wondering what value I should put there, and eventually I
> thought I should just remove it because this has not yet been released
> by ISO, so technically this is an experimental extension that we think
> will match a future ISO C release.

I think it makes most sense to add / update these values only when the 
final version macro value is known and all the features from that version 
have been implemented (that is, I agree with omitting it in this patch).

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH v19 1/3] contrib/: Add support for Cc: and Link: tags

2025-05-09 Thread Joseph Myers
On Fri, 9 May 2025, Alejandro Colomar wrote:

> contrib/ChangeLog:
> 
>   * gcc-changelog/git_commit.py (GitCommit):
>   Add support for 'Cc: ' and 'Link: ' tags.

Please remove this patch from this patch series; it has nothing to do with 
_Countof and would probably be reviewed by different people.  Just don't 
use any unsupported tags in commit messages.  (And I don't think Cc: 
belongs in a commit message at all; information about who you chose to CC 
on a patch submission is of no use to people looking at the commit 
history, so doesn't belong in that history.)

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] gimple: Canonical order for invariants [PR118902]

2025-05-09 Thread Andrew Pinski
On Mon, Apr 21, 2025 at 1:42 AM Richard Biener
 wrote:
>
> On Thu, Apr 17, 2025 at 7:37 PM Andrew Pinski  
> wrote:
> >
> > So unlike constants, address invariants are currently put first if
> > used with an SSA_NAME.
> > It would be better if address invariants were consistent with constants,
> > and this patch changes that.
> > gcc.dg/tree-ssa/pr118902-1.c is an example where this canonicalization
> > can help. In it, if the `p` variable were a global variable, FRE (VN) would
> > have figured out that `a` could never be equal to `&p` inside the loop. But
> > without the canonicalization we end up with `&p == a.0_1`, which VN does try
> > to handle for conditional VN.
> >
> > Bootstrapped and tested on x86_64.
> >
> > PR tree-optimization/118902
> > gcc/ChangeLog:
> >
> > * fold-const.cc (tree_swap_operands_p): Place invariants in the 
> > first operand
> > if not used with constants.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/pr118902-1.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/fold-const.cc  |  6 ++
> >  gcc/testsuite/gcc.dg/tree-ssa/pr118902-1.c | 21 +
> >  2 files changed, 27 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr118902-1.c
> >
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > index 1275ef75315..c9471ea44b0 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -7246,6 +7246,12 @@ tree_swap_operands_p (const_tree arg0, const_tree 
> > arg1)
> >if (TREE_CONSTANT (arg0))
> >  return true;
> >
> > +  /* Put invariant address in arg1. */
> > +  if (is_gimple_invariant_address (arg1))
> > +return false;
> > +  if (is_gimple_invariant_address (arg0))
> > +return true;
>
> We could make this cheaper by considering all ADDR_EXPRs here?
>
> I'll note that with this or the above
>
>   /* Put SSA_NAMEs last.  */
>   if (TREE_CODE (arg1) == SSA_NAME)
> return false;
>   if (TREE_CODE (arg0) == SSA_NAME)
> return true;
>
> is a bit redundant and contradicting, when we are in GIMPLE, at least.
> I'd say on GIMPLE reversing the above to put SSA_NAMEs first would
> solve the ADDR_EXPR issue as well.
>
> The idea of tree_swap_operands_p seems to be to put "simple" things
> second, but on GIMPLE SSA_NAME is not simple.  With GENERIC
> this would put memory refs first, SSA_NAME second, which is reasonable.
>
> I'd say since an ADDR_EXPR is always a "value" (not memory), putting it
> last makes sense in general, whether invariant or not.  Can you test that?
> The issue with is_gimple_invariant_address is that it walks all handled
> components.

Coming back to this, I will make a change to put ADDR first instead of
my patch of is_gimple_invariant_address, next week.
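The ordering discussed above (constants second, then any ADDR_EXPR, then SSA names) can be modelled as a rank comparison. This is a hypothetical simplification, not GCC's tree_swap_operands_p: the `Kind` categories, the ranks and the `canonical_eq` helper are assumptions for illustration, and ties (such as two SSA names, which GCC orders separately) are left unswapped.

```cpp
#include <string>
#include <utility>

// Hypothetical operand categories; not GCC tree codes.
enum class Kind { Other, SsaName, Address, Constant };

// Higher rank = should end up as the second operand. Assumed ordering:
// constants last, then ADDR_EXPRs (always a value, never memory), then
// SSA names, then everything else.
int operand_rank(Kind k) {
    switch (k) {
    case Kind::Constant: return 3;
    case Kind::Address:  return 2;
    case Kind::SsaName:  return 1;
    case Kind::Other:    return 0;
    }
    return 0;
}

// True if (arg0, arg1) should be swapped; equal ranks are left alone.
bool swap_operands_p(Kind arg0, Kind arg1) {
    return operand_rank(arg0) > operand_rank(arg1);
}

struct Operand {
    Kind kind;
    std::string text;
};

// Canonicalize a commutative comparison such as EQ_EXPR.
std::string canonical_eq(Operand a, Operand b) {
    if (swap_operands_p(a.kind, b.kind))
        std::swap(a, b);
    return a.text + " == " + b.text;
}
```

With this ranking, `&p == a.0_1` and `a.0_1 == &p` both canonicalize to `a.0_1 == &p`, so later passes only ever see one form of the comparison.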

Note: I just noticed, while trying to remove
forward_propagate_into_gimple_cond and
forward_propagate_into_comparison, that we have:
(for cmp (eq ne)
 (simplify
  /* SSA names are canonicalized to 2nd place.  */
  (cmp addr@0 SSA_NAME@1)

But that seems wrong if we had an SSA_NAME which was defined by an
ADDR_EXPR, as we don't redo canonicalization when doing valueization.
It just happens to work in the end because fold does the
canonicalization before match-and-simplify, and forwprop uses fold to do
the simplification during forward_propagate_into_comparison_1.
Match-and-simplify on GIMPLE, however, does not re-canonicalize during
valueization of the names. Anyway, I will fix that and add a comment on
why it is not always canonicalized. Just bringing it up as a related
issue here.

Thanks,
Andrew Pinski

>
> Richard.
>
> > +
> >/* It is preferable to swap two SSA_NAME to ensure a canonical form
> >   for commutative and comparison operators.  Ensuring a canonical
> >   form allows the optimizers to find additional redundancies without
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr118902-1.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/pr118902-1.c
> > new file mode 100644
> > index 000..fa21b8a74ef
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr118902-1.c
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +void foo(int);
> > +void l(int**);
> > +int f1(int j, int t)
> > +{
> > +  int p = 0;
> > +  int *a = &p;
> > +  l(&a);
> > +  if (a == &p)
> > +return 0;
> > +  for(int i = 0; i < j; i++)
> > +  {
> > +if (a == &p) foo(p);
> > +  }
> > +  return 0;
> > +}
> > +
> > +/* We should be able to remove the call to foo because a is never equal to 
> > &p inside the loop.  */
> > +/* { dg-final { scan-tree-dump-not "foo " "optimized"} } */
> > --
> > 2.43.0
> >


[PATCH 6/6] RISC-V: frm/mode-switch: robustify call_insn backtracking [PR119164][PR120203]

2025-05-09 Thread Vineet Gupta
So this is where my FRM excursions really began: trying to elide the
extraneous FRM save/restores when none were needed.

After a call_insn, the mode needs to be switched from DYN_CALL to DYN.
Failing to do so could defer the DYN_CALL to DYN transition, creating
interim transitions that lead to extraneous FRM save/restore.

The current backward check for call_insn was too coarse-grained and flawed,
just like the flawed forward check which was removed earlier in the series.
The API prev_nonnote_nondebug_insn_bb () implies that the current insn and
the call_insn are in the same BB, which need not always be true. The problem
is not with the API, but with the use thereof.

Fix this by tracking call_insn more explicitly in TARGET_MODE_NEEDED.
 - On seeing a call_insn, make a note.
 - On subsequent insns, if note seen, do the state switch and clear the note.

The number of FRM reads/writes across SPEC2017 -Ofast -mrv64gcv improves
as well.

                    Before                 After
                frrm fsrmi  fsrm      frrm fsrmi  fsrm
              ------------------    ------------------
perlbench_r       17     0     1        17     0     1
cpugcc_r          11     0     0        11     0     0
bwaves_r          16     0     1        16     0     1
mcf_r             11     0     0        11     0     0
cactusBSSN_r      19     0     1        19     0     1
namd_r            14     0     1        14     0     1
parest_r          24     0     1        24     0     1
povray_r          26     1     6        26     1     6
lbm_r              6     0     0         6     0     0
omnetpp_r         17     0     1        17     0     1
wrf_r           1268    13  1603       613    13    82
cpuxalan_r        17     0     1        17     0     1
ldecod_r          11     0     0        11     0     0
x264_r            11     0     0        11     0     0
blender_r         61    12    42        39    12    16
cam4_r            45    13    20        40    13    17
deepsjeng_r       11     0     0        11     0     0
imagick_r        132    16    25        33    16    18
leela_r           12     0     0        12     0     0
nab_r             13     0     1        13     0     1
exchange2_r       16     0     1        16     0     1
fotonik3d_r       19     0     1        19     0     1
roms_r            21     0     1        21     0     1
xz_r               6     0     0         6     0     0
              ------------------    ------------------
Total           1804    55  1707      1023    55   150

This fixes PR119164 (and also PR119832 w/o need for TARGET_MODE_CONFLUENCE).

It also fixes PR120203, where the current codegen seemed wrong for
float-point-dynamic-frm-74.c: it was missing an FRM save after the 2nd
call to normalize_vl_2 ().

|  frrm     a5
|  fsrmi    1
|
|  vfadd.vv v1,v8,v9
|  fsrm     a5
|  beq      a1,zero,.L2
|
|  call     normalize_vl_1
|  frrm     a5
|
| .L3:
|  fsrmi    3
|  vfadd.vv v8,v8,v9
|  fsrm     a5
|  jr       ra
|
| .L2:
|  call     normalize_vl_2
|  frrm     a5   <-- missing
|  j        .L3

PR target/119164
PR target/120203

gcc/ChangeLog:

* config/riscv/riscv.cc (CFUN_IN_CALL): New macro.
(struct mode_switching_info): Add new field.
(riscv_frm_adjust_mode_after_call): Remove.
(riscv_frm_mode_needed): Track call_insn.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-dynamic-frm-74.c: Bump
expected FRRM by 1.
* gcc.target/riscv/rvv/base/pr119164.c: New test.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.cc | 42 +++
 .../rvv/base/float-point-dynamic-frm-74.c |  2 +-
 .../gcc.target/riscv/rvv/base/pr119164.c  | 22 ++
 3 files changed, 39 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr119164.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 37f3ace49a8b..cd959241e023 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -107,6 +107,8 @@ along with GCC; see the file COPYING3.  If not see
 /* True the mode switching has static frm, or false.  */
 #define STATIC_FRM_P(c) ((c)->machine->mode_sw_info.static_frm_p)
 
+#define CFUN_IN_CALL(c) ((c)->machine->mode_sw_info.cfun_call)
+
 /* True if we can use the instructions in the XTheadInt extension
to handle interrupts, or false.  */
 #define TH_INT_INTERRUPT(c)\
@@ -176,10 +178,13 @@ struct GTY(()) mode_switching_info {
  mode instruction in the function or not.  */
   bool static_frm_p;
 
+  bool cfun_call;
+
   mode_switching_info ()
 {
   dynamic_frm = NULL_RTX;
   static_frm_p = false;
+  cfun_call = false;
 }
 };
 
@@ -12148,20 +12153,6 @@ riscv_emit_mode_set (int entity, int mode, int prev_mode,
 }
 }
 
-/* Adjust the FRM_NONE insn after a call to FRM_DYN for the
-   underlying emit.  */
-
-static int
-riscv_frm_adjust_mode_after_call (rt

Re: [PATCH v19 1/3] contrib/: Add support for Cc: and Link: tags

2025-05-09 Thread Alejandro Colomar
Hi Joseph,

On Fri, May 09, 2025 at 09:00:58PM +, Joseph Myers wrote:
> On Fri, 9 May 2025, Alejandro Colomar wrote:
> 
> > contrib/ChangeLog:
> > 
> > * gcc-changelog/git_commit.py (GitCommit):
> > Add support for 'Cc: ' and 'Link: ' tags.
> 
> Please remove this patch from this patch series; it has nothing to do with 
> _Countof and would probably be reviewed by different people.

Who should review that?  Nobody applied it, nor replied to it.  (IIRC
someone said the patch looks good, but not someone who should merge it,
so it was of no use.)  Please point me to the people who are responsible
for that.

>  Just don't 
> use any unsupported tags in commit messages.  (And I don't think Cc: 
> belongs in a commit message at all; information about who you chose to CC 
> on a patch submission is of no use to people looking at the commit 
> history, so doesn't belong in that history.)

I disagree.  In projects I (co-)maintain, this info has been very
useful, for example to ask about a very old patch, where I couldn't
reach the author, but I could reach people who were CC'd in discussions
leading to the patch.  Thanks to having their names in commit messages,
I could reach them.

But since you're the maintainer here, it's your prerogative, so I can
drop them if you confirm I should do that.


Have a lovely night!
Alex



Re: [PATCH v19 1/3] contrib/: Add support for Cc: and Link: tags

2025-05-09 Thread Joseph Myers
On Fri, 9 May 2025, Alejandro Colomar wrote:

> Hi Joseph,
> 
> On Fri, May 09, 2025 at 09:00:58PM +, Joseph Myers wrote:
> > On Fri, 9 May 2025, Alejandro Colomar wrote:
> > 
> > > contrib/ChangeLog:
> > > 
> > >   * gcc-changelog/git_commit.py (GitCommit):
> > >   Add support for 'Cc: ' and 'Link: ' tags.
> > 
> > Please remove this patch from this patch series; it has nothing to do with 
> > _Countof and would probably be reviewed by different people.
> 
> Who should review that?  Nobody applied it, nor replied to it.  (IIRC

Putting a patch in an unrelated series makes it *less* likely anyone will 
look at it, since people concerned with that patch may be disjoint from 
those concerned with the apparent subject of the series.  (So if a change 
to one part of the compiler depends on a change to another, I think it's 
always better to post the dependency separately, not as part of the main 
series.)

> someone said the patch looks good, but not someone who should merge it,
> so it was of no use.)  Please point me to the people who are responsible
> for that.

People who have worked on that code lately?  (In practice people can 
maintain their own scripts in contrib/, but Martin Liska is no longer 
significantly active in GCC development so that doesn't help here.)

> I disagree.  In projects I (co-)maintain, this info has been very
> useful, for example to ask about a very old patch, where I couldn't
> reach the author, but I could reach people who were CC'd in discussions
> leading to the patch.  Thanks to having their names in commit messages,
> I could reach them.
> 
> But since you're the maintainer here, it's your prerogative, so I can
> drop them if you confirm I should do that.

I think CC should be dropped from commit messages.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH v19 1/3] contrib/: Add support for Cc: and Link: tags

2025-05-09 Thread Alejandro Colomar
Hi Joseph,

On Fri, May 09, 2025 at 09:39:34PM +, Joseph Myers wrote:
> > > > contrib/ChangeLog:
> > > > 
> > > > * gcc-changelog/git_commit.py (GitCommit):
> > > > Add support for 'Cc: ' and 'Link: ' tags.
> > > 
> > > Please remove this patch from this patch series; it has nothing to do with
> > > _Countof and would probably be reviewed by different people.
> > 
> > Who should review that?  Nobody applied it, nor replied to it.  (IIRC
> 
> Putting a patch in an unrelated series makes it *less* likely anyone will 
> look at it, since people concerned with that patch may be disjoint from 
> those concerned with the apparent subject of the series.  (So if a change 
> to one part of the compiler depends on a change to another, I think it's 
> always better to post the dependency separately, not as part of the main 
> series.)

IIRC, I did resend that patch also separate from this series a long time
ago, and no response.

> 
> > someone said the patch looks good, but not someone who should merge it,
> > so it was of no use.)  Please point me to the people who are responsible
> > for that.
> 
> People who have worked on that code lately?  (In practice people can 
> maintain their own scripts in contrib/, but Martin Liska is no longer 
> significantly active in GCC development so that doesn't help here.)

I did ping several people who touched that code lately, but they said
they weren't responsible for it.  :|

> > I disagree.  In projects I (co-)maintain, this info has been very
> > useful, for example to ask about a very old patch, where I couldn't
> > reach the author, but I could reach people who were CC'd in discussions
> > leading to the patch.  Thanks to having their names in commit messages,
> > I could reach them.
> > 
> > But since you're the maintainer here, it's your prerogative, so I can
> > drop them if you confirm I should do that.
> 
> I think CC should be dropped from commit messages.

Thanks; I'll do that for v20.  (And I'll drop patch 1/3, since it will
be unnecessary then.)


Have a lovely night!
Alex



[pushed] c++: CWG2369 workaround and ... [PR120185]

2025-05-09 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

My r16-479 adjustment to the PR99599 workaround broke on a class with a
varargs constructor.

It also occurred to me that we don't need to do non-dep conversion checking
in two phases when concepts aren't supported.
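To illustrate why a null first parameter has to be handled, here is a hypothetical C model; struct parm, VOID_LIST_NODE, sufficient_parms and accepts_single_argument are invented for this sketch and only mimic the shape of GCC's parameter-type lists. A prototyped list ends with a shared void sentinel, whereas for a varargs constructor such as A(...) there is no sentinel, so the first user parameter type is simply null. Without the new check, a null parm would have reached TREE_CHAIN (parm); the sketch returns true for that case, mirroring the added early return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parameter-type list node.  */
struct parm
{
  bool has_default;      /* Parameter has a default argument.  */
  struct parm *next;
};

/* Shared terminator of a prototyped (non-varargs) parameter list.  */
static struct parm void_list_node_obj;
#define VOID_LIST_NODE (&void_list_node_obj)

/* True if all parameters from P onwards can be omitted.  */
static bool
sufficient_parms (const struct parm *p)
{
  for (; p != NULL && p != VOID_LIST_NODE; p = p->next)
    if (!p->has_default)
      return false;
  return true;
}

/* Can a constructor whose first user parameter is FIRST be called with
   exactly one argument?  */
static bool
accepts_single_argument (const struct parm *first)
{
  if (first == NULL)
    return true;                 /* Varargs, e.g. A(...).  */
  if (first == VOID_LIST_NODE)
    return false;                /* A() takes no arguments.  */
  return sufficient_parms (first->next);
}
```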

PR c++/99599
PR c++/120185

gcc/cp/ChangeLog:

* class.cc (type_has_converting_constructor): Handle null parm.
* pt.cc (fn_type_unification): Skip early non-dep checking if
no concepts.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-nondep6.C: New test.
---
 gcc/cp/class.cc   |  3 +++
 gcc/cp/pt.cc  |  2 +-
 gcc/testsuite/g++.dg/cpp2a/concepts-nondep6.C | 12 
 3 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-nondep6.C

diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index 370bfa35f9e..2764bb52ddd 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -5744,6 +5744,9 @@ type_has_converting_constructor (tree t)
 {
   tree fn = *iter;
   tree parm = FUNCTION_FIRST_USER_PARMTYPE (fn);
+  if (parm == NULL_TREE)
+   /* Varargs.  */
+   return true;
   if (parm == void_list_node
  || !sufficient_parms_p (TREE_CHAIN (parm)))
/* Can't accept a single argument, so won't be considered for
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 0694c28cde3..09f74a2814b 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -23254,7 +23254,7 @@ fn_type_unification (tree fn,
  conversions that we know are not going to induce template instantiation
  (PR99599).  */
   if (strict == DEDUCE_CALL
-  && incomplete
+  && incomplete && flag_concepts
   && check_non_deducible_conversions (parms, args, nargs, fn, strict, flags,
  convs, explain_p,
  /*noninst_only_p=*/true))
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-nondep6.C b/gcc/testsuite/g++.dg/cpp2a/concepts-nondep6.C
new file mode 100644
index 000..7adf6ecfb86
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-nondep6.C
@@ -0,0 +1,12 @@
+// PR c++/120185
+
+struct A {
+  A(...);
+};
+
+template  void f(A, T) { }
+
+int main()
+{
+  f(42, 24);
+}

base-commit: 3ae6b582d629e63e12d0ecfb7cbe44033778f88c
-- 
2.49.0



Re: [PATCH 2/2] c++/modules: Remove unnecessary lazy_load_pendings

2025-05-09 Thread Jason Merrill

On 4/21/25 6:22 AM, Nathaniel Shead wrote:

This call is not necessary, as we don't access the bodies of any classes
that we instantiate here.


This turns out to break

20_util/function_objects/mem_fn/constexpr.cc
std/ranges/view.cc

when modified to use import std (as attached).  For the former, I see


In file included from /home/jason/s/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/stdc++.h:55,
                 from /home/jason/s/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/std.cc:30,
of module std, imported at /home/jason/gt/libstdc++-v3/testsuite/20_util/function_objects/mem_fn/constexpr.cc:21:
/home/jason/s/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/functional: In instantiation of ‘class std::_Mem_fn_base’:
/home/jason/s/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/functional:211:12:   required from ‘struct std::_Mem_fn’
  211 | struct _Mem_fn<_Res _Class::*>
  |^~~
/home/jason/gt/libstdc++-v3/testsuite/20_util/function_objects/mem_fn/constexpr.cc:36:21:   required from here
   36 |   return std::mem_fn(&F::i)(f);
  |  ~~~^~~
/home/jason/s/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/functional:190:23: error: conflicting declaration of template ‘template struct std::_Bind_check_arity’
  190 | friend struct _Bind_check_arity;
  |   ^
/home/jason/s/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/functional:834:12: note: previous declaration ‘template struct std::_Bind_check_arity’
  834 | struct _Bind_check_arity { };
  |^


lookup_imported_hidden_friend is failing without the lazy_load_pendings, 
so we try and fail to push the instantiation.  Reverting this patch 
makes them pass.


Jason

diff --git a/libstdc++-v3/testsuite/20_util/function_objects/mem_fn/constexpr.cc b/libstdc++-v3/testsuite/20_util/function_objects/mem_fn/constexpr.cc
index dd1dcc9947a..59568d630c4 100644
--- a/libstdc++-v3/testsuite/20_util/function_objects/mem_fn/constexpr.cc
+++ b/libstdc++-v3/testsuite/20_util/function_objects/mem_fn/constexpr.cc
@@ -17,7 +17,11 @@
 
 // { dg-do compile { target c++20 } }
 
+#if __cpp_modules
+import std;
+#else
 #include 
+#endif
 
 struct F
 {
diff --git a/libstdc++-v3/testsuite/std/ranges/view.cc b/libstdc++-v3/testsuite/std/ranges/view.cc
index 57cd6503f0c..52ea0e2e0ac 100644
--- a/libstdc++-v3/testsuite/std/ranges/view.cc
+++ b/libstdc++-v3/testsuite/std/ranges/view.cc
@@ -17,6 +17,10 @@
 
 // { dg-do compile { target c++20 } }
 
+#include 
+#if __cpp_modules
+import std;
+#else
 #include 
 #include 
 #include 
@@ -26,13 +30,15 @@
 #include 
 #include 
 #include 
-#include 
+#endif
 
 static_assert(std::ranges::view>);
 static_assert(std::ranges::view>);
 static_assert(std::ranges::view>); // Changed with P2325R3
 static_assert(std::ranges::view);
+#ifndef __cpp_modules
 static_assert(std::ranges::view);
+#endif
 
 static_assert(!std::ranges::view>);
 static_assert(!std::ranges::view>);


Re: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors

2025-05-09 Thread Richard Sandiford
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Biener 
>> Sent: Friday, May 9, 2025 11:08 AM
>> To: Richard Sandiford 
>> Cc: Pengfei Li ; gcc-patches@gcc.gnu.org;
>> ktkac...@nvidia.com
>> Subject: Re: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors
>> 
>> On Fri, 9 May 2025, Richard Sandiford wrote:
>> 
>> > Richard Biener  writes:
>> > > On Thu, 8 May 2025, Pengfei Li wrote:
>> > >
>> > >> This patch improves the auto-vectorization for loops with known small
>> > >> trip counts by enabling the use of subvectors - bit fields of original
>> > >> wider vectors. A subvector must have the same vector element type as the
>> > >> original vector and enough bits for all vector elements to be processed
>> > >> in the loop. Using subvectors is beneficial because machine instructions
>> > >> operating on narrower vectors usually show better performance.
>> > >>
>> > >> To enable this optimization, this patch introduces a new target hook.
>> > >> This hook allows the vectorizer to query the backend for a suitable
>> > >> subvector type given the original vector type and the number of elements
>> > >> to be processed in the small-trip-count loop. The target hook also has a
>> > >> could_trap parameter to say if the subvector is allowed to have more
>> > >> bits than needed.
>> > >>
>> > >> This optimization is currently enabled for AArch64 only. Below example
>> > >> shows how it uses AdvSIMD vectors as subvectors of SVE vectors for
>> > >> higher instruction throughput.
>> > >>
>> > >> Consider this loop operating on an array of 16-bit integers:
>> > >>
>> > >> for (int i = 0; i < 5; i++) {
>> > >>   a[i] = a[i] < 0 ? -a[i] : a[i];
>> > >> }
>> > >>
>> > >> Before this patch, the generated AArch64 code would be:
>> > >>
>> > >> ptrue   p7.h, vl5
>> > >> ptrue   p6.b, all
>> > >> ld1h    z31.h, p7/z, [x0]
>> > >> abs z31.h, p6/m, z31.h
>> > >> st1h    z31.h, p7, [x0]
>> > >
>> > > p6.b has all lanes active - why is the abs then not
>> > > simply unmasked?
>> >
>> > There is no unpredicated abs for SVE.  The predicate has to be there,
>> > and so expand introduces one even when the gimple stmt is unconditional.
>> >
>> > >> After this patch, it is optimized to:
>> > >>
>> > >> ptrue   p7.h, vl5
>> > >> ld1h    z31.h, p7/z, [x0]
>> > >> abs v31.8h, v31.8h
>> > >> st1h    z31.h, p7, [x0]
>> > >
>> > > Help me decipher this - I suppose z31 and v31 "overlap" in the
>> > > register file?  And z31 is a variable-length vector but
>> > > z31.8h is a 8 element fixed length vector?  How can we
>> >
>> > v31.8h, but otherwise yes.
>> >
>> > > end up with just 8 elements here?  From the upper iteration
>> > > bound?
>> >
>> > Yeah.
>> >
>> > > I'm not sure why you need any target hook here.  It seems you
>> > > do already have suitable vector modes so why not just ask
>> > > for a suitable vector?  Is it because you need to have
>> > > that register overlap guarantee (otherwise you'd get
>> > > a move)?
>> >
>> > Yeah, the optimisation only makes sense for overlaid vector registers.
>> >
>> > > Why do we not simply use fixed-length SVE here in the first place?
>> >
>> > Fixed-length SVE is restricted to cases where the exact runtime length
>> > is known: the compile-time length is both a minimum and a maximum.
>> > In contrast, the code above would work even for 256-bit SVE.
>> >
>> > > To me doing this in this way in the vectorizer looks
>> > > somewhat out-of-place.
>> > >
>> > > That said, we already have unmasked ABS in the IL:
>> > >
>> > >   vect__1.6_15 = .MASK_LOAD (&a, 16B, { -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... }, { 0, ... });
>> > >   vect__2.7_16 = ABSU_EXPR ;
>> > >   vect__3.8_17 = VIEW_CONVERT_EXPR> int>(vect__2.7_16);
>> > >   .MASK_STORE (&a, 16B, { -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... }, vect__3.8_17); [tail call]
>> > >
>> > > so what's missing here?  I suppose having a constant masked ABSU here
>> > > would allow RTL expansion to select a fixed-size mode?
>> > >
>> > > And the vectorizer could simply use the existing
>> > > related_vector_mode hook instead?
>> >
>> > I agree it's a bit awkward.  The problem is that we want conflicting
>> > things.  On the one hand, it would make conceptual sense to use SVE
>> > instructions to provide conditional optabs for Advanced SIMD vector modes.
>> > E.g. SVE's LD1W could act as a predicated load for an Advanced SIMD
>> > int32x4_t vector.  The main problem with that is that Advanced SIMD's
>> > native boolean vector type is an integer vector of 0s and -1s, rather
>> > than an SVE predicate.  For some (native Advanced SIMD) operations we'd
>> > want one type of boolean, for some (SVE emulating Advanced SIMD)
>> > operations we'd want the other type of boolean.
>> >
>> > The patch goes the other way and treats using Advanced SIMD as an
>> > opt

[PATCH v3 1/2] RISC-V: Support RISC-V Profiles 20/22.

2025-05-09 Thread Jiawei
This patch introduces support for RISC-V Profiles RV20 and RV22 [1],
enabling developers to utilize these profiles through the -march option.

[1] https://github.com/riscv/riscv-profiles/releases/tag/v1.0

Version log:
Use lowercase letters to represent Profiles.
Use '_' as the separator between Profiles and other RISC-V extensions.
Add descriptions in invoke.texi.
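As a rough illustration of the expansion that parse_profiles performs, here is a standalone C sketch using two entries copied from the patch's riscv_profiles_table; struct profile and expand_profile () are hypothetical names. Unlike the patch, which matches the profile with strstr and locates the first '_' with strchr, the sketch requires the profile name to be followed by '_' or the end of the string, and it leaves freeing the result to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct profile
{
  const char *name;   /* Profile name accepted in -march.  */
  const char *arch;   /* Equivalent ISA string.  */
};

/* Two entries copied from riscv_profiles_table in the patch.  */
static const struct profile profiles[] =
{
  {"rvi20u64", "rv64i"},
  {"rva20u64", "rv64imafdc_zicsr_zicntr_ziccif_ziccrse_ziccamoa"
               "_zicclsm_za128rs"},
  {NULL, NULL}
};

/* Return a malloc'd -march string with a leading profile expanded, or
   NULL if MARCH does not start with a known profile (or allocation
   fails).  The caller frees the result.  */
static char *
expand_profile (const char *march)
{
  assert (march != NULL);

  for (const struct profile *p = profiles; p->name != NULL; ++p)
    {
      size_t n = strlen (p->name);

      /* The profile must start the string and be followed by '_' or NUL.  */
      if (strncmp (march, p->name, n) != 0
          || (march[n] != '\0' && march[n] != '_'))
        continue;

      const char *rest = march + n;   /* Either "" or "_ext...".  */
      char *result = malloc (strlen (p->arch) + strlen (rest) + 1);
      if (result == NULL)
        return NULL;
      strcpy (result, p->arch);
      strcat (result, rest);          /* Keeps the '_' separator.  */
      return result;
    }

  return NULL;   /* No profile: the caller parses MARCH unchanged.  */
}
```

For example, "rva20u64_zba" expands to the RVA20U64 mandatory extensions followed by "_zba", while a plain string such as "rv64gc" yields NULL.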

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (struct riscv_profiles): New struct.
(riscv_subset_list::parse_profiles): New parser.
(riscv_subset_list::parse_base_ext): Ditto.
* config/riscv/riscv-subset.h: New def.
* doc/invoke.texi: New option descriptions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-49.c: New test.
* gcc.target/riscv/arch-50.c: New test.
* gcc.target/riscv/arch-51.c: New test.

---
 gcc/common/config/riscv/riscv-common.cc  | 75 +++-
 gcc/config/riscv/riscv-subset.h  |  2 +
 gcc/doc/invoke.texi  | 17 --
 gcc/testsuite/gcc.target/riscv/arch-49.c |  5 ++
 gcc/testsuite/gcc.target/riscv/arch-50.c | 12 
 gcc/testsuite/gcc.target/riscv/arch-51.c | 12 
 6 files changed, 116 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-49.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-50.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-51.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index ca14eb96b..fcf694dba 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -274,6 +274,12 @@ struct riscv_ext_version
   int minor_version;
 };
 
+struct riscv_profiles
+{
+  const char *profile_name;
+  const char *profile_string;
+};
+
 /* All standard extensions defined in all supported ISA spec.  */
 static const struct riscv_ext_version riscv_ext_version_table[] =
 {
@@ -502,6 +508,31 @@ static const struct riscv_ext_version riscv_combine_info[] =
   {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
 };
 
+/* This table records the mapping from RISC-V Profiles into march string.  */
+static const riscv_profiles riscv_profiles_table[] =
+{
+  /* RVI20U only contains the base extension 'i' as mandatory extension.  */
+  {"rvi20u64", "rv64i"},
+  {"rvi20u32", "rv32i"},
+
+  /* RVA20U contains the 'i,m,a,f,d,c,zicsr,zicntr,ziccif,ziccrse,ziccamoa,
+ zicclsm,za128rs' as mandatory extensions.  */
+  {"rva20u64", "rv64imafdc_zicsr_zicntr_ziccif_ziccrse_ziccamoa"
+   "_zicclsm_za128rs"},
+
+  /* RVA22U contains the 'i,m,a,f,d,c,zicsr,zihintpause,zba,zbb,zbs,zicntr,
+ zihpm,ziccif,ziccrse,ziccamoa, zicclsm,zic64b,za64rs,zicbom,zicbop,zicboz,
+ zfhmin,zkt' as mandatory extensions.  */
+  {"rva22u64", "rv64imafdc_zicsr_zicntr_ziccif_ziccrse_ziccamoa"
+   "_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop"
+   "_zicboz_zfhmin_zkt"},
+
+  /* Currently we do not define S/M mode Profiles in gcc part.  */
+
+  /* Terminate the list.  */
+  {NULL, NULL}
+};
+
 static const riscv_cpu_info riscv_cpu_tables[] =
 {
 #define RISCV_CORE(CORE_NAME, ARCH, TUNE) \
@@ -1109,6 +1140,46 @@ riscv_subset_list::parsing_subset_version (const char *ext,
   return p;
 }
 
+/* Parsing RISC-V Profiles in -march string.
+   Return string with mandatory extensions of Profiles.  */
+const char *
+riscv_subset_list::parse_profiles (const char *p)
+{
+  /* Checking if input string contains a Profiles.
+     There are two cases use Profiles in -march option:
+
+      1. Only use Profiles in '-march' as input
+      2. Mixed Profiles with other extensions
+
+     Use '_' to split Profiles and other extension.  */
+  for (int i = 0; riscv_profiles_table[i].profile_name != NULL; ++i)
+    {
+      const char *match = strstr (p, riscv_profiles_table[i].profile_name);
+      const char *plus_ext = strchr (p, '_');
+      /* Find profile at the begin.  */
+      if (match != NULL && match == p)
+        {
+          /* If there's no '_' sign, return the profile_string directly.  */
+          if (!plus_ext)
+            return riscv_profiles_table[i].profile_string;
+          /* If there's a '_' sign, need to add profiles with other ext.  */
+          else
+            {
+              size_t arch_len = strlen (riscv_profiles_table[i].profile_string)
+                                + strlen (plus_ext);
+              /* Reset the input string with Profiles mandatory extensions,
+                 end with '_' to connect other additional extensions.  */
+              char *result = new char[arch_len + 2];
+              strcpy (result, riscv_profiles_table[i].profile_string);
+              strcat (result, "_");
+              strcat (result, plus_ext + 1); /* skip the '_'.  */
+              return result;
+            }
+        }
+    }
+  return p;
+}
+
 /* Parsing function for base extensions, rv[32|64][i|e|g]
 
Return Value:
@@ -1123,6 +1194,8 @@ riscv_subset_list::parse_base_ext (const char *p)
   unsigned minor_version = 0;
   bool explicit_versio

[PATCH] Fix wrong optimization of complex boolean expression

2025-05-09 Thread Eric Botcazou
Hi,

this is a regression introduced on the mainline, 15 and 14 branches by:
  https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639303.html
although one may consider that the problem was latent before.

The VRP2 pass turns:

  # prephitmp_3 = PHI <0(4)>
  _1 = prephitmp_3 == 0;
  _5 = stretch_14(D) ^ 1;
  _39 = _1 & _5;
  _40 = _39 | last_20(D);

into

  _5 = stretch_14(D) ^ 1;
  _42 = ~stretch_14(D);
  _39 = _42;
  _40 = last_20(D) | _39;

using the following step:

Folding statement: _1 = prephitmp_3 == 0;
Queued stmt for removal.  Folds to: 1
Folding statement: _5 = stretch_14(D) ^ 1;
Not folded
Folding statement: _39 = _1 & _5;
gimple_simplified to _42 = ~stretch_14(D);
_39 = _42 & 1;
Folded into: _39 = _42;

Folding statement: _40 = _39 | last_20(D);
Folded into: _40 = last_20(D) | _39;

but stretch_14 is an 8-bit boolean so the two forms are not equivalent, that is
to say dropping the "& 1" is wrong.  It's another instance of the issue at:
  https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558537.html

"The problem is the bitwise/logical dichotomy for operators and the transition 
from the former to the latter for boolean types: if they are 1-bit, that's 
straightforward but, if they are larger, then you need to be careful because 
you cannot, on the one hand, turn a bitwise AND into a logical AND and, on the 
other hand, *not* turn e.g. a bitwise NOT into a logical NOT if they occur in 
the same computation, as the first change will drop the masking that may need 
to be applied after the bitwise NOT if it is not also changed."

Here it's the reverse case: the bitwise NOT (~) is treated as logical by the 
machinery in range-op.cc but the bitwise OR (|) is *not* treated as logical by 
that of vr-values.cc, leading to the same problematic outcome.
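To make the 8-bit boolean problem concrete, the C sketch below (an illustration, not code from the patch) models stretch_14 and last_20 as uint8_t values restricted to 0 and 1, and evaluates both the original form and the form left after VRP2; bool8, original_form and vrp2_form are hypothetical names. With stretch = 1 and last = 0, the original computes 0, while ~stretch without the "& 1" yields 0xFE, which is nonzero and not a valid boolean value.

```c
#include <assert.h>
#include <stdint.h>

/* Model of an 8-bit boolean such as stretch_14: only 0 and 1 are valid
   values, but the bitwise operators act on all 8 bits.  */
typedef uint8_t bool8;
_Static_assert (sizeof (bool8) == 1, "bool8 must be 8 bits wide");

/* Original semantics: _39 = ~stretch & 1;  _40 = _39 | last;  */
static bool8
original_form (bool8 stretch, bool8 last)
{
  bool8 not_stretch = (bool8) ~stretch;   /* 1 -> 0xFE, 0 -> 0xFF.  */
  return (bool8) ((not_stretch & 1) | last);
}

/* After VRP2 dropped the mask: _39 = ~stretch;  _40 = last | _39;  */
static bool8
vrp2_form (bool8 stretch, bool8 last)
{
  bool8 not_stretch = (bool8) ~stretch;
  return (bool8) (last | not_stretch);
}
```

This is why the patch only lets simplify_bit_ops_using_ranges handle boolean types whose precision is 1.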

Tested on x86-64/Linux, OK for the mainline, 15 and 14 branches?


2025-05-09  Eric Botcazou  

* vr-values.cc (simplify_using_ranges::simplify) :
Do not call simplify_bit_ops_using_ranges for boolean types whose
precision is not 1.


2025-05-09  Eric Botcazou  

* gnat.dg/opt106.adb: New test.
* gnat.dg/opt106_pkg1.ads, gnat.dg/opt106_pkg1.adb: New helper.
* gnat.dg/opt106_pkg2.ads, gnat.dg/opt106_pkg2.adb: Likewise.

-- 
Eric Botcazou

diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
index 6603d90c392..4c787593b95 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -1996,10 +1996,13 @@ simplify_using_ranges::simplify (gimple_stmt_iterator *gsi)
 
 	case BIT_AND_EXPR:
 	case BIT_IOR_EXPR:
-	  /* Optimize away BIT_AND_EXPR and BIT_IOR_EXPR
-	 if all the bits being cleared are already cleared or
-	 all the bits being set are already set.  */
-	  if (INTEGRAL_TYPE_P (TREE_TYPE (rhs1)))
+	  /* Optimize away BIT_AND_EXPR and BIT_IOR_EXPR if all the bits
+	 being cleared are already cleared or all the bits being set
+	 are already set.  Beware that boolean types must be handled
+	 logically (see range-op.cc) unless they have precision 1.  */
+	  if (INTEGRAL_TYPE_P (TREE_TYPE (rhs1))
+	  && (TREE_CODE (TREE_TYPE (rhs1)) != BOOLEAN_TYPE
+		  || TYPE_PRECISION (TREE_TYPE (rhs1)) == 1))
 	return simplify_bit_ops_using_ranges (gsi, stmt);
 	  break;
 
-- { dg-do run }
-- { dg-options "-O2" }

with Opt106_Pkg1; use Opt106_Pkg1;

procedure Opt106 is
  Obj : T := (False, 0, 0, 0, True);

begin
  Proc (Obj, 0, False, True);
end;

with Opt106_Pkg2; use Opt106_Pkg2;

package body Opt106_Pkg1 is

  procedure Proc (Obj     : in out T;
                  Data    : Integer;
                  Last    : Boolean;
                  Stretch : Boolean) is

  begin
    if Stretch and then (Obj.Delayed /= 0 or else not Obj.Attach_Last) then
      raise Program_Error;
    end if;

    if Obj.Delayed /= 0 then
      Stop (Obj.Delayed, Obj.Before, Data, False);
    end if;

    if Last or (Obj.Delayed = 0 and not Stretch) then
      Stop (Data, Obj.Before, 0, Last);

      if Last then
        Obj.Initialized := False;
      else
        Obj.Next := 0;
        Obj.Before := Data;
      end if;

    else
      if Stretch then
        Obj.Next := 1;
      else
        Obj.Before := Obj.Delayed;
      end if;
      Obj.Delayed := Data;
    end if;
  end;

end Opt106_Pkg1;

package body Opt106_Pkg2 is

  procedure Stop (Delayed : Integer;
                  Before  : Integer;
                  After   : Integer;
                  Last    : Boolean) is
  begin
    raise Program_Error;
  end;

end Opt106_Pkg2;

package Opt106_Pkg1 is

  type T is record
    Initialized : Boolean;
    Before      : Integer;
    Delayed     : Integer;
    Next        : Integer;
    Attach_Last : Boolean;
  end record;

  procedure Proc (Obj     : in out T;
                  Data    : Integer;
                  Last    : Boolean;
                  Stretch : Boolean);

end Opt106_Pkg1;

package Opt106_Pkg2 is

  procedure Stop (Delayed : Integer;
                  Before  : Integer;
                  After   : Integer;

Re: [PATCH] match: Remove (ne (cmp) 0) and (eq (cmp) 1) patterns

2025-05-09 Thread Richard Biener
On Fri, May 9, 2025 at 1:34 PM Andrew Pinski  wrote:
>
> On Fri, May 9, 2025 at 1:21 AM Richard Biener
>  wrote:
> >
> > On Fri, May 9, 2025 at 4:51 AM Andrew Pinski wrote:
> > >
> > > These patterns are not needed any more. There were already
> > > 2 patterns which did `(ne bool_var 0)` into `bool_var` and
> > > `(eq bool_var 1)` into `bool_var`. Just they were after the
> > > pattern that did `(cmp (cond @0 @1 @2) @3)` simplification but
> > > that pattern is now after the ones.
> > > Also these patterns will cause in some cases a new statement to
> > > be created for the comparison. In the case of floating point comparison
> > > with non-call exceptions (and trapping math), can cause a new statement
> > > every time fold_stmt is called.
> >
> > Hmm, but do we still fold
> >
> >   _1 = _2 < 1;
> >   if (_1 != 0)
> >
> > to
> >
> >   if (_2 < 1)
> >
> > or does that now again rely on forwprops explicit forwarding into
> > gcond?  I wanted to get rid of the latter eventually.
>
> Oh. Yes this does rely on forwprop explicitly now.
>
> >
> > I agree that the trapping math thing is bad - I wonder if we can catch that more
> > intelligently (not sure how without following SSA use-def of gconds on bools
> > and see whether they can trap and then not simplifying)
>
> I think I know the way to fix the trapping issue without fully
> removing this. I am going to give it a go later today.
> Since trapping only depends on the code and the type it should be easy
> to add an extra condition here and the latter patterns catch the
> trapping case of removing `bool!=0` already.

Note it's really depending on context.

_1 = _2 < 1.;
_3 = _1 != 0;

would be OK to fold to

_3 = _2 < 1;

but not with the _1 != 0 in the gcond.  That's because gconds can't
throw (and I think rightfully so).  In principle we should go full steam
ahead to have single-operand gconds, just the boolean value.  Like we now
do for COND_EXPRs.  But this unfortunately has very large fallout :/

Thus the "workaround" for non-call-EH.  I believe any mitigation should be in
the match-and-simplify plumbing that handles the gcond - which we already do,
but the side-effect is the ping-pong you are observing.  Maybe we can do
better in replace_stmt_with_simplification where we should hit(?)

  else if (!inplace)
{
  tree res = maybe_push_res_to_seq (res_op, seq);
  if (!res)
return false;
  gimple_cond_set_condition (cond_stmt, NE_EXPR, res,
 build_zero_cst (TREE_TYPE (res)));

and detect when the cond_stmt is SSA != 0 (or the reverse canonical form)
and refuse to simplify if the simplification in 'res_op' is the same as the
current definition of SSA?

>
> Thanks,
> Andrew Pinski
>
> >
> > > gcc.dg/tree-ssa/vrp24.c needed to be adjusted to before r13-322-g7f04b0d786e13f.
> > > gcc.dg/analyzer/null-deref-pr102671-2.c needs an increased analyzer-max-svalue-depth
> > > not to get an extra warning.
> > >
> > > gcc/ChangeLog:
> > >
> > > * match.pd (`(ne (cmp) 0)`, `(eq (cmp) 1)`): Remove.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.dg/tree-ssa/vrp24.c: Adjust.
> > > * gcc.dg/analyzer/null-deref-pr102671-2.c: Increase 
> > > analyzer-max-svalue-depth.
> > >
> > > Signed-off-by: Andrew Pinski 
> > > ---
> > >  gcc/match.pd  | 8 
> > >  gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c | 2 +-
> > >  gcc/testsuite/gcc.dg/tree-ssa/vrp24.c | 2 +-
> > >  3 files changed, 2 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index ab496d923cc..418efc4230a 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -6898,14 +6898,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >  (if (ic == ncmp)
> > >   (ncmp @0 @1)
> > >   /* The following bits are handled by fold_binary_op_with_conditional_arg.  */
> > > - (simplify
> > > -  (ne (cmp@2 @0 @1) integer_zerop)
> > > -  (if (types_match (type, TREE_TYPE (@2)))
> > > -   (cmp @0 @1)))
> > > - (simplify
> > > -  (eq (cmp@2 @0 @1) integer_truep)
> > > -  (if (types_match (type, TREE_TYPE (@2)))
> > > -   (cmp @0 @1)))
> > >   (simplify
> > >(ne (cmp@2 @0 @1) integer_truep)
> > >(if (types_match (type, TREE_TYPE (@2)))
> > > diff --git a/gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c b/gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c
> > > index 298e4839b98..bc141d5c028 100644
> > > --- a/gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c
> > > +++ b/gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c
> > > @@ -1,5 +1,5 @@
> > >  /* { dg-require-effective-target ptr_eq_long } */
> > > -/* { dg-additional-options "-O2 -Wno-shift-count-overflow" } */
> > > +/* { dg-additional-options "-O2 -Wno-shift-count-overflow --param=analyzer-max-svalue-depth=19" } */
> > >
> > >  struct lisp;
> > >  union vectorlike_header { long 

Re: [PATCH RFC] libstdc++: run testsuite with -Wabi

2025-05-09 Thread Jonathan Wakely
On Fri, 9 May 2025 at 11:19, Jonathan Wakely  wrote:
>
> On Thu, 8 May 2025 at 20:56, Jason Merrill  wrote:
> >
> > Tested x86_64-pc-linux-gnu.  Does this make sense for trunk?
>
> Yes, it looks useful. I'm going to test it with my "every -std and -m32
> and old-string ABI" test settings to be sure it doesn't cause any
> problems.

There are a few failures when using GLIBCXX_TESTSUITE_STDS=20 to run
tests as C++20 or later:

FAIL: experimental/net/internet/resolver/ops/lookup.cc  -std=gnu++23 (test for excess errors)
Excess errors:
/tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/experimental/internet:2100: warning: offset of 'std::experimental::net::v1::ip::basic_resolver::_M_ctx' for '-std=c++20' and up changes in '-fabi-version=21' (GCC 16) [-Wabi]

FAIL: experimental/optional/requirements.cc  -std=gnu++20 (test for excess errors)
Excess errors:
/home/test/src/gcc/libstdc++-v3/testsuite/experimental/optional/requirements.cc:80: warning: offset of 'no_copy_assignment::__as_base ' base class for '-std=c++20' and up changes in '-fabi-version=21' (GCC 16) [-Wabi]
/home/test/src/gcc/libstdc++-v3/testsuite/experimental/optional/requirements.cc:81: warning: offset of 'no_move_assignment::__as_base ' base class for '-std=c++20' and up changes in '-fabi-version=21' (GCC 16) [-Wabi]

They're all in  headers, and nothing that's exported
from the shared lib.

I'll look into them.



Re: [PATCH] libstdc++: Suppress GDB output from new 'skip' commands [PR118260]

2025-05-09 Thread Tomasz Kaminski
On Fri, May 9, 2025 at 1:32 PM Jonathan Wakely  wrote:

> I added some gdb.execute('skip -rfu ...') commands to the Python hook
> loaded with libstdc++.so but this makes GDB print output like:
>
> Function(s) ^std::(move|forward|as_const|(__)?addressof) will be skipped when stepping.
>
> These probably aren't interesting to users, so this change suppresses
> that output by capturing the output into the gdb.execute return value
> (which is then ignored). An exception is thrown if the gdb.execute
> command fails, so this doesn't suppress any errors which might be
> meaningful to users or libstdc++ developers.
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/118260
> * python/hook.in: Suppress output from gdb.execute calls to
> register skips.
> ---
>
> Tested manually with GDB.
>
Looks good, and I think skipping move/forward/as_const/addressof is OK.
I am less sold on preexisting skips for functions like begin, empty, front,
but that is not relevant to this change.

>
>  libstdc++-v3/python/hook.in | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/libstdc++-v3/python/hook.in b/libstdc++-v3/python/hook.in
> index d63909d2af4c..74a097cd0a00 100644
> --- a/libstdc++-v3/python/hook.in
> +++ b/libstdc++-v3/python/hook.in
> @@ -55,10 +55,14 @@ if gdb.current_objfile () is not None:
>  if not dir_ in sys.path:
>  sys.path.insert(0, dir_)
>
> -gdb.execute('skip -rfu ^std::(move|forward|as_const|(__)?addressof)')
> -gdb.execute('skip -rfu ^std::(shared|unique)_ptr<.*>::(get|operator)')
> -gdb.execute('skip -rfu ^std::(basic_string|vector|array|deque|(forward_)?list|(unordered_|flat_)?(multi)?(map|set)|span)<.*>::(c?r?(begin|end)|front|back|data|size|empty)')
> -gdb.execute('skip -rfu ^std::(basic_string|vector|array|deque|span)<.*>::operator.]')
> +gdb.execute('skip -rfu ^std::(move|forward|as_const|(__)?addressof)',
> +to_string=True)
> +gdb.execute('skip -rfu ^std::(shared|unique)_ptr<.*>::(get|operator)',
> +to_string=True)
> +gdb.execute('skip -rfu ^std::(basic_string|vector|array|deque|(forward_)?list|(unordered_|flat_)?(multi)?(map|set)|span)<.*>::(c?r?(begin|end)|front|back|data|size|empty)',
> +to_string=True)
> +gdb.execute('skip -rfu ^std::(basic_string|vector|array|deque|span)<.*>::operator.]',
> +to_string=True)
>
>  # Call a function as a plain import would not execute body of the included file
>  # on repeated reloads of this object file.
> --
> 2.49.0
>
>


Re: [PATCH v3] i386/cygming: Decrease default preferred stack boundary for 32-bit targets

2025-05-09 Thread LIU Hao

On 2025-5-3 20:52, LIU Hao wrote:

On 2025-5-2 01:25, LIU Hao wrote:
Remove `STACK_REALIGN_DEFAULT` for this target, because now the default value of 
`incoming_stack_boundary` equals `MIN_STACK_BOUNDARY` and it doesn't have an effect any more.

I suddenly realized the previous patch was for the GCC 15 branch. Here's a new one, 
rebased on master.

Ping.

Also, I think I need some comments on the `force_align_arg_pointer` hunk. There's really no good reason 
for the change, except that 'we had better let the attribute do something.'


--
Best regards,
LIU Hao




[PATCH v3 2/2] RISC-V: Support RISC-V Profiles 23.

2025-05-09 Thread Jiawei
This patch introduces support for the RISC-V profiles RVA23 and RVB23 [1],
enabling developers to use these profiles through the -march option.

[1] https://github.com/riscv/riscv-profiles/releases/tag/rva23-rvb23-v0.7-ratification-vote

Version log:
Update the testcases to use lowercase letters.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: New profile.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-52.c: New test.
* gcc.target/riscv/arch-53.c: New test.

---
 gcc/common/config/riscv/riscv-common.cc  | 16 
 gcc/testsuite/gcc.target/riscv/arch-52.c | 11 +++
 gcc/testsuite/gcc.target/riscv/arch-53.c | 10 ++
 3 files changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-52.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-53.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index fcf694dba..4c8e9244f 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -527,6 +527,22 @@ static const riscv_profiles riscv_profiles_table[] =
"_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop"
"_zicboz_zfhmin_zkt"},
 
+  /* RVA23 contains all mandatory base ISA for RVA22U64 and the new extension
+ 'v,zihintntl,zvfhmin,zvbb,zvkt,zicond,zimop,zcmop,zfa,zawrs' as mandatory
+ extensions.  */
+  {"rva23u64", "rv64imafdcv_zicsr_zicntr_zihpm_ziccif_ziccrse_ziccamoa"
+   "_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop"
+   "_zicboz_zfhmin_zkt_zvfhmin_zvbb_zvkt_zihintntl_zicond_zimop_zcmop_zcb"
+   "_zfa_zawrs"},
+
+  /* RVB23 contains all mandatory base ISA for RVA22U64 and the new extension
+ 'zihintntl,zicond,zimop,zcmop,zfa,zawrs' as mandatory
+ extensions.  */
+  {"rvb23u64", "rv64imafdc_zicsr_zicntr_zihpm_ziccif_ziccrse_ziccamoa"
+   "_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop"
+   "_zicboz_zfhmin_zkt_zihintntl_zicond_zimop_zcmop_zcb"
+   "_zfa_zawrs"},
+
   /* Currently we do not define S/M mode Profiles in gcc part.  */
 
   /* Terminate the list.  */
diff --git a/gcc/testsuite/gcc.target/riscv/arch-52.c b/gcc/testsuite/gcc.target/riscv/arch-52.c
new file mode 100644
index 0..8210978ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-52.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rva23u64 -mabi=lp64d" } */
+
+void foo(){}
+
+/* { dg-final { scan-assembler ".attribute arch, 
\"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0"
+"_b1p0_v1p0_zic64b1p0_zicbom1p0_zicbop1p0_zicboz1p0_ziccamoa1p0_ziccif1p0_zicclsm1p0"
+_ziccrse1p0_zicntr2p0_zicond1p0_zicsr2p0_zihintntl1p0_zihintpause2p0_zihpm2p0_zimop1p0"
+_za64rs1p0_zaamo1p0_zalrsc1p0_zawrs1p0_zfa1p0_zfhmin1p0_zca1p0_zcb1p0_zcd1p0_zcmop1p0"
+_zba1p0_zbb1p0_zbs1p0_zkt1p0_zvbb1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0"
+_zvfhmin1p0_zvkb1p0_zvkt1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0\"" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/arch-53.c b/gcc/testsuite/gcc.target/riscv/arch-53.c
new file mode 100644
index 0..6d242dfba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-53.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rvb23u64 -mabi=lp64d" } */
+
+void foo(){}
+
+/* { dg-final { scan-assembler ".attribute arch, 
\"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0"
+"_b1p0_zic64b1p0_zicbom1p0_zicbop1p0_zicboz1p0_ziccamoa1p0_ziccif1p0_zicclsm1p0_ziccrse1p0"
+"_zicntr2p0_zicond1p0_zicsr2p0_zihintntl1p0_zihintpause2p0_zihpm2p0_zimop1p0_za64rs1p0"
+"_zaamo1p0_zalrsc1p0_zawrs1p0_zfa1p0_zfhmin1p0_zca1p0_zcb1p0_zcd1p0_zcmop1p0_zba1p0"
+"_zbb1p0_zbs1p0_zkt1p0\"" } } */
-- 
2.43.0
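As a quick cross-check of the two comments in the riscv-common.cc hunk, the new profile strings can be compared component by component. The sketch below copies both strings verbatim from the patch; the helper names (next_component, has_component, count_missing) are hypothetical, and it treats the base ISA (`rv64imafdcv` vs `rv64imafdc`) as one opaque component rather than expanding the single-letter extensions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Profile strings copied verbatim from the riscv-common.cc hunk.  */
static const char rva23u64[] =
  "rv64imafdcv_zicsr_zicntr_zihpm_ziccif_ziccrse_ziccamoa"
  "_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop"
  "_zicboz_zfhmin_zkt_zvfhmin_zvbb_zvkt_zihintntl_zicond_zimop_zcmop_zcb"
  "_zfa_zawrs";
static const char rvb23u64[] =
  "rv64imafdc_zicsr_zicntr_zihpm_ziccif_ziccrse_ziccamoa"
  "_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop"
  "_zicboz_zfhmin_zkt_zihintntl_zicond_zimop_zcmop_zcb"
  "_zfa_zawrs";

/* Copy the next '_'-separated component of *P into OUT and advance *P
   past it.  Returns 0 once the string is exhausted.  */
static int
next_component (const char **p, char *out, size_t outsz)
{
  if (**p == '\0')
    return 0;
  const char *end = strchr (*p, '_');
  size_t n = end ? (size_t) (end - *p) : strlen (*p);
  assert (n < outsz);
  memcpy (out, *p, n);
  out[n] = '\0';
  *p = end ? end + 1 : *p + n;
  return 1;
}

/* Return nonzero if TOKEN is one of the components of ARCH.  */
static int
has_component (const char *arch, const char *token)
{
  char buf[64];
  while (next_component (&arch, buf, sizeof buf))
    if (strcmp (buf, token) == 0)
      return 1;
  return 0;
}

/* Print each component of A that does not appear in B; return how many.  */
static int
count_missing (const char *a, const char *b)
{
  char buf[64];
  int missing = 0;
  while (next_component (&a, buf, sizeof buf))
    if (!has_component (b, buf))
      {
        printf ("%s\n", buf);
        missing++;
      }
  return missing;
}
```

On these strings, count_missing (rva23u64, rvb23u64) reports rv64imafdcv, zvfhmin, zvbb and zvkt, which matches the vector-only difference (v, zvfhmin, zvbb, zvkt) between the RVA23 and RVB23 comments; the reverse direction reports only the shorter base string rv64imafdc.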



Re: [PATCH] Printf properly on systems without %zu [PR120086]

2025-05-09 Thread Jørgen Kvalsvik

On 5/6/25 13:36, Jakub Jelinek wrote:

On Tue, May 06, 2025 at 01:28:16PM +0200, Jørgen Kvalsvik wrote:

Mostly because it would make the print more noisy, and because by the time
we have 4 billion prime paths, all systems would probably already have been
crushed under the load of computing them.

I'm happy to change to fmt_size_t everywhere, of course, but the use of
size_t for pathno was my own automatic default type.


Ok either way.

Jakub



Pushed.

Thanks,
Jørgen
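The usual workaround on C libraries without `%zu` is to cast the value to an unsigned type known to be at least as wide as `size_t` and use that type's length modifier. A minimal sketch of the idea follows; `my_fmt_size_t`, `MY_PRI_SIZE` and `format_size` are hypothetical names for illustration, not GCC's actual `fmt_size_t` definitions. It assumes `<stdint.h>` provides `SIZE_MAX` and that the C library accepts the selected `l`/`ll` modifier.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names: an unsigned type at least as wide as size_t,
   plus the printf conversion that matches it.  */
#if SIZE_MAX <= UINT_MAX
typedef unsigned int my_fmt_size_t;
# define MY_PRI_SIZE "u"
#elif SIZE_MAX <= ULONG_MAX
typedef unsigned long my_fmt_size_t;
# define MY_PRI_SIZE "lu"
#else
typedef unsigned long long my_fmt_size_t;
# define MY_PRI_SIZE "llu"
#endif

/* Format VALUE into BUF without using the C99 "%zu" conversion.
   Returns the snprintf result (the length the full output needs).  */
static int
format_size (char *buf, size_t bufsize, size_t value)
{
  assert (buf != NULL && bufsize > 0);
  return snprintf (buf, bufsize, "%" MY_PRI_SIZE, (my_fmt_size_t) value);
}
```

At a call site the same pattern reads as printf ("%" MY_PRI_SIZE "\n", (my_fmt_size_t) n). Selecting the type with SIZE_MAX rather than always casting to unsigned long matters on LLP64 targets such as 64-bit Windows, where size_t is wider than long.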


Re: [PATCH] rtl-optimization/120182 - wrong-code with RTL DSE and constant addresses

2025-05-09 Thread Richard Biener
On Fri, 9 May 2025, Jakub Jelinek wrote:

> On Fri, May 09, 2025 at 09:34:14AM +0200, Richard Biener wrote:
> > > Perhaps better CONST_SCALAR_INT_P instead of CONST_INT_P?
> > 
> > Do we ever get a wide_int for Pmode/ptr_mode?  But sure, I can
> 
> Most likely not.  Only if we start supporting > 64-bit pointers.

Both variants passed bootstrap and regtest, which one should I push?

Richard.


[committed] libgomp.{c, fortran}/interop-{hip, cuda}: Fix dg-run target selection

2025-05-09 Thread Tobias Burnus
Committed as r16-497-g94e63410474a36.

It turned out that the interop HIP/CUDA testcases could FAIL in the
following less common case: the CUDA or HIP runtime is installed, but the
default device is not an Nvidia or AMD device, respectively.

With the commit, the 'dg-do run' is downgraded to a 'dg-do link' if the
default device is not the expected GPU type. Note that the cuBLAS/hipBLAS
tests do not need this, as they already have a runtime test for the GPU
type and iterate over all devices (including the host), using a fallback
if the vendor lib is not supported for a device.

Thanks to Sandra for reporting the issue!

Tobias
commit 94e63410474a36655e1800387eabd73a6f930048
Author: Tobias Burnus 
Date:   Fri May 9 10:57:44 2025 +0200

libgomp.{c,fortran}/interop-{hip,cuda}: Fix dg-run target selection

While the tests checked whether the CUDA/HIP runtime is available
before processing them, the execution was then done unconditionally,
leading to FAIL when the default device was the host (or the wrong
offload device).

Now the test is only executed ('run') when the default device is an
Nvidia or AMD GPU (depending on the test case, cf. the test file name).
Otherwise, only a 'link' test is done. (Except when the effective-target
check cannot find the runtime lib - then the test is skipped [as before].)

Note: The cublas/hipblas tests use variant functions and iterate over
all devices, such that cublas or hipblas, respectively, is only
called when the active device is an Nvidia or AMD device, respectively,
while for the host and other device types the fallback is called.

libgomp/ChangeLog:

* testsuite/libgomp.c/interop-cuda-full.c: Use 'link' instead
of 'run' when the default device is "! offload_device_nvptx".
* testsuite/libgomp.c/interop-cuda-libonly.c: Likewise.
* testsuite/libgomp.c/interop-hip-nvidia-full.c: Likewise.
* testsuite/libgomp.c/interop-hip-nvidia-no-headers.c: Likewise.
* testsuite/libgomp.c/interop-hip-nvidia-no-hip-header.c: Likewise.
* testsuite/libgomp.fortran/interop-hip-nvidia-full.F90: Likewise.
* testsuite/libgomp.fortran/interop-hip-nvidia-no-module.F90: Likewise.
* testsuite/libgomp.c/interop-hip-amd-full.c: Use 'link' instead
of 'run' when the default device is "! offload_device_gcn".
* testsuite/libgomp.c/interop-hip-amd-no-hip-header.c: Likewise.
* testsuite/libgomp.fortran/interop-hip-amd-full.F90: Likewise.
* testsuite/libgomp.fortran/interop-hip-amd-no-module.F90: Likewise.
---
 libgomp/testsuite/libgomp.c/interop-cuda-full.c| 3 +++
 libgomp/testsuite/libgomp.c/interop-cuda-libonly.c | 3 +++
 libgomp/testsuite/libgomp.c/interop-hip-amd-full.c | 3 +++
 libgomp/testsuite/libgomp.c/interop-hip-amd-no-hip-header.c| 3 +++
 libgomp/testsuite/libgomp.c/interop-hip-nvidia-full.c  | 3 +++
 libgomp/testsuite/libgomp.c/interop-hip-nvidia-no-headers.c| 3 +++
 libgomp/testsuite/libgomp.c/interop-hip-nvidia-no-hip-header.c | 3 +++
 libgomp/testsuite/libgomp.fortran/interop-hip-amd-full.F90 | 3 +++
 libgomp/testsuite/libgomp.fortran/interop-hip-amd-no-module.F90| 3 +++
 libgomp/testsuite/libgomp.fortran/interop-hip-nvidia-full.F90  | 3 +++
 libgomp/testsuite/libgomp.fortran/interop-hip-nvidia-no-module.F90 | 3 +++
 11 files changed, 33 insertions(+)

diff --git a/libgomp/testsuite/libgomp.c/interop-cuda-full.c b/libgomp/testsuite/libgomp.c/interop-cuda-full.c
index 38aa6b130bb..c48a934978d 100644
--- a/libgomp/testsuite/libgomp.c/interop-cuda-full.c
+++ b/libgomp/testsuite/libgomp.c/interop-cuda-full.c
@@ -1,3 +1,6 @@
+/* { dg-do run { target { offload_device_nvptx } } } */
+/* { dg-do link { target { ! offload_device_nvptx } } } */
+
 /* { dg-require-effective-target openacc_cuda } */
 /* { dg-require-effective-target openacc_cudart } */
 /* { dg-additional-options "-lcuda -lcudart" } */
diff --git a/libgomp/testsuite/libgomp.c/interop-cuda-libonly.c b/libgomp/testsuite/libgomp.c/interop-cuda-libonly.c
index 17cbb159183..bc257a24ee8 100644
--- a/libgomp/testsuite/libgomp.c/interop-cuda-libonly.c
+++ b/libgomp/testsuite/libgomp.c/interop-cuda-libonly.c
@@ -1,3 +1,6 @@
+/* { dg-do run { target { offload_device_nvptx } } } */
+/* { dg-do link { target { ! offload_device_nvptx } } } */
+
 /* { dg-require-effective-target openacc_libcudart } */
 /* { dg-require-effective-target openacc_libcuda } */
 /* { dg-additional-options "-lcuda -lcudart" } */
diff --git a/libgomp/testsuite/libgomp.c/interop-hip-amd-full.c b/libgomp/testsuite/libgomp.c/interop-hip-amd-full.c
index d7725fc8e34..bd44f442210 100644
--- a/libgomp/testsuite/libgomp.c/interop-hip-amd-full.c
+++ b/libgomp/testsuite/libgomp.c/interop-hip-amd-full.c
@@ -1,3 +1,6 @@
+/* { dg-do run { target { offload_devic

Re: [PATCH] Extend vect_recog_cond_expr_convert_pattern to handle floating point type.

2025-05-09 Thread Richard Biener
On Mon, Apr 28, 2025 at 3:30 PM liuhongt  wrote:
>
> For floating point, !flag_trapping_math is needed for the pattern which
> transforms 2 conversions to 1 conversion, and may lose 1 potential trap.
> There shouldn't be any accuracy issue.
>
> It also handles real_cst if it can be represented in different floating point
> types without loss of precision.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR tree-optimization/103771
> * match.pd (cond_expr_convert_p): Extend the match to handle
> scalar floating point type.
> * tree-vect-patterns.cc
> (vect_recog_cond_expr_convert_pattern): Handle floating point
> type.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr103771-4.c: New test.
> ---
>  gcc/match.pd   |  55 -
>  gcc/testsuite/gcc.target/i386/pr103771-4.c | 132 +
>  gcc/tree-vect-patterns.cc  |  41 +--
>  3 files changed, 214 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103771-4.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index ba036e52837..6f49f66aed2 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -11290,15 +11290,18 @@ and,
>
>  (match (cond_expr_convert_p @0 @2 @3 @6)
>   (cond (simple_comparison@6 @0 @1) (convert@4 @2) (convert@5 @3))
> -  (if (INTEGRAL_TYPE_P (type)
> -   && INTEGRAL_TYPE_P (TREE_TYPE (@2))
> -   && INTEGRAL_TYPE_P (TREE_TYPE (@0))
> -   && INTEGRAL_TYPE_P (TREE_TYPE (@3))
> +  (if ((INTEGRAL_TYPE_P (type)
> +   || (!flag_trapping_math && SCALAR_FLOAT_TYPE_P (type)))
> +   && ((INTEGRAL_TYPE_P (TREE_TYPE (@2))
> +   && INTEGRAL_TYPE_P (TREE_TYPE (@3)))
> +   || (SCALAR_FLOAT_TYPE_P (TREE_TYPE (@2))
> +  && TREE_TYPE (@2) == TREE_TYPE (@3)))
> && TYPE_PRECISION (type) != TYPE_PRECISION (TREE_TYPE (@0))
> && TYPE_PRECISION (TREE_TYPE (@0))
>   == TYPE_PRECISION (TREE_TYPE (@2))
> && TYPE_PRECISION (TREE_TYPE (@0))
>   == TYPE_PRECISION (TREE_TYPE (@3))
> +   && tree_nop_conversion_p (TREE_TYPE (@2), TREE_TYPE (@3))
> /* For vect_recog_cond_expr_convert_pattern, @2 and @3 can differ in
>   signess when convert is truncation, but not ok for extension since
>   it's sign_extend vs zero_extend.  */

Instead of mangling float support into this pattern, can you use a
separate (match ...) like for the cases below?

> @@ -11308,6 +11311,50 @@ and,
> && single_use (@4)
> && single_use (@5
>
> +(match (cond_expr_convert_p @0 @2 @3 @6)
> + (cond (simple_comparison@6 @0 @1) (float@4 @2) (float@5 @3))
> +  (if (SCALAR_FLOAT_TYPE_P (type) && !flag_trapping_math
> +   && TYPE_PRECISION (type) != TYPE_PRECISION (TREE_TYPE (@0))

so this fails to constrain the comparison types (above we check
INTEGRAL_TYPE_P); if the comparison operands happen to be of vector type,
using TYPE_PRECISION on them will ICE.

I think the main intent of the vectorizer pattern is to match up the
_size_ of the vector elements, so maybe re-formulate the constraint this
way with operand_equal_p (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (@0))).

This is also because precision on floats is not equal to the number of
bits in the mode.

> +   && TYPE_PRECISION (TREE_TYPE (@0))
> + == TYPE_PRECISION (TREE_TYPE (@2))
> +   && INTEGRAL_TYPE_P (TREE_TYPE (@2))
> +   && TREE_TYPE (@2) == TREE_TYPE (@3)
> +   && single_use (@4)
> +   && single_use (@5
> +
> +(match (cond_expr_convert_p @0 @2 @3 @6)
> + (cond (simple_comparison@6 @0 @1) (fix_trunc@4 @2) (fix_trunc@5 @3))
> +  (if (INTEGRAL_TYPE_P (type) && !flag_trapping_math
> +   && TYPE_PRECISION (type) != TYPE_PRECISION (TREE_TYPE (@0))
> +   && TYPE_PRECISION (TREE_TYPE (@0))
> + == TYPE_PRECISION (TREE_TYPE (@2))
> +   && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@2))
> +   && TREE_TYPE (@2) == TREE_TYPE (@3)

Please use types_match () instead of TREE_TYPE pointer compares.

> +   && single_use (@4)
> +   && single_use (@5
> +
> +(match (cond_expr_convert_p @0 @2 @3 @6)
> + (cond (simple_comparison@6 @0 @1) (REAL_CST@2) (convert@5 @3))

I think the same issue exists for INTEGER_CSTs.

> +  (if ((INTEGRAL_TYPE_P (type)
> +   || (!flag_trapping_math && SCALAR_FLOAT_TYPE_P (type)))
> +   && TYPE_PRECISION (type) != TYPE_PRECISION (TREE_TYPE (@0))
> +   && TYPE_PRECISION (TREE_TYPE (@0))
> + == TYPE_PRECISION (TREE_TYPE (@3))
> +   && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@3))
> +   && single_use (@5)
> +   && const_unop (CONVERT_EXPR, TREE_TYPE (@3), @2

I'm not sure this is a good check?  Say, for type == double and
typeof(@3) == float, the REAL_CST can have extra precision that you'd drop
when rewriting this as (double)(cnd ? (float)@2 : @3).  You'd need to check
the REAL_CST is exactly representable in the type of @3.  Same for a
possible integer case.  Same for handling fix
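The representability condition in the review can be made concrete with host arithmetic: a double constant may only be narrowed to float if converting it to float and back yields the same value. A minimal sketch, assuming IEEE binary32 float and binary64 double (checked with static_assert); the function name is hypothetical, and GCC itself would perform such a check on its internal REAL_VALUE_TYPE representation rather than with host floating point.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

static_assert (FLT_MANT_DIG == 24 && DBL_MANT_DIG == 53,
               "sketch assumes IEEE binary32 float and binary64 double");

/* Return true if the double C is exactly representable as a float, i.e.
   converting it to float and back yields the same value.  NaNs and values
   outside float's finite range (including infinities) are simply rejected;
   converting an out-of-range finite double to float is undefined behaviour
   in ISO C, so it is never attempted.  */
static bool
exactly_representable_as_float (double c)
{
  if (c != c || c > FLT_MAX || c < -FLT_MAX)
    return false;
  float f = (float) c;   /* rounds to a nearby float if C is not exact */
  return (double) f == c;
}
```

For example 0.5 and 16777216.0 (2^24) pass, while 0.1 and 16777217.0 (2^24 + 1) do not: rewriting cnd ? 0.1 : (double) f as (double) (cnd ? (float) 0.1 : f) would change the selected value.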

Re: [PATCH] rtl-optimization/120182 - wrong-code with RTL DSE and constant addresses

2025-05-09 Thread Jakub Jelinek
On Fri, May 09, 2025 at 11:01:58AM +0200, Richard Biener wrote:
> On Fri, 9 May 2025, Jakub Jelinek wrote:
> 
> > On Fri, May 09, 2025 at 09:34:14AM +0200, Richard Biener wrote:
> > > > Perhaps better CONST_SCALAR_INT_P instead of CONST_INT_P?
> > > 
> > > Do we ever get a wide_int for Pmode/ptr_mode?  But sure, I can
> > 
> > Most likely not.  Only if we start supporting > 64-bit pointers.
> 
> Both variants passed bootstrap and regtest, which one should I push?

I'd go with CONST_SCALAR_INT_P, in theory we need to handle that way all
scalar integers, so it feels more correct even when just CONST_INT_P is
needed now.

Jakub



Re: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors

2025-05-09 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, 8 May 2025, Pengfei Li wrote:
>
>> This patch improves the auto-vectorization for loops with known small
>> trip counts by enabling the use of subvectors - bit fields of original
>> wider vectors. A subvector must have the same vector element type as the
>> original vector and enough bits for all vector elements to be processed
>> in the loop. Using subvectors is beneficial because machine instructions
>> operating on narrower vectors usually show better performance.
>> 
>> To enable this optimization, this patch introduces a new target hook.
>> This hook allows the vectorizer to query the backend for a suitable
>> subvector type given the original vector type and the number of elements
>> to be processed in the small-trip-count loop. The target hook also has a
>> could_trap parameter to say if the subvector is allowed to have more
>> bits than needed.
>> 
>> This optimization is currently enabled for AArch64 only. Below example
>> shows how it uses AdvSIMD vectors as subvectors of SVE vectors for
>> higher instruction throughput.
>> 
>> Consider this loop operating on an array of 16-bit integers:
>> 
>>  for (int i = 0; i < 5; i++) {
>>a[i] = a[i] < 0 ? -a[i] : a[i];
>>  }
>> 
>> Before this patch, the generated AArch64 code would be:
>> 
>>  ptrue   p7.h, vl5
>>  ptrue   p6.b, all
>>  ld1h    z31.h, p7/z, [x0]
>>  abs     z31.h, p6/m, z31.h
>>  st1h    z31.h, p7, [x0]
>
> p6.b has all lanes active - why is the abs then not
> simply unmasked?

There is no unpredicated abs for SVE.  The predicate has to be there,
and so expand introduces one even when the gimple stmt is unconditional.

>> After this patch, it is optimized to:
>> 
>>  ptrue   p7.h, vl5
>>  ld1h    z31.h, p7/z, [x0]
>>  abs     v31.8h, v31.8h
>>  st1h    z31.h, p7, [x0]
>
> Help me decipher this - I suppose z31 and v31 "overlap" in the
> register file?  And z31 is a variable-length vector but
> z31.8h is a 8 element fixed length vector?  How can we

v31.8h, but otherwise yes.

> end up with just 8 elements here?  From the upper iteration
> bound?

Yeah.

> I'm not sure why you need any target hook here.  It seems you
> do already have suitable vector modes so why not just ask
> for a suitable vector?  Is it because you need to have
> that register overlap guarantee (otherwise you'd get
> a move)?

Yeah, the optimisation only makes sense for overlaid vector registers.

> Why do we not simply use fixed-length SVE here in the first place?

Fixed-length SVE is restricted to cases where the exact runtime length
is known: the compile-time length is both a minimum and a maximum.
In contrast, the code above would work even for 256-bit SVE.

> To me doing this in this way in the vectorizer looks
> somewhat out-of-place.
>
> That said, we already have unmasked ABS in the IL:
>
>   vect__1.6_15 = .MASK_LOAD (&a, 16B, { -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... }, { 0, ... });
>   vect__2.7_16 = ABSU_EXPR ;
>   vect__3.8_17 = VIEW_CONVERT_EXPR(vect__2.7_16);
>   .MASK_STORE (&a, 16B, { -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... }, vect__3.8_17); [tail call]
>
> so what's missing here?  I suppose having a constant masked ABSU here
> would allow RTL expansion to select a fixed-size mode?
>
> And the vectorizer could simply use the existing
> related_vector_mode hook instead?

I agree it's a bit awkward.  The problem is that we want conflicting
things.  On the one hand, it would make conceptual sense to use SVE
instructions to provide conditional optabs for Advanced SIMD vector modes.
E.g. SVE's LD1W could act as a predicated load for an Advanced SIMD
int32x4_t vector.  The main problem with that is that Advanced SIMD's
native boolean vector type is an integer vector of 0s and -1s, rather
than an SVE predicate.  For some (native Advanced SIMD) operations we'd
want one type of boolean, for some (SVE emulating Advanced SIMD)
operations we'd want the other type of boolean.

The patch goes the other way and treats using Advanced SIMD as an
optimisation for SVE loops.

related_vector_mode suffers from the same problem.  If we ask for a
vector mode of >=5 halfwords for a load or store, we want the SVE mode,
since that can be conditional on an SVE predicate.  But if we ask for
a vector mode of >=5 halfwords for an integer absolute operation,
we want the Advanced SIMD mode.  So I suppose the new hook is effectively
providing context.  Perhaps we could do that using an extra parameter to
related_vector_mode, if that seems better.

It's somewhat difficult to recover this information after vectorisation,
since like you say, the statements are often unconditional and operate
on all lanes.

Thanks,
Richard
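The "why 8 elements" exchange above comes down to a small calculation: 5 halfword elements need 80 bits, and the narrowest Advanced SIMD register that holds them is the 128-bit Q register, i.e. 8 halfword lanes (v31.8h). A minimal sketch of that selection follows; pick_subvector_bits and advsimd_subvector_lanes are hypothetical helpers, not the proposed target hook (whose exact interface is not shown here), and the sketch always permits spare lanes, ignoring the hook's could_trap parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Fixed-length Advanced SIMD register widths in bits (D and Q registers).  */
static const unsigned advsimd_widths[] = { 64, 128 };

/* Hypothetical helper: return the narrowest width in WIDTHS (sorted
   ascending) that can hold NELEMS elements of ELEM_BITS bits each, or 0
   if none is wide enough.  Spare lanes are always allowed here.  */
static unsigned
pick_subvector_bits (unsigned elem_bits, unsigned nelems,
                     const unsigned *widths, size_t nwidths)
{
  assert (elem_bits > 0);
  unsigned needed = elem_bits * nelems;
  for (size_t i = 0; i < nwidths; i++)
    if (widths[i] >= needed)
      return widths[i];
  return 0;
}

/* Lane count of the chosen Advanced SIMD subvector, or 0 if none fits.  */
static unsigned
advsimd_subvector_lanes (unsigned elem_bits, unsigned nelems)
{
  unsigned bits = pick_subvector_bits (elem_bits, nelems, advsimd_widths,
                                       sizeof advsimd_widths
                                       / sizeof advsimd_widths[0]);
  return bits / elem_bits;
}
```

Since the minimum SVE vector length is 128 bits, either Advanced SIMD width always lies within the overlaid SVE register, which is what makes the z31/v31 reuse in the example possible without a move.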


Re: [PATCH] rtl-optimization/120182 - wrong-code with RTL DSE and constant addresses

2025-05-09 Thread Richard Biener
On Fri, 9 May 2025, Jakub Jelinek wrote:

> On Fri, May 09, 2025 at 11:01:58AM +0200, Richard Biener wrote:
> > On Fri, 9 May 2025, Jakub Jelinek wrote:
> > 
> > > On Fri, May 09, 2025 at 09:34:14AM +0200, Richard Biener wrote:
> > > > > Perhaps better CONST_SCALAR_INT_P instead of CONST_INT_P?
> > > > 
> > > > Do we ever get a wide_int for Pmode/ptr_mode?  But sure, I can
> > > 
> > > Most likely not.  Only if we start supporting > 64-bit pointers.
> > 
> > Both variants passed bootstrap and regtest, which one should I push?
> 
> I'd go with CONST_SCALAR_INT_P, in theory we need to handle that way all
> scalar integers, so it feels more correct even when just CONST_INT_P is
> needed now.

Pushed.

Thanks,
Richard.


[PATCH 3/3] Remove non-SLP path from vectorizable_operation

2025-05-09 Thread Richard Biener
This removes the non-SLP path from vectorizable_operation and folds
away ncopies, replaces STMT_VINFO_VECTYPE with SLP_TREE_VECTYPE
and removes a big comment that has been inaccurate in many details for
a long time.  It does not get rid of the 'vec_stmt' argument
since splitting the function into analysis and transform would
require storing analysis results somewhere, which should be done
separately.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-vect-stmts.cc (vectorizable_operation): Remove non-SLP
path.
---
 gcc/tree-vect-stmts.cc | 84 ++
 1 file changed, 12 insertions(+), 72 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index bfbece56464..ae9644ad278 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6811,7 +6811,6 @@ vectorizable_operation (vec_info *vinfo,
   poly_uint64 nunits_in;
   poly_uint64 nunits_out;
   tree vectype_out;
-  unsigned int ncopies;
   int vec_num;
   int i;
   vec vec_oprnds0 = vNULL;
@@ -6872,7 +6871,7 @@ vectorizable_operation (vec_info *vinfo,
 }
 
   scalar_dest = gimple_assign_lhs (stmt);
-  vectype_out = STMT_VINFO_VECTYPE (stmt_info);
+  vectype_out = SLP_TREE_VECTYPE (slp_node);
 
   /* Most operations cannot handle bit-precision types without extra
  truncations.  */
@@ -6983,13 +6982,9 @@ vectorizable_operation (vec_info *vinfo,
 }
 
   /* Multiple types in SLP are handled by creating the appropriate number of
- vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
- case of SLP.  */
-  ncopies = 1;
+ vectorized stmts for each SLP node.  */
   vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
 
-  gcc_assert (ncopies >= 1);
-
   /* Reject attempts to combine mask types with nonmask types, e.g. if
  we have an AND between a (nonmask) boolean loaded from memory and
  a (mask) boolean result of a comparison.
@@ -7072,12 +7067,12 @@ vectorizable_operation (vec_info *vinfo,
  if (cond_len_fn != IFN_LAST
  && direct_internal_fn_supported_p (cond_len_fn, vectype,
 OPTIMIZE_FOR_SPEED))
-   vect_record_loop_len (loop_vinfo, lens, ncopies * vec_num, vectype,
+   vect_record_loop_len (loop_vinfo, lens, vec_num, vectype,
  1);
  else if (cond_fn != IFN_LAST
   && direct_internal_fn_supported_p (cond_fn, vectype,
  OPTIMIZE_FOR_SPEED))
-   vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
+   vect_record_loop_mask (loop_vinfo, masks, vec_num,
   vectype, NULL);
  else
{
@@ -7103,7 +7098,7 @@ vectorizable_operation (vec_info *vinfo,
   STMT_VINFO_TYPE (stmt_info) = op_vec_info_type;
   DUMP_VECT_SCOPE ("vectorizable_operation");
   vect_model_simple_cost (vinfo, stmt_info,
- ncopies, dt, ndts, slp_node, cost_vec);
+ 1, dt, ndts, slp_node, cost_vec);
   if (using_emulated_vectors_p)
{
  /* The above vect_model_simple_cost call handles constants
@@ -7163,60 +7158,7 @@ vectorizable_operation (vec_info *vinfo,
   else
 vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
 
-  /* In case the vectorization factor (VF) is bigger than the number
- of elements that we can fit in a vectype (nunits), we have to generate
- more than one vector stmt - i.e - we need to "unroll" the
- vector stmt by a factor VF/nunits.  In doing so, we record a pointer
- from one copy of the vector stmt to the next, in the field
- STMT_VINFO_RELATED_STMT.  This is necessary in order to allow following
- stages to find the correct vector defs to be used when vectorizing
- stmts that use the defs of the current stmt.  The example below
- illustrates the vectorization process when VF=16 and nunits=4 (i.e.,
- we need to create 4 vectorized stmts):
-
- before vectorization:
-RELATED_STMTVEC_STMT
-S1: x = memref  -   -
-S2: z = x + 1   -   -
-
- step 1: vectorize stmt S1 (done in vectorizable_load. See more details
- there):
-RELATED_STMTVEC_STMT
-VS1_0:  vx0 = memref0   VS1_1   -
-VS1_1:  vx1 = memref1   VS1_2   -
-VS1_2:  vx2 = memref2   VS1_3   -
-VS1_3:  vx3 = memref3   -   -
-S1: x = load-   VS1_0
-S2: z = x + 1   -   -
-
- step2: vectorize stmt S2 (done here):
-To vectorize stmt S2 we first need to find the relevant vector
-def for the first operand 'x'.  This is, as usual, obtained from
-the vector stmt recorded in the STMT_VINFO_VEC_STMT of the stmt

[PATCH 1/3] Remove non-SLP path from vectorizable_operation

2025-05-09 Thread Richard Biener
Step 1

---
 gcc/tree-vect-stmts.cc | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 3373d75a8ae..7a6a10ee0e6 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6985,7 +6985,7 @@ vectorizable_operation (vec_info *vinfo,
   /* Multiple types in SLP are handled by creating the appropriate number of
  vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
  case of SLP.  */
-  if (slp_node)
+  if (1)
 {
   ncopies = 1;
   vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
@@ -7098,7 +7098,7 @@ vectorizable_operation (vec_info *vinfo,
}
 
   /* Put types on constant and invariant SLP children.  */
-  if (slp_node
+  if (1
  && (!vect_maybe_update_slp_op_vectype (slp_op0, vectype)
  || !vect_maybe_update_slp_op_vectype (slp_op1, vectype)
  || !vect_maybe_update_slp_op_vectype (slp_op2, vectype)))
@@ -7120,7 +7120,7 @@ vectorizable_operation (vec_info *vinfo,
 vector stmt.  See below for the actual lowering that will
 be applied.  */
  unsigned n
-   = slp_node ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) : ncopies;
+   = 1 ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) : ncopies;
  switch (code)
{
case PLUS_EXPR:
@@ -7428,13 +7428,13 @@ vectorizable_operation (vec_info *vinfo,
   new_stmt, gsi);
}
 
-  if (slp_node)
+  if (1)
slp_node->push_vec_def (new_stmt);
   else
STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
 }
 
-  if (!slp_node)
+  if (0)
 *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
 
   vec_oprnds0.release ();
-- 
2.43.0



Re: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors

2025-05-09 Thread Richard Biener
On Fri, 9 May 2025, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Thu, 8 May 2025, Pengfei Li wrote:
> >
> >> This patch improves the auto-vectorization for loops with known small
> >> trip counts by enabling the use of subvectors - bit fields of original
> >> wider vectors. A subvector must have the same vector element type as the
> >> original vector and enough bits for all vector elements to be processed
> >> in the loop. Using subvectors is beneficial because machine instructions
> >> operating on narrower vectors usually show better performance.
> >> 
> >> To enable this optimization, this patch introduces a new target hook.
> >> This hook allows the vectorizer to query the backend for a suitable
> >> subvector type given the original vector type and the number of elements
> >> to be processed in the small-trip-count loop. The target hook also has a
> >> could_trap parameter to say if the subvector is allowed to have more
> >> bits than needed.
> >> 
> >> This optimization is currently enabled for AArch64 only. Below example
> >> shows how it uses AdvSIMD vectors as subvectors of SVE vectors for
> >> higher instruction throughput.
> >> 
> >> Consider this loop operating on an array of 16-bit integers:
> >> 
> >>for (int i = 0; i < 5; i++) {
> >>  a[i] = a[i] < 0 ? -a[i] : a[i];
> >>}
> >> 
> >> Before this patch, the generated AArch64 code would be:
> >> 
> >>    ptrue   p7.h, vl5
> >>    ptrue   p6.b, all
> >>    ld1h    z31.h, p7/z, [x0]
> >>    abs     z31.h, p6/m, z31.h
> >>    st1h    z31.h, p7, [x0]
> >
> > p6.b has all lanes active - why is the abs then not
> > simply unmasked?
> 
> There is no unpredicated abs for SVE.  The predicate has to be there,
> and so expand introduces one even when the gimple stmt is unconditional.
> 
> >> After this patch, it is optimized to:
> >> 
> >>    ptrue   p7.h, vl5
> >>    ld1h    z31.h, p7/z, [x0]
> >>    abs     v31.8h, v31.8h
> >>    st1h    z31.h, p7, [x0]
> >
> > Help me decipher this - I suppose z31 and v31 "overlap" in the
> > register file?  And z31 is a variable-length vector but
> > z31.8h is an 8-element fixed-length vector?  How can we
> 
> v31.8h, but otherwise yes.
> 
> > end up with just 8 elements here?  From the upper iteration
> > bound?
> 
> Yeah.
> 
> > I'm not sure why you need any target hook here.  It seems you
> > do already have suitable vector modes so why not just ask
> > for a suitable vector?  Is it because you need to have
> > that register overlap guarantee (otherwise you'd get
> > a move)?
> 
> Yeah, the optimisation only makes sense for overlaid vector registers.
> 
> > Why do we not simply use fixed-length SVE here in the first place?
> 
> Fixed-length SVE is restricted to cases where the exact runtime length
> is known: the compile-time length is both a minimum and a maximum.
> In contrast, the code above would work even for 256-bit SVE.
>
> > To me doing this in this way in the vectorizer looks
> > somewhat out-of-place.
> >
> > That said, we already have unmasked ABS in the IL:
> >
> >   vect__1.6_15 = .MASK_LOAD (&a, 16B, { -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... }, { 0, ... });
> >   vect__2.7_16 = ABSU_EXPR ;
> >   vect__3.8_17 = VIEW_CONVERT_EXPR(vect__2.7_16);
> >   .MASK_STORE (&a, 16B, { -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... }, vect__3.8_17); [tail call]
> >
> > so what's missing here?  I suppose having a constant masked ABSU here
> > would allow RTL expansion to select a fixed-size mode?
> >
> > And the vectorizer could simply use the existing
> > related_vector_mode hook instead?
> 
> I agree it's a bit awkward.  The problem is that we want conflicting
> things.  On the one hand, it would make conceptual sense to use SVE
> instructions to provide conditional optabs for Advanced SIMD vector modes.
> E.g. SVE's LD1W could act as a predicated load for an Advanced SIMD
> int32x4_t vector.  The main problem with that is that Advanced SIMD's
> native boolean vector type is an integer vector of 0s and -1s, rather
> than an SVE predicate.  For some (native Advanced SIMD) operations we'd
> want one type of boolean, for some (SVE emulating Advanced SIMD)
> operations we'd want the other type of boolean.
> 
> The patch goes the other way and treats using Advanced SIMD as an
> optimisation for SVE loops.
> 
> related_vector_mode suffers from the same problem.  If we ask for a
> vector mode of >=5 halfwords for a load or store, we want the SVE mode,
> since that can be conditional on an SVE predicate.  But if we ask for
> a vector mode of >=5 halfwords for an integer absolute operation,
> we want the Advanced SIMD mode.  So I suppose the new hook is effectively
> providing context.  Perhaps we could do that using an extra parameter to
> related_vector_mode, if that seems better.
> 
> It's somewhat difficult to recover this information after vectorisation,
> since like you say, the statements are often unconditional and operate
> on all lanes.

So it seems 

Re: [PATCH 6/7] OpenMP: C front end support for "begin declare variant"

2025-05-09 Thread Tobias Burnus

Hi all, hi Sandra,

I am wondering whether such an example (see attachment) should also be
added, confirming that the correct function is always called.


(It mixes base and variant calls in various ways, partly thanks to
'dispatch novariants(true)'.)


I expected it to pass, as the code should fall back to the existing
variant handling, and indeed it does work; still, it might be
useful to have it as a testcase.


Tobias
/* Check that the correct function is used;
   assumes that vendor(gnu) is always true.  */

int inner() { return 1; }

int outer(int is_novar) {
  int k;
  if (!is_novar) __builtin_abort();

  k = inner();
  if (k != 22) __builtin_abort();

  #pragma omp dispatch novariants(1)
k = inner();
  if (k != 1) __builtin_abort();
  return 3;
}

#pragma omp begin declare variant match(implementation={vendor(gnu)})
int outer(int is_novar) {
  int k;
  if (is_novar) __builtin_abort();

  k = inner();
  if (k != 22) __builtin_abort();

  #pragma omp dispatch novariants(1)
k = inner();
  if (k != 1) __builtin_abort();
  return 44;
}

int inner() { return 22; }
#pragma omp end declare variant

int
main()
{
  int j;
  j = outer(0);
  if (j != 44) __builtin_abort();

  #pragma omp dispatch novariants(1)
j = outer(1);
  if (j != 3) __builtin_abort();
  return 0;
}


Re: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors

2025-05-09 Thread Richard Biener
On Thu, 8 May 2025, Pengfei Li wrote:

> This patch improves the auto-vectorization for loops with known small
> trip counts by enabling the use of subvectors - bit fields of original
> wider vectors. A subvector must have the same vector element type as the
> original vector and enough bits for all vector elements to be processed
> in the loop. Using subvectors is beneficial because machine instructions
> operating on narrower vectors usually show better performance.
> 
> To enable this optimization, this patch introduces a new target hook.
> This hook allows the vectorizer to query the backend for a suitable
> subvector type given the original vector type and the number of elements
> to be processed in the small-trip-count loop. The target hook also has a
> could_trap parameter to say if the subvector is allowed to have more
> bits than needed.
> 
> This optimization is currently enabled for AArch64 only. Below example
> shows how it uses AdvSIMD vectors as subvectors of SVE vectors for
> higher instruction throughput.
> 
> Consider this loop operating on an array of 16-bit integers:
> 
>   for (int i = 0; i < 5; i++) {
> a[i] = a[i] < 0 ? -a[i] : a[i];
>   }
> 
> Before this patch, the generated AArch64 code would be:
> 
>   ptrue   p7.h, vl5
>   ptrue   p6.b, all
>   ld1h    z31.h, p7/z, [x0]
>   abs     z31.h, p6/m, z31.h
>   st1h    z31.h, p7, [x0]

p6.b has all lanes active - why is the abs then not
simply unmasked?

> After this patch, it is optimized to:
> 
>   ptrue   p7.h, vl5
>   ld1h    z31.h, p7/z, [x0]
>   abs     v31.8h, v31.8h
>   st1h    z31.h, p7, [x0]

Help me decipher this - I suppose z31 and v31 "overlap" in the
register file?  And z31 is a variable-length vector but
z31.8h is an 8-element fixed-length vector?  How can we
end up with just 8 elements here?  From the upper iteration
bound?
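The lane count can be worked out from the trip count: five 16-bit elements need 80 bits, which a 64-bit Advanced SIMD vector (4 halfword lanes) cannot hold, so the narrowest fit is the 128-bit vector with 8 lanes. A minimal sketch, assuming Advanced SIMD registers are 64 or 128 bits wide and that the subvector must cover every element the loop touches; `pick_subvector_lanes` is a hypothetical helper, not the target hook the patch adds.

```python
ADVSIMD_WIDTHS = (64, 128)  # assumed Advanced SIMD register sizes in bits


def pick_subvector_lanes(elem_bits, nelems):
    """Return the lane count of the narrowest Advanced SIMD vector whose
    ELEM_BITS-wide lanes can hold NELEMS elements, or None if none fits."""
    for width in ADVSIMD_WIDTHS:
        lanes = width // elem_bits
        if lanes >= nelems:
            return lanes
    return None


# Five 16-bit elements: 64 // 16 = 4 lanes is too few, 128 // 16 = 8 fits,
# which matches the v31.8h operand in the optimized code.
print(pick_subvector_lanes(16, 5))  # 8
```

The three surplus lanes are harmless for ABS because the `p7/z` load zeroes them and the predicated store ignores them; per the patch description, its `could_trap` parameter is what controls whether a subvector may have more bits than needed.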

I'm not sure why you need any target hook here.  It seems you
do already have suitable vector modes so why not just ask
for a suitable vector?  Is it because you need to have
that register overlap guarantee (otherwise you'd get
a move)?  Why do we not simply use fixed-length SVE here
in the first place?

To me doing this in this way in the vectorizer looks
somewhat out-of-place.

That said, we already have unmasked ABS in the IL:

  vect__1.6_15 = .MASK_LOAD (&a, 16B, { -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... }, { 0, ... });
  vect__2.7_16 = ABSU_EXPR ;
  vect__3.8_17 = VIEW_CONVERT_EXPR(vect__2.7_16);
  .MASK_STORE (&a, 16B, { -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... }, vect__3.8_17); [tail call]

so what's missing here?  I suppose having a constant masked ABSU here
would allow RTL expansion to select a fixed-size mode?

And the vectorizer could simply use the existing
related_vector_mode hook instead?


> This patch also helps eliminate the ptrue in this case.
> 
> Bootstrapped and tested on aarch64-linux-gnu and x86_64-linux-gnu.
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64.cc (aarch64_find_subvector_type):
>   Implement target hook for finding subvectors for AArch64.
>   * doc/tm.texi: Document the new target hook.
>   * doc/tm.texi.in: Document the new target hook.
>   * expmed.cc (extract_bit_field_as_subreg): Support expanding
>   BIT_FIELD_REF for subvector types to SUBREG in RTL.
>   * match.pd: Prevent simplification of BIT_FIELD_REF for
>   subvector types to VIEW_CONVERT.
>   * target.def: New target hook definition.
>   * targhooks.cc (default_vectorize_find_subvector_type): Provide
>   default implementation for the target hook.
>   * tree-cfg.cc (verify_types_in_gimple_reference): Update GIMPLE
>   verification for BIT_FIELD_REF used for subvectors.
>   * tree-vect-stmts.cc (vectorizable_operation): Output vectorized
>   GIMPLE with subvector types.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/sve/cond_unary_6.c: Adjust loop trip counts
>   to avoid triggering this new optimization.
>   * gcc.target/aarch64/vect-subvector-1.c: New test.
>   * gcc.target/aarch64/vect-subvector-2.c: New test.
> ---
>  gcc/config/aarch64/aarch64.cc | 39 
>  gcc/doc/tm.texi   | 12 +++
>  gcc/doc/tm.texi.in|  2 +
>  gcc/expmed.cc |  5 +-
>  gcc/match.pd  |  3 +-
>  gcc/target.def| 17 
>  gcc/targhooks.cc  |  8 ++
>  gcc/targhooks.h   |  3 +
>  .../gcc.target/aarch64/sve/cond_unary_6.c |  4 +-
>  .../gcc.target/aarch64/vect-subvector-1.c | 28 ++
>  .../gcc.target/aarch64/vect-subvector-2.c | 28 ++
>  gcc/tree-cfg.cc   |  8 ++
>  gcc/tree-vect-stmts.cc| 90 ++-
>  13 files changed, 240 insertions(+), 7 deletions(-

Re: [PATCH] AArch64: Optimize SVE loads/stores with ptrue predicates to unpredicated instructions.

2025-05-09 Thread Jennifer Schmitz


> On 8 May 2025, at 12:28, Richard Sandiford  wrote:
> 
> 
> Sorry for the slow review.
> 
> Jennifer Schmitz  writes:
>> SVE loads and stores where the predicate is all-true can be optimized to
>> unpredicated instructions. For example,
>> svuint8_t foo (uint8_t *x)
>> {
>>  return svld1 (svptrue_b8 (), x);
>> }
>> was compiled to:
>> foo:
>>  ptrue   p3.b, all
>>  ld1b    z0.b, p3/z, [x0]
>>  ret
>> but can be compiled to:
>> foo:
>>  ldr z0, [x0]
>>  ret
>> 
>> Late_combine2 had already been trying to do this, but was missing the
>> instruction:
>> (set (reg/i:VNx16QI 32 v0)
>>(unspec:VNx16QI [
>>(const_vector:VNx16BI repeat [
>>(const_int 1 [0x1])
>>])
>>(mem:VNx16QI (reg/f:DI 0 x0 [orig:106 x ] [106])
>>[0 MEM  [(unsigned char *)x_2(D)]+0 S[16, 16] A8])
>>] UNSPEC_PRED_X))
>> 
>> This patch adds a new define_insn_and_split that matches the missing
>> instruction and splits it to an unpredicated load/store. Because LDR
>> offers fewer addressing modes than LD1[BHWD], the pattern is
>> guarded under reload_completed to only apply the transform once the
>> address modes have been chosen during RA.
>> 
>> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz 
>> 
>> gcc/
>>  * config/aarch64/aarch64-sve.md (*aarch64_sve_ptrue_ldr_str):
>>  Add define_insn_and_split to fold predicated SVE loads/stores with
>>  ptrue predicates to unpredicated instructions.
>> 
>> gcc/testsuite/
>>  * gcc.target/aarch64/sve/ptrue_ldr_str.c: New test.
>>  * gcc.target/aarch64/sve/cost_model_14.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/cost_model_4.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/cost_model_5.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/cost_model_6.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/cost_model_7.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/pcs/varargs_2_f16.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/pcs/varargs_2_f32.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/pcs/varargs_2_f64.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/pcs/varargs_2_mf8.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/pcs/varargs_2_s16.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/pcs/varargs_2_s32.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/pcs/varargs_2_s64.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/pcs/varargs_2_s8.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/pcs/varargs_2_u16.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/pcs/varargs_2_u32.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/pcs/varargs_2_u64.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/pcs/varargs_2_u8.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/peel_ind_2.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/single_1.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/single_2.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/single_3.c: Adjust expected outcome.
>>  * gcc.target/aarch64/sve/single_4.c: Adjust expected outcome.
>> ---
>> gcc/config/aarch64/aarch64-sve.md | 17 
>> .../aarch64/sve/acle/general/attributes_6.c   |  8 +-
>> .../gcc.target/aarch64/sve/cost_model_14.c|  4 +-
>> .../gcc.target/aarch64/sve/cost_model_4.c |  3 +-
>> .../gcc.target/aarch64/sve/cost_model_5.c |  3 +-
>> .../gcc.target/aarch64/sve/cost_model_6.c |  3 +-
>> .../gcc.target/aarch64/sve/cost_model_7.c |  3 +-
>> .../aarch64/sve/pcs/varargs_2_f16.c   | 93 +--
>> .../aarch64/sve/pcs/varargs_2_f32.c   | 93 +--
>> .../aarch64/sve/pcs/varargs_2_f64.c   | 93 +--
>> .../aarch64/sve/pcs/varargs_2_mf8.c   | 32 +++
>> .../aarch64/sve/pcs/varargs_2_s16.c   | 93 +--
>> .../aarch64/sve/pcs/varargs_2_s32.c   | 93 +--
>> .../aarch64/sve/pcs/varargs_2_s64.c   | 93 +--
>> .../gcc.target/aarch64/sve/pcs/varargs_2_s8.c | 34 +++
>> .../aarch64/sve/pcs/varargs_2_u16.c   | 93 +--
>> .../aarch64/sve/pcs/varargs_2_u32.c   | 93 +--
>> .../aarch64/sve/pcs/varargs_2_u64.c   | 93 +--
>> .../gcc.target/aarch64/sve/pcs/varargs_2_u8.c | 32 +++
>> .../gcc.target/aarch64/sve/peel_ind_2.c   |  4 +-
>> .../gcc.target/aarch64/sve/ptrue_ldr_str.c| 31 +++
>> .../gcc.target/aarch64/sve/single_1.c | 11 ++-
>> .../gcc.target/aarch64/sve/single_2.c | 11 ++-
>> .../gcc.target/aarch64/sve/single_3.c | 11 ++-
>> .../gcc.target/aarch64/sve/single_4.c | 1

Re: [PATCH] gimple-fold: Don't replace `{true/false} != false` with `true/false` inside GIMPLE_COND

2025-05-09 Thread Richard Biener
On Thu, May 8, 2025 at 9:05 PM Andrew Pinski  wrote:
>
> This is like the patch where we don't want to replace `bool_name != 0`
> with `bool_name`, but for INTEGER_CST instead.  The only
> difference is that there are a few different forms for always true/always
> false; only handle it if it was in the canonical form.  A few new helpers are
> added for the canonical form detection.
>
> This also replaces the previous version of the patch, which did an early
> exit from fold_stmt_1 instead, so that we can change the non-canonical form
> into the canonical one in the end.

OK.

> gcc/ChangeLog:
>
> * gimple.h (gimple_cond_true_canonical_p): New function.
> (gimple_cond_false_canonical_p): New function.
> * gimple-fold.cc (replace_stmt_with_simplification): Return
> false if replacing the operands of GIMPLE_COND with an INTEGER_CST
> and already in canonical form.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple-fold.cc | 15 +--
>  gcc/gimple.h   | 30 ++
>  2 files changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index f801e8b6d41..e63fd6f2f2f 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -6258,10 +6258,21 @@ replace_stmt_with_simplification (gimple_stmt_iterator *gsi,
> }
>else if (code == INTEGER_CST)
> {
> + /* Make into the canonical form `1 != 0` and `0 != 0`.
> +If already in the canonical form return false
> +saying nothing has been done.  */
>   if (integer_zerop (ops[0]))
> -   gimple_cond_make_false (cond_stmt);
> +   {
> + if (gimple_cond_false_canonical_p (cond_stmt))
> +   return false;
> + gimple_cond_make_false (cond_stmt);
> +   }
>   else
> -   gimple_cond_make_true (cond_stmt);
> +   {
> + if (gimple_cond_true_canonical_p (cond_stmt))
> +   return false;
> + gimple_cond_make_true (cond_stmt);
> +   }
> }
>else if (!inplace)
> {
> diff --git a/gcc/gimple.h b/gcc/gimple.h
> index 032365f3da2..977ff1c923c 100644
> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h
> @@ -3875,6 +3875,21 @@ gimple_cond_true_p (const gcond *gs)
>return false;
>  }
>
> +/* Check if conditional statement GS is in the canonical form of 'if (1 != 0)'.  */
> +
> +inline bool
> +gimple_cond_true_canonical_p (const gcond *gs)
> +{
> +  tree lhs = gimple_cond_lhs (gs);
> +  tree rhs = gimple_cond_rhs (gs);
> +  tree_code code = gimple_cond_code (gs);
> +  if (code == NE_EXPR
> +  && lhs == boolean_true_node
> +  && rhs == boolean_false_node)
> +return true;
> +  return false;
> +}
> +
>  /* Check if conditional statement GS is of the form 'if (1 != 1)',
> 'if (0 != 0)', 'if (1 == 0)' or 'if (0 == 1)' */
>
> @@ -3900,6 +3915,21 @@ gimple_cond_false_p (const gcond *gs)
>return false;
>  }
>
> +/* Check if conditional statement GS is in the canonical form of 'if (0 != 0)'.  */
> +
> +inline bool
> +gimple_cond_false_canonical_p (const gcond *gs)
> +{
> +  tree lhs = gimple_cond_lhs (gs);
> +  tree rhs = gimple_cond_rhs (gs);
> +  tree_code code = gimple_cond_code (gs);
> +  if (code == NE_EXPR
> +  && lhs == boolean_false_node
> +  && rhs == boolean_false_node)
> +return true;
> +  return false;
> +}
> +
>  /* Set the code, LHS and RHS of GIMPLE_COND STMT from CODE, LHS and RHS.  */
>
>  inline void
> --
> 2.43.0
>
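Returning false for an already-canonical condition lets callers tell a real rewrite from a no-op, so folding the same statement again reports "nothing done" instead of a change every time. A toy Python model of that logic, assuming a condition is a `(code, lhs, rhs)` tuple and that the strings `'true'`/`'false'` stand in for `boolean_true_node`/`boolean_false_node`; `fold_constant_cond` is a hypothetical stand-in for the INTEGER_CST branch of `replace_stmt_with_simplification`, not GCC code.

```python
TRUE, FALSE = 'true', 'false'            # stand-ins for boolean_true/false_node
TRUE_CANONICAL = ('ne', TRUE, FALSE)     # if (1 != 0)
FALSE_CANONICAL = ('ne', FALSE, FALSE)   # if (0 != 0)


def fold_constant_cond(cond, value):
    """Rewrite COND to the canonical always-true/always-false form for VALUE.

    Returns (new_cond, changed).  CHANGED is False when COND was already
    canonical, mirroring the early 'return false' added by the patch."""
    target = FALSE_CANONICAL if value == 0 else TRUE_CANONICAL
    if cond == target:
        return cond, False
    return target, True


cond, changed = fold_constant_cond(('eq', TRUE, TRUE), 1)  # if (1 == 1)
print(cond, changed)               # ('ne', 'true', 'false') True
print(fold_constant_cond(cond, 1))  # (('ne', 'true', 'false'), False)
```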


Re: [PATCH] match: Remove (ne (cmp) 0) and (eq (cmp) 1) patterns

2025-05-09 Thread Richard Biener
On Fri, May 9, 2025 at 4:51 AM Andrew Pinski  wrote:
>
> These patterns are not needed any more.  There were already
> 2 patterns which turned `(ne bool_var 0)` into `bool_var` and
> `(eq bool_var 1)` into `bool_var`; they just came after the
> pattern that did the `(cmp (cond @0 @1 @2) @3)` simplification, but
> that pattern now comes after them.
> Also, these patterns will in some cases cause a new statement to
> be created for the comparison.  In the case of a floating-point comparison
> with non-call exceptions (and trapping math), this can cause a new statement
> to be created every time fold_stmt is called.

Hmm, but do we still fold

  _1 = _2 < 1;
  if (_1 != 0)

to

  if (_2 < 1)

or does that now again rely on forwprop's explicit forwarding into
gcond?  I wanted to get rid of the latter eventually.

I agree that the trapping math thing is bad - I wonder if we can catch that more
intelligently (not sure how without following the SSA use-def chains of gconds on
bools, seeing whether they can trap and then not simplifying)

> gcc.dg/tree-ssa/vrp24.c needed to be adjusted back to how it was before
> r13-322-g7f04b0d786e13f.
> gcc.dg/analyzer/null-deref-pr102671-2.c needs an increased
> analyzer-max-svalue-depth to avoid an extra warning.
>
> gcc/ChangeLog:
>
> * match.pd (`(ne (cmp) 0)`, `(eq (cmp) 1)`): Remove.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/vrp24.c: Adjust.
> * gcc.dg/analyzer/null-deref-pr102671-2.c: Increase 
> analyzer-max-svalue-depth.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/match.pd  | 8 
>  gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c | 2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/vrp24.c | 2 +-
>  3 files changed, 2 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index ab496d923cc..418efc4230a 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -6898,14 +6898,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (if (ic == ncmp)
>   (ncmp @0 @1)
>   /* The following bits are handled by fold_binary_op_with_conditional_arg.  */
> - (simplify
> -  (ne (cmp@2 @0 @1) integer_zerop)
> -  (if (types_match (type, TREE_TYPE (@2)))
> -   (cmp @0 @1)))
> - (simplify
> -  (eq (cmp@2 @0 @1) integer_truep)
> -  (if (types_match (type, TREE_TYPE (@2)))
> -   (cmp @0 @1)))
>   (simplify
>(ne (cmp@2 @0 @1) integer_truep)
>(if (types_match (type, TREE_TYPE (@2)))
> diff --git a/gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c b/gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c
> index 298e4839b98..bc141d5c028 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c
> +++ b/gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-require-effective-target ptr_eq_long } */
> -/* { dg-additional-options "-O2 -Wno-shift-count-overflow" } */
> +/* { dg-additional-options "-O2 -Wno-shift-count-overflow --param=analyzer-max-svalue-depth=19" } */
>
>  struct lisp;
>  union vectorlike_header { long size; };
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp24.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp24.c
> index c28ca473fc6..f237b7741ec 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/vrp24.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp24.c
> @@ -89,5 +89,5 @@ L7:
> boolean operation.  */
>
>  /* { dg-final { scan-tree-dump-times "Simplified relational" 2 "evrp" } } */
> -/* { dg-final { scan-tree-dump-times "if " 3 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "if " 4 "optimized" } } */
>
> --
> 2.43.0
>


Re: [PATCH] RISC-V: Use vclmul for CRC expansion if available

2025-05-09 Thread Anton Blanchard
Hi Jeff,

> So this failed pre-commit CI:
>
> > https://github.com/ewlu/gcc-precommit-ci/issues/3301#issuecomment-2849962485
>
> Whatever failure that is needs to be fixed :-)

Sorry about that, I'll submit a new version with a fix to the test case.

Thanks,
Anton


[pushed] [PATCH] check_GNU_style: Remove literal prefix

2025-05-09 Thread Torbjörn SVENSSON
Hello,

Below patch is pushed as obvious (r16-490-g86a7642ef59).

Kind regards,
Torbjörn

--

The path "b/binutils/dwarf.c" should be printed as "binutils/dwarf.c",
not "inutils/dwarf.c".
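The bug comes from `str.lstrip`, which treats its argument as a set of characters rather than a prefix, so `lstrip('b/')` keeps stripping while the leading character is `b` or `/`. A small Python sketch of the difference; `strip_b_prefix` is an illustrative name that mirrors the fixed logic, not a function in the script.

```python
def strip_b_prefix(path):
    # Drop the literal "b/" prefix only when it is actually present.
    if path.startswith('b/'):
        return path[2:]
    return path


old = 'b/binutils/dwarf.c'.lstrip('b/')  # strips every leading 'b' or '/'
new = strip_b_prefix('b/binutils/dwarf.c')
print(old)  # inutils/dwarf.c
print(new)  # binutils/dwarf.c
```

On Python 3.9 and later, `str.removeprefix('b/')` gives the same result as the helper.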

contrib/ChangeLog:

* check_GNU_style_lib.py: Remove literal prefix.

Signed-off-by: Torbjörn SVENSSON 
---
 contrib/check_GNU_style_lib.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/contrib/check_GNU_style_lib.py b/contrib/check_GNU_style_lib.py
index f680983d2f3..8b930ef6bdb 100755
--- a/contrib/check_GNU_style_lib.py
+++ b/contrib/check_GNU_style_lib.py
@@ -279,7 +279,9 @@ def check_GNU_style_file(file, format):
     patch = PatchSet(file)
 
     for pfile in patch.added_files + patch.modified_files:
-        t = pfile.target_file.lstrip('b/')
+        t = pfile.target_file
+        if t.startswith('b/'):
+            t = t[2:]
         # Skip testsuite files
         if 'testsuite' in t or t.endswith('.py'):
             continue
-- 
2.25.1



[PATCH] rtl-optimization/120182 - wrong-code with RTL DSE and constant addresses

2025-05-09 Thread Richard Biener
RTL DSE forms store groups from unique invariant bases but that is
confused when presented with constant addresses where it assigns
one store group per unique address.  That causes it to not consider
0x101:QI to alias 0x100:SI.  Constant accesses can really alias
to every object, in practice they appear for I/O and for access
to objects fixed via linker scripts for example.  So simply avoid
registering a store group for them.
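To see why one group per constant address goes wrong, treat each access as a byte range: a 4-byte SImode store at 0x100 covers 0x100 through 0x103, so a 1-byte QImode access at 0x101 lies inside it even though the two addresses differ. A minimal Python sketch, assuming QImode is 1 byte and SImode is 4 bytes as on typical targets; `overlaps` is an illustrative helper, not a function from dse.cc.

```python
MODE_SIZE = {'QI': 1, 'HI': 2, 'SI': 4, 'DI': 8}  # bytes per mode (assumed)


def overlaps(addr_a, mode_a, addr_b, mode_b):
    """True if the byte ranges [addr, addr + size) of two accesses intersect."""
    end_a = addr_a + MODE_SIZE[mode_a]
    end_b = addr_b + MODE_SIZE[mode_b]
    return addr_a < end_b and addr_b < end_a


# Keyed on the literal address, these accesses land in different groups,
# yet the QImode access falls inside the SImode store.
print(0x101 == 0x100)                      # False: distinct "bases"
print(overlaps(0x101, 'QI', 0x100, 'SI'))  # True: they alias
```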

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

OK?

Thanks,
Richard.

PR rtl-optimization/120182
* dse.cc (canon_address): Constant addresses have no
separate store group.

* gcc.dg/torture/pr120182.c: New testcase.
---
 gcc/dse.cc  |  5 ++-
 gcc/testsuite/gcc.dg/torture/pr120182.c | 42 +
 2 files changed, 46 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr120182.c

diff --git a/gcc/dse.cc b/gcc/dse.cc
index ffc86ffabe5..2b99c660a9a 100644
--- a/gcc/dse.cc
+++ b/gcc/dse.cc
@@ -1190,7 +1190,10 @@ canon_address (rtx mem,
   address = strip_offset_and_add (address, offset);
 
   if (ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (mem))
- && const_or_frame_p (address))
+ && const_or_frame_p (address)
+ /* Literal addresses can alias any base, avoid creating a
+group for them.  */
+ && ! CONST_INT_P (address))
{
  group_info *group = get_group_info (address);
 
diff --git a/gcc/testsuite/gcc.dg/torture/pr120182.c b/gcc/testsuite/gcc.dg/torture/pr120182.c
new file mode 100644
index 000..5e2d171ef98
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr120182.c
@@ -0,0 +1,42 @@
+/* { dg-do run { target { { *-*-linux* *-*-gnu* *-*-uclinux* } && mmap } } } */
+
+#include 
+#include 
+#include 
+
+struct S
+{
+  struct S *next;
+};
+
+static void __attribute__((noipa))
+allocate(void *addr, unsigned long long size)
+{
+  void *ptr = mmap((void *)addr, size,
+  PROT_READ | PROT_WRITE,
+  MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
+  -1, 0);
+  if(ptr != addr)
+exit(0);
+}
+
+int main (void)
+{
+  int size = 0x8000;
+  char *ptr = (char *)0x288000ull;
+  allocate((void *)ptr, size);
+
+  struct S *s1 = (struct S *)ptr;
+  struct S *s2 = (struct S *)256;
+  for (int i = 0; i < 3; i++)
+{
+  for(char *addr = (char *)s1; addr < (char *)s1 + sizeof(*s1); ++addr)
+   *addr = 0;
+
+  if(s1->next)
+   s1->next = s1->next->next = s2;
+  else
+   s1->next = s2;
+}
+  return 0;
+}
-- 
2.43.0


Re: [pushed] [PATCH] check_GNU_style: Remove literal prefix

2025-05-09 Thread Andreas Schwab
On May 09 2025, Torbjörn SVENSSON wrote:

> diff --git a/contrib/check_GNU_style_lib.py b/contrib/check_GNU_style_lib.py
> index f680983d2f3..8b930ef6bdb 100755
> --- a/contrib/check_GNU_style_lib.py
> +++ b/contrib/check_GNU_style_lib.py
> @@ -279,7 +279,9 @@ def check_GNU_style_file(file, format):
>      patch = PatchSet(file)
>  
>      for pfile in patch.added_files + patch.modified_files:
> -        t = pfile.target_file.lstrip('b/')
> +        t = pfile.target_file
> +        if t.startswith('b/'):
> +            t = t[2:]

a.k.a. t = pfile.target_file.removeprefix('b/')

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [pushed] [PATCH] check_GNU_style: Remove literal prefix

2025-05-09 Thread Torbjorn SVENSSON




On 2025-05-09 09:30, Andreas Schwab wrote:
> On May 09 2025, Torbjörn SVENSSON wrote:
>
>> diff --git a/contrib/check_GNU_style_lib.py b/contrib/check_GNU_style_lib.py
>> index f680983d2f3..8b930ef6bdb 100755
>> --- a/contrib/check_GNU_style_lib.py
>> +++ b/contrib/check_GNU_style_lib.py
>> @@ -279,7 +279,9 @@ def check_GNU_style_file(file, format):
>>      patch = PatchSet(file)
>>  
>>      for pfile in patch.added_files + patch.modified_files:
>> -        t = pfile.target_file.lstrip('b/')
>> +        t = pfile.target_file
>> +        if t.startswith('b/'):
>> +            t = t[2:]
>
> a.k.a. t = pfile.target_file.removeprefix('b/')

Yes, but that would require Python 3.9 or later.

Kind regards,
Torbjörn


Re: [PATCH] rtl-optimization/120182 - wrong-code with RTL DSE and constant addresses

2025-05-09 Thread Richard Biener
On Fri, 9 May 2025, Jakub Jelinek wrote:

> On Fri, May 09, 2025 at 09:17:23AM +0200, Richard Biener wrote:
> > RTL DSE forms store groups from unique invariant bases but that is
> > confused when presented with constant addresses where it assigns
> > one store group per unique address.  That causes it to not consider
> > 0x101:QI to alias 0x100:SI.  Constant accesses can really alias
> > to every object, in practice they appear for I/O and for access
> > to objects fixed via linker scripts for example.  So simply avoid
> > registering a store group for them.
> > 
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> > 
> > OK?
> > 
> > Thanks,
> > Richard.
> > 
> > PR rtl-optimization/120182
> > * dse.cc (canon_address): Constant addresses have no
> > separate store group.
> > 
> > * gcc.dg/torture/pr120182.c: New testcase.
> > ---
> >  gcc/dse.cc  |  5 ++-
> >  gcc/testsuite/gcc.dg/torture/pr120182.c | 42 +
> >  2 files changed, 46 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/torture/pr120182.c
> > 
> > diff --git a/gcc/dse.cc b/gcc/dse.cc
> > index ffc86ffabe5..2b99c660a9a 100644
> > --- a/gcc/dse.cc
> > +++ b/gcc/dse.cc
> > @@ -1190,7 +1190,10 @@ canon_address (rtx mem,
> >address = strip_offset_and_add (address, offset);
> >  
> >if (ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (mem))
> > - && const_or_frame_p (address))
> > + && const_or_frame_p (address)
> > + /* Literal addresses can alias any base, avoid creating a
> > +group for them.  */
> > + && ! CONST_INT_P (address))
> 
> Perhaps better CONST_SCALAR_INT_P instead of CONST_INT_P?

Do we ever get a wide_int for Pmode/ptr_mode?  But sure, I can
fix it that way.

Richard.

> Otherwise LGTM.
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH v2] match.pd: Fold (x + y) >> 1 into IFN_AVG_FLOOR (x, y) for vectors

2025-05-09 Thread Richard Biener
On Thu, 8 May 2025, Pengfei Li wrote:

> This patch folds vector expressions of the form (x + y) >> 1 into
> IFN_AVG_FLOOR (x, y), reducing instruction count on platforms that
> support averaging operations. For example, it can help improve the
> codegen on AArch64 from:
>   add     v0.4s, v0.4s, v31.4s
>   ushr    v0.4s, v0.4s, 1
> to:
>   uhadd   v0.4s, v0.4s, v31.4s
> 
> As this folding is only valid when the most significant bit of each
> element in both x and y is known to be zero, this patch checks leading
> zero bits of elements in x and y, and extends get_nonzero_bits_1() to
> handle uniform vectors. When the input is a uniform vector, the function
> now returns the nonzero bits of its element.
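The leading-zero requirement matters because a vector addition wraps at the element width, while IFN_AVG_FLOOR computes the floor average as if in wider precision. A Python sketch using 8-bit unsigned lanes (the lane width and helper names are illustrative): `shift_of_sum` models `(x + y) >> 1` with the sum wrapped to 8 bits, and `avg_floor` models a halving add such as UHADD.

```python
BITS = 8
MASK = (1 << BITS) - 1


def shift_of_sum(x, y):
    # (x + y) >> 1 where the addition wraps modulo 2**BITS, as in a vector lane.
    return ((x + y) & MASK) >> 1


def avg_floor(x, y):
    # Floor average computed without overflow, like a halving add.
    return (x + y) >> 1


# Top bit clear in both operands: the sum is at most 254, so nothing wraps
# and the two forms agree for every such pair.
assert all(shift_of_sum(x, y) == avg_floor(x, y)
           for x in range(1 << (BITS - 1)) for y in range(1 << (BITS - 1)))

# Top bit set: 200 + 100 wraps to 44, giving 22 instead of 150.
print(shift_of_sum(200, 100), avg_floor(200, 100))  # 22 150
```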
> 
> Additionally, this patch adds more checks to reject vector types in bit
> constant propagation (tree-bit-ccp), since tree-bit-ccp was designed for
> scalar values only, and the new vector logic in get_nonzero_bits_1()
> could lead to incorrect propagation results.
> 
> Bootstrapped and tested on aarch64-linux-gnu and x86_64-linux-gnu.
> 
> gcc/ChangeLog:
> 
>   * match.pd: Add folding rule for vector average.
>   * tree-ssa-ccp.cc (get_default_value): Reject vector types.
>   (evaluate_stmt): Reject vector types.
>   * tree-ssanames.cc (get_nonzero_bits_1): Extend to handle
>   uniform vectors.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/acle/uhadd_1.c: New test.
> ---
>  gcc/match.pd  |  9 +
>  .../gcc.target/aarch64/acle/uhadd_1.c | 34 +++
>  gcc/tree-ssa-ccp.cc   |  8 ++---
>  gcc/tree-ssanames.cc  |  8 +
>  4 files changed, 55 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index ab496d923cc..ddd16a10944 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2177,6 +2177,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (view_convert (rshift (view_convert:ntype @0) @1))
>  (convert (rshift (convert:ntype @0) @1))
>  
> + /* Fold (x + y) >> 1 into IFN_AVG_FLOOR (x, y) if x and y are vectors in
> +which each element is known to have at least one leading zero bit.  */
> +(simplify
> + (rshift (plus:cs @0 @1) integer_onep)
> + (if (VECTOR_TYPE_P (type)
> +  && wi::clz (get_nonzero_bits (@0)) > 0
> +  && wi::clz (get_nonzero_bits (@1)) > 0)
> +  (IFN_AVG_FLOOR @0 @1)))

You need to check that IFN_AVG_FLOOR is supported using
direct_internal_fn_supported_p here.

Otherwise this is OK with me.

Richard.

> +
>  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
> when profitable.
> For bitwise binary operations apply operand conversions to the
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c b/gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c
> new file mode 100644
> index 000..f1748a199ad
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c
> @@ -0,0 +1,34 @@
> +/* Test if SIMD fused unsigned halving adds are generated */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include 
> +
> +#define FUSED_SIMD_UHADD(vectype, q, ts, mask) \
> +  vectype simd_uhadd ## q ## _ ## ts ## _1 (vectype a) \
> +  { \
> +vectype v1 = vand ## q ## _ ## ts (a, vdup ## q ## _n_ ## ts (mask)); \
> +vectype v2 = vdup ## q ## _n_ ## ts (mask); \
> +return vshr ## q ## _n_ ## ts (vadd ## q ## _ ## ts (v1, v2), 1); \
> +  } \
> +  \
> +  vectype simd_uhadd ## q ## _ ## ts ## _2 (vectype a, vectype b) \
> +  { \
> +vectype v1 = vand ## q ## _ ## ts (a, vdup ## q ## _n_ ## ts (mask)); \
> +vectype v2 = vand ## q ## _ ## ts (b, vdup ## q ## _n_ ## ts (mask)); \
> +return vshr ## q ## _n_ ## ts (vadd ## q ## _ ## ts (v1, v2), 1); \
> +  }
> +
> +FUSED_SIMD_UHADD (uint8x8_t, , u8, 0x7f)
> +FUSED_SIMD_UHADD (uint8x16_t, q, u8, 0x7f)
> +FUSED_SIMD_UHADD (uint16x4_t, , u16, 0x7fff)
> +FUSED_SIMD_UHADD (uint16x8_t, q, u16, 0x7fff)
> +FUSED_SIMD_UHADD (uint32x2_t, , u32, 0x7fff)
> +FUSED_SIMD_UHADD (uint32x4_t, q, u32, 0x7fff)
> +
> +/* { dg-final { scan-assembler-times {\tuhadd\tv[0-9]+\.8b,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tuhadd\tv[0-9]+\.16b,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tuhadd\tv[0-9]+\.4h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tuhadd\tv[0-9]+\.8h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tuhadd\tv[0-9]+\.2s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tuhadd\tv[0-9]+\.4s,} 2 } } */
> diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
> index 8d2cbb384c4..3e0c75cf2be 100644
> --- a/gcc/tree-ssa-ccp.cc
> +++ b/gcc/tree-ssa-ccp.cc
> @@ -298,7 +298,7 @@ get_default_value (tree var)
>   {
> val.lattice_val = VARYING;
> val.mask = -1;
> -   if (flag_tree_bit_ccp)
> +   if (flag_tree_bit_ccp && !VECTOR_TYPE_P (TREE_TYPE (var)))
>   {
>  
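The identity behind the IFN_AVG_FLOOR fold in the match.pd patch above can be checked per element. The sketch below is a plain C model, not GCC code; `avg_floor_u8`, `shifted_sum_u8` and `fold_is_exact_under_mask` are hypothetical helpers. It compares the 8-bit wrapping `(x + y) >> 1` against the exact floor average computed in wider arithmetic (what a halving add such as UHADD yields). When both inputs have a clear top bit the sum cannot wrap, so the two agree; otherwise they can differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Exact floor average: the sum is computed in unsigned int, so it
   cannot wrap before the shift.  */
static uint8_t avg_floor_u8 (uint8_t x, uint8_t y)
{
  return (uint8_t) (((unsigned) x + (unsigned) y) >> 1);
}

/* The original expression in 8-bit element arithmetic: the addition
   wraps modulo 256 before the shift.  */
static uint8_t shifted_sum_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return (uint8_t) (sum >> 1);
}

/* True if the fold is exact for every input pair whose bits lie within
   MASK (0x7f models "at least one leading zero bit").  */
static bool fold_is_exact_under_mask (uint8_t mask)
{
  for (unsigned x = 0; x <= 0xff; ++x)
    for (unsigned y = 0; y <= 0xff; ++y)
      {
        uint8_t a = (uint8_t) (x & mask);
        uint8_t b = (uint8_t) (y & mask);
        if (avg_floor_u8 (a, b) != shifted_sum_u8 (a, b))
          return false;
      }
  return true;
}
```

For example, with unrestricted 8-bit elements 200 + 100 wraps to 44, so the shifted sum is 22 while the true floor average is 150; with a 0x7f mask the largest sum is 254 and no wrap can occur.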

[PATCH][14] tree-optimization/120156 - ICE in ptr_derefs_may_alias_p

2025-05-09 Thread Richard Biener
This picks the ptr_derefs_may_alias_p fix from the PR99954 fix,
which said: "This makes us run into a latent issue in
ptr_deref_may_alias_decl_p when the pointer is something like &MEM[0].a,
in which case we fail to handle non-SSA name pointers.  Add code
similar to what we have in ptr_derefs_may_alias_p."
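The early exit follows the usual rule for alias queries: when the oracle cannot reason precisely about a pointer, it answers "may alias". The sketch below is a toy C model of that control flow only; `struct toy_ptr`, the decl-id bitmask standing in for points-to information, and `toy_deref_may_alias_decl` are hypothetical, not GCC's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the tree codes involved.  */
enum toy_ptr_code { TOY_SSA_NAME, TOY_ADDR_EXPR };

struct toy_ptr
{
  enum toy_ptr_code code;
  bool is_pointer_type;
  /* Points-to set as a bitmask of decl ids; NULL means no useful
     points-to information is available.  */
  const uint32_t *pt_set;
};

/* Conservative query: returns false only when *PTR provably cannot
   refer to the decl with id DECL_ID.  */
static bool
toy_deref_may_alias_decl (const struct toy_ptr *ptr, unsigned decl_id,
                          bool decl_may_be_aliased)
{
  assert (decl_id < 32);

  /* A decl whose address is never taken cannot be reached via *PTR.  */
  if (!decl_may_be_aliased)
    return false;

  /* Mirrors the new check: anything that is not a pointer-typed SSA
     name (e.g. &MEM[0].a) is assumed to alias.  */
  if (ptr->code != TOY_SSA_NAME || !ptr->is_pointer_type)
    return true;

  /* No points-to information: cannot disambiguate.  */
  if (ptr->pt_set == NULL)
    return true;

  return ((*ptr->pt_set >> decl_id) & 1u) != 0;
}
```

Before the fix, the real function went on to read SSA_NAME_PTR_INFO without checking that the pointer was an SSA name, which is what the early return guards against.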

Bootstrap/regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/120156
* tree-ssa-alias.cc (ptr_deref_may_alias_decl_p): Verify
the pointer is an SSA name.

(cherry picked from commit c290e6a0b7a9de5692963affc6627a4af7dc2411)
---
 gcc/tree-ssa-alias.cc | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
index 790447b3840..0a1f3d257a4 100644
--- a/gcc/tree-ssa-alias.cc
+++ b/gcc/tree-ssa-alias.cc
@@ -295,6 +295,11 @@ ptr_deref_may_alias_decl_p (tree ptr, tree decl)
   if (!may_be_aliased (decl))
 return false;
 
+  /* From here we require a SSA name pointer.  Anything else aliases.  */
+  if (TREE_CODE (ptr) != SSA_NAME
+  || !POINTER_TYPE_P (TREE_TYPE (ptr)))
+return true;
+
   /* If we do not have useful points-to information for this pointer
  we cannot disambiguate anything else.  */
   pi = SSA_NAME_PTR_INFO (ptr);
-- 
2.43.0


Re: [PATCH] rtl-optimization/120182 - wrong-code with RTL DSE and constant addresses

2025-05-09 Thread Jakub Jelinek
On Fri, May 09, 2025 at 09:34:14AM +0200, Richard Biener wrote:
> > Perhaps better CONST_SCALAR_INT_P instead of CONST_INT_P?
> 
> Do we ever get a wide_int for Pmode/ptr_mode?  But sure, I can

Most likely not.  Only if we start supporting > 64-bit pointers.

Jakub



Re: [PATCH] [testsuite] [ppc] pr87600, pr89313: test for __PPC__ as well

2025-05-09 Thread Alexandre Oliva
On Apr 16, 2025, Alexandre Oliva  wrote:

> On Apr 14, 2025, Peter Bergner  wrote:
>> On 4/11/25 1:03 PM, Alexandre Oliva wrote:
>>> gcc.dg/pr87600.h and gcc.dg/pr89313.c test for __powerpc__ and
>>> __POWERPC__ to choose ppc register names, but ppc-elf defines neither;
>>> it defines __PPC__, so test for that as well.

>> That said, I think this probably falls under the "obvious" rule too.

> Thanks,

I'm putting it in, thanks for the advice.

-- 
Alexandre Oliva, happy hackerhttps://blog.lx.oliva.nom.br/
Free Software Activist FSFLA co-founder GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity.
Excluding neuro-others for not behaving ""normal"" is *not* inclusive!


[PATCH, COMMITTED] MAINTAINERS: Update my email address

2025-05-09 Thread Peter Bergner

MAINTAINERS: Update my email address

2025-05-09  Peter Bergner  

/
    * MAINTAINERS: Update my email address and add myself to DCO.

Signed-off-by: Peter Bergner 

---
 MAINTAINERS | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index b1e7fadf1b8..209966f741a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -309,7 +309,7 @@ loop optimizer  Zdenek Dvorak   


 LTO Richard Biener 
 LTO plugin  Cary Coutant 
 Plugin  Le-Chun Wu 
-register allocation Peter Bergner 
+register allocation Peter Bergner 
 register allocation Kenneth Zadeck 
 register allocation Seongbae Park 
 riscv port  Robin Dapp 
@@ -358,7 +358,7 @@ Serge Belyshev -   


 Jon Beniston    jbeniston 
 Andrew Bennett  - 
 Andrew Benson   abensonca 
-Peter Bergner   bergner 
+Peter Bergner   bergner 
 Daniel Berlin   dberlin 
 Pat Bernardi    - 
 Jan Beulich - 
@@ -933,6 +933,7 @@ information.


 Soumya AR 
+Peter Bergner 
 Juergen Christ 
 Giuseppe D'Angelo 
 Robin Dapp 
--
2.39.5 (Apple Git-154)




[PATCH] fortran: Fix debug info for unsigned(kind=1) and unsigned(kind=4) [PR120193]

2025-05-09 Thread Jakub Jelinek
Hi!

As the following testcase shows, debug info for unsigned(kind=1)
and unsigned(kind=4) vars is wrong while unsigned(kind=2), unsigned(kind=8)
and unsigned(kind=16) look right.
Instead of objects having unsigned(kind=1) type they have character(kind=1)
and instead of unsigned(kind=4) they have character(kind=4).
This means in gdb e.g. unsigned(kind=1) :: a(2) variable initialized to
97 will print as 'aa' rather than (97, 97) etc.
While there can be just one unsigned_char_type_node and one
unsigned_type_node type, each can have an arbitrary number of variants
(e.g. consider C
typedef unsigned char uc;
where uc is a variant type to unsigned char) or even distinct types
with different TYPE_MAIN_VARIANT.

The following patch uses a variant of the character(kind=4) type
for unsigned(kind=4) and a distinct type based on character(kind=1)
type for unsigned(kind=1).  The reason for the latter is that
unsigned_char_type_node has TYPE_STRING_FLAG set on it, so it has
DW_AT_encoding DW_ATE_unsigned_char rather than DW_ATE_unsigned and
so the debugger then likes to print it as characters rather than numbers.
IMHO that is desirable in Fortran for character(kind=1) but not for
unsigned(kind=1).  I've made sure TYPE_CANONICAL of the unsigned(kind=1)
type is still character(kind=1), so they are considered compatible by
the middle-end also e.g. for aliasing etc.
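The type setup can be pictured with a toy model. This is a sketch, not GCC's tree API: `struct toy_type`, its fields and the helpers are hypothetical stand-ins for TYPE_MAIN_VARIANT, TYPE_CANONICAL, TYPE_STRING_FLAG, build_variant_type_copy and build_distinct_type_copy. It shows the copy built from a string-flagged character type becoming its own main variant and, once the flag is cleared, getting the numeric encoding, while still sharing TYPE_CANONICAL with the character type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified type node.  */
struct toy_type
{
  const struct toy_type *main_variant;
  const struct toy_type *canonical;
  bool string_flag;        /* models TYPE_STRING_FLAG */
  unsigned precision;
};

enum toy_encoding { TOY_ATE_UNSIGNED, TOY_ATE_UNSIGNED_CHAR };

/* Models picking the DWARF base-type encoding from the string flag.  */
static enum toy_encoding toy_encoding_for (const struct toy_type *t)
{
  return t->string_flag ? TOY_ATE_UNSIGNED_CHAR : TOY_ATE_UNSIGNED;
}

/* Variant copy: shares main variant and canonical type (like a typedef).  */
static struct toy_type *toy_variant_copy (const struct toy_type *t)
{
  struct toy_type *c = malloc (sizeof *c);
  if (!c)
    abort ();
  *c = *t;
  return c;
}

/* Distinct copy: becomes its own main variant and canonical type.  */
static struct toy_type *toy_distinct_copy (const struct toy_type *t)
{
  struct toy_type *c = malloc (sizeof *c);
  if (!c)
    abort ();
  *c = *t;
  c->main_variant = c;
  c->canonical = c;
  return c;
}

/* Mirrors the shape of the patch: distinct copy for a string-flagged
   type (with the canonical type restored), variant copy otherwise,
   then clear the string flag.  */
static struct toy_type *
toy_make_unsigned_from_char (const struct toy_type *chr)
{
  struct toy_type *t;
  if (chr->string_flag)
    {
      t = toy_distinct_copy (chr);
      t->canonical = chr->canonical;  /* still compatible for aliasing */
    }
  else
    t = toy_variant_copy (chr);
  t->string_flag = false;
  return t;
}
```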

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
and later on for 15.2?

2025-05-09  Jakub Jelinek  

PR fortran/120193
* trans-types.cc (gfc_init_types): For flag_unsigned use
build_distinct_type_copy or build_variant_type_copy from
gfc_character_types[index_char] if index_char > -1 instead of
gfc_character_types[index_char] or
gfc_build_unsigned_type (&gfc_unsigned_kinds[index]).

* gfortran.dg/guality/pr120193.f90: New test.

--- gcc/fortran/trans-types.cc.jj   2025-02-23 23:35:37.815295413 +0100
+++ gcc/fortran/trans-types.cc  2025-05-09 16:32:42.786931771 +0200
@@ -1140,11 +1140,6 @@ gfc_init_types (void)
 }
   gfc_character1_type_node = gfc_character_types[0];
 
-  /* The middle end only recognizes a single unsigned type.  For
- compatibility of existing test cases, let's just use the
- character type.  The reader of tree dumps is expected to be able
- to deal with this.  */
-
   if (flag_unsigned)
 {
   for (index = 0; gfc_unsigned_kinds[index].kind != 0;++index)
@@ -1159,18 +1154,26 @@ gfc_init_types (void)
  break;
}
}
- if (index_char > 0)
+ if (index_char > -1)
{
- gfc_unsigned_types[index] = gfc_character_types[index_char];
+ type = gfc_character_types[index_char];
+ if (TYPE_STRING_FLAG (type))
+   {
+ type = build_distinct_type_copy (type);
+ TYPE_CANONICAL (type)
+   = TYPE_CANONICAL (gfc_character_types[index_char]);
+   }
+ else
+   type = build_variant_type_copy (type);
+ TYPE_NAME (type) = NULL_TREE;
+ TYPE_STRING_FLAG (type) = 0;
}
  else
-   {
- type = gfc_build_unsigned_type (&gfc_unsigned_kinds[index]);
- gfc_unsigned_types[index] = type;
- snprintf (name_buf, sizeof(name_buf), "unsigned(kind=%d)",
-   gfc_integer_kinds[index].kind);
- PUSH_TYPE (name_buf, type);
-   }
+   type = gfc_build_unsigned_type (&gfc_unsigned_kinds[index]);
+ gfc_unsigned_types[index] = type;
+ snprintf (name_buf, sizeof(name_buf), "unsigned(kind=%d)",
+   gfc_integer_kinds[index].kind);
+ PUSH_TYPE (name_buf, type);
}
 }
 
--- gcc/testsuite/gfortran.dg/guality/pr120193.f90.jj   2025-05-09 16:47:45.947666575 +0200
+++ gcc/testsuite/gfortran.dg/guality/pr120193.f90  2025-05-09 16:54:56.139820197 +0200
@@ -0,0 +1,26 @@
+! PR fortran/120193
+! { dg-do run }
+! { dg-options "-g -funsigned" }
+! { dg-skip-if "" { *-*-* }  { "*" } { "-O0" } }
+
+program foo
+  unsigned(kind=1) :: a(2), e
+  unsigned(kind=2) :: b(2), f
+  unsigned(kind=4) :: c(2), g
+  unsigned(kind=8) :: d(2), h
+  character(kind=1, len=1) :: i(2), j
+  character(kind=4, len=1) :: k(2), l
+  a = 97u_1! { dg-final { gdb-test 24 "a" "d" } }
+  b = 97u_2! { dg-final { gdb-test 24 "b" "c" } }
+  c = 97u_4! { dg-final { gdb-test 24 "c" "b" } }
+  d = 97u_8! { dg-final { gdb-test 24 "d" "a" } }
+  e = 97u_1! { dg-final { gdb-test 24 "e" "97" } }
+  f = 97u_2! { dg-final { gdb-test 24 "f" "97" } }
+  g = 97u_4! { dg-final { gdb-test 24 "g" "97" } }
+  h = 97u_8! { dg-final { gdb-test 24 "h" "97" } }
+  i = 'a'  ! { dg-final { gdb-test 24 "i" "('a', 'a')" } }
+  j = 'b'  ! { dg-final { gdb-test 24 "j" "'b'" } }
+  k = 'c'
+  l = 'd'
+  print *, a
+end program

Jakub



[PATCH v2] match: Add a condition to `(ne (cmp) 0)` and `(eq (cmp) 1)` patterns for non-call exceptions

2025-05-09 Thread Andrew Pinski
The problem with these patterns is that they copy the comparison, so
with non-call exceptions (and trapping math) we could copy a statement
that can throw internally and then lose its landing pad information.
For a statement that throws externally, we would instead create a new
copy from the GIMPLE_COND every time it is folded.

That is, if we have:
```
_2 = _1 < 0.0;
if (_2 != 0)
```
this pattern will cause a copy to be produced:
```
_2 = _1 < 0.0;
_3 = _1 < 0.0;
if (_3 != 0)
```

So let's restrict these patterns to comparisons that cannot throw.
A later pattern already simplifies `bool_name != 0` to `bool_name`,
and that one will be used for comparisons that can throw with non-call exceptions.
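The guard added to both patterns is a negated conjunction: the comparison may be copied unless exceptions are enabled, the function can throw non-call exceptions, and the comparison could trap. The sketch below models it in plain C. `toy_cmp_could_trap` is a simplified, hypothetical stand-in for operation_could_trap_p: it ignores signaling NaNs and -ffinite-math-only and treats only the ordered relational comparisons on floating-point operands as trapping (they raise "invalid" on a quiet NaN under IEEE 754). The `cfun &&` null check is omitted.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_cmp { TOY_LT, TOY_LE, TOY_GT, TOY_GE, TOY_EQ, TOY_NE };

/* Simplified model: with trapping math, <, <=, > and >= on
   floating-point operands may trap; == and != do not.  */
static bool
toy_cmp_could_trap (enum toy_cmp code, bool fp_operands, bool trapping_math)
{
  if (!fp_operands || !trapping_math)
    return false;
  switch (code)
    {
    case TOY_LT:
    case TOY_LE:
    case TOY_GT:
    case TOY_GE:
      return true;
    default:
      return false;
    }
}

/* Mirrors the shape of the new match.pd condition:
   !flag_exceptions || !can_throw_non_call_exceptions || !could_trap.  */
static bool
toy_may_copy_cmp (bool flag_exceptions, bool non_call_eh, bool could_trap)
{
  return !flag_exceptions || !non_call_eh || !could_trap;
}
```

So `(_1 < 0.0) != 0` is left for the later `bool_name != 0` pattern under -fnon-call-exceptions, while an integer or equality comparison is still folded directly.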

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd (`(ne (cmp) 0)`, `(eq (cmp) 1)`): Restrict to comparisons
that don't throw.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index ab496d923cc..e99ed40fbd1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6900,12 +6900,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  /* The following bits are handled by fold_binary_op_with_conditional_arg.  */
  (simplify
   (ne (cmp@2 @0 @1) integer_zerop)
-  (if (types_match (type, TREE_TYPE (@2)))
-   (cmp @0 @1)))
+  /* For non-call exceptions, don't copy stmts that might throw (trap). */
+  (if (!flag_exceptions
+   || !(cfun && cfun->can_throw_non_call_exceptions)
+   || !operation_could_trap_p (cmp, FLOAT_TYPE_P (TREE_TYPE (@0)),
+  false, NULL_TREE))
+   (if (types_match (type, TREE_TYPE (@2)))
+    (cmp @0 @1))))
  (simplify
   (eq (cmp@2 @0 @1) integer_truep)
-  (if (types_match (type, TREE_TYPE (@2)))
-   (cmp @0 @1)))
+  /* For non-call exceptions, don't copy stmts that might throw (trap). */
+  (if (!flag_exceptions
+   || !(cfun && cfun->can_throw_non_call_exceptions)
+   || !operation_could_trap_p (cmp, FLOAT_TYPE_P (TREE_TYPE (@0)),
+  false, NULL_TREE))
+   (if (types_match (type, TREE_TYPE (@2)))
+    (cmp @0 @1))))
  (simplify
   (ne (cmp@2 @0 @1) integer_truep)
   (if (types_match (type, TREE_TYPE (@2)))
-- 
2.43.0



Re: [PATCH] [testsuite] [ppc] block-cmp-8 should require powerpc64

2025-05-09 Thread Alexandre Oliva
On Apr 11, 2025, Alexandre Oliva  wrote:

>   * gcc.target/powerpc/block-cmp-8.c: Require powerpc64
>   instruction execution support.
> --- a/gcc/testsuite/gcc.target/powerpc/block-cmp-8.c
> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-8.c
> -/* { dg-require-effective-target has_arch_ppc64 } */
> +/* { dg-require-effective-target powerpc64 } */

On Apr 16, 2025, Peter Bergner  wrote:

> On 4/16/25 12:27 AM, Alexandre Oliva wrote:
>> ...may I understand your initial response in this thread
>> as approval of that patch?  That wasn't clear either.

> If it were me, I'd give Segher and the others a couple of days to disagree
> and not hearing any objections, I'd push it under the "obvious" rule.

Thanks, I'm proceeding under your advice.



Re: [PATCH] match: Remove (ne (cmp) 0) and (eq (cmp) 1) patterns

2025-05-09 Thread Andrew Pinski
On Fri, May 9, 2025 at 6:27 AM Richard Biener
 wrote:
>
> On Fri, May 9, 2025 at 1:34 PM Andrew Pinski  wrote:
> >
> > On Fri, May 9, 2025 at 1:21 AM Richard Biener
> >  wrote:
> > >
> > > On Fri, May 9, 2025 at 4:51 AM Andrew Pinski  wrote:
> > > >
> > > > These patterns are not needed any more. There were already
> > > > 2 patterns which turned `(ne bool_var 0)` into `bool_var` and
> > > > `(eq bool_var 1)` into `bool_var`; they just came after the
> > > > pattern that did the `(cmp (cond @0 @1 @2) @3)` simplification,
> > > > but that pattern now comes after them.
> > > > Also these patterns will in some cases cause a new statement to
> > > > be created for the comparison. For floating-point comparisons
> > > > with non-call exceptions (and trapping math), this can create a new
> > > > statement every time fold_stmt is called.
> > >
> > > Hmm, but do we still fold
> > >
> > >   _1 = _2 < 1;
> > >   if (_1 != 0)
> > >
> > > to
> > >
> > >   if (_2 < 1)
> > >
> > > or does that now again rely on forwprops explicit forwarding into
> > > gcond?  I wanted
> > > to get rid of the latter eventually.
> >
> > Oh. Yes this does rely on forwprop explicitly now.
> >
> > >
> > > I agree that the trapping math thing is bad - I wonder if we can catch
> > > that more intelligently (not sure how without following SSA use-def of
> > > gconds on bools and see whether they can trap and then not simplifying)
> >
> > I think I know the way to fix the trapping issue without fully
> > removing this. I am going to give it a go later today.
> > Since trapping only depends on the code and the type it should be easy
> > to add an extra condition here and the latter patterns catch the
> > trapping case of removing `bool!=0` already.
>
> Note it really depends on context.
>
> _1 = _2 < 1.;
> _3 = _1 != 0;
>
> would be OK to fold to
>
> _3 = _2 < 1;
>
> but not with the _1 != 0 in the gcond.  That's because gconds can't
> throw (and I think rightfully so).  In principle we should go full steam
> ahead to have single-operand gconds, just the boolean value.  Like we now
> do for COND_EXPRs.  But this unfortunately has very large fallout :/
>
> Thus the "workaround" for non-call-EH.  I believe any mitigation should be in
> the match-and-simplify plumbing that handles the gcond - which we already do,
> but the side-effect is the ping-pong you are observing.  Maybe we can do
> better in replace_stmt_with_simplification where we should hit(?)
>
>   else if (!inplace)
> {
>   tree res = maybe_push_res_to_seq (res_op, seq);
>   if (!res)
> return false;
>   gimple_cond_set_condition (cond_stmt, NE_EXPR, res,
>  build_zero_cst (TREE_TYPE (res)));
>
> and detect when the cond_stmt is SSA != 0 (or the reverse canonical form)
> and refuse to simplify if the simplification in 'res_op' is the same as the
> current definition of SSA?

I submitted 2 different patches.
One which modifies the match.pd pattern:
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/683142.html.
The patch to replace_stmt_with_simplification :
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/683171.html

In fact I started with the replace_stmt_with_simplification but I was
not 100% sure it was the best idea so I didn't submit it.
I am ok with either one too.

Thanks,
Andrew Pinski



>
> >
> > Thanks,
> > Andrew Pinski
> >
> > >
> > > > gcc.dg/tree-ssa/vrp24.c needed to be adjusted back to how it was
> > > > before r13-322-g7f04b0d786e13f.
> > > > gcc.dg/analyzer/null-deref-pr102671-2.c needs an increased
> > > > analyzer-max-svalue-depth to avoid an extra warning.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * match.pd (`(ne (cmp) 0)`, `(eq (cmp) 1)`): Remove.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.dg/tree-ssa/vrp24.c: Adjust.
> > > > * gcc.dg/analyzer/null-deref-pr102671-2.c: Increase 
> > > > analyzer-max-svalue-depth.
> > > >
> > > > Signed-off-by: Andrew Pinski 
> > > > ---
> > > >  gcc/match.pd  | 8 
> > > >  gcc/testsuite/gcc.dg/analyzer/null-deref-pr102671-2.c | 2 +-
> > > >  gcc/testsuite/gcc.dg/tree-ssa/vrp24.c | 2 +-
> > > >  3 files changed, 2 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index ab496d923cc..418efc4230a 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -6898,14 +6898,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > >  (if (ic == ncmp)
> > > >   (ncmp @0 @1)
> > > >   /* The following bits are handled by fold_binary_op_with_conditional_arg.  */
> > > > - (simplify
> > > > -  (ne (cmp@2 @0 @1) integer_zerop)
> > > > -  (if (types_match (type, TREE_TYPE (@2)))
> > > > -   (cmp @0 @1)))
> > > > - (simplify
> > > > -  (eq (cmp@2 @0 @1) integer_truep)
> > > > -  (if (types_match (type, TREE_TYPE (@2)))
> > > > -   (cmp @0 @1)))
> > > >   (simp

[pushed] c++: recursive instantiation diagnostic [PR120204]

2025-05-09 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Here tsubst_baselink was returning error_mark_node silently despite
tf_error; we need to actually give an error.

PR c++/120204

gcc/cp/ChangeLog:

* pt.cc (tsubst_baselink): Always error if lookup fails.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-recursion3.C: New test.
---
 gcc/cp/pt.cc | 16 
 .../g++.dg/cpp1y/constexpr-recursion3.C  | 15 +++
 2 files changed, 27 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-recursion3.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 09f74a2814b..0d64a1cfb12 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -17477,10 +17477,18 @@ tsubst_baselink (tree baselink, tree object_type,
 
   if (!baselink)
{
- if ((complain & tf_error)
- && constructor_name_p (name, qualifying_scope))
-   error ("cannot call constructor %<%T::%D%> directly",
-  qualifying_scope, name);
+ if (complain & tf_error)
+   {
+ if (constructor_name_p (name, qualifying_scope))
+   error ("cannot call constructor %<%T::%D%> directly",
+  qualifying_scope, name);
+ else
+   /* Lookup succeeded at parse time, but failed during
+  instantiation; must be because we're trying to refer to it
+  while forming its declaration (c++/120204).  */
+   error ("declaration of %<%T::%D%> depends on itself",
+  qualifying_scope, name);
+   }
  return error_mark_node;
}
 
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-recursion3.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-recursion3.C
new file mode 100644
index 000..cadbdf08c97
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-recursion3.C
@@ -0,0 +1,15 @@
+// PR c++/120204
+// { dg-do compile { target c++14 } }
+
+template
+struct array{};
+
+template  struct ILEArglist {
+  using Sizes = array;
+  static constexpr int size() {// { dg-bogus "not usable" }
+Sizes &offsets_c = offsets;// { dg-error "depends on itself" }
+return 0;
+  }
+  array offsets(); // { dg-error "constant expression" }
+};
+auto arglist = ILEArglist<>();

base-commit: d50e08095b57131e6f1a80b45959087e233376e8
-- 
2.49.0



[PATCH] gimple-fold: Canonicalize _Bool == 0 and _Bool != 1

2025-05-09 Thread Andrew Pinski
This moves the canonicalization of `_Bool == 0` and `_Bool != 1` from
forwprop (forward_propagate_into_gimple_cond) to gimple-fold.
This is a step in removing forward_propagate_into_gimple_cond from forwprop.
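Rewriting `b == 0` or `b != 1` as `b != 0` inverts the condition, so the true and false successors must be exchanged to keep the branch behavior; the patch does this by toggling EDGE_TRUE_VALUE/EDGE_FALSE_VALUE on both outgoing edges. The sketch below is a toy C model of that equivalence for a 1-bit boolean; `struct toy_cond`, `toy_branch_target` and `toy_canonicalize` are hypothetical and stand in for a GIMPLE_COND with its two successor blocks.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_eqne { TOY_EQ, TOY_NE };

/* Hypothetical model of `if (b CODE rhs) goto true_bb; else goto false_bb;`
   where b is a 1-bit boolean and rhs is 0 or 1.  */
struct toy_cond
{
  enum toy_eqne code;
  int rhs;
  int true_bb;
  int false_bb;
};

static int
toy_branch_target (const struct toy_cond *c, int b)
{
  bool taken = c->code == TOY_EQ ? b == c->rhs : b != c->rhs;
  return taken ? c->true_bb : c->false_bb;
}

/* Rewrite `b == 0` and `b != 1` into `b != 0`.  The new condition is the
   logical negation of the old one, so swap the successors.  */
static void
toy_canonicalize (struct toy_cond *c)
{
  if ((c->code == TOY_EQ && c->rhs == 0)
      || (c->code == TOY_NE && c->rhs == 1))
    {
      int tmp = c->true_bb;
      c->code = TOY_NE;
      c->rhs = 0;
      c->true_bb = c->false_bb;
      c->false_bb = tmp;
    }
}
```

Conditions already written as `b != 0` or `b == 1` are left unchanged by this sketch.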

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* gimple-fold.cc (replace_stmt_with_simplification): Canonicalize
`_Bool == 0` and `_Bool != 1` into `_Bool != 0` with swapping
the edges.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-fold.cc | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index c3a9f6356d4..e6d1384c416 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -6079,7 +6079,24 @@ replace_stmt_with_simplification (gimple_stmt_iterator *gsi,
 {
   gcc_assert (res_op->code.is_tree_code ());
   auto code = tree_code (res_op->code);
-  if (TREE_CODE_CLASS (code) == tcc_comparison
+  /* Canonicalize _Bool == 0 and _Bool != 1 to _Bool != 0 by swapping edges.  */
+  if ((TREE_CODE (TREE_TYPE (ops[0])) == BOOLEAN_TYPE
+  || (INTEGRAL_TYPE_P (TREE_TYPE (ops[0]))
+  && TYPE_PRECISION (TREE_TYPE (ops[0])) == 1))
+  && ((code == EQ_EXPR
+   && integer_zerop (ops[1]))
+  || (code == NE_EXPR
+  && integer_onep (ops[1])))
+   && gimple_bb (stmt))
+   {
+ basic_block bb = gimple_bb (stmt);
+ gimple_cond_set_code (cond_stmt, NE_EXPR);
+ gimple_cond_set_lhs (cond_stmt, ops[0]);
+ gimple_cond_set_rhs (cond_stmt, build_zero_cst (TREE_TYPE (ops[0])));
+ EDGE_SUCC (bb, 0)->flags ^= (EDGE_TRUE_VALUE|EDGE_FALSE_VALUE);
+ EDGE_SUCC (bb, 1)->flags ^= (EDGE_TRUE_VALUE|EDGE_FALSE_VALUE);
+   }
+  else if (TREE_CODE_CLASS (code) == tcc_comparison
  /* GIMPLE_CONDs condition may not throw.  */
  && (!flag_exceptions
  || !cfun->can_throw_non_call_exceptions
-- 
2.43.0



[PATCH] gimple-fold: Don't replace `tmp = FP0 CMP FP1; if (tmp != 0)` over and over again when comparison can throw

2025-05-09 Thread Andrew Pinski
With -ftrapping-math -fnon-call-exceptions and:
```
tmp = FP0 CMP FP1;

if (tmp != 0) ...
```
each call to fold_stmt on the GIMPLE_COND replaces the above with
a new tmp, and we even lose the EH information on the previous
comparison.
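The new check makes the simplification a no-op when it would only rebuild the statement that already defines the condition's operand. Below is a toy C model, assuming hypothetical `struct toy_gcond`/`struct toy_def` types in place of GIMPLE statements and plain integer operand ids in place of operand_equal_p; it leaves out the flag_exceptions, can_throw_non_call_exceptions and operation_could_trap_p preconditions. `toy_count_new_temps` simulates repeated fold_stmt calls on `if (tmp != 0)` where `tmp = a < b`, counting fresh comparison temporaries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_op { TOY_LT, TOY_NE };

/* Hypothetical SSA definition `name = op0 CODE op1`.  */
struct toy_def
{
  enum toy_op code;
  int op0, op1;
};

/* Hypothetical GIMPLE_COND `if (lhs CODE rhs_const)`.  */
struct toy_gcond
{
  enum toy_op code;
  const struct toy_def *lhs_def;  /* definition of the boolean lhs */
  int rhs_const;
};

/* True if the proposed simplification `op0 CODE op1` is exactly the
   definition of the lhs of `lhs != 0`, i.e. folding would recreate it.  */
static bool
toy_would_recreate (const struct toy_gcond *cond, enum toy_op code,
                    int op0, int op1)
{
  return cond->code == TOY_NE
         && cond->rhs_const == 0
         && cond->lhs_def != NULL
         && cond->lhs_def->code == code
         && cond->lhs_def->op0 == op0
         && cond->lhs_def->op1 == op1;
}

#define TOY_MAX_COPIES 8

static int
toy_count_new_temps (int calls, bool with_check)
{
  static const struct toy_def original = { TOY_LT, 0, 1 };
  struct toy_def copies[TOY_MAX_COPIES];
  struct toy_gcond cond = { TOY_NE, &original, 0 };
  int created = 0;

  for (int i = 0; i < calls && created < TOY_MAX_COPIES; ++i)
    {
      /* The simplifier proposes `a < b` for `tmp != 0` every time.  */
      if (with_check && toy_would_recreate (&cond, TOY_LT, 0, 1))
        continue;
      copies[created] = (struct toy_def) { TOY_LT, 0, 1 };
      cond.lhs_def = &copies[created];
      ++created;
    }
  return created;
}
```

Without the check, five fold calls materialize five new temporaries; with it, the condition is left alone and none are created.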

gcc/ChangeLog:

* gimple-fold.cc (replace_stmt_with_simplification): Reject for
noncall exceptions replacing comparison with itself.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-fold.cc | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 7b3a3d30045..4ff5dbb8d50 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -6276,6 +6276,32 @@ replace_stmt_with_simplification (gimple_stmt_iterator *gsi,
}
   else if (!inplace)
{
+ /* For throwing comparisons, see if the GIMPLE_COND is the same as
+the comparison would be.
+This can happen due to the match pattern for
+`(ne (cmp @0 @1) integer_zerop)` which creates a new expression
+for the comparison.  */
+ if (TREE_CODE_CLASS (code) == tcc_comparison
+ && flag_exceptions
+ && cfun->can_throw_non_call_exceptions
+ && operation_could_trap_p (code,
+FLOAT_TYPE_P (TREE_TYPE (ops[0])),
+false, NULL_TREE))
+   {
+ tree lhs = gimple_cond_lhs (cond_stmt);
+ if (gimple_cond_code (cond_stmt) == NE_EXPR
+ && TREE_CODE (lhs) == SSA_NAME
+ && TREE_CODE (TREE_TYPE (lhs)) == BOOLEAN_TYPE
+ && integer_zerop (gimple_cond_rhs (cond_stmt)))
+   {
+ gimple *s = SSA_NAME_DEF_STMT (lhs);
+ if (is_gimple_assign (s)
+ && gimple_assign_rhs_code (s) == code
+ && operand_equal_p (gimple_assign_rhs1 (s), ops[0])
+ && operand_equal_p (gimple_assign_rhs2 (s), ops[1]))
+   return false;
+   }
+   }
  tree res = maybe_push_res_to_seq (res_op, seq);
  if (!res)
return false;
-- 
2.43.0


