Re: [PATCH] MATCH: Fix patterns of type (a != b) and (a == b) [PR117760]

2025-05-16 Thread Richard Biener
On Tue, Apr 15, 2025 at 8:27 AM Eikansh Gupta  wrote:
>
> The patterns can be simplified as shown below:
>
> (a != b) & ((a|b) != 0)  -> (a != b)
> (a != b) | ((a|b) != 0)  -> ((a|b) != 0)
>
> A similar simplification exists for (a == b). This patch adds
> simplifications for the above patterns. The forwprop pass was rewriting
> the patterns into other forms that were not getting simplified; the
> patch adds simplifications for those forms as well.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> PR 117760
>
> gcc/ChangeLog:
>
> * match.pd ((a != b) and/or ((a | b) != 0)): New pattern.
>   ((a == b) and/or (a | b) == 0): New pattern.
>   ((a == b) & (a | b) == 0): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/pr117760-1.c: New test.
> * gcc.dg/tree-ssa/pr117760-2.c: New test.
> * gcc.dg/tree-ssa/pr117760.c: New test.
>
> Signed-off-by: Eikansh Gupta 
> ---
>  gcc/match.pd   | 58 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr117760-1.c | 51 +++
>  gcc/testsuite/gcc.dg/tree-ssa/pr117760-2.c | 37 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr117760.c   | 51 +++
>  4 files changed, 197 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr117760-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr117760-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr117760.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5c679848bdf..291c08d5882 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4852,6 +4852,64 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> && expr_no_side_effects_p (@1))
> @2)))
>
> +(for cmp (eq ne)
> + (for bitop (bit_ior bit_and)
> +  /* ((a != b) & ((a | b) != 0)) -> (a != b) */
> +  /* ((a != b) | ((a | b) != 0)) -> ((a|b) != 0) */
> +  /* ((a == b) & (a | b) == 0) -> ((a|b) == 0) */
> +  /* ((a == b) | (a | b) == 0) -> (a == b) */
> +  (simplify
> +   (bitop:c
> +(cmp@3 @1 @2)
> + (cmp@4 (bit_ior @1 @2) integer_zerop))
> +   (if ((cmp == EQ_EXPR) ^ (bitop == BIT_IOR_EXPR))
> +@4 @3))

These two-level cases look OK to me, can you split this out to
a separate patch please?

> +
> +  /* ((a == b) | (a == 0) | (b == 0)) -> (a == b) */

a = 1, b = 0:  0 | 0 | 1 == 1 != (1 == 0)

so this looks wrong?

> +  /* ((a == b) & ((a == 0) | (b == 0))) -> ((a|b) == 0) */

How often do these three-level cases happen?  I wonder if they
are not better handled in tree-ssa-reassoc.cc given that would
handle an arbitrary number of cases.  In fact it might already
handle some?

> +  (simplify
> +   (bitop:c
> +(eq:c@3 @1 @2)
> + (bit_ior (eq @1 integer_zerop) (eq @2 integer_zerop)))
> +   (if (bitop == BIT_IOR_EXPR)
> +@3 (eq (bit_ior @1 @2) { build_zero_cst (TREE_TYPE (@1)); }
> +
> +  /* (((a == b) & (a == 0)) | (b == 0)) -> ((a|b) == 0) */
> +  /* (((a != b) & (a != 0)) | (b != 0)) -> (a != b) */
> +  (simplify
> +   (bit_ior:c (bit_and (cmp @1 integer_zerop) (cmp:c @1 @2))
> +(cmp @2 integer_zerop))
> +   (if (cmp == EQ_EXPR)
> +(eq (bit_ior @1 @2) { build_zero_cst (TREE_TYPE (@1)); })
> + (ne @1 @2
> +
> +/* ((a != b) | (a != 0) | (b != 0)) -> ((a|b) != 0) */
> +(simplify
> + (bit_ior:c
> +  (ne @1 @2)
> +   (bit_ior:c (ne @1 integer_zerop) (ne @2 integer_zerop)))
> + (ne (bit_ior @1 @2) { build_zero_cst (TREE_TYPE (@1)); }))
> +
> +/* ((a != b) & ((a | b) == 0)) -> false */
> +(simplify
> + (bit_and:c
> +  (ne @1 @2)
> +   (eq (bit_ior @1 @2) integer_zerop))
> + { constant_boolean_node (false, type); })
> +
> +/* ((a != b) & ((a == 0) | (b == 0))) -> false */
> +(simplify
> + (bit_and:c
> +  (ne:c@3 @1 @2)
> +   (bit_ior:c (eq @1 integer_zerop) (eq @2 integer_zerop)))
> + { constant_boolean_node (false, type); })
> +
> +/* (((a != b) & (a == 0)) | (b == 0)) -> false */
> +(simplify
> + (bit_ior:c (bit_and:c (eq @1 integer_zerop) (ne:c @1 @2))
> +  (eq @2 integer_zerop))
> + { constant_boolean_node (false, type); })
> +
>  /* Simplifications of shift and rotates.  */
>
>  (for rotate (lrotate rrotate)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr117760-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr117760-1.c
> new file mode 100644
> index 000..94a05f20ff0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr117760-1.c
> @@ -0,0 +1,51 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-optimized-raw" } */
> +
> +int f1(int a, int b)
> +{
> +  if (a == 0 || b == 0)
> +return a == b;
> +  return 0;
> +}
> +
> +int f2(int a, int b)
> +{
> +  int c = a == 0;
> +  int d = b == 0;
> +  int e = a == b;
> +  return c|d|e;
> +}
> +
> +int f3(int a, int b)
> +{
> +  int c = a == 0;
> +  int d = b == 0;
> +  int e = a == b;
> +  return (c|d)&e;
> +}
> +
> +int f4(int a, int b)
> +{
> +  int c = a == 0;
> +  int d = b == 0;
> +  int e = a == b;
> +  return c|d&e;
> +}
> +
> +int f5(int a, int b)
> +{
> +  int c = (a|b) == 0;
> +  int e = a == b;
> +  ret

Re: [PATCH] c++: Further simplify the stdlib inline folding

2025-05-16 Thread Jason Merrill

On 5/15/25 4:58 PM, Ville Voutilainen wrote:

On Thu, 15 May 2025 at 18:32, Ville Voutilainen
 wrote:


On Thu, 15 May 2025 at 18:19, Jason Merrill  wrote:


@@ -3347,8 +3347,6 @@ cp_fold (tree x, fold_flags_t flags)
   || id_equal (DECL_NAME (callee), "as_const")))
 {
   r = CALL_EXPR_ARG (x, 0);
- if (!same_type_p (TREE_TYPE (x), TREE_TYPE (r)))
-   r = build_nop (TREE_TYPE (x), r);


This is removing the conversion entirely; I'm rather surprised it didn't
break anything.  I thought you were thinking to make the build_nop
unconditional.


Oops. Yes, that makes more sense. I am confused about how that build_nop
actually works, but it should indeed convert r to the type of x, not be
removed entirely. Re-doing...


So, let's try this again. As discussed privately, the effect of that
build_nop is hard to trigger in a test (to the point where we don't know
how), but this patch makes more sense.

Tested on Linux-PPC64 (gcc112). Ok for trunk?


OK.


   Further simplify the stdlib inline folding

gcc/cp/ChangeLog:
* cp-gimplify.cc (cp_fold): Do the conversion
unconditionally, even for same-type cases.

gcc/ChangeLog:
* doc/invoke.texi: Add to_underlying to -ffold-simple-inlines.




Re: [PATCH] c++/modules: Clean up importer_interface

2025-05-16 Thread Jason Merrill

On 5/16/25 9:14 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?


OK.


-- >8 --

This patch removes some no-longer-needed special casing in linkage
determination, and distinguishes between "always_emit" and "internal"
for better future-proofing.

gcc/cp/ChangeLog:

* module.cc (importer_interface): Adjust flags.
(get_importer_interface): Rename flags.
(trees_out::core_bools): Clean up special casing.
(trees_out::write_function_def): Rename flag.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc | 50 +---
  1 file changed, 18 insertions(+), 32 deletions(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 4f9c3788380..200e1c2deb3 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -5546,8 +5546,10 @@ trees_in::start (unsigned code)
  
  enum class importer_interface {

unknown,  /* The definition may or may not need to be emitted.  */
-  always_import,  /* The definition can always be found in another TU.  */
-  always_emit,   /* The definition must be emitted in the importer's TU. */
+  external,  /* The definition can always be found in another TU.  */
+  internal,  /* The definition should be emitted in the importer's TU.  */
+  always_emit,   /* The definition must be emitted in the importer's TU,
+regardless of if it's used or not. */
  };
  
  /* Returns what kind of interface an importer will have of DECL.  */

@@ -5558,13 +5560,13 @@ get_importer_interface (tree decl)
/* Internal linkage entities must be emitted in each importer if
   there is a definition available.  */
if (!TREE_PUBLIC (decl))
-return importer_interface::always_emit;
+return importer_interface::internal;
  
-  /* Entities that aren't vague linkage are either not definitions or

- will be emitted in this TU, so importers can just refer to an
- external definition.  */
+  /* Other entities that aren't vague linkage are either not definitions
+ or will be publicly emitted in this TU, so importers can just refer
+ to an external definition.  */
if (!vague_linkage_p (decl))
-return importer_interface::always_import;
+return importer_interface::external;
  
/* For explicit instantiations, importers can always rely on there

   being a definition in another TU, unless this is a definition
@@ -5574,13 +5576,13 @@ get_importer_interface (tree decl)
&& DECL_EXPLICIT_INSTANTIATION (decl))
  return (header_module_p () && !DECL_EXTERNAL (decl)
? importer_interface::always_emit
-   : importer_interface::always_import);
+   : importer_interface::external);
  
/* A gnu_inline function is never emitted in any TU.  */

if (TREE_CODE (decl) == FUNCTION_DECL
&& DECL_DECLARED_INLINE_P (decl)
&& lookup_attribute ("gnu_inline", DECL_ATTRIBUTES (decl)))
-return importer_interface::always_import;
+return importer_interface::external;
  
/* Everything else has vague linkage.  */

return importer_interface::unknown;
@@ -5722,29 +5724,13 @@ trees_out::core_bools (tree t, bits_out& bits)
   DECL_NOT_REALLY_EXTERN -> base.not_really_extern
 == that was a lie, it is here  */
  
+	/* decl_flag_1 is DECL_EXTERNAL. Things we emit here, might

+  well be external from the POV of an importer.  */
bool is_external = t->decl_common.decl_flag_1;
-   if (!is_external)
- /* decl_flag_1 is DECL_EXTERNAL. Things we emit here, might
-well be external from the POV of an importer.  */
- // FIXME: Do we need to know if this is a TEMPLATE_RESULT --
- // a flag from the caller?
- switch (code)
-   {
-   default:
- break;
-
-   case VAR_DECL:
- if (TREE_PUBLIC (t)
- && DECL_VTABLE_OR_VTT_P (t))
-   /* We handle vtable linkage specially.  */
-   is_external = true;
- gcc_fallthrough ();
-   case FUNCTION_DECL:
- if (get_importer_interface (t)
- == importer_interface::always_import)
-   is_external = true;
- break;
-   }
+   if (!is_external
+   && VAR_OR_FUNCTION_DECL_P (t)
+   && get_importer_interface (t) == importer_interface::external)
+ is_external = true;
WB (is_external);
}
  
@@ -12651,7 +12637,7 @@ trees_out::write_function_def (tree decl)

/* Whether the importer should emit this definition, if used.  */
flags |= 1 * (DECL_NOT_REALLY_EXTERN (decl)
&& (get_importer_interface (decl)
-   != importer_interface::always_import));
+   != importer_interface::external));
  
if (f)

{




Re: [PATCH v3] libstdc++: Implement C++26 function_ref [PR119126]

2025-05-16 Thread Patrick Palka
On Thu, 15 May 2025, Tomasz Kamiński wrote:

> This patch implements C++26 function_ref as specified in P0792R14,
> with correction for constraints for constructor accepting nontype_t
> parameter from LWG 4256.
> 
> As function_ref may store a pointer to the const object, __Ptrs::_M_obj is
> changed to const void*, so again we do not cast away const from const
> objects. To help with the necessary casts, a __polyfunc::__cast_to helper
> is added that accepts either a reference or the target type directly.
> 
> The _Invoker now defines additional call methods used by function_ref:
> _S_ptrs() for invoking a target passed by reference, and _S_nttp, _S_bind_ptr,
> _S_bind_ref for handling constructors accepting nontype_t. The existing
> _S_call_storage is changed to a thin wrapper that initializes _Ptrs and
> forwards to _S_call_ptrs.
> 
> This removed most uses of _Storage::_M_ptr and _Storage::_M_ref,
> so these functions were removed, and the _Manager uses were adjusted.
> 
> Finally, we make function_ref available in freestanding mode. As
> move_only_function and copyable_function are currently only available in
> hosted mode, we define _Manager and _Mo_base only if either
> __glibcxx_move_only_function or __glibcxx_copyable_function is defined.
> 
>   PR libstdc++/119126
> 
> libstdc++-v3/ChangeLog:
> 
>   * doc/doxygen/stdheader.cc: Added funcref_impl.h file.
>   * include/Makefile.am: Added funcref_impl.h file.
>   * include/Makefile.in: Added funcref_impl.h file.
>   * include/bits/funcref_impl.h: New file.
>   * include/bits/funcwrap.h (_Ptrs::_M_obj): Const-qualify.
>   (_Storage::_M_ptr, _Storage::_M_ref): Remove.
>   (__polyfunc::__cast_to): Define.
>   (_Base_invoker::_S_ptrs, _Base_invoker::_S_nttp)
>   (_Base_invoker::_S_bind_ptrs, _Base_invoker::_S_bind_ref)
>   (_Base_invoker::_S_call_ptrs): Define.
>   (_Base_invoker::_S_call_storage): Forward to _S_call_ptrs.
>   (_Manager::_S_local, _Manager::_S_ptr): Adjust for _M_obj being
>   const qualified.
>   (__polyfunc::_Manager, __polyfunc::_Mo_base): Guard with
>   __glibcxx_move_only_function || __glibcxx_copyable_function.
>   (__polyfunc::__skip_first_arg, __polyfunc::__deduce_funcref)
>   (std::function_ref) [__glibcxx_function_ref]: Define.
>   * include/bits/utility.h (std::nontype_t, std::nontype)
>   (__is_nontype_v) [__glibcxx_function_ref]: Define.
>   * include/bits/version.def: Define function_ref.
>   * include/bits/version.h: Regenerate.
>   * include/std/functional: Define __cpp_lib_function_ref.
>   * src/c++23/std.cc.in (std::nontype_t, std::nontype)
>   (std::function_ref) [__cpp_lib_function_ref]: Export.
>   * testsuite/20_util/function_ref/assign.cc: New test.
>   * testsuite/20_util/function_ref/call.cc: New test.
>   * testsuite/20_util/function_ref/cons.cc: New test.
>   * testsuite/20_util/function_ref/cons_neg.cc: New test.
>   * testsuite/20_util/function_ref/conv.cc: New test.
>   * testsuite/20_util/function_ref/deduction.cc: New test.
> ---
> This revision handles using nontype with function pointers/references.
> In function_ref we now use the _M_init(ptr) function, which handles
> both function and object pointers. The __polyfunc::__cast_to is
> also expanded to handle function types.
> 
>  libstdc++-v3/doc/doxygen/stdheader.cc |   1 +
>  libstdc++-v3/include/Makefile.am  |   1 +
>  libstdc++-v3/include/Makefile.in  |   1 +
>  libstdc++-v3/include/bits/funcref_impl.h  | 198 
>  libstdc++-v3/include/bits/funcwrap.h  | 185 +++
>  libstdc++-v3/include/bits/utility.h   |  17 ++
>  libstdc++-v3/include/bits/version.def |   8 +
>  libstdc++-v3/include/bits/version.h   |  10 +
>  libstdc++-v3/include/std/functional   |   3 +-
>  libstdc++-v3/src/c++23/std.cc.in  |   7 +
>  .../testsuite/20_util/function_ref/assign.cc  | 108 +
>  .../testsuite/20_util/function_ref/call.cc| 186 +++
>  .../testsuite/20_util/function_ref/cons.cc| 218 ++
>  .../20_util/function_ref/cons_neg.cc  |  30 +++
>  .../testsuite/20_util/function_ref/conv.cc| 152 
>  .../20_util/function_ref/deduction.cc | 103 +
>  16 files changed, 1181 insertions(+), 47 deletions(-)
>  create mode 100644 libstdc++-v3/include/bits/funcref_impl.h
>  create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/assign.cc
>  create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/call.cc
>  create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/cons.cc
>  create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/cons_neg.cc
>  create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/conv.cc
>  create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/deduction.cc
> 
> diff --git a/libstdc++-v3/doc/doxygen/stdheader.cc 
> b/libstdc++-v3/doc/doxyge

[PATCH v6 1/3][Middle-end] Provide more contexts for -Warray-bounds, -Wstringop-*warning messages due to code movements from compiler transformation (Part 1) [PR109071, PR85788, PR88771, PR106762, PR1

2025-05-16 Thread Qing Zhao
Control this with a new option -fdiagnostics-details.

$ cat t.c
extern void warn(void);
static inline void assign(int val, int *regs, int *index)
{
  if (*index >= 4)
warn();
  *regs = val;
}
struct nums {int vals[4];};

void sparx5_set (int *ptr, struct nums *sg, int index)
{
  int *val = &sg->vals[index];

  assign(0,ptr, &index);
  assign(*val, ptr, &index);
}

$ gcc -Wall -O2  -c -o t.o t.c
t.c: In function ‘sparx5_set’:
t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ 
[-Warray-bounds=]
   12 |   int *val = &sg->vals[index];
  |   ^~~
t.c:8:18: note: while referencing ‘vals’
8 | struct nums {int vals[4];};
  |  ^~~~

In the above, although the warning is correct in theory, the message
itself is confusing to the end user, since it contains information that
cannot be connected to the source code directly.

It would be a nice improvement to add more information to the warning
message, reporting where such an index value comes from.

In order to achieve this, we add a new data structure "move_history" to record
1. the "condition" that triggers the code movement;
2. whether the code movement is on the true path of the "condition";
3. the "compiler transformation" that triggers the code movement.

Whenever there is code movement along the control flow graph due to
specific transformations, such as jump threading, path isolation, tree
sinking, etc., a move_history structure is created and attached to the
moved gimple statement.

During array out-of-bound checking or -Wstringop-* warning checking, the
"move_history" that was attached to the gimple statement is used to form
a sequence of diagnostic events that are added to the corresponding rich
location to be used to report the warning message.

This behavior is controlled by the new option -fdiagnostics-details,
which is off by default.

With this change, by adding -fdiagnostics-details,
the warning message for the above testing case is now:

$ gcc -Wall -O2 -fdiagnostics-details -c -o t.o t.c
t.c: In function ‘sparx5_set’:
t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ 
[-Warray-bounds=]
   12 |   int *val = &sg->vals[index];
  |   ^~~
  ‘sparx5_set’: events 1-2
4 |   if (*index >= 4)
  |  ^
  |  |
  |  (1) when the condition is evaluated to true
..
   12 |   int *val = &sg->vals[index];
  |   ~~~
  |   |
  |   (2) out of array bounds here
t.c:8:18: note: while referencing ‘vals’
8 | struct nums {int vals[4];};
  |  ^~~~

The change was divided into 3 parts:

Part 1: Add new data structure move_history, record move_history during
transformation;
Part 2: In warning analysis, use the new move_history to form a rich
location with a sequence of events, to report more context for
the warnings.
Part 3: Add debugging mechanism for move_history.

PR tree-optimization/109071
PR tree-optimization/85788
PR tree-optimization/88771
PR tree-optimization/106762
PR tree-optimization/108770
PR tree-optimization/115274
PR tree-optimization/117179

gcc/ChangeLog:

* Makefile.in (OBJS): Add diagnostic-move-history.o.
* common.opt (fdiagnostics-details): New option.
* doc/invoke.texi (fdiagnostics-details): Add
documentation for the new option.
* gimple-iterator.cc (gsi_remove): Remove the move
history when removing the gimple.
* gimple-pretty-print.cc (pp_gimple_stmt_1): Emit MV_H marking
if the gimple has a move_history.
* gimple-ssa-isolate-paths.cc (isolate_path): Set move history
for the gimples of the duplicated blocks.
* tree-ssa-sink.cc (sink_code_in_bb): Create move_history for
stmt when it is sunk.
* toplev.cc (toplev::finalize):  Call move_history_finalize.
* tree-ssa-threadupdate.cc (ssa_redirect_edges): Create move_history
for stmts when they are duplicated.
(back_jt_path_registry::duplicate_thread_path): Likewise.
* diagnostic-move-history.cc: New file.
* diagnostic-move-history.h: New file.

gcc/testsuite/ChangeLog:

* gcc.dg/pr117375.c: New test.
---
 gcc/Makefile.in |   1 +
 gcc/common.opt  |   4 +
 gcc/diagnostic-move-history.cc  | 265 
 gcc/diagnostic-move-history.h   |  92 +++
 gcc/doc/invoke.texi |  11 ++
 gcc/gimple-iterator.cc  |   3 +
 gcc/gimple-pretty-print.cc  |   4 +
 gcc/gimple-ssa-isolate-paths.cc |  11 ++
 gcc/testsuite/gcc.dg/pr117375.c |  13 ++
 gcc/toplev.cc   |   3 +
 gcc/tree-ssa-sink.cc|  57 +++
 gcc/tree-ssa-threadupdate.cc|  25 +++
 12 files changed, 489 insertions(+)
 create mode 100644 gcc/

[PATCH v6 0/3][Middle-end]Provide more contexts for -Warray-bounds and -Wstringop-* warning messages

2025-05-16 Thread Qing Zhao
Hi,

This is the 6th version of the patches for fixing PR109071.

Adding -fdiagnostics-details into GCC to provide more hints to the
end users on where the warnings come from, in order to help the user
locate the exact spot in the source code responsible for the specific
warnings caused by compiler optimizations.

Compared to the 5th version, the only change is a slight adjustment
to the new move-history-rich-location.[h|cc] due to an additional
parameter to simple_diagnostic_path in r16-413.

Everything else is unchanged.

Bootstrapped and regression-tested on both x86 and aarch64; no issues.

Kees and Sam have been using this option for a while in the Linux
kernel and other applications, and both found it very helpful.

They have asked me several times about the status of this work and hope
the functionality can be available in GCC as soon as possible.

The diagnostic part of the patch had been reviewed and approved by
David already.

Please review the middle-end part of the change.

thanks a lot.

Qing

===

The latest version of(5th version) is:

https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680336.html
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680337.html
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680338.html
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680339.html

The major improvements to this patch compared to version 3 are:

1. Divide the patch into 3 parts:
   Part 1: Add new data structure move_history, record move_history during
   transformation;
   Part 2: In warning analysis, use the new move_history to form a rich
   location with a sequence of events, to report more context for
   the warnings.
   Part 3: Add debugging mechanism for move_history.

2. Major change to the above Part 2, completely rewritten based on David's
  new class lazy_diagnostic_path.

3. Fix all issues identified by Sam:
  A. Fix PR117375 (a bug in tree-ssa-sink.cc);
  B. Documentation clarification;
  C. Add all the duplicated PRs in the commit comments.

4. Bootstrapping GCC with the new -fdiagnostics-details on by default
  (Init (1)) exposed some ICEs similar to PR117375 in tree-ssa-sink.cc;
  fixed.



[PATCH v6 2/3][Middle-end] Provide more contexts for -Warray-bounds, -Wstringop-*warning messages due to code movements from compiler transformation (Part 2) [PR109071, PR85788, PR88771, PR106762, PR1

2025-05-16 Thread Qing Zhao
During array out-of-bound checking or -Wstringop-* warning checking, the
"move_history" that was attached to the gimple statement is used to form
a sequence of diagnostic events that are added to the corresponding rich
location to be used to report the warning message.

PR tree-optimization/109071
PR tree-optimization/85788
PR tree-optimization/88771
PR tree-optimization/106762
PR tree-optimization/108770
PR tree-optimization/115274
PR tree-optimization/117179

gcc/ChangeLog:

* Makefile.in (OBJS): Add move-history-rich-location.o.
* gimple-array-bounds.cc (check_out_of_bounds_and_warn): Add
one new parameter. Use rich location with details for warning_at.
(array_bounds_checker::check_array_ref): Use rich location with
details for warning_at.
(array_bounds_checker::check_mem_ref): Add one new parameter.
Use rich location with details for warning_at.
(array_bounds_checker::check_addr_expr): Use rich location with
move_history_diagnostic_path for warning_at.
(array_bounds_checker::check_array_bounds): Call check_mem_ref with
one more parameter.
* gimple-array-bounds.h: Update prototype for check_mem_ref.
* gimple-ssa-warn-restrict.cc (maybe_diag_access_bounds): Use
rich location with details for warning_at.
* gimple-ssa-warn-access.cc (warn_string_no_nul): Likewise.
(maybe_warn_nonstring_arg): Likewise.
(maybe_warn_for_bound): Likewise.
(warn_for_access): Likewise.
(check_access): Likewise.
(pass_waccess::check_strncat): Likewise.
(pass_waccess::maybe_check_access_sizes): Likewise.
* move-history-rich-location.cc: New file.
* move-history-rich-location.h: New file.

gcc/testsuite/ChangeLog:

* gcc.dg/pr109071.c: New test.
* gcc.dg/pr109071_1.c: New test.
* gcc.dg/pr109071_2.c: New test.
* gcc.dg/pr109071_3.c: New test.
* gcc.dg/pr109071_4.c: New test.
* gcc.dg/pr109071_5.c: New test.
* gcc.dg/pr109071_6.c: New test.
---
 gcc/Makefile.in   |   1 +
 gcc/gimple-array-bounds.cc|  39 +
 gcc/gimple-array-bounds.h |   2 +-
 gcc/gimple-ssa-warn-access.cc | 131 +-
 gcc/gimple-ssa-warn-restrict.cc   |  25 +++---
 gcc/move-history-rich-location.cc |  57 +
 gcc/move-history-rich-location.h  |  72 
 gcc/testsuite/gcc.dg/pr109071.c   |  43 ++
 gcc/testsuite/gcc.dg/pr109071_1.c |  36 
 gcc/testsuite/gcc.dg/pr109071_2.c |  50 
 gcc/testsuite/gcc.dg/pr109071_3.c |  42 ++
 gcc/testsuite/gcc.dg/pr109071_4.c |  41 ++
 gcc/testsuite/gcc.dg/pr109071_5.c |  33 
 gcc/testsuite/gcc.dg/pr109071_6.c |  49 +++
 14 files changed, 538 insertions(+), 83 deletions(-)
 create mode 100644 gcc/move-history-rich-location.cc
 create mode 100644 gcc/move-history-rich-location.h
 create mode 100644 gcc/testsuite/gcc.dg/pr109071.c
 create mode 100644 gcc/testsuite/gcc.dg/pr109071_1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr109071_2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr109071_3.c
 create mode 100644 gcc/testsuite/gcc.dg/pr109071_4.c
 create mode 100644 gcc/testsuite/gcc.dg/pr109071_5.c
 create mode 100644 gcc/testsuite/gcc.dg/pr109071_6.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 38dfb688e60..48f2d578019 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1622,6 +1622,7 @@ OBJS = \
mcf.o \
mode-switching.o \
modulo-sched.o \
+   move-history-rich-location.o \
multiple_target.o \
omp-offload.o \
omp-expand.o \
diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
index 22286cbb4cc..79236e6b9c7 100644
--- a/gcc/gimple-array-bounds.cc
+++ b/gcc/gimple-array-bounds.cc
@@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public 
License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
+#define INCLUDE_MEMORY
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -31,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-dfa.h"
 #include "fold-const.h"
 #include "diagnostic-core.h"
+#include "move-history-rich-location.h"
 #include "intl.h"
 #include "tree-vrp.h"
 #include "alloc-pool.h"
@@ -262,6 +264,7 @@ get_up_bounds_for_array_ref (tree ref, tree *decl,
 
 static bool
 check_out_of_bounds_and_warn (location_t location, tree ref,
+ gimple *stmt,
  tree low_sub_org, tree low_sub, tree up_sub,
  tree up_bound, tree up_bound_p1,
  const irange *vr,
@@ -275,12 +278,13 @@ check_out_of_bounds_and_warn (location_t location, tree 
ref,
   bool warned = false;
   *out_of_bound = false;
 
+  rich_location_with_details r

[PATCH v6 3/3][Middle-end] Provide more contexts for -Warray-bounds, -Wstringop-* warning messages due to code movements from compiler transformation (Part 3) [PR109071, PR85788, PR88771, PR106762, PR

2025-05-16 Thread Qing Zhao
Add debugging for move history.

PR tree-optimization/109071
PR tree-optimization/85788
PR tree-optimization/88771
PR tree-optimization/106762
PR tree-optimization/108770
PR tree-optimization/115274
PR tree-optimization/117179

gcc/ChangeLog:

* diagnostic-move-history.cc (dump_move_history): New routine.
(dump_move_history_for): Likewise.
(debug_mv_h): Likewise.
* diagnostic-move-history.h (dump_move_history): New prototype.
(dump_move_history_for): Likewise.
* gimple-ssa-isolate-paths.cc (isolate_path): Add debugging message
when setting move history for statements.
* tree-ssa-sink.cc (sink_code_in_bb): Likewise.
* tree-ssa-threadupdate.cc (ssa_redirect_edges): Likewise.
(back_jt_path_registry::duplicate_thread_path): Likewise.
---
 gcc/diagnostic-move-history.cc  | 67 +
 gcc/diagnostic-move-history.h   |  2 +
 gcc/gimple-ssa-isolate-paths.cc | 10 +
 gcc/tree-ssa-sink.cc| 10 -
 gcc/tree-ssa-threadupdate.cc| 18 +
 5 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/gcc/diagnostic-move-history.cc b/gcc/diagnostic-move-history.cc
index 83d8a42b577..045b0d3d5d1 100644
--- a/gcc/diagnostic-move-history.cc
+++ b/gcc/diagnostic-move-history.cc
@@ -25,6 +25,7 @@
 #include "backend.h"
 #include "tree.h"
 #include "gimple.h"
+#include "tree-pretty-print.h"
 #include "gimple-iterator.h"
 #include "cfganal.h"
 #include "diagnostic-move-history.h"
@@ -263,3 +264,69 @@ set_move_history_to_stmts_in_bb (basic_block bb, edge 
entry,
 
   return true;
 }
+
+/* Dump the move_history data structure MV_HISTORY.  */
+
+void
+dump_move_history (FILE *file, move_history_t mv_history)
+{
+  fprintf (file, "The move history is: \n");
+  if (!mv_history)
+{
+  fprintf (file, "No move history.\n");
+  return;
+}
+
+  for (move_history_t cur_ch = mv_history; cur_ch;
+   cur_ch = cur_ch->prev_move)
+{
+  expanded_location exploc_cond = expand_location (cur_ch->condition);
+
+  if (exploc_cond.file)
+   fprintf (file, "[%s:", exploc_cond.file);
+  fprintf (file, "%d, ", exploc_cond.line);
+  fprintf (file, "%d] ", exploc_cond.column);
+
+  fprintf (file, "%s ", cur_ch->is_true_path ? "true" : "false");
+  const char *reason = NULL;
+  switch (cur_ch->reason)
+   {
+   case COPY_BY_THREAD_JUMP:
+ reason = "copy_by_thread_jump";
+ break;
+   case COPY_BY_ISOLATE_PATH:
+ reason = "copy_by_isolate_path";
+ break;
+   case MOVE_BY_SINK:
+ reason = "move_by_sink";
+ break;
+   default:
+ reason = "UNKNOWN";
+ break;
+   }
+  fprintf (file, "%s \n", reason);
+}
+}
+
+/* Dump the move_history data structure attached to the gimple STMT.  */
+void
+dump_move_history_for (FILE *file, const gimple *stmt)
+{
+  move_history_t mv_history = get_move_history (stmt);
+  if (!mv_history)
+fprintf (file, "No move history.\n");
+  else
+dump_move_history (file, mv_history);
+}
+
+DEBUG_FUNCTION void
+debug_mv_h (const move_history_t mv_history)
+{
+  dump_move_history (stderr, mv_history);
+}
+
+DEBUG_FUNCTION void
+debug_mv_h (const gimple * stmt)
+{
+  dump_move_history_for (stderr, stmt);
+}
diff --git a/gcc/diagnostic-move-history.h b/gcc/diagnostic-move-history.h
index 9a58766d544..0c56974119d 100644
--- a/gcc/diagnostic-move-history.h
+++ b/gcc/diagnostic-move-history.h
@@ -88,5 +88,7 @@ extern bool set_move_history_to_stmt (gimple *, edge,
of the entry edge.  */
 extern bool set_move_history_to_stmts_in_bb (basic_block, edge,
 bool, enum move_reason);
+extern void dump_move_history (FILE *, move_history_t);
+extern void dump_move_history_for (FILE *, const gimple *);
 
 #endif // DIAGNOSTIC_MOVE_HISTORY_H
diff --git a/gcc/gimple-ssa-isolate-paths.cc b/gcc/gimple-ssa-isolate-paths.cc
index 14c86590b17..bbaba09e192 100644
--- a/gcc/gimple-ssa-isolate-paths.cc
+++ b/gcc/gimple-ssa-isolate-paths.cc
@@ -176,6 +176,16 @@ isolate_path (basic_block bb, basic_block duplicate,
  incoming edge.  */
   if (flag_diagnostics_details)
 {
+  if (dump_file)
+   {
+ fprintf (dump_file, "Set move history for stmts of B[%d]"
+  " as not on the destination of the edge\n",
+  bb->index);
+ fprintf (dump_file, "Set move history for stmts of B[%d]"
+  " as on the destination of the edge\n",
+  duplicate->index);
+   }
+
   set_move_history_to_stmts_in_bb (bb, e, false, COPY_BY_ISOLATE_PATH);
   set_move_history_to_stmts_in_bb (duplicate, e,
   true, COPY_BY_ISOLATE_PATH);
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index 0b3441e894c..cc096e178df 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc

Re: 2nd Ping: Re: [Stage 1][Middle-end][PATCH v5 0/3] Provide more contexts for -Warray-bounds and -Wstringop-* warning messages

2025-05-16 Thread Qing Zhao
Hi, Richard,

> On May 14, 2025, at 09:47, Richard Biener  wrote:
> 
> On Wed, May 14, 2025 at 3:24 PM Qing Zhao  wrote:
>> 
>> Hi,
>> 
>> This patch set has been waiting for the Middle-end review for a very long 
>> time since last year.
>> 
>> Could you Please take a look and let me know whether it’s ready for GCC16?
> 
> I plan to look at this next week while idling on a train.

Thanks a lot!

 FYI, I just sent the 6th version of the patch to the gcc alias. The only
change compared to the 5th version is a slight adjustment to the new
move-history-rich-location.[h|cc], due to an additional param to
simple_diagnostic_path's constructor in r16-413. It has been approved by David.

All the Middle-end changes are kept the same as before.

Please review this 6th version. 

Looking forward to your review.

Qing

> Richard.
> 
>> Thanks a lot.
>> 
>> Qing
>> 
>> On May 1, 2025, at 10:02, Qing Zhao  wrote:
>>> 
>>> Hi,
>>> 
>>> A gentle ping on review of the Middle-end of change of this patch set.
>>> The diagnostic part has been reviewed and approved by David last year 
>>> already.
>>> 
>>> The 4th version of the patch set has been sent for review since Nov 5, 2024.
>>> Pinged 5 times since then.
>>> 
>>> Linux Kernel has been using this feature for a while, and found it very 
>>> useful.
>>> Kees has been asking for the status of this patch many many times.  -:)
>>> 
>>> We are hoping to make this into GCC16, should be a nice improvement in 
>>> general.
>>> 
>>> Please take a look and let me know whether it’s ready for GCC16?
>>> 
>>> Thanks a lot.
>>> 
>>> Qing.
>>> 
>>> For your convenience, the links to the latest version are:
>>> 
>>> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680336.html
>>> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680337.html
>>> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680339.html
>>> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680338.html
>>> 
>>> 
 On Apr 7, 2025, at 11:04, Qing Zhao  wrote:
 
 Hi,
 
 These are the patches for fixing PR109071 for GCC16 stage1:
 
 Adding -fdiagnotics-details into GCC to provide more hints to the
 end users on how the warnings come from, in order to help the user
 to locate the exact location in source code on the specific warnings
 due to compiler optimizations.
 
 They are based on the following 4th version of the patch, rebased
 on the latest trunk.
 
 Bootstrapped and regression tested on both x86 and aarch64.
 
 Kees and Sam have been using this option for a while in the Linux kernel
 and other applications, and both found it very helpful.
 
 They asked me several times about the status of this work and hope
 the functionality can be available in GCC as soon as possible.
 
 The diagnostic part of the patch had been reviewed and approved by
 David already last year.
 
 Please review the middle-end part of the change.
 
 thanks a lot.
 
 Qing
 
 ===
 
 The latest version of(4th version) is:
 https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667613.html
 https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667614.html
 https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667615.html
 https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667616.html
 
 The major improvements to this patch compared to version 3 are:
 
 1. Divide the patch into 3 parts:
  Part 1: Add new data structure move_history, record move_history during
  transformation;
  Part 2: In warning analysis, use the new move_history to form a rich
  location with a sequence of events, to report more context info
  of the warnings.
  Part 3: Add debugging mechanism for move_history.
 
 2. Major change to the above Part 2, completely rewritten based on David's
 new class lazy_diagnostic_path.
 
 3. Fix all issues identified by Sam:
 A. fix PR117375 (Bug in tree-ssa-sink.cc);
 B. documentation clarification;
 C. Add all the duplicated PRs in the commit comments;
 
 4. Bootstrap GCC with the new -fdiagnostics-details on by default (Init (1)).
 This exposed some ICEs similar to PR117375 in tree-ssa-sink.cc; fixed.
 
 Qing Zhao (3):
 Provide more contexts for -Warray-bounds, -Wstringop-*warning messages
  due to code movements from compiler transformation (Part 1)
  [PR109071,PR85788,PR88771,PR106762,PR108770,PR115274,PR117179]
 Provide more contexts for -Warray-bounds, -Wstringop-*warning messages
  due to code movements from compiler transformation (Part 2)
  [PR109071,PR85788,PR88771,PR106762,PR108770,PR115274,PR117179]
 Provide more contexts for -Warray-bounds, -Wstringop-* warning
  messages due to code movements from compiler transformation (Part 3)
  [PR109071,PR85788,PR88771,PR106762,PR108770,PR115274,PR117179]
 
 gcc/Makefil

[PATCH v2 0/2] tree-optimization: extend scalar comparison folding to vectors [PR119196]

2025-05-16 Thread Icen Zeyada
This patch generalizes existing scalar bitwise comparison simplifications to 
vector types by matching patterns of the form

```
(cmp x y) bit_and (cmp x y)
(cmp x y) bit_ior (cmp x y)
```

for vector operands. It also enables contradictory comparisons like `(x < y) &&
(x > y)` to fold to `false` in vector contexts, and always-true comparisons like
`(x <= y) || (x > y)` to fold to `true`.

Icen Zeyada (2):
  tree-simplify: unify simple_comparison ops in vec_cond for bit and/or
[PR119196]
  gimple-fold: extend vector simplification to match scalar bitwise
optimizations [PR119196]

 gcc/match.pd  | 21 +++-
 .../gcc.target/aarch64/vector-compare-5.c | 49 +++
 2 files changed, 69 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vector-compare-5.c

-- 
2.43.0



[PATCH v2 1/2] tree-simplify: unify simple_comparison ops in vec_cond for bit and/or [PR119196]

2025-05-16 Thread Icen Zeyada
Merge simple_comparison patterns under a single vec_cond_expr for bit_and
and bit_ior in the simplify pass.

Ensure that when both operands of a bit-and or bit-or are simple_comparison
results, they reside within the same vec_cond_expr rather than separate ones.
This prepares the AST so that subsequent transformations (e.g., folding the
comparisons if possible) can take effect.

PR tree-optimization/119196

gcc/ChangeLog:

* match.pd: Merge multiple vec_cond_expr in a single one for bit_and
and bit_ior.

Signed-off-by: Icen Zeyada 
---
 gcc/match.pd | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index 79485f9678a0..da60d6a22290 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6524,6 +6524,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
{ build_int_cst (integer_type_node, prec - 1);}))
 #endif
 
+(for op1 (simple_comparison)
+ (for op2 (simple_comparison)
+  (for lop (bit_and bit_ior)
+(simplify
+  (lop
+   (vec_cond @0 integer_minus_onep@2 integer_zerop@3)
+   (vec_cond @1 @2 @3))
+   (if (expand_vec_cond_expr_p (type, TREE_TYPE (@0)))
+   (vec_cond (lop @0 @1) @2 @3))
+
 (for cnd (cond vec_cond)
  /* (a != b) ? (a - b) : 0 -> (a - b) */
  (simplify
-- 
2.43.0



[PATCH v2 2/2] gimple-fold: extend vector simplification to match scalar bitwise optimizations [PR119196]

2025-05-16 Thread Icen Zeyada
Generalize existing scalar gimple_fold rules to apply the same
bitwise comparison simplifications to vector types.  Previously, an
expression like

(x < y) && (x > y)

would fold to `false` if x and y are scalars, but equivalent vector
comparisons were left untouched.  This patch enables folding of
patterns of the form

(cmp x y) bit_and (cmp x y)
(cmp x y) bit_ior (cmp x y)

for vector operands as well, ensuring consistent optimization across
all data types.

PR tree-optimization/119196

gcc/ChangeLog:

  * match.pd: Allow scalar optimizations with bitwise AND/OR to apply to 
vectors.

gcc/testsuite/ChangeLog:

  * gcc.target/aarch64/vector-compare-5.c: Add new test for vector compare 
simplification.

Signed-off-by: Icen Zeyada 
---
 gcc/match.pd  | 11 -
 .../gcc.target/aarch64/vector-compare-5.c | 49 +++
 2 files changed, 59 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vector-compare-5.c

diff --git a/gcc/match.pd b/gcc/match.pd
index da60d6a22290..cf1bf3749853 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3635,6 +3635,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(if ((TREE_CODE (@1) == INTEGER_CST
 && TREE_CODE (@2) == INTEGER_CST)
|| ((INTEGRAL_TYPE_P (TREE_TYPE (@1))
+   || (VECTOR_TYPE_P (TREE_TYPE (@1))
+   && expand_vec_cmp_expr_p (TREE_TYPE (@1), type, code2))
 || POINTER_TYPE_P (TREE_TYPE (@1)))
&& bitwise_equal_p (@1, @2)))
 (with
@@ -3712,6 +3714,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if ((TREE_CODE (@1) == INTEGER_CST
&& TREE_CODE (@2) == INTEGER_CST)
|| ((INTEGRAL_TYPE_P (TREE_TYPE (@1))
+   || (VECTOR_TYPE_P (TREE_TYPE (@1))
+   && expand_vec_cmp_expr_p (TREE_TYPE (@1), type, code2))
|| POINTER_TYPE_P (TREE_TYPE (@1)))
   && operand_equal_p (@1, @2)))
(with
@@ -3762,6 +3766,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(if ((TREE_CODE (@1) == INTEGER_CST
 && TREE_CODE (@2) == INTEGER_CST)
|| ((INTEGRAL_TYPE_P (TREE_TYPE (@1))
+   || (VECTOR_TYPE_P (TREE_TYPE (@1))
+   && expand_vec_cmp_expr_p (TREE_TYPE (@1), type, code2))
|| POINTER_TYPE_P (TREE_TYPE (@1)))
&& bitwise_equal_p (@1, @2)))
 (with
@@ -3894,7 +3900,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  rcmp (eq gt le eq ge lt)
  (simplify
   (eq:c (cmp1:c @0 @1) (cmp2 @0 @1))
-  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+   || POINTER_TYPE_P (TREE_TYPE (@0))
+  || (VECTOR_TYPE_P (TREE_TYPE (@0))
+  && expand_vec_cmp_expr_p (TREE_TYPE (@0), type,  rcmp)))
 (rcmp @0 @1
 
 /* (type)([0,1]@a != 0) -> (type)a
diff --git a/gcc/testsuite/gcc.target/aarch64/vector-compare-5.c 
b/gcc/testsuite/gcc.target/aarch64/vector-compare-5.c
new file mode 100644
index ..59ab56c4e255
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vector-compare-5.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-original-all" } */
+
+typedef int v4i __attribute__((vector_size(4*sizeof(int))));
+
+/* Ensure we can simplify `VEC_COND_EXPR(a OP1 b) OP2 VEC_COND_EXPR(a OP3 b)`
+ * into `VEC_COND_EXPR(a OP4 b)`
+ */
+
+void use (v4i const *z);
+
+void
+g (v4i *x, v4i const *y, v4i *z, v4i *t)
+{
+  *z = *x > *y | *x == *y; // expect >=
+  *t = *x > *y | *x <= *y; // expect true
+}
+
+void
+h (v4i *x, v4i const *y, v4i *z, v4i *t)
+{
+  *z = *x <= *y & *x >= *y; // expect x == y
+  *t = *x <= *y & *x != *y; // expect x < y
+}
+
+
+/* { dg-final { scan-tree-dump 
".*\\*zD\\.\\d+\\s*=\\s*VEC_COND_EXPR\\s*<\\s*\\*xD\\.\\d+\\s*>=\\s*VIEW_CONVERT_EXPR\\(\\*yD\\.\\d+\\)\\s*,\\s*\\{\\s*-1(,\\s*-1){3}\\s*\\}\\s*,\\s*\\{\\s*0(,\\s*0){3}\\s*\\}\\s*>\\s*;"
 "original" } } */
+/* { dg-final { scan-tree-dump 
".*\\*tD\\.\\d+\\s*=\\s*\\{\\s*-1(,\\s*-1){3}\\s*\\}\\s*;" "original" } } */
+/* { dg-final { scan-tree-dump 
".*\\*zD\\.\\d+\\s*=\\s*VEC_COND_EXPR\\s*<\\s*\\*xD\\.\\d+\\s*==\\s*VIEW_CONVERT_EXPR\\(\\*yD\\.\\d+\\)\\s*,\\s*\\{\\s*-1(,\\s*-1){3}\\s*\\}\\s*,\\s*\\{\\s*0(,\\s*0){3}\\s*\\}\\s*>\\s*;"
 "original" } } */
+/* { dg-final { scan-tree-dump 
".*\\*tD\\.\\d+\\s*=\\s*VEC_COND_EXPR\\s*<\\s*\\*xD\\.\\d+\\s*<\\s*VIEW_CONVERT_EXPR\\(\\*yD\\.\\d+\\)\\s*,\\s*\\{\\s*-1(,\\s*-1){3}\\s*\\}\\s*,\\s*\\{\\s*0(,\\s*0){3}\\s*\\}\\s*>\\s*;"
 "original" } } */
+/* { dg-final { scan-tree-dump 
".*\\*zD\\.\\d+\\s*=\\s*\\{\\s*-1(,\\s*-1){3}\\s*\\}\\s*;" "original" } } */
+/* { dg-final { scan-tree-dump 
".*\\*tD\\.\\d+\\s*=\\s*\\{\\s*0(,\\s*0){3}\\s*\\}\\s*;" "original" } } */
+/* { dg-final { scan-tree-dump 
".*\\*zD\\.\\d+\\s*=\\s*VEC_COND_EXPR\\s*<\\s*\\*xD\\.\\d+\\s*<=\\s*VIEW_CONVERT_EXPR\\(\\*yD\\.\\d+\\)\\s*,\\s*\\{\\s*-1(,\\s*-1){3}\\s*\\}\\s*,\\s*\\{\\s*0(,\\s*0){3}\\s*\\}\\s*>\\s*;"
 

Re: [PATCH v2 1/2] tree-simplify: unify simple_comparison ops in vec_cond for bit and/or [PR119196]

2025-05-16 Thread Andrew Pinski
On Fri, May 16, 2025 at 9:32 AM Icen Zeyada  wrote:
>
> Merge simple_comparison patterns under a single vec_cond_expr for bit_and
> and bit_ior in the simplify pass.
>
> Ensure that when both operands of a bit-and or bit-or are simple_comparison
> results, they reside within the same vec_cond_expr rather than separate ones.
> This prepares the AST so that subsequent transformations (e.g., folding the
> comparisons if possible) can take effect.
>
> PR tree-optimization/119196
>
> gcc/ChangeLog:
>
> * match.pd: Merge multiple vec_cond_expr in a single one for bit_and
> and bit_ior.
>
> Signed-off-by: Icen Zeyada 
> ---
>  gcc/match.pd | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 79485f9678a0..da60d6a22290 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -6524,6 +6524,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> { build_int_cst (integer_type_node, prec - 1);}))
>  #endif
>
> +(for op1 (simple_comparison)
> + (for op2 (simple_comparison)
> +  (for lop (bit_and bit_ior)
> +(simplify
> +  (lop
> +   (vec_cond @0 integer_minus_onep@2 integer_zerop@3)
> +   (vec_cond @1 @2 @3))
> +   (if (expand_vec_cond_expr_p (type, TREE_TYPE (@0)))
> +   (vec_cond (lop @0 @1) @2 @3))

I am trying to understand why you need op1/op2 here since they seem to
be unused?
Other than that it might make sense to extend `(a?0:-1) lop (b?0:-1)` too.
I am not sure but xor might show up; though I don't think it is as
important as &/| as you handle today.

Thanks,
Andrew

> +
>  (for cnd (cond vec_cond)
>   /* (a != b) ? (a - b) : 0 -> (a - b) */
>   (simplify
> --
> 2.43.0
>


Re: [PATCH v2] c++: Only reject virtual base data member access in __builtin_offsetof [PR118346]

2025-05-16 Thread Jason Merrill

On 5/10/25 11:41 AM, Simon Martin wrote:

The following test case highlights two issues - see
https://godbolt.org/z/7E1KGYreh:
  1. We error out at both L4 and L5, while (at least) clang, EDG and MSVC
 only reject L5
  2. Our error message for L5 incorrectly mentions using a null pointer

=== cut here ===
struct A { int i; };
struct C : virtual public A { };
void foo () {
   auto res = ((C*)0)->i;  // L4
   __builtin_offsetof (C, i);  // L5
}
=== cut here ===

Even though L4 is UB, it's technically not invalid, and this patch
transforms the error into a warning categorized under -Waddress (so that
it can be controlled by the user, and also deactivated for offsetof).

It also fixes the error message for L5 to not be confused by the
artificial null pointer created by cp_parser_builtin_offsetof.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/118346

gcc/cp/ChangeLog:

* parser.cc (cp_parser_builtin_offsetof): Temporarily disable
-Waddress warnings.
* semantics.cc (finish_offsetof): Reject accesses to members in
virtual bases.
* typeck.cc (build_class_member_access_expr): Don't error but
warn about accesses to members in virtual bases.

gcc/testsuite/ChangeLog:

* g++.dg/other/offsetof8.C: Avoid -Wnarrowing warning.
* g++.dg/other/offsetof9.C: Adjust test expectations.
* g++.dg/other/offsetof10.C: New test.

---
  gcc/cp/parser.cc| 13 ++---
  gcc/cp/semantics.cc | 28 ++-
  gcc/cp/typeck.cc| 13 +++--
  gcc/testsuite/g++.dg/other/offsetof10.C | 36 +
  gcc/testsuite/g++.dg/other/offsetof8.C  |  6 ++---
  gcc/testsuite/g++.dg/other/offsetof9.C  |  6 ++---
  6 files changed, 76 insertions(+), 26 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/other/offsetof10.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 1fb9e7fd872..43bbc69b196 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -11521,10 +11521,15 @@ cp_parser_builtin_offsetof (cp_parser *parser)
tree object_ptr
  = build_static_cast (input_location, build_pointer_type (type),
 null_pointer_node, tf_warning_or_error);
-
-  /* Parse the offsetof-member-designator.  We begin as if we saw "expr->".  */
-  expr = cp_parser_postfix_dot_deref_expression (parser, CPP_DEREF, object_ptr,
-true, &dummy, token->location);
+  {
+/* PR c++/118346: don't complain about object_ptr being null.  */
+warning_sentinel s(warn_address);
+/* Parse the offsetof-member-designator.  We begin as if we saw
+   "expr->".  */
+expr = cp_parser_postfix_dot_deref_expression (parser, CPP_DEREF,
+  object_ptr, true, &dummy,
+  token->location);
+  }
while (true)
  {
token = cp_lexer_peek_token (parser->lexer);
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 43a0eabfa12..9ba00196542 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -5324,13 +5324,29 @@ finish_offsetof (tree object_ptr, tree expr, location_t 
loc)
  expr = TREE_OPERAND (expr, 0);
if (!complete_type_or_else (TREE_TYPE (TREE_TYPE (object_ptr)), object_ptr))
  return error_mark_node;
+
if (warn_invalid_offsetof
-  && CLASS_TYPE_P (TREE_TYPE (TREE_TYPE (object_ptr)))
-  && CLASSTYPE_NON_STD_LAYOUT (TREE_TYPE (TREE_TYPE (object_ptr)))
-  && cp_unevaluated_operand == 0)
-warning_at (loc, OPT_Winvalid_offsetof, "%<offsetof%> within "
-   "non-standard-layout type %qT is conditionally-supported",
-   TREE_TYPE (TREE_TYPE (object_ptr)));
+  && CLASS_TYPE_P (TREE_TYPE (TREE_TYPE (object_ptr))))
+{
+  bool member_in_base_p = false;
+  if (TREE_CODE (expr) == COMPONENT_REF)
+   {
+ tree member = expr;
+ do {
+ member = TREE_OPERAND (member, 1);
+ } while (TREE_CODE (member) == COMPONENT_REF);
+ member_in_base_p = !same_type_p (TREE_TYPE (TREE_TYPE (object_ptr)),
+  DECL_CONTEXT (member));
+   }
+  if (member_in_base_p
+ && CLASSTYPE_VBASECLASSES (TREE_TYPE (TREE_TYPE (object_ptr))))
+   error_at (loc, "invalid %<offsetof%> to data member in virtual base");


This will error if the class has virtual bases even if the member 
doesn't belong to one of them.  Instead of trying to duplicate the 
virtual base logic here, maybe we could mark the cast from null to 
indicate that it's coming from offsetof (i.e. with a TREE_LANG_FLAG_), 
and then check that in build_class_member_access_expr?


Jason



[PATCH] gimplify: Add -Wuse-before-shadow [PR92386]

2025-05-16 Thread Matthew Sotoudeh
This is a small patch to address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92386

When a variable is used before being shadowed in the same scope, GCC
outputs incorrect/misleading debug information. The Bugzilla report has
a minimal example and explains why a direct fix was not desirable (debug
info size would blow up).

In the Bugzilla discussion, Richard Biener suggests "this probably
deserves a warning and should be considered bad style?"

This patch implements such a "-Wuse-before-shadow" warning for these
scenarios. When gimplifying a named var, we throw a warning if it isn't
the most-recently bound declaration for that name. I wasn't able to find
a way to get the current stack of bindings for a given name, so I added
a new hash map (binding_stacks) to keep track of this (only when
-Wuse-before-shadow).

I tested on x86-64-linux-gnu (Debian 12) with make -k check and saw no
testsuite regressions using ./contrib/compare_tests. I also built Git
and Linux with the flag: each triggered one 'true positive' warning,
neither of which was an actual bug.

I have not tested on major projects in other languages (C++, etc.) so
there might be more false positives when -Wuse-before-shadow is enabled
for those languages (could restrict the warning to just C if this is a
concern?).

Any comments would be much appreciated; I'm new to the codebase.

(Also, sorry for any duplicate sends --- I had some DMARC issues
originally.)

PR debug/92386 - gdb issue with variable-shadowing

PR debug/92386

gcc/ChangeLog:

* common.opt: Added Wuse-before-shadow option; defaults to off.

* gimplify.cc (gimplify_bind_expr): when gimplifying a bind,
  keep track of the stack of variables bound to each identifier.
  (gimplify_var_or_parm_decl): when gimplifying a named var,
  check if it is the most-recently bound variable for that
  identifier.

gcc/testsuite/ChangeLog:

* gcc.dg/warn-use-before-shadow.c: New test.
---
 gcc/common.opt|  5 ++
 gcc/gimplify.cc   | 70 +++
 gcc/testsuite/gcc.dg/warn-use-before-shadow.c | 28 
 3 files changed, 103 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/warn-use-before-shadow.c

diff --git a/gcc/common.opt b/gcc/common.opt
index e3fa0dacec4..cb4a2754912 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -880,6 +880,11 @@ Wunused-variable
 Common Var(warn_unused_variable) Warning EnabledBy(Wunused)
 Warn when a variable is unused.
 
+Wuse-before-shadow
+Common Var(warn_use_before_shadow) Warning Init(0)
+Warn if a variable from an outer scope is used before it is shadowed in the
+current scope.
+
 Wcoverage-mismatch
 Common Var(warn_coverage_mismatch) Init(1) Warning
 Warn in case profiles in -fprofile-use do not match.
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 4f385b1b779..6854ced7b06 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -120,6 +120,17 @@ tree_associate_condition_with_expr (tree stmt, unsigned 
uid)
 /* Hash set of poisoned variables in a bind expr.  */
 static hash_set<tree> *asan_poisoned_variables = NULL;
 
+/* Maps a var name to a stack of variables currently bound to that name. This
+   global is only used if -Wuse-before-shadow is passed, in which case every
+   time a variable is gimplified we check this stack and throw a warning if
+   that variable is not the innermost binding to its name.  */
+static hash_map<tree, auto_vec<tree> > binding_stacks;
+
+/* A list of all of the currently bound named variables. Basically a flattened
+   version of @binding_stacks. Used to pop bindings off @binding_stacks when
+   gimplify_bind_expr returns.  */
+static auto_vec<tree> all_bindings;
+
 /* Hash set of already-resolved calls to OpenMP "declare variant"
functions.  A call can resolve to the original function and
we don't want to repeat the resolution multiple times.  */
@@ -1417,6 +1428,25 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p)
 
   tree temp = voidify_wrapper_expr (bind_expr, NULL);
 
+  /* Update the stack of variables bound to a given name. Used for
+   -Wuse-before-shadow to detect when a variable is used in a scope where it is
+   not the innermost bound instance of that name. Note that gimplification will
+   sometimes change the DECL_NAME associated with a tree (e.g., promoting an
+   initialized const array to a static) so we store the names in a separate
+   vector @all_bindings to pop from after exiting the bind expr.  */
+  size_t all_bindings_reset_idx = all_bindings.length ();
+  if (warn_use_before_shadow)
+{
+  for (tree t = BIND_EXPR_VARS (bind_expr); t ; t = DECL_CHAIN (t))
+{
+  if (VAR_P (t) && DECL_NAME (t) != NULL_TREE)
+{
+  binding_stacks.get_or_insert (DECL_NAME (t)).safe_push (t);
+  all_bindings.safe_push (DECL_NAME (t));
+}
+}
+}
+
   /* Mark variables seen in this bind expr.  */
   for (t = BIND_EXPR_VARS (bind_e

Re: [PATCH] gimplify: Add -Wuse-before-shadow [PR92386]

2025-05-16 Thread Andrew Pinski
On Fri, May 16, 2025 at 5:13 PM Matthew Sotoudeh
 wrote:
>
> This is a small patch to address
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92386
>
> When a variable is used before being shadowed in the same scope, GCC
> outputs incorrect/misleading debug information. The Bugzilla report has
> a minimal example and explains why a direct fix was not desirable (debug
> info size would blow up).
>
> In the Bugzilla discussion, Richard Biener suggests "this probably
> deserves a warning and should be considered bad style?"
>
> This patch implements such a "-Wuse-before-shadow" warning for these
> scenarios. When gimplifying a named var, we throw a warning if it isn't
> the most-recently bound declaration for that name. I wasn't able to find
> a way to get the current stack of bindings for a given name, so I added
> a new hash map (binding_stacks) to keep track of this (only when
> -Wuse-before-shadow).
>
> I tested on x86-64-linux-gnu (Debian 12) with make -k check and saw no
> testsuite regressions using ./contrib/compare_tests. I also built Git
> and Linux with the flag: each triggered one 'true positive' warning,
> neither of which was an actual bug.
>
> I have not tested on major projects in other languages (C++, etc.) so
> there might be more false positives when -Wuse-before-shadow is enabled
> for those languages (could restrict the warning to just C if this is a
> concern?).

Maybe the gimplifier is not the right place to do this, then.
-Wshadow is handled in warn_if_shadowing inside c-decl.cc which is
called from pushdecl.
Maybe you could do something inside there where you search the current
statement list (cur_stmt_list) for previous usages of the name?

Thanks,
Andrew

>
> Any comments would be much appreciated; I'm new to the codebase.
>
> (Also, sorry for any duplicate sends --- I had some DMARC issues
> originally.)
>
> PR debug/92386 - gdb issue with variable-shadowing
>
> PR debug/92386
>
> gcc/ChangeLog:
>
> * common.opt: Added Wuse-before-shadow option; defaults to off.
>
> * gimplify.cc (gimplify_bind_expr): when gimplifying a bind,
>   keep track of the stack of variables bound to each identifier.
>   (gimplify_var_or_parm_decl): when gimplifying a named var,
>   check if it is the most-recently bound variable for that
>   identifier.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/warn-use-before-shadow.c: New test.
> ---
>  gcc/common.opt|  5 ++
>  gcc/gimplify.cc   | 70 +++
>  gcc/testsuite/gcc.dg/warn-use-before-shadow.c | 28 
>  3 files changed, 103 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/warn-use-before-shadow.c
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index e3fa0dacec4..cb4a2754912 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -880,6 +880,11 @@ Wunused-variable
>  Common Var(warn_unused_variable) Warning EnabledBy(Wunused)
>  Warn when a variable is unused.
>
> +Wuse-before-shadow
> +Common Var(warn_use_before_shadow) Warning Init(0)
> +Warn if a variable from an outer scope is used before it is shadowed in the
> +current scope.
> +
>  Wcoverage-mismatch
>  Common Var(warn_coverage_mismatch) Init(1) Warning
>  Warn in case profiles in -fprofile-use do not match.
> diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> index 4f385b1b779..6854ced7b06 100644
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -120,6 +120,17 @@ tree_associate_condition_with_expr (tree stmt, unsigned 
> uid)
>  /* Hash set of poisoned variables in a bind expr.  */
>  static hash_set<tree> *asan_poisoned_variables = NULL;
>
> +/* Maps a var name to a stack of variables currently bound to that name. This
> +   global is only used if -Wuse-before-shadow is passed, in which case every
> +   time a variable is gimplified we check this stack and throw a warning if
> +   that variable is not the innermost binding to its name.  */
> +static hash_map<tree, auto_vec<tree> > binding_stacks;
> +
> +/* A list of all of the currently bound named variables. Basically a 
> flattened
> +   version of @binding_stacks. Used to pop bindings off @binding_stacks when
> +   gimplify_bind_expr returns.  */
> +static auto_vec<tree> all_bindings;
> +
>  /* Hash set of already-resolved calls to OpenMP "declare variant"
> functions.  A call can resolve to the original function and
> we don't want to repeat the resolution multiple times.  */
> @@ -1417,6 +1428,25 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p)
>
>tree temp = voidify_wrapper_expr (bind_expr, NULL);
>
> +  /* Update the stack of variables bound to a given name. Used for
> +   -Wuse-before-shadow to detect when a variable is used in a scope where it 
> is
> +   not the innermost bound instance of that name. Note that gimplification 
> will
> +   sometimes change the DECL_NAME associated with a tree (e.g., promoting an
> +   initialized const array to a static) so we store the names in a separate
> +   v

[PATCH 7/9] genemit: Add a generator struct

2025-05-16 Thread Richard Sandiford
gen_exp now has quite a few arguments that need to be passed
to each recursive call.  This patch turns it and related routines
into member functions of a new generator class, so that the shared
information can be stored in member variables.

This also helps to make later patches less noisy.

gcc/
* genemit.cc (generator): New structure.
(gen_rtx_scratch, gen_exp, gen_emit_seq): Turn into member
functions of generator.
(gen_insn, gen_expand, gen_split, output_add_clobbers): Update
users accordingly.
---
 gcc/genemit.cc | 76 --
 1 file changed, 55 insertions(+), 21 deletions(-)

diff --git a/gcc/genemit.cc b/gcc/genemit.cc
index cdc098f19b8..df2b319fd23 100644
--- a/gcc/genemit.cc
+++ b/gcc/genemit.cc
@@ -66,8 +66,40 @@ print_code (RTX_CODE code, FILE *file)
 fprintf (file, "%c", TOUPPER (*p1));
 }
 
-static void
-gen_rtx_scratch (rtx x, enum rtx_code subroutine_type, FILE *file)
+/* A structure used to generate code for a particular expansion.  */
+struct generator
+{
+  generator (rtx_code, char *, const md_rtx_info &, FILE *);
+
+  void gen_rtx_scratch (rtx);
+  void gen_exp (rtx);
+  void gen_emit_seq (rtvec);
+
+  /* The type of subroutine that we're expanding.  */
+  rtx_code subroutine_type;
+
+  /* If nonnull, index N indicates that the original operand N has already
+ been used to replace a MATCH_OPERATOR or MATCH_DUP, and so any further
+ replacements must make a copy.  */
+  char *used;
+
+  /* The construct that we're expanding.  */
+  const md_rtx_info info;
+
+  /* The output file.  */
+  FILE *file;
+};
+
+generator::generator (rtx_code subroutine_type, char *used,
+ const md_rtx_info &info, FILE *file)
+  : subroutine_type (subroutine_type),
+used (used),
+info (info),
+file (file)
+{}
+
+void
+generator::gen_rtx_scratch (rtx x)
 {
   if (subroutine_type == DEFINE_PEEPHOLE2)
 {
@@ -82,9 +114,8 @@ gen_rtx_scratch (rtx x, enum rtx_code subroutine_type, FILE 
*file)
 /* Print a C expression to construct an RTX just like X,
substituting any operand references appearing within.  */
 
-static void
-gen_exp (rtx x, enum rtx_code subroutine_type, char *used,
-const md_rtx_info &info, FILE *file)
+void
+generator::gen_exp (rtx x)
 {
   RTX_CODE code;
   int i;
@@ -128,7 +159,7 @@ gen_exp (rtx x, enum rtx_code subroutine_type, char *used,
   for (i = 0; i < XVECLEN (x, 1); i++)
{
  fprintf (file, ",\n\t\t");
- gen_exp (XVECEXP (x, 1, i), subroutine_type, used, info, file);
+ gen_exp (XVECEXP (x, 1, i));
}
   fprintf (file, ")");
   return;
@@ -142,7 +173,7 @@ gen_exp (rtx x, enum rtx_code subroutine_type, char *used,
   for (i = 0; i < XVECLEN (x, 2); i++)
{
  fprintf (file, ",\n\t\t");
- gen_exp (XVECEXP (x, 2, i), subroutine_type, used, info, file);
+ gen_exp (XVECEXP (x, 2, i));
}
   fprintf (file, ")");
   return;
@@ -153,7 +184,7 @@ gen_exp (rtx x, enum rtx_code subroutine_type, char *used,
   return;
 
 case MATCH_SCRATCH:
-  gen_rtx_scratch (x, subroutine_type, file);
+  gen_rtx_scratch (x);
   return;
 
 case PC:
@@ -234,7 +265,7 @@ gen_exp (rtx x, enum rtx_code subroutine_type, char *used,
   switch (fmt[i])
{
case 'e': case 'u':
- gen_exp (XEXP (x, i), subroutine_type, used, info, file);
+ gen_exp (XEXP (x, i));
  break;
 
case 'i':
@@ -266,7 +297,7 @@ gen_exp (rtx x, enum rtx_code subroutine_type, char *used,
for (j = 0; j < XVECLEN (x, i); j++)
  {
fprintf (file, ",\n\t\t");
-   gen_exp (XVECEXP (x, i, j), subroutine_type, used, info, file);
+   gen_exp (XVECEXP (x, i, j));
  }
fprintf (file, ")");
break;
@@ -281,10 +312,10 @@ gen_exp (rtx x, enum rtx_code subroutine_type, char *used,
 }
 
 /* Output code to emit the instruction patterns in VEC, with each element
-   becoming a separate instruction.  USED is as for gen_exp.  */
+   becoming a separate instruction.  */
 
-static void
-gen_emit_seq (rtvec vec, char *used, const md_rtx_info &info, FILE *file)
+void
+generator::gen_emit_seq (rtvec vec)
 {
   for (int i = 0, len = GET_NUM_ELEM (vec); i < len; ++i)
 {
@@ -293,7 +324,7 @@ gen_emit_seq (rtvec vec, char *used, const md_rtx_info 
&info, FILE *file)
   if (const char *name = get_emit_function (next))
{
  fprintf (file, "  %s (", name);
- gen_exp (next, DEFINE_EXPAND, used, info, file);
+ gen_exp (next);
  fprintf (file, ");\n");
  if (!last_p && needs_barrier_p (next))
fprintf (file, "  emit_barrier ();");
@@ -301,7 +332,7 @@ gen_emit_seq (rtvec vec, char *used, const md_rtx_info 
&info, FILE *file)
   else
{
  fprintf (file, "  emit (");
- gen_exp (next, DEFINE_EXPAND, 

[PATCH 8/9] genemit: Always track multiple uses of operands

2025-05-16 Thread Richard Sandiford
gen_exp has code to detect when the same operand is used multiple
times.  It ensures that second and subsequent uses call copy_rtx,
to enforce correct unsharing.

However, for historical reasons that aren't clear to me, this was
skipped for a define_insn unless the define_insn was a parallel.
It was also skipped for a single define_expand instruction,
regardless of its contents.

This meant that a single parallel instruction was treated differently
between define_insn (where sharing rules were followed) and
define_expand (where sharing rules weren't followed).  define_splits
and define_peephole2s followed the sharing rules in all cases.

This patch makes everything follow the sharing rules.  The code
it touches will be removed by the proposed bytecode-based expansion,
which will use its own tracking when enforcing sharing rules.
However, it seemed better for staging and bisection purposes
to make this change first.
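As an illustration of why the copy matters, here is a minimal stand-alone C++ sketch. The toy `node` type and function names are invented for this example; the real generated code operates on GCC rtxes and calls copy_rtx for the second and subsequent uses of an operand:

```cpp
#include <cassert>

// Toy stand-in for an rtx node; real rtxes live inside GCC.
struct node { int regno; };

// Emit two uses of the same operand.  The second use takes a copy,
// mirroring the copy_rtx call that gen_exp's "used" tracking enforces.
int second_use_after_mutation ()
{
  node op0 { 1 };
  node *use1 = &op0;   // first use: may reference the operand directly
  node use2 = op0;     // second use: a copy (the copy_rtx role)
  use1->regno = 2;     // e.g. a later pass rewrites one use in place
  return use2.regno;   // the copied use is unaffected
}
```

If the second use had aliased the first instead of copying, the in-place change would have silently altered both instructions, which is exactly the unsharing bug the rules prevent.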

gcc/
* genemit.cc (generator::used): Update comment.
(generator::gen_exp): Remove handling of null unused arrays.
(gen_insn, gen_expand): Always pass a used array.
(output_add_clobbers): Note why the used array is null here.
---
 gcc/genemit.cc | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/gcc/genemit.cc b/gcc/genemit.cc
index df2b319fd23..a7e49a24506 100644
--- a/gcc/genemit.cc
+++ b/gcc/genemit.cc
@@ -78,9 +78,9 @@ struct generator
   /* The type of subroutine that we're expanding.  */
   rtx_code subroutine_type;
 
-  /* If nonnull, index N indicates that the original operand N has already
- been used to replace a MATCH_OPERATOR or MATCH_DUP, and so any further
- replacements must make a copy.  */
+  /* Index N indicates that the original operand N has already been used to
+ replace a MATCH_OPERATOR or MATCH_DUP, and so any further replacements
+ must make a copy.  */
   char *used;
 
   /* The construct that we're expanding.  */
@@ -135,15 +135,12 @@ generator::gen_exp (rtx x)
 {
 case MATCH_OPERAND:
 case MATCH_DUP:
-  if (used)
+  if (used[XINT (x, 0)])
{
- if (used[XINT (x, 0)])
-   {
- fprintf (file, "copy_rtx (operands[%d])", XINT (x, 0));
- return;
-   }
- used[XINT (x, 0)] = 1;
+ fprintf (file, "copy_rtx (operands[%d])", XINT (x, 0));
+ return;
}
+  used[XINT (x, 0)] = 1;
   fprintf (file, "operands[%d]", XINT (x, 0));
   return;
 
@@ -505,10 +502,7 @@ gen_insn (const md_rtx_info &info, FILE *file)
   /* Output code to construct and return the rtl for the instruction body.  */
 
   rtx pattern = add_implicit_parallel (XVEC (insn, 1));
-  /* ??? This is the traditional behavior, but seems suspect.  */
-  char *used = (XVECLEN (insn, 1) == 1
-   ? NULL
-   : XCNEWVEC (char, stats.num_generator_args));
+  char *used = XCNEWVEC (char, stats.num_generator_args);
   fprintf (file, "  return ");
   generator (DEFINE_INSN, used, info, file).gen_exp (pattern);
   fprintf (file, ";\n}\n\n");
@@ -555,10 +549,12 @@ gen_expand (const md_rtx_info &info, FILE *file)
   && stats.max_opno >= stats.max_dup_opno
   && XVECLEN (expand, 1) == 1)
 {
+  used = XCNEWVEC (char, stats.num_operand_vars);
   fprintf (file, "  return ");
-  generator (DEFINE_EXPAND, NULL, info, file)
+  generator (DEFINE_EXPAND, used, info, file)
.gen_exp (XVECEXP (expand, 1, 0));
   fprintf (file, ";\n}\n\n");
+  XDELETEVEC (used);
   return;
 }
 
@@ -717,6 +713,7 @@ output_add_clobbers (const md_rtx_info &info, FILE *file)
{
  fprintf (file, "  XVECEXP (pattern, 0, %d) = ", i);
  rtx clobbered_value = RTVEC_ELT (clobber->pattern, i);
+ /* Pass null for USED since there are no operands.  */
  generator (clobber->code, NULL, info, file)
.gen_exp (clobbered_value);
  fprintf (file, ";\n");
-- 
2.43.0



Re: [PATCH 1/9] nds32: Avoid accessing beyond the operands[] array

2025-05-16 Thread Jeff Law




On 5/16/25 11:21 AM, Richard Sandiford wrote:

This pattern used operands[2] to hold the shift amount, even though
the pattern doesn't have an operand 2 (not even as a match_dup).
This caused a build failure with -Werror:

   array subscript 2 is above array bounds of ‘rtx_def* [2]’

gcc/
* config/nds32/nds32-intrinsic.md (unspec_get_pending_int): Use
a local variable instead of operands[2].

Obviously OK.  IMHO you should just commit this kind of fix.

jeff



Re: [PATCH 2/9] xstormy16: Avoid accessing beyond the operands[] array

2025-05-16 Thread Jeff Law




On 5/16/25 11:21 AM, Richard Sandiford wrote:

The negsi2 C++ code writes to operands[2] even though the pattern
has no operand 2.

gcc/
* config/stormy16/stormy16.md (negsi2): Remove unused assignment.

Also obviously OK.
jeff



RE: [PATCH v2 2/3] aarch64: Optimize AND with certain vector of immediates as FMOV [PR100165]

2025-05-16 Thread quic_pzheng
> Pengxuan Zheng  writes:
> > diff --git a/gcc/config/aarch64/aarch64.cc
> > b/gcc/config/aarch64/aarch64.cc index 15f08cebeb1..98ce85dfdae 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -23621,6 +23621,36 @@ aarch64_simd_valid_and_imm (rtx op)
> >return aarch64_simd_valid_imm (op, NULL, AARCH64_CHECK_AND);  }
> >
> > +/* Return true if OP is a valid SIMD and immediate which allows the
> > +and be
> 
> s/and be/and to be/
> 
> > +   optimized as fmov.  If ELT_SIZE is nonnull, it represents the size
of the
> > +   register for fmov.  */
> 
> Maybe rename this to ELT_BITSIZE (see below), and say:
> 
>   If ELT_BITSIZE is nonnull, use it to return the number of bits to move.
> 
> > +bool
> > +aarch64_simd_valid_and_imm_fmov (rtx op, unsigned int *elt_size) {
> > +  machine_mode mode = GET_MODE (op);
> > +  gcc_assert (!aarch64_sve_mode_p (mode));
> > +
> > +  auto_vec buffer;
> > +  unsigned int n_bytes = GET_MODE_SIZE (mode).to_constant ();
> > + buffer.reserve (n_bytes);
> > +
> > +  bool ok = native_encode_rtx (mode, op, buffer, 0, n_bytes);
> > + gcc_assert (ok);
> > +
> > +  auto mask = native_decode_int (buffer, 0, n_bytes, n_bytes *
> > +BITS_PER_UNIT);
> > +  int set_bit = wi::exact_log2 (mask + 1);
> > +  if ((set_bit == 16 && TARGET_SIMD_F16INST)
> > +  || set_bit == 32
> > +  || set_bit == 64)
> > +{
> > +  if (elt_size)
> > +   *elt_size = set_bit / BITS_PER_UNIT;
> 
> I didn't notice last time that the only consumer multiplies by
BITS_PER_UNIT
> again, so how about making this:
> 
>   *elt_bitsize = set_bit;
> 
> and removing the later multiplication.
> 
> Please leave 24 hours for other to comment, but otherwise the patch is ok
> with those changes, thanks.
> 
> Richard

Thanks, Richard! I've updated the patch accordingly and pushed it as
r16-703-g0417a630811404.

Pengxuan



RE: [PATCH v2 3/3] aarch64: Add more vector permute tests for the FMOV optimization [PR100165]

2025-05-16 Thread quic_pzheng
> Pengxuan Zheng  writes:
> > diff --git a/gcc/testsuite/gcc.target/aarch64/fmov-3-le.c
> > b/gcc/testsuite/gcc.target/aarch64/fmov-3-le.c
> > new file mode 100644
> > index 000..adbf87243f6
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/fmov-3-le.c
> > @@ -0,0 +1,130 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mlittle-endian" } */
> > +/* { dg-final { check-function-bodies "**" "" "" } } */
> > +
> > +typedef short v4hi __attribute__ ((vector_size (8))); typedef char
> > +v8qi __attribute__ ((vector_size (8))); typedef int v4si
> > +__attribute__ ((vector_size (16))); typedef float v4sf __attribute__
> > +((vector_size (16))); typedef short v8hi __attribute__ ((vector_size
> > +(16))); typedef char v16qi __attribute__ ((vector_size (16)));
> > +
> > +/*
> > +** f_v4hi:
> > +** fmovs0, s0
> > +** ret
> > +*/
> > +v4hi
> > +f_v4hi (v4hi x)
> > +{
> > +  return __builtin_shuffle (x, (v4hi){ 0, 0, 0, 0 }, (v4hi){ 0, 1, 4,
> > +5 }); }
> > +
> > +/*
> > +** g_v4hi:
> > +** uzp1v([0-9]+).2d, v0.2d, v0.2d
> > +** adrpx([0-9]+), .LC0
> > +** ldr d([0-9]+), \[x\2, #:lo12:.LC0\]
> > +** tbl v0.8b, {v\1.16b}, v\3.8b
> > +** ret
> 
> The important thing here is that we don't generate FMOV, rather than that
we
> generate the sequence above.  The test could therefore be:
> 
> **(?:(?!fmov).)*
> **ret
> 
> However, the test might be run with a newer architecture that has fp16
> enabled, so it would be safer to add:
> 
>   #pragma GCC target "armv8-a"
> 
> before the typedefs at the top of the file.
> 
> LGTM otherwise, thanks.  As with the other patches, please leave 24 hours
for
> others to comment, but otherwise the patch is ok with the changes above.
> 
> Richard

Thanks, Richard! I've updated the patch accordingly and pushed it as
r16-704-g265fdb3fa91346.

Pengxuan




Re: [PATCH 6/9] genemit: Consistently use operand arrays in gen_* functions

2025-05-16 Thread Jeff Law




On 5/16/25 11:21 AM, Richard Sandiford wrote:

One slightly awkward part about emitting the generator function
bodies is that:

* define_insn and define_expand routines have a separate argument for
   each operand, named "operand0" upwards.

* define_split and define_peephole2 routines take a pointer to an array,
   named "operands".

* the C++ preparation code for expands, splits and peephole2s uses an
   array called "operands" to refer to the operands.

* the automatically-generated code uses individual "operand"
   variables to refer to the operands.

So define_expands have to store the incoming arguments into an operands
array before the md file's C++ code, then copy the operands array back
to the individual variables before the automatically-generated code.
splits and peephole2s have to copy the incoming operands array to
individual variables after the md file's C++ code, creating more
local variables that are live across calls to rtx_alloc.

This patch tries to simplify things by making the whole function
body use the operands array in preference to individual variables.
define_insns and define_expands store their arguments to the array
on entry.

This would have pros and cons on its own, but having a single array
helps with future efforts to reduce the duplication between gen_*
functions.

Doing this tripped a warning in stormy16.md about writing beyond
the end of the array.  The negsi2 C++ code writes to operands[2]
even though the pattern has no operand 2.
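To make the change concrete, here is a hypothetical before/after shape of a generated function body. The names and arithmetic bodies are invented for illustration; real gen_* functions build and return rtxes:

```cpp
#include <cassert>

// Pre-patch shape: each operand is an individual variable, so the
// md file's C++ code and the generated code need copies in and out
// of a separate operands[] array.
int gen_old_style (int operand0, int operand1)
{
  return operand0 + operand1;
}

// Post-patch shape: the incoming arguments are stored into a single
// operands[] array on entry, and the whole body indexes that array,
// so no copying between representations is needed.
int gen_new_style (int operand0, int operand1)
{
  int operands[2] = { operand0, operand1 };
  return operands[0] + operands[1];
}
```

Both shapes compute the same result; the difference is only in which representation the function body uses throughout.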

gcc/
* genemit.cc (gen_rtx_scratch, gen_exp): Use operands[%d] rather than
operand%d.
(start_gen_insn): Store the incoming arguments to an operands array.
(gen_expand, gen_split): Remove copies into and out of the operands
array.
* config/stormy16/stormy16.md (negsi2): Remove redundant assignment.
So two questions.  Is there any meaningful performance impact expected 
here using the array form rather than locals?   And does this impact how 
folks write their C/C++ fragments in the expanders and such?


Jeff



Re: [PATCH v2 6/8] RISC-V: Drop riscv_implied_info and riscv_combine_info in favor of riscv_ext_info_t data

2025-05-16 Thread Andreas Schwab
../../gcc/common/config/riscv/riscv-common.cc: In member function 'bool riscv_ext_info_t::apply_implied_ext(riscv_subset_list*) const':
../../gcc/common/config/riscv/riscv-common.cc:248:31: error: possibly dangling reference to a temporary [-Werror=dangling-reference]
  248 |   const riscv_ext_info_t &implied_ext_info
  |   ^~~~
  249 | = get_riscv_ext_info (implied_info.implied_ext);
  |   ~^~~
../../gcc/common/config/riscv/riscv-common.cc: In member function 'void riscv_subset_list::handle_implied_ext(const char*)':
../../gcc/common/config/riscv/riscv-common.cc:1092:27: error: possibly dangling reference to a temporary [-Werror=dangling-reference]
 1092 |   const riscv_ext_info_t &ext_info = get_riscv_ext_info (ext);
  |   ^~~~
../../gcc/common/config/riscv/riscv-common.cc:1092:58: note: 'const std::string' {aka 'const std::__cxx11::basic_string'} temporary created here
 1092 |   const riscv_ext_info_t &ext_info = get_riscv_ext_info (ext);
  |  ^~~
../../gcc/common/config/riscv/riscv-common.cc: In function 'bool riscv_minimal_hwprobe_feature_bits(const char*, riscv_feature_bits*, location_t)':
../../gcc/common/config/riscv/riscv-common.cc:1645:17: error: possibly dangling reference to a temporary [-Werror=dangling-reference]
 1645 |   auto &ext_info = get_riscv_ext_info (search_ext);
  | ^~~~
../../gcc/common/config/riscv/riscv-common.cc:1645:48: note: 'const std::string' {aka 'const std::__cxx11::basic_string'} temporary created here
 1645 |   auto &ext_info = get_riscv_ext_info (search_ext);
  |^~
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:2746: riscv-common.o] Error 1

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH 8/9] genemit: Always track multiple uses of operands

2025-05-16 Thread Jeff Law




On 5/16/25 11:22 AM, Richard Sandiford wrote:

gen_exp has code to detect when the same operand is used multiple
times.  It ensures that second and subsequent uses call copy_rtx,
to enforce correct unsharing.

However, for historical reasons that aren't clear to me, this was
skipped for a define_insn unless the define_insn was a parallel.
It was also skipped for a single define_expand instruction,
regardless of its contents.

This meant that a single parallel instruction was treated differently
between define_insn (where sharing rules were followed) and
define_expand (where sharing rules weren't followed).  define_splits
and define_peephole2s followed the sharing rules in all cases.

This patch makes everything follow the sharing rules.  The code
it touches will be removed by the proposed bytecode-based expansion,
which will use its own tracking when enforcing sharing rules.
However, it seemed better for staging and bisection purposes
to make this change first.

gcc/
* genemit.cc (generator::used): Update comment.
(generator::gen_exp): Remove handling of null unused arrays.
(gen_insn, gen_expand): Always pass a used array.
(output_add_clobbers): Note why the used array is null here.

OK
jeff



Re: [PATCH 1/3] genemit: Remove support for string operands

2025-05-16 Thread Jeff Law




On 5/16/25 11:32 AM, Richard Sandiford wrote:

gen_exp currently supports the 's' (string) operand type.  It would
certainly be possible to make the upcoming bytecode patch support
that too.  However, the rtx codes that have string operands should
be very rarely used in hard-coded define_insn/expand/split/peephole2
rtx templates (as opposed to things like attribute expressions,
where const_string is commonplace).  And AFAICT, no current target
does use them like that.

This patch therefore reports an error for these rtx codes,
rather than adding code that would be unused and untested.

gcc/
* genemit.cc (generator::gen_exp): Report an error for 's' operands.
OK.  And we'll get a pretty good sense if a port is doing something 
really weird in this space from my tester once this patch goes in.


jeff



Re: [PATCH 2/3] genemit: Avoid using gen_exp in output_add_clobbers

2025-05-16 Thread Jeff Law




On 5/16/25 11:32 AM, Richard Sandiford wrote:

output_add_clobbers emits code to add:

   (clobber (scratch:M))

and/or:

   (clobber (reg:M R))

expressions to the end of a PARALLEL.  At the moment, it does this
using the general gen_exp function.  That makes sense with the code
in its current form, but with later patches it's more convenient to
handle the two cases directly.

This also avoids having to pass an md_rtx_info that is unrelated
to the clobber expressions.

gcc/
* genemit.cc (clobber_pat::code): Delete.
(maybe_queue_insn): Don't set clobber_pat::code.
(output_add_clobbers): Remove info argument and output the two
REG and SCRATCH cases directly.
(main): Update call accordingly.

OK
jeff



[PATCH 4/9] genemit: Add an internal queue

2025-05-16 Thread Richard Sandiford
An earlier version of this series wanted to collect information
about all the gen_* functions that are going to be generated.
The current version no longer does that, but the queue seemed
worth keeping anyway, since it gives a more consistent structure.

gcc/
* genemit.cc (queue): New static variable.
(maybe_queue_insn): New function, split out from...
(gen_insn): ...here.
(queue_expand): New function, split out from...
(gen_expand): ...here.
(gen_split): New function, split out from...
(queue_split): ...here.
(main): Queue definitions for later processing rather than
emitting them on the fly.
---
 gcc/genemit.cc | 97 --
 1 file changed, 71 insertions(+), 26 deletions(-)

diff --git a/gcc/genemit.cc b/gcc/genemit.cc
index cb4ae47294d..b73a45a0412 100644
--- a/gcc/genemit.cc
+++ b/gcc/genemit.cc
@@ -55,6 +55,9 @@ static void output_peephole2_scratches(rtx, FILE*);
 /* True for _optab if that optab isn't allowed to fail.  */
 static bool nofail_optabs[NUM_OPTABS];
 
+/* A list of the md constructs that need a gen_* function.  */
+static vec queue;
+
 static void
 print_code (RTX_CODE code, FILE *file)
 {
@@ -326,14 +329,12 @@ emit_c_code (const char *code, bool can_fail_p, const char *name, FILE *file)
   fprintf (file, "#undef FAIL\n");
 }
 
-/* Generate the `gen_...' function for a DEFINE_INSN.  */
+/* Process the DEFINE_INSN in LOC, and queue it if it needs a gen_*
+   function.  */
 
 static void
-gen_insn (const md_rtx_info &info, FILE *file)
+maybe_queue_insn (const md_rtx_info &info)
 {
-  struct pattern_stats stats;
-  int i;
-
   /* See if the pattern for this insn ends with a group of CLOBBERs of (hard)
  registers or MATCH_SCRATCHes.  If so, store away the information for
  later.  */
@@ -349,6 +350,7 @@ gen_insn (const md_rtx_info &info, FILE *file)
  && GET_CODE (RTVEC_ELT (pattern, 0)) == PARALLEL)
pattern = XVEC (RTVEC_ELT (pattern, 0), 0);
 
+  int i;
   for (i = GET_NUM_ELEM (pattern) - 1; i > 0; i--)
{
  if (GET_CODE (RTVEC_ELT (pattern, i)) != CLOBBER)
@@ -422,9 +424,19 @@ gen_insn (const md_rtx_info &info, FILE *file)
   if (XSTR (insn, 0)[0] == 0 || XSTR (insn, 0)[0] == '*')
 return;
 
-  fprintf (file, "/* %s:%d */\n", info.loc.filename, info.loc.lineno);
+  queue.safe_push (info);
+}
+
+/* Generate the `gen_...' function for a DEFINE_INSN.  */
+
+static void
+gen_insn (const md_rtx_info &info, FILE *file)
+{
+  struct pattern_stats stats;
+  int i;
 
   /* Find out how many operands this function has.  */
+  rtx insn = info.def;
   get_pattern_stats (&stats, XVEC (insn, 1));
   if (stats.max_dup_opno > stats.max_opno)
 fatal_at (info.loc, "match_dup operand number has no match_operand");
@@ -455,23 +467,31 @@ gen_insn (const md_rtx_info &info, FILE *file)
   XDELETEVEC (used);
 }
 
-/* Generate the `gen_...' function for a DEFINE_EXPAND.  */
+/* Process and queue the DEFINE_EXPAND in INFO.  */
 
 static void
-gen_expand (const md_rtx_info &info, FILE *file)
+queue_expand (const md_rtx_info &info)
 {
-  struct pattern_stats stats;
-  int i;
-  char *used;
-
   rtx expand = info.def;
   if (strlen (XSTR (expand, 0)) == 0)
 fatal_at (info.loc, "define_expand lacks a name");
   if (XVEC (expand, 1) == 0)
 fatal_at (info.loc, "define_expand for %s lacks a pattern",
  XSTR (expand, 0));
+  queue.safe_push (info);
+}
+
+/* Generate the `gen_...' function for a DEFINE_EXPAND.  */
+
+static void
+gen_expand (const md_rtx_info &info, FILE *file)
+{
+  struct pattern_stats stats;
+  int i;
+  char *used;
 
   /* Find out how many operands this function has.  */
+  rtx expand = info.def;
   get_pattern_stats (&stats, XVEC (expand, 1));
   if (stats.min_scratch_opno != -1
   && stats.min_scratch_opno <= MAX (stats.max_opno, stats.max_dup_opno))
@@ -564,7 +584,24 @@ gen_expand (const md_rtx_info &info, FILE *file)
   fprintf (file, "  return _val;\n}\n\n");
 }
 
-/* Like gen_expand, but generates insns resulting from splitting SPLIT.  */
+/* Process and queue the DEFINE_SPLIT or DEFINE_PEEPHOLE2 in INFO.  */
+
+static void
+queue_split (const md_rtx_info &info)
+{
+  rtx split = info.def;
+
+  if (XVEC (split, 0) == 0)
+fatal_at (info.loc, "%s lacks a pattern",
+ GET_RTX_NAME (GET_CODE (split)));
+  if (XVEC (split, 2) == 0)
+fatal_at (info.loc, "%s lacks a replacement pattern",
+ GET_RTX_NAME (GET_CODE (split)));
+
+  queue.safe_push (info);
+}
+
+/* Generate the `gen_...' function for a DEFINE_SPLIT or DEFINE_PEEPHOLE2.  */
 
 static void
 gen_split (const md_rtx_info &info, FILE *file)
@@ -577,13 +614,6 @@ gen_split (const md_rtx_info &info, FILE *file)
   const char *unused;
   char *used;
 
-  if (XVEC (split, 0) == 0)
-fatal_at (info.loc, "%s lacks a pattern",
- GET_RTX_NAME (GET_CODE (split)));
-  else if (XVEC (split, 2) == 0)
-fata

[PATCH 0/9] Some tweaks to genemit.cc

2025-05-16 Thread Richard Sandiford
This series tweaks various aspects of genemit.cc so that they are
easier to change later.  There should be no functional change.

Bootstrapped & regression-tested on aarch64-linux-gnu.  Also tested
using config-list.mk to get cross-target coverage.

The series contains a couple of obvious target changes.  Otherwise it's
stuff that I could self-approve, but I'll leave a few days for comments.

Richard Sandiford (9):
  nds32: Avoid accessing beyond the operands[] array
  xstormy16: Avoid accessing beyond the operands[] array
  genemit: Use references rather than pointers
  genemit: Add an internal queue
  genemit: Factor out code common to insns and expands
  genemit: Consistently use operand arrays in gen_* functions
  genemit: Add a generator struct
  genemit: Always track multiple uses of operands
  genemit: Remove purported handling of location_ts

 gcc/config/nds32/nds32-intrinsic.md |  11 +-
 gcc/config/stormy16/stormy16.md |   3 +-
 gcc/genemit.cc  | 327 
 3 files changed, 197 insertions(+), 144 deletions(-)

-- 
2.43.0



[PATCH 1/9] nds32: Avoid accessing beyond the operands[] array

2025-05-16 Thread Richard Sandiford
This pattern used operands[2] to hold the shift amount, even though
the pattern doesn't have an operand 2 (not even as a match_dup).
This caused a build failure with -Werror:

  array subscript 2 is above array bounds of ‘rtx_def* [2]’

gcc/
* config/nds32/nds32-intrinsic.md (unspec_get_pending_int): Use
a local variable instead of operands[2].
---
 gcc/config/nds32/nds32-intrinsic.md | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/gcc/config/nds32/nds32-intrinsic.md b/gcc/config/nds32/nds32-intrinsic.md
index e05dce10509..85acea330f0 100644
--- a/gcc/config/nds32/nds32-intrinsic.md
+++ b/gcc/config/nds32/nds32-intrinsic.md
@@ -333,30 +333,31 @@ (define_expand "unspec_get_pending_int"
   ""
 {
   rtx system_reg = NULL_RTX;
+  rtx shift_amt = NULL_RTX;
 
   /* Set system register form nds32_intrinsic_register_names[].  */
   if ((INTVAL (operands[1]) >= NDS32_INT_H0)
   && (INTVAL (operands[1]) <= NDS32_INT_H15))
 {
   system_reg = GEN_INT (__NDS32_REG_INT_PEND__);
-  operands[2] = GEN_INT (31 - INTVAL (operands[1]));
+  shift_amt = GEN_INT (31 - INTVAL (operands[1]));
 }
   else if (INTVAL (operands[1]) == NDS32_INT_SWI)
 {
   system_reg = GEN_INT (__NDS32_REG_INT_PEND__);
-  operands[2] = GEN_INT (15);
+  shift_amt = GEN_INT (15);
 }
   else if ((INTVAL (operands[1]) >= NDS32_INT_H16)
   && (INTVAL (operands[1]) <= NDS32_INT_H31))
 {
   system_reg = GEN_INT (__NDS32_REG_INT_PEND2__);
-  operands[2] = GEN_INT (31 - INTVAL (operands[1]));
+  shift_amt = GEN_INT (31 - INTVAL (operands[1]));
 }
   else if ((INTVAL (operands[1]) >= NDS32_INT_H32)
   && (INTVAL (operands[1]) <= NDS32_INT_H63))
 {
   system_reg = GEN_INT (__NDS32_REG_INT_PEND3__);
-  operands[2] = GEN_INT (31 - (INTVAL (operands[1]) - 32));
+  shift_amt = GEN_INT (31 - (INTVAL (operands[1]) - 32));
 }
   else
 error ("% not support %,"
@@ -366,7 +367,7 @@ (define_expand "unspec_get_pending_int"
   if (system_reg != NULL_RTX)
 {
   emit_insn (gen_unspec_volatile_mfsr (operands[0], system_reg));
-  emit_insn (gen_ashlsi3 (operands[0], operands[0], operands[2]));
+  emit_insn (gen_ashlsi3 (operands[0], operands[0], shift_amt));
   emit_insn (gen_lshrsi3 (operands[0], operands[0], GEN_INT (31)));
   emit_insn (gen_unspec_dsb ());
 }
-- 
2.43.0



[PATCH 2/9] xstormy16: Avoid accessing beyond the operands[] array

2025-05-16 Thread Richard Sandiford
The negsi2 C++ code writes to operands[2] even though the pattern
has no operand 2.

gcc/
* config/stormy16/stormy16.md (negsi2): Remove unused assignment.
---
 gcc/config/stormy16/stormy16.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/config/stormy16/stormy16.md b/gcc/config/stormy16/stormy16.md
index 70c82827a4a..15c60ad0388 100644
--- a/gcc/config/stormy16/stormy16.md
+++ b/gcc/config/stormy16/stormy16.md
@@ -702,8 +702,7 @@ (define_expand "negsi2"
   [(parallel [(set (match_operand:SI 0 "register_operand" "")
   (neg:SI (match_operand:SI 1 "register_operand" "")))
  (clobber (reg:BI CARRY_REG))])]
-  ""
-  { operands[2] = gen_reg_rtx (HImode); })
+  "")
 
 (define_insn_and_split "*negsi2_internal"
   [(set (match_operand:SI 0 "register_operand" "=&r")
-- 
2.43.0



[PATCH 9/9] genemit: Remove purported handling of location_ts

2025-05-16 Thread Richard Sandiford
gen_exp had code to handle the 'L' operand format.  But this format
is specifically for location_ts, which are only used in RTX_INSNs.
Those should never occur in this context, where the input is always
an md file rather than an __RTL function.  Any hard-coded raw
location value would be meaningless anyway.

It seemed safer to turn this into an error rather than a gcc_unreachable.

gcc/
* genemit.cc (generator::gen_exp): Raise an error if we see
an 'L' operand.
---
 gcc/genemit.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/genemit.cc b/gcc/genemit.cc
index a7e49a24506..3636a555aad 100644
--- a/gcc/genemit.cc
+++ b/gcc/genemit.cc
@@ -270,7 +270,8 @@ generator::gen_exp (rtx x)
  break;
 
case 'L':
- fprintf (file, "%llu", (unsigned long long) XLOC (x, i));
+ fatal_at (info.loc, "'%s' rtxes are not supported in this context",
+   GET_RTX_NAME (code));
  break;
 
case 'r':
-- 
2.43.0



[PATCH 3/3] genemit: Use a byte encoding to generate insns

2025-05-16 Thread Richard Sandiford
genemit has traditionally used open-coded gen_rtx_FOO sequences
to build up the instruction pattern.  This is now the source of
quite a bit of bloat in the binary, and also a source of slow
compile times.

Two obvious ways of trying to deal with this are:

(1) Try to identify rtxes that have a similar form and use shared
routines to generate rtxes of that form.

(2) Use a static table to encode the rtx and call a common routine
to expand it.

I did briefly look at (1).  However, it's more complex than (2),
and I think suffers from being the worst of both worlds, for reasons
that I'll explain below.  This patch therefore does (2).

In theory, one of the advantages of open-coding the calls to
gen_rtx_FOO is that the rtx can be populated using stores of known
constants (for the rtx code, mode, unspec number, etc).  However,
the time spent constructing an rtx is likely to be dominated by
the call to rtx_alloc, rather than by the stores to the fields.

Option (1) above loses this advantage of storing constants.
The shared routines would parameterise an rtx according to things
like the modes on the rtx and its suboperands, so the code would
need to fetch the parameters.  In a sense, the rtx structure would
be open-coded but the parameters would be table-encoded (albeit
in a simple way).

The expansion code also shouldn't be particularly hot.  Anything that
treats expand/discard cycles as very cheap would be misconceived,
since each discarded expansion generates garbage memory that needs
to be cleaned up later.

Option (2) turns out to be pretty simple -- certainly simpler
than (1) -- and seems to give a reasonable saving.  Some numbers,
all for --enable-checking=yes,rtl,extra:

[A] size of the @progbits sections in insn-emit-*.o, new / old
[B] size of the load segments in cc1, new / old
[C] time to compile a typical insn-emit*.cc, new / old

Target [A]  [B]  [C]

native aarch64  0.5627   0.9585   0.5677
native x86_64   0.5925   0.9467   0.6377
aarch64-x-riscv64   0.   0.9066   0.2762

To get an idea of the effect on the final compiler, I tried compiling
fold-const.ii with -O0 (no -g), since that should give any slowdown
less room to hide.  I couldn't measure any difference in compile time
before or after the patch for any of the three variants above.
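A toy model of option (2) may help: a static byte table describes the expression tree, and one shared routine walks it. The opcode set and encoding below are invented for illustration and deliberately much simpler than the real `expand_opcode` scheme, which builds rtxes rather than integers:

```cpp
#include <cassert>
#include <cstdint>

// Invented opcodes for a toy expression encoding.
enum : uint8_t { OP_CONST, OP_PLUS };

// Encodes (plus (const 2) (plus (const 3) (const 4))).
static const uint8_t table[] = {
  OP_PLUS, OP_CONST, 2, OP_PLUS, OP_CONST, 3, OP_CONST, 4,
};

// One shared decoder replaces many open-coded gen_rtx_FOO sequences.
// P advances through the table as subexpressions are consumed.
int expand (const uint8_t *&p)
{
  switch (*p++)
    {
    case OP_CONST:
      return *p++;
    case OP_PLUS:
      {
	int a = expand (p);   // left subexpression
	int b = expand (p);   // right subexpression
	return a + b;
      }
    }
  return 0;
}
```

The point of the model is that the per-pattern data shrinks to a compact read-only table, while the decoding logic is emitted once and shared by every pattern.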

gcc/
* gensupport.h (needs_barrier_p): Delete.
* gensupport.cc (needs_barrier_p): Likewise.
* rtl.h (always_void_p): Return true for PC, RETURN and SIMPLE_RETURN.
(expand_opcode): New enum class.
(expand_rtx, complete_seq): Declare.
* emit-rtl.cc (rtx_expander): New class.
(expand_rtx, complete_seq): New functions.
* gengenrtl.cc (special_rtx, excluded_rtx): Add a cross-reference
comment.
* genemit.cc (FIRST_CODE): New constant.
(print_code): Delete.
(generator::file, generator::used, generator::sequence_type): Delete.
(generator::bytes): New member variable.
(generator::generator): Update accordingly.
(generator::gen_rtx_scratch): Delete.
(generator::add_uint, generator::add_opcode, generator::add_code)
(generator::add_match_operator, generator::add_exp)
(generator::add_vec, generator::gen_table): New member functions.
(generator::gen_exp): Rewrite to use a bytecode expansion.
(generator::gen_emit_seq): Likewise.
(start_gen_insn): Return the C++ expression for the operands array.
(gen_insn, gen_expand, gen_split): Update callers accordingly.
(emit_c_code): Remove use of _val.
---
 gcc/emit-rtl.cc   | 292 ++
 gcc/genemit.cc| 346 +++---
 gcc/gengenrtl.cc  |  10 +-
 gcc/gensupport.cc |  10 --
 gcc/gensupport.h  |   1 -
 gcc/rtl.h |  42 +-
 6 files changed, 480 insertions(+), 221 deletions(-)

diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index fc7b6c7e297..57021fbb5ff 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple.h"
 #include "gimple-ssa.h"
 #include "gimplify.h"
+#include "bbitmap.h"
 
 struct target_rtl default_target_rtl;
 #if SWITCHABLE_TARGET
@@ -6777,6 +6778,297 @@ gen_int_shift_amount (machine_mode, poly_int64 value)
   return gen_int_mode (value, shift_mode);
 }
 
+namespace {
+/* Helper class for expanding an rtx using the encoding generated by
+   genemit.cc.  The code needs to be kept in sync with there.  */
+
+class rtx_expander
+{
+public:
+  rtx_expander (const uint8_t *, rtx *);
+
+  rtx get_rtx ();
+  rtvec get_rtvec ();
+  void expand_seq ();
+
+protected:
+  uint64_t get_uint ();
+  machine_mode get_mode () { return machine_mode (get_uint ()); }
+  char *get_string ();
+  rtx get_shared_operand ();
+  rtx get_unshared_operand ();
+
+  rtx get_rtx (expand_opcode);
+  rtx get_rtx (rtx_code, machine_mode

[PATCH 2/3] genemit: Avoid using gen_exp in output_add_clobbers

2025-05-16 Thread Richard Sandiford
output_add_clobbers emits code to add:

  (clobber (scratch:M))

and/or:

  (clobber (reg:M R))

expressions to the end of a PARALLEL.  At the moment, it does this
using the general gen_exp function.  That makes sense with the code
in its current form, but with later patches it's more convenient to
handle the two cases directly.

This also avoids having to pass an md_rtx_info that is unrelated
to the clobber expressions.

gcc/
* genemit.cc (clobber_pat::code): Delete.
(maybe_queue_insn): Don't set clobber_pat::code.
(output_add_clobbers): Remove info argument and output the two
REG and SCRATCH cases directly.
(main): Update call accordingly.
---
 gcc/genemit.cc | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/gcc/genemit.cc b/gcc/genemit.cc
index 00f7a920ce9..7cdd9eb1d37 100644
--- a/gcc/genemit.cc
+++ b/gcc/genemit.cc
@@ -39,7 +39,6 @@ struct clobber_pat
   int first_clobber;
   struct clobber_pat *next;
   int has_hard_reg;
-  rtx_code code;
 } *clobber_list;
 
 /* Records one insn that uses the clobber list.  */
@@ -435,7 +434,6 @@ maybe_queue_insn (const md_rtx_info &info)
  p->first_clobber = i + 1;
  p->next = clobber_list;
  p->has_hard_reg = has_hard_reg;
- p->code = GET_CODE (insn);
  clobber_list = p;
}
 
@@ -691,7 +689,7 @@ gen_split (const md_rtx_info &info, FILE *file)
the end of the vector.  */
 
 static void
-output_add_clobbers (const md_rtx_info &info, FILE *file)
+output_add_clobbers (FILE *file)
 {
   struct clobber_pat *clobber;
   struct clobber_ent *ent;
@@ -709,12 +707,16 @@ output_add_clobbers (const md_rtx_info &info, FILE *file)
 
   for (i = clobber->first_clobber; i < GET_NUM_ELEM (clobber->pattern); 
i++)
{
- fprintf (file, "  XVECEXP (pattern, 0, %d) = ", i);
- rtx clobbered_value = RTVEC_ELT (clobber->pattern, i);
- /* Pass null for USED since there are no operands.  */
- generator (clobber->code, NULL, info, file)
-   .gen_exp (clobbered_value);
- fprintf (file, ";\n");
+ fprintf (file, "XVECEXP (pattern, 0, %d) ="
+  " gen_rtx_CLOBBER (VOIDmode, ", i);
+ rtx x = XEXP (RTVEC_ELT (clobber->pattern, i), 0);
+ if (REG_P (x))
+   fprintf (file, "gen_rtx_REG (%smode, %d)",
+GET_MODE_NAME (GET_MODE (x)), REGNO (x));
+ else
+   fprintf (file, "gen_rtx_SCRATCH (%smode)",
+GET_MODE_NAME (GET_MODE (x)));
+ fprintf (file, ");\n");
}
 
   fprintf (file, "  break;\n\n");
@@ -1034,7 +1036,7 @@ main (int argc, const char **argv)
 
   /* Write out the routines to add CLOBBERs to a pattern and say whether they
  clobber a hard reg.  */
-  output_add_clobbers (info, file);
+  output_add_clobbers (file);
   output_added_clobbers_hard_reg_p (file);
 
   for (overloaded_name *oname = rtx_reader_ptr->get_overloads ();
-- 
2.43.0



Re: [PATCH v2 1/2] tree-simplify: unify simple_comparison ops in vec_cond for bit and/or [PR119196]

2025-05-16 Thread Andrew Pinski
On Fri, May 16, 2025 at 9:49 AM Andrew Pinski  wrote:
>
> On Fri, May 16, 2025 at 9:32 AM Icen Zeyada  wrote:
> >
> > Merge simple_comparison patterns under a single vec_cond_expr for bit_and
> > and bit_ior in the simplify pass.
> >
> > Ensure that when both operands of a bit-and or bit-or are simple_comparison
> > results, they reside within the same vec_cond_expr rather than separate 
> > ones.
> > This prepares the AST so that subsequent transformations (e.g., folding the
> > comparisons if possible) can take effect.
> >
> > PR tree-optimization/119196
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Merge multiple vec_cond_expr in a single one for bit_and
> > and bit_ior.
> >
> > Signed-off-by: Icen Zeyada 
> > ---
> >  gcc/match.pd | 10 ++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 79485f9678a0..da60d6a22290 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -6524,6 +6524,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > { build_int_cst (integer_type_node, prec - 1);}))
> >  #endif
> >
> > +(for op1 (simple_comparison)
> > + (for op2 (simple_comparison)
> > +  (for lop (bit_and bit_ior)
> > +(simplify
> > +  (lop
> > +   (vec_cond @0 integer_minus_onep@2 integer_zerop@3)
> > +   (vec_cond @1 @2 @3))
> > +   (if (expand_vec_cond_expr_p (type, TREE_TYPE (@0)))
> > +   (vec_cond (lop @0 @1) @2 @3))
>
> I am trying to understand why you need op1/op2 here since they seem to
> be unused?
> Other than that it might make sense to extend `(a?0:-1) lop (b?0:-1)` too.
> I am not sure but xor might show up; though I don't think it is as
> important as &/| as you handle today.

I missed something in my review.
The transformation is valid for any value of @2 and not just -1.

And you should be able to handle `cond` (scalar) and not just vec_cond
as a follow up if you need.
```
int f1(bool a, bool b, int c, int d)
{
  return (a ? c : 0) & (b ? c : 0);
}
```

>
> Thanks,
> Andrew
>
> > +
> >  (for cnd (cond vec_cond)
> >   /* (a != b) ? (a - b) : 0 -> (a - b) */
> >   (simplify
> > --
> > 2.43.0
> >


Re: [PATCH v2 2/2] emit-rtl: Validate mode for paradoxical hardware subregs [PR119966]

2025-05-16 Thread Richard Sandiford
Dimitar Dimitrov  writes:
> After r16-160-ge6f89d78c1a752, late_combine2 started transforming the
> following RTL for pru-unknown-elf:
>
>   (insn 3949 3948 3951 255 (set (reg:QI 56 r14.b0 [orig:1856 _619 ] [1856])
>   (and:QI (reg:QI 1 r0.b1 [orig:1855 _201 ] [1855])
>   (const_int 3 [0x3])))
>(nil))
>   ...
>   (insn 3961 7067 3962 255 (set (reg:SI 56 r14.b0)
>   (zero_extend:SI (reg:QI 56 r14.b0 [orig:1856 _619 ] [1856])))
>(nil))
>
> into:
>
>   (insn 3961 7067 3962 255 (set (reg:SI 56 r14.b0)
>   (and:SI (subreg:SI (reg:QI 1 r0.b1 [orig:1855 _201 ] [1855]) 0)
>   (const_int 3 [0x3])))
>(nil))
>
> That caused libbacktrace build to break for pru-unknown-elf.  Register
> r0.b1 (regno 1) is not valid for SImode, which validate_subreg failed to
> reject.
>
> Fix by calling HARD_REGNO_MODE_OK to ensure that both inner and outer
> modes are valid for the hardware subreg.
>
> This patch fixes the broken PRU toolchain build.  It leaves only two
> test case regressions for PRU, caused by rnreg pass renaming a valid
> paradoxical subreg into an invalid one.
>   gcc.c-torture/execute/20040709-1.c
>   gcc.c-torture/execute/20040709-2.c
>
>   PR target/119966
>
> gcc/ChangeLog:
>
>   * emit-rtl.cc (validate_subreg): Validate inner
>   and outer mode for paradoxical hardware subregs.
>
> Co-authored-by: Andrew Pinski 
> Signed-off-by: Dimitar Dimitrov 
> ---
>  gcc/emit-rtl.cc | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
> index e46b0f9eac4..6c5d9b55508 100644
> --- a/gcc/emit-rtl.cc
> +++ b/gcc/emit-rtl.cc
> @@ -983,6 +983,9 @@ validate_subreg (machine_mode omode, machine_mode imode,
>if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
> && GET_MODE_INNER (imode) == omode)
>   ;
> +  else if (!targetm.hard_regno_mode_ok (regno, imode)
> +|| !targetm.hard_regno_mode_ok (regno, omode))
> + return false;

It isn't meaningful to test regno against omode, since that isn't
necessarily the register that would be produced by the subreg.

ISTR that this is a sensitive part of the codebase.  I think there
are/were targets that create unfoldable subregs for argument passing
and return.  And I think e500 had unfoldable subregs of FP registers,
although that port is gone now.

So I suppose the question is: when given a hard register, should
validate_subreg test whether the subreg can be folded to a hard
register?  Or is it more relaxed than that?  Do we need different
rules before LRA (which could fix up subregs through reloading)
and after LRA (where unfoldable subregs stay unfoldable)?

If validate_subreg should test whether a subreg of a hard register
can be folded to a hard register, the fix would be to use
simplify_subreg_regno instead of the current tests.  But it looks
like that was deliberately not done.

It might still be worth trying to use simplify_subreg_regno and
seeing what breaks.  Any fallout would at least let us expand
the comments to explain the constraints.

Thanks,
Richard


Contents of PO file 'cpplib-15.1-b20250316.es.po'

2025-05-16 Thread Translation Project Robot


cpplib-15.1-b20250316.es.po.gz
The Translation Project robot, in the
name of your translation coordinator.



RE: [PATCH v2 1/3] aarch64: Recognize vector permute patterns which can be interpreted as AND [PR100165]

2025-05-16 Thread quic_pzheng
> Pengxuan Zheng  writes:
> **...
> **and v0.8b, (?:v0.8b, v[0-9]+.8b|v[0-9]+.8b, v0.8b)
> **ret
> 
> Same for other tests that can't use a move immediate.
> 
> Please leave 24 hours for others to comment on the target-independent
part,
> but otherwise the patch is ok with the changes above.  Thanks again for
doing
> this: it's a really nice improvement.
> 
> Richard

Thanks, Richard! I've updated the patch accordingly and pushed it as
r16-702-gdc501cb0dc8576.

Pengxuan



Re: [PATCH 7/9] genemit: Add a generator struct

2025-05-16 Thread Jeff Law




On 5/16/25 11:22 AM, Richard Sandiford wrote:

gen_exp now has quite a few arguments that need to be passed
to each recursive call.  This patch turns it and related routines
into member functions of a new generator class, so that the shared
information can be stored in member variables.

This also helps to make later patches less noisy.

gcc/
* genemit.cc (generator): New structure.
(gen_rtx_scratch, gen_exp, gen_emit_seq): Turn into member
functions of generator.
(gen_insn, gen_expand, gen_split, output_add_clobbers): Update
users accordingly.
OK.  And thanks for doing this.  Too many function arguments is one of 
those signs that some C++-ification may be helpful.  It's not the only 
factor, but one that's pretty consistently led to simpler code in the end.


jeff


Re: [PATCH 9/9] genemit: Remove purported handling of location_ts

2025-05-16 Thread Jeff Law




On 5/16/25 11:22 AM, Richard Sandiford wrote:

gen_exp had code to handle the 'L' operand format.  But this format
is specifically for location_ts, which are only used in RTX_INSNs.
Those should never occur in this context, where the input is always
an md file rather than an __RTL function.  Any hard-coded raw
location value would be meaningless anyway.

It seemed safer to turn this into an error rather than a gcc_unreachable.

gcc/
* genemit.cc (generator::gen_exp): Raise an error if we see
an 'L' operand.

OK
jeff



[PATCH 0/3] Make genemit.cc use a byte encoding of the rtx pattern

2025-05-16 Thread Richard Sandiford
[Thanks Jeff and Richard for the reviews of the end_sequence series.
This is the series that that one was written for.]

genemit has traditionally used open-coded gen_rtx_FOO sequences
to build up the instruction pattern.  This is now the source of
quite a bit of bloat in the binary, and also a source of slow
compile times.

This series switches over to using a byte encoding of the rtx
pattern instead.  See the description of the final patch for details
and performance measurements.

Bootstrapped & regression-tested on aarch64-linux-gnu and
x86_64-linux-gnu.  Also tested using config-list.mk.  OK to install?

Richard


Richard Sandiford (3):
  genemit: Remove support for string operands
  genemit: Avoid using gen_exp in output_add_clobbers
  genemit: Use a byte encoding to generate insns

 gcc/emit-rtl.cc   | 292 
 gcc/genemit.cc| 373 +++---
 gcc/gengenrtl.cc  |  10 +-
 gcc/gensupport.cc |  10 --
 gcc/gensupport.h  |   1 -
 gcc/rtl.h |  42 +-
 6 files changed, 493 insertions(+), 235 deletions(-)

-- 
2.43.0



[PATCH] libstdc++: Fix std::format of chrono::local_days with {} [PR120293]

2025-05-16 Thread Jonathan Wakely
Formatting of chrono::local_days with an empty chrono-specs should be
equivalent to inserting it into an ostream, which should use the
overload for inserting chrono::sys_days into an ostream. The
implementation of empty chrono-specs in _M_format_to_ostream takes some
short cuts, and that wasn't being done correctly for chrono::local_days.

libstdc++-v3/ChangeLog:

PR libstdc++/120293
* include/bits/chrono_io.h (_M_format_to_ostream): Add special
case for local_time convertible to local_days.
* testsuite/std/time/clock/local/io.cc: Check formatting of
chrono::local_days.
---

Tested x86_64-linux.

 libstdc++-v3/include/bits/chrono_io.h | 3 +++
 libstdc++-v3/testsuite/std/time/clock/local/io.cc | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index ace8b9f26292..92a3098e808c 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -766,6 +766,9 @@ namespace __format
  // sys_time with period greater or equal to days:
  if constexpr (is_convertible_v<_Tp, chrono::sys_days>)
__os << _S_date(__t);
+ // Or a local_time with period greater or equal to days:
+ else if constexpr (is_convertible_v<_Tp, chrono::local_days>)
+   __os << _S_date(__t);
  else // Or it's formatted as "{:L%F %T}":
{
  auto __days = chrono::floor(__t);
diff --git a/libstdc++-v3/testsuite/std/time/clock/local/io.cc 
b/libstdc++-v3/testsuite/std/time/clock/local/io.cc
index b4d562f36d12..67818e876497 100644
--- a/libstdc++-v3/testsuite/std/time/clock/local/io.cc
+++ b/libstdc++-v3/testsuite/std/time/clock/local/io.cc
@@ -89,6 +89,9 @@ test_format()
 
   s = std::format("{}", local_seconds{});
   VERIFY( s == "1970-01-01 00:00:00" );
+
+  s = std::format("{}", local_days{}); // PR libstdc++/120293
+  VERIFY( s == "1970-01-01" );
 }
 
 void
-- 
2.49.0



[PATCH] libstdc++: Fix some Clang -Wsystem-headers warnings in

2025-05-16 Thread Jonathan Wakely
libstdc++-v3/ChangeLog:

* include/std/ranges (_ZipTransform::operator()): Remove name of
unused parameter.
(chunk_view::_Iterator, stride_view::_Iterator): Likewise.
(join_with_view): Declare _Iterator and _Sentinel as class
instead of struct.
(repeat_view): Declare _Iterator as class instead of struct.
---

Tested x86_64-linux.

 libstdc++-v3/include/std/ranges | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 9300c364a165..210ac8274fc1 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -5336,7 +5336,7 @@ namespace views::__adaptor
requires move_constructible> && 
regular_invocable&>
  && is_object_v&>>>
constexpr auto
-   operator() [[nodiscard]] (_Fp&& __f) const
+   operator() [[nodiscard]] (_Fp&&) const
{
  return views::empty&>>>;
}
@@ -6598,7 +6598,7 @@ namespace views::__adaptor
 }
 
 friend constexpr difference_type
-operator-(default_sentinel_t __y, const _Iterator& __x)
+operator-(default_sentinel_t, const _Iterator& __x)
   requires sized_sentinel_for, iterator_t<_Base>>
 { return __detail::__div_ceil(__x._M_end - __x._M_current, __x._M_n); }
 
@@ -7287,8 +7287,8 @@ namespace views::__adaptor
using iterator_category = decltype(_S_iter_cat());
 };
 
-template struct _Iterator;
-template struct _Sentinel;
+template class _Iterator;
+template class _Sentinel;
 
   public:
 join_with_view() requires (default_initializable<_Vp>
@@ -7743,7 +7743,7 @@ namespace views::__adaptor
 __detail::__box<_Tp> _M_value;
 [[no_unique_address]] _Bound _M_bound = _Bound();
 
-struct _Iterator;
+class _Iterator;
 
 template
 friend constexpr auto
@@ -8303,7 +8303,7 @@ namespace views::__adaptor
 }
 
 friend constexpr difference_type
-operator-(default_sentinel_t __y, const _Iterator& __x)
+operator-(default_sentinel_t, const _Iterator& __x)
   requires sized_sentinel_for, iterator_t<_Base>>
 { return __detail::__div_ceil(__x._M_end - __x._M_current, __x._M_stride); 
}
 
-- 
2.49.0



[PATCH 1/3] genemit: Remove support for string operands

2025-05-16 Thread Richard Sandiford
gen_exp currently supports the 's' (string) operand type.  It would
certainly be possible to make the upcoming bytecode patch support
that too.  However, the rtx codes that have string operands should
be very rarely used in hard-coded define_insn/expand/split/peephole2
rtx templates (as opposed to things like attribute expressions,
where const_string is commonplace).  And AFAICT, no current target
does use them like that.

This patch therefore reports an error for these rtx codes,
rather than adding code that would be unused and untested.

gcc/
* genemit.cc (generator::gen_exp): Report an error for 's' operands.
---
 gcc/genemit.cc | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/genemit.cc b/gcc/genemit.cc
index 3636a555aad..00f7a920ce9 100644
--- a/gcc/genemit.cc
+++ b/gcc/genemit.cc
@@ -270,6 +270,7 @@ generator::gen_exp (rtx x)
  break;
 
case 'L':
+   case 's':
  fatal_at (info.loc, "'%s' rtxes are not supported in this context",
GET_RTX_NAME (code));
  break;
@@ -284,10 +285,6 @@ generator::gen_exp (rtx x)
  fprintf (file, "%d", SUBREG_BYTE (x).to_constant ());
  break;
 
-   case 's':
- fprintf (file, "\"%s\"", XSTR (x, i));
- break;
-
case 'E':
  {
int j;
-- 
2.43.0



Re: [PATCH v22 0/3] c: Add _Countof and

2025-05-16 Thread Alejandro Colomar
Hi Joseph,

On Fri, May 16, 2025 at 05:01:36PM +, Joseph Myers wrote:
> On Fri, 16 May 2025, Alejandro Colomar wrote:
> 
> > Hmmm, I've been trying to find a compromise between readability and
> > simplicity, and I think I have something.  I've seen some tests that
> > define assert() themselves.  I like assert(3) because it's more
> > readable compared to a conditional plus abort(3).
> > 
> > So, how do you feel about the following change?
> > 
> > diff --git i/gcc/testsuite/gcc.dg/countof-stdcountof.c 
> > w/gcc/testsuite/gcc.dg/countof-stdcountof.c
> > index a7fe4079c69..2fb0c6306ef 100644
> > --- i/gcc/testsuite/gcc.dg/countof-stdcountof.c
> > +++ w/gcc/testsuite/gcc.dg/countof-stdcountof.c
> > @@ -3,8 +3,7 @@
> >  
> >  #include 
> >  
> > -#undef NDEBUG
> > -#include 
> > +#define assert(e)  ((e) ? (void) 0 : __builtin_abort ())
> 
> Yes, I think that's a reasonable way for a test to do its assertions with 
> assert syntax but without depending unnecessarily on libc headers.

If there are any other issues, I'll apply that change for v23.  If this
is the only one, would you mind amending yourself with that while
committing?  Thanks!


Cheers,
Alex

-- 





[PATCH 3/9] genemit: Use references rather than pointers

2025-05-16 Thread Richard Sandiford
This patch makes genemit.cc pass the md_rtx_info around by constant
reference rather than pointer.  It's somewhat of a cosmetic change
on its own, but it makes later changes less noisy.

gcc/
* genemit.cc (gen_exp): Make the info argument a constant reference.
(gen_emit_seq, gen_insn, gen_expand, gen_split): Likewise.
(output_add_clobbers): Likewise.
(main): Update calls accordingly.
---
 gcc/genemit.cc | 60 +-
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/gcc/genemit.cc b/gcc/genemit.cc
index 9f92364d906..cb4ae47294d 100644
--- a/gcc/genemit.cc
+++ b/gcc/genemit.cc
@@ -80,8 +80,8 @@ gen_rtx_scratch (rtx x, enum rtx_code subroutine_type, FILE 
*file)
substituting any operand references appearing within.  */
 
 static void
-gen_exp (rtx x, enum rtx_code subroutine_type, char *used, md_rtx_info *info,
-FILE *file)
+gen_exp (rtx x, enum rtx_code subroutine_type, char *used,
+const md_rtx_info &info, FILE *file)
 {
   RTX_CODE code;
   int i;
@@ -281,7 +281,7 @@ gen_exp (rtx x, enum rtx_code subroutine_type, char *used, 
md_rtx_info *info,
becoming a separate instruction.  USED is as for gen_exp.  */
 
 static void
-gen_emit_seq (rtvec vec, char *used, md_rtx_info *info, FILE *file)
+gen_emit_seq (rtvec vec, char *used, const md_rtx_info &info, FILE *file)
 {
   for (int i = 0, len = GET_NUM_ELEM (vec); i < len; ++i)
 {
@@ -329,7 +329,7 @@ emit_c_code (const char *code, bool can_fail_p, const char 
*name, FILE *file)
 /* Generate the `gen_...' function for a DEFINE_INSN.  */
 
 static void
-gen_insn (md_rtx_info *info, FILE *file)
+gen_insn (const md_rtx_info &info, FILE *file)
 {
   struct pattern_stats stats;
   int i;
@@ -338,7 +338,7 @@ gen_insn (md_rtx_info *info, FILE *file)
  registers or MATCH_SCRATCHes.  If so, store away the information for
  later.  */
 
-  rtx insn = info->def;
+  rtx insn = info.def;
   if (XVEC (insn, 1))
 {
   int has_hard_reg = 0;
@@ -366,7 +366,7 @@ gen_insn (md_rtx_info *info, FILE *file)
  struct clobber_ent *link = XNEW (struct clobber_ent);
  int j;
 
- link->code_number = info->index;
+ link->code_number = info.index;
 
  /* See if any previous CLOBBER_LIST entry is the same as this
 one.  */
@@ -422,12 +422,12 @@ gen_insn (md_rtx_info *info, FILE *file)
   if (XSTR (insn, 0)[0] == 0 || XSTR (insn, 0)[0] == '*')
 return;
 
-  fprintf (file, "/* %s:%d */\n", info->loc.filename, info->loc.lineno);
+  fprintf (file, "/* %s:%d */\n", info.loc.filename, info.loc.lineno);
 
   /* Find out how many operands this function has.  */
   get_pattern_stats (&stats, XVEC (insn, 1));
   if (stats.max_dup_opno > stats.max_opno)
-fatal_at (info->loc, "match_dup operand number has no match_operand");
+fatal_at (info.loc, "match_dup operand number has no match_operand");
 
   /* Output the function name and argument declarations.  */
   fprintf (file, "rtx\ngen_%s (", XSTR (insn, 0));
@@ -458,25 +458,25 @@ gen_insn (md_rtx_info *info, FILE *file)
 /* Generate the `gen_...' function for a DEFINE_EXPAND.  */
 
 static void
-gen_expand (md_rtx_info *info, FILE *file)
+gen_expand (const md_rtx_info &info, FILE *file)
 {
   struct pattern_stats stats;
   int i;
   char *used;
 
-  rtx expand = info->def;
+  rtx expand = info.def;
   if (strlen (XSTR (expand, 0)) == 0)
-fatal_at (info->loc, "define_expand lacks a name");
+fatal_at (info.loc, "define_expand lacks a name");
   if (XVEC (expand, 1) == 0)
-fatal_at (info->loc, "define_expand for %s lacks a pattern",
+fatal_at (info.loc, "define_expand for %s lacks a pattern",
  XSTR (expand, 0));
 
   /* Find out how many operands this function has.  */
   get_pattern_stats (&stats, XVEC (expand, 1));
   if (stats.min_scratch_opno != -1
   && stats.min_scratch_opno <= MAX (stats.max_opno, stats.max_dup_opno))
-fatal_at (info->loc, "define_expand for %s needs to have match_scratch "
-"numbers above all other operands", XSTR (expand, 0));
+fatal_at (info.loc, "define_expand for %s needs to have match_scratch "
+ "numbers above all other operands", XSTR (expand, 0));
 
   /* Output the function name and argument declarations.  */
   fprintf (file, "rtx\ngen_%s (", XSTR (expand, 0));
@@ -567,21 +567,21 @@ gen_expand (md_rtx_info *info, FILE *file)
 /* Like gen_expand, but generates insns resulting from splitting SPLIT.  */
 
 static void
-gen_split (md_rtx_info *info, FILE *file)
+gen_split (const md_rtx_info &info, FILE *file)
 {
   struct pattern_stats stats;
   int i;
-  rtx split = info->def;
+  rtx split = info.def;
   const char *const name =
 ((GET_CODE (split) == DEFINE_PEEPHOLE2) ? "peephole2" : "split");
   const char *unused;
   char *used;
 
   if (XVEC (split, 0) == 0)
-fatal_at (info->loc, "%s lacks a pattern",
+fatal_at (info.loc, "%s lacks a pattern

[PATCH 6/9] genemit: Consistently use operand arrays in gen_* functions

2025-05-16 Thread Richard Sandiford
One slightly awkward part about emitting the generator function
bodies is that:

* define_insn and define_expand routines have a separate argument for
  each operand, named "operand0" upwards.

* define_split and define_peephole2 routines take a pointer to an array,
  named "operands".

* the C++ preparation code for expands, splits and peephole2s uses an
  array called "operands" to refer to the operands.

* the automatically-generated code uses individual "operand"
  variables to refer to the operands.

So define_expands have to store the incoming arguments into an operands
array before the md file's C++ code, then copy the operands array back
to the individual variables before the automatically-generated code.
splits and peephole2s have to copy the incoming operands array to
individual variables after the md file's C++ code, creating more
local variables that are live across calls to rtx_alloc.

This patch tries to simplify things by making the whole function
body use the operands array in preference to individual variables.
define_insns and define_expands store their arguments to the array
on entry.

This would have pros and cons on its own, but having a single array
helps with future efforts to reduce the duplication between gen_*
functions.

Doing this tripped a warning in stormy16.md about writing beyond
the end of the array.  The negsi2 C++ code writes to operands[2]
even though the pattern has no operand 2.

gcc/
* genemit.cc (gen_rtx_scratch, gen_exp): Use operands[%d] rather than
operand%d.
(start_gen_insn): Store the incoming arguments to an operands array.
(gen_expand, gen_split): Remove copies into and out of the operands
array.
* config/stormy16/stormy16.md (negsi): Remove redundant assignment.
---
 gcc/genemit.cc | 61 --
 1 file changed, 19 insertions(+), 42 deletions(-)

diff --git a/gcc/genemit.cc b/gcc/genemit.cc
index 90f36e293b4..cdc098f19b8 100644
--- a/gcc/genemit.cc
+++ b/gcc/genemit.cc
@@ -71,7 +71,7 @@ gen_rtx_scratch (rtx x, enum rtx_code subroutine_type, FILE 
*file)
 {
   if (subroutine_type == DEFINE_PEEPHOLE2)
 {
-  fprintf (file, "operand%d", XINT (x, 0));
+  fprintf (file, "operands[%d]", XINT (x, 0));
 }
   else
 {
@@ -108,21 +108,21 @@ gen_exp (rtx x, enum rtx_code subroutine_type, char *used,
{
  if (used[XINT (x, 0)])
{
- fprintf (file, "copy_rtx (operand%d)", XINT (x, 0));
+ fprintf (file, "copy_rtx (operands[%d])", XINT (x, 0));
  return;
}
  used[XINT (x, 0)] = 1;
}
-  fprintf (file, "operand%d", XINT (x, 0));
+  fprintf (file, "operands[%d]", XINT (x, 0));
   return;
 
 case MATCH_OP_DUP:
   fprintf (file, "gen_rtx_fmt_");
   for (i = 0; i < XVECLEN (x, 1); i++)
fprintf (file, "e");
-  fprintf (file, " (GET_CODE (operand%d), ", XINT (x, 0));
+  fprintf (file, " (GET_CODE (operands[%d]), ", XINT (x, 0));
   if (GET_MODE (x) == VOIDmode)
-   fprintf (file, "GET_MODE (operand%d)", XINT (x, 0));
+   fprintf (file, "GET_MODE (operands[%d])", XINT (x, 0));
   else
fprintf (file, "%smode", GET_MODE_NAME (GET_MODE (x)));
   for (i = 0; i < XVECLEN (x, 1); i++)
@@ -137,7 +137,7 @@ gen_exp (rtx x, enum rtx_code subroutine_type, char *used,
   fprintf (file, "gen_rtx_fmt_");
   for (i = 0; i < XVECLEN (x, 2); i++)
fprintf (file, "e");
-  fprintf (file, " (GET_CODE (operand%d)", XINT (x, 0));
+  fprintf (file, " (GET_CODE (operands[%d])", XINT (x, 0));
   fprintf (file, ", %smode", GET_MODE_NAME (GET_MODE (x)));
   for (i = 0; i < XVECLEN (x, 2); i++)
{
@@ -149,7 +149,7 @@ gen_exp (rtx x, enum rtx_code subroutine_type, char *used,
 
 case MATCH_PARALLEL:
 case MATCH_PAR_DUP:
-  fprintf (file, "operand%d", XINT (x, 0));
+  fprintf (file, "operands[%d]", XINT (x, 0));
   return;
 
 case MATCH_SCRATCH:
@@ -437,14 +437,22 @@ start_gen_insn (FILE *file, const char *name, const 
pattern_stats &stats)
   fprintf (file, "rtx\ngen_%s (", name);
   if (stats.num_generator_args)
 for (int i = 0; i < stats.num_generator_args; i++)
-  if (i)
-   fprintf (file, ",\n\trtx operand%d ATTRIBUTE_UNUSED", i);
-  else
-   fprintf (file, "rtx operand%d ATTRIBUTE_UNUSED", i);
+  fprintf (file, "%srtx operand%d", i == 0 ? "" : ", ", i);
   else
 fprintf (file, "void");
   fprintf (file, ")\n");
   fprintf (file, "{\n");
+  if (stats.num_generator_args)
+{
+  fprintf (file, "   rtx operands[%d] ATTRIBUTE_UNUSED = {",
+  stats.num_operand_vars);
+  for (int i = 0; i < stats.num_generator_args; i++)
+   fprintf (file, "%s operand%d", i == 0 ? "" : ",", i);
+  fprintf (file, " };\n");
+}
+  else if (stats.num_operand_vars != 0)
+fprintf (file, "  rtx operands[%d] ATTRIBUTE_UNUSED;\n",
+stats.n

[PATCH 5/9] genemit: Factor out code common to insns and expands

2025-05-16 Thread Richard Sandiford
Mostly to reduce cut-&-paste.

gcc/
* genemit.cc (start_gen_insn): New function, split out from...
(gen_insn, gen_expand): ...here.
---
 gcc/genemit.cc | 45 ++---
 1 file changed, 22 insertions(+), 23 deletions(-)

diff --git a/gcc/genemit.cc b/gcc/genemit.cc
index b73a45a0412..90f36e293b4 100644
--- a/gcc/genemit.cc
+++ b/gcc/genemit.cc
@@ -427,13 +427,32 @@ maybe_queue_insn (const md_rtx_info &info)
   queue.safe_push (info);
 }
 
+/* Output the function name, argument declarations, and initial function
+   body for a pattern called NAME, given that it has the properties
+   in STATS.  */
+
+static void
+start_gen_insn (FILE *file, const char *name, const pattern_stats &stats)
+{
+  fprintf (file, "rtx\ngen_%s (", name);
+  if (stats.num_generator_args)
+for (int i = 0; i < stats.num_generator_args; i++)
+  if (i)
+   fprintf (file, ",\n\trtx operand%d ATTRIBUTE_UNUSED", i);
+  else
+   fprintf (file, "rtx operand%d ATTRIBUTE_UNUSED", i);
+  else
+fprintf (file, "void");
+  fprintf (file, ")\n");
+  fprintf (file, "{\n");
+}
+
 /* Generate the `gen_...' function for a DEFINE_INSN.  */
 
 static void
 gen_insn (const md_rtx_info &info, FILE *file)
 {
   struct pattern_stats stats;
-  int i;
 
   /* Find out how many operands this function has.  */
   rtx insn = info.def;
@@ -442,17 +461,7 @@ gen_insn (const md_rtx_info &info, FILE *file)
 fatal_at (info.loc, "match_dup operand number has no match_operand");
 
   /* Output the function name and argument declarations.  */
-  fprintf (file, "rtx\ngen_%s (", XSTR (insn, 0));
-  if (stats.num_generator_args)
-for (i = 0; i < stats.num_generator_args; i++)
-  if (i)
-   fprintf (file, ",\n\trtx operand%d ATTRIBUTE_UNUSED", i);
-  else
-   fprintf (file, "rtx operand%d ATTRIBUTE_UNUSED", i);
-  else
-fprintf (file, "void");
-  fprintf (file, ")\n");
-  fprintf (file, "{\n");
+  start_gen_insn (file, XSTR (insn, 0), stats);
 
   /* Output code to construct and return the rtl for the instruction body.  */
 
@@ -499,17 +508,7 @@ gen_expand (const md_rtx_info &info, FILE *file)
  "numbers above all other operands", XSTR (expand, 0));
 
   /* Output the function name and argument declarations.  */
-  fprintf (file, "rtx\ngen_%s (", XSTR (expand, 0));
-  if (stats.num_generator_args)
-for (i = 0; i < stats.num_generator_args; i++)
-  if (i)
-   fprintf (file, ",\n\trtx operand%d", i);
-  else
-   fprintf (file, "rtx operand%d", i);
-  else
-fprintf (file, "void");
-  fprintf (file, ")\n");
-  fprintf (file, "{\n");
+  start_gen_insn (file, XSTR (expand, 0), stats);
 
   /* If we don't have any C code to write, only one insn is being written,
  and no MATCH_DUPs are present, we can just return the desired insn
-- 
2.43.0



Re: [PATCH 3/9] genemit: Use references rather than pointers

2025-05-16 Thread Jeff Law




On 5/16/25 11:21 AM, Richard Sandiford wrote:

This patch makes genemit.cc pass the md_rtx_info around by constant
reference rather than pointer.  It's somewhat of a cosmetic change
on its own, but it makes later changes less noisy.

gcc/
* genemit.cc (gen_exp): Make the info argument a constant reference.
(gen_emit_seq, gen_insn, gen_expand, gen_split): Likewise.
(output_add_clobbers): Likewise.
(main): Update calls accordingly.

OK
jeff



Re: [PATCH 4/9] genemit: Add an internal queue

2025-05-16 Thread Jeff Law




On 5/16/25 11:21 AM, Richard Sandiford wrote:

An earlier version of this series wanted to collect information
about all the gen_* functions that are going to be generated.
The current version no longer does that, but the queue seemed
worth keeping anyway, since it gives a more consistent structure.

gcc/
* genemit.cc (queue): New static variable.
(maybe_queue_insn): New function, split out from...
(gen_insn): ...here.
(queue_expand): New function, split out from...
(gen_expand): ...here.
(queue_split): New function, split out from...
(gen_split): ...here.
(main): Queue definitions for later processing rather than
emitting them on the fly.

OK.
jeff



Re: [PATCH 5/9] genemit: Factor out code common to insns and expands

2025-05-16 Thread Jeff Law




On 5/16/25 11:21 AM, Richard Sandiford wrote:

Mostly to reduce cut-&-paste.

gcc/
* genemit.cc (start_gen_insn): New function, split out from...
(gen_insn, gen_expand): ...here.

OK
jeff



New Spanish PO file for 'cpplib' (version 15.1-b20250316)

2025-05-16 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Spanish team of translators.  The file is available at:

https://translationproject.org/latest/cpplib/es.po

(This file, 'cpplib-15.1-b20250316.es.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH v3] Extend vect_recog_cond_expr_convert_pattern to handle REAL_CST

2025-05-16 Thread Richard Biener
On Wed, May 14, 2025 at 7:28 AM liuhongt  wrote:
>
> So it won't do the unsafe truncation for double(1.001) to 
> float(1.0)
> since there's precision loss.
> It's guarded by testcase pr103771-6.c
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> REAL_CST is handled if it can be represented in different floating
> point types without loss of precision or under fast math.
>
> gcc/ChangeLog:
>
> PR tree-optimization/103771
> * match.pd (cond_expr_convert_p): Extend the match to handle
> REAL_CST.
> * tree-vect-patterns.cc
> (vect_recog_cond_expr_convert_pattern): Handle REAL_CST.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr103771-5.c: New test.
> * gcc.target/i386/pr103771-6.c: New test.
> ---
>  gcc/match.pd   | 33 +
>  gcc/testsuite/gcc.target/i386/pr103771-5.c | 54 ++
>  gcc/testsuite/gcc.target/i386/pr103771-6.c | 16 +++
>  gcc/tree-vect-patterns.cc  | 31 +
>  4 files changed, 126 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103771-5.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103771-6.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 789e3d33326..0c966675a3f 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -11346,6 +11346,39 @@ and,
> && single_use (@4)
> && single_use (@5
>
> +/* Floating point or integer comparison and floating point conversion
> +   with REAL_CST.  */
> +(match (cond_expr_convert_p @0 @2 @3 @6)
> + (cond (simple_comparison@6 @0 @1) (REAL_CST@2) (convert@5 @3))
> +  (if (!flag_trapping_math
> +   && SCALAR_FLOAT_TYPE_P (type)
> +   && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@3))
> +   && !operand_equal_p (TYPE_SIZE (type),
> +   TYPE_SIZE (TREE_TYPE (@0)))
> +   && operand_equal_p (TYPE_SIZE (TREE_TYPE (@0)),
> +  TYPE_SIZE (TREE_TYPE (@3)))
> +   && single_use (@5)
> +   && (flag_unsafe_math_optimizations
> +  || exact_real_truncate (TYPE_MODE (TREE_TYPE (@3)),
> +  &TREE_REAL_CST (@2)))

So this looks good now.  I don't like the const_unop check, can you
instead fail the pattern in the consumer when the const_unop you
repeat there returns NULL?

OK with that change.
Richard.
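The representability condition the pattern guards with exact_real_truncate can be illustrated with a plain round-trip check (a standalone sketch, not GCC's internal REAL_VALUE machinery; the helper name is hypothetical):

```cpp
#include <cassert>

// A double constant can be narrowed to float without precision loss
// exactly when the round trip double -> float -> double is the identity.
// This mirrors the intent of the exact_real_truncate guard above.
bool exact_float_truncate (double d)
{
  return static_cast<double> (static_cast<float> (d)) == d;
}
```

With this check, 1.0 and 0.5 survive the truncation, while a double like 1.0000000001 rounds to 1.0f and therefore fails the round trip.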

> +   && const_unop (CONVERT_EXPR, TREE_TYPE (@3), @2
> +
> +/* Floating point or integer comparison and floating point conversion
> +   with REAL_CST.  */
> +(match (cond_expr_convert_p @0 @2 @3 @6)
> + (cond (simple_comparison@6 @0 @1) (convert@4 @2) (REAL_CST@3))
> +  (if (!flag_trapping_math
> +   && SCALAR_FLOAT_TYPE_P (type)
> +   && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@2))
> +   && !operand_equal_p (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (@0)))
> +   && operand_equal_p (TYPE_SIZE (TREE_TYPE (@0)),
> +  TYPE_SIZE (TREE_TYPE (@2)))
> +   && single_use (@4)
> +   && (flag_unsafe_math_optimizations
> +  || exact_real_truncate (TYPE_MODE (TREE_TYPE (@2)),
> +  &TREE_REAL_CST (@3)))
> +   && const_unop (CONVERT_EXPR, TREE_TYPE (@2), @3
> +
>  (for bit_op (bit_and bit_ior bit_xor)
>   (match (bitwise_induction_p @0 @2 @3)
>(bit_op:c
> diff --git a/gcc/testsuite/gcc.target/i386/pr103771-5.c 
> b/gcc/testsuite/gcc.target/i386/pr103771-5.c
> new file mode 100644
> index 000..bf94f53b88c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103771-5.c
> @@ -0,0 +1,54 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=x86-64-v4 -O3 -fno-trapping-math 
> -fdump-tree-vect-details" } */
> +/* { dg-final { scan-assembler-not "kshift" { target { ! ia32 } } } } */
> +/* { dg-final { scan-tree-dump-times "loop vectorized using 64 byte vectors" 
> 4 "vect" { target { ! ia32 } } } } */
> +
> +void
> +foo (float* a, float* b, float* c, float* d, double* __restrict e, int n)
> +{
> +  for (int i = 0 ; i != n; i++)
> +{
> +  float tmp = c[i] + d[i];
> +  if (a[i] < b[i])
> +   tmp = 0.0;
> +  e[i] = tmp;
> +}
> +}
> +
> +void
> +foo1 (int* a, int* b, float* c, float* d, double* __restrict e, int n)
> +{
> +  for (int i = 0 ; i != n; i++)
> +{
> +  float tmp = c[i] + d[i];
> +  if (a[i] < b[i])
> +   tmp = 0.0;
> +  e[i] = tmp;
> +}
> +}
> +
> +
> +void
> +foo2 (double* a, double* b, double* c, double* d, float* __restrict e, int n)
> +{
> +  for (int i = 0 ; i != n; i++)
> +{
> +  float tmp = c[i] + d[i];
> +  if (a[i] < b[i])
> +   tmp = 0.0;
> +  e[i] = tmp;
> +}
> +}
> +
> +void
> +foo3 (long long* a, long long* b, double* c, double* d, float* __restrict e, 
> int n)
> +{
> +  for (int i = 0 ; i != n; i++)
> +{
> +  float tmp = c[i] + d[i];
> +  if (a[i] < b[i])
> +   tmp = 0.0;
> +  e[i] = tmp;
> +}
> +}
> +
> diff --git a/gcc/testsu

Re: [PATCH][RFC] Allow the target to request a masked vector epilogue

2025-05-16 Thread Richard Sandiford
Richard Biener  writes:
> Targets recently got the ability to request the vector mode to be
> used for a vector epilogue (or the epilogue of a vector epilogue).  The
> following adds the ability for it to indicate the epilogue should use
> loop masking, irrespective of the --param vect-partial-vector-usage
> setting.

Is overriding the --param a good idea?  I can imagine we'd eventually
want a --param to override the override of the --param. :)

> The simple prototype below uses a separate flag from the epilogue
> mode, but I wonder how we want to more generally want to handle
> whether to use masking or not when iterating over modes.  Currently
> we mostly rely on --param vect-partial-vector-usage.  aarch64
> and riscv have both variable-length modes but also fixed-size modes
> where for the latter, like on x86, the target couldn't request
> a mode specifically with or without masking.  It seems both
> aarch64 and riscv fully rely on cost comparison and fully
> exploiting the mode iteration space (but not masked vs. non-masked?!)
> here?
>
> I was thinking of adding a vectorization_mode class that would
> encapsulate the mode and whether to allow masking or alternatively
> to make the vector_modes array (and the m_suggested_epilogue_mode)
> a std::pair of mode and mask flag?

Predicated vs. non-predicated SVE is interesting for the main loop.
The class sounds like it would be useful for that.

I suppose predicated vs. non-predicated SVE is also potentially
interesting for an unrolled epilogue, although there, it would in
theory be better to predicate only the last vector iteration
(i.e. part predicated, part unpredicated).

So I suppose unpredicated SVE epilogue loops might be interesting
until that partial predication is implemented, but I'm not sure how
useful unpredicated SVE epilogue loops would be "once" the partial
predication is supported.

I don't imagine we'll often know a priori for AArch64 which type
of vector epilogue is best.  Since switching between SVE and
Advanced SIMD is assumed to be essentially free, I think we'll
still rely on the current approach of costing both and seeing
which is cheaper.

Thanks,
Richard

>
> For the x86 case going the prototype way would be sufficient, we
> wouldn't want to say use a masked AVX epilogue for a AVX512 loop,
> so any further iteration on epilogue modes if the requested mode
> would fail to vectorize is OK to be unmasked.
>
> Any comments on this?  You are not yet using m_suggested_epilogue_mode
> to get more than one vector epilogue, this might be a way to add
> heuristics when to use a masked epilogue.
>
> Thanks,
> Richard.
>
>   * tree-vectorizer.h (vector_costs::suggested_epilogue_mode):
>   Add masked output parameter and return m_masked_epilogue.
>   (vector_costs::m_masked_epilogue): New tristate flag.
>   (vector_costs::vector_costs): Initialize m_masked_epilogue.
>   * tree-vect-loop.cc (vect_analyze_loop_1): Pass in masked
>   flag to optionally initialize can_use_partial_vectors_p.
>   (vect_analyze_loop): For epilogues also get whether to use
>   a masked epilogue for this loop from the target and use
>   that for the first epilogue mode we try.
> ---
>  gcc/tree-vect-loop.cc | 29 +
>  gcc/tree-vectorizer.h | 12 +---
>  2 files changed, 30 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 2d1a6883e6b..4af510ff20c 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3407,6 +3407,7 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared 
> *shared,
>const vect_loop_form_info *loop_form_info,
>loop_vec_info orig_loop_vinfo,
>const vector_modes &vector_modes, unsigned &mode_i,
> +  int masked_p,
>machine_mode &autodetected_vector_mode,
>bool &fatal)
>  {
> @@ -3415,6 +3416,8 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared 
> *shared,
>  
>machine_mode vector_mode = vector_modes[mode_i];
>loop_vinfo->vector_mode = vector_mode;
> +  if (masked_p != -1)
> +loop_vinfo->can_use_partial_vectors_p = masked_p;
>unsigned int suggested_unroll_factor = 1;
>unsigned slp_done_for_suggested_uf = 0;
>  
> @@ -3580,7 +3583,7 @@ vect_analyze_loop (class loop *loop, gimple 
> *loop_vectorized_call,
>cached_vf_per_mode[last_mode_i] = -1;
>opt_loop_vec_info loop_vinfo
>   = vect_analyze_loop_1 (loop, shared, &loop_form_info,
> -NULL, vector_modes, mode_i,
> +NULL, vector_modes, mode_i, -1,
>  autodetected_vector_mode, fatal);
>if (fatal)
>   break;
> @@ -3665,19 +3668,24 @@ vect_analyze_loop (class loop *loop, gimple 
> *loop_vectorized_call,
>   array may contain length-agnostic and length-specific modes.  Their
>   ordering is not guaranteed, so we could end up picking a mode for the main
>   loop that is after the epilogue's optimal mode.  */

[PATCH][RFC] Allow the target to request a masked vector epilogue

2025-05-16 Thread Richard Biener
Targets recently got the ability to request the vector mode to be
used for a vector epilogue (or the epilogue of a vector epilogue).  The
following adds the ability for it to indicate the epilogue should use
loop masking, irrespective of the --param vect-partial-vector-usage
setting.

The simple prototype below uses a separate flag from the epilogue
mode, but I wonder how we want to more generally want to handle
whether to use masking or not when iterating over modes.  Currently
we mostly rely on --param vect-partial-vector-usage.  aarch64
and riscv have both variable-length modes but also fixed-size modes
where for the latter, like on x86, the target couldn't request
a mode specifically with or without masking.  It seems both
aarch64 and riscv fully rely on cost comparison and fully
exploiting the mode iteration space (but not masked vs. non-masked?!)
here?

I was thinking of adding a vectorization_mode class that would
encapsulate the mode and whether to allow masking or alternatively
to make the vector_modes array (and the m_suggested_epilogue_mode)
a std::pair of mode and mask flag?

For the x86 case going the prototype way would be sufficient, we
wouldn't want to say use a masked AVX epilogue for a AVX512 loop,
so any further iteration on epilogue modes if the requested mode
would fail to vectorize is OK to be unmasked.

Any comments on this?  You are not yet using m_suggested_epilogue_mode
to get more than one vector epilogue, this might be a way to add
heuristics when to use a masked epilogue.

Thanks,
Richard.
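For reference, the tristate convention the prototype threads through as masked_p (-1 = no target preference, 0 = force unmasked, 1 = force masked) can be sketched in isolation; this is a hypothetical helper mirroring the supports_partial_vectors logic in the hunk below, not the patch's actual code:

```cpp
#include <cassert>

// Tristate override: the target's masked_p request wins over the
// --param vect-partial-vector-usage setting when it is not -1.
bool use_partial_vectors (int masked_p, bool param_allows_partial,
                          bool target_supports_partial)
{
  if (!target_supports_partial)
    return false;
  if (masked_p == -1)       // no override: obey the --param setting
    return param_allows_partial;
  return masked_p == 1;     // explicit target request
}
```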

* tree-vectorizer.h (vector_costs::suggested_epilogue_mode):
Add masked output parameter and return m_masked_epilogue.
(vector_costs::m_masked_epilogue): New tristate flag.
(vector_costs::vector_costs): Initialize m_masked_epilogue.
* tree-vect-loop.cc (vect_analyze_loop_1): Pass in masked
flag to optionally initialize can_use_partial_vectors_p.
(vect_analyze_loop): For epilogues also get whether to use
a masked epilogue for this loop from the target and use
that for the first epilogue mode we try.
---
 gcc/tree-vect-loop.cc | 29 +
 gcc/tree-vectorizer.h | 12 +---
 2 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 2d1a6883e6b..4af510ff20c 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3407,6 +3407,7 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared 
*shared,
 const vect_loop_form_info *loop_form_info,
 loop_vec_info orig_loop_vinfo,
 const vector_modes &vector_modes, unsigned &mode_i,
+int masked_p,
 machine_mode &autodetected_vector_mode,
 bool &fatal)
 {
@@ -3415,6 +3416,8 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared 
*shared,
 
   machine_mode vector_mode = vector_modes[mode_i];
   loop_vinfo->vector_mode = vector_mode;
+  if (masked_p != -1)
+loop_vinfo->can_use_partial_vectors_p = masked_p;
   unsigned int suggested_unroll_factor = 1;
   unsigned slp_done_for_suggested_uf = 0;
 
@@ -3580,7 +3583,7 @@ vect_analyze_loop (class loop *loop, gimple 
*loop_vectorized_call,
   cached_vf_per_mode[last_mode_i] = -1;
   opt_loop_vec_info loop_vinfo
= vect_analyze_loop_1 (loop, shared, &loop_form_info,
-  NULL, vector_modes, mode_i,
+  NULL, vector_modes, mode_i, -1,
   autodetected_vector_mode, fatal);
   if (fatal)
break;
@@ -3665,19 +3668,24 @@ vect_analyze_loop (class loop *loop, gimple 
*loop_vectorized_call,
  array may contain length-agnostic and length-specific modes.  Their
  ordering is not guaranteed, so we could end up picking a mode for the main
  loop that is after the epilogue's optimal mode.  */
+  int masked_p = -1;
   if (!unlimited_cost_model (loop)
-  && first_loop_vinfo->vector_costs->suggested_epilogue_mode () != 
VOIDmode)
+  && (first_loop_vinfo->vector_costs->suggested_epilogue_mode (masked_p)
+ != VOIDmode))
 {
   vector_modes[0]
-   = first_loop_vinfo->vector_costs->suggested_epilogue_mode ();
+   = first_loop_vinfo->vector_costs->suggested_epilogue_mode (masked_p);
   cached_vf_per_mode[0] = 0;
 }
   else
 vector_modes[0] = autodetected_vector_mode;
   mode_i = 0;
 
-  bool supports_partial_vectors =
-partial_vectors_supported_p () && param_vect_partial_vector_usage != 0;
+  /* ???  If we'd compute LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P unconditionally
+ for the main loop we could re-use that as short-cut here.  Not if
+ the suggested epilogue mode is different, of course.  */
+  bool supports_partial_vectors = partial_vectors_supported_p () && 
((param_vect_partial_vector_usage != 0 && masked_p != 0)
+  || masked_p == 1);
   poly

Re: [PATCH] [PR120276] regcprop: Replace partial_subreg_p by ordered_p && maybe_lt

2025-05-16 Thread Richard Sandiford
Jennifer Schmitz  writes:
> The ICE in PR120276 resulted from a comparison of VNx4QI and V8QI using
> partial_subreg_p in the function copy_value during the RTL pass
> regcprop, failing the assertion in
>
> inline bool
> partial_subreg_p (machine_mode outermode, machine_mode innermode)
> {
>   /* Modes involved in a subreg must be ordered.  In particular, we must
>  always know at compile time whether the subreg is paradoxical.  */
>   poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
>   poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
>   gcc_checking_assert (ordered_p (outer_prec, inner_prec));
>   return maybe_lt (outer_prec, inner_prec);
> }
>
> Replacing the call to partial_subreg_p by ordered_p && maybe_lt resolves
> the ICE and passes bootstrap and testing without regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz 
>
> gcc/
>   PR middle-end/120276
>   * regcprop.cc (copy_value): Replace call to partial_subreg_p by
>   ordered_p && maybe_lt.
>
> gcc/testsuite/
>   PR middle-end/120276
>   * gcc.target/aarch64/sve/pr120276.c: New test.
> ---
>  gcc/regcprop.cc   |  5 -
>  .../gcc.target/aarch64/sve/pr120276.c | 20 +++
>  2 files changed, 24 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr120276.c
>
> diff --git a/gcc/regcprop.cc b/gcc/regcprop.cc
> index 4fa1305526c..7f4a25a4718 100644
> --- a/gcc/regcprop.cc
> +++ b/gcc/regcprop.cc
> @@ -326,6 +326,9 @@ copy_value (rtx dest, rtx src, struct value_data *vd)
>|| (sr > dr && sr < dr + dn))
>  return;
>  
> +  poly_int64 sr_prec = GET_MODE_PRECISION (vd->e[sr].mode);
> +  poly_int64 src_prec = GET_MODE_PRECISION (GET_MODE (src));
> +
>/* If SRC had no assigned mode (i.e. we didn't know it was live)
>   assign it now and assume the value came from an input argument
>   or somesuch.  */
> @@ -371,7 +374,7 @@ copy_value (rtx dest, rtx src, struct value_data *vd)
>   (set (reg:DI dx) (reg:DI bp))
>   The last set is not redundant, while the low 8 bits of dx are already
>   equal to low 8 bits of bp, the other bits are undefined.  */
> -  else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src)))
> +  else if (ordered_p (sr_prec, src_prec) && maybe_lt (sr_prec, src_prec))
>  {
>if (!REG_CAN_CHANGE_MODE_P (sr, GET_MODE (src), vd->e[sr].mode)
> || !REG_CAN_CHANGE_MODE_P (dr, vd->e[sr].mode, GET_MODE (dest)))

I think we should instead add:

   else if (!ordered_p (GET_MODE_PRECISION (vd->e[sr].mode),
GET_MODE_PRECISION (GET_MODE (src
 return;

after:

  if (vd->e[sr].mode == VOIDmode)
set_value_regno (sr, vd->e[dr].mode, vd);

The ICE in paradoxical_subreg is just the canary.  The register
replacement isn't valid if we don't know at compile time how the bytes
are distributed between the two modes.

Thanks,
Richard

> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr120276.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pr120276.c
> new file mode 100644
> index 000..1b29c90f69b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr120276.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +int a;
> +char b[1];
> +int c[18];
> +void d(char *);
> +void e() {
> +  int f;
> +  char *g;
> +  a = 0;
> +  for (; a < 18; a++) {
> +int h = f = 0;
> +for (; f < 4; f++) {
> +  g[a * 4 + f] = c[a] >> h;
> +  h += 8;
> +}
> +  }
> +  d(b);
> +}
> \ No newline at end of file


Re: [PATCH] [PR120276] regcprop: Replace partial_subreg_p by ordered_p && maybe_lt

2025-05-16 Thread Kyrylo Tkachov


> On 16 May 2025, at 12:35, Richard Sandiford  wrote:
> 
> Jennifer Schmitz  writes:
>> The ICE in PR120276 resulted from a comparison of VNx4QI and V8QI using
>> partial_subreg_p in the function copy_value during the RTL pass
>> regcprop, failing the assertion in
>> 
>> inline bool
>> partial_subreg_p (machine_mode outermode, machine_mode innermode)
>> {
>>  /* Modes involved in a subreg must be ordered.  In particular, we must
>> always know at compile time whether the subreg is paradoxical.  */
>>  poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
>>  poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
>>  gcc_checking_assert (ordered_p (outer_prec, inner_prec));
>>  return maybe_lt (outer_prec, inner_prec);
>> }
>> 
>> Replacing the call to partial_subreg_p by ordered_p && maybe_lt resolves
>> the ICE and passes bootstrap and testing without regression.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz 
>> 
>> gcc/
>> PR middle-end/120276
>> * regcprop.cc (copy_value): Replace call to partial_subreg_p by
>> ordered_p && maybe_lt.
>> 
>> gcc/testsuite/
>> PR middle-end/120276
>> * gcc.target/aarch64/sve/pr120276.c: New test.
>> ---
>> gcc/regcprop.cc   |  5 -
>> .../gcc.target/aarch64/sve/pr120276.c | 20 +++
>> 2 files changed, 24 insertions(+), 1 deletion(-)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr120276.c
>> 
>> diff --git a/gcc/regcprop.cc b/gcc/regcprop.cc
>> index 4fa1305526c..7f4a25a4718 100644
>> --- a/gcc/regcprop.cc
>> +++ b/gcc/regcprop.cc
>> @@ -326,6 +326,9 @@ copy_value (rtx dest, rtx src, struct value_data *vd)
>>   || (sr > dr && sr < dr + dn))
>> return;
>> 
>> +  poly_int64 sr_prec = GET_MODE_PRECISION (vd->e[sr].mode);
>> +  poly_int64 src_prec = GET_MODE_PRECISION (GET_MODE (src));
>> +
>>   /* If SRC had no assigned mode (i.e. we didn't know it was live)
>>  assign it now and assume the value came from an input argument
>>  or somesuch.  */
>> @@ -371,7 +374,7 @@ copy_value (rtx dest, rtx src, struct value_data *vd)
>>  (set (reg:DI dx) (reg:DI bp))
>>  The last set is not redundant, while the low 8 bits of dx are already
>>  equal to low 8 bits of bp, the other bits are undefined.  */
>> -  else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src)))
>> +  else if (ordered_p (sr_prec, src_prec) && maybe_lt (sr_prec, src_prec))
>> {
>>   if (!REG_CAN_CHANGE_MODE_P (sr, GET_MODE (src), vd->e[sr].mode)
>>  || !REG_CAN_CHANGE_MODE_P (dr, vd->e[sr].mode, GET_MODE (dest)))
> 
> I think we should instead add:
> 
>   else if (!ordered_p (GET_MODE_PRECISION (vd->e[sr].mode),
> GET_MODE_PRECISION (GET_MODE (src
> return;
> 
> after:
> 
>  if (vd->e[sr].mode == VOIDmode)
>set_value_regno (sr, vd->e[dr].mode, vd);
> 
> The ICE in paradoxical_subreg is just the canary.  The register
> replacement isn't valid if we don't know at compile time how the bytes
> are distributed between the two modes.
> 
> Thanks,
> Richard
> 
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr120276.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve/pr120276.c
>> new file mode 100644
>> index 000..1b29c90f69b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr120276.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */

The test case is better-placed in a generic directory like gcc.dg/torture with 
a directive like:
/* { dg-additional-options "-march=armv8.2-a+sve" { target aarch64*-*-* } } */
to force the SVE testing.
Thanks,
Kyrill

>> +
>> +int a;
>> +char b[1];
>> +int c[18];
>> +void d(char *);
>> +void e() {
>> +  int f;
>> +  char *g;
>> +  a = 0;
>> +  for (; a < 18; a++) {
>> +int h = f = 0;
>> +for (; f < 4; f++) {
>> +  g[a * 4 + f] = c[a] >> h;
>> +  h += 8;
>> +}
>> +  }
>> +  d(b);
>> +}
>> \ No newline at end of file



Re: [PATCH] [PR120276] regcprop: Replace partial_subreg_p by ordered_p && maybe_lt

2025-05-16 Thread Jennifer Schmitz


> On 16 May 2025, at 13:11, Kyrylo Tkachov  wrote:
> 
> 
> 
>> On 16 May 2025, at 12:35, Richard Sandiford  
>> wrote:
>> 
>> Jennifer Schmitz  writes:
>>> The ICE in PR120276 resulted from a comparison of VNx4QI and V8QI using
>>> partial_subreg_p in the function copy_value during the RTL pass
>>> regcprop, failing the assertion in
>>> 
>>> inline bool
>>> partial_subreg_p (machine_mode outermode, machine_mode innermode)
>>> {
>>> /* Modes involved in a subreg must be ordered.  In particular, we must
>>>always know at compile time whether the subreg is paradoxical.  */
>>> poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
>>> poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
>>> gcc_checking_assert (ordered_p (outer_prec, inner_prec));
>>> return maybe_lt (outer_prec, inner_prec);
>>> }
>>> 
>>> Replacing the call to partial_subreg_p by ordered_p && maybe_lt resolves
>>> the ICE and passes bootstrap and testing without regression.
>>> OK for mainline?
>>> 
>>> Signed-off-by: Jennifer Schmitz 
>>> 
>>> gcc/
>>> PR middle-end/120276
>>> * regcprop.cc (copy_value): Replace call to partial_subreg_p by
>>> ordered_p && maybe_lt.
>>> 
>>> gcc/testsuite/
>>> PR middle-end/120276
>>> * gcc.target/aarch64/sve/pr120276.c: New test.
>>> ---
>>> gcc/regcprop.cc   |  5 -
>>> .../gcc.target/aarch64/sve/pr120276.c | 20 +++
>>> 2 files changed, 24 insertions(+), 1 deletion(-)
>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr120276.c
>>> 
>>> diff --git a/gcc/regcprop.cc b/gcc/regcprop.cc
>>> index 4fa1305526c..7f4a25a4718 100644
>>> --- a/gcc/regcprop.cc
>>> +++ b/gcc/regcprop.cc
>>> @@ -326,6 +326,9 @@ copy_value (rtx dest, rtx src, struct value_data *vd)
>>>  || (sr > dr && sr < dr + dn))
>>>return;
>>> 
>>> +  poly_int64 sr_prec = GET_MODE_PRECISION (vd->e[sr].mode);
>>> +  poly_int64 src_prec = GET_MODE_PRECISION (GET_MODE (src));
>>> +
>>>  /* If SRC had no assigned mode (i.e. we didn't know it was live)
>>> assign it now and assume the value came from an input argument
>>> or somesuch.  */
>>> @@ -371,7 +374,7 @@ copy_value (rtx dest, rtx src, struct value_data *vd)
>>> (set (reg:DI dx) (reg:DI bp))
>>> The last set is not redundant, while the low 8 bits of dx are already
>>> equal to low 8 bits of bp, the other bits are undefined.  */
>>> -  else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src)))
>>> +  else if (ordered_p (sr_prec, src_prec) && maybe_lt (sr_prec, src_prec))
>>>{
>>>  if (!REG_CAN_CHANGE_MODE_P (sr, GET_MODE (src), vd->e[sr].mode)
>>> || !REG_CAN_CHANGE_MODE_P (dr, vd->e[sr].mode, GET_MODE (dest)))
>> 
>> I think we should instead add:
>> 
>>  else if (!ordered_p (GET_MODE_PRECISION (vd->e[sr].mode),
>> GET_MODE_PRECISION (GET_MODE (src
>>return;
>> 
>> after:
>> 
>> if (vd->e[sr].mode == VOIDmode)
>>   set_value_regno (sr, vd->e[dr].mode, vd);
Thanks, I made the change.
>> 
>> The ICE in paradoxical_subreg is just the canary.  The register
>> replacement isn't valid if we don't know at compile time how the bytes
>> are distributed between the two modes.
>> 
>> Thanks,
>> Richard
>> 
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr120276.c 
>>> b/gcc/testsuite/gcc.target/aarch64/sve/pr120276.c
>>> new file mode 100644
>>> index 000..1b29c90f69b
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr120276.c
>>> @@ -0,0 +1,20 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O3" } */
> 
> The test case is better-placed in a generic directory like gcc.dg/torture 
> with a directive like:
> /* { dg-additional-options "-march=armv8.2-a+sve" { target aarch64*-*-* } } */
> to force the SVE testing.
I moved the test and removed /* { dg-options "-O3" } */, because that is added 
as one of the optimization levels in gcc.torture, correct?
Thanks,
Jennifer

[PATCH] [PR120276] regcprop: Return from copy_value for unordered modes

The ICE in PR120276 resulted from a comparison of VNx4QI and V8QI using
partial_subreg_p in the function copy_value during the RTL pass
regcprop, failing the assertion in

inline bool
partial_subreg_p (machine_mode outermode, machine_mode innermode)
{
  /* Modes involved in a subreg must be ordered.  In particular, we must
 always know at compile time whether the subreg is paradoxical.  */
  poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
  poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
  gcc_checking_assert (ordered_p (outer_prec, inner_prec));
  return maybe_lt (outer_prec, inner_prec);
}

Returning from the function if the modes are not ordered before reaching
the call to partial_subreg_p resolves the ICE and passes bootstrap and
testing without regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz 

gcc/
PR middle-end/120276
* regcprop.cc (copy_value): Return in case of unordered modes.

gcc/testsuite/
PR middle-end/120276
* gcc.dg/torture/pr120276.c: New test.

[PATCH v4] libstdc++: Implement C++26 function_ref [PR119126]

2025-05-16 Thread Tomasz Kamiński
This patch implements C++26 function_ref as specified in P0792R14,
with a correction to the constraints of the constructor accepting a
nontype_t parameter, from LWG 4256.

As function_ref may store a pointer to a const object, __Ptrs::_M_obj is
changed to const void*, so again we do not cast away const from const
objects. To help with the necessary casts, a __polyfunc::__cast_to helper is
added that accepts a reference to the target type directly.

The _Invoker now defines additional call methods used by function_ref:
_S_ptrs() for invoking a target passed by reference, and _S_nttp, _S_bind_ptr,
_S_bind_ref for handling the constructors accepting nontype_t. The existing
_S_call_storage is changed to a thin wrapper that initializes _Ptrs and
forwards to _S_call_ptrs.

This removed most uses of _Storage::_M_ptr and _Storage::_M_ref,
so these functions were removed, and _Manager uses were adjusted.

Finally, we make function_ref available in freestanding mode. As
move_only_function and copyable_function are currently only available in
hosted mode, we define _Manager and _Mo_base only if either
__glibcxx_move_only_function or __glibcxx_copyable_function is defined.
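The const void* object pointer plus trampoline layout discussed above is the standard type-erasure technique behind function_ref; a minimal standalone sketch (not the libstdc++ implementation, member names merely echo its style) looks like:

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Minimal non-owning callable reference: an erased const void* to the
// target plus a trampoline that restores the static type on call.
template<typename Signature> class function_ref;

template<typename R, typename... Args>
class function_ref<R(Args...)>
{
  const void* _M_obj;                      // erased pointer to the target
  R (*_M_invoke) (const void*, Args...);   // type-restoring trampoline

public:
  template<typename F,
	   typename = std::enable_if_t<
	     !std::is_same_v<std::decay_t<F>, function_ref>>>
  function_ref (F&& f) noexcept
  : _M_obj (std::addressof (f)),
    _M_invoke ([] (const void* obj, Args... args) -> R
	       {
		 using T = std::remove_reference_t<F>;
		 // Cast back through const void* without losing constness
		 // of a genuinely const target.
		 auto* p = const_cast<T*> (static_cast<const T*> (obj));
		 return (*p) (std::forward<Args> (args)...);
	       })
  {}

  R operator() (Args... args) const
  { return _M_invoke (_M_obj, std::forward<Args> (args)...); }
};
```

The real implementation adds the nontype_t constructors, deduction guides, and the invoker plumbing described above, but the two-pointer core is the same.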

PR libstdc++/119126

libstdc++-v3/ChangeLog:

* doc/doxygen/stdheader.cc: Added funcref_impl.h file.
* include/Makefile.am: Added funcref_impl.h file.
* include/Makefile.in: Added funcref_impl.h file.
* include/bits/funcref_impl.h: New file.
* include/bits/funcwrap.h: (_Ptrs::_M_obj): Const-qualify.
(_Storage::_M_ptr, _Storage::_M_ref): Remove.
(__polyfunc::__cast_to) Define.
(_Base_invoker::_S_ptrs, _Base_invoker::_S_nttp)
(_Base_invoker::_S_bind_ptrs, _Base_invoker::_S_bind_ref)
(_Base_invoker::_S_call_ptrs): Define.
(_Base_invoker::_S_call_storage): Forward to _S_call_ptrs.
(_Manager::_S_local, _Manager::_S_ptr): Adjust for _M_obj being
const qualified.
(__polyfunc::_Manager, __polyfunc::_Mo_base): Guard with
__glibcxx_move_only_function || __glibcxx_copyable_function.
(__polyfunc::__skip_first_arg, __polyfunc::__deduce_funcref)
(std::function_ref) [__glibcxx_function_ref]: Define.
* include/bits/utility.h (std::nontype_t, std::nontype)
(__is_nontype_v) [__glibcxx_function_ref]: Define.
* include/bits/version.def: Define function_ref.
* include/bits/version.h: Regenerate.
* include/std/functional: Define __cpp_lib_function_ref.
* src/c++23/std.cc.in (std::nontype_t, std::nontype)
(std::function_ref) [__cpp_lib_function_ref]: Export.
* testsuite/20_util/function_ref/assign.cc: New test.
* testsuite/20_util/function_ref/call.cc: New test.
* testsuite/20_util/function_ref/cons.cc: New test.
* testsuite/20_util/function_ref/cons_neg.cc: New test.
* testsuite/20_util/function_ref/conv.cc: New test.
* testsuite/20_util/function_ref/deduction.cc: New test.
---
Removed unnecessary this-> qualification in function_ref.
Use auto* to store result of __polyfunc::__cast_to.
Added comment for about additional partial specializations for
__skip_first_arg.

 libstdc++-v3/doc/doxygen/stdheader.cc |   1 +
 libstdc++-v3/include/Makefile.am  |   1 +
 libstdc++-v3/include/Makefile.in  |   1 +
 libstdc++-v3/include/bits/funcref_impl.h  | 198 
 libstdc++-v3/include/bits/funcwrap.h  | 188 +++
 libstdc++-v3/include/bits/utility.h   |  17 ++
 libstdc++-v3/include/bits/version.def |   8 +
 libstdc++-v3/include/bits/version.h   |  10 +
 libstdc++-v3/include/std/functional   |   3 +-
 libstdc++-v3/src/c++23/std.cc.in  |   7 +
 .../testsuite/20_util/function_ref/assign.cc  | 108 +
 .../testsuite/20_util/function_ref/call.cc| 186 +++
 .../testsuite/20_util/function_ref/cons.cc| 218 ++
 .../20_util/function_ref/cons_neg.cc  |  30 +++
 .../testsuite/20_util/function_ref/conv.cc| 152 
 .../20_util/function_ref/deduction.cc | 103 +
 16 files changed, 1184 insertions(+), 47 deletions(-)
 create mode 100644 libstdc++-v3/include/bits/funcref_impl.h
 create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/assign.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/call.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/cons.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/cons_neg.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/conv.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/deduction.cc

diff --git a/libstdc++-v3/doc/doxygen/stdheader.cc 
b/libstdc++-v3/doc/doxygen/stdheader.cc
index 839bfc81bc0..938b2b04a26 100644
--- a/libstdc++-v3/doc/doxygen/stdheader.cc
+++ b/libstdc++-v3/doc/doxygen/stdheader.cc
@@ -55,6 +55,7 @@ void init_map()
 headers["functional_hash.h"]  

[PATCH 10/10] Use HS/LO instead of CS/CC

2025-05-16 Thread Karl Meakin
The CB family of instructions does not support using the CS or CC
condition codes; instead the synonyms HS and LO must be used. GCC has
traditionally used the CS and CC names. To work around this while
avoiding test churn, add new `j` and `J` format specifiers and use them
when generating CB instructions.

Also reformat the definition of the `aarch64_cond_code` enum while we're
in the same neighbourhood, to make the relationship between each code
and its inverse more obvious.
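The flip-the-LSB relationship between a condition code and its inverse can be demonstrated with a small standalone sketch (the enum values mirror those in the patch; the invert helper is illustrative, not GCC code):

```cpp
#include <cassert>

// Condition codes laid out in opposing pairs, as in the patch: the
// inverse of any code is obtained by flipping the least significant bit.
enum aarch64_cond_code {
  AARCH64_EQ = 0x00, AARCH64_NE = 0x01,
  AARCH64_CS = 0x02, AARCH64_CC = 0x03,
  AARCH64_HI = 0x08, AARCH64_LS = 0x09,
  AARCH64_GE = 0x0A, AARCH64_LT = 0x0B,
};

constexpr aarch64_cond_code
invert (aarch64_cond_code code)
{
  return static_cast<aarch64_cond_code> (code ^ 1); // flip the LSB
}
```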

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_cond_code): Reformat.
(aarch64_print_operand): Add new 'j' and 'J' format specifiers.
* config/aarch64/aarch64.md: Use new specifiers.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cmpbr.c: Update tests.
---
 gcc/config/aarch64/aarch64.cc| 35 ++--
 gcc/config/aarch64/aarch64.md|  4 +--
 gcc/testsuite/gcc.target/aarch64/cmpbr.c | 16 +--
 3 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 5000672e4a2..9ab324cb2ec 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -913,11 +913,17 @@ static const scoped_attribute_specs *const aarch64_attribute_table[] =
   &aarch64_arm_attribute_table
 };
 
-typedef enum aarch64_cond_code
-{
-  AARCH64_EQ = 0, AARCH64_NE, AARCH64_CS, AARCH64_CC, AARCH64_MI, AARCH64_PL,
-  AARCH64_VS, AARCH64_VC, AARCH64_HI, AARCH64_LS, AARCH64_GE, AARCH64_LT,
-  AARCH64_GT, AARCH64_LE, AARCH64_AL, AARCH64_NV
+/* The condition codes come in opposing pairs.  To get the inverse of a given
+   condition, simply flip the LSB.  */
+typedef enum aarch64_cond_code {
+  AARCH64_EQ = 0x00, AARCH64_NE = 0x01,
+  AARCH64_CS = 0x02, AARCH64_CC = 0x03,
+  AARCH64_MI = 0x04, AARCH64_PL = 0x05,
+  AARCH64_VS = 0x06, AARCH64_VC = 0x07,
+  AARCH64_HI = 0x08, AARCH64_LS = 0x09,
+  AARCH64_GE = 0x0A, AARCH64_LT = 0x0B,
+  AARCH64_GT = 0x0C, AARCH64_LE = 0x0D,
+  AARCH64_AL = 0x0E, AARCH64_NV = 0x0F,
 }
 aarch64_cc;
 
@@ -12289,64 +12295,67 @@ static char
 sizetochar (int size)
 {
   switch (size)
 {
 case 64: return 'd';
 case 32: return 's';
 case 16: return 'h';
 case 8:  return 'b';
 default: gcc_unreachable ();
 }
 }
 
 /* Print operand X to file F in a target specific manner according to CODE.
The acceptable formatting commands given by CODE are:
  'c':  An integer or symbol address without a preceding #
sign.
  'C':  Take the duplicated element in a vector constant
and print it in hex.
  'D':  Take the duplicated element in a vector constant
and print it as an unsigned integer, in decimal.
  'e':  Print the sign/zero-extend size as a character 8->b,
16->h, 32->w.  Can also be used for masks:
0xff->b, 0x->h, 0x->w.
  'I':  If the operand is a duplicated vector constant,
replace it with the duplicated scalar.  If the
operand is then a floating-point constant, replace
it with the integer bit representation.  Print the
transformed constant as a signed decimal number.
  'p':  Prints N such that 2^N == X (X must be power of 2 and
const int).
  'P':  Print the number of non-zero bits in X (a const_int).
  'H':  Print the higher numbered register of a pair (TImode)
of regs.
  'm':  Print a condition (eq, ne, etc).
  'M':  Same as 'm', but invert condition.
+ 'j':  Same as 'm', but use `hs` and `lo`
+   instead of `cs` and `cc`.
+ 'J':  Same as 'j', but invert condition.
  'N':  Take the duplicated element in a vector constant
and print the negative of it in decimal.
  'b/h/s/d/q':  Print a scalar FP/SIMD register name.
  'Z':  Same for SVE registers.  ('z' was already taken.)
Note that it is not necessary to use %Z for operands
that have SVE modes.  The convention is to use %Z
only for non-SVE (or potentially non-SVE) modes.
  'S/T/U/V':Print a FP/SIMD register name for a register 
list.
The register printed is the FP/SIMD register name
of X + 0/1/2/3 for S/T/U/V.
  'R':  Print a scalar Integer/FP/SIMD register name + 1.
  'X':  Print bottom 16 bits of integer constant in hex.
  'w/x':Print a general register name or the zero register
(32-bit or 64-bit).
  '0':  Print a normal operand, if it's 

[PATCH 03/10] AArch64: rename branch instruction rules

2025-05-16 Thread Karl Meakin
Give the `define_insn` rules used in lowering `cbranch4` to RTL
more descriptive and consistent names: from now on, each rule is named
after the AArch64 instruction that it generates. Also add comments to
document each rule.

gcc/ChangeLog:

* config/aarch64/aarch64.md (condjump): Rename to ...
(aarch64_bcond): ...here.
(*compare_condjump): Rename to ...
(*aarch64_bcond_wide_imm): ...here.
(aarch64_cb): Rename to ...
(aarch64_cbz1): ...here.
(*cb1): Rename to ...
(*aarch64_tbz1): ...here.
(@aarch64_tb): Rename to ...
(@aarch64_tbz): ...here.
(restore_stack_nonlocal): Handle rename.
(stack_protect_combined_test): Likewise.
* config/aarch64/aarch64-simd.md (cbranch4): Likewise.
* config/aarch64/aarch64-sme.md (aarch64_restore_za): Likewise.
* config/aarch64/aarch64.cc (aarch64_gen_test_and_branch): Likewise.
---
 gcc/config/aarch64/aarch64-simd.md |  2 +-
 gcc/config/aarch64/aarch64-sme.md  |  2 +-
 gcc/config/aarch64/aarch64.cc  |  4 ++--
 gcc/config/aarch64/aarch64.md  | 21 -
 4 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 1099e742cbf..a816f6a3cea 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3915,41 +3915,41 @@ (define_expand "vcond_mask_"
 (define_expand "cbranch4"
   [(set (pc)
 (if_then_else
   (match_operator 0 "aarch64_equality_operator"
 [(match_operand:VDQ_I 1 "register_operand")
  (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
   (label_ref (match_operand 3 ""))
   (pc)))]
   "TARGET_SIMD"
 {
   auto code = GET_CODE (operands[0]);
   rtx tmp = operands[1];
 
   /* If comparing against a non-zero vector we have to do a comparison first
  so we can have a != 0 comparison with the result.  */
   if (operands[2] != CONST0_RTX (mode))
 {
   tmp = gen_reg_rtx (mode);
   emit_insn (gen_xor3 (tmp, operands[1], operands[2]));
 }
 
   /* For 64-bit vectors we need no reductions.  */
   if (known_eq (128, GET_MODE_BITSIZE (mode)))
 {
   /* Always reduce using a V4SI.  */
   rtx reduc = gen_lowpart (V4SImode, tmp);
   rtx res = gen_reg_rtx (V4SImode);
   emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
   emit_move_insn (tmp, gen_lowpart (mode, res));
 }
 
   rtx val = gen_reg_rtx (DImode);
   emit_move_insn (val, gen_lowpart (DImode, tmp));
 
   rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
   rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
-  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
+  emit_jump_insn (gen_aarch64_bcond (cmp_rtx, cc_reg, operands[3]));
   DONE;
 })
 
 ;; Patterns comparing two vectors to produce a mask.
diff --git a/gcc/config/aarch64/aarch64-sme.md b/gcc/config/aarch64/aarch64-sme.md
index c49affd0dd3..4e4ac71c5a3 100644
--- a/gcc/config/aarch64/aarch64-sme.md
+++ b/gcc/config/aarch64/aarch64-sme.md
@@ -366,42 +366,42 @@ (define_insn "aarch64_tpidr2_restore"
 ;; Check whether a lazy save set up by aarch64_save_za was committed
 ;; and restore the saved contents if so.
 ;;
 ;; Operand 0 is the address of the current function's TPIDR2 block.
 (define_insn_and_split "aarch64_restore_za"
   [(set (reg:DI ZA_SAVED_REGNUM)
(unspec:DI [(match_operand 0 "pmode_register_operand" "r")
(reg:DI SME_STATE_REGNUM)
(reg:DI TPIDR2_SETUP_REGNUM)
(reg:DI ZA_SAVED_REGNUM)] UNSPEC_RESTORE_ZA))
(clobber (reg:DI R0_REGNUM))
(clobber (reg:DI R14_REGNUM))
(clobber (reg:DI R15_REGNUM))
(clobber (reg:DI R16_REGNUM))
(clobber (reg:DI R17_REGNUM))
(clobber (reg:DI R18_REGNUM))
(clobber (reg:DI R30_REGNUM))
(clobber (reg:CC CC_REGNUM))]
   ""
   "#"
   "&& epilogue_completed"
   [(const_int 0)]
   {
 auto label = gen_label_rtx ();
 auto tpidr2 = gen_rtx_REG (DImode, R16_REGNUM);
 emit_insn (gen_aarch64_read_tpidr2 (tpidr2));
-auto jump = emit_likely_jump_insn (gen_aarch64_cbnedi1 (tpidr2, label));
+auto jump = emit_likely_jump_insn (gen_aarch64_cbznedi1 (tpidr2, label));
 JUMP_LABEL (jump) = label;
 
 aarch64_restore_za (operands[0]);
 emit_label (label);
 DONE;
   }
 )
 
 ;; This instruction is emitted after asms that alter ZA, in order to model
 ;; the effect on dataflow.  The asm itself can't have ZA as an input or
 ;; an output, since there is no associated data type.  Instead it retains
 ;; the original "za" clobber, which on its own would indicate that ZA
 ;; is dead.
 ;;
 ;; The operand is a unique identifier.
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 9e3f2885bcc..5000672e4a2 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2872,44 +2872,44 @@ static rt

[pushed] c++: one more coro test tweak

2025-05-16 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

After my r16-670, running the testsuite with explicit --stds didn't run this
one in C++17 mode, but the default did.  Let's remove the { target c++17 }
so it doesn't by default, either.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr94760-mismatched-traits-and-promise-prev.C:
Remove { target c++17 }.
---
 .../coroutines/pr94760-mismatched-traits-and-promise-prev.C | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/coroutines/pr94760-mismatched-traits-and-promise-prev.C b/gcc/testsuite/g++.dg/coroutines/pr94760-mismatched-traits-and-promise-prev.C
index 90a558d0fe2..5e3608b109b 100644
--- a/gcc/testsuite/g++.dg/coroutines/pr94760-mismatched-traits-and-promise-prev.C
+++ b/gcc/testsuite/g++.dg/coroutines/pr94760-mismatched-traits-and-promise-prev.C
@@ -1,5 +1,3 @@
-// { dg-do compile  { target c++17 } }
-
 #include "coro.h"
 
 // Test that we get matching types to traits and promise param

base-commit: de3cbcf9730b60db76c31c5b628f4bf2ebd6b284
-- 
2.49.0



[PATCH 09/10] AArch64: make rules for CBZ/TBZ higher priority

2025-05-16 Thread Karl Meakin
Move the rules for CBZ/TBZ to be above the rules for
CBB/CBH/CB. We want them to have higher priority
because they can express larger displacements.

gcc/ChangeLog:

* config/aarch64/aarch64.md (aarch64_cbz1): Move
above rules for CBB/CBH/CB.
(*aarch64_tbz1): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cmpbr.c: Update tests.
---
 gcc/config/aarch64/aarch64.md| 162 ---
 gcc/testsuite/gcc.target/aarch64/cmpbr.c |  32 ++---
 2 files changed, 104 insertions(+), 90 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 0b708f8b2f6..d3514ff1ef9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -697,27 +697,38 @@ (define_insn "jump"
 ;; Maximum PC-relative positive/negative displacements for various branching
 ;; instructions.
 (define_constants
   [
 ;; +/- 128MiB.  Used by B, BL.
 (BRANCH_LEN_P_128MiB  134217724)
 (BRANCH_LEN_N_128MiB -134217728)
 
 ;; +/- 1MiB.  Used by B., CBZ, CBNZ.
 (BRANCH_LEN_P_1MiB  1048572)
 (BRANCH_LEN_N_1MiB -1048576)
 
 ;; +/- 32KiB.  Used by TBZ, TBNZ.
 (BRANCH_LEN_P_32KiB  32764)
 (BRANCH_LEN_N_32KiB -32768)
 
 ;; +/- 1KiB.  Used by CBB, CBH, CB.
 (BRANCH_LEN_P_1Kib  1020)
 (BRANCH_LEN_N_1Kib -1024)
   ]
 )
 
 ;; ---
 ;; Conditional jumps
+;; The order of the rules below is important.
+;; Higher priority rules are preferred because they can express larger
+;; displacements.
+;; 1) EQ/NE comparisons against zero are handled by CBZ/CBNZ.
+;; 2) LT/GE comparisons against zero are handled by TBZ/TBNZ.
+;; 3) When the CMPBR extension is enabled:
+;;   a) Comparisons between two registers are handled by
+;;  CBB/CBH/CB.
+;;   b) Comparisons between a GP register and an immediate in the range 0-63 are
+;;  handled by CB (immediate).
+;; 4) Otherwise, emit a CMP+B sequence.
 ;; ---
 
 (define_expand "cbranch4"
@@ -770,14 +781,91 @@ (define_expand "cbranch4"
 (define_expand "cbranchcc4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand 1 "cc_register")
 (match_operand 2 "const0_operand")])
   (label_ref (match_operand 3))
   (pc)))]
   ""
   ""
 )
 
+;; For an EQ/NE comparison against zero, emit `CBZ`/`CBNZ`
+(define_insn "aarch64_cbz1"
+  [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
+   (const_int 0))
+  (label_ref (match_operand 1))
+  (pc)))]
+  "!aarch64_track_speculation"
+  {
+if (get_attr_length (insn) == 8)
+  return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, ");
+else
+  return "\\t%0, %l1";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
+ (const_string "no")
+ (const_string "yes")))]
+)
+
+;; For an LT/GE comparison against zero, emit `TBZ`/`TBNZ`
+(define_insn "*aarch64_tbz1"
+  [(set (pc) (if_then_else (LTGE (match_operand:ALLI 0 "register_operand" "r")
+(const_int 0))
+  (label_ref (match_operand 1))
+  (pc)))
+   (clobber (reg:CC CC_REGNUM))]
+  "!aarch64_track_speculation"
+  {
+if (get_attr_length (insn) == 8)
+  {
+   if (get_attr_far_branch (insn) == FAR_BRANCH_YES)
+ return aarch64_gen_far_branch (operands, 1, "Ltb",
+"\\t%0, , ");
+   else
+ {
+   char buf[64];
+   uint64_t val = ((uint64_t) 1)
+   << (GET_MODE_SIZE (mode) * BITS_PER_UNIT - 1);
+   sprintf (buf, "tst\t%%0, %" PRId64, val);
+   output_asm_insn (buf, operands);
+   return "\t%l1";
+ }
+  }
+else
+  return "\t%0, , %l1";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_32KiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_32KiB)))
+ (const_int 4)
+ 

[PATCH 08/10] AArch64: rules for CMPBR instructions

2025-05-16 Thread Karl Meakin
Add rules for lowering `cbranch4` to CBB/CBH/CB when the
CMPBR extension is enabled.

gcc/ChangeLog:

* config/aarch64/aarch64.md (BRANCH_LEN_P_1Kib): New constant.
(BRANCH_LEN_N_1Kib): Likewise.
(cbranch4): Emit CMPBR instructions if possible.
(cbranch4): New expand rule.
(*aarch64_cb): Likewise.
(*aarch64_cb): Likewise.
* config/aarch64/iterators.md (cmpbr_suffix): New mode attr.
* config/aarch64/predicates.md (const_0_to_63_operand): New
predicate.
(aarch64_cb_immediate): Likewise.
(aarch64_cb_operand): Likewise.
(aarch64_cb_short_operand): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cmpbr.c: Update tests.
---
 gcc/config/aarch64/aarch64.md|  87 +++-
 gcc/config/aarch64/iterators.md  |   5 +
 gcc/config/aarch64/predicates.md |  15 +
 gcc/testsuite/gcc.target/aarch64/cmpbr.c | 598 ---
 4 files changed, 311 insertions(+), 394 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b61e3e5a72f..0b708f8b2f6 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -697,37 +697,60 @@ (define_insn "jump"
 ;; Maximum PC-relative positive/negative displacements for various branching
 ;; instructions.
 (define_constants
   [
 ;; +/- 128MiB.  Used by B, BL.
 (BRANCH_LEN_P_128MiB  134217724)
 (BRANCH_LEN_N_128MiB -134217728)
 
 ;; +/- 1MiB.  Used by B., CBZ, CBNZ.
 (BRANCH_LEN_P_1MiB  1048572)
 (BRANCH_LEN_N_1MiB -1048576)
 
 ;; +/- 32KiB.  Used by TBZ, TBNZ.
 (BRANCH_LEN_P_32KiB  32764)
 (BRANCH_LEN_N_32KiB -32768)
+
+;; +/- 1KiB.  Used by CBB, CBH, CB.
+(BRANCH_LEN_P_1Kib  1020)
+(BRANCH_LEN_N_1Kib -1024)
   ]
 )
 
 ;; ---
 ;; Conditional jumps
 ;; ---
 
-(define_expand "cbranch4"
+(define_expand "cbranch4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
 (match_operand:GPI 2 "aarch64_plus_operand")])
   (label_ref (match_operand 3))
   (pc)))]
   ""
-  "
-  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
-operands[2]);
-  operands[2] = const0_rtx;
-  "
+  {
+  if (TARGET_CMPBR && aarch64_cb_operand (operands[2], mode))
+{
+  emit_jump_insn (gen_aarch64_cb (operands[0], operands[1],
+   operands[2], operands[3]));
+  DONE;
+}
+  else
+{
+  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]),
+operands[1], operands[2]);
+  operands[2] = const0_rtx;
+}
+  }
+)
+
+(define_expand "cbranch4"
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand:SHORT 1 "register_operand")
+(match_operand:SHORT 2 "aarch64_cb_short_operand")])
+  (label_ref (match_operand 3))
+  (pc)))]
+  "TARGET_CMPBR"
+  ""
 )
 
 (define_expand "cbranch4"
@@ -747,13 +770,65 @@ (define_expand "cbranch4"
 (define_expand "cbranchcc4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand 1 "cc_register")
 (match_operand 2 "const0_operand")])
   (label_ref (match_operand 3))
   (pc)))]
   ""
   ""
 )
 
+;; Emit a `CB (register)` or `CB (immediate)` instruction.
+(define_insn "aarch64_cb"
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand:GPI 1 "register_operand" "r")
+(match_operand:GPI 2 "aarch64_cb_operand" "ri")])
+  (label_ref (match_operand 3))
+  (pc)))]
+  "TARGET_CMPBR"
+  "cb%m0\\t%1, %2, %l3";
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 3) (pc))
+  (const_int BRANCH_LEN_N_1Kib))
+  (lt (minus (match_dup 3) (pc))
+  (const_int BRANCH_LEN_P_1Kib)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minus (match_dup 3) (pc))
+  (const_int BRANCH_LEN_N_1Kib))
+  (lt (minus (match_dup 3) (pc))
+  (const_int BRANCH_LEN_P_1Kib)))
+ (const_string "no")
+ (const_string "yes")))]
+)
+
+;; Emit a `CBB (register)` or `CBH (register)` instruct

Re: [PATCH v22 0/3] c: Add _Countof

2025-05-16 Thread Joseph Myers
On Fri, 16 May 2025, Alejandro Colomar wrote:

> -  Add  (and NDEBUG) to some test files that were missing it,
>and also the forward declaration of strcmp(3).

Depending on libc headers like this in tests is discouraged.  The usual 
idiom is to use abort () on failure of a runtime check (rather than 
assert) and to declare abort in the test (or use __builtin_abort to avoid 
needing the declaration).

-- 
Joseph S. Myers
josmy...@redhat.com



[PATCH 02/10] AArch64: reformat branch instruction rules

2025-05-16 Thread Karl Meakin
Make the formatting of the RTL templates in the rules for branch
instructions more consistent with each other.

gcc/ChangeLog:

* config/aarch64/aarch64.md (cbranch4): Reformat.
(cbranchcc4): Likewise.
(condjump): Likewise.
(*compare_condjump): Likewise.
(aarch64_cb1): Likewise.
(*cb1): Likewise.
(tbranch_3): Likewise.
(@aarch64_tb): Likewise.
---
 gcc/config/aarch64/aarch64.md | 77 +--
 1 file changed, 38 insertions(+), 39 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 874df262781..05d86595bb1 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -705,229 +705,228 @@ (define_insn "jump"
 (define_expand "cbranch4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
 (match_operand:GPI 2 "aarch64_plus_operand")])
-  (label_ref (match_operand 3 "" ""))
+  (label_ref (match_operand 3))
   (pc)))]
   ""
   "
   operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
 operands[2]);
   operands[2] = const0_rtx;
   "
 )
 
 (define_expand "cbranch4"
-  [(set (pc) (if_then_else
-   (match_operator 0 "aarch64_comparison_operator"
-[(match_operand:GPF_F16 1 "register_operand")
- (match_operand:GPF_F16 2 "aarch64_fp_compare_operand")])
-   (label_ref (match_operand 3 "" ""))
-   (pc)))]
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand:GPF_F16 1 "register_operand")
+(match_operand:GPF_F16 2 "aarch64_fp_compare_operand")])
+  (label_ref (match_operand 3))
+  (pc)))]
   ""
-  "
+  {
   operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
 operands[2]);
   operands[2] = const0_rtx;
-  "
+  }
 )
 
 (define_expand "cbranchcc4"
-  [(set (pc) (if_then_else
- (match_operator 0 "aarch64_comparison_operator"
-  [(match_operand 1 "cc_register")
-   (match_operand 2 "const0_operand")])
- (label_ref (match_operand 3 "" ""))
- (pc)))]
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand 1 "cc_register")
+(match_operand 2 "const0_operand")])
+  (label_ref (match_operand 3))
+  (pc)))]
   ""
-  "")
+  ""
+)
 
 (define_insn "condjump"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
-   [(match_operand 1 "cc_register" "") (const_int 0)])
-  (label_ref (match_operand 2 "" ""))
+   [(match_operand 1 "cc_register")
+(const_int 0)])
+  (label_ref (match_operand 2))
   (pc)))]
   ""
   {
 /* GCC's traditional style has been to use "beq" instead of "b.eq", etc.,
but the "." is required for SVE conditions.  */
 bool use_dot_p = GET_MODE (operands[1]) == CC_NZCmode;
 if (get_attr_length (insn) == 8)
   return aarch64_gen_far_branch (operands, 2, "Lbcond",
 use_dot_p ? "b.%M0\\t" : "b%M0\\t");
 else
   return use_dot_p ? "b.%m0\\t%l2" : "b%m0\\t%l2";
   }
   [(set_attr "type" "branch")
(set (attr "length")
(if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
   (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
  (const_int 4)
  (const_int 8)))
(set (attr "far_branch")
(if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
   (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
  (const_int 0)
  (const_int 1)))]
 )
 
 ;; For a 24-bit immediate CST we can optimize the compare for equality
 ;; and branch sequence from:
 ;; mov x0, #imm1
 ;; movkx0, #imm2, lsl 16 /* x0 contains CST.  */
 ;; cmp x1, x0
 ;; b .Label
 ;; into the shorter:
 ;; sub x0, x1, #(CST & 0xfff000)
 ;; subsx0, x0, #(CST & 0x000fff)
 ;; b .Label
 (define_insn_and_split "*compare_condjump"
-  [(set (pc) (if_then_else (EQL
- (match_operand:GPI 0 "register_operand" "r")
- (match_operand:GPI 1 "aarch64_imm24" "n"))
-  (label_ref:P (match_operand 2 "" ""))
+  [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
+

[PATCH] c++/modules: Clean up importer_interface

2025-05-16 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

This patch removes some no longer needed special casing in linkage
determination, and makes the distinction between "always_emit" and
"internal" for better future-proofing.

gcc/cp/ChangeLog:

* module.cc (importer_interface): Adjust flags.
(get_importer_interface): Rename flags.
(trees_out::core_bools): Clean up special casing.
(trees_out::write_function_def): Rename flag.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc | 50 +---
 1 file changed, 18 insertions(+), 32 deletions(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 4f9c3788380..200e1c2deb3 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -5546,8 +5546,10 @@ trees_in::start (unsigned code)
 
 enum class importer_interface {
   unknown,   /* The definition may or may not need to be emitted.  */
-  always_import,  /* The definition can always be found in another TU.  */
-  always_emit,   /* The definition must be emitted in the importer's TU. */
+  external,  /* The definition can always be found in another TU.  */
+  internal,  /* The definition should be emitted in the importer's TU.  */
+  always_emit,   /* The definition must be emitted in the importer's TU,
+regardless of if it's used or not. */
 };
 
 /* Returns what kind of interface an importer will have of DECL.  */
@@ -5558,13 +5560,13 @@ get_importer_interface (tree decl)
   /* Internal linkage entities must be emitted in each importer if
  there is a definition available.  */
   if (!TREE_PUBLIC (decl))
-return importer_interface::always_emit;
+return importer_interface::internal;
 
-  /* Entities that aren't vague linkage are either not definitions or
- will be emitted in this TU, so importers can just refer to an
- external definition.  */
+  /* Other entities that aren't vague linkage are either not definitions
+ or will be publicly emitted in this TU, so importers can just refer
+ to an external definition.  */
   if (!vague_linkage_p (decl))
-return importer_interface::always_import;
+return importer_interface::external;
 
   /* For explicit instantiations, importers can always rely on there
  being a definition in another TU, unless this is a definition
@@ -5574,13 +5576,13 @@ get_importer_interface (tree decl)
   && DECL_EXPLICIT_INSTANTIATION (decl))
 return (header_module_p () && !DECL_EXTERNAL (decl)
? importer_interface::always_emit
-   : importer_interface::always_import);
+   : importer_interface::external);
 
   /* A gnu_inline function is never emitted in any TU.  */
   if (TREE_CODE (decl) == FUNCTION_DECL
   && DECL_DECLARED_INLINE_P (decl)
   && lookup_attribute ("gnu_inline", DECL_ATTRIBUTES (decl)))
-return importer_interface::always_import;
+return importer_interface::external;
 
   /* Everything else has vague linkage.  */
   return importer_interface::unknown;
@@ -5722,29 +5724,13 @@ trees_out::core_bools (tree t, bits_out& bits)
   DECL_NOT_REALLY_EXTERN -> base.not_really_extern
 == that was a lie, it is here  */
 
+   /* decl_flag_1 is DECL_EXTERNAL. Things we emit here, might
+  well be external from the POV of an importer.  */
bool is_external = t->decl_common.decl_flag_1;
-   if (!is_external)
- /* decl_flag_1 is DECL_EXTERNAL. Things we emit here, might
-well be external from the POV of an importer.  */
- // FIXME: Do we need to know if this is a TEMPLATE_RESULT --
- // a flag from the caller?
- switch (code)
-   {
-   default:
- break;
-
-   case VAR_DECL:
- if (TREE_PUBLIC (t)
- && DECL_VTABLE_OR_VTT_P (t))
-   /* We handle vtable linkage specially.  */
-   is_external = true;
- gcc_fallthrough ();
-   case FUNCTION_DECL:
- if (get_importer_interface (t)
- == importer_interface::always_import)
-   is_external = true;
- break;
-   }
+   if (!is_external
+   && VAR_OR_FUNCTION_DECL_P (t)
+   && get_importer_interface (t) == importer_interface::external)
+ is_external = true;
WB (is_external);
   }
 
@@ -12651,7 +12637,7 @@ trees_out::write_function_def (tree decl)
   /* Whether the importer should emit this definition, if used.  */
   flags |= 1 * (DECL_NOT_REALLY_EXTERN (decl)
&& (get_importer_interface (decl)
-   != importer_interface::always_import));
+   != importer_interface::external));
 
   if (f)
{
-- 
2.47.0



Re: [PATCH] RISC-V: Since the loop increment i++ is unreachable, the loop body will never execute more than once

2025-05-16 Thread Jeff Law




On 5/16/25 1:32 AM, Jin Ma wrote:

Reported-by: huangcunjian 

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_gpr_save_operation_p): Remove
break and fix bug for elt index.
Ideally we'd have a testcase for whatever issue motivated this change, 
but it's pretty clear that the element index and failure to loop were 
just plain wrong code.  But OK even without a testcase.


jeff



Re: [PATCH] match: Don't allow following statements that can throw internally [PR119903]

2025-05-16 Thread Andrew Pinski
On Fri, May 16, 2025 at 3:13 AM Richard Biener
 wrote:
>
> On Fri, May 9, 2025 at 5:00 AM Andrew Pinski  wrote:
> >
> > This removes the ability to follow statements that can throw internally.
> > This was suggested in bug report as a way to solve the issue here.
> > The overhead is not that high since without non-call exceptions turned
> > on, there is an early exit for non-calls.
>
> So - the testcase doesn't ICE for me any longer.
There was no ICE in the testcase; the landing pad for the
exception was removed, so we ended up with wrong code.

>  I know I suggested this
> as a possible solution but I don't like it very much - one reason is because
> stmt_can_throw_internal asserts on cfun != nullptr and conservative handling
> would mean we can't do any get_def () from IPA w/o switching it a function.
>
> The other reason is that we do get_def() a lot (yeah, it's very much quite
> unoptimized in genmatch generated code).  So there's work to do there
> (we also often valueize twice).
>
> It seems we can defer this now?  The bug was closed at least.

Yes we can defer this now because r16-600-geaee2df409ae40 was able to
fix the testcase too.

Thanks,
Andrew


>
> Richard.
>
> > PR tree-optimization/119903
> >
> > gcc/ChangeLog:
> >
> > * gimple-match-head.cc (get_def): Reject statements that can throw
> > internally.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/tree-ssa/pr119903-1.C: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/gimple-match-head.cc   |  5 -
> >  gcc/testsuite/g++.dg/tree-ssa/pr119903-1.C | 24 ++
> >  2 files changed, 28 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr119903-1.C
> >
> > diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> > index 6b3c5febbea..62ff8e57fbb 100644
> > --- a/gcc/gimple-match-head.cc
> > +++ b/gcc/gimple-match-head.cc
> > @@ -63,7 +63,10 @@ get_def (tree (*valueize)(tree), tree name)
> >  {
> >if (valueize && ! valueize (name))
> >  return NULL;
> > -  return SSA_NAME_DEF_STMT (name);
> > +  gimple *t = SSA_NAME_DEF_STMT (name);
> > +  if (stmt_can_throw_internal (cfun, t))
> > +return nullptr;
> > +  return t;
> >  }
> >
> >  /* Routine to determine if the types T1 and T2 are effectively
> > diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr119903-1.C b/gcc/testsuite/g++.dg/tree-ssa/pr119903-1.C
> > new file mode 100644
> > index 000..605f989a2eb
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/tree-ssa/pr119903-1.C
> > @@ -0,0 +1,24 @@
> > +// { dg-do compile { target c++11 } }
> > +// { dg-options "-O2 -fnon-call-exceptions -ftrapping-math -fdump-tree-optimized-eh" }
> > +
> > +// PR tree-optimization/119903
> > +// match and simplify would cause the internal throwable fp comparison
> > +// to become only external throwable and lose the landing pad.
> > +
> > +int f() noexcept;
> > +int g() noexcept;
> > +
> > +int m(double a)
> > +{
> > +  try {
> > +if (a < 1.0)
> > +  return f();
> > +return g();
> > +  }catch(...)
> > +  {
> > +return -1;
> > +  }
> > +}
> > +
> > +// Make sure there is a landing pad for the non-call exception from the comparison.
> > +// { dg-final { scan-tree-dump "LP " "optimized" } }
> > --
> > 2.43.0
> >


Re: gcc project build error on Ubuntu 18.04

2025-05-16 Thread Andrew Pinski
On Fri, May 16, 2025 at 4:10 AM Kito Cheng  wrote:
>
> I am surprised that such generic names are defined within the system
> header files. I'm inclined to just rename them to major_version and
> minor_version; could you send a patch for that?

major and minor come from extracting the major/minor parts of the
device ID; they have been around for a long time. Looks like an
include difference exposes them this time around.
https://linux.die.net/man/3/minor

Thanks,
Andrew

>
>
> On Fri, May 16, 2025 at 3:50 PM Songhe Zhu  
> wrote:
> >
> > Hi kito
> > When syncing GCC to the master branch and building it on Ubuntu 18.04, 
> > I encounter the following warnings and errors:
> > 
> > These issues arise because the names major and minor conflict with the 
> > macros major/minor defined in . To avoid compatibility 
> > problems when using a newer version of GCC on an older Ubuntu environment, 
> > we propose three solutions:
> > 1. Undefine the macros temporarily::
> >
> > #pragma push_macro("major")
> > #undef major
> > #pragma push_macro("minor")
> > #undef minor
> > /* ... function code ... */
> > #pragma pop_macro("major")
> > #pragma pop_macro("minor")
> >
> > 2. Rename major/minor to non-conflicting names ?
> >
> > 3. Build in a newer Ubuntu environment (e.g., Ubuntu 22.04) to bypass 
> > legacy macro conflicts.
> > --We will try it.
> >
> > reference patch: RISC-V: Generate extension table in documentation from 
> > riscv-ext.def · gcc-mirror/gcc@124cbbb
> >
> > Thanks a lot~
> >
> > 
> > zhuson...@eswincomputing.com


Re: [PATCH v1] RISC-V: Avoid scalar unsigned SAT_ADD test data duplication

2025-05-16 Thread Jeff Law




On 5/16/25 2:38 AM, pan2...@intel.com wrote:

From: Pan Li 

Some of the previous scalar unsigned SAT_ADD test data are
duplicated in different test files.  This patch would like to
move them into a shared header file, to avoid the test data
duplication.

The below test suites are passed for this patch series.
* The rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat/sat_arith.h: Add more helper macros.
* gcc.target/riscv/sat/sat_arith_data.h: Add the test data
for scalar unsigned SAT_ADD.
* gcc.target/riscv/sat/sat_u_add-run-1-u16.c: Leverage the test
data from the shared header file.
* gcc.target/riscv/sat/sat_u_add-run-1-u32.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-1-u64.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-1-u8.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-2-u16.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-2-u32.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-2-u64.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-2-u8.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-3-u16.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-3-u32.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-3-u64.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-3-u8.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-4-u16.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-4-u32.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-4-u64.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-4-u8.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-5-u16.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-5-u32.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-5-u64.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-5-u8.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-6-u16.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-6-u32.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-6-u64.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-6-u8.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-7-u16-from-u32.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-7-u16-from-u64.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-7-u32-from-u64.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-7-u8-from-u16.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-7-u8-from-u32.c: Ditto
* gcc.target/riscv/sat/sat_u_add-run-7-u8-from-u64.c: Ditto

OK
jeff



Re: [PATCH][RFC] Allow the target to request a masked vector epilogue

2025-05-16 Thread Richard Biener
On Fri, 16 May 2025, Richard Sandiford wrote:

> Richard Biener  writes:
> > Targets recently got the ability to request the vector mode to be
> > used for a vector epilogue (or the epilogue of a vector epilogue).  The
> > following adds the ability for it to indicate the epilogue should use
> > loop masking, irrespective of the --param vect-partial-vector-usage
> > setting.
> 
> Is overriding the --param a good idea?  I can imagine we'd eventually
> want a --param to override the override of the --param. :)

;)  So for x86 my plan is to use options_set to check if the user
explicitly specified the --param and in this case honor it, but
I don't want to have the default of the --param to affect all loops
in the same way.  So the default for x86 would be to let target
heuristics decide.

> > The simple prototype below uses a separate flag from the epilogue
> > mode, but I wonder how we want to more generally want to handle
> > whether to use masking or not when iterating over modes.  Currently
> > we mostly rely on --param vect-partial-vector-usage.  aarch64
> > and riscv have both variable-length modes but also fixed-size modes
> > where for the latter, like on x86, the target couldn't request
> > a mode specifically with or without masking.  It seems both
> > aarch64 and riscv fully rely on cost comparison and fully
> > exploiting the mode iteration space (but not masked vs. non-masked?!)
> > here?
> >
> > I was thinking of adding a vectorization_mode class that would
> > encapsulate the mode and whether to allow masking or alternatively
> > to make the vector_modes array (and the m_suggested_epilogue_mode)
> > a std::pair of mode and mask flag?
> 
> Predicated vs. non-predicated SVE is interesting for the main loop.
> The class sounds like it would be useful for that.
> 
> I suppose predicated vs. non-predicated SVE is also potentially
> interesting for an unrolled epilogue, although there, it would in
> theory be better to predicate only the last vector iteration
> (i.e. part predicated, part unpredicated).

Yes, the latter is what we want for AVX512, keep the main loop
not predicated but have the epilog predicated (using the same VF).

> So I suppose unpredicated SVE epilogue loops might be interesting
> until that partial predication is implemented, but I'm not sure how
> useful unpredicated SVE epilogue loops would be "once" the partial
> predication is supported.
> 
> I don't imagine we'll often know a priori for AArch64 which type
> of vector epilogue is best.  Since switching between SVE and
> Advanced SIMD is assumed to be essentially free, I think we'll
> still rely on the current approach of costing both and seeing
> which is cheaper.

So the other case we might run into on x86 is if you have a
known loop tripcount but fully vectorizing the epilogue is
still not possible because while we have half-SSE, like V8QImode,
we don't have V4QI or V2QI, so even with multiple epilogues
we'd still end up with an iterating scalar epilog.  Those
cases might be good candidates for a predicated epilog as well.
So in the end we'd prefer branchless epilogues.

Predication on x86 is quite a bit more expensive so I don't see
us using a predicated main vector loop anytime soon, and I'd
expect that to be the case for all archs when using a fixed-size
mode?  Is that the case for -msve-vector-bits=X as well?  Is
there an advantage to not using a predicated main vector loop?

Richard.

> Thanks,
> Richard
> 
> >
> > For the x86 case going the prototype way would be sufficient, we
> > wouldn't want to say use a masked AVX epilogue for a AVX512 loop,
> > so any further iteration on epilogue modes if the requested mode
> > would fail to vectorize is OK to be unmasked.
> >
> > Any comments on this?  You are not yet using m_suggested_epilogue_mode
> > to get more than one vector epilogue, this might be a way to add
> > heuristics when to use a masked epilogue.
> >
> > Thanks,
> > Richard.
> >
> > * tree-vectorizer.h (vector_costs::suggested_epilogue_mode):
> > Add masked output parameter and return m_masked_epilogue.
> > (vector_costs::m_masked_epilogue): New tristate flag.
> > (vector_costs::vector_costs): Initialize m_masked_epilogue.
> > * tree-vect-loop.cc (vect_analyze_loop_1): Pass in masked
> > flag to optionally initialize can_use_partial_vectors_p.
> > (vect_analyze_loop): For epilogues also get whether to use
> > a masked epilogue for this loop from the target and use
> > that for the first epilogue mode we try.
> > ---
> >  gcc/tree-vect-loop.cc | 29 +
> >  gcc/tree-vectorizer.h | 12 +---
> >  2 files changed, 30 insertions(+), 11 deletions(-)
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 2d1a6883e6b..4af510ff20c 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -3407,6 +3407,7 @@ vect_analyze_loop_1 (class loop *loop, 
> > vec_info_shared *shared,
> >  const vect_loo

Re: [PATCH 2/2] forwprop: Add alias walk limit to optimize_memcpy_to_memset.

2025-05-16 Thread Richard Biener
On Wed, May 14, 2025 at 5:01 PM Andrew Pinski  wrote:
>
> As suggested in 
> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681507.html,
> this adds the aliasing walk limit.

OK for both.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-ssa-forwprop.cc (optimize_memcpy_to_memset): Add a limit on 
> the alias walk.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/tree-ssa-forwprop.cc | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
> index 71de99f46ff..0f52e8fe6ef 100644
> --- a/gcc/tree-ssa-forwprop.cc
> +++ b/gcc/tree-ssa-forwprop.cc
> @@ -1216,13 +1216,15 @@ optimize_memcpy_to_memset (gimple_stmt_iterator 
> *gsip, tree dest, tree src, tree
>ao_ref_init (&read, src);
>tree vuse = gimple_vuse (stmt);
>gimple *defstmt;
> +  unsigned limit = param_sccvn_max_alias_queries_per_access;
>do {
>  if (vuse == NULL || TREE_CODE (vuse) != SSA_NAME)
>return false;
>  defstmt = SSA_NAME_DEF_STMT (vuse);
>  if (is_a (defstmt))
>return false;
> -
> +if (limit-- == 0)
> +  return false;
> >  /* If the len was null, then we can use TBAA. */
>  if (stmt_may_clobber_ref_p_1 (defstmt, &read,
>   /* tbaa_p = */ len_was_null))
> --
> 2.43.0
>


Re: [PATCH] aarch64: Fix narrowing warning in driver-aarch64.cc [PR118603]

2025-05-16 Thread Kyrylo Tkachov



> On 10 May 2025, at 06:17, Andrew Pinski  wrote:
> 
> Since the AARCH64_CORE defines in aarch64-cores.def all use -1 for
> the variant, it is just easier to add the cast to unsigned in the usage
> in driver-aarch64.cc.
> 
> Build and tested on aarch64-linux-gnu.

Ok.
Thanks,
Kyrill

> 
> gcc/ChangeLog:
> 
> PR target/118603
> * config/aarch64/driver-aarch64.cc (aarch64_cpu_data): Add cast to unsigned
> to VARIANT of the define AARCH64_CORE.
> 
> Signed-off-by: Andrew Pinski 
> ---
> gcc/config/aarch64/driver-aarch64.cc | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/aarch64/driver-aarch64.cc 
> b/gcc/config/aarch64/driver-aarch64.cc
> index 9d99554dbc2..0333746ee00 100644
> --- a/gcc/config/aarch64/driver-aarch64.cc
> +++ b/gcc/config/aarch64/driver-aarch64.cc
> @@ -63,7 +63,7 @@ struct aarch64_core_data
> #define DEFAULT_CPU "generic-armv8-a"
> 
> #define AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, 
> PART, VARIANT) \
> -  { CORE_NAME, #ARCH, IMP, PART, VARIANT, feature_deps::cpu_##CORE_IDENT },
> +  { CORE_NAME, #ARCH, IMP, PART, unsigned(VARIANT), 
> feature_deps::cpu_##CORE_IDENT },
> 
> static CONSTEXPR const aarch64_core_data aarch64_cpu_data[] =
> {
> -- 
> 2.43.0
> 



Re: [PATCH] match: Allow some optional casts for boolean comparisons

2025-05-16 Thread Richard Biener
On Thu, May 15, 2025 at 5:55 AM Andrew Pinski  wrote:
>
> On Wed, May 14, 2025 at 7:39 PM Andrew Pinski  
> wrote:
> >
> > This is the next step in removing forward_propagate_into_comparison
> > and forward_propagate_into_gimple_cond; in the case of `((int)(a cmp b)) != 
> > 0`
> > we want to do the transformation to `a cmp b` even if the cast is used 
> > twice.
> > This is exactly what 
> > forward_propagate_into_comparison/forward_propagate_into_gimple_cond
> > do, copying the comparison over in that case.
>
> Actually I am thinking we should change:
> this set of patterns:
> ```
> (for cmp (simple_comparison)
>  (simplify
>   (cmp (convert@0 @00) (convert?@1 @10))
>   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
>/* Disable this optimization if we're casting a function pointer
>   type on targets that require function pointer canonicalization.  */
>&& !(targetm.have_canonicalize_funcptr_for_compare ()
> && ((POINTER_TYPE_P (TREE_TYPE (@00))
> && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (TREE_TYPE (@00
> || (POINTER_TYPE_P (TREE_TYPE (@10))
> && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (TREE_TYPE (@10))
>&& single_use (@0))
> ```
> In the case of:
> (if (TREE_CODE (@1) == INTEGER_CST)
>  (cmp @00 ...)
> We don't need to care if @0 is single_use or not as we remove one cast.
> Plus all of the cases where we produce constants don't care about
> single_use either.
>
> So let's ignore this patch for now. I will get back to it tomorrow.

I have a recollection that I changed exactly one similar place to defer
the single_use () checks to the result expressions that are not "simple" ...

IIRC I had changed :S flag handling for this - enforce single-use
when the result isn't "simple" (much like how we have ! now),
but IIRC we concluded we don't need this.

PR118483 was the PR where this mattered, the match.pd change I
had for this was the following:

diff --git a/gcc/match.pd b/gcc/match.pd
index f4050687647..8ad574e7404 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7562,7 +7562,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
FIXME: the lack of symmetry is disturbing.  */
 (for cmp (simple_comparison)
  (simplify
-  (cmp (convert@0 @00) (convert?@1 @10))
+  (cmp (convert:S@0 @00) (convert?@1 @10))
   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
/* Disable this optimization if we're casting a function pointer
  type on targets that require function pointer canonicalization.  */
@@ -7570,8 +7570,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& ((POINTER_TYPE_P (TREE_TYPE (@00))
 && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (TREE_TYPE (@00
|| (POINTER_TYPE_P (TREE_TYPE (@10))
-   && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (TREE_TYPE (@10))
-   && single_use (@0))
+   && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (TREE_TYPE (@10)))
(if (TYPE_PRECISION (TREE_TYPE (@00)) == TYPE_PRECISION (TREE_TYPE (@0))
&& (TREE_CODE (@10) == INTEGER_CST
|| @1 != @10)

because the

(if (above || below)
 (if (cmp == EQ_EXPR || cmp == NE_EXPR)
  { constant_boolean_node (cmp == EQ_EXPR ? false : true, type); }
  (if (cmp == LT_EXPR || cmp == LE_EXPR)
   { constant_boolean_node (above ? true : false, type); }
   (if (cmp == GT_EXPR || cmp == GE_EXPR)
{ constant_boolean_node (above ? false : true, type); })

cases at least would be unproblematic and OK when !single_use.  That's exactly
what you noticed.  So moving that check down to where it matters works as well
and is pre-approved.

Richard.

>
> Thanks,
> Andrew
>
> >
> > Bootstrapped and tested on x86_64-linux-gnu.
> >
> > gcc/ChangeLog:
> >
> > * match.pd (`(a cmp b) != false`, `(a cmp b) == true`,
> > `(a cmp b) != true`, `(a cmp b) == false`): Allow an
> > optional cast between the comparison and the eq/ne.
> > (`bool_val != false`, `bool_val == true`): Allow an optional
> > cast between the bool_val and the ne/eq.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/match.pd | 12 ++--
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 79485f9678a..ffb1695e6e6 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -6913,15 +6913,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   (ncmp @0 @1)
> >   /* The following bits are handled by fold_binary_op_with_conditional_arg. 
> >  */
> >   (simplify
> > -  (ne (cmp@2 @0 @1) integer_zerop)
> > +  (ne (convert? (cmp@2 @0 @1)) integer_zerop)
> >(if (types_match (type, TREE_TYPE (@2)))
> > (cmp @0 @1)))
> >   (simplify
> > -  (eq (cmp@2 @0 @1) integer_truep)
> > +  (eq (convert? (cmp@2 @0 @1)) integer_truep)
> >(if (types_match (type, TREE_TYPE (@2)))
> > (cmp @0 @1)))
> >   (simplify
> > -  (ne (cmp@2 @0 @1) integer_truep)
> > +  (ne (convert? (cmp@2 @0 @1)) integer_truep)
> >(if (types_match (type, TREE_TYPE (@2)))
> > (with { en

[PATCH 01/10] AArch64: place branch instruction rules together

2025-05-16 Thread Karl Meakin
The rules for conditional branches were spread throughout `aarch64.md`.
Group them together so it is easier to understand how `cbranch4`
is lowered to RTL.

gcc/ChangeLog:

* config/aarch64/aarch64.md (condjump): Move.
(*compare_condjump): Likewise.
(aarch64_cb1): Likewise.
(*cb1): Likewise.
(tbranch_3): Likewise.
(@aarch64_tb): Likewise.
---
 gcc/config/aarch64/aarch64.md | 387 ++
 1 file changed, 201 insertions(+), 186 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 6dbc9faf713..874df262781 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -674,6 +674,10 @@ (define_insn "aarch64_write_sysregti"
  "msrr\t%x0, %x1, %H1"
 )
 
+;; ---
+;; Unconditional jumps
+;; ---
+
 (define_insn "indirect_jump"
   [(set (pc) (match_operand:DI 0 "register_operand" "r"))]
   ""
@@ -692,6 +696,12 @@ (define_insn "jump"
   [(set_attr "type" "branch")]
 )
 
+
+
+;; ---
+;; Conditional jumps
+;; ---
+
 (define_expand "cbranch4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
@@ -731,6 +741,197 @@ (define_expand "cbranchcc4"
   ""
   "")
 
+(define_insn "condjump"
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand 1 "cc_register" "") (const_int 0)])
+  (label_ref (match_operand 2 "" ""))
+  (pc)))]
+  ""
+  {
+/* GCC's traditional style has been to use "beq" instead of "b.eq", etc.,
+   but the "." is required for SVE conditions.  */
+bool use_dot_p = GET_MODE (operands[1]) == CC_NZCmode;
+if (get_attr_length (insn) == 8)
+  return aarch64_gen_far_branch (operands, 2, "Lbcond",
+use_dot_p ? "b.%M0\\t" : "b%M0\\t");
+else
+  return use_dot_p ? "b.%m0\\t%l2" : "b%m0\\t%l2";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
+  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
+  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+ (const_int 0)
+ (const_int 1)))]
+)
+
+;; For a 24-bit immediate CST we can optimize the compare for equality
+;; and branch sequence from:
+;; mov x0, #imm1
+;; movkx0, #imm2, lsl 16 /* x0 contains CST.  */
+;; cmp x1, x0
+;; b .Label
+;; into the shorter:
+;; sub x0, x1, #(CST & 0xfff000)
+;; subsx0, x0, #(CST & 0x000fff)
+;; b .Label
+(define_insn_and_split "*compare_condjump"
+  [(set (pc) (if_then_else (EQL
+ (match_operand:GPI 0 "register_operand" "r")
+ (match_operand:GPI 1 "aarch64_imm24" "n"))
+  (label_ref:P (match_operand 2 "" ""))
+  (pc)))]
+  "!aarch64_move_imm (INTVAL (operands[1]), mode)
+   && !aarch64_plus_operand (operands[1], mode)
+   && !reload_completed"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff;
+HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000;
+rtx tmp = gen_reg_rtx (mode);
+emit_insn (gen_add3 (tmp, operands[0], GEN_INT (-hi_imm)));
+emit_insn (gen_add3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
+rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+rtx cmp_rtx = gen_rtx_fmt_ee (, mode,
+ cc_reg, const0_rtx);
+emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2]));
+DONE;
+  }
+)
+
+(define_insn "aarch64_cb1"
+  [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
+   (const_int 0))
+  (label_ref (match_operand 1 "" ""))
+  (pc)))]
+  "!aarch64_track_speculation"
+  {
+if (get_attr_length (insn) == 8)
+  return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, ");
+else
+  return "\\t%0, %l1";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576))
+  (lt (minus (match_dup 1) (pc)) (const_int 1048572)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minu

Re: [PATCH] aarch64: Fix narrowing warning in aarch64_detect_vector_stmt_subtype

2025-05-16 Thread Kyrylo Tkachov



> On 10 May 2025, at 05:59, Andrew Pinski  wrote:
> 
> There is a narrowing warning in aarch64_detect_vector_stmt_subtype
> about gather_load_x32_cost and gather_load_x64_cost converting from int to 
> unsigned.
> These fields are always unsigned and even the constructor for sve_vec_cost 
> takes
> an unsigned. So let's just move the fields over to unsigned.
> 
> Build and tested for aarch64-linux-gnu.

Ok, I had noticed that annoying warning too.
Thanks,
Kyrill

> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-protos.h (struct sve_vec_cost): Change 
> gather_load_x32_cost
> and gather_load_x64_cost fields to unsigned.
> 
> Signed-off-by: Andrew Pinski 
> ---
> gcc/config/aarch64/aarch64-protos.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index c935e7bcf33..b59eecf5bdf 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -290,8 +290,8 @@ struct sve_vec_cost : simd_vec_cost
> 
>   /* The cost of a gather load instruction.  The x32 value is for loads
>  of 32-bit elements and the x64 value is for loads of 64-bit elements.  */
> -  const int gather_load_x32_cost;
> -  const int gather_load_x64_cost;
> +  const unsigned int gather_load_x32_cost;
> +  const unsigned int gather_load_x64_cost;
> 
>   /* Additional loop initialization cost of using a gather load instruction.  
> The x32
>  value is for loads of 32-bit elements and the x64 value is for loads of
> -- 
> 2.43.0
> 



Re: [PATCH][RFC] Allow the target to request a masked vector epilogue

2025-05-16 Thread Robin Dapp

I was thinking of adding a vectorization_mode class that would
encapsulate the mode and whether to allow masking or alternatively
to make the vector_modes array (and the m_suggested_epilogue_mode)
a std::pair of mode and mask flag?


Without having a very strong opinion (or the full background) on this, the pair 
approach seems useful to me.  With the x86 and aarch64 ports being more mature, 
experimentation might be less important there, but for riscv we definitely still 
explore the full "solution space" of all possible modes for the main loop as well 
as the epilogue.


As mode selection is already under target control I recently (locally, not 
upstream) added a riscv-specific --param=autovec-mode for this exact purpose so 
we can choose a single vector mode and improve it in isolation.


Adding the additional flexibility of switching partial/length vectors per mode 
on and off would IMHO help in that regard.  And, while many of the available 
modes might indeed be "disposable" right away I find the option of trying out 
everything attractive as it also allows to learn about the behavior of code 
between loop and epilog (like extracts).
IIRC LLVM also has the option to freely choose loop and epilog strategy, not 
sure about the granularity, though.  We have this as well, just not very 
accessible right now, and I can imagine the pair approach would be a helpful 
step in making it more accessible.


--
Regards
Robin



Re: [PATCH] libgcobol: Add multilib support

2025-05-16 Thread Rainer Orth
Richard Biener  writes:

> On Wed, May 14, 2025 at 6:29 PM James K. Lowden
>  wrote:
>>
>> On Wed, 14 May 2025 11:04:50 +0200
>> Rainer Orth  wrote:
>>
>> > Work around what appears to be a GNU make bug handling MAKEFLAGS
>>
>> Before I say Yes, could someone please tell me why this rumored bug is
>> responsible for so much boilerplate in our Makefile.am files?  You
>> say,
>>
>> > Unlike some runtime libs that can get away without setting
>> > AM_MAKEFLAGS and friends, libgcobol can not since it then tries to
>> > link the 64-bit libgcobol with 32-bit libstdc++.
>>
>> but I don't see the connection between that and 20 lines of definition
>> resting on "what appears to be a bug".

I certainly agree that this is quite a mouthful.

>> I guess I can live with "no one knows, that's what we do."  But I'm
>> sure I'm not alone in preferring to understand how the build builds.
>
> That's the case ... though this boilerplate is not used consistently.

Well, I guess variables are added that are only needed in certain
runtime libs (like GDC* stuff in libphobos).  And, as it so often
happens in GCC development, they were copied from another instance when a new
runtime lib was added.

In hindsight, I wonder if we couldn't move the common parts to
multilib.am and only add the specific ones in the runtime libs that need
them.  Haven't tried that yet: my immediate concern is getting the build
working.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH v2] c++, coroutines: Address CWG2563 return value init [PR119916].

2025-05-16 Thread Iain Sandoe
>>+  /* We must manage the cleanups ourselves, because the responsibility for
>>+ them changes after the initial suspend.  However, any use of
>>+ cxx_maybe_build_cleanup () can set the throwing_cleanup flag.  */
>>+  cp_function_chain->throwing_cleanup = false;

>Hmm...what if the gro cleanup throws after initializing the (different type) 
>return value?  That seems like a case that we need throwing_cleanup set for.

So I moved this to the position before the g_r_o is initialized
(since we only manage cleanups of the entities that come before that, although
 that's a bit hard to see from the patch).

>>@@ -5245,8 +5195,11 @@ cp_coroutine_transform::build_ramp_function ()
>> tree not_iarc
>>  = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node, iarc_x);
>>+  tree do_cleanup = build2_loc (loc, TRUTH_AND_EXPR, boolean_type_node,
>>+ not_iarc, coro_before_return);

>As with the 14 patch, this should be reversed.

Yes, the same goof was C&P to several places.

OK for trunk now?
thanks
Iain

--- 8< ---

This addresses the clarification that, when the get_return_object is of a
different type from the ramp return, any necessary conversions should be
performed on the return expression (so that they typically occur after the
function body has started execution).

PR c++/119916

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::wrap_original_function_body): Do not
initialise initial_await_resume_called here...
(cp_coroutine_transform::build_ramp_function): ... but here.
When the coroutine is not void, initialize a GRO object from
promise.get_return_object().  Use this as the argument to the
return expression.  Use a regular cleanup for the GRO, since
it is ramp-local.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/torture/special-termination-00-sync-completion.C:
Amend for CWG2563 expected behaviour.
* g++.dg/coroutines/torture/special-termination-01-self-destruct.C:
Likewise.
* g++.dg/coroutines/torture/pr119916.C: New test.

Signed-off-by: Iain Sandoe 
---
 gcc/cp/coroutines.cc  | 125 ++
 .../g++.dg/coroutines/torture/pr119916.C  |  66 +
 .../special-termination-00-sync-completion.C  |   2 +-
 .../special-termination-01-self-destruct.C|   2 +-
 4 files changed, 108 insertions(+), 87 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/torture/pr119916.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 743da068e35..6169a81cea5 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -4451,7 +4451,7 @@ cp_coroutine_transform::wrap_original_function_body ()
   tree i_a_r_c
= coro_build_artificial_var (loc, coro_frame_i_a_r_c_id,
 boolean_type_node, orig_fn_decl,
-boolean_false_node);
+NULL_TREE);
   DECL_CHAIN (i_a_r_c) = var_list;
   var_list = i_a_r_c;
   add_decl_expr (i_a_r_c);
@@ -4867,7 +4867,6 @@ cp_coroutine_transform::build_ramp_function ()
   add_decl_expr (coro_fp);
 
   tree coro_promise_live = NULL_TREE;
-  tree coro_gro_live = NULL_TREE;
   if (flag_exceptions)
 {
   /* Signal that we need to clean up the promise object on exception.  */
@@ -4876,13 +4875,6 @@ cp_coroutine_transform::build_ramp_function ()
  boolean_type_node, orig_fn_decl,
  boolean_false_node);
 
-  /* When the get-return-object is in the RETURN slot, we need to arrange
-for cleanup on exception.  */
-  coro_gro_live
-   = coro_build_and_push_artificial_var (loc, "_Coro_gro_live",
- boolean_type_node, orig_fn_decl,
- boolean_false_node);
-
   /* To signal that we need to cleanup copied function args.  */
   if (DECL_ARGUMENTS (orig_fn_decl))
for (tree arg = DECL_ARGUMENTS (orig_fn_decl); arg != NULL;
@@ -4970,13 +4962,19 @@ cp_coroutine_transform::build_ramp_function ()
   tree ramp_try_block = NULL_TREE;
   tree ramp_try_stmts = NULL_TREE;
   tree iarc_x = NULL_TREE;
+  tree coro_before_return = NULL_TREE;
   if (flag_exceptions)
 {
+  coro_before_return
+   = coro_build_and_push_artificial_var (loc, "_Coro_before_return",
+ boolean_type_node, orig_fn_decl,
+ boolean_true_node);
   iarc_x
= coro_build_and_push_artificial_var_with_dve (loc,
   coro_frame_i_a_r_c_id,
   boolean_type_node,
-  orig_fn_decl, NULL_TREE,
+   

[PATCH v2] c++, coroutines: Use decltype(auto) for the g_r_o.

2025-05-16 Thread Iain Sandoe
Hi Jason,

>>+ returned reference or prvalue result object ...
>>+ When we use a local to hold this, it is decltype(auto).  */
>>+  tree gro_type
>>+= finish_decltype_type (get_ro, /*id_expression_or_member_access_p*/true,

>This should be false, not true; a call is not an id-expr or member access.

fixed.

>> }
>> -  /* Initialize the resume_idx_var to 0, meaning "not started".  */
>>-  coro_build_and_push_artificial_var_with_dve
>>-(loc, coro_resume_index_id, short_unsigned_type_node,  orig_fn_decl,
>>- build_zero_cst (short_unsigned_type_node), deref_fp);

>Moving this initialization doesn't seem connected to the type of gro, or 
>mentioned above?

A fly-by tidy up.. removed.

>>-promise_live = true;
>>+promise_life++;
>> }

>Please add a Promise copy constructor, whether deleted or defined so that 
>promise_life is accurate if it's called.

Done.

OK for trunk now?
thanks
Iain

--- 8<--

The revised wording for coroutines uses decltype(auto) for the
type of the get-return-object, which preserves references.  The
test is expected to fail, since it attempts to initialize the
return object from an object that has already been destroyed.

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Use
decltype(auto) to determine the type of the temporary
get_return_object.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr115908.C: Count promise construction
and destruction.

Signed-off-by: Iain Sandoe 
---
 gcc/cp/coroutines.cc   | 22 ---
 gcc/testsuite/g++.dg/coroutines/pr115908.C | 74 +++---
 2 files changed, 65 insertions(+), 31 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 6169a81cea5..bd51c31e615 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -5120,8 +5120,11 @@ cp_coroutine_transform::build_ramp_function ()
   /* Check for a bad get return object type.
  [dcl.fct.def.coroutine] / 7 requires:
  The expression promise.get_return_object() is used to initialize the
- returned reference or prvalue result object ... */
-  tree gro_type = TREE_TYPE (get_ro);
+ returned reference or prvalue result object ...
+ When we use a local to hold this, it is decltype(auto).  */
+  tree gro_type
+= finish_decltype_type (get_ro, /*id_expression_or_member_access_p*/false,
+   tf_warning_or_error); // TREE_TYPE (get_ro);
   if (VOID_TYPE_P (gro_type) && !void_ramp_p)
 {
   error_at (fn_start, "no viable conversion from % provided by"
@@ -5129,11 +5132,6 @@ cp_coroutine_transform::build_ramp_function ()
   return false;
 }
 
-  /* Initialize the resume_idx_var to 0, meaning "not started".  */
-  coro_build_and_push_artificial_var_with_dve
-(loc, coro_resume_index_id, short_unsigned_type_node,  orig_fn_decl,
- build_zero_cst (short_unsigned_type_node), deref_fp);
-
   /* We must manage the cleanups ourselves, with the exception of the g_r_o,
  because the responsibility for them changes after the initial suspend.
  However, any use of cxx_maybe_build_cleanup () in preceding code can
@@ -5159,7 +5157,7 @@ cp_coroutine_transform::build_ramp_function ()
= coro_build_and_push_artificial_var (loc, "_Coro_gro", gro_type,
  orig_fn_decl, NULL_TREE);
 
-  r = cp_build_init_expr (coro_gro, get_ro);
+  r = cp_build_init_expr (coro_gro, STRIP_REFERENCE_REF (get_ro));
   finish_expr_stmt (r);
   tree coro_gro_cleanup
= cxx_maybe_build_cleanup (coro_gro, tf_warning_or_error);
@@ -5167,6 +5165,11 @@ cp_coroutine_transform::build_ramp_function ()
push_cleanup (coro_gro, coro_gro_cleanup, /*eh_only*/false);
 }
 
+  /* Initialize the resume_idx_var to 0, meaning "not started".  */
+  coro_build_and_push_artificial_var_with_dve
+(loc, coro_resume_index_id, short_unsigned_type_node,  orig_fn_decl,
+ build_zero_cst (short_unsigned_type_node), deref_fp);
+
   /* Start the coroutine body.  */
   r = build_call_expr_loc (fn_start, resumer, 1, coro_fp);
   finish_expr_stmt (r);
@@ -5180,7 +5183,8 @@ cp_coroutine_transform::build_ramp_function ()
   /* The ramp is done, we just need the return statement, which we build from
  the return object we constructed before we called the function body.  */
 
-  finish_return_stmt (void_ramp_p ? NULL_TREE : coro_gro);
+  r = void_ramp_p ? NULL_TREE : convert_from_reference (coro_gro);
+  finish_return_stmt (r);
 
   if (flag_exceptions)
 {
diff --git a/gcc/testsuite/g++.dg/coroutines/pr115908.C 
b/gcc/testsuite/g++.dg/coroutines/pr115908.C
index ac27d916de2..d8903748e40 100644
--- a/gcc/testsuite/g++.dg/coroutines/pr115908.C
+++ b/gcc/testsuite/g++.dg/coroutines/pr115908.C
@@ -6,23 +6,26 @@
 
 struct Promise;
 
-bool promise_live = false;
+int promise_life = 0;
 
 struct Handle : std::coroutine_handle {
+
+/* We now expect t

[PATCH 5/5 v2] c++, coroutines: Clean up the ramp cleanups.

2025-05-16 Thread Iain Sandoe
Hi Jason,

>>+ = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node, iarc_x);
>>+  do_fr_cleanup = build2_loc (loc, TRUTH_AND_EXPR, boolean_type_node,
>>+   do_fr_cleanup, coro_before_return);

>This also needs reversing (and similarly below).

Fixed.

>>+  tree fr_cleanup_if = begin_if_stmt ();
>>+  finish_if_stmt_cond (do_fr_cleanup, fr_cleanup_if);
>>+  finish_expr_stmt (delete_frame_call);
>>+  finish_then_clause (fr_cleanup_if);
>>+  finish_if_stmt (fr_cleanup_if);

>You could build a COND_EXPR instead of taking several statements to build an 
>IF_STMT?  i.e.
>
>frame_cleanup = build3 (COND_EXPR, void_type_node, fr_cleanup_if,
>   delete_frame_call, void_node);

done.

OK for trunk now?
thanks
Iain

--- 8< ---

This replaces the cleanup try-catch block in the ramp with a series of
eh-only cleanup statements.

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Replace ramp
cleanup try-catch block with eh-only cleanup statements.

Signed-off-by: Iain Sandoe 
---
 gcc/cp/coroutines.cc | 200 ++-
 1 file changed, 63 insertions(+), 137 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index bd51c31e615..33e3c1c377d 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -4866,39 +4866,6 @@ cp_coroutine_transform::build_ramp_function ()
   coro_fp = pushdecl (coro_fp);
   add_decl_expr (coro_fp);
 
-  tree coro_promise_live = NULL_TREE;
-  if (flag_exceptions)
-{
-  /* Signal that we need to clean up the promise object on exception.  */
-  coro_promise_live
-   = coro_build_and_push_artificial_var (loc, "_Coro_promise_live",
- boolean_type_node, orig_fn_decl,
- boolean_false_node);
-
-  /* To signal that we need to cleanup copied function args.  */
-  if (DECL_ARGUMENTS (orig_fn_decl))
-   for (tree arg = DECL_ARGUMENTS (orig_fn_decl); arg != NULL;
-arg = DECL_CHAIN (arg))
- {
-   param_info *parm_i = param_uses.get (arg);
-   if (parm_i->trivial_dtor)
- continue;
-   parm_i->guard_var = pushdecl (parm_i->guard_var);
-   add_decl_expr (parm_i->guard_var);
- }
-}
-
-  /* deref the frame pointer, to use in member access code.  */
-  tree deref_fp
-= cp_build_indirect_ref (loc, coro_fp, RO_UNARY_STAR,
-tf_warning_or_error);
-  tree frame_needs_free
-= coro_build_and_push_artificial_var_with_dve (loc,
-  coro_frame_needs_free_id,
-  boolean_type_node,
-  orig_fn_decl, NULL_TREE,
-  deref_fp);
-
   /* Build the frame.  */
 
   /* The CO_FRAME internal function is a mechanism to allow the middle end
@@ -4942,25 +4909,24 @@ cp_coroutine_transform::build_ramp_function ()
   finish_if_stmt (if_stmt);
 }
 
+  /* deref the frame pointer, to use in member access code.  */
+  tree deref_fp
+= cp_build_indirect_ref (loc, coro_fp, RO_UNARY_STAR,
+tf_warning_or_error);
+
   /* For now, once allocation has succeeded we always assume that this needs
  destruction, there's no impl. for frame allocation elision.  */
-  r = cp_build_init_expr (frame_needs_free, boolean_true_node);
-  finish_expr_stmt (r);
-
-  /* Set up the promise.  */
-  tree p
-= coro_build_and_push_artificial_var_with_dve (loc, coro_promise_id,
-  promise_type, orig_fn_decl,
-  NULL_TREE, deref_fp);
+  tree frame_needs_free
+= coro_build_and_push_artificial_var_with_dve (loc,
+  coro_frame_needs_free_id,
+  boolean_type_node,
+  orig_fn_decl,
+  boolean_true_node,
+  deref_fp);
+  /* Although it appears to be unused here the frame entry is needed and we
+ just set it true.  */
+  TREE_USED (frame_needs_free) = true;
 
-  /* Up to now any exception thrown will propagate directly to the caller.
- This is OK since the only source of such exceptions would be in allocation
- of the coroutine frame, and therefore the ramp will not have initialized
- any further state.  From here, we will track state that needs explicit
- destruction in the case that promise or g.r.o setup fails or an exception
- is thrown from the initial suspend expression.  */
-  tree ramp_try_block = NULL_TREE;
-  tree ramp_try_stmts = NULL_TREE;
   tree iarc_x = NULL_TREE;
   tree 

Re: [COMMITTED] PR tree-optimization/120277 - Check for casts becoming UNDEFINED.

2025-05-16 Thread Andrew MacLeod

On 5/16/25 02:35, Richard Biener wrote:

On Thu, May 15, 2025 at 7:02 PM Andrew MacLeod  wrote:

Recent changes to get_range_from_bitmask can sometimes turn a small
range into an undefined one if the bitmask indicates the bits make all
values impossible.

range_cast () was not expecting this and checks for UNDEFINED before
performing the cast.  It also needs to check for it after the cast now.

In this testcase, the pattern is:

   y = x * 4   <- we know y will have the bottom 2 bits cleared
   z = y + 7   <- we know z will have the bottom 2 bits set.

then a switch checks for z == 128 || z == 129 and performs a store into
*(int *)y.
Eventually the store is eliminated as unreachable, but range analysis
recognizes that the value is UNDEFINED when [121, 122] with the last 2
bits having to be 11 is calculated :-P

Do the default for casts, and if the result is UNDEFINED, turn it into
VARYING.

Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed.

For unreachable code it probably does not matter much, but IMO dropping
from UNDEFINED to VARYING within the core ranger is pessimizing
and could end up papering over issues that would otherwise show up.

IMO such UNDEFINED -> VARYING should happen at the consumer
side or alternatively a uppermost API layer that's not used from within
ranger itself.

Richard.



It's been a while since I visited this, but for most of the fold
operations we turn UNDEFINED into VARYING because of the potential
double meaning of UNDEFINED.  ISTR that an uninitialized local can also
show up as UNDEFINED, but we are not allowed to actually remove uses of
it; we have to treat it as VARYING or bad things happen.  Its value
truly can be VARYING in that case because it's a garbage value.


Most operations start with:
    if (empty_range_varying (r, type, op1, op2))
  return true;
which checks if either operand is UNDEFINED and sets the return result 
to VARYING before going any further.  In the case of the fold operation 
in the patch, it did that, but then the value changed to UNDEFINED 
during processing of the cast, so I needed to check again.


Anything that is truly UNDEFINED will have an edge leading to it that is 
unexecutable and should be removed somewhere.


Andrew



Re: [PATCH] s390: Floating point vector lane handling

2025-05-16 Thread Stefan Schulze Frielinghaus
On Wed, May 14, 2025 at 04:30:35PM +0200, Juergen Christ wrote:
> Since floating point and vector registers overlap on s390, more
> efficient code can be generated to extract FPRs from VRs.
> Additionally, for double vectors, more efficient code can be generated
> to load specific lanes.
> 
> gcc/ChangeLog:
> 
>   * config/s390/vector.md (VF): New mode iterator.
>   (VEC_SET_NONFLOAT): New mode iterator.
>   (VEC_SET_SINGLEFLOAT): New mode iterator.
>   (*vec_set): Split pattern in two.
>   (*vec_setv2df): Extract special handling for V2DF mode.
>   (*vec_extract): Split pattern in two.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/vector/vec-extract-1.c: New test.
>   * gcc.target/s390/vector/vec-set-1.c: New test.
> 
> Signed-off-by: Juergen Christ 
> ---
>  gcc/config/s390/vector.md | 135 -
>  .../gcc.target/s390/vector/vec-extract-1.c| 190 ++
>  .../gcc.target/s390/vector/vec-set-1.c|  67 ++
>  3 files changed, 381 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-extract-1.c
>  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-set-1.c
> 
> diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
> index e29255fe1116..580cf6fc71f6 100644
> --- a/gcc/config/s390/vector.md
> +++ b/gcc/config/s390/vector.md
> @@ -75,6 +75,8 @@
>  V1DF V2DF
>  (V1TF "TARGET_VXE") (TF "TARGET_VXE")])
>  
> +(define_mode_iterator VF [(V2SF "TARGET_VXE") (V4SF "TARGET_VXE") V2DF])
> +
>  ; All modes present in V_HW1 and VFT.
>  (define_mode_iterator V_HW1_FT [V16QI V8HI V4SI V2DI V1TI V1DF
>  V2DF (V1SF "TARGET_VXE") (V2SF "TARGET_VXE")
> @@ -506,26 +508,90 @@
>  UNSPEC_VEC_SET))]
>"TARGET_VX")
>  
> +; Iterator for vec_set that does not use special float/vect overlay tricks
> +(define_mode_iterator VEC_SET_NONFLOAT
> +  [V1QI V2QI V4QI V8QI V16QI V1HI V2HI V4HI V8HI V1SI V2SI V4SI V1DI V2DI 
> V2SF V4SF])
> +; Iterator for single element float vectors
> +(define_mode_iterator VEC_SET_SINGLEFLOAT [(V1SF "TARGET_VXE") V1DF (V1TF 
> "TARGET_VXE")])
> +
>  ; FIXME: Support also vector mode operands for 1
>  ; FIXME: A target memory operand seems to be useful otherwise we end
>  ; up with vl vlvgg vst.  Shouldn't the middle-end be able to handle
>  ; that itself?
>  ; vlvgb, vlvgh, vlvgf, vlvgg, vleb, vleh, vlef, vleg, vleib, vleih, vleif, 
> vleig
>  (define_insn "*vec_set"
> -  [(set (match_operand:V0 "register_operand"  "=v,v,v")
> - (unspec:V [(match_operand: 1 "general_operand""d,R,K")
> -(match_operand:SI2 "nonmemory_operand" "an,I,I")
> -(match_operand:V 3 "register_operand"   "0,0,0")]
> -   UNSPEC_VEC_SET))]
> +  [(set (match_operand:VEC_SET_NONFLOAT  0 "register_operand"  "=v,v,v")
> + (unspec:VEC_SET_NONFLOAT
> +   [(match_operand:  1 "general_operand""d,R,K")
> +(match_operand:SI 2 "nonmemory_operand" "an,I,I")
> +(match_operand:VEC_SET_NONFLOAT   3 "register_operand"   "0,0,0")]
> +   UNSPEC_VEC_SET))]
>"TARGET_VX
> && (!CONST_INT_P (operands[2])
> -   || UINTVAL (operands[2]) < GET_MODE_NUNITS (mode))"
> +   || UINTVAL (operands[2]) < GET_MODE_NUNITS 
> (mode))"
>"@
> vlvg\t%v0,%1,%Y2
> vle\t%v0,%1,%2
> vlei\t%v0,%1,%2"
>[(set_attr "op_type" "VRS,VRX,VRI")])
>  
> +(define_insn "*vec_set"
> +  [(set (match_operand:VEC_SET_SINGLEFLOAT 0 "register_operand"  "=v,v")
> + (unspec:VEC_SET_SINGLEFLOAT
> +   [(match_operand:1 "general_operand""f,R")
   ^
Constraint v instead of f gives more flexibility to the RA.  Note, on
s390 we allow values of mode SF and DF in vector registers which do not
overlap with floating-point registers, i.e., with REGNO >= 38.  Of
course, if a value of SFmode was created via a floating-point instruction,
then it initially lives in a FPR.  However, we could give the RA the freedom
to move those values to VRs in case e.g. register pressure increases
for FPRs, or in case SF/DFmode values were not created by floating-point
instructions in the first place.  Therefore, at the moment I don't see
that this could hurt us.

> +(match_operand:SI   2 "nonmemory_operand" "an,I")

The modes ensure to a certain degree that we deal with lane 0
in this case; however, instead of ignoring operand 2 it would be better
to check that it is indeed lane zero by replacing it with (const_int 0).

> +(match_operand:VEC_SET_SINGLEFLOAT  3 "register_operand"   "0,0")]
> +   UNSPEC_VEC_SET))]
> +  "TARGET_VX"
> +  "@
> +  vlr\t%v0,%v1
> +  vle\t%v0,%1,0"
 ^
Multiple output patterns are aligned with the @ symbol.

> + [(set_attr "op_type" "VRR,VR

[pushed] MAINTAINERS: add myself to write after approval

2025-05-16 Thread Spencer Abson
ChangeLog:

* MAINTAINERS: Add myself to write after approval.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index a3e3f25d9d1..8993d176c22 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -329,6 +329,7 @@ from other maintainers or reviewers.
 NameBZ account  Email
 
 Soumya AR   soumyaa 
+Spencer Abson   sabson  
 Mark G. Adams   mgadams 
 Ajit Kumar Agarwal  aagarwa 
 Pedro Alves palves  
-- 
2.34.1



[committed] cobol: Eliminate exception "blob"; streamline some code.

2025-05-16 Thread Robert Dubner
 


0001-cobol-Eliminate-exception-blob-streamline-some-code-.patch
Description: Binary data


Re: [PING][PATCH][GCC15] Alpha: Fix base block alignment calculation regression

2025-05-16 Thread Maciej W. Rozycki
On Thu, 15 May 2025, Jeff Law wrote:

> > > Address this issue by recursing into COMPONENT_REF tree nodes until the
> > > outermost one has been reached, which is supposed to be a MEM_REF one,
> > > accumulating the offset as we go, fixing a commit e0dae4da4c45 ("Alpha:
> > > Also use tree information to get base block alignment") regression.
> > 
> >   GCC 15 backport ping for:
> > .
> OK.

 Backport committed now, thanks for your review.

  Maciej


Re: [PATCH] match: Don't allow following statements that can throw internally [PR119903]

2025-05-16 Thread Richard Biener
On Fri, May 9, 2025 at 5:00 AM Andrew Pinski  wrote:
>
> This removes the ability to follow statements that can throw internally.
> This was suggested in bug report as a way to solve the issue here.
> The overhead is not that high since without non-call exceptions turned
> on, there is an early exit for non-calls.

So - the testcase doesn't ICE for me any longer.  I know I suggested this
as a possible solution but I don't like it very much - one reason is because
stmt_can_throw_internal asserts on cfun != nullptr and conservative handling
would mean we can't do any get_def () from IPA w/o switching it a function.

The other reason is that we do get_def () a lot (yeah, it's quite
unoptimized in genmatch-generated code).  So there's work to do there
(we also often valueize twice).

It seems we can defer this now?  The bug was closed at least.

Richard.

> PR tree-optimization/119903
>
> gcc/ChangeLog:
>
> * gimple-match-head.cc (get_def): Reject statements that can throw
> internally.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/tree-ssa/pr119903-1.C: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple-match-head.cc   |  5 -
>  gcc/testsuite/g++.dg/tree-ssa/pr119903-1.C | 24 ++
>  2 files changed, 28 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr119903-1.C
>
> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> index 6b3c5febbea..62ff8e57fbb 100644
> --- a/gcc/gimple-match-head.cc
> +++ b/gcc/gimple-match-head.cc
> @@ -63,7 +63,10 @@ get_def (tree (*valueize)(tree), tree name)
>  {
>if (valueize && ! valueize (name))
>  return NULL;
> -  return SSA_NAME_DEF_STMT (name);
> +  gimple *t = SSA_NAME_DEF_STMT (name);
> +  if (stmt_can_throw_internal (cfun, t))
> +return nullptr;
> +  return t;
>  }
>
>  /* Routine to determine if the types T1 and T2 are effectively
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr119903-1.C 
> b/gcc/testsuite/g++.dg/tree-ssa/pr119903-1.C
> new file mode 100644
> index 000..605f989a2eb
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr119903-1.C
> @@ -0,0 +1,24 @@
> +// { dg-do compile { target c++11 } }
> +// { dg-options "-O2 -fnon-call-exceptions -ftrapping-math 
> -fdump-tree-optimized-eh" }
> +
> +// PR tree-optimization/119903
> +// match and simplify would cause the internal throwable fp comparison
> +// to become only external throwable and lose the landing pad.
> +
> +int f() noexcept;
> +int g() noexcept;
> +
> +int m(double a)
> +{
> +  try {
> +if (a < 1.0)
> +  return f();
> +return g();
> +  }catch(...)
> +  {
> +return -1;
> +  }
> +}
> +
> +// Make sure There is a landing pad for the non-call exception from the 
> comparison.
> +// { dg-final { scan-tree-dump "LP " "optimized" } }
> --
> 2.43.0
>


Re: [PATCH] Partially lift restriction from loc_list_from_tree_1

2025-05-16 Thread Richard Biener
On Mon, May 12, 2025 at 11:24 AM Eric Botcazou  wrote:
>
> Hi,
>
> the function accepts all handled_component_p expressions and decodes them by
> means of get_inner_reference as expected, but bails out on bitfields:
>
>  /* TODO: We can extract value of the small expression via shifting even
>for nonzero bitpos.  */
> if (list_ret == 0)
>   return 0;
> if (!multiple_p (bitpos, BITS_PER_UNIT, &bytepos)
> || !multiple_p (bitsize, BITS_PER_UNIT))
>   {
> expansion_failed (loc, NULL_RTX,
>   "bitfield access");
> return 0;
>   }
>
> This lifts the second part of the restriction, which helps for obscure cases
> of packed discriminated record types in Ada, although this requires the latest
> GDB sources.
>
> Tested on x86-64/Linux, OK for the mainline?

OK.

Btw, can we try to add a "guality" test for gnat.dg?  Or are you making sure to
add coverage to the gdb testsuite?

Thanks,
Richard.

>
> 2025-05-12  Eric Botcazou  
>
> * dwarf2out.cc (loc_list_from_tree_1) : Do not bail
> out when the size is not a multiple of a byte.
> Deal with bit-fields whose size is not a multiple of a byte when
> dereferencing an address.
>
> --
> Eric Botcazou


Ping^2: [RFC PATCH v2] cselib: Reuse VALUEs on reg adjustments

2025-05-16 Thread Bohan Lei
Ping.


--
From:Jeff Law 
Send Time:2024 Dec. 19 (Thu.) 00:18
To:Bohan Lei; "gcc-patches"
CC:jakub
Subject:Re: Ping: [RFC PATCH v2] cselib: Reuse VALUEs on reg adjustments




On 12/17/24 9:48 PM, Bohan Lei wrote:
> Hi all!
> 
> I would like to ping the patch in
> https://gcc.gnu.org/pipermail/gcc-patches/2024-December/670763.html
> (Message-Id: <20241204045717.12982-1-garth...@linux.alibaba.com>).
> 
> It is supposed to be a generalization of the existing stack pointer VALUE 
> reuse
> mechanism, based on Jakub's commit 2c0fa3ecf70d199af18785702e9e0548fd3ab793.
> Could anyone review this?
I've got it in my gcc-16 queue; without looking at the exact dates, it 
likely came in after close of development for gcc-15.

jeff



[PATCH] libstdc++: Format __float128 as _Float128 only when long double is not 128-bit IEEE [PR119246]

2025-05-16 Thread Tomasz Kamiński
On powerpc64 and sparc, which both have __float128 and a 128-bit long
double, __float128 is the same type as long double/__ieee128 and is already
formattable.

The remaining specialization makes __float128 formattable on x86_64 via
_Float128; however, __float128 is now not formattable on x86_32 (-m32) with
-mlong-double-128,
where __float128 is a distinct type from long double that is 128-bit IEEE.

PR libstdc++/119246

libstdc++-v3/ChangeLog:

* include/std/format (formatter<__float128, _Char_T): Define if
_GLIBCXX_FORMAT_F128 == 2.
---
This patch avoids dealing with cases where long double is 128-bit and thus
__float128 may be the same type as long double (it is for powerpc and will be
for sparc),
but distinct for x86_32/-mlong-double-128.

This preserves support for formatting __float128 on x86_64, where it is
formatted using _Float128.

Tested on x86_64 and powerpc64.  For the format tests, both
-mabi=ibmlongdouble and -mabi=ieeelongdouble were checked.
Rainer Orth confirmed that this also works with his patch adding __float128
for sparc.

OK for trunk?

 libstdc++-v3/include/std/format | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index b1823db83bc..d1ca05105f9 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -2973,11 +2973,9 @@ namespace __format
 };
 #endif
 
-#if defined(__SIZEOF_FLOAT128__) && _GLIBCXX_FORMAT_F128 > 1
-  // Reuse __formatter_fp::format<__format::__flt128_t, Out> for __float128.
-  // This formatter is not declared if _GLIBCXX_LONG_DOUBLE_ALT128_COMPAT is 
true,
-  // as __float128 when present is same type as __ieee128, which may be same as
-  // long double.
+#if defined(__SIZEOF_FLOAT128__) && _GLIBCXX_FORMAT_F128 == 2
+  // Use __formatter_fp::format<__format::__flt128_t, Out> for __float128,
+  // when long double is not 128bit IEEE type.
   template<__format::__char _CharT>
 struct formatter<__float128, _CharT>
 {
@@ -2995,9 +2993,6 @@ namespace __format
 
 private:
   __format::__formatter_fp<_CharT> _M_f;
-
-  static_assert( !is_same_v<__float128, long double>,
-"This specialization should not be used for long double" );
 };
 #endif
 
-- 
2.49.0



Re: gcc project build error in Ubuntu18.04

2025-05-16 Thread Kito Cheng
I am surprised that such generic names are defined within the system
header files.  I'm inclined to just rename those to major_version and
minor_version; could you send a patch for that?


On Fri, May 16, 2025 at 3:50 PM Songhe Zhu  wrote:
>
> Hi kito
> When syncing GCC to the master branch and building it on Ubuntu 18.04, I 
> encountered the following warnings and errors:
> 
> These issues arise because the identifiers major and minor conflict with the 
> macros major/minor defined in .  To avoid compatibility 
> problems when using a newer version of GCC in an older Ubuntu environment, we 
> propose three solutions:
> 1. Undefine the macros temporarily::
>
> #pragma push_macro("major")
> #undef major
> #pragma push_macro("minor")
> #undef minor
> /* ... function code ... */
> #pragma pop_macro("major")
> #pragma pop_macro("minor")
>
> 2. Rename major/minor to non-conflicting names?
>
> 3. Build in a newer Ubuntu environment (e.g., Ubuntu 22.04) to bypass legacy 
> macro conflicts.
> --We will try it.
>
> reference patch: RISC-V: Generate extension table in documentation from 
> riscv-ext.def · gcc-mirror/gcc@124cbbb
>
> Thanks a lot~
>
> 
> zhuson...@eswincomputing.com


[PATCH 04/10] AArch64: add constants for branch displacements

2025-05-16 Thread Karl Meakin
Extract the hardcoded values for the maximum/minimum PC-relative
displacements into named constants and document them.

gcc/ChangeLog:

* config/aarch64/aarch64.md (BRANCH_LEN_P_128MiB): New constant.
(BRANCH_LEN_N_128MiB): Likewise.
(BRANCH_LEN_P_1MiB): Likewise.
(BRANCH_LEN_N_1MiB): Likewise.
(BRANCH_LEN_P_32KiB): Likewise.
(BRANCH_LEN_N_32KiB): Likewise.
---
 gcc/config/aarch64/aarch64.md | 64 ++-
 1 file changed, 48 insertions(+), 16 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index ba0d1cccdd0..c31ad4fc16e 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -692,12 +692,28 @@ (define_insn "indirect_jump"
 (define_insn "jump"
   [(set (pc) (label_ref (match_operand 0 "" "")))]
   ""
   "b\\t%l0"
   [(set_attr "type" "branch")]
 )
 
+;; Maximum PC-relative positive/negative displacements for various branching
+;; instructions.
+(define_constants
+  [
+;; +/- 128MiB.  Used by B, BL.
+(BRANCH_LEN_P_128MiB  134217724)
+(BRANCH_LEN_N_128MiB -134217728)
+
+;; +/- 1MiB.  Used by B., CBZ, CBNZ.
+(BRANCH_LEN_P_1MiB  1048572)
+(BRANCH_LEN_N_1MiB -1048576)
 
+;; +/- 32KiB.  Used by TBZ, TBNZ.
+(BRANCH_LEN_P_32KiB  32764)
+(BRANCH_LEN_N_32KiB -32768)
+  ]
+)
 
 ;; ---
 ;; Conditional jumps
 ;; ---
@@ -743,41 +759,45 @@ (define_expand "cbranchcc4"
 ;; Emit `B`, assuming that the condition is already in the CC register.
 (define_insn "aarch64_bcond"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand 1 "cc_register")
 (const_int 0)])
   (label_ref (match_operand 2))
   (pc)))]
   ""
   {
 /* GCC's traditional style has been to use "beq" instead of "b.eq", etc.,
but the "." is required for SVE conditions.  */
 bool use_dot_p = GET_MODE (operands[1]) == CC_NZCmode;
 if (get_attr_length (insn) == 8)
   return aarch64_gen_far_branch (operands, 2, "Lbcond",
 use_dot_p ? "b.%M0\\t" : "b%M0\\t");
 else
   return use_dot_p ? "b.%m0\\t%l2" : "b%m0\\t%l2";
   }
   [(set_attr "type" "branch")
(set (attr "length")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 4)
  (const_int 8)))
(set (attr "far_branch")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 0)
  (const_int 1)))]
 )
 
 ;; For a 24-bit immediate CST we can optimize the compare for equality
 ;; and branch sequence from:
 ;; mov x0, #imm1
 ;; movkx0, #imm2, lsl 16 /* x0 contains CST.  */
 ;; cmp x1, x0
 ;; b .Label
 ;; into the shorter:
 ;; sub x0, x1, #(CST & 0xfff000)
 ;; subsx0, x0, #(CST & 0x000fff)
 ;; b .Label
@@ -809,69 +829,77 @@ (define_insn_and_split "*aarch64_bcond_wide_imm"
 ;; For an EQ/NE comparison against zero, emit `CBZ`/`CBNZ`
 (define_insn "aarch64_cbz1"
   [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
(const_int 0))
   (label_ref (match_operand 1))
   (pc)))]
   "!aarch64_track_speculation"
   {
 if (get_attr_length (insn) == 8)
   return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, ");
 else
   return "\\t%0, %l1";
   }
   [(set_attr "type" "branch")
(set (attr "length")
-   (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 1) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 4)
  (const_int 8)))
(set (attr "far_branch")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-  

[PATCH 06/10] AArch64: recognize `+cmpbr` option

2025-05-16 Thread Karl Meakin
Add the `+cmpbr` option to enable the FEAT_CMPBR architectural
extension.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def (cmpbr): New
option.
* config/aarch64/aarch64.h (TARGET_CMPBR): New macro.
* doc/invoke.texi (cmpbr): New option.
---
 gcc/config/aarch64/aarch64-option-extensions.def | 2 ++
 gcc/config/aarch64/aarch64.h | 3 +++
 gcc/doc/invoke.texi  | 3 +++
 3 files changed, 8 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index dbbb021f05a..1c3e69799f5 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -249,6 +249,8 @@ AARCH64_OPT_EXTENSION("mops", MOPS, (), (), (), "mops")
 
 AARCH64_OPT_EXTENSION("cssc", CSSC, (), (), (), "cssc")
 
+AARCH64_OPT_EXTENSION("cmpbr", CMPBR, (), (), (), "cmpbr")
+
 AARCH64_OPT_EXTENSION("lse128", LSE128, (LSE), (), (), "lse128")
 
 AARCH64_OPT_EXTENSION("d128", D128, (LSE128), (), (), "d128")
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index e8bd8c73c12..d5c4a42e96d 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -202,326 +202,329 @@ constexpr auto AARCH64_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
   = AARCH64_ISA_MODE_SM_OFF;
 constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
   = aarch64_feature_flags (AARCH64_DEFAULT_ISA_MODE);
 
 #endif
 
 /* Macros to test ISA flags.
 
There is intentionally no macro for AARCH64_FL_CRYPTO, since this flag bit
is not always set when its constituent features are present.
Check (TARGET_AES && TARGET_SHA2) instead.  */
 
 #define AARCH64_HAVE_ISA(X) (bool (aarch64_isa_flags & AARCH64_FL_##X))
 
 #define AARCH64_ISA_MODE((aarch64_isa_flags & AARCH64_FL_ISA_MODES).val[0])
 
 /* The current function is a normal non-streaming function.  */
 #define TARGET_NON_STREAMING AARCH64_HAVE_ISA (SM_OFF)
 
 /* The current function has a streaming body.  */
 #define TARGET_STREAMING AARCH64_HAVE_ISA (SM_ON)
 
 /* The current function has a streaming-compatible body.  */
 #define TARGET_STREAMING_COMPATIBLE \
   ((aarch64_isa_flags & AARCH64_FL_SM_STATE) == 0)
 
 /* PSTATE.ZA is enabled in the current function body.  */
 #define TARGET_ZA AARCH64_HAVE_ISA (ZA_ON)
 
 /* AdvSIMD is supported in the default configuration, unless disabled by
-mgeneral-regs-only or by the +nosimd extension.  The set of available
instructions is then subdivided into:
 
- the "base" set, available both in SME streaming mode and in
  non-streaming mode
 
- the full set, available only in non-streaming mode.  */
 #define TARGET_BASE_SIMD AARCH64_HAVE_ISA (SIMD)
 #define TARGET_SIMD (TARGET_BASE_SIMD && TARGET_NON_STREAMING)
 #define TARGET_FLOAT AARCH64_HAVE_ISA (FP)
 
 /* AARCH64_FL options necessary for system register implementation.  */
 
 /* Define AARCH64_FL aliases for architectural features which are protected
by -march flags in binutils but which receive no special treatment by GCC.
 
Such flags are inherited from the Binutils definition of system registers
and are mapped to the architecture in which the feature is implemented.  */
 #define AARCH64_FL_RASAARCH64_FL_V8A
 #define AARCH64_FL_LORAARCH64_FL_V8_1A
 #define AARCH64_FL_PANAARCH64_FL_V8_1A
 #define AARCH64_FL_AMUAARCH64_FL_V8_4A
 #define AARCH64_FL_SCXTNUMAARCH64_FL_V8_5A
 #define AARCH64_FL_ID_PFR2AARCH64_FL_V8_5A
 
 /* Armv8.9-A extension feature bits defined in Binutils but absent from GCC,
aliased to their base architecture.  */
 #define AARCH64_FL_AIEAARCH64_FL_V8_9A
 #define AARCH64_FL_DEBUGv8p9  AARCH64_FL_V8_9A
 #define AARCH64_FL_FGT2   AARCH64_FL_V8_9A
 #define AARCH64_FL_ITEAARCH64_FL_V8_9A
 #define AARCH64_FL_PFAR   AARCH64_FL_V8_9A
 #define AARCH64_FL_PMUv3_ICNTRAARCH64_FL_V8_9A
 #define AARCH64_FL_PMUv3_SS   AARCH64_FL_V8_9A
 #define AARCH64_FL_PMUv3p9AARCH64_FL_V8_9A
 #define AARCH64_FL_RASv2  AARCH64_FL_V8_9A
 #define AARCH64_FL_S1PIE  AARCH64_FL_V8_9A
 #define AARCH64_FL_S1POE  AARCH64_FL_V8_9A
 #define AARCH64_FL_S2PIE  AARCH64_FL_V8_9A
 #define AARCH64_FL_S2POE  AARCH64_FL_V8_9A
 #define AARCH64_FL_SCTLR2 AARCH64_FL_V8_9A
 #define AARCH64_FL_SEBEP  AARCH64_FL_V8_9A
 #define AARCH64_FL_SPE_FDSAARCH64_FL_V8_9A
 #define AARCH64_FL_TCR2   AARCH64_FL_V8_9A
 
 #define TARGET_V8R AARCH64_HAVE_ISA (V8R)
 #define TARGET_V9A AARCH64_HAVE_ISA (V9A)
 
 
 /* SHA2 is an optional extension to AdvSIMD.  */
 #define TARGET_SHA2 AARCH64_HAVE_ISA (SHA2)
 
 /* SHA3 is an optional extension to AdvSIMD.  */
 #define TARGET_SHA3 AARCH64_HAVE_ISA (SHA3)
 
 /* AES is an optional extension to AdvSIMD.  */
 #define TARGET_AES AARCH64_HAVE_ISA (AES)
 
 /* SM is an optiona

[PATCH 05/10] AArch64: make `far_branch` attribute a boolean

2025-05-16 Thread Karl Meakin
The `far_branch` attribute only ever takes the values 0 or 1, so make it
a `no/yes` valued string attribute instead.

gcc/ChangeLog:

* config/aarch64/aarch64.md (far_branch): Replace 0/1 with
no/yes.
(aarch64_bcond): Handle rename.
(aarch64_cbz1): Likewise.
(*aarch64_tbz1): Likewise.
(@aarch64_tbz): Likewise.
---
 gcc/config/aarch64/aarch64.md | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c31ad4fc16e..b61e3e5a72f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -554,16 +554,14 @@ (define_attr "mode_enabled" "false,true"
 ;; Attribute that controls whether an alternative is enabled or not.
 (define_attr "enabled" "no,yes"
   (if_then_else (and (eq_attr "arch_enabled" "yes")
 (eq_attr "mode_enabled" "true"))
(const_string "yes")
(const_string "no")))
 
 ;; Attribute that specifies whether we are dealing with a branch to a
 ;; label that is far away, i.e. further away than the maximum/minimum
 ;; representable in a signed 21-bits number.
-;; 0 :=: no
-;; 1 :=: yes
-(define_attr "far_branch" "" (const_int 0))
+(define_attr "far_branch" "no,yes" (const_string "no"))
 
 ;; Attribute that specifies whether the alternative uses MOVPRFX.
 (define_attr "movprfx" "no,yes" (const_string "no"))
@@ -759,45 +757,45 @@ (define_expand "cbranchcc4"
 ;; Emit `B`, assuming that the condition is already in the CC register.
 (define_insn "aarch64_bcond"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand 1 "cc_register")
 (const_int 0)])
   (label_ref (match_operand 2))
   (pc)))]
   ""
   {
 /* GCC's traditional style has been to use "beq" instead of "b.eq", etc.,
but the "." is required for SVE conditions.  */
 bool use_dot_p = GET_MODE (operands[1]) == CC_NZCmode;
 if (get_attr_length (insn) == 8)
   return aarch64_gen_far_branch (operands, 2, "Lbcond",
 use_dot_p ? "b.%M0\\t" : "b%M0\\t");
 else
   return use_dot_p ? "b.%m0\\t%l2" : "b%m0\\t%l2";
   }
   [(set_attr "type" "branch")
(set (attr "length")
(if_then_else (and (ge (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
  (const_int 4)
  (const_int 8)))
(set (attr "far_branch")
(if_then_else (and (ge (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
- (const_int 0)
- (const_int 1)))]
+ (const_string "no")
+ (const_string "yes")))]
 )
 
 ;; For a 24-bit immediate CST we can optimize the compare for equality
 ;; and branch sequence from:
 ;; mov x0, #imm1
 ;; movk x0, #imm2, lsl 16 /* x0 contains CST.  */
 ;; cmp x1, x0
 ;; b .Label
 ;; into the shorter:
 ;; sub x0, x1, #(CST & 0xfff000)
 ;; subs x0, x0, #(CST & 0x000fff)
 ;; b .Label
@@ -829,77 +827,77 @@ (define_insn_and_split "*aarch64_bcond_wide_imm"
 ;; For an EQ/NE comparison against zero, emit `CBZ`/`CBNZ`
 (define_insn "aarch64_cbz1"
   [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
(const_int 0))
   (label_ref (match_operand 1))
   (pc)))]
   "!aarch64_track_speculation"
   {
 if (get_attr_length (insn) == 8)
   return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, ");
 else
   return "\\t%0, %l1";
   }
   [(set_attr "type" "branch")
(set (attr "length")
(if_then_else (and (ge (minus (match_dup 1) (pc))
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 1) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
  (const_int 4)
  (const_int 8)))
(set (attr "far_branch")
(if_then_else (and (ge (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
- (const_int 0)
- (const_int 1)))]
+ (const_string "no")
+ (const_string "yes")))]
 )
 
 ;; For an LT/GE comparison against zero, emit `TBZ`/`TBNZ`
 (define_insn "*aarch64_tbz1"
   [(set (pc) (if_then_else (LTGE (match_operand:A

[PATCH 07/10] AArch64: precommit test for CMPBR instructions

2025-05-16 Thread Karl Meakin
Commit the test file `cmpbr.c` before rules for generating the new
instructions are added, so that the changes in codegen are more obvious
in the next commit.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add `cmpbr` to the list of extensions.
* gcc.target/aarch64/cmpbr.c: New test.
---
 gcc/testsuite/gcc.target/aarch64/cmpbr.c | 1659 ++
 gcc/testsuite/lib/target-supports.exp|   14 +-
 2 files changed, 1667 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.c

diff --git a/gcc/testsuite/gcc.target/aarch64/cmpbr.c b/gcc/testsuite/gcc.target/aarch64/cmpbr.c
new file mode 100644
index 000..8534283bc26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpbr.c
@@ -0,0 +1,1659 @@
+// Test that the instructions added by FEAT_CMPBR are emitted
+// { dg-do compile }
+// { dg-do-if assemble { target aarch64_asm_cmpbr_ok } }
+// { dg-options "-march=armv9.5-a+cmpbr -O2" }
+// { dg-final { check-function-bodies "**" "" "" } }
+
+#include <stdint.h>
+
+typedef uint8_t u8;
+typedef int8_t i8;
+
+typedef uint16_t u16;
+typedef int16_t i16;
+
+typedef uint32_t u32;
+typedef int32_t i32;
+
+typedef uint64_t u64;
+typedef int64_t i64;
+
+int taken();
+int not_taken();
+
+#define COMPARE(ty, name, op, rhs)                                            \
+  int ty##_x0_##name##_##rhs(ty x0, ty x1) {                                  \
+    return (x0 op rhs) ? taken() : not_taken();                               \
+  }
+
+#define COMPARE_ALL(unsigned_ty, signed_ty, rhs)                              \
+  COMPARE(unsigned_ty, eq, ==, rhs);                                          \
+  COMPARE(unsigned_ty, ne, !=, rhs);                                          \
+                                                                              \
+  COMPARE(unsigned_ty, ult, <, rhs);                                          \
+  COMPARE(unsigned_ty, ule, <=, rhs);                                         \
+  COMPARE(unsigned_ty, ugt, >, rhs);                                          \
+  COMPARE(unsigned_ty, uge, >=, rhs);                                         \
+                                                                              \
+  COMPARE(signed_ty, slt, <, rhs);                                            \
+  COMPARE(signed_ty, sle, <=, rhs);                                           \
+  COMPARE(signed_ty, sgt, >, rhs);                                            \
+  COMPARE(signed_ty, sge, >=, rhs);
+
+//  CBB (register) 
+COMPARE_ALL(u8, i8, x1);
+
+//  CBH (register) 
+COMPARE_ALL(u16, i16, x1);
+
+//  CB (register) 
+COMPARE_ALL(u32, i32, x1);
+COMPARE_ALL(u64, i64, x1);
+
+//  CB (immediate) 
+COMPARE_ALL(u32, i32, 42);
+COMPARE_ALL(u64, i64, 42);
+
+//  Special cases 
+// Comparisons against the immediate 0 can be done for all types,
+// because we can use the wzr/xzr register as one of the operands.
+// However, we should prefer to use CBZ/CBNZ or TBZ/TBNZ when possible,
+// because they have larger range.
+COMPARE_ALL(u8, i8, 0);
+COMPARE_ALL(u16, i16, 0);
+COMPARE_ALL(u32, i32, 0);
+COMPARE_ALL(u64, i64, 0);
+
+// CBB and CBH cannot have immediate operands.
+// Instead we have to do a MOV+CB.
+COMPARE_ALL(u8, i8, 42);
+COMPARE_ALL(u16, i16, 42);
+
+// 64 is out of the range for immediate operands (0 to 63).
+// * For 8/16-bit types, use a MOV+CB as above.
+// * For 32/64-bit types, use a CMP+B instead,
+//   because B has a longer range than CB.
+COMPARE_ALL(u8, i8, 64);
+COMPARE_ALL(u16, i16, 64);
+COMPARE_ALL(u32, i32, 64);
+COMPARE_ALL(u64, i64, 64);
+
+// 4098 is out of the range for CMP (0 to 4095, optionally shifted left by 12
+// bits), but it can be materialized in a single MOV.
+COMPARE_ALL(u16, i16, 4098);
+COMPARE_ALL(u32, i32, 4098);
+COMPARE_ALL(u64, i64, 4098);
+
+/*
+** u8_x0_eq_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** beq .L4
+** b   not_taken
+** b   taken
+*/
+
+/*
+** u8_x0_ne_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** beq .L6
+** b   taken
+** b   not_taken
+*/
+
+/*
+** u8_x0_ult_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** bls .L8
+** b   taken
+** b   not_taken
+*/
+
+/*
+** u8_x0_ule_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** bcc .L10
+** b   taken
+** b   not_taken
+*/
+
+/*
+** u8_x0_ugt_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** bcs .L12
+** b   taken
+** b   not_taken
+*/
+
+/*
+** u8_x0_uge_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** bhi .L14
+** b   taken
+** b   not_taken
+*/
+
+/*
+** i8_x0_slt_x1:
** sxtb w1, w1
+** cmp w1, w0, sxtb
+** ble .L16
+** b   taken
+** b   not_taken
+*/
+
+/*
+** i8_x0_sle_x1:
+**   

[PATCH 00/10] AArch64: CMPBR support

2025-05-16 Thread Karl Meakin
This patch series adds support for the CMPBR extension. It includes the
new `+cmpbr` option and rules to generate the new instructions when
lowering conditional branches.

Testing done:
`make bootstrap; make check`

Karl Meakin (10):
  AArch64: place branch instruction rules together
  AArch64: reformat branch instruction rules
  AArch64: rename branch instruction rules
  AArch64: add constants for branch displacements
  AArch64: make `far_branch` attribute a boolean
  AArch64: recognize `+cmpbr` option
  AArch64: precommit test for CMPBR instructions
  AArch64: rules for CMPBR instructions
  AArch64: make rules for CBZ/TBZ higher priority
  Use HS/LO instead of CS/CC

 .../aarch64/aarch64-option-extensions.def |2 +
 gcc/config/aarch64/aarch64-simd.md|2 +-
 gcc/config/aarch64/aarch64-sme.md |2 +-
 gcc/config/aarch64/aarch64.cc |   39 +-
 gcc/config/aarch64/aarch64.h  |3 +
 gcc/config/aarch64/aarch64.md |  564 ---
 gcc/config/aarch64/iterators.md   |5 +
 gcc/config/aarch64/predicates.md  |   15 +
 gcc/doc/invoke.texi   |3 +
 gcc/testsuite/gcc.target/aarch64/cmpbr.c  | 1481 +
 gcc/testsuite/lib/target-supports.exp |   14 +-
 11 files changed, 1898 insertions(+), 232 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.c

-- 
2.45.2



Re: [PATCH] [PR120276] regcprop: Replace partial_subreg_p by ordered_p && maybe_lt

2025-05-16 Thread Richard Sandiford
Jennifer Schmitz  writes:
> [PATCH] [PR120276] regcprop: Return from copy_value for unordered modes
>
> The ICE in PR120276 resulted from a comparison of VNx4QI and V8QI using
> partial_subreg_p in the function copy_value during the RTL pass
> regcprop, failing the assertion in
>
> inline bool
> partial_subreg_p (machine_mode outermode, machine_mode innermode)
> {
>   /* Modes involved in a subreg must be ordered.  In particular, we must
>  always know at compile time whether the subreg is paradoxical.  */
>   poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
>   poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
>   gcc_checking_assert (ordered_p (outer_prec, inner_prec));
>   return maybe_lt (outer_prec, inner_prec);
> }
>
> Returning from the function if the modes are not ordered before reaching
> the call to partial_subreg_p resolves the ICE and passes bootstrap and
> testing without regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz 
>
> gcc/
>   PR middle-end/120276
>   * regcprop.cc (copy_value): Return in case of unordered modes.
>
> gcc/testsuite/
>   PR middle-end/120276
>   * gcc.dg/torture/pr120276.c: New test.

OK, thanks.

Richard

> ---
>  gcc/regcprop.cc |  4 
>  gcc/testsuite/gcc.dg/torture/pr120276.c | 20 
>  2 files changed, 24 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr120276.c
>
> diff --git a/gcc/regcprop.cc b/gcc/regcprop.cc
> index 4fa1305526c..98ab3f77e83 100644
> --- a/gcc/regcprop.cc
> +++ b/gcc/regcprop.cc
> @@ -332,6 +332,10 @@ copy_value (rtx dest, rtx src, struct value_data *vd)
>if (vd->e[sr].mode == VOIDmode)
>  set_value_regno (sr, vd->e[dr].mode, vd);
>  
> +  else if (!ordered_p (GET_MODE_PRECISION (vd->e[sr].mode),
> +                      GET_MODE_PRECISION (GET_MODE (src))))
> +    return;
> +
>/* If we are narrowing the input to a smaller number of hard regs,
>   and it is in big endian, we are really extracting a high part.
>   Since we generally associate a low part of a value with the value 
> itself,
> diff --git a/gcc/testsuite/gcc.dg/torture/pr120276.c b/gcc/testsuite/gcc.dg/torture/pr120276.c
> new file mode 100644
> index 000..9717a7103e5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr120276.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=armv8.2-a+sve" { target aarch64*-*-* } } */
> +
> +int a;
> +char b[1];
> +int c[18];
> +void d(char *);
> +void e() {
> +  int f;
> +  char *g;
> +  a = 0;
> +  for (; a < 18; a++) {
> +int h = f = 0;
> +for (; f < 4; f++) {
> +  g[a * 4 + f] = c[a] >> h;
> +  h += 8;
> +}
> +  }
> +  d(b);
> +}
> \ No newline at end of file


Re: [PATCH v22 0/3] c: Add _Countof and

2025-05-16 Thread Alejandro Colomar
Hi Joseph,

On Fri, May 16, 2025 at 12:25:39PM +, Joseph Myers wrote:
> On Fri, 16 May 2025, Alejandro Colomar wrote:
> 
> > -  Add  (and NDEBUG) to some test files that were missing it,
> >and also the forward declaration of strcmp(3).
> 
> Depending on libc headers like this in tests is discouraged.  The usual 
> idiom is to use abort () on failure of a runtime check (rather than 
> assert) and to declare abort in the test (or use __builtin_abort to avoid 
> needing the declaration).

Hmmm, I've been trying to find a compromise between readability and
simplicity, and I think I have something.  I've seen some tests that
define assert() themselves.  I like assert(3) because it's more
readable compared to a conditional plus abort(3).

So, how do you feel about the following change?

diff --git i/gcc/testsuite/gcc.dg/countof-stdcountof.c w/gcc/testsuite/gcc.dg/countof-stdcountof.c
index a7fe4079c69..2fb0c6306ef 100644
--- i/gcc/testsuite/gcc.dg/countof-stdcountof.c
+++ w/gcc/testsuite/gcc.dg/countof-stdcountof.c
@@ -3,8 +3,7 @@
 
 #include <stdcountof.h>
 
-#undef NDEBUG
-#include <assert.h>
+#define assert(e)  ((e) ? (void) 0 : __builtin_abort ())
 
 extern int strcmp (const char *, const char *);
 
diff --git i/gcc/testsuite/gcc.dg/countof-vmt.c w/gcc/testsuite/gcc.dg/countof-vmt.c
index f84d5aa618d..cf4bfd1aa74 100644
--- i/gcc/testsuite/gcc.dg/countof-vmt.c
+++ w/gcc/testsuite/gcc.dg/countof-vmt.c
@@ -1,8 +1,7 @@
 /* { dg-do run } */
 /* { dg-options "-std=c2y" } */
 
-#undef NDEBUG
-#include <assert.h>
+#define assert(e)  ((e) ? (void) 0 : __builtin_abort ())
 
 void
 inner_vla_noeval (void)
diff --git i/gcc/testsuite/gcc.dg/countof-zero.c w/gcc/testsuite/gcc.dg/countof-zero.c
index 07dfc2bfbf2..678a08148a5 100644
--- i/gcc/testsuite/gcc.dg/countof-zero.c
+++ w/gcc/testsuite/gcc.dg/countof-zero.c
@@ -1,8 +1,7 @@
 /* { dg-do run } */
 /* { dg-options "-std=c2y" } */
 
-#undef NDEBUG
-#include <assert.h>
+#define assert(e)  ((e) ? (void) 0 : __builtin_abort ())
 
 void
 vla (void)
diff --git i/gcc/testsuite/gcc.dg/countof.c w/gcc/testsuite/gcc.dg/countof.c
index 15ed7719100..534488501c6 100644
--- i/gcc/testsuite/gcc.dg/countof.c
+++ w/gcc/testsuite/gcc.dg/countof.c
@@ -1,8 +1,7 @@
 /* { dg-do run } */
 /* { dg-options "-std=c2y -pedantic-errors" } */
 
-#undef NDEBUG
-#include <assert.h>
+#define assert(e)  ((e) ? (void) 0 : __builtin_abort ())
 
 void
 array (void)


Have a lovely day!
Alex

-- 





Re: [PATCH v2 1/2] emit-rtl: Allow extra checks for paradoxical subregs [PR119966]

2025-05-16 Thread Richard Sandiford
Dimitar Dimitrov  writes:
> When a paradoxical subreg is detected, validate_subreg exits early, thus
> skipping the important checks later in the function.
>
> Fix by continuing with the checks instead of declaring early that the
> paradoxical subreg is valid.
>
> One of the newly allowed subsequent checks needed to be disabled for
> paradoxical subregs.  It turned out that combine attempts to create
> a paradoxical subreg of mem even for strict-alignment targets.
> That is invalid and should eventually be rejected, but is
> temporarily left allowed to prevent regressions for
> armv8l-unknown-linux-gnueabihf.
>
> Tests I did:
>  - No regressions were found for C and C++ for the following targets:
>- native x86_64-pc-linux-gnu
>- cross riscv64-unknown-linux-gnu
>- cross riscv32-none-elf
>  - Sanity checked armv8l-unknown-linux-gnueabihf by cross-building
>up to including libgcc. I'll monitor Linaro CI bot for the
>full regression test results.
>  - Sanity checked powerpc64-unknown-linux-gnu by building native
>toolchain, but could not setup qemu-user for DejaGnu testing.
>
>   PR target/119966
>
> gcc/ChangeLog:
>
>   * emit-rtl.cc (validate_subreg): Do not exit immediately for
>   paradoxical subregs.  Filter subsequent tests which are
>   not valid for paradoxical subregs.
>
> Co-authored-by: Richard Sandiford 
> Signed-off-by: Dimitar Dimitrov 
> ---
>  gcc/emit-rtl.cc | 25 ++---
>  1 file changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
> index 3e2c4309dee..e46b0f9eac4 100644
> --- a/gcc/emit-rtl.cc
> +++ b/gcc/emit-rtl.cc
> @@ -969,10 +969,10 @@ validate_subreg (machine_mode omode, machine_mode imode,
>  }
>  
>/* Paradoxical subregs must have offset zero.  */
> -  if (maybe_gt (osize, isize))
> -return known_eq (offset, 0U);
> +  if (maybe_gt (osize, isize) && !known_eq (offset, 0U))
> +return false;
>  
> -  /* This is a normal subreg.  Verify that the offset is representable.  */
> +  /* Verify that the offset is representable.  */
>  
>/* For hard registers, we already have most of these rules collected in
>   subreg_offset_representable_p.  */
> @@ -988,9 +988,13 @@ validate_subreg (machine_mode omode, machine_mode imode,
>  
>return subreg_offset_representable_p (regno, imode, offset, omode);
>  }
> -  /* Do not allow SUBREG with stricter alignment than the inner MEM.  */
> +  /* Do not allow normal SUBREG with stricter alignment than the inner MEM.
> +
> + FIXME: Combine can create paradoxical mem subregs even for
> + strict-alignment targets.  Allow it until combine is fixed.  */

Are the details captured in bugzilla somewhere?  If not, could you file
a PR and explain when this happens, or add a comment to PR119966?

I think this should have a reference to a particular bugzilla comment
that describes the problem, otherwise it would be hard to tell later
whether the problem has been fixed.

OK with that change, thanks.

Richard

>else if (reg && MEM_P (reg) && STRICT_ALIGNMENT
> -&& MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (omode))
> +&& MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (omode)
> +&& known_le (osize, isize))
>  return false;
>  
>/* The outer size must be ordered wrt the register size, otherwise
> @@ -999,7 +1003,7 @@ validate_subreg (machine_mode omode, machine_mode imode,
>if (!ordered_p (osize, regsize))
>  return false;
>  
> -  /* For pseudo registers, we want most of the same checks.  Namely:
> +  /* For normal pseudo registers, we want most of the same checks.  Namely:
>  
>   Assume that the pseudo register will be allocated to hard registers
>   that can hold REGSIZE bytes each.  If OSIZE is not a multiple of 
> REGSIZE,
> @@ -1008,8 +1012,15 @@ validate_subreg (machine_mode omode, machine_mode 
> imode,
>   otherwise it is at the lowest offset.
>  
>   Given that we've already checked the mode and offset alignment,
> - we only have to check subblock subregs here.  */
> + we only have to check subblock subregs here.
> +
> + For paradoxical little-endian registers, this check is redundant.  The
> + offset has already been validated to be zero.
> +
> + For paradoxical big-endian registers, this check is not valid
> + because the offset is zero.  */
>if (maybe_lt (osize, regsize)
> +  && known_le (osize, isize)
>&& ! (lra_in_progress && (FLOAT_MODE_P (imode) || FLOAT_MODE_P 
> (omode
>  {
>/* It is invalid for the target to pick a register size for a mode

