[gcc(refs/users/mikael/heads/stabilisation_descriptor_v02)] fortran: Delay evaluation of array bounds after reallocation

2025-07-10 Thread Mikael Morin via Gcc-cvs
https://gcc.gnu.org/g:6b68aa726ed5a45e33bd2d87636e2daa5059fd65

commit 6b68aa726ed5a45e33bd2d87636e2daa5059fd65
Author: Mikael Morin 
Date:   Mon Jul 7 11:46:08 2025 +0200

fortran: Delay evaluation of array bounds after reallocation

Delay the evaluation of bounds, offset, etc. until after the reallocation,
for the scalarization of allocatable arrays on the left hand side of
assignments.

Before this change, the code preceding the scalarization loop is like:

  D.4757 = ref2.offset;
  D.4759 = ref2.dim[0].ubound;
  D.4762 = ref2.dim[0].lbound;
  {
if (ref2.data == 0B) goto realloc;
if (ref2.dim[0].lbound + 4 != ref2.dim[0].ubound) goto realloc;
goto L.10;
realloc:
... change offset and bounds ...
D.4757 = ref2.offset;
D.4762 = NON_LVALUE_EXPR ;
... reallocation ...
L.10:;
  }
  while (1)
{
  ... scalarized code ...

so the bounds etc. are first evaluated to variables, and the reallocation
code takes care to update those variables during the reallocation.  This
is problematic because the variables' initialization references the
array bounds, which for unallocated arrays are uninitialized at the
evaluation point.  This used to (correctly) cause uninitialized warnings
(see PR fortran/108889), and a workaround was found for variables: it
initializes the bounds of array variables to some value beforehand if
they are unallocated.  For allocatable components, there is no warning
but the problem remains: some uninitialized values are used, even if
they are discarded later.

After this change the code becomes:

{
  if (ref2.data == 0B) goto realloc;
  if (ref2.dim[0].lbound + 4 != ref2.dim[0].ubound) goto realloc;
  goto L.10;
  realloc:;
  ... change offset and bounds ...
  ... reallocation ...
  L.10:;
}
D.4762 = ref2.offset;
D.4763 = ref2.dim[0].lbound;
D.4764 = ref2.dim[0].ubound;
while (1)
  {
... scalarized code

so the scalarizer avoids storing the values to variables at the time it
evaluates them, if the array is reallocatable on assignment.  Instead,
it keeps expressions with references to the array descriptor fields,
expressions that remain valid through reallocation.  After the
reallocation code has been generated, the expressions stored by the
scalarizer are evaluated in place to variables.

The decision to delay evaluation is based on the existing field
is_alloc_lhs, which requires a few tweaks to always be correct with
respect to what its name suggests.  Namely, it should be set even if the
assignment right hand side is an intrinsic function, and it should not
be set if the right hand side is a scalar, nor if the -fno-realloc-lhs
flag is passed to the compiler.
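
As a rough illustration of the ordering described above, here is a minimal
C sketch.  It is not gfortran's generated code: the struct descriptor
layout and the names realloc_lhs and assign_all are hypothetical
simplifications of a rank-1 descriptor.  It only shows the bounds and
offset being copied to local variables after the reallocation step, so
that the uninitialized bounds of an unallocated array are never read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified rank-1 array descriptor (an assumption made
   for this sketch; the real gfortran descriptor has more fields).  */
struct descriptor
{
  int *data;
  ptrdiff_t offset;
  ptrdiff_t lbound;
  ptrdiff_t ubound;
};

/* Make DESC hold N elements with lower bound 1, unless it is already
   allocated with the right extent.  The bounds are only read when the
   data pointer is non-null.  */
void
realloc_lhs (struct descriptor *desc, ptrdiff_t n)
{
  assert (n > 0);
  if (desc->data != NULL && desc->lbound + (n - 1) == desc->ubound)
    return;

  int *p = realloc (desc->data, (size_t) n * sizeof (int));
  assert (p != NULL);	/* Sketch only: no real error handling.  */
  desc->data = p;
  desc->lbound = 1;
  desc->ubound = n;
  desc->offset = -1;
}

/* Store VALUE in every element and return the number of elements.
   The descriptor fields are copied to locals only after the
   reallocation, mirroring the "after this change" ordering.  */
ptrdiff_t
assign_all (struct descriptor *desc, ptrdiff_t n, int value)
{
  realloc_lhs (desc, n);

  ptrdiff_t offset = desc->offset;
  ptrdiff_t lbound = desc->lbound;
  ptrdiff_t ubound = desc->ubound;

  for (ptrdiff_t i = lbound; i <= ubound; i++)
    desc->data[i + offset] = value;

  return ubound - lbound + 1;
}
```

Calling assign_all on a descriptor whose data pointer is null allocates it
first and then loops over the freshly set bounds; calling it again with the
same extent reuses the existing storage.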

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_conv_ss_descriptor): Don't evaluate
offset and data to a variable if is_alloc_lhs is set.  Move the
existing evaluation decision condition for data...
(save_descriptor_data): ... here as a new predicate.
(evaluate_bound): Add argument save_value.  Omit the evaluation
of the value to a variable if that argument isn't set.
(gfc_conv_expr_descriptor): Update caller.
(gfc_conv_section_startstride): Update caller.  Set save_value
if is_alloc_lhs is not set.  Omit the evaluation of stride to a
variable if save_value isn't set.
(gfc_set_delta): Omit the evaluation of delta to a variable
if is_alloc_lhs is set.
(gfc_is_reallocatable_lhs): Return false if flag_realloc_lhs
isn't set.
(gfc_alloc_allocatable_for_assignment): Don't update
the variables that may be stored in saved_offset, delta, and
data.  Call instead...
(update_reallocated_descriptor): ... this new procedure.
* trans-expr.cc (gfc_trans_assignment_1): Don't omit setting the
is_alloc_lhs flag if the right hand side is an intrinsic
function.  Clear the flag if the right hand side is scalar.

Diff:
---
 gcc/fortran/trans-array.cc | 137 -
 gcc/fortran/trans-expr.cc  |  14 ++---
 2 files changed, 104 insertions(+), 47 deletions(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 7be2d7b11a62..7b83d3fab8d7 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -3420,6 +3420,23 @@ gfc_add_loop_ss_code (gfc_loopinfo * loop, gfc_ss * ss, bool subscript,
 }
 
 
+/* Given an array descriptor expression DESCR and its data point

[gcc(refs/users/mikael/heads/stabilisation_descriptor_v02)] fortran: Amend descriptor bounds init if unallocated

2025-07-10 Thread Mikael Morin via Gcc-cvs
https://gcc.gnu.org/g:0312dae9024cd86b355bbaf790c97759b287c8a0

commit 0312dae9024cd86b355bbaf790c97759b287c8a0
Author: Mikael Morin 
Date:   Wed Jul 9 09:40:32 2025 +0200

fortran: Amend descriptor bounds init if unallocated

Always generate the conditional initialization of unallocated variables,
regardless of the basic variable allocation tracking done in the
frontend, and with an additional always-false condition.

The scalarizer used to always evaluate array bounds, including in the
case of unallocated arrays on the left hand side of an assignment.  This
was (correctly) causing uninitialized warnings, even if the
uninitialized values were in the end discarded.

Since the fix for PR fortran/108889, an initialization of the descriptor
bounds is added to silence the uninitialized warnings, conditional on the
array being unallocated.  This initialization is not useful in the
execution of the program, and it is removed if the compiler can prove
that the variable is unallocated (in the case of a local variable for
example).  Unfortunately, the compiler is not always able to prove it
and the useless initialization may remain in the final code.
Moreover, the generated code that was causing the evaluation of
uninitialized variables has been changed to avoid them, so we can try to
remove or revisit that unallocated variable bounds initialization tweak.

Unfortunately, just removing the extra initialization restores the
warnings at -O0, as there is no dead code removal at that optimization
level.  Instead, this change keeps the initialization and modifies its
guarding condition with an extra always false variable, so that if
optimizations are enabled the whole initialization block is removed, and
if they are disabled it remains and is sufficient to prevent the
warning.
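
The guarded initialization can be pictured with the following C analogue.
This is only a sketch under assumptions: the descriptor struct and the
names init_bounds_if_unallocated and always_false are hypothetical, and
the real code is built by the frontend as trees, not written as C source.
The point is that the extra condition is always false, so the stores never
execute, and an optimizing compiler is free to delete the whole block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified rank-1 array descriptor (an assumption made
   for this sketch).  */
struct descriptor
{
  int *data;
  ptrdiff_t offset;
  ptrdiff_t lbound;
  ptrdiff_t ubound;
};

/* Conditionally initialize the bounds of an unallocated array.  The
   additional always-false variable means the block is never executed;
   it only exists so that the initialization is present in the code.  */
void
init_bounds_if_unallocated (struct descriptor *desc)
{
  assert (desc != NULL);

  int always_false = 0;

  if (desc->data == NULL && always_false)
    {
      desc->offset = 0;
      desc->lbound = 1;
      desc->ubound = 0;
    }
}
```

Whatever values the descriptor fields hold before the call, they are
unchanged afterwards, which is why the block is safe to remove when
optimizing.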

The new variable requires the code generation to be done earlier in the
function so that the variable declaration and usage are in the same
scope.

As the modified condition guarantees the removal of the block with
optimizations, we can emit it more broadly and remove the basic
allocation tracking that was done in the frontend to limit its emission.

gcc/fortran/ChangeLog:

* gfortran.h (gfc_symbol): Remove field allocated_in_scope.
* trans-array.cc (gfc_array_allocate): Don't set it.
(gfc_alloc_allocatable_for_assignment): Likewise.
Generate the unallocated descriptor bounds initialisation
before the opening of the reallocation code block.  Create a
variable and use it as additional condition to the unallocated
descriptor bounds initialisation.

Diff:
---
 gcc/fortran/gfortran.h |  4 --
 gcc/fortran/trans-array.cc | 91 --
 2 files changed, 48 insertions(+), 47 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 6848bd1762d3..69367e638c5b 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -2028,10 +2028,6 @@ typedef struct gfc_symbol
   /* Set if this should be passed by value, but is not a VALUE argument
  according to the Fortran standard.  */
   unsigned pass_as_value:1;
-  /* Set if an allocatable array variable has been allocated in the current
- scope. Used in the suppression of uninitialized warnings in reallocation
- on assignment.  */
-  unsigned allocated_in_scope:1;
   /* Set if an external dummy argument is called with different argument lists.
  This is legal in Fortran, but can cause problems with autogenerated
  C prototypes for C23.  */
diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 7b83d3fab8d7..52888c1e1f1b 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -6800,8 +6800,6 @@ gfc_array_allocate (gfc_se * se, gfc_expr * expr, tree status, tree errmsg,
   else
   gfc_add_expr_to_block (&se->pre, set_descriptor);
 
-  expr->symtree->n.sym->allocated_in_scope = 1;
-
   return true;
 }
 
@@ -11495,14 +11493,60 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
   && !expr2->value.function.isym)
 expr2->ts.u.cl->backend_decl = rss->info->string_length;
 
-  gfc_start_block (&fblock);
-
   /* Since the lhs is allocatable, this must be a descriptor type.
  Get the data and array size.  */
   desc = linfo->descriptor;
   gcc_assert (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (desc)));
   array1 = gfc_conv_descriptor_data_get (desc);
 
+  /* If the data is null, set the descriptor bounds and offset.  This suppresses
+ the maybe used uninitialized warning.  Note that the always false variable
+ prevents this block from ever being executed.  The whole block should
+ be removed by optimizations.  Component references are not subject to the
+ warnings, so we don't uselessly complicate the generated code 

[gcc(refs/users/mikael/heads/stabilisation_descriptor_v02)] fortran: Factor array descriptor references

2025-07-10 Thread Mikael Morin via Gcc-cvs
https://gcc.gnu.org/g:8c7924c0e3ad450e98ae2081dce8fa2a9916479d

commit 8c7924c0e3ad450e98ae2081dce8fa2a9916479d
Author: Mikael Morin 
Date:   Wed Jul 9 21:18:18 2025 +0200

fortran: Factor array descriptor references

Save parts of array descriptor references to a variable so that all the
expressions using the descriptor as base object benefit from the
simplified reference.
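
To make the idea concrete, here is a small C analogue of factoring a
descriptor reference.  It is a sketch under assumptions, not the
frontend's actual transformation: the struct layouts and the names
extent_unfactored and extent_factored are hypothetical.  The second
function saves the pointer found late in the reference chain to a
variable once, and every descriptor field access then goes through that
variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for this sketch: an array descriptor stored as a
   component of an object that is reached through a pointer.  */
struct descriptor
{
  int *data;
  ptrdiff_t offset;
  ptrdiff_t lbound;
  ptrdiff_t ubound;
};

struct inner
{
  struct descriptor array;
};

struct outer
{
  struct inner *comp;
};

/* Every field access repeats the whole reference chain.  */
ptrdiff_t
extent_unfactored (struct outer *objs, size_t idx)
{
  return objs[idx].comp->array.ubound - objs[idx].comp->array.lbound + 1;
}

/* The leading part of the chain is evaluated once to a pointer
   variable; only a pointer is saved, never a copy of the descriptor.  */
ptrdiff_t
extent_factored (struct outer *objs, size_t idx)
{
  struct inner *base = objs[idx].comp;
  assert (base != NULL);

  return base->array.ubound - base->array.lbound + 1;
}
```

Both functions compute the same extent; the factored one simply shares the
evaluation of objs[idx].comp between the two field accesses.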

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_conv_ss_descriptor): Move the descriptor
reference initialisation...
(set_factored_descriptor_value): ... to this new function.  Walk
the reference passed as arguments and try to simplify some of it
to a variable.

Diff:
---
 gcc/fortran/trans-array.cc | 79 +-
 1 file changed, 78 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 52888c1e1f1b..51ec1c78a28c 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -3437,6 +3437,83 @@ save_descriptor_data (tree descr, tree data)
 }
 
 
+/* Save the descriptor reference VALUE to storage pointed by DESC_PTR.  As there
+   may be a lot of code using subreferences of the descriptor, try to factor
+   them by evaluating the leading part of the data reference to a variable,
+   adding extra code to BLOCK.
+
+   To avoid copying large amounts of data we only save pointers in the reference
+   chain, and as late in the chain as possible.*/
+
+void
+set_factored_descriptor_value (stmtblock_t *block, tree *desc_ptr, tree value)
+{
+  /* As the reference is processed from last to first, statements will be
+ generated in reversed order, so can't be put directly in BLOCK.  We use
+ TMP_BLOCK instead.  */
+  stmtblock_t tmp_block;
+  tree accumulated_code = NULL_TREE;
+
+  gfc_init_block (&tmp_block);
+
+  tree *ptr_ref = nullptr;
+
+  tree data_ref = value;
+  bool seen_component = false;
+  while (true)
+{
+  if (TREE_CODE (data_ref) == INDIRECT_REF)
+   {
+ /* If there is no component reference after the pointer dereference in
+the reference chain, the pointer can't be saved to a variable as 
+it may be a pointer or allocatable, and we have to keep the parent
+reference to be able to update the pointer value.  Otherwise the
+pointer can be saved to a variable.  */
+ if (seen_component)
+   {
+ /* Don't evaluate the pointer to a variable yet; do it only if the
+variable would be significantly more simple than the reference
+it replaces.  That is if the reference contains anything
+different from a NOP, a COMPONENT or a DECL.  */
+ ptr_ref = &TREE_OPERAND (data_ref, 0);
+   }
+
+ data_ref = TREE_OPERAND (data_ref, 0);
+   }
+  else if (TREE_CODE (data_ref) == COMPONENT_REF)
+   {
+ seen_component = true;
+ data_ref = TREE_OPERAND (data_ref, 0);
+   }
+  else if (TREE_CODE (data_ref) == NOP_EXPR)
+   data_ref = TREE_OPERAND (data_ref, 0);
+  else
+   {
+ if (DECL_P (data_ref))
+   break;
+
+ if (ptr_ref != nullptr)
+   {
+ /* We have seen a pointer before, and its reference appears to be
+worth saving.  Do it now.  */
+ tree ptr = *ptr_ref;
+ *ptr_ref = gfc_evaluate_now (ptr, &tmp_block);
+ gfc_add_expr_to_block (&tmp_block, accumulated_code);
+ accumulated_code = gfc_finish_block (&tmp_block);
+   }
+
+ if (TREE_CODE (data_ref) == ARRAY_REF)
+   data_ref = TREE_OPERAND (data_ref, 0);
+ else
+   break;
+   }
+}
+
+  *desc_ptr = value;
+  gfc_add_expr_to_block (block, accumulated_code);
+}
+
+
 /* Translate expressions for the descriptor and data pointer of a SS.  */
 /*GCC ARRAYS*/
 
@@ -3457,7 +3534,7 @@ gfc_conv_ss_descriptor (stmtblock_t * block, gfc_ss * ss, int base)
   se.descriptor_only = 1;
   gfc_conv_expr_lhs (&se, ss_info->expr);
   gfc_add_block_to_block (block, &se.pre);
-  info->descriptor = se.expr;
+  set_factored_descriptor_value (block, &info->descriptor, se.expr);
   ss_info->string_length = se.string_length;
   ss_info->class_container = se.class_container;


[gcc] Created branch 'mikael/heads/stabilisation_descriptor_v02' in namespace 'refs/users'

2025-07-10 Thread Mikael Morin via Gcc-cvs
The branch 'mikael/heads/stabilisation_descriptor_v02' was created in namespace 'refs/users' pointing to:

 8c7924c0e3ad... fortran: Factor array descriptor references


[gcc r16-2173] [PATCH] libgcc: PR target/116363 Fix SFtype to UDWtype conversion

2025-07-10 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:e6f2daff77ee1f709105cb9f8e3e92f04c179431

commit r16-2173-ge6f2daff77ee1f709105cb9f8e3e92f04c179431
Author: Jan Dubiec 
Date:   Thu Jul 10 07:41:08 2025 -0600

[PATCH] libgcc: PR target/116363 Fix SFtype to UDWtype conversion

This patch fixes SFtype to UDWtype (aka float to unsigned long long)
conversion on targets without DFmode, e.g. H8/300H.  It relies solely
on SFtype->UWtype and UWtype->UDWtype conversions/casts.  The existing
code in line 2218 (counter = a) assigns/casts a float which is *always*
greater than or equal to Wtype_MAXp1_F to a UWtype int, which of course
does not have enough capacity.
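
The word-splitting approach can be sketched in portable C as follows.
This is an illustration under assumptions, not the libgcc code: it fixes
the word size at 32 bits, uses uint32_t and uint64_t in place of UWtype
and UDWtype, and the names WORD_MAXP1_F and fixuns_float_to_u64 are
hypothetical.  It also differs from the patched function in one respect:
the high word is truncated to an integer before the low word is derived
from it.

```c
#include <stdint.h>

/* 2^32 as a float; stands in for Wtype_MAXp1_F on a target with 32-bit
   words (an assumption of this sketch).  */
#define WORD_MAXP1_F 4294967296.0f

/* Convert A to a 64-bit unsigned integer using only float -> 32-bit
   and 32-bit -> 64-bit conversions.  */
uint64_t
fixuns_float_to_u64 (float a)
{
  if (a < 1)
    return 0;
  if (a < WORD_MAXP1_F)
    return (uint32_t) a;
  if (a < WORD_MAXP1_F * WORD_MAXP1_F)
    {
      /* High word: the integer part of a / 2^32.  */
      uint32_t hi = (uint32_t) (a / WORD_MAXP1_F);
      /* Low word: what is left once the high word is subtracted.  Both
	 the product and the difference are exact in float here.  */
      uint32_t lo = (uint32_t) (a - (float) hi * WORD_MAXP1_F);

      return ((uint64_t) hi << 32) | (uint64_t) lo;
    }
  /* Out of range: saturate.  */
  return UINT64_MAX;
}
```

For example, an input of 2^32 + 512 (exactly representable as a float)
splits into a high word of 1 and a low word of 512.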

PR target/116363

libgcc/ChangeLog:

* libgcc2.c (__fixunssfDI): Fix SFtype to UDWtype conversion for targets
without LIBGCC2_HAS_DF_MODE defined.

Diff:
---
 libgcc/libgcc2.c | 41 +
 1 file changed, 13 insertions(+), 28 deletions(-)

diff --git a/libgcc/libgcc2.c b/libgcc/libgcc2.c
index faefff3730ca..df99c78eb204 100644
--- a/libgcc/libgcc2.c
+++ b/libgcc/libgcc2.c
@@ -2187,36 +2187,21 @@ __fixunssfDI (SFtype a)
   if (a < 1)
 return 0;
   if (a < Wtype_MAXp1_F)
-return (UWtype)a;
+return (UWtype) a;
   if (a < Wtype_MAXp1_F * Wtype_MAXp1_F)
 {
-  /* Since we know that there are fewer significant bits in the SFmode
-quantity than in a word, we know that we can convert out all the
-significant bits in one step, and thus avoid losing bits.  */
-
-  /* ??? This following loop essentially performs frexpf.  If we could
-use the real libm function, or poke at the actual bits of the fp
-format, it would be significantly faster.  */
-
-  UWtype shift = 0, counter;
-  SFtype msb;
-
-  a /= Wtype_MAXp1_F;
-  for (counter = W_TYPE_SIZE / 2; counter != 0; counter >>= 1)
-   {
- SFtype counterf = (UWtype)1 << counter;
- if (a >= counterf)
-   {
- shift |= counter;
- a /= counterf;
-   }
-   }
-
-  /* Rescale into the range of one word, extract the bits of that
-one word, and shift the result into position.  */
-  a *= Wtype_MAXp1_F;
-  counter = a;
-  return (DWtype)counter << shift;
+  /* We assume that SFtype -> UWtype and UWtype -> UDWtype casts work
+ properly. Obviously, we *cannot* assume that SFtype -> UDWtype
+ works as expected.  */
+  SFtype a_hi, a_lo;
+
+  a_hi = a / Wtype_MAXp1_F;
+  a_lo = a - a_hi * Wtype_MAXp1_F;
+
+  /* A lot of parentheses. This is to make it very clear what is
+ the sequence of operations.  */
+  return ((UDWtype) ((UWtype) a_hi)) << W_TYPE_SIZE
+| (UDWtype) ((UWtype) a_lo);
 }
   return -1;
 #else


[gcc r16-2174] RISC-V: Make zero-stride load broadcast a tunable.

2025-07-10 Thread Robin Dapp via Gcc-cvs
https://gcc.gnu.org/g:dcba959fb30dc250eeb6fdd05aa878e5f1fc8c2d

commit r16-2174-gdcba959fb30dc250eeb6fdd05aa878e5f1fc8c2d
Author: Robin Dapp 
Date:   Thu Jul 10 09:41:48 2025 +0200

RISC-V: Make zero-stride load broadcast a tunable.

This patch makes the zero-stride load broadcast idiom dependent on a
uarch-tunable "use_zero_stride_load".  Right now we have quite a few
paths that reach a strided load and some of them are not exactly
straightforward.

While broadcast is relatively rare on rv64 targets it is more common on
rv32 targets that want to vectorize 64-bit elements.

While the patch is more involved than I would have liked, it could have
touched even more places.  The whole broadcast-like insn path feels a
bit hackish due to the several optimizations we employ.  Some of the
complications stem from the fact that we lump together real broadcasts,
vector single-element sets, and strided broadcasts.  The strided-load
alternatives currently require a memory_constraint to work properly,
which causes more complications when trying to disable just these.

In short, the whole pred_broadcast handling in combination with the
sew64_scalar_helper could use work in the future.  I was about to start
with it in this patch but soon realized that it would only distract from
the original intent.  What could help in the future is to split strided
and non-strided broadcasts entirely, as well as the single-element sets.

It is still unclear whether we need to pay special attention to
misaligned strided loads (PR120782).

I regtested on rv32 and rv64 with strided_load_broadcast_p forced to
true and false.  With either I didn't observe any new execution failures
but obviously there are new scan failures with strided broadcast turned
off.

PR target/118734

gcc/ChangeLog:

* config/riscv/constraints.md (Wdm): Use tunable for Wdm
constraint.
* config/riscv/riscv-protos.h (emit_avltype_insn): Declare.
(can_be_broadcasted_p): Rename to...
(can_be_broadcast_p): ...this.
* config/riscv/predicates.md: Use renamed function.
(strided_load_broadcast_p): Declare.
* config/riscv/riscv-selftests.cc (run_broadcast_selftests):
Only run broadcast selftest if strided broadcasts are OK.
* config/riscv/riscv-v.cc (emit_avltype_insn): New function.
(sew64_scalar_helper): Only emit a pred_broadcast if the new
tunable says so.
(can_be_broadcasted_p): Rename to...
(can_be_broadcast_p): ...this and use new tunable.
* config/riscv/riscv.cc (struct riscv_tune_param): Add strided
broad tunable.
(strided_load_broadcast_p): Implement.
* config/riscv/vector.md: Use strided_load_broadcast_p () and
work around 64-bit broadcast on rv32 targets.

Diff:
---
 gcc/config/riscv/constraints.md |  7 ++--
 gcc/config/riscv/predicates.md  |  2 +-
 gcc/config/riscv/riscv-protos.h |  4 ++-
 gcc/config/riscv/riscv-selftests.cc | 10 --
 gcc/config/riscv/riscv-v.cc | 58 +++-
 gcc/config/riscv/riscv.cc   | 20 +++
 gcc/config/riscv/vector.md  | 66 +++--
 7 files changed, 133 insertions(+), 34 deletions(-)

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index ccab1a2e29df..5ecaa19eb014 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -237,10 +237,11 @@
  (and (match_code "const_vector")
   (match_test "rtx_equal_p (op, riscv_vector::gen_scalar_move_mask (GET_MODE (op)))")))
 
-(define_memory_constraint "Wdm"
+(define_constraint "Wdm"
   "Vector duplicate memory operand"
-  (and (match_code "mem")
-   (match_code "reg" "0")))
+  (and (match_test "strided_load_broadcast_p ()")
+   (and (match_code "mem")
+   (match_code "reg" "0"))))
 
 ;; Vendor ISA extension constraints.
 
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 8baad2fae7a9..1f9a6b562e53 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -617,7 +617,7 @@
 
 ;; The scalar operand can be directly broadcast by RVV instructions.
 (define_predicate "direct_broadcast_operand"
-  (match_test "riscv_vector::can_be_broadcasted_p (op)"))
+  (match_test "riscv_vector::can_be_broadcast_p (op)"))
 
 ;; A CONST_INT operand that has exactly two bits cleared.
 (define_predicate "const_nottwobits_operand"
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 38f63ea84248..a41c4c299fac 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -604,6 +604,7 @@ void emit_vlmax_vsetvl (machine_mode, rtx);
 void emit_hard_vlmax_vsetvl (machine_mode, rtx);

[gcc r16-2175] expand: ICE if asked to expand RDIV with non-float type.

2025-07-10 Thread Robin Dapp via Gcc-cvs
https://gcc.gnu.org/g:5aa21765236730c1772c19454cbb71365b84d583

commit r16-2175-g5aa21765236730c1772c19454cbb71365b84d583
Author: Robin Dapp 
Date:   Wed Jul 9 15:58:05 2025 +0200

expand: ICE if asked to expand RDIV with non-float type.

This patch adds asserts that ensure we only expand an RDIV_EXPR with
an actual float mode.  It also replaces the RDIV_EXPR used when setting
a vectorized loop's length with EXACT_DIV_EXPR.  The code in question is
only used with length-control targets (riscv, powerpc, s390).

PR target/121014

gcc/ChangeLog:

* cfgexpand.cc (expand_debug_expr): Assert FLOAT_MODE_P.
* optabs-tree.cc (optab_for_tree_code): Assert FLOAT_TYPE_P.
* tree-vect-loop.cc (vect_get_loop_len): Use EXACT_DIV_EXPR.

Diff:
---
 gcc/cfgexpand.cc  | 2 ++
 gcc/optabs-tree.cc| 2 ++
 gcc/tree-vect-loop.cc | 2 +-
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 33649d43f71c..a656ccebf176 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -5358,6 +5358,8 @@ expand_debug_expr (tree exp)
   return simplify_gen_binary (MULT, mode, op0, op1);
 
 case RDIV_EXPR:
+  gcc_assert (FLOAT_MODE_P (mode));
+  /* Fall through.  */
 case TRUNC_DIV_EXPR:
 case EXACT_DIV_EXPR:
   if (unsignedp)
diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
index 6dfe8ee4c4e4..9308a6dfd65c 100644
--- a/gcc/optabs-tree.cc
+++ b/gcc/optabs-tree.cc
@@ -82,6 +82,8 @@ optab_for_tree_code (enum tree_code code, const_tree type,
return unknown_optab;
   /* FALLTHRU */
 case RDIV_EXPR:
+  gcc_assert (FLOAT_TYPE_P (type));
+  /* FALLTHRU */
 case TRUNC_DIV_EXPR:
 case EXACT_DIV_EXPR:
   if (TYPE_SATURATING (type))
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 8ea0f45d79fc..56f80db57bbc 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -11079,7 +11079,7 @@ vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
  factor = exact_div (nunits1, nunits2).to_constant ();
  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
  gimple_seq seq = NULL;
- loop_len = gimple_build (&seq, RDIV_EXPR, iv_type, loop_len,
+ loop_len = gimple_build (&seq, EXACT_DIV_EXPR, iv_type, loop_len,
   build_int_cst (iv_type, factor));
  if (seq)
gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);


[gcc] Deleted branch 'mikael/heads/stabilisation_descriptor_v01' in namespace 'refs/users'

2025-07-10 Thread Mikael Morin via Gcc-cvs
The branch 'mikael/heads/stabilisation_descriptor_v01' in namespace 'refs/users' was deleted.
It previously pointed to:

 a8bc113ef2e4... Revert "Ajout directive warning"

Diff:

!!! WARNING: THE FOLLOWING COMMITS ARE NO LONGER ACCESSIBLE (LOST):
---

  a8bc113... Revert "Ajout directive warning"
  3ade607... Ajout directive warning
  662b288... Revert "Revert ajout code mort"
  13f5c49... Déplacement variables après réallocation


[gcc(refs/users/mikael/heads/stabilisation_descriptor_v01)] Correction array_constructor_1

2025-07-10 Thread Mikael Morin via Gcc-cvs
https://gcc.gnu.org/g:41b730b8a79522e8e5a6115f01a02968a571e85b

commit 41b730b8a79522e8e5a6115f01a02968a571e85b
Author: Mikael Morin 
Date:   Sat Jul 5 15:05:20 2025 +0200

Correction array_constructor_1

Diff:
---
 gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90 | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90 b/gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90
index 45eafacd5a67..a0c55076a9ae 100644
--- a/gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90
+++ b/gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90
@@ -9,6 +9,8 @@ program grow_type_array
 
 type(container), allocatable :: list(:)
 
+allocate(list(0))
+
 list = [list, new_elem(5)]
 
 deallocate(list)


[gcc(refs/users/mikael/heads/stabilisation_descriptor_v01)] fortran: Delay evaluation of array bounds after reallocation

2025-07-10 Thread Mikael Morin via Gcc-cvs
https://gcc.gnu.org/g:f6115ed47ddc5e89ea058afc98f43bae5d49cbf0

commit f6115ed47ddc5e89ea058afc98f43bae5d49cbf0
Author: Mikael Morin 
Date:   Mon Jul 7 11:46:08 2025 +0200

fortran: Delay evaluation of array bounds after reallocation

Delay the evaluation of bounds, offset, etc. until after the reallocation,
for the scalarization of allocatable arrays on the left hand side of
assignments.

Before this change, the code preceding the scalarization loop is like:

  D.4757 = ref2.offset;
  D.4759 = ref2.dim[0].ubound;
  D.4762 = ref2.dim[0].lbound;
  {
if (ref2.data == 0B) goto realloc;
if (ref2.dim[0].lbound + 4 != ref2.dim[0].ubound) goto realloc;
goto L.10;
realloc:
... change offset and bounds ...
D.4757 = ref2.offset;
D.4762 = NON_LVALUE_EXPR ;
... reallocation ...
L.10:;
  }
  while (1)
{
  ... scalarized code ...

so the bounds etc. are first evaluated to variables, and the reallocation
code takes care to update those variables during the reallocation.  This
is problematic because the variables' initialization references the
array bounds, which for unallocated arrays are uninitialized at the
evaluation point.  This used to (correctly) cause uninitialized warnings
(see PR fortran/108889), and a workaround was found for variables: it
initializes the bounds of array variables to some value beforehand if
they are unallocated.  For allocatable components, there is no warning
but the problem remains: some uninitialized values are used, even if
they are discarded later.

After this change the code becomes:

{
  if (ref2.data == 0B) goto realloc;
  if (ref2.dim[0].lbound + 4 != ref2.dim[0].ubound) goto realloc;
  goto L.10;
  realloc:;
  ... change offset and bounds ...
  ... reallocation ...
  L.10:;
}
D.4762 = ref2.offset;
D.4763 = ref2.dim[0].lbound;
D.4764 = ref2.dim[0].ubound;
while (1)
  {
... scalarized code

so the scalarizer avoids storing the values to variables at the time it
evaluates them, if the array is reallocatable on assignment.  Instead,
it keeps expressions with references to the array descriptor fields,
expressions that remain valid through reallocation.  After the
reallocation code has been generated, the expressions stored by the
scalarizer are evaluated in place to variables.

The decision to delay evaluation is based on the existing field
is_alloc_lhs, which requires a few tweaks to always be correct with
respect to what its name suggests.  Namely, it should be set even if the
assignment right hand side is an intrinsic function, and it should not
be set if the right hand side is a scalar, nor if the -fno-realloc-lhs
flag is passed to the compiler.

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_conv_ss_descriptor): Don't evaluate
offset and data to a variable if is_alloc_lhs is set.  Move the
existing evaluation decision condition for data...
(save_descriptor_data): ... here as a new predicate.
(evaluate_bound): Add argument save_value.  Omit the evaluation
of the value to a variable if that argument isn't set.
(gfc_conv_expr_descriptor): Update caller.
(gfc_conv_section_startstride): Update caller.  Set save_value
if is_alloc_lhs is not set.  Omit the evaluation of stride to a
variable if save_value isn't set.
(gfc_set_delta): Omit the evaluation of delta to a variable
if is_alloc_lhs is set.
(gfc_is_reallocatable_lhs): Return false if flag_realloc_lhs
isn't set.
(gfc_alloc_allocatable_for_assignment): Don't update
the variables that may be stored in saved_offset, delta, and
data.  Call instead...
(update_reallocated_descriptor): ... this new procedure.
* trans-expr.cc (gfc_trans_assignment_1): Don't omit setting the
is_alloc_lhs flag if the right hand side is an intrinsic
function.  Clear the flag if the right hand side is scalar.

Diff:
---
 gcc/fortran/trans-array.cc | 137 -
 gcc/fortran/trans-expr.cc  |  14 ++---
 2 files changed, 104 insertions(+), 47 deletions(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 7be2d7b11a62..7b83d3fab8d7 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -3420,6 +3420,23 @@ gfc_add_loop_ss_code (gfc_loopinfo * loop, gfc_ss * ss, bool subscript,
 }
 
 
+/* Given an array descriptor expression DESCR and its data poin

[gcc(refs/users/mikael/heads/stabilisation_descriptor_v01)] fortran: generate array reallocation out of loops

2025-07-10 Thread Mikael Morin via Gcc-cvs
https://gcc.gnu.org/g:a1e8410d02b3c9cd658dab13fa5422a3b99c2230

commit a1e8410d02b3c9cd658dab13fa5422a3b99c2230
Author: Mikael Morin 
Date:   Sun Jul 6 16:56:16 2025 +0200

fortran: generate array reallocation out of loops

Generate the array reallocation on assignment code before entering the
scalarization loops.  This doesn't move the generated code itself,
which was already put before the outermost loop, but only changes the
current scope at the time the code is generated.  This is a prerequisite
for a followup patch that makes the reallocation code create new
variables.  Without this change the new variables would be declared in
the innermost loop body and couldn't be used outside of it.

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_trans_assignment_1): Generate array
reallocation code before entering the scalarisation loops.

Diff:
---
 gcc/fortran/trans-expr.cc | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 3e0d763d2fb0..760c8c4e72bd 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -12943,6 +12943,7 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
   rhs_caf_attr = gfc_caf_attr (expr2, false, &rhs_refs_comp);
 }
 
+  tree reallocation = NULL_TREE;
   if (lss != gfc_ss_terminator)
 {
   /* The assignment needs scalarization.  */
@@ -13011,6 +13012,15 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
  ompws_flags |= OMPWS_SCALARIZER_WS | OMPWS_SCALARIZER_BODY;
}
 
+  /* F2003: Allocate or reallocate lhs of allocatable array.  */
+  if (realloc_flag)
+   {
+ realloc_lhs_warning (expr1->ts.type, true, &expr1->where);
+ ompws_flags &= ~OMPWS_SCALARIZER_WS;
+ reallocation = gfc_alloc_allocatable_for_assignment (&loop, expr1,
+  expr2);
+   }
+
   /* Start the scalarized loop body.  */
   gfc_start_scalarized_body (&loop, &body);
 }
@@ -13319,15 +13329,8 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
  gfc_add_expr_to_block (&body, tmp);
}
 
-  /* F2003: Allocate or reallocate lhs of allocatable array.  */
-  if (realloc_flag)
-   {
- realloc_lhs_warning (expr1->ts.type, true, &expr1->where);
- ompws_flags &= ~OMPWS_SCALARIZER_WS;
- tmp = gfc_alloc_allocatable_for_assignment (&loop, expr1, expr2);
- if (tmp != NULL_TREE)
-   gfc_add_expr_to_block (&loop.code[expr1->rank - 1], tmp);
-   }
+  if (reallocation != NULL_TREE)
+   gfc_add_expr_to_block (&loop.code[loop.dimen - 1], reallocation);
 
   if (maybe_workshare)
ompws_flags &= ~OMPWS_SCALARIZER_BODY;


[gcc] Created branch 'mikael/heads/stabilisation_descriptor_v01' in namespace 'refs/users'

2025-07-10 Thread Mikael Morin via Gcc-cvs
The branch 'mikael/heads/stabilisation_descriptor_v01' was created in namespace 'refs/users' pointing to:

 7e72a078ae71... fortran: Amend descriptor bounds init if unallocated


[gcc r16-2160] Remove vect_dissolve_slp_only_groups

2025-07-10 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:e13076208452c001fd831eaaaebe1fd34762dc31

commit r16-2160-ge13076208452c001fd831eaaaebe1fd34762dc31
Author: Richard Biener 
Date:   Wed Jul 9 15:10:26 2025 +0200

Remove vect_dissolve_slp_only_groups

This function dissolves DR groups that are not subject to SLP, which
means it is no longer necessary.

* tree-vect-loop.cc (vect_dissolve_slp_only_groups): Remove.
(vect_analyze_loop_2): Do not call it.

Diff:
---
 gcc/tree-vect-loop.cc | 75 ---
 1 file changed, 75 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d57d34dfad27..2d5ea414559f 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2260,78 +2260,6 @@ vect_get_datarefs_in_loop (loop_p loop, basic_block *bbs,
   return opt_result::success ();
 }
 
-/* Look for SLP-only access groups and turn each individual access into its own
-   group.  */
-static void
-vect_dissolve_slp_only_groups (loop_vec_info loop_vinfo)
-{
-  unsigned int i;
-  struct data_reference *dr;
-
-  DUMP_VECT_SCOPE ("vect_dissolve_slp_only_groups");
-
-  vec datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
-  FOR_EACH_VEC_ELT (datarefs, i, dr)
-{
-  gcc_assert (DR_REF (dr));
-  stmt_vec_info stmt_info
-   = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (DR_STMT (dr)));
-
-  /* Check if the load is a part of an interleaving chain.  */
-  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
-   {
- stmt_vec_info first_element = DR_GROUP_FIRST_ELEMENT (stmt_info);
- dr_vec_info *dr_info = STMT_VINFO_DR_INFO (first_element);
- unsigned int group_size = DR_GROUP_SIZE (first_element);
-
- /* Check if SLP-only groups.  */
- if (!STMT_SLP_TYPE (stmt_info)
- && STMT_VINFO_SLP_VECT_ONLY (first_element))
-   {
- /* Dissolve the group.  */
- STMT_VINFO_SLP_VECT_ONLY (first_element) = false;
-
- stmt_vec_info vinfo = first_element;
- while (vinfo)
-   {
- stmt_vec_info next = DR_GROUP_NEXT_ELEMENT (vinfo);
- DR_GROUP_FIRST_ELEMENT (vinfo) = vinfo;
- DR_GROUP_NEXT_ELEMENT (vinfo) = NULL;
- DR_GROUP_SIZE (vinfo) = 1;
- if (STMT_VINFO_STRIDED_P (first_element)
- /* We cannot handle stores with gaps.  */
- || DR_IS_WRITE (dr_info->dr))
-   {
- STMT_VINFO_STRIDED_P (vinfo) = true;
- DR_GROUP_GAP (vinfo) = 0;
-   }
- else
-   DR_GROUP_GAP (vinfo) = group_size - 1;
- /* Duplicate and adjust alignment info, it needs to
-be present on each group leader, see dr_misalignment.  */
- if (vinfo != first_element)
-   {
- dr_vec_info *dr_info2 = STMT_VINFO_DR_INFO (vinfo);
- dr_info2->target_alignment = dr_info->target_alignment;
- int misalignment = dr_info->misalignment;
- if (misalignment != DR_MISALIGNMENT_UNKNOWN)
-   {
- HOST_WIDE_INT diff
-   = (TREE_INT_CST_LOW (DR_INIT (dr_info2->dr))
-  - TREE_INT_CST_LOW (DR_INIT (dr_info->dr)));
- unsigned HOST_WIDE_INT align_c
-   = dr_info->target_alignment.to_constant ();
- misalignment = (misalignment + diff) % align_c;
-   }
- dr_info2->misalignment = misalignment;
-   }
- vinfo = next;
-   }
-   }
-   }
-}
-}
-
 /* Determine if operating on full vectors for LOOP_VINFO might leave
some scalar iterations still to do.  If so, decide how we should
handle those scalar iterations.  The possibilities are:
@@ -2687,9 +2615,6 @@ start_over:
   goto again;
 }
 
-  /* Dissolve SLP-only groups.  */
-  vect_dissolve_slp_only_groups (loop_vinfo);
-
   /* For now, we don't expect to mix both masking and length approaches for one
  loop, disable it if both are recorded.  */
   if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)


[gcc r16-2158] Remove non-SLP vectorization factor determining

2025-07-10 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:4b47acfe2b626d1276e229a0cf165e934813df6c

commit r16-2158-g4b47acfe2b626d1276e229a0cf165e934813df6c
Author: Richard Biener 
Date:   Wed Jul 9 12:53:45 2025 +0200

Remove non-SLP vectorization factor determining

The following removes the VF determining step from non-SLP stmts.
For now we keep setting STMT_VINFO_VECTYPE for all stmts; there are
too many places to fix, including some more complicated ones, so
this is deferred to a followup.

Along with this, vect_update_vf_for_slp is removed, merging the check for
present hybrid SLP stmts into vect_detect_hybrid_slp and failing analysis
early.  This also removes the need to essentially duplicate this check in
the stmt walk of vect_analyze_loop_operations.  Getting rid of that,
and performing some other checks earlier, is also deferred to a followup.

* tree-vect-loop.cc (vect_determine_vf_for_stmt_1): Rename
to ...
(vect_determine_vectype_for_stmt_1): ... this and only set
STMT_VINFO_VECTYPE.  Fail for single-element vector types.
(vect_determine_vf_for_stmt): Rename to ...
(vect_determine_vectype_for_stmt): ... this and only set
STMT_VINFO_VECTYPE. Fail for single-element vector types.
(vect_determine_vectorization_factor): Rename to ...
(vect_set_stmts_vectype): ... this and only set STMT_VINFO_VECTYPE.
(vect_update_vf_for_slp): Remove.
(vect_analyze_loop_operations): Remove walk over stmts.
(vect_analyze_loop_2): Call vect_set_stmts_vectype instead of
vect_determine_vectorization_factor.  Set vectorization factor
from LOOP_VINFO_SLP_UNROLLING_FACTOR.  Fail if vect_detect_hybrid_slp
detects hybrid stmts or when vect_make_slp_decision finds
nothing to SLP.
* tree-vect-slp.cc (vect_detect_hybrid_slp): Move check
whether we have any hybrid stmts here from vect_update_vf_for_slp.
* tree-vect-stmts.cc (vect_analyze_stmt): Remove loop over
stmts.
* tree-vectorizer.h (vect_detect_hybrid_slp): Update.

Diff:
---
 gcc/tree-vect-loop.cc  | 220 ++---
 gcc/tree-vect-slp.cc   |  48 ++-
 gcc/tree-vect-stmts.cc |  12 ++-
 gcc/tree-vectorizer.h  |   2 +-
 4 files changed, 100 insertions(+), 182 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 42e00159ff82..98ac528e3a97 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -168,9 +168,8 @@ static stmt_vec_info vect_is_simple_reduction (loop_vec_info, stmt_vec_info,
may already be set for general statements (not just data refs).  */
 
 static opt_result
-vect_determine_vf_for_stmt_1 (vec_info *vinfo, stmt_vec_info stmt_info,
- bool vectype_maybe_set_p,
- poly_uint64 *vf)
+vect_determine_vectype_for_stmt_1 (vec_info *vinfo, stmt_vec_info stmt_info,
+  bool vectype_maybe_set_p)
 {
   gimple *stmt = stmt_info->stmt;
 
@@ -192,6 +191,12 @@ vect_determine_vf_for_stmt_1 (vec_info *vinfo, stmt_vec_info stmt_info,
 
   if (stmt_vectype)
 {
+  if (known_le (TYPE_VECTOR_SUBPARTS (stmt_vectype), 1U))
+   return opt_result::failure_at (STMT_VINFO_STMT (stmt_info),
+  "not vectorized: unsupported "
+  "data-type in %G",
+  STMT_VINFO_STMT (stmt_info));
+
   if (STMT_VINFO_VECTYPE (stmt_info))
/* The only case when a vectype had been already set is for stmts
   that contain a data ref, or for "pattern-stmts" (stmts generated
@@ -203,9 +208,6 @@ vect_determine_vf_for_stmt_1 (vec_info *vinfo, stmt_vec_info stmt_info,
STMT_VINFO_VECTYPE (stmt_info) = stmt_vectype;
 }
 
-  if (nunits_vectype)
-vect_update_max_nunits (vf, nunits_vectype);
-
   return opt_result::success ();
 }
 
@@ -215,13 +217,12 @@ vect_determine_vf_for_stmt_1 (vec_info *vinfo, stmt_vec_info stmt_info,
or false if something prevented vectorization.  */
 
 static opt_result
-vect_determine_vf_for_stmt (vec_info *vinfo,
-   stmt_vec_info stmt_info, poly_uint64 *vf)
+vect_determine_vectype_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info)
 {
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location, "==> examining statement: %G",
 stmt_info->stmt);
-  opt_result res = vect_determine_vf_for_stmt_1 (vinfo, stmt_info, false, vf);
+  opt_result res = vect_determine_vectype_for_stmt_1 (vinfo, stmt_info, false);
   if (!res)
 return res;
 
@@ -240,7 +241,7 @@ vect_determine_vf_for_stmt (vec_info *vinfo,
dump_printf_loc (MSG_NOTE, vect_location,
 "==> examining pattern def stmt: %G",
 def_stmt_info->stmt);
-

[gcc r16-2159] Remove vect_analyze_loop_operations

2025-07-10 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:3bf2aa834e1270e3167c9559bef9a8ef1f668604

commit r16-2159-g3bf2aa834e1270e3167c9559bef9a8ef1f668604
Author: Richard Biener 
Date:   Wed Jul 9 15:04:12 2025 +0200

Remove vect_analyze_loop_operations

This removes the remains of vect_analyze_loop_operations.  All the
checks it still does on LC PHIs of inner loops in outer loop
vectorization should be handled by vectorizable_lc_phi.

* tree-vect-loop.cc (vect_active_double_reduction_p): Remove.
(vect_analyze_loop_operations): Remove.
(vect_analyze_loop_2): Do not call it.

Diff:
---
 gcc/tree-vect-loop.cc | 137 --
 1 file changed, 137 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 98ac528e3a97..d57d34dfad27 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1960,133 +1960,6 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
 
 
 
-/* Return true if STMT_INFO describes a double reduction phi and if
-   the other phi in the reduction is also relevant for vectorization.
-   This rejects cases such as:
-
-  outer1:
-   x_1 = PHI ;
-   ...
-
-  inner:
-   x_2 = ...;
-   ...
-
-  outer2:
-   x_3 = PHI ;
-
-   if nothing in x_2 or elsewhere makes x_1 relevant.  */
-
-static bool
-vect_active_double_reduction_p (stmt_vec_info stmt_info)
-{
-  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_double_reduction_def)
-return false;
-
-  return STMT_VINFO_RELEVANT_P (STMT_VINFO_REDUC_DEF (stmt_info));
-}
-
-/* Function vect_analyze_loop_operations.
-
-   Scan the loop stmts and make sure they are all vectorizable.  */
-
-static opt_result
-vect_analyze_loop_operations (loop_vec_info loop_vinfo)
-{
-  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
-  int nbbs = loop->num_nodes;
-  int i;
-  stmt_vec_info stmt_info;
-
-  DUMP_VECT_SCOPE ("vect_analyze_loop_operations");
-
-  for (i = 0; i < nbbs; i++)
-{
-  basic_block bb = bbs[i];
-
-  for (gphi_iterator si = gsi_start_phis (bb); !gsi_end_p (si);
-  gsi_next (&si))
-{
-  gphi *phi = si.phi ();
-
- stmt_info = loop_vinfo->lookup_stmt (phi);
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_NOTE, vect_location, "examining phi: %G",
-(gimple *) phi);
- if (virtual_operand_p (gimple_phi_result (phi)))
-   continue;
-
- /* ???  All of the below unconditional FAILs should be in
-done earlier after analyzing cycles, possibly when
-determining stmt relevancy?  */
-
-  /* Inner-loop loop-closed exit phi in outer-loop vectorization
- (i.e., a phi in the tail of the outer-loop).  */
-  if (! is_loop_header_bb_p (bb))
-{
-  /* FORNOW: we currently don't support the case that these phis
- are not used in the outerloop (unless it is double reduction,
- i.e., this phi is vect_reduction_def), cause this case
- requires to actually do something here.  */
-  if (STMT_VINFO_LIVE_P (stmt_info)
- && !vect_active_double_reduction_p (stmt_info))
-   return opt_result::failure_at (phi,
-  "Unsupported loop-closed phi"
-  " in outer-loop.\n");
-
-  /* If PHI is used in the outer loop, we check that its operand
- is defined in the inner loop.  */
-  if (STMT_VINFO_RELEVANT_P (stmt_info))
-{
-  tree phi_op;
-
-  if (gimple_phi_num_args (phi) != 1)
-return opt_result::failure_at (phi, "unsupported phi");
-
-  phi_op = PHI_ARG_DEF (phi, 0);
- stmt_vec_info op_def_info = loop_vinfo->lookup_def (phi_op);
- if (!op_def_info)
-   return opt_result::failure_at (phi, "unsupported phi\n");
-
- if (STMT_VINFO_RELEVANT (op_def_info) != vect_used_in_outer
- && (STMT_VINFO_RELEVANT (op_def_info)
- != vect_used_in_outer_by_reduction))
-   return opt_result::failure_at (phi, "unsupported phi\n");
-
- if ((STMT_VINFO_DEF_TYPE (stmt_info) == vect_internal_def
-  || (STMT_VINFO_DEF_TYPE (stmt_info)
-  == vect_double_reduction_def))
- && ! PURE_SLP_STMT (stmt_info))
-   return opt_result::failure_at (phi, "unsupported phi\n");
-}
-
-  continue;
-}
-
-  gcc_assert (stmt_info);
-
-  if ((STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_scope
-   || STMT_VINFO_LIVE_P (stmt_info))
- && STMT_VINFO_DEF_TYPE (stmt_info) != ve

[gcc r16-2171] testsuite: Add -funwind-tables to sve*/pfalse* tests

2025-07-10 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:2ff8da46152cbade579700823cc7b1460ddd91b8

commit r16-2171-g2ff8da46152cbade579700823cc7b1460ddd91b8
Author: Richard Sandiford 
Date:   Thu Jul 10 14:23:57 2025 +0100

testsuite: Add -funwind-tables to sve*/pfalse* tests

The SVE svpfalse folding tests use CFI directives to delimit the
function bodies.  That requires -funwind-tables to be enabled,
which is true by default for *-linux-gnu targets, but not for *-elf.

gcc/testsuite/
* gcc.target/aarch64/sve/pfalse-binary.c: Add -funwind-tables.
* gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c: Likewise.
* gcc.target/aarch64/sve/pfalse-binary_opt_n.c: Likewise.
* gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c: Likewise.
* gcc.target/aarch64/sve/pfalse-binary_rotate.c: Likewise.
* gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c: Likewise.
* gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c: Likewise.
* gcc.target/aarch64/sve/pfalse-binaryxn.c: Likewise.
* gcc.target/aarch64/sve/pfalse-clast.c: Likewise.
* gcc.target/aarch64/sve/pfalse-compare_opt_n.c: Likewise.
* gcc.target/aarch64/sve/pfalse-compare_wide_opt_n.c: Likewise.
* gcc.target/aarch64/sve/pfalse-count_pred.c: Likewise.
* gcc.target/aarch64/sve/pfalse-fold_left.c: Likewise.
* gcc.target/aarch64/sve/pfalse-load.c: Likewise.
* gcc.target/aarch64/sve/pfalse-load_ext.c: Likewise.
* gcc.target/aarch64/sve/pfalse-load_ext_gather_index.c: Likewise.
* gcc.target/aarch64/sve/pfalse-load_ext_gather_offset.c: Likewise.
* gcc.target/aarch64/sve/pfalse-load_gather_sv.c: Likewise.
* gcc.target/aarch64/sve/pfalse-load_gather_vs.c: Likewise.
* gcc.target/aarch64/sve/pfalse-load_replicate.c: Likewise.
* gcc.target/aarch64/sve/pfalse-prefetch.c: Likewise.
* gcc.target/aarch64/sve/pfalse-prefetch_gather_index.c: Likewise.
* gcc.target/aarch64/sve/pfalse-prefetch_gather_offset.c: Likewise.
* gcc.target/aarch64/sve/pfalse-ptest.c: Likewise.
* gcc.target/aarch64/sve/pfalse-rdffr.c: Likewise.
* gcc.target/aarch64/sve/pfalse-reduction.c: Likewise.
* gcc.target/aarch64/sve/pfalse-reduction_wide.c: Likewise.
* gcc.target/aarch64/sve/pfalse-shift_right_imm.c: Likewise.
* gcc.target/aarch64/sve/pfalse-store.c: Likewise.
* gcc.target/aarch64/sve/pfalse-store_scatter_index.c: Likewise.
* gcc.target/aarch64/sve/pfalse-store_scatter_offset.c: Likewise.
* gcc.target/aarch64/sve/pfalse-storexn.c: Likewise.
* gcc.target/aarch64/sve/pfalse-ternary_opt_n.c: Likewise.
* gcc.target/aarch64/sve/pfalse-ternary_rotate.c: Likewise.
* gcc.target/aarch64/sve/pfalse-unary.c: Likewise.
* gcc.target/aarch64/sve/pfalse-unary_convert_narrowt.c: Likewise.
* gcc.target/aarch64/sve/pfalse-unary_convertxn.c: Likewise.
* gcc.target/aarch64/sve/pfalse-unary_n.c: Likewise.
* gcc.target/aarch64/sve/pfalse-unary_pred.c: Likewise.
* gcc.target/aarch64/sve/pfalse-unary_to_uint.c: Likewise.
* gcc.target/aarch64/sve/pfalse-unaryxn.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-binary.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-binary_int_opt_n.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-binary_opt_n.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-binary_opt_single_n.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-binary_to_uint.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-binary_wide.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-compare.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-load_ext_gather_index_restricted.c,
* gcc.target/aarch64/sve2/pfalse-load_ext_gather_offset_restricted.c,
* gcc.target/aarch64/sve2/pfalse-load_gather_sv_restricted.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-load_gather_vs.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-shift_left_imm_to_uint.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-shift_right_imm.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-store_scatter_index_restricted.c,
* gcc.target/aarch64/sve2/pfalse-store_scatter_offset_restricted.c,
* gcc.target/aarch64/sve2/pfalse-unary.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-unary_convert.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-unary_convert_narrowt.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-unary_to_int.c: Likewise.

Diff:
---
 gcc/testsuite/gcc

[gcc r16-2172] [RISC-V] Detect new fusions for RISC-V

2025-07-10 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:742f55622690d35c6cc95f2b8722307699731571

commit r16-2172-g742f55622690d35c6cc95f2b8722307699731571
Author: Daniel Barboza 
Date:   Thu Jul 10 07:28:38 2025 -0600

[RISC-V] Detect new fusions for RISC-V

This is primarily Daniel's work...  He's chasing things in QEMU & LLVM right
now so I'm doing a bit of clean-up and shepherding this patch forward.

--

Instruction fusion is a reasonably common way to improve the performance of
code on many architectures/designs.  A few years ago we submitted (via VRULL I
suspect) fusion support for a number of cases in the RISC-V space.

We made each type of fusion selectable independently in the tuning structure so
that designs which implemented some particular set of fusions could select just
the ones their design implemented.  This patch adds to that generic
infrastructure.

In particular we're introducing additional load fusions, store pair fusions,
bitfield extractions and a few B extension related fusions.

Conceptually, for the new load fusions we're adding the ability to fuse most
add/shNadd instructions with a subsequent load.  There are a couple of
exceptions, but in general the expectation is that if we have add/shNadd for
address computation, then they can potentially fuse with the load where the
address gets used.

We've had limited forms of store pair fusion for a while.  Essentially we
required both stores to be 64 bits wide and land on opposite sides of a 128 bit
cache line.  That was enough to help prologues and a few other things, but was
fairly restrictive.  The new cases capture store pairs where the two stores
have the same size and hit consecutive memory locations.  For example, storing
consecutive bytes with sb+sb is fusible.
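
As a rough illustration of the generalized store pair rule described above
(same size, consecutive memory locations), here is a standalone sketch.  The
Store struct and stores_fusible_p are hypothetical names for this example only;
the real check in riscv.cc works on RTL, and accepting either ordering of the
two stores is an assumption of the sketch, not something the commit states.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a store instruction: base register number,
// byte offset from that base, and access size in bytes
// (1 = sb, 2 = sh, 4 = sw, 8 = sd).
struct Store
{
  int base_reg;
  int64_t offset;
  unsigned size;
};

// Return true if A and B use the same base register, have the same
// size, and touch back-to-back memory locations.
// Assumption: either store may come first in memory.
static bool
stores_fusible_p (const Store &a, const Store &b)
{
  if (a.base_reg != b.base_reg || a.size != b.size)
    return false;
  return (b.offset == a.offset + (int64_t) a.size
	  || a.offset == b.offset + (int64_t) b.size);
}
```

With this model an sb at offset 0 and an sb at offset 1 from the same base are
reported as fusible, while an sb followed by an sh, or two sd stores with a gap
between them, are not.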

For bitfield extractions we can fuse together a shift left followed by a shift
right for arbitrary shift counts, whereas previously we restricted the shift
counts to those implementing sign/zero extensions of 8 and 16 bit objects.
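
To make the shift pair concrete, the sketch below shows how a left shift
followed by a right shift extracts a bitfield from a 64-bit register.  The
function names are hypothetical, XLEN is assumed to be 64, and the caller is
assumed to pass width >= 1 and lsb + width <= 64 so that both shift counts
stay below 64.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extending extract of WIDTH bits starting at bit LSB, written as
// the slli + srli pair: shift the field to the top, then back down.
static uint64_t
extract_unsigned (uint64_t x, unsigned lsb, unsigned width)
{
  unsigned left = 64 - lsb - width;
  unsigned right = 64 - width;
  return (x << left) >> right;
}

// Sign-extending variant, corresponding to slli + srai.  This relies on
// >> of a negative int64_t being an arithmetic shift, which is guaranteed
// from C++20 on and is what GCC does in earlier standards as well.
static int64_t
extract_signed (uint64_t x, unsigned lsb, unsigned width)
{
  unsigned left = 64 - lsb - width;
  unsigned right = 64 - width;
  return (int64_t) (x << left) >> right;
}
```

The previously supported cases correspond to lsb == 0 with width 8 or 16, that
is, equal left and right shift counts of 56 or 48.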

Finally, there are some B extension fusions: orc.b+not, which shows up in string
comparisons, ctz+andi (deepsjeng?), and neg+max (synthesized abs).

I hope these prove to be useful to other RISC-V designs.  I wouldn't be
surprised if we have to break down the new load fusions further for some
designs.  If we need to do that it wouldn't be hard.

FWIW, our data indicates the generalized store fusions followed by the expanded
load fusions are the most important cases for the new code.

These have been tested with crosses and bootstrapped on the BPI.

Waiting on pre-commit CI before moving forward (though it has been failing to
pick up some patches recently...)

gcc/
* config/riscv/riscv.cc (riscv_fusion_pairs): Add new cases.
(riscv_set_is_add): New function.
(riscv_set_is_addi, riscv_set_is_adduw, riscv_set_is_shNadd): Likewise.
(riscv_set_is_shNadduw): Likewise.
(riscv_macro_fusion_pair_p): Add new fusion cases.

Co-authored-by: Jeff Law  

Diff:
---
 gcc/config/riscv/riscv.cc | 383 +-
 1 file changed, 382 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index b868a503a35f..023adc3284df 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -283,6 +283,10 @@ enum riscv_fusion_pairs
   RISCV_FUSE_AUIPC_LD = (1 << 7),
   RISCV_FUSE_LDPREINCREMENT = (1 << 8),
   RISCV_FUSE_ALIGNED_STD = (1 << 9),
+  RISCV_FUSE_CACHE_ALIGNED_STD = (1 << 10),
+  RISCV_FUSE_BFEXT = (1 << 11),
+  RISCV_FUSE_EXPANDED_LD = (1 << 12),
+  RISCV_FUSE_B_ALUI = (1 << 13),
 };
 
 /* Costs of various operations on the different architectures.  */
@@ -10205,6 +10209,81 @@ riscv_fusion_enabled_p(enum riscv_fusion_pairs op)
   return tune_param->fusible_ops & op;
 }
 
+/* Matches an add:
+   (set (reg:DI rd) (plus:SI (reg:SI rs1) (reg:SI rs2))) */
+
+static bool
+riscv_set_is_add (rtx set)
+{
+  return (GET_CODE (SET_SRC (set)) == PLUS
+ && REG_P (XEXP (SET_SRC (set), 0))
+ && REG_P (XEXP (SET_SRC (set), 1))
+ && REG_P (SET_DEST (set)));
+}
+
+/* Matches an addi:
+   (set (reg:DI rd) (plus:SI (reg:SI rs1) (const_int imm))) */
+
+static bool
+riscv_set_is_addi (rtx set)
+{
+  return (GET_CODE (SET_SRC (set)) == PLUS
+ && REG_P (XEXP (SET_SRC (set), 0))
+ && CONST_INT_P (XEXP (SET_SRC (set), 1))
+ && REG_P (SET_DEST (set)));
+}
+
+/* Matches an add.uw:
+  (set (reg:DI rd)
+(plus:DI (zero_extend:DI (reg:SI rs1)) (reg:DI rs2))) */
+
+static bool
+riscv_set_is_adduw (rtx set)
+{
+  return (GET_CODE (SET_SRC (set)) == PLUS
+ && GET_CODE (XEXP (SET_SRC (set), 0)) == ZERO_EXTEND
+ && REG_P (XEXP (XEXP (SET_SRC (set), 0), 0))
+ && REG_P (XEXP (SET_SRC (set), 1))

[gcc r16-2176] Fixes to auto-profile and Gimple matching.

2025-07-10 Thread Jan Hubicka via Gcc-cvs
https://gcc.gnu.org/g:50f3a6a437ad4f2438191b6d9aa9aed8575b9372

commit r16-2176-g50f3a6a437ad4f2438191b6d9aa9aed8575b9372
Author: Jan Hubicka 
Date:   Thu Jul 10 16:56:21 2025 +0200

Fixes to auto-profile and Gimple matching.

This patch fixes several issues I noticed in gimple matching and the
-Wauto-profile warning.  One problem is that we mismatched symbols with user
names, such as "*strlen" instead of "strlen".  I added raw_symbol_name to strip
the extra '*', which is OK on ELF targets, which are the only targets we support
with auto-profile, but eventually we will want to add the user prefix.  There is
a sorry about this.
Also I think dwarf2out is wrong:

static void
add_linkage_attr (dw_die_ref die, tree decl)
{
  const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));

  /* Mimic what assemble_name_raw does with a leading '*'.  */
  if (name[0] == '*')
name = &name[1];
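
For reference, the '*' stripping that raw_symbol_name performs can be sketched
in isolation as below.  The name strip_leading_star is hypothetical, and the
sketch leaves out the user_label_prefix check and the sorry () that the real
function in auto-profile.cc has.

```cpp
#include <cassert>
#include <cstring>

// Standalone sketch: skip a single leading '*'.  GCC uses a leading '*'
// in an assembler name to mean "emit the rest verbatim", so the symbol
// table sees the name without it.
static const char *
strip_leading_star (const char *asmname)
{
  // The comparison yields 0 or 1, so this advances by at most one char.
  return asmname + (asmname[0] == '*');
}
```

Names without the marker, including the empty string, are returned unchanged.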

The patch also fixes the locations of warnings.  I used the location of the
problematic statement as the warning_at parameter but also included info about
the containing function.  This made warning_at ignore the first location, which
is fixed now.

I also fixed the ICE with -Wno-auto-profile discussed earlier.

Bootstrapped/regtested x86_64-linux.  Autoprofiled bootstrap now fails for
weird reasons for me (it does not build the training stage), so I will try to
debug this before committing.

gcc/ChangeLog:

* auto-profile.cc: Include output.h.
(function_instance::set_call_location): Also sanity check
that location is known.
(raw_symbol_name): Two new static functions.
(dump_inline_stack): Use it.
(string_table::get_index_by_decl): Likewise.
(function_instance::get_cgraph_node): Likewise.
(function_instance::get_function_instance_by_decl): Fix typo
in warning; use raw names; fix lineno decoding.
(match_with_target): Add containing function parameter;
correctly output function and call location in warning.
(function_instance::lookup_count): Fix warning locations.
(function_instance::match): Fix warning locations; avoid
crash with mismatched callee; do not warn about broken callsites
twice.
(autofdo_source_profile::offline_external_functions): Use
raw_assembler_name.
(walk_block): Use raw_assembler_name.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/afdo-inline.c: Add user symbol names.

Diff:
---
 gcc/auto-profile.cc  | 231 +--
 gcc/testsuite/gcc.dg/tree-prof/afdo-inline.c |   9 ++
 2 files changed, 156 insertions(+), 84 deletions(-)

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 219676012e76..5226e4550257 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "auto-profile.h"
 #include "tree-pretty-print.h"
 #include "gimple-pretty-print.h"
+#include "output.h"
 
 /* The following routines implements AutoFDO optimization.
 
@@ -430,7 +431,8 @@ public:
   void
   set_call_location (location_t l)
   {
-gcc_checking_assert (call_location_ == UNKNOWN_LOCATION);
+gcc_checking_assert (call_location_ == UNKNOWN_LOCATION
+&& l != UNKNOWN_LOCATION);
 call_location_= l;
   }
 
@@ -685,6 +687,26 @@ dump_afdo_loc (FILE *f, unsigned loc)
 fprintf (f, "%i", loc >> 16);
 }
 
+/* Return assembler name as in symbol table and DW_AT_linkage_name.  */
+
+static const char *
+raw_symbol_name (const char *asmname)
+{
+  /* If we start supporting user_label_prefixes, add_linkage_attr will also
+ need to be fixed.  */
+  if (strlen (user_label_prefix))
+sorry ("auto-profile is not supported for targets with user label prefix");
+  return asmname + (asmname[0] == '*');
+}
+
+/* Convenience wrapper that looks up assembler name.  */
+
+static const char *
+raw_symbol_name (tree decl)
+{
+  return raw_symbol_name (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
+}
+
 /* Dump STACK to F.  */
 
 static void
@@ -695,7 +717,7 @@ dump_inline_stack (FILE *f, inline_stack *stack)
 {
   fprintf (f, "%s%s:",
   first ? "" : "; ",
-  IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (p.decl)));
+  raw_symbol_name (p.decl));
   dump_afdo_loc (f, p.afdo_loc);
   first = false;
 }
@@ -817,7 +839,7 @@ string_table::get_index (const char *name) const
 int
 string_table::get_index_by_decl (tree decl) const
 {
-  const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  const char *name = raw_symbol_name (decl);
   int ret = get_index (name);
   if (ret != -1)
 return ret;
@@ -880,10 +902,9 @@ function_instance::~function_instan

[gcc r16-2177] cobol: Add PUSH and POP to CDF.

2025-07-10 Thread James K. Lowden via Gcc-cvs
https://gcc.gnu.org/g:3f59a1cac717f8af84e884e9ec0f6ef14e102e6e

commit r16-2177-g3f59a1cac717f8af84e884e9ec0f6ef14e102e6e
Author: James K. Lowden 
Date:   Wed Jul 9 18:14:40 2025 -0400

cobol: Add PUSH and POP to CDF.

Introduce cdf_directives_t class to centralize management of CDF
state. Move existing CDF state variables and functions into the new
class.
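
The general shape of PUSH/POP handling for one piece of directive state can be
sketched as follows.  This is only an illustration: stacked_value and its
members are hypothetical names, and the actual cdf_directives_t in util.cc may
be organized quite differently.

```cpp
#include <cassert>
#include <stack>
#include <string>
#include <utility>

// Hypothetical holder for one piece of state (e.g. the source format)
// together with the stack of values saved by PUSH.
template <typename T>
class stacked_value
{
  T current_;
  std::stack<T> saved_;

public:
  explicit stacked_value (T init) : current_ (std::move (init)) {}

  // Access the value currently in effect.
  T &value () { return current_; }

  // PUSH: remember the current value; it stays in effect until changed.
  void push () { saved_.push (current_); }

  // POP: restore the most recently pushed value.  Returns false when
  // there is no matching PUSH, so the caller can diagnose it.
  bool pop ()
  {
    if (saved_.empty ())
      return false;
    current_ = std::move (saved_.top ());
    saved_.pop ();
    return true;
  }
};
```

A caller would push before changing the value, for example inside a copybook
or UDF, and pop afterwards to return to whatever the including source had in
effect.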

gcc/cobol/ChangeLog:

PR cobol/120765
* cdf.y: Extend grammar for new CDF syntax, relocate dictionary.
* cdfval.h (cdf_dictionary): Use new CDF dictionary.
* dts.h: Remove useless assignment, note incorrect behavior.
* except.cc: Remove obsolete EC state.
* gcobol.1: Document CDF in its own section.
* genapi.cc (parser_statement_begin): Use new EC state function.
(parser_file_merge): Same.
(parser_check_fatal_exception): Same.
* genutil.cc (get_and_check_refstart_and_reflen): Same.
(get_depending_on_value_from_odo): Same.
(get_data_offset): Same.
(process_this_exception): Same.
* lexio.cc (check_push_pop_directive): New function.
(check_source_format_directive): Restrict regex search to 1 line.
(cdftext::free_form_reference_format): Use new function.
* parse.y: Define new CDF tokens, use new CDF state.
* parse_ante.h (cdf_tokens): Use new CDF state.
(redefined_token): Same.
(class prog_descr_t): Remove obsolete CDF state.
(class program_stack_t): Same.
(current_call_convention): Same.
* scan.l: Recognize new CDF tokens.
* scan_post.h (is_cdf_token): Same.
* symbols.h (cdf_current_tokens): Change current_call_convention to return void.
* token_names.h: Regenerate.
* udf/stored-char-length.cbl: Use new PUSH/POP CDF functionality.
* util.cc (class cdf_directives_t): Define cdf_directives_t.
(current_call_convention): Same.
(cdf_current_tokens): Same.
(cdf_dictionary): Same.
(cdf_enabled_exceptions): Same.
(cdf_push): Same.
(cdf_push_call_convention): Same.
(cdf_push_current_tokens): Same.
(cdf_push_dictionary): Same.
(cdf_push_enabled_exceptions): Same.
(cdf_push_source_format): Same.
(cdf_pop): Same.
(cdf_pop_call_convention): Same.
(cdf_pop_current_tokens): Same.
(cdf_pop_dictionary): Same.
(cdf_pop_enabled_exceptions): Same.
(cdf_pop_source_format): Same.
* util.h (cdf_push): Declare cdf_directives_t.
(cdf_push_call_convention): Same.
(cdf_push_current_tokens): Same.
(cdf_push_dictionary): Same.
(cdf_push_enabled_exceptions): Same.
(cdf_push_source_format): Same.
(cdf_pop): Same.
(cdf_pop_call_convention): Same.
(cdf_pop_current_tokens): Same.
(cdf_pop_dictionary): Same.
(cdf_pop_source_format): Same.
(cdf_pop_enabled_exceptions): Same.

libgcobol/ChangeLog:

* common-defs.h (cdf_enabled_exceptions): Use new CDF state.

Diff:
---
 gcc/cobol/cdf.y  |   94 +-
 gcc/cobol/cdfval.h   |4 +
 gcc/cobol/dts.h  |   14 +-
 gcc/cobol/except.cc  |2 -
 gcc/cobol/gcobol.1   |  192 +--
 gcc/cobol/genapi.cc  |6 +-
 gcc/cobol/genutil.cc |7 +
 gcc/cobol/lexio.cc   |   72 +-
 gcc/cobol/parse.y|   21 +-
 gcc/cobol/parse_ante.h   |   48 +-
 gcc/cobol/scan.l |   13 +
 gcc/cobol/scan_post.h|2 +
 gcc/cobol/symbols.h  |3 +-
 gcc/cobol/token_names.h  | 2228 +-
 gcc/cobol/udf/stored-char-length.cbl |4 +
 gcc/cobol/util.cc|   90 +-
 gcc/cobol/util.h |   15 +
 libgcobol/common-defs.h  |2 +-
 18 files changed, 1541 insertions(+), 1276 deletions(-)

diff --git a/gcc/cobol/cdf.y b/gcc/cobol/cdf.y
index f1a791245854..840eb5033151 100644
--- a/gcc/cobol/cdf.y
+++ b/gcc/cobol/cdf.y
@@ -105,14 +105,14 @@ void input_file_status_notify();
 
   using std::map;
 
-  static map dictionary;
-
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Wunused-function"
   static bool
   cdfval_add( const char name[],
   const cdfval_t& value, bool override = false )
   {
+cdf_values_t& dictionary( cdf_dictionary() );
+
 if( scanner_parsing() ) {
   if( ! override ) {
if( dictionary.find(name) != dictionary.end() ) return false;
@@ -123,6 +123,8 @@ void input_file_status_notify();
   }
   static void
 

[gcc r16-2178] aarch64: Fix LD1Q and ST1Q failures for big-endian

2025-07-10 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:e7f049471c6caf22c65ac48773d864fca7a4cdc4

commit r16-2178-ge7f049471c6caf22c65ac48773d864fca7a4cdc4
Author: Richard Sandiford 
Date:   Thu Jul 10 16:54:45 2025 +0100

aarch64: Fix LD1Q and ST1Q failures for big-endian

LD1Q gathers and ST1Q scatters are unusual in that they operate
on 128-bit blocks (effectively VNx1TI).  However, we don't have
modes or ACLE types for 128-bit integers, and 128-bit integers
are not the intended use case.  Instead, the instructions are
intended to be used in "hybrid VLA" operations, where each 128-bit
block is an Advanced SIMD vector.

The normal SVE modes therefore capture the intended use case better
than VNx1TI would.  For example, VNx2DI is effectively N copies
of V2DI, VNx4SI N copies of V4SI, etc.

Since there is only one LD1Q instruction and one ST1Q instruction,
the ACLE support used a single pattern for each, with the loaded or
stored data having mode VNx2DI.  The ST1Q pattern was generated by:

rtx data = e.args.last ();
e.args.last () = force_lowpart_subreg (VNx2DImode, data, GET_MODE (data));
e.prepare_gather_address_operands (1, false);
return e.use_exact_insn (CODE_FOR_aarch64_scatter_st1q);

where the force_lowpart_subreg bitcast the stored data to VNx2DI.
But such subregs require an element reverse on big-endian targets
(see the comment at the head of aarch64-sve.md), which wasn't the
intention.  The code should have used aarch64_sve_reinterpret instead.

The LD1Q pattern was used as follows:

e.prepare_gather_address_operands (1, false);
return e.use_exact_insn (CODE_FOR_aarch64_gather_ld1q);

which always returns a VNx2DI value, leaving the caller to bitcast
that to the correct mode.  That bitcast again uses subregs and has
the same problem as above.

However, for the reasons explained in the comment, using
aarch64_sve_reinterpret does not work well for LD1Q.  The patch
instead parameterises the LD1Q based on the required data mode.

gcc/
* config/aarch64/aarch64-sve2.md (aarch64_gather_ld1q): Replace with...
(@aarch64_gather_ld1q): ...this, parameterizing based on mode.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(svld1q_gather_impl::expand): Update accordingly.
(svst1q_scatter_impl::expand): Use aarch64_sve_reinterpret
instead of force_lowpart_subreg.

Diff:
---
 gcc/config/aarch64/aarch64-sve-builtins-sve2.cc |  5 +++--
 gcc/config/aarch64/aarch64-sve2.md  | 21 +++--
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
index d9922de7ca5a..abe21a8b61c6 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
@@ -316,7 +316,8 @@ public:
   expand (function_expander &e) const override
   {
 e.prepare_gather_address_operands (1, false);
-return e.use_exact_insn (CODE_FOR_aarch64_gather_ld1q);
+auto icode = code_for_aarch64_gather_ld1q (e.tuple_mode (0));
+return e.use_exact_insn (icode);
   }
 };
 
@@ -722,7 +723,7 @@ public:
   expand (function_expander &e) const override
   {
 rtx data = e.args.last ();
-e.args.last () = force_lowpart_subreg (VNx2DImode, data, GET_MODE (data));
+e.args.last () = aarch64_sve_reinterpret (VNx2DImode, data);
 e.prepare_gather_address_operands (1, false);
 return e.use_exact_insn (CODE_FOR_aarch64_scatter_st1q);
   }
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index 789ec0dd1a3c..660901d4b3f1 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -334,12 +334,21 @@
 ;; - LD1Q (SVE2p1)
 ;; -
 
-;; Model this as operating on the largest valid element size, which is DI.
-;; This avoids having to define move patterns & more for VNx1TI, which would
-;; be difficult without a non-gather form of LD1Q.
-(define_insn "aarch64_gather_ld1q"
-  [(set (match_operand:VNx2DI 0 "register_operand")
-   (unspec:VNx2DI
+;; For little-endian targets, it would be enough to use a single pattern,
+;; with a subreg to bitcast the result to whatever mode is needed.
+;; However, on big-endian targets, the bitcast would need to be an
+;; aarch64_sve_reinterpret instruction.  That would interact badly
+;; with the "&" and "?" constraints in this pattern: if the result
+;; of the reinterpret needs to be in the same register as the index,
+;; the RA would tend to prefer to allocate a separate register for the
+;; intermediate (uncast) result, even if the reinterpret prefers tying.
+;;
+;; The index is logically VNx1DI rather than VNx2DI, but introducing
+;; and using VNx1DI would just create mor

[gcc(refs/users/mikael/heads/stabilisation_descriptor_v01)] fortran: Amend descriptor bounds init if unallocated

2025-07-10 Thread Mikael Morin via Gcc-cvs
https://gcc.gnu.org/g:7e72a078ae71594f6f34d406a80b47ca90cf876e

commit 7e72a078ae71594f6f34d406a80b47ca90cf876e
Author: Mikael Morin 
Date:   Wed Jul 9 09:40:32 2025 +0200

fortran: Amend descriptor bounds init if unallocated

Always generate the conditional initialization of unallocated variables,
regardless of the basic variable allocation tracking done in the
frontend, and guard it with an additional always-false condition.

The scalarizer used to always evaluate array bounds, including in the
case of unallocated arrays on the left hand side of an assignment.  This
was (correctly) causing uninitialized warnings, even if the
uninitialized values were in the end discarded.

Since the fix for PR fortran/108889, an initialization of the descriptor
bounds is added to silence the uninitialized warnings, conditional on the
array being unallocated.  This initialization is not useful in the
execution of the program, and it is removed if the compiler can prove
that the variable is unallocated (in the case of a local variable for
example).  Unfortunately, the compiler is not always able to prove it
and the useless initialization may remain in the final code.
Moreover, the generated code that was causing the evaluation of
uninitialized variables has been changed to avoid them, so we can try to
remove or revisit that unallocated variable bounds initialization tweak.

Unfortunately, just removing the extra initialization restores the
warnings at -O0, as there is no dead code removal at that optimization
level.  Instead, this change keeps the initialization and modifies its
guarding condition with an extra always false variable, so that if
optimizations are enabled the whole initialization block is removed, and
if they are disabled it remains and is sufficient to prevent the
warning.
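
The guarding scheme described above can be modelled in plain C.  This is
only a sketch under stated assumptions: struct descriptor, never_true
and guarded_bounds_init are hypothetical names, and the real code is
generated as trees by the Fortran frontend rather than written in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a rank-1 array descriptor.  */
struct descriptor
{
  void *data;
  long offset;
  long lbound;
  long ubound;
};

/* Model of the guarded placeholder initialization.  Returns true if
   the initialization ran, which should never happen.  */
bool
guarded_bounds_init (struct descriptor *desc)
{
  assert (desc != NULL);

  /* Always false at run time.  With optimization enabled the compiler
     can prove this and delete the whole block; without optimization
     the block is kept, so the stores below remain visible.  */
  bool never_true = false;

  if (desc->data == NULL && never_true)
    {
      desc->offset = 0;
      desc->lbound = 1;
      desc->ubound = 0;
      return true;
    }
  return false;
}
```

Calling guarded_bounds_init on a descriptor whose data pointer is null
leaves its offset and bounds untouched, which matches the statement
that the initialization is not useful in the execution of the program.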

The new variable requires the code generation to be done earlier in the
function so that the variable declaration and usage are in the same
scope.

As the modified condition guarantees the removal of the block with
optimizations, we can emit it more broadly and remove the basic
allocation tracking that was done in the frontend to limit its emission.

gcc/fortran/ChangeLog:

* gfortran.h (gfc_symbol): Remove field allocated_in_scope.
* trans-array.cc (gfc_array_allocate): Don't set it.
(gfc_alloc_allocatable_for_assignment): Likewise.
Generate the unallocated descriptor bounds initialisation
before the opening of the reallocation code block.  Create a
variable and use it as additional condition to the unallocated
descriptor bounds initialisation.

Diff:
---
 gcc/fortran/gfortran.h |  4 --
 gcc/fortran/trans-array.cc | 91 --
 2 files changed, 48 insertions(+), 47 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 6848bd1762d3..69367e638c5b 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -2028,10 +2028,6 @@ typedef struct gfc_symbol
   /* Set if this should be passed by value, but is not a VALUE argument
  according to the Fortran standard.  */
   unsigned pass_as_value:1;
-  /* Set if an allocatable array variable has been allocated in the current
- scope. Used in the suppression of uninitialized warnings in reallocation
- on assignment.  */
-  unsigned allocated_in_scope:1;
   /* Set if an external dummy argument is called with different argument lists.
  This is legal in Fortran, but can cause problems with autogenerated
  C prototypes for C23.  */
diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 7b83d3fab8d7..52888c1e1f1b 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -6800,8 +6800,6 @@ gfc_array_allocate (gfc_se * se, gfc_expr * expr, tree status, tree errmsg,
   else
   gfc_add_expr_to_block (&se->pre, set_descriptor);
 
-  expr->symtree->n.sym->allocated_in_scope = 1;
-
   return true;
 }
 
@@ -11495,14 +11493,60 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
   && !expr2->value.function.isym)
 expr2->ts.u.cl->backend_decl = rss->info->string_length;
 
-  gfc_start_block (&fblock);
-
   /* Since the lhs is allocatable, this must be a descriptor type.
  Get the data and array size.  */
   desc = linfo->descriptor;
   gcc_assert (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (desc)));
   array1 = gfc_conv_descriptor_data_get (desc);
 
+  /* If the data is null, set the descriptor bounds and offset.  This suppresses
+ the maybe used uninitialized warning.  Note that the always false variable
+ prevents this block from from ever being executed.  The whole block should
+ be removed by optimizations.  Component references are not subject to the
+ warnings, so we don't uselessly complicate the generated code 

[gcc r16-2164] aarch64: Extend HVLA permutations to big-endian

2025-07-10 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:3b870131487d786a74f27a89d0415c8207770f14

commit r16-2164-g3b870131487d786a74f27a89d0415c8207770f14
Author: Richard Sandiford 
Date:   Thu Jul 10 10:57:28 2025 +0100

aarch64: Extend HVLA permutations to big-endian

TARGET_VECTORIZE_VEC_PERM_CONST has code to match the SVE2.1
"hybrid VLA" DUPQ, EXTQ, UZPQ{1,2}, and ZIPQ{1,2} instructions.
This matching was conditional on !BYTES_BIG_ENDIAN.

The ACLE code also lowered the associated SVE2.1 intrinsics into
suitable VEC_PERM_EXPRs.  This lowering was not conditional on
!BYTES_BIG_ENDIAN.

The mismatch led to lots of ICEs in the ACLE tests on big-endian
targets: we lowered to VEC_PERM_EXPRs that are not supported.

I think the !BYTES_BIG_ENDIAN restriction was unnecessary.
SVE maps the first memory element to the least significant end of
the register for both endiannesses, so no endian correction or lane
number adjustment is necessary.

This is in some ways a bit counterintuitive.  ZIPQ1 is conceptually
"apply Advanced SIMD ZIP1 to each 128-bit block" and endianness does
matter when choosing between Advanced SIMD ZIP1 and ZIP2.  For example,
the V4SI permute selector { 0, 4, 1, 5 } corresponds to ZIP1 for little-
endian and ZIP2 for big-endian.  But the difference between the hybrid
VLA and Advanced SIMD permute selectors is a consequence of the
difference between the SVE and Advanced SIMD element orders.
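
As an illustration of the element numbering used for the hybrid VLA
selectors, here is a scalar sketch of ZIPQ1 on 32-bit elements.  The
function name model_zipq1_s32 and the array-based interface are
assumptions made for this example; they are not part of GCC or the
ACLE.  Elements are numbered in SVE order, so element 0 of each block
is the first one in memory, and DST must not overlap A or B.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 32-bit elements in one 128-bit block.  */
enum { LANES_PER_BLOCK = 4 };

/* Hypothetical scalar model of ZIPQ1: for each 128-bit block,
   interleave the low halves of the corresponding blocks of A and B.
   Within a block this is the permute selector { 0, 4, 1, 5 } applied
   to the concatenation of A's block and B's block.  */
void
model_zipq1_s32 (int32_t *dst, const int32_t *a, const int32_t *b,
		 size_t nblocks)
{
  assert (dst != NULL && a != NULL && b != NULL);

  for (size_t blk = 0; blk < nblocks; ++blk)
    {
      size_t base = blk * LANES_PER_BLOCK;
      dst[base + 0] = a[base + 0];
      dst[base + 1] = b[base + 0];
      dst[base + 2] = a[base + 1];
      dst[base + 3] = b[base + 1];
    }
}
```

The model is the same for both endiannesses because it works on element
numbers only, which is the property the paragraph above relies on.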

The same thing applies to ACLE intrinsics.  The current lowering of
svzipq1 etc. is correct for both endiannesses.  If ACLE code does:

  2x svld1_s32 + svzipq1_s32 + svst1_s32

then the byte-for-byte result is the same for both endiannesses.
On big-endian targets, this is different from using the Advanced SIMD
sequence below for each 128-bit block:

  2x LDR + ZIP1 + STR

In contrast, the byte-for-byte result of:

  2x svld1q_gather_s32 + svzipq1_s32 + svst1q_scatter_s32

depends on endianness, since the quadword gathers and scatters use
Advanced SIMD byte ordering for each 128-bit block.  This gather/scatter
sequence behaves in the same way as the Advanced SIMD LDR+ZIP1+STR
sequence for both endiannesses.

Programmers writing ACLE code have to be aware of this difference
if they want to support both endiannesses.

The patch includes some new execution tests to verify the expansion
of the VEC_PERM_EXPRs.

gcc/
* doc/sourcebuild.texi (aarch64_sve2_hw, aarch64_sve2p1_hw): Document.
* config/aarch64/aarch64.cc (aarch64_evpc_hvla): Extend to
BYTES_BIG_ENDIAN.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_aarch64_sve2p1_hw):
New proc.
* gcc.target/aarch64/sve2/dupq_1.c: Extend to big-endian.  Add
noipa attributes.
* gcc.target/aarch64/sve2/extq_1.c: Likewise.
* gcc.target/aarch64/sve2/uzpq_1.c: Likewise.
* gcc.target/aarch64/sve2/zipq_1.c: Likewise.
* gcc.target/aarch64/sve2/dupq_1_run.c: New test.
* gcc.target/aarch64/sve2/extq_1_run.c: Likewise.
* gcc.target/aarch64/sve2/uzpq_1_run.c: Likewise.
* gcc.target/aarch64/sve2/zipq_1_run.c: Likewise.

Diff:
---
 gcc/config/aarch64/aarch64.cc  |  1 -
 gcc/doc/sourcebuild.texi   |  6 ++
 gcc/testsuite/gcc.target/aarch64/sve2/dupq_1.c | 26 +++
 gcc/testsuite/gcc.target/aarch64/sve2/dupq_1_run.c | 87 ++
 gcc/testsuite/gcc.target/aarch64/sve2/extq_1.c | 20 ++---
 gcc/testsuite/gcc.target/aarch64/sve2/extq_1_run.c | 73 ++
 gcc/testsuite/gcc.target/aarch64/sve2/uzpq_1.c | 18 ++---
 gcc/testsuite/gcc.target/aarch64/sve2/uzpq_1_run.c | 78 +++
 gcc/testsuite/gcc.target/aarch64/sve2/zipq_1.c | 18 ++---
 gcc/testsuite/gcc.target/aarch64/sve2/zipq_1_run.c | 78 +++
 gcc/testsuite/lib/target-supports.exp  | 17 +
 11 files changed, 380 insertions(+), 42 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 477cbece6c98..27c315fc35e8 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -26801,7 +26801,6 @@ aarch64_evpc_hvla (struct expand_vec_perm_d *d)
   machine_mode vmode = d->vmode;
   if (!TARGET_SVE2p1
   || !TARGET_NON_STREAMING
-  || BYTES_BIG_ENDIAN
   || d->vec_flags != VEC_SVE_DATA
   || GET_MODE_UNIT_BITSIZE (vmode) > 64)
 return false;
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 6c5586e4b034..85fb810d96c5 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2373,6 +2373,12 @@ whether it does so by default).
 @itemx aarch64_sve1024_hw
 @itemx aarch64_sve2048_hw
 Like @code{aarch64_sve_hw}, but also test for an exact ha

[gcc r16-2167] Avoid vect_is_simple_use call from get_load_store_type

2025-07-10 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:13beea469554efcffd0f2cda6f0484a603577f27

commit r16-2167-g13beea469554efcffd0f2cda6f0484a603577f27
Author: Richard Biener 
Date:   Thu Jul 10 10:25:03 2025 +0200

Avoid vect_is_simple_use call from get_load_store_type

This isn't the required refactoring of vect_check_gather_scatter
but it avoids a now unnecessary call to vect_is_simple_use which
is problematic because it looks at STMT_VINFO_VECTYPE which we
want to get rid of.  SLP build already ensures vect_is_simple_use
on all lane defs, so all we need is to populate the offset_vectype
and offset_dt which is not always set by vect_check_gather_scatter.
Both are easy to get from the SLP child directly.

* tree-vect-stmts.cc (get_load_store_type): Do not use
vect_is_simple_use to fill gather/scatter offset operand
vectype and dt.

Diff:
---
 gcc/tree-vect-stmts.cc | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e5971e4a357b..4aa69da2218b 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2466,17 +2466,10 @@ get_load_store_type (vec_info  *vinfo, stmt_vec_info stmt_info,
 vls_type == VLS_LOAD ? "gather" : "scatter");
  return false;
}
-  else if (!vect_is_simple_use (gs_info->offset, vinfo,
-   &gs_info->offset_dt,
-   &gs_info->offset_vectype))
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"%s index use not simple.\n",
-vls_type == VLS_LOAD ? "gather" : "scatter");
- return false;
-   }
-  else if (gs_info->ifn == IFN_LAST && !gs_info->decl)
+  slp_tree offset_node = SLP_TREE_CHILDREN (slp_node)[0];
+  gs_info->offset_dt = SLP_TREE_DEF_TYPE (offset_node);
+  gs_info->offset_vectype = SLP_TREE_VECTYPE (offset_node);
+  if (gs_info->ifn == IFN_LAST && !gs_info->decl)
{
  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant ()
  || !TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype).is_constant ()


[gcc r16-2168] Avoid vect_is_simple_use call from vectorizable_reduction

2025-07-10 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:31c96621cc307fed1a0a01c0c2f18afaaf50b256

commit r16-2168-g31c96621cc307fed1a0a01c0c2f18afaaf50b256
Author: Richard Biener 
Date:   Thu Jul 10 11:21:26 2025 +0200

Avoid vect_is_simple_use call from vectorizable_reduction

When analyzing the reduction cycle we determine the reduction input
vector type.  For lane-reducing ops we look at the input, but instead
of using vect_is_simple_use, which is problematic for SLP, we should
simply use the vector type of the SLP operand.  If that is not set and
we make one up, we should also record it on the SLP node so that it
stays in effect.

* tree-vect-loop.cc (vectorizable_reduction): Avoid
vect_is_simple_use and record a vector type if we come
up with one.

Diff:
---
 gcc/tree-vect-loop.cc | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 7b260c34a846..8ea0f45d79fc 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7378,23 +7378,20 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 
  if (lane_reducing_op_p (op.code))
{
- enum vect_def_type dt;
- tree vectype_op;
-
  /* The last operand of lane-reducing operation is for
 reduction.  */
  gcc_assert (reduc_idx > 0 && reduc_idx == (int) op.num_ops - 1);
 
- if (!vect_is_simple_use (op.ops[0], loop_vinfo, &dt, &vectype_op))
-   return false;
-
+ slp_tree op_node = SLP_TREE_CHILDREN (slp_for_stmt_info)[0];
+ tree vectype_op = SLP_TREE_VECTYPE (op_node);
  tree type_op = TREE_TYPE (op.ops[0]);
-
  if (!vectype_op)
{
  vectype_op = get_vectype_for_scalar_type (loop_vinfo,
type_op);
- if (!vectype_op)
+ if (!vectype_op
+ || !vect_maybe_update_slp_op_vectype (op_node,
+   vectype_op))
return false;
}


[gcc r15-9949] Fix 'main' function in 'gcc.dg/builtin-dynamic-object-size-pr120780.c'

2025-07-10 Thread Siddhesh Poyarekar via Gcc-cvs
https://gcc.gnu.org/g:57eae2c32f2ce654053f5ce4b6fb4eb79381d7da

commit r15-9949-g57eae2c32f2ce654053f5ce4b6fb4eb79381d7da
Author: Thomas Schwinge 
Date:   Wed Jul 9 10:06:39 2025 +0200

Fix 'main' function in 'gcc.dg/builtin-dynamic-object-size-pr120780.c'

Fix-up for commit 72e85d46472716e670cbe6e967109473b8d12d38
"tree-optimization/120780: Support object size for containing objects".
'size_t sz' is unused here, and GCC/nvptx doesn't accept this:

spawn -ignore SIGHUP [...]/nvptx-none-run ./builtin-dynamic-object-size-pr120780.exe
error   : Prototype doesn't match for 'main' in 'input file 1 at offset 1924', first defined in 'input file 1 at offset 1924'
nvptx-run: cuLinkAddData failed: unknown error (CUDA_ERROR_UNKNOWN, 999)
FAIL: gcc.dg/builtin-dynamic-object-size-pr120780.c execution test

gcc/testsuite/
* gcc.dg/builtin-dynamic-object-size-pr120780.c: Fix 'main' function.

(cherry picked from commit c6ca6e57004653b787d2d6243fe5ee00cda8aad0)

Diff:
---
 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c
index 0d6593ec8289..12e6c29569c7 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c
@@ -207,7 +207,7 @@ test5 (size_t sz)
 }
 
 int
-main (size_t sz)
+main (void)
 {
   test1 (sizeof (struct container));
   test1 (sizeof (struct container) - sizeof (int));


[gcc(refs/users/omachota/heads/rtl-ssa-dce)] rtl-ssa-dce: format code

2025-07-10 Thread Ondrej Machota via Gcc-cvs
https://gcc.gnu.org/g:e5a639732d48e976af0466bb3721d0e0df3da8da

commit e5a639732d48e976af0466bb3721d0e0df3da8da
Author: Ondřej Machota 
Date:   Thu Jul 10 18:08:51 2025 +0200

rtl-ssa-dce: format code

Diff:
---
 gcc/dce.cc | 61 +++--
 1 file changed, 31 insertions(+), 30 deletions(-)

diff --git a/gcc/dce.cc b/gcc/dce.cc
index 67fb42541d84..4691901f56d6 100644
--- a/gcc/dce.cc
+++ b/gcc/dce.cc
@@ -1386,6 +1386,7 @@ private:
   sbitmap m_marked_phis;
 };
 
+// Return true if INSN cannot be deleted.
 bool
 rtl_ssa_dce::is_rtx_pattern_prelive (const_rtx insn)
 {
@@ -1393,7 +1394,7 @@ rtl_ssa_dce::is_rtx_pattern_prelive (const_rtx insn)
 {
 case PREFETCH:
 case UNSPEC:
-case TRAP_IF: /* testsuite/gcc.c-torture/execute/20020418-1.c */
+case TRAP_IF:
   return true;
 
 default:
@@ -1401,7 +1402,7 @@ rtl_ssa_dce::is_rtx_pattern_prelive (const_rtx insn)
 }
 }
 
-// Return true if an call INSN can be deleted
+// Return true if an call INSN can be deleted.
 bool
 rtl_ssa_dce::can_delete_call (const_rtx insn)
 {
@@ -1418,6 +1419,7 @@ rtl_ssa_dce::can_delete_call (const_rtx insn)
 && cfun->can_delete_dead_exceptions && insn_nothrow_p (insn);
 }
 
+// Return true if rtx INSN is prelive.
 bool
 rtl_ssa_dce::is_rtx_prelive (const_rtx insn)
 {
@@ -1461,33 +1463,34 @@ rtl_ssa_dce::is_rtx_prelive (const_rtx insn)
 }
 }
 
+// Return true if INSN is prelive - cannot be deleted.
 bool
 rtl_ssa_dce::is_prelive (insn_info *insn)
 {
   // Bb head and end contain artificial uses that we need to mark as prelive.
   // Debug instructions are also prelive, however, they are not added to the
-  // worklist
+  // worklist.
   if (insn->is_bb_head () || insn->is_bb_end () || insn->is_debug_insn ())
 return true;
 
-  // Phi instructions are never prelive
+  // Phi instructions are never prelive.
   if (insn->is_artificial ())
 return false;
 
-  gcc_assert (insn->is_real ());
+  gcc_checking_assert (insn->is_real ());
   for (def_info *def : insn->defs ())
 {
-  // The purpose of this pass is not to eliminate memory stores...
+  // The purpose of this pass is not to eliminate memory stores.
   if (def->is_mem ())
return true;
 
   gcc_checking_assert (def->is_reg ());
 
-  // We should not delete the frame pointer because of the dward2frame pass
+  // We should not delete the frame pointer because of the dward2frame pass.
   if (frame_pointer_needed && def->regno () == HARD_FRAME_POINTER_REGNUM)
return true;
 
-  // Skip clobbers, they are handled inside is_rtx_prelive
+  // Skip clobbers, they are handled inside is_rtx_prelive.
   if (def->kind () == access_kind::CLOBBER)
continue;
 
@@ -1509,7 +1512,7 @@ rtl_ssa_dce::is_prelive (insn_info *insn)
 // Mark SET as visited and return true if SET->insn() is not nullptr and SET
 // has not been visited. Otherwise return false.
 bool
-rtl_ssa_dce::mark_if_not_visited (const set_info *set)
+rtl_ssa_dce::mark_if_not_visited (set_info *set)
 {
   insn_info *insn = set->insn ();
   if (!insn)
@@ -1517,20 +1520,20 @@ rtl_ssa_dce::mark_if_not_visited (const set_info *set)
 
   if (insn->is_phi ())
 {
-  const phi_info *phi = static_cast<const phi_info *> (set);
-  auto uid = phi->uid ();
+  phi_info *phi = static_cast<phi_info *> (set);
+  unsigned int uid = phi->uid ();
 
   if (bitmap_bit_p (m_marked_phis, uid))
return false;
 
   bitmap_set_bit (m_marked_phis, uid);
   if (dump_file)
-   fprintf (dump_file, "Phi node %d:%d marked as live\n", set->regno (),
+   fprintf (dump_file, "Phi node %d in insn %d marked as live\n", uid,
 insn->uid ());
 }
   else
 {
-  auto uid = insn->uid ();
+  unsigned int uid = insn->uid ();
   if (m_marked.get_bit (uid))
return false;
 
@@ -1550,8 +1553,6 @@ rtl_ssa_dce::append_not_visited_sets (auto_vec &worklist,
 {
   for (use_info *use : uses)
 {
-  // This seems to be a good idea, however there is a problem is
-  // process_uses_of_deleted_def
   if (use->only_occurs_in_notes ())
continue;
 
@@ -1562,27 +1563,23 @@ rtl_ssa_dce::append_not_visited_sets (auto_vec &worklist,
   if (!mark_if_not_visited (parent_set))
continue;
 
-  // mark_if_not_visited will return false if insn is nullptr
-  // insn_info *insn = parent_set->insn ();
-  // gcc_checking_assert (insn);
-
-  // if (dump_file)
-  //   fprintf (dump_file, "Adding insn %d to worklist\n", insn->uid ());
   worklist.safe_push (parent_set);
 }
 }
 
-// Mark INSN and add its uses to WORKLIST if INSN is not a debug instruction
+// Mark INSN and add its uses to WORKLIST if INSN is not a debug instruction.
 void
 rtl_ssa_dce::mark_prelive_insn (insn_info *insn, auto_vec 
&worklist)
 {
   if (dump_file)
 fprintf (dump_file, "Insn %d marked as prelive\n", insn->uid ());
 
-  // A phi node will never be pre

[gcc r16-2165] aarch64: PR target/120999: Adjust operands for movprfx alternative of NBSL implementation of NOR

2025-07-10 Thread Kyrylo Tkachov via Gcc-cvs
https://gcc.gnu.org/g:b7bd72ce71df5266e7a7039da318e49862389a72

commit r16-2165-gb7bd72ce71df5266e7a7039da318e49862389a72
Author: Kyrylo Tkachov 
Date:   Wed Jul 9 10:04:01 2025 -0700

aarch64: PR target/120999: Adjust operands for movprfx alternative of NBSL implementation of NOR

While the SVE2 NBSL instruction accepts MOVPRFX to add more flexibility
due to its tied operands, the destination of the movprfx cannot also be
a source operand.  But the offending pattern in aarch64-sve2.md tries
to do exactly that for the "=?&w,w,w" alternative and gas warns for the
attached testcase.

This patch adjusts that alternative to avoid taking operand 0 as an input
in the NBSL again.

So for the testcase in the patch we now generate:
nor_z:
movprfx z0, z1
nbsl z0.d, z0.d, z2.d, z1.d
ret

instead of the previous:
nor_z:
movprfx z0, z1
nbsl z0.d, z0.d, z2.d, z0.d
ret

which generated a gas warning.
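
The reason the corrected operand choice computes NOR can be checked
with a scalar model.  This is a sketch only: model_bsl, model_nbsl and
nor_via_nbsl are hypothetical helper names, and the model assumes the
architectural definition of the bitwise select, which takes bits from
the first source where the selector bit is 1 and from the second source
where it is 0, with NBSL inverting that result.

```c
#include <stdint.h>

/* Scalar model of BSL: bits of OP1 where SEL is 1, bits of OP2 where
   SEL is 0.  */
uint64_t
model_bsl (uint64_t op1, uint64_t op2, uint64_t sel)
{
  return (op1 & sel) | (op2 & ~sel);
}

/* Scalar model of NBSL: the inverted bitwise select.  */
uint64_t
model_nbsl (uint64_t op1, uint64_t op2, uint64_t sel)
{
  return ~model_bsl (op1, op2, sel);
}

/* Model of the fixed sequence, with A in z1 and B in z2.  */
uint64_t
nor_via_nbsl (uint64_t a, uint64_t b)
{
  uint64_t z0 = a;		/* movprfx z0, z1 */
  return model_nbsl (z0, b, a);	/* nbsl z0.d, z0.d, z2.d, z1.d */
}
```

With the selector equal to A, the select yields A | (B & ~A), which is
A | B, so the inverted result is ~(A | B).  The selector operand is
read from z1 rather than from the movprfx destination z0.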

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov 

gcc/

PR target/120999
* config/aarch64/aarch64-sve2.md (*aarch64_sve2_nor):
Adjust movprfx alternative.

gcc/testsuite/

PR target/120999
* gcc.target/aarch64/sve2/pr120999.c: New test.

Diff:
---
 gcc/config/aarch64/aarch64-sve2.md   |  2 +-
 gcc/testsuite/gcc.target/aarch64/sve2/pr120999.c | 17 +
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index 62524f36de65..789ec0dd1a3c 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -1628,7 +1628,7 @@
   "TARGET_SVE2"
   {@ [ cons: =0 , %1 , 2 ; attrs: movprfx ]
  [ w, 0  , w ; *  ] nbsl\t%0.d, %0.d, %2.d, %0.d
- [ ?&w  , w  , w ; yes] movprfx\t%0, %1\;nbsl\t%0.d, %0.d, %2.d, %0.d
+ [ ?&w  , w  , w ; yes] movprfx\t%0, %1\;nbsl\t%0.d, %0.d, %2.d, %1.d
   }
   "&& !CONSTANT_P (operands[3])"
   {
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr120999.c b/gcc/testsuite/gcc.target/aarch64/sve2/pr120999.c
new file mode 100644
index ..2dca36aea228
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr120999.c
@@ -0,0 +1,17 @@
+/* PR target/120999.  */
+/* { dg-do assemble } */
+/* { dg-options "-O2 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <arm_sve.h>
+
+#define NOR(x, y)   (~((x) | (y)))
+
+/*
+** nor_z:
+** movprfx z0, z1
+** nbslz0.d, z0.d, z2.d, z1.d
+** ret
+*/
+svuint64_t nor_z(svuint64_t c, svuint64_t a, svuint64_t b) { return NOR(a, b); }
+


[gcc r15-9948] tree-optimization/120780: Support object size for containing objects

2025-07-10 Thread Siddhesh Poyarekar via Gcc-cvs
https://gcc.gnu.org/g:63c4d4f59a92007c6d0f35e4d7aa1a97691306db

commit r15-9948-g63c4d4f59a92007c6d0f35e4d7aa1a97691306db
Author: Siddhesh Poyarekar 
Date:   Thu Jun 26 17:46:00 2025 -0400

tree-optimization/120780: Support object size for containing objects

MEM_REF cast of a subobject to its containing object has negative
offsets, which objsz sees as an invalid access.  Support this use case
by peeking into the structure to validate that the containing object
indeed contains a type of the subobject at that offset and if present,
adjust the wholesize for the object to allow the negative offset.

gcc/ChangeLog:

PR tree-optimization/120780
* tree-object-size.cc (inner_at_offset,
get_wholesize_for_memref): New functions.
(addr_object_size): Call get_wholesize_for_memref.

gcc/testsuite/ChangeLog:

PR tree-optimization/120780
* gcc.dg/builtin-dynamic-object-size-pr120780.c: New test case.

Signed-off-by: Siddhesh Poyarekar 
(cherry picked from commit 72e85d46472716e670cbe6e967109473b8d12d38)

Diff:
---
 .../gcc.dg/builtin-dynamic-object-size-pr120780.c  | 233 +
 gcc/tree-object-size.cc|  90 +++-
 2 files changed, 322 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c
new file mode 100644
index ..0d6593ec8289
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c
@@ -0,0 +1,233 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+typedef __SIZE_TYPE__ size_t;
+#define NUM_MCAST_RATE 6
+
+#define MIN(a,b) ((a) < (b) ? (a) : (b))
+#define MAX(a,b) ((a) > (b) ? (a) : (b))
+
+struct inner
+{
+  int dummy[4];
+};
+
+struct container
+{
+  int mcast_rate[NUM_MCAST_RATE];
+  struct inner mesh;
+};
+
+static void
+test1_child (struct inner *ifmsh, size_t expected)
+{ 
+  struct container *sdata =
+(struct container *) ((void *) ifmsh
+ - __builtin_offsetof (struct container, mesh));
+
+  if (__builtin_dynamic_object_size (sdata->mcast_rate, 1)
+  != sizeof (sdata->mcast_rate))
+FAIL ();
+
+  if (__builtin_dynamic_object_size (&sdata->mesh, 1) != expected)
+FAIL ();
+}
+
+void
+__attribute__((noinline))
+test1 (size_t sz)
+{
+  struct container *sdata = __builtin_malloc (sz);
+  struct inner *ifmsh = &sdata->mesh;
+
+  test1_child (ifmsh,
+  (sz > sizeof (sdata->mcast_rate)
+   ? sz - sizeof (sdata->mcast_rate) : 0));
+
+  __builtin_free (sdata);
+}
+
+struct container2
+{
+  int mcast_rate[NUM_MCAST_RATE];
+  union
+{
+  int dummy;
+  double dbl;
+  struct inner mesh;
+} u;
+};
+
+static void
+test2_child (struct inner *ifmsh, size_t sz)
+{ 
+  struct container2 *sdata =
+(struct container2 *) ((void *) ifmsh
+  - __builtin_offsetof (struct container2, u.mesh));
+
+  if (__builtin_dynamic_object_size (sdata->mcast_rate, 1)
+  != sizeof (sdata->mcast_rate))
+FAIL ();
+
+  size_t diff = sizeof (*sdata) - sz;
+  size_t expected = MIN(sizeof (double), MAX (sizeof (sdata->u), diff) - diff);
+
+  if (__builtin_dynamic_object_size (&sdata->u.dbl, 1) != expected)
+FAIL ();
+
+  expected = MAX (sizeof (sdata->u.mesh), diff) - diff;
+  if (__builtin_dynamic_object_size (&sdata->u.mesh, 1) != expected)
+FAIL ();
+}
+
+void
+__attribute__((noinline))
+test2 (size_t sz)
+{
+  struct container2 *sdata = __builtin_malloc (sz);
+  struct inner *ifmsh = &sdata->u.mesh;
+
+  test2_child (ifmsh, sz);;
+
+  __builtin_free (sdata);
+}
+
+struct container3
+{
+  int mcast_rate[NUM_MCAST_RATE];
+  char mesh[8];
+};
+
+static void
+test3_child (char ifmsh[], size_t expected)
+{ 
+  struct container3 *sdata =
+(struct container3 *) ((void *) ifmsh
+  - __builtin_offsetof (struct container3, mesh));
+
+  if (__builtin_dynamic_object_size (sdata->mcast_rate, 1)
+  != sizeof (sdata->mcast_rate))
+FAIL ();
+
+  if (__builtin_dynamic_object_size (sdata->mesh, 1) != expected)
+FAIL ();
+}
+
+void
+__attribute__((noinline))
+test3 (size_t sz)
+{
+  struct container3 *sdata = __builtin_malloc (sz);
+  char *ifmsh = sdata->mesh;
+  size_t diff = sizeof (*sdata) - sz;
+
+  test3_child (ifmsh, MAX(sizeof (sdata->mesh), diff) - diff);
+
+  __builtin_free (sdata);
+}
+
+
+struct container4
+{
+  int mcast_rate[NUM_MCAST_RATE];
+  struct
+{
+  int dummy;
+  struct inner mesh;
+} s;
+};
+
+static void
+test4_child (struct inner *ifmsh, size_t expected)
+{ 
+  struct container4 *sdata =
+(struct container4 *) ((void *) ifmsh
+  - __builtin_offsetof (struct container4, s.mesh));
+
+
+  if (__builtin_dynamic_object_size (sdata->mcast_rate, 1)
+  != sizeof (sdata->mcast_ra

[gcc r16-2166] Pass SLP node down to cost hook for reduction cost

2025-07-10 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:b57c6b5d27dd1840e9d466a5717476280287a322

commit r16-2166-gb57c6b5d27dd1840e9d466a5717476280287a322
Author: Richard Biener 
Date:   Thu Jul 10 10:08:23 2025 +0200

Pass SLP node down to cost hook for reduction cost

The following arranges vector reduction costs to hand down the
SLP node (of the reduction stmt) to the cost hooks, not only the
stmt_info.  This also avoids accessing STMT_VINFO_VECTYPE of an
unrelated stmt to the node that is subject to code generation.

* tree-vect-loop.cc (vect_model_reduction_cost): Get SLP
node instead of stmt_info and use that when recording costs.

Diff:
---
 gcc/tree-vect-loop.cc | 37 +++--
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 6f9765b54594..7b260c34a846 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5001,7 +5001,7 @@ vect_is_emulated_mixed_dot_prod (stmt_vec_info stmt_info)
 
 static void
 vect_model_reduction_cost (loop_vec_info loop_vinfo,
-  stmt_vec_info stmt_info, internal_fn reduc_fn,
+  slp_tree node, internal_fn reduc_fn,
   vect_reduction_type reduction_type,
   int ncopies, stmt_vector_for_cost *cost_vec)
 {
@@ -5017,9 +5017,10 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
   if (reduction_type == COND_REDUCTION)
 ncopies *= 2;
 
-  vectype = STMT_VINFO_VECTYPE (stmt_info);
+  vectype = SLP_TREE_VECTYPE (node);
   mode = TYPE_MODE (vectype);
-  stmt_vec_info orig_stmt_info = vect_orig_stmt (stmt_info);
+  stmt_vec_info orig_stmt_info
+= vect_orig_stmt (SLP_TREE_REPRESENTATIVE (node));
 
   gimple_match_op op;
   if (!gimple_extract_op (orig_stmt_info->stmt, &op))
@@ -5037,16 +5038,16 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
   if (reduc_fn != IFN_LAST)
/* Count one reduction-like operation per vector.  */
inside_cost = record_stmt_cost (cost_vec, ncopies, vec_to_scalar,
-   stmt_info, 0, vect_body);
+   node, 0, vect_body);
   else
{
  /* Use NELEMENTS extracts and NELEMENTS scalar ops.  */
  unsigned int nelements = ncopies * vect_nunits_for_cost (vectype);
  inside_cost = record_stmt_cost (cost_vec, nelements,
- vec_to_scalar, stmt_info, 0,
+ vec_to_scalar, node, 0,
  vect_body);
  inside_cost += record_stmt_cost (cost_vec, nelements,
-  scalar_stmt, stmt_info, 0,
+  scalar_stmt, node, 0,
   vect_body);
}
 }
@@ -5063,7 +5064,7 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
/* We need the initial reduction value.  */
prologue_stmts = 1;
   prologue_cost += record_stmt_cost (cost_vec, prologue_stmts,
-scalar_to_vec, stmt_info, 0,
+scalar_to_vec, node, 0,
 vect_prologue);
 }
 
@@ -5080,24 +5081,24 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
{
  /* An EQ stmt and an COND_EXPR stmt.  */
  epilogue_cost += record_stmt_cost (cost_vec, 2,
-vector_stmt, stmt_info, 0,
+vector_stmt, node, 0,
 vect_epilogue);
  /* Reduction of the max index and a reduction of the found
 values.  */
  epilogue_cost += record_stmt_cost (cost_vec, 2,
-vec_to_scalar, stmt_info, 0,
+vec_to_scalar, node, 0,
 vect_epilogue);
  /* A broadcast of the max value.  */
  epilogue_cost += record_stmt_cost (cost_vec, 1,
-scalar_to_vec, stmt_info, 0,
+scalar_to_vec, node, 0,
 vect_epilogue);
}
  else
{
  epilogue_cost += record_stmt_cost (cost_vec, 1, vector_stmt,
-stmt_info, 0, vect_epilogue);
+node, 0, vect_epilogue);
  epilogue_cost += record_stmt_cost (cost_vec, 1,
-vec_to_scalar, stmt_info, 0,
+vec_to_scalar, node, 0,

[gcc r16-2170] Handle failed gcond pattern gracefully

2025-07-10 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:2f2e9bcfb0fd9cbf46e2d0d03b3f32f7df8d4fff

commit r16-2170-g2f2e9bcfb0fd9cbf46e2d0d03b3f32f7df8d4fff
Author: Richard Biener 
Date:   Thu Jul 10 11:26:04 2025 +0200

Handle failed gcond pattern gracefully

SLP analysis of early break conditions asserts pattern recognition
canonicalized all of them.  But the pattern can fail, for example
when vector types cannot be computed.  So bail out gracefully here
instead of ICEing when vector types have not been computed yet.
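
The change from asserting to bailing out can be sketched in isolation.
This is a minimal illustration only, not the vectorizer code: Cond,
CondCode, is_canonical and count_analyzable are hypothetical stand-ins
for the GIMPLE condition and the checks on it.

```cpp
#include <vector>

// Hypothetical stand-in for the parts of a gcond that the check inspects.
enum class CondCode { NE, EQ, LT };

struct Cond
{
  CondCode code;     // comparison code of the condition
  bool lhs_is_bool;  // whether the left operand has boolean type
  long rhs;          // right operand, modelled as an integer constant
};

// True when the condition has the canonical "bool != 0" shape that the
// pattern recognizer is expected to produce.
static bool
is_canonical (const Cond &c)
{
  return c.code == CondCode::NE && c.lhs_is_bool && c.rhs == 0;
}

// Count the conditions that would go on to SLP analysis.  Non-canonical
// ones are skipped instead of tripping an assertion.
static int
count_analyzable (const std::vector<Cond> &conds)
{
  int n = 0;
  for (const Cond &c : conds)
    {
      if (!is_canonical (c))
        continue;  // bail out for this condition only
      ++n;
    }
  return n;
}
```

Skipping one condition keeps the rest of the analysis going, whereas an
assertion would abort the whole compilation.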

* tree-vect-slp.cc (vect_analyze_slp): Fail for non-canonical
gconds.

Diff:
---
 gcc/tree-vect-slp.cc | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 5ef45fd60f57..ad75386926a8 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -5068,9 +5068,15 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size,
  tree args0 = gimple_cond_lhs (stmt);
  tree args1 = gimple_cond_rhs (stmt);
 
- /* These should be enforced by cond lowering.  */
- gcc_assert (gimple_cond_code (stmt) == NE_EXPR);
- gcc_assert (zerop (args1));
+ /* These should be enforced by cond lowering, but if it failed
+bail.  */
+ if (gimple_cond_code (stmt) != NE_EXPR
+ || TREE_TYPE (args0) != boolean_type_node
+ || !integer_zerop (args1))
+   {
+ roots.release ();
+ continue;
+   }
 
  /* An argument without a loop def will be codegened from vectorizing the
 root gcond itself.  As such we don't need to try to build an SLP tree


[gcc r16-2169] Adjust reduction with conversion SLP build

2025-07-10 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:2b99395c312883ccf114476347a7f5174fde436d

commit r16-2169-g2b99395c312883ccf114476347a7f5174fde436d
Author: Richard Biener 
Date:   Thu Jul 10 11:23:59 2025 +0200

Adjust reduction with conversion SLP build

The following adjusts how we set SLP_TREE_VECTYPE for the conversion
node we build when fixing up the reduction with conversion SLP instance.
This should probably see more TLC, but the following avoids relying
on STMT_VINFO_VECTYPE for this.

* tree-vect-slp.cc (vect_build_slp_instance): Do not use
STMT_VINFO_VECTYPE to determine the vector type of the conversion
back to the reduction IV.

Diff:
---
 gcc/tree-vect-slp.cc | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 68ef1ddda77a..5ef45fd60f57 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4067,7 +4067,12 @@ vect_build_slp_instance (vec_info *vinfo,
  for (unsigned i = 0; i < group_size; ++i)
scalar_stmts.quick_push (next_info);
  slp_tree conv = vect_create_new_slp_node (scalar_stmts, 1);
- SLP_TREE_VECTYPE (conv) = STMT_VINFO_VECTYPE (next_info);
+ SLP_TREE_VECTYPE (conv)
+   = get_vectype_for_scalar_type (vinfo,
+  TREE_TYPE
+(gimple_assign_lhs
+  (scalar_def)),
+  group_size);
  SLP_TREE_CHILDREN (conv).quick_push (node);
  SLP_INSTANCE_TREE (new_instance) = conv;
  /* We also have to fake this conversion stmt as SLP reduction


[gcc r16-2163] Remove dead code dealing with non-SLP

2025-07-10 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:18c48295afb424bfc5c1fbb812e68119e9eb4ccb

commit r16-2163-g18c48295afb424bfc5c1fbb812e68119e9eb4ccb
Author: Richard Biener 
Date:   Thu Jul 10 09:44:50 2025 +0200

Remove dead code dealing with non-SLP

After vect_analyze_loop_operations is gone we can clean up
vect_analyze_stmt as it is no longer called out of SLP context.

* tree-vectorizer.h (vect_analyze_stmt): Remove stmt_info
and need_to_vectorize arguments.
* tree-vect-slp.cc (vect_slp_analyze_node_operations_1):
Adjust.
* tree-vect-stmts.cc (can_vectorize_live_stmts): Remove
stmt_info argument and remove non-SLP path.
(vect_analyze_stmt): Remove stmt_info and need_to_vectorize
argument and prune paths no longer reachable.
(vect_transform_stmt): Adjust.

Diff:
---
 gcc/tree-vect-slp.cc   |   6 +-
 gcc/tree-vect-stmts.cc | 180 ++---
 gcc/tree-vectorizer.h  |   3 +-
 3 files changed, 38 insertions(+), 151 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index f97a3635cff1..68ef1ddda77a 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -7898,8 +7898,6 @@ vect_slp_analyze_node_operations_1 (vec_info *vinfo, slp_tree node,
slp_instance node_instance,
stmt_vector_for_cost *cost_vec)
 {
-  stmt_vec_info stmt_info = SLP_TREE_REPRESENTATIVE (node);
-
   /* Calculate the number of vector statements to be created for the scalar
  stmts in this node.  It is the number of scalar elements in one scalar
  iteration (DR_GROUP_SIZE) multiplied by VF divided by the number of
@@ -7928,9 +7926,7 @@ vect_slp_analyze_node_operations_1 (vec_info *vinfo, slp_tree node,
   return true;
 }
 
-  bool dummy;
-  return vect_analyze_stmt (vinfo, stmt_info, &dummy,
-   node, node_instance, cost_vec);
+  return vect_analyze_stmt (vinfo, node, node_instance, cost_vec);
 }
 
 static int
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 081dd653fd46..e5971e4a357b 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -13186,37 +13186,27 @@ vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
VEC_STMT_P is as for vectorizable_live_operation.  */
 
 static bool
-can_vectorize_live_stmts (vec_info *vinfo, stmt_vec_info stmt_info,
+can_vectorize_live_stmts (vec_info *vinfo,
  slp_tree slp_node, slp_instance slp_node_instance,
  bool vec_stmt_p,
  stmt_vector_for_cost *cost_vec)
 {
   loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
-  if (slp_node)
-{
-  stmt_vec_info slp_stmt_info;
-  unsigned int i;
-  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, slp_stmt_info)
-   {
- if (slp_stmt_info
- && (STMT_VINFO_LIVE_P (slp_stmt_info)
- || (loop_vinfo
- && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
- && STMT_VINFO_DEF_TYPE (slp_stmt_info)
- == vect_induction_def))
- && !vectorizable_live_operation (vinfo, slp_stmt_info, slp_node,
-  slp_node_instance, i,
-  vec_stmt_p, cost_vec))
-   return false;
-   }
+  stmt_vec_info slp_stmt_info;
+  unsigned int i;
+  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, slp_stmt_info)
+{
+  if (slp_stmt_info
+ && (STMT_VINFO_LIVE_P (slp_stmt_info)
+ || (loop_vinfo
+ && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+ && STMT_VINFO_DEF_TYPE (slp_stmt_info)
+ == vect_induction_def))
+ && !vectorizable_live_operation (vinfo, slp_stmt_info, slp_node,
+  slp_node_instance, i,
+  vec_stmt_p, cost_vec))
+   return false;
 }
-  else if ((STMT_VINFO_LIVE_P (stmt_info)
-   || (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
-   && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def))
-  && !vectorizable_live_operation (vinfo, stmt_info,
-   slp_node, slp_node_instance, -1,
-   vec_stmt_p, cost_vec))
-return false;
 
   return true;
 }
@@ -13225,115 +13215,42 @@ can_vectorize_live_stmts (vec_info *vinfo, stmt_vec_info stmt_info,
 
 opt_result
 vect_analyze_stmt (vec_info *vinfo,
-  stmt_vec_info stmt_info, bool *need_to_vectorize,
   slp_tree node, slp_instance node_instance,
   stmt_vector_for_cost *cost_vec)
 {
+  stmt_vec_info stmt_info = SLP_TREE_REPRESENTATIVE (node);
   bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo);
   enum vect_relevant relevance = STMT_VINFO_RELEVANT (stmt_

[gcc r16-2161] Change bellow in comments to below

2025-07-10 Thread Jakub Jelinek via Libstdc++-cvs
https://gcc.gnu.org/g:0931cea0e3a67e6a17790aeb676c793bccb2039a

commit r16-2161-g0931cea0e3a67e6a17790aeb676c793bccb2039a
Author: Jakub Jelinek 
Date:   Thu Jul 10 10:16:43 2025 +0200

Change bellow in comments to below

While I'm not a native English speaker, I believe all the uses
of bellow (roar/bark/...) in comments in gcc are meant to be
below (beneath/under/...).

2025-07-10  Jakub Jelinek  

gcc/
* tree-vect-loop.cc (scale_profile_for_vect_loop): Comment
spelling fix: bellow -> below.
* ipa-polymorphic-call.cc (record_known_type): Likewise.
* config/i386/x86-tune.def: Likewise.
* config/riscv/vector.md (*vsetvldi_no_side_effects_si_extend):
Likewise.
* tree-scalar-evolution.cc (iv_can_overflow_p): Likewise.
* ipa-devirt.cc (add_type_duplicate): Likewise.
* tree-ssa-loop-niter.cc (maybe_lower_iteration_bound): Likewise.
* gimple-ssa-sccopy.cc: Likewise.
* cgraphunit.cc: Likewise.
* graphite.h (struct poly_dr): Likewise.
* ipa-reference.cc (ignore_edge_p): Likewise.
* tree-ssa-alias.cc (ao_compare::compare_ao_refs): Likewise.
* profile-count.h (profile_probability::probably_reliable_p):
Likewise.
* ipa-inline-transform.cc (inline_call): Likewise.
gcc/ada/
* par-load.adb: Comment spelling fix: bellow -> below.
* libgnarl/s-taskin.ads: Likewise.
gcc/testsuite/
* gfortran.dg/g77/980310-3.f: Comment spelling fix: bellow -> below.
* jit.dg/test-debuginfo.c: Likewise.
libstdc++-v3/
* testsuite/22_locale/codecvt/codecvt_unicode.h
(ucs2_to_utf8_out_error): Comment spelling fix: bellow -> below.
(utf16_to_ucs2_in_error): Likewise.

Diff:
---
 gcc/ada/libgnarl/s-taskin.ads  | 2 +-
 gcc/ada/par-load.adb   | 2 +-
 gcc/cgraphunit.cc  | 2 +-
 gcc/config/i386/x86-tune.def   | 2 +-
 gcc/config/riscv/vector.md | 2 +-
 gcc/gimple-ssa-sccopy.cc   | 2 +-
 gcc/graphite.h | 2 +-
 gcc/ipa-devirt.cc  | 2 +-
 gcc/ipa-inline-transform.cc| 2 +-
 gcc/ipa-polymorphic-call.cc| 2 +-
 gcc/ipa-reference.cc   | 2 +-
 gcc/profile-count.h| 2 +-
 gcc/testsuite/gfortran.dg/g77/980310-3.f   | 2 +-
 gcc/testsuite/jit.dg/test-debuginfo.c  | 2 +-
 gcc/tree-scalar-evolution.cc   | 2 +-
 gcc/tree-ssa-alias.cc  | 2 +-
 gcc/tree-ssa-loop-niter.cc | 2 +-
 gcc/tree-vect-loop.cc  | 2 +-
 libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.h | 4 ++--
 19 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/gcc/ada/libgnarl/s-taskin.ads b/gcc/ada/libgnarl/s-taskin.ads
index d68e199e6262..dbf2e7bf91ec 100644
--- a/gcc/ada/libgnarl/s-taskin.ads
+++ b/gcc/ada/libgnarl/s-taskin.ads
@@ -390,7 +390,7 @@ package System.Tasking is
System_Domain : Dispatching_Domain_Access;
--  All processors belong to default system dispatching domain at start up.
--  We use a pointer which creates the actual variable for the reasons
-   --  explained bellow in Dispatching_Domain_Tasks.
+   --  explained below in Dispatching_Domain_Tasks.
 
Dispatching_Domains_Frozen : Boolean := False;
--  True when the main procedure has been called. Hence, no new dispatching
diff --git a/gcc/ada/par-load.adb b/gcc/ada/par-load.adb
index 96fa7e85938d..4a97f14ffb51 100644
--- a/gcc/ada/par-load.adb
+++ b/gcc/ada/par-load.adb
@@ -83,7 +83,7 @@ procedure Load is
--  withed units and the second round handles Ada 2005 limited-withed units.
--  This is required to allow the low-level circuitry that detects circular
--  dependencies of units the correct notification of errors (see comment
-   --  bellow). This variable is used to indicate that the second round is
+   --  below). This variable is used to indicate that the second round is
--  required.
 
function Same_File_Name_Except_For_Case
diff --git a/gcc/cgraphunit.cc b/gcc/cgraphunit.cc
index fa54a59d02b8..8e8d85562b03 100644
--- a/gcc/cgraphunit.cc
+++ b/gcc/cgraphunit.cc
@@ -63,7 +63,7 @@ along with GCC; see the file COPYING3.  If not see
   final assembler is generated.  This is done in the following way. Note
   that with link time optimization the process is split into three
   stages (compile time, linktime analysis and parallel linktime as
-  indicated bellow)

[gcc r16-2162] Comment spelling fix: tunning -> tuning

2025-07-10 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:60a7c817d2deb640e9649825a8e4e05293a7ba2d

commit r16-2162-g60a7c817d2deb640e9649825a8e4e05293a7ba2d
Author: Jakub Jelinek 
Date:   Thu Jul 10 10:23:31 2025 +0200

Comment spelling fix: tunning -> tuning

Kyrylo noticed another spelling bug and, as usual, the same mistake
occurs in multiple places.

2025-07-10  Jakub Jelinek  

* config/i386/x86-tune.def: Change "Tunning the" to "tuning" in
comment and use semicolon instead of dot in comment.
* loop-unroll.cc (decide_unroll_stupid): Comment spelling fix,
tunning -> tuning.

Diff:
---
 gcc/config/i386/x86-tune.def | 2 +-
 gcc/loop-unroll.cc   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index a039db3cfced..a86cbad281c1 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -31,7 +31,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
- Updating ix86_issue_rate and ix86_adjust_cost in i386.md
- possibly updating ia32_multipass_dfa_lookahead, ix86_sched_reorder
  and ix86_sched_init_global if those tricks are needed.
-- Tunning the flags below. Those are split into sections and each
+- tuning flags below; those are split into sections and each
   section is very roughly ordered by importance.  */
 
 /*/
diff --git a/gcc/loop-unroll.cc b/gcc/loop-unroll.cc
index 6149cecb28de..c80a6cb6cd0c 100644
--- a/gcc/loop-unroll.cc
+++ b/gcc/loop-unroll.cc
@@ -1185,7 +1185,7 @@ decide_unroll_stupid (class loop *loop, int flags)
 
   /* Do not unroll loops with branches inside -- it increases number
  of mispredicts.
- TODO: this heuristic needs tunning; call inside the loop body
+ TODO: this heuristic needs tuning; call inside the loop body
  is also relatively good reason to not unroll.  */
   if (num_loop_branches (loop) > 1)
 {


[gcc r15-9947] aarch64: Add support for NVIDIA GB10

2025-07-10 Thread Kyrylo Tkachov via Gcc-cvs
https://gcc.gnu.org/g:aad37494dc0b96e95501190b93a32ff7c85debfc

commit r15-9947-gaad37494dc0b96e95501190b93a32ff7c85debfc
Author: Kyrylo Tkachov 
Date:   Mon Jun 2 07:08:12 2025 -0700

aarch64: Add support for NVIDIA GB10

This adds support for -mcpu=gb10.  This is a big.LITTLE configuration
involving Cortex-X925 and Cortex-A725 cores.  The appropriate MIDR numbers
are added to detect them in -mcpu=native.  We did not add an
-mcpu=cortex-x925.cortex-a725 option because GB10 does include the crypto
instructions which we want on by default, and the current convention is to
not enable such extensions for Arm Cortex cores in -mcpu where they are
optional in the IP.

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov 

gcc/

* config/aarch64/aarch64-cores.def (gb10): New entry.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (AArch64 Options): Document the above.

(cherry picked from commit 9ff6ade24cae5a51d1ee9d9ad4b4a5c682e4a5ed)

Diff:
---
 gcc/config/aarch64/aarch64-cores.def | 3 +++
 gcc/config/aarch64/aarch64-tune.md   | 2 +-
 gcc/doc/invoke.texi  | 2 +-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 24b7cd362aaf..8040409d2830 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -226,6 +226,9 @@ AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG,
 /* NVIDIA ('N') cores. */
 AARCH64_CORE("olympus", olympus, cortexa57, V9_2A, (SVE2_BITPERM, RNG, LS64, MEMTAG, PROFILE, FAMINMAX, FP8FMA, FP8DOT2, FP8DOT4, LUT, SVE2_AES, SVE2_SHA3, SVE2_SM4), neoversev3, 0x4e, 0x10, -1)
 
+/* Armv9-A big.LITTLE processors.  */
+AARCH64_CORE("gb10",  gb10, cortexa57, V9_2A,  (SVE2_BITPERM, SVE2_AES, SVE2_SHA3, SVE2_SM4, MEMTAG, PROFILE), cortexx925, 0x41, AARCH64_BIG_LITTLE (0xd85, 0xd87), -1)
+
 /* Generic Architecture Processors.  */
 AARCH64_CORE("generic",  generic, cortexa53, V8A,  (), generic, 0x0, 0x0, -1)
 AARCH64_CORE("generic-armv8-a",  generic_armv8_a, cortexa53, V8A, (), generic_armv8_a, 0x0, 0x0, -1)
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index 982074c2c21e..40ff147d6f83 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88,thunderxt88p1,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,fujitsu_monaka,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,oryon1,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexr82ae,applea12,applem1_0,applem1_1,applem1_2,applem1_3,applem2_0,applem2_1,applem2_2,applem2_3,applem3_0,cortexa510,cortexa520,cortexa520ae,cortexa710,cortexa715,cortexa720,cortexa720ae,cortexa725,cortexx2,cortexx3,cortexx4,cortexx925,neoversen2,cobalt100,neoversen3,neoversev2
 
,grace,neoversev3,neoversev3ae,demeter,olympus,generic,generic_armv8_a,generic_armv9_a"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88,thunderxt88p1,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,fujitsu_monaka,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,oryon1,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexr82ae,applea12,applem1_0,applem1_1,applem1_2,applem1_3,applem2_0,applem2_1,applem2_2,applem2_3,applem3_0,cortexa510,cortexa520,cortexa520ae,cortexa710,cortexa715,cortexa720,cortexa720ae,cortexa725,cortexx2,cortexx3,cortexx4,cortexx925,neoversen2,cobalt100,neoversen3,neoversev2
 
,grace,neoversev3,neoversev3ae,demeter,olympus,gb10,generic,generic_armv8_a,generic_armv9_a"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 14750aed64db..eca871b93d97 100644
--- a/gcc/doc/invoke.texi

[gcc] Created tag 'releases/gcc-12.5.0'

2025-07-10 Thread Richard Biener via Gcc-cvs
The signed tag 'releases/gcc-12.5.0' was created pointing to:

 c17d40bb3778... Update ChangeLog and version files for release

Tagger: Richard Biener 
Date: Fri Jul 11 06:34:05 2025 +

GCC 12.5.0 release


[gcc r16-2188] Stop updating gcc-12 branch

2025-07-10 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:14076f15bf618d8febd1e4c6a86995f057408de8

commit r16-2188-g14076f15bf618d8febd1e4c6a86995f057408de8
Author: Richard Biener 
Date:   Fri Jul 11 08:32:26 2025 +0200

Stop updating gcc-12 branch

contrib/
* gcc-changelog/git_update_version.py: Stop updating gcc-12
branch.

Diff:
---
 contrib/gcc-changelog/git_update_version.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/gcc-changelog/git_update_version.py b/contrib/gcc-changelog/git_update_version.py
index aa9adee58fef..b3ea33bb5161 100755
--- a/contrib/gcc-changelog/git_update_version.py
+++ b/contrib/gcc-changelog/git_update_version.py
@@ -85,7 +85,7 @@ def prepend_to_changelog_files(repo, folder, git_commit, add_to_git):
 repo.git.add(full_path)
 
 
-active_refs = ['master', 'releases/gcc-12',
+active_refs = ['master',
'releases/gcc-13', 'releases/gcc-14', 'releases/gcc-15']
 
 parser = argparse.ArgumentParser(description='Update DATESTAMP and generate '


[gcc r12-11261] Update ChangeLog and version files for release

2025-07-10 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:c17d40bb3778bca5e81595f033df9222b66658eb

commit r12-11261-gc17d40bb3778bca5e81595f033df9222b66658eb
Author: Richard Biener 
Date:   Fri Jul 11 06:33:59 2025 +

Update ChangeLog and version files for release

Diff:
---
 ChangeLog | 4 
 c++tools/ChangeLog| 4 
 config/ChangeLog  | 4 
 contrib/ChangeLog | 4 
 contrib/header-tools/ChangeLog| 4 
 contrib/reghunt/ChangeLog | 4 
 contrib/regression/ChangeLog  | 4 
 fixincludes/ChangeLog | 4 
 gcc/BASE-VER  | 2 +-
 gcc/ChangeLog | 4 
 gcc/ada/ChangeLog | 4 
 gcc/analyzer/ChangeLog| 4 
 gcc/c-family/ChangeLog| 4 
 gcc/c/ChangeLog   | 4 
 gcc/cp/ChangeLog  | 4 
 gcc/d/ChangeLog   | 4 
 gcc/fortran/ChangeLog | 4 
 gcc/go/ChangeLog  | 4 
 gcc/jit/ChangeLog | 4 
 gcc/lto/ChangeLog | 4 
 gcc/objc/ChangeLog| 4 
 gcc/objcp/ChangeLog   | 4 
 gcc/po/ChangeLog  | 4 
 gcc/testsuite/ChangeLog   | 4 
 gnattools/ChangeLog   | 4 
 gotools/ChangeLog | 4 
 include/ChangeLog | 4 
 intl/ChangeLog| 4 
 libada/ChangeLog  | 4 
 libatomic/ChangeLog   | 4 
 libbacktrace/ChangeLog| 4 
 libcc1/ChangeLog  | 4 
 libcody/ChangeLog | 4 
 libcpp/ChangeLog  | 4 
 libcpp/po/ChangeLog   | 4 
 libdecnumber/ChangeLog| 4 
 libffi/ChangeLog  | 4 
 libgcc/ChangeLog  | 4 
 libgcc/config/avr/libf7/ChangeLog | 4 
 libgcc/config/libbid/ChangeLog| 4 
 libgfortran/ChangeLog | 4 
 libgomp/ChangeLog | 4 
 libiberty/ChangeLog   | 4 
 libitm/ChangeLog  | 4 
 libobjc/ChangeLog | 4 
 liboffloadmic/ChangeLog   | 4 
 libphobos/ChangeLog   | 4 
 libquadmath/ChangeLog | 4 
 libsanitizer/ChangeLog| 4 
 libssp/ChangeLog  | 4 
 libstdc++-v3/ChangeLog| 4 
 libvtv/ChangeLog  | 4 
 lto-plugin/ChangeLog  | 4 
 maintainer-scripts/ChangeLog  | 4 
 zlib/ChangeLog| 4 
 55 files changed, 217 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index 3e0631117ef8..9ccf9972fcfa 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2025-07-11  Release Manager
+
+   * GCC 12.5.0 released.
+
 2024-06-20  Release Manager
 
* GCC 12.4.0 released.
diff --git a/c++tools/ChangeLog b/c++tools/ChangeLog
index 5e961ce0c226..dd92a5011707 100644
--- a/c++tools/ChangeLog
+++ b/c++tools/ChangeLog
@@ -1,3 +1,7 @@
+2025-07-11  Release Manager
+
+   * GCC 12.5.0 released.
+
 2025-04-20  John David Anglin  
 
PR other/107616
diff --git a/config/ChangeLog b/config/ChangeLog
index 40932d6a39af..44e0c840e1fa 100644
--- a/config/ChangeLog
+++ b/config/ChangeLog
@@ -1,3 +1,7 @@
+2025-07-11  Release Manager
+
+   * GCC 12.5.0 released.
+
 2024-06-20  Release Manager
 
* GCC 12.4.0 released.
diff --git a/contrib/ChangeLog b/contrib/ChangeLog
index c412d61b5795..1ee43d73955d 100644
--- a/contrib/ChangeLog
+++ b/contrib/ChangeLog
@@ -1,3 +1,7 @@
+2025-07-11  Release Manager
+
+   * GCC 12.5.0 released.
+
 2024-06-20  Release Manager
 
* GCC 12.4.0 released.
diff --git a/contrib/header-tools/ChangeLog b/contrib/header-tools/ChangeLog
index c834c0a87c16..0665fbfd0f91 100644
--- a/contrib/header-tools/ChangeLog
+++ b/contrib/header-tools/ChangeLog
@@ -1,3 +1,7 @@
+2025-07-11  Release Manager
+
+   * GCC 12.5.0 released.
+
 2024-06-20  Release Manager
 
* GCC 12.4.0 released.
diff --git a/contrib/reghunt/ChangeLog b/contrib/reghunt/ChangeLog
index 1de203aa1b06..8a77aeae4f56 100644
--- a/contrib/reghunt/ChangeLog
+++ b/contrib/reghunt/ChangeLog
@@ -1,3 +1,7 @@
+2025-07-11  Release Manager
+
+   * GCC 12.5.0 released.
+
 2024-06-20  Release Manager
 
* GCC 12.4.0 released.
diff --git a/contrib/regression/ChangeLog b/contrib/regression/ChangeLog
index fbea8904965c..6845969a18c2 100644
--- a/contrib/regression/ChangeLog
+++ b/contrib/regression/ChangeLog
@@ -1,3 +1,7 @@
+2025-07-11  Release Manager
+
+   * GCC 12.5.0 released.
+
 2024-06-20  Release Manager
 
* GCC 12.4.0 released.
diff --git a/fixincludes/ChangeLog b/fixincludes/ChangeLog
index d693ea068ae1..efc0bdf10c5c 100644
--- a/fixincludes/ChangeLog
+++ b/fixincludes/ChangeLog
@@ -1,3 +1,7 @@
+2025-07-11  Release Manager
+
+   * GCC 12.

[gcc r16-2185] c++: Fix up final handling in C++98 [PR120628]

2025-07-10 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:8f063b40e5b8f23cb89fee21afaa71deedbdf2aa

commit r16-2185-g8f063b40e5b8f23cb89fee21afaa71deedbdf2aa
Author: Jakub Jelinek 
Date:   Thu Jul 10 23:47:42 2025 +0200

c++: Fix up final handling in C++98 [PR120628]

The following patch is on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2025-June/686210.html
patch which stopped treating override as conditional keyword in
class properties.
This PR mentions another problem; we emit a bogus warning on code like
struct C {}; struct C final = {};
in C++98.  In this case we parse final as conditional keyword in C++
(including pedwarn) but the caller then immediately aborts the tentative
parse because it isn't followed by { nor (in some cases) : .
I think we certainly shouldn't pedwarn on it, and I think we shouldn't even
warn for it, say for -Wc++11-compat, because we don't actually treat the
identifier as a conditional keyword even in C++11 and later.
The patch only does this if final is the only class property conditional
keyword; if one uses
struct S __final final __final = {};
one gets the warning and duplicate diagnostics and later parsing errors.
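
For illustration, a small example of the kind of code discussed above,
with a data member added so the variable can be read back.  The names C,
value and read_final are made up for this sketch; it is not one of the
new testsuite files.

```cpp
// `final` is an ordinary identifier in C++98, so it can name a variable.
struct C { int value; };

// Declares a variable called `final` of type C.  In C++11 and later the
// spelling is only special in a class head when `{` or `:` follows it,
// so this line is a variable declaration there as well.
struct C final = { 42 };

// Read the variable back through its (unusual) name.
static int
read_final ()
{
  return final.value;
}
```

Since `=` follows `final` here, rather than `{` or `:`, the tentative
class-head parse is abandoned and the line is handled as a declaration.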

2025-07-10  Jakub Jelinek  

PR c++/120628
* parser.cc (cp_parser_elaborated_type_specifier): Use
cp_parser_nth_token_starts_class_definition_p with extra argument 1
instead of cp_parser_next_token_starts_class_definition_p.
(cp_parser_class_property_specifier_seq_opt): For final conditional
keyword in C++98 check if the token after it isn't
cp_parser_nth_token_starts_class_definition_p nor CPP_NAME and in
that case break without consuming it nor warning.
(cp_parser_class_head): Use
cp_parser_nth_token_starts_class_definition_p with extra argument 1
instead of cp_parser_next_token_starts_class_definition_p.
(cp_parser_next_token_starts_class_definition_p): Renamed to ...
(cp_parser_nth_token_starts_class_definition_p): ... this.  Add N
argument.  Use cp_lexer_peek_nth_token instead of
cp_lexer_peek_token.

* g++.dg/cpp0x/final1.C: New test.
* g++.dg/cpp0x/final2.C: New test.
* g++.dg/cpp0x/override6.C: New test.

Diff:
---
 gcc/cp/parser.cc   | 21 ++---
 gcc/testsuite/g++.dg/cpp0x/final1.C| 11 +++
 gcc/testsuite/g++.dg/cpp0x/final2.C| 26 ++
 gcc/testsuite/g++.dg/cpp0x/override6.C | 26 ++
 4 files changed, 77 insertions(+), 7 deletions(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 1f58425a70b6..21bec72c7961 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -3091,8 +3091,8 @@ static cp_token *cp_parser_require_keyword
   (cp_parser *, enum rid, required_token);
 static bool cp_parser_token_starts_function_definition_p
   (cp_token *);
-static bool cp_parser_next_token_starts_class_definition_p
-  (cp_parser *);
+static bool cp_parser_nth_token_starts_class_definition_p
+  (cp_parser *, size_t);
 static bool cp_parser_next_token_ends_template_argument_p
   (cp_parser *);
 static bool cp_parser_nth_token_starts_template_argument_list_p
@@ -22031,7 +22031,7 @@ cp_parser_elaborated_type_specifier (cp_parser* parser,
 
  bool template_p =
(template_parm_lists_apply
-&& (cp_parser_next_token_starts_class_definition_p (parser)
+&& (cp_parser_nth_token_starts_class_definition_p (parser, 1)
 || cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON)));
  /* An unqualified name was used to reference this type, so
 there were no qualifying templates.  */
@@ -28095,6 +28095,13 @@ cp_parser_class_property_specifier_seq_opt (cp_parser *parser)
break;
   if (id_equal (token->u.value, "final"))
{
+ /* For C++98, quietly ignore final in e.g.
+struct S final = 24;  */
+ if (cxx_dialect == cxx98
+ && virt_specifiers == VIRT_SPEC_UNSPECIFIED
+ && !cp_parser_nth_token_starts_class_definition_p (parser, 2)
+ && !cp_lexer_nth_token_is (parser->lexer, 2, CPP_NAME))
+   break;
  maybe_warn_cpp0x (CPP0X_OVERRIDE_CONTROLS);
  virt_specifier = VIRT_SPEC_FINAL;
}
@@ -28318,7 +28325,7 @@ cp_parser_class_head (cp_parser* parser,
  class-head, since a class-head only appears as part of a
  class-specifier.  We have to detect this situation before calling
  xref_tag, since that has irreversible side-effects.  */
-  if (!cp_parser_next_token_starts_class_definition_p (parser))
+  if (!cp_parser_nth_token_starts_class_definition_p (parser, 1))
 {
   cp_parser_error (parser, "expected %<{%> or %<:%>");
   type = error_mark_node;
@@ -35696,15 +35703,15 @@ cp_parser_token_starts_function_definition_p 
(cp_token* 

[gcc r16-2186] c++: Save 8 further bytes from lang_type allocations

2025-07-10 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:bdb0a6be69b3b3e8f94aa72a9263810a80cb9a5f

commit r16-2186-gbdb0a6be69b3b3e8f94aa72a9263810a80cb9a5f
Author: Jakub Jelinek 
Date:   Fri Jul 11 00:05:23 2025 +0200

c++: Save 8 further bytes from lang_type allocations

The following patch implements the
/* FIXME reuse another field?  */
comment on the lambda_expr member.
I think (and asserts in the patch seem to confirm) CLASSTYPE_KEY_METHOD
is only ever non-NULL for TYPE_POLYMORPHIC_P types, while
CLASSTYPE_LAMBDA_EXPR is only used on closure types, which are never
polymorphic.

So, the patch just uses one member for both, with the accessor macros
changed to be no longer lvalues and adding SET_* variants of the macros
for setters.
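
The idea of one member serving two mutually exclusive accessors can be
sketched outside of GCC.  This is a simplified model under stated
assumptions: TypeInfo, its fields and the four functions are
hypothetical, and plain strings stand in for trees.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model of a class-type record.
struct TypeInfo
{
  bool polymorphic;    // plays the role of TYPE_POLYMORPHIC_P
  const char *shared;  // key method if polymorphic, lambda otherwise
};

// Getters return by value, so they cannot be assigned through.
static const char *
key_method (const TypeInfo &t)
{
  return t.polymorphic ? t.shared : NULL;
}

static const char *
lambda_expr (const TypeInfo &t)
{
  return t.polymorphic ? NULL : t.shared;
}

// Setters check that the shared slot is used for the right purpose.
static void
set_key_method (TypeInfo &t, const char *v)
{
  assert (t.polymorphic);
  t.shared = v;
}

static void
set_lambda_expr (TypeInfo &t, const char *v)
{
  assert (!t.polymorphic);
  t.shared = v;
}
```

Separate setters are needed because the getters now yield a conditional
value rather than the member itself, so they can no longer appear on the
left-hand side of an assignment.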

2025-07-11  Jakub Jelinek  

* cp-tree.h (struct lang_type): Add comment before key_method.
Remove lambda_expr.
(CLASSTYPE_KEY_METHOD): Give NULL_TREE if not TYPE_POLYMORPHIC_P.
(SET_CLASSTYPE_KEY_METHOD): Define.
(CLASSTYPE_LAMBDA_EXPR): Give NULL_TREE if TYPE_POLYMORPHIC_P.
Use key_method member instead of lambda_expr.
(SET_CLASSTYPE_LAMBDA_EXPR): Define.
* class.cc (determine_key_method): Use SET_CLASSTYPE_KEY_METHOD
macro.
* decl.cc (xref_tag): Use SET_CLASSTYPE_LAMBDA_EXPR macro.
* lambda.cc (begin_lambda_type): Likewise.
* module.cc (trees_in::read_class_def): Use SET_CLASSTYPE_LAMBDA_EXPR
and SET_CLASSTYPE_KEY_METHOD macros, assert lambda is NULL if
TYPE_POLYMORPHIC_P and otherwise assert key_method is NULL.

Diff:
---
 gcc/cp/class.cc  |  2 +-
 gcc/cp/cp-tree.h | 19 +++
 gcc/cp/decl.cc   |  2 +-
 gcc/cp/lambda.cc |  2 +-
 gcc/cp/module.cc | 10 --
 5 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index 9a41c00788a8..151ee2bc4714 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -7452,7 +7452,7 @@ determine_key_method (tree type)
&& ! DECL_DECLARED_INLINE_P (method)
&& ! DECL_PURE_VIRTUAL_P (method))
   {
-   CLASSTYPE_KEY_METHOD (type) = method;
+   SET_CLASSTYPE_KEY_METHOD (type, method);
break;
   }
 
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 43705733d514..90816281224d 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -2519,11 +2519,11 @@ struct GTY(()) lang_type {
   vec<tree, va_gc> *pure_virtuals;
   tree friend_classes;
   vec<tree, va_gc> * GTY((reorder ("resort_type_member_vec"))) members;
+  /* CLASSTYPE_KEY_METHOD for TYPE_POLYMORPHIC_P types, CLASSTYPE_LAMBDA_EXPR
+ otherwise.  */
   tree key_method;
   tree decl_list;
   tree befriending_classes;
-  /* FIXME reuse another field?  */
-  tree lambda_expr;
   union maybe_objc_info {
 /* If not c_dialect_objc, this part is not even allocated.  */
 char GTY((tag ("0"))) non_objc;
@@ -2646,7 +2646,13 @@ struct GTY(()) lang_type {
 /* The member function with which the vtable will be emitted:
the first noninline non-pure-virtual member function.  NULL_TREE
if there is no key function or if this is a class template */
-#define CLASSTYPE_KEY_METHOD(NODE) (LANG_TYPE_CLASS_CHECK (NODE)->key_method)
+#define CLASSTYPE_KEY_METHOD(NODE) \
+  (TYPE_POLYMORPHIC_P (NODE)   \
+   ? LANG_TYPE_CLASS_CHECK (NODE)->key_method  \
+   : NULL_TREE)
+#define SET_CLASSTYPE_KEY_METHOD(NODE, VALUE) \
+  (gcc_checking_assert (TYPE_POLYMORPHIC_P (NODE)),\
+   LANG_TYPE_CLASS_CHECK (NODE)->key_method = (VALUE))
 
 /* Vector of members.  During definition, it is unordered and only
member functions are present.  After completion it is sorted and
@@ -2778,7 +2784,12 @@ struct GTY(()) lang_type {
 
 /* The associated LAMBDA_EXPR that made this class.  */
 #define CLASSTYPE_LAMBDA_EXPR(NODE) \
-  (LANG_TYPE_CLASS_CHECK (NODE)->lambda_expr)
+  (TYPE_POLYMORPHIC_P (NODE)   \
+   ? NULL_TREE \
+   : LANG_TYPE_CLASS_CHECK (NODE)->key_method)
+#define SET_CLASSTYPE_LAMBDA_EXPR(NODE, VALUE) \
+  (gcc_checking_assert (!TYPE_POLYMORPHIC_P (NODE)),   \
+   LANG_TYPE_CLASS_CHECK (NODE)->key_method = (VALUE))
 /* The extra mangling scope for this closure type.  */
 #define LAMBDA_TYPE_EXTRA_SCOPE(NODE) \
   (LAMBDA_EXPR_EXTRA_SCOPE (CLASSTYPE_LAMBDA_EXPR (NODE)))
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 664dbbec2796..843f0e4fd160 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -17289,7 +17289,7 @@ xref_tag (enum tag_types tag_code, tree name,
   if (IDENTIFIER_LAMBDA_P (name))
/* Mark it as a lambda type right now.  Our caller will
   correct the value.  */
-   CLASSTYPE_LAMBDA_EXPR (t) = error_mark_node;
+   SET_CLASSTYPE_LAMBDA_EXPR (t, error_mark_node);
   t = pushtag (name, t, how);
 }
   else
diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
ind

[gcc r16-2184] c++: Don't incorrectly reject override after class head name [PR120569]

2025-07-10 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:bcb51fe0e26bed7e2c44c4822ca6dec135ba61f3

commit r16-2184-gbcb51fe0e26bed7e2c44c4822ca6dec135ba61f3
Author: Jakub Jelinek 
Date:   Thu Jul 10 23:41:56 2025 +0200

c++: Don't incorrectly reject override after class head name [PR120569]

While the
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p2786r13.html#c03-compatibility-changes-for-annex-c-diff.cpp03.dcl.dcl
hunk was dropped because
struct C {}; struct C final {};
is actually not valid C++98 (which didn't have list initialization), we
also reject
struct D {}; struct D override {};
which IMHO has been valid from C++11 onwards.
Especially in the light of P2786R13 adding new contextual keywords, I think
it is better to use a separate routine for parsing the
class-virt-specifier-seq (in C++11, there was export next to final),
class-virt-specifier (in C++14 to C++23) and
class-property-specifier-seq (in C++26) instead of using the same function
for virt-specifier-seq and class-property-specifier-seq.
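
As a rough illustration of the separate routine, the loop below models it on
plain strings instead of cp_token objects.  It is a simplified sketch under
that assumption: ParseResult, SPEC_FINAL and
parse_class_property_specifiers are hypothetical names, and diagnostics are
reduced to a flag.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum : unsigned { SPEC_NONE = 0, SPEC_FINAL = 1 };

struct ParseResult
{
  unsigned specifiers;  // bitmask of the specifiers seen
  std::size_t next;     // index of the first token that was not consumed
  bool duplicate;       // true if a repeated specifier was seen
};

// Consume a run of "final" / "__final" tokens starting at POS.  Any other
// identifier, including "override", ends the sequence and is left for the
// caller to interpret.
inline ParseResult
parse_class_property_specifiers (const std::vector<std::string> &toks,
                                 std::size_t pos)
{
  ParseResult r{SPEC_NONE, pos, false};
  while (r.next < toks.size ())
    {
      const std::string &t = toks[r.next];
      if (t != "final" && t != "__final")
        break;
      if (r.specifiers & SPEC_FINAL)
        r.duplicate = true;  // the real parser reports an error here
      r.specifiers |= SPEC_FINAL;
      ++r.next;
    }
  return r;
}
```

With the tokens of "struct D override {};", the loop stops at "override"
without consuming it, so the caller sees a token that is neither `:' nor `{'
and can give up on the class-head interpretation instead of reporting an
error.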

2025-07-10  Jakub Jelinek  

PR c++/120569
* parser.cc (cp_parser_class_property_specifier_seq_opt): New
function.
(cp_parser_class_head): Use it instead of
cp_parser_virt_specifier_seq_opt.  Don't diagnose
VIRT_SPEC_OVERRIDE here.  Formatting fix.

* g++.dg/cpp0x/override2.C: Expect different diagnostics with
override or duplicate final.
* g++.dg/cpp0x/override5.C: New test.
* g++.dg/cpp0x/duplicate1.C: Expect different diagnostics with
duplicate final.

Diff:
---
 gcc/cp/parser.cc| 68 ++---
 gcc/testsuite/g++.dg/cpp0x/duplicate1.C |  2 +-
 gcc/testsuite/g++.dg/cpp0x/override2.C  |  6 +--
 gcc/testsuite/g++.dg/cpp0x/override5.C  | 26 +
 4 files changed, 85 insertions(+), 17 deletions(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index d96fdf8f9271..1f58425a70b6 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -28068,6 +28068,57 @@ cp_parser_class_specifier (cp_parser* parser)
   return type;
 }
 
+/* Parse an (optional) class-property-specifier-seq.
+
+   class-property-specifier-seq:
+ class-property-specifier class-property-specifier-seq [opt]
+
+   class-property-specifier:
+ final
+
+   Returns a bitmask representing the class-property-specifiers.  */
+
+static cp_virt_specifiers
+cp_parser_class_property_specifier_seq_opt (cp_parser *parser)
+{
+  cp_virt_specifiers virt_specifiers = VIRT_SPEC_UNSPECIFIED;
+
+  while (true)
+{
+  cp_token *token;
+  cp_virt_specifiers virt_specifier;
+
+  /* Peek at the next token.  */
+  token = cp_lexer_peek_token (parser->lexer);
+  /* See if it's a class-property-specifier.  */
+  if (token->type != CPP_NAME)
+   break;
+  if (id_equal (token->u.value, "final"))
+   {
+ maybe_warn_cpp0x (CPP0X_OVERRIDE_CONTROLS);
+ virt_specifier = VIRT_SPEC_FINAL;
+   }
+  else if (id_equal (token->u.value, "__final"))
+   virt_specifier = VIRT_SPEC_FINAL;
+  else
+   break;
+
+  if (virt_specifiers & virt_specifier)
+   {
+ gcc_rich_location richloc (token->location);
+ richloc.add_fixit_remove ();
+ error_at (&richloc, "duplicate %qD specifier", token->u.value);
+ cp_lexer_purge_token (parser->lexer);
+   }
+  else
+   {
+ cp_lexer_consume_token (parser->lexer);
+ virt_specifiers |= virt_specifier;
+   }
+}
+  return virt_specifiers;
+}
+
 /* Parse a class-head.
 
class-head:
@@ -28258,12 +28309,10 @@ cp_parser_class_head (cp_parser* parser,
   pop_deferring_access_checks ();
 
   if (id)
-{
-  cp_parser_check_for_invalid_template_id (parser, id,
-  class_key,
-   type_start_token->location);
-}
-  virt_specifiers = cp_parser_virt_specifier_seq_opt (parser);
+cp_parser_check_for_invalid_template_id (parser, id,
+class_key,
+type_start_token->location);
+  virt_specifiers = cp_parser_class_property_specifier_seq_opt (parser);
 
   /* If it's not a `:' or a `{' then we can't really be looking at a
  class-head, since a class-head only appears as part of a
@@ -28279,13 +28328,6 @@ cp_parser_class_head (cp_parser* parser,
   /* At this point, we're going ahead with the class-specifier, even
  if some other problem occurs.  */
   cp_parser_commit_to_tentative_parse (parser);
-  if (virt_specifiers & VIRT_SPEC_OVERRIDE)
-{
-  cp_parser_error (parser,
-   "cannot specify %<override%> for a class");
-  type = error_mark_node;
-  goto out;
-}
   /* Issue the error about the overly-qualified name now.  */
   if (qualified

[gcc r16-2181] Reduce the # of arguments of .ACCESS_WITH_SIZE from 6 to 4.

2025-07-10 Thread Qing Zhao via Gcc-cvs
https://gcc.gnu.org/g:e53f481141f1415847329f3bef906e5eb91226ad

commit r16-2181-ge53f481141f1415847329f3bef906e5eb91226ad
Author: Qing Zhao 
Date:   Wed Jul 9 21:31:55 2025 +

Reduce the # of arguments of .ACCESS_WITH_SIZE from 6 to 4.

This is an improvement to the design of internal function .ACCESS_WITH_SIZE.

Currently, the .ACCESS_WITH_SIZE is designed as:

   ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE,
 TYPE_OF_SIZE, ACCESS_MODE, TYPE_SIZE_UNIT for element)
   which returns the REF_TO_OBJ same as the 1st argument;

   1st argument REF_TO_OBJ: The reference to the object;
   2nd argument REF_TO_SIZE: The reference to the size of the object,
   3rd argument CLASS_OF_SIZE: The size referenced by the REF_TO_SIZE represents
 0: the number of bytes.
 1: the number of the elements of the object type;
   4th argument TYPE_OF_SIZE: A constant 0 with its TYPE being the same as the
 TYPE of the object referenced by REF_TO_SIZE
   5th argument ACCESS_MODE:
 -1: Unknown access semantics
  0: none
  1: read_only
  2: write_only
  3: read_write
   6th argument: The TYPE_SIZE_UNIT of the element TYPE of the FAM when 3rd
  argument is 1. NULL when 3rd argument is 0.

Among the 6 arguments:
 A. The 3rd argument CLASS_OF_SIZE is not needed. If the REF_TO_SIZE represents
the number of bytes, simply pass 1 to the TYPE_SIZE_UNIT argument.
 B. The 4th and the 5th arguments can be combined into 1 argument, whose TYPE
represents the TYPE_OF_SIZE, and the constant value represents the
ACCESS_MODE.

As a result, the new design of the .ACCESS_WITH_SIZE is:

   ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE,
 TYPE_OF_SIZE + ACCESS_MODE, TYPE_SIZE_UNIT for element)
   which returns the REF_TO_OBJ same as the 1st argument;

   1st argument REF_TO_OBJ: The reference to the object;
   2nd argument REF_TO_SIZE: The reference to the size of the object,
   3rd argument TYPE_OF_SIZE + ACCESS_MODE: An integer constant with a pointer
 TYPE.
 The pointee TYPE of the pointer TYPE is the TYPE of the object referenced
 by REF_TO_SIZE.
 The integer constant value represents the ACCESS_MODE:
0: none
1: read_only
2: write_only
3: read_write
   4th argument: The TYPE_SIZE_UNIT of the element TYPE of the array.
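
A small model of how a consumer could derive a byte size from the four
arguments is sketched below.  It is an assumption-laden illustration, not the
GCC implementation: AccessWithSize, AccessMode and object_size_in_bytes are
hypothetical names, the access mode is stored directly rather than encoded in
a pointer-typed constant, and the multiplication is inferred from point A
above.  The clamping of negative sizes to zero follows the comment in
get_bound_from_access_with_size.

```cpp
#include <cassert>
#include <cstdint>

enum class AccessMode { none = 0, read_only = 1, write_only = 2, read_write = 3 };

// Hypothetical model of the four arguments of the new design.
struct AccessWithSize
{
  const void *ref_to_obj;      // 1st argument: reference to the object
  const long *ref_to_size;     // 2nd argument: reference to the size
  AccessMode mode;             // 3rd argument: access mode (simplified)
  std::uint64_t element_size;  // 4th argument: TYPE_SIZE_UNIT of the element
};

// A size expressed in elements is scaled by the element size.  A size that
// is already in bytes is handled by passing an element size of 1.
inline std::uint64_t object_size_in_bytes (const AccessWithSize &call)
{
  long n = *call.ref_to_size;
  if (n < 0)
    n = 0;  // a negative size is treated as zero
  return static_cast<std::uint64_t> (n) * call.element_size;
}
```

Passing 1 as the element size is what makes the separate CLASS_OF_SIZE
argument redundant in this model.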

gcc/c-family/ChangeLog:

* c-ubsan.cc (get_bound_from_access_with_size): Adjust the position
of the arguments per the new design.

gcc/c/ChangeLog:

* c-typeck.cc (build_access_with_size_for_counted_by): Update comments.
Adjust the arguments per the new design.

gcc/ChangeLog:

* internal-fn.cc (expand_ACCESS_WITH_SIZE): Update comments.
* internal-fn.def (ACCESS_WITH_SIZE): Update comments.
* tree-object-size.cc (access_with_size_object_size): Update comments.
Adjust the arguments per the new design.

Diff:
---
 gcc/c-family/c-ubsan.cc | 10 ++
 gcc/c/c-typeck.cc   | 18 +-
 gcc/internal-fn.cc  | 28 +---
 gcc/internal-fn.def |  2 +-
 gcc/tree-object-size.cc | 34 +-
 5 files changed, 38 insertions(+), 54 deletions(-)

diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index 78b786854699..a4dc31066afb 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -397,8 +397,7 @@ get_bound_from_access_with_size (tree call)
 return NULL_TREE;
 
   tree ref_to_size = CALL_EXPR_ARG (call, 1);
-  unsigned int class_of_size = TREE_INT_CST_LOW (CALL_EXPR_ARG (call, 2));
-  tree type = TREE_TYPE (CALL_EXPR_ARG (call, 3));
+  tree type = TREE_TYPE (TREE_TYPE (CALL_EXPR_ARG (call, 2)));
   tree size = fold_build2 (MEM_REF, type, unshare_expr (ref_to_size),
   build_int_cst (ptr_type_node, 0));
   /* If size is negative value, treat it as zero.  */
@@ -410,12 +409,7 @@ get_bound_from_access_with_size (tree call)
build_zero_cst (type), size);
   }
 
-  /* Only when class_of_size is 1, i.e, the number of the elements of
- the object type, return the size.  */
-  if (class_of_size != 1)
-return NULL_TREE;
-  else
-size = fold_convert (sizetype, size);
+  size = fold_convert (sizetype, size);
 
   return size;
 }
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index de3d6c78db88..9a5eb0da3a1d 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -2982,7 +2982,7 @@ build_counted_by_ref (tree datum, tree subdatum, tree *counted_by_type)
 
to:
 
-   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1,
+   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, (* TYPE_OF_SIZE)0,
   

[gcc r16-2180] Passing TYPE_SIZE_UNIT of the element as the 6th argument to .ACCESS_WITH_SIZE (PR121000)

2025-07-10 Thread Qing Zhao via Gcc-cvs
https://gcc.gnu.org/g:1cf8d08a977f528c6e81601b7586ccf8bc8aa2a6

commit r16-2180-g1cf8d08a977f528c6e81601b7586ccf8bc8aa2a6
Author: Qing Zhao 
Date:   Wed Jul 9 20:10:30 2025 +

Passing TYPE_SIZE_UNIT of the element as the 6th argument to .ACCESS_WITH_SIZE (PR121000)

The size of the element of the FAM _cannot_ reliably depend on the original
TYPE of the FAM that we passed as the 6th parameter to the .ACCESS_WITH_SIZE:

 TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (gimple_call_arg (call, 5))))

when the element of the FAM has a variable length type. Since the variable
 that represents TYPE_SIZE_UNIT has no explicit usage in the original IL,
compiler transformations (such as DSE) that are applied before object_size
phase might eliminate the whole definition to the variable that represents
the TYPE_SIZE_UNIT of the element of the FAM.

In order to resolve this issue, instead of passing the original TYPE of the
FAM as the 6th argument to .ACCESS_WITH_SIZE, we should explicitly pass the
original TYPE_SIZE_UNIT of the element TYPE of the FAM as the 6th argument
to the call to  .ACCESS_WITH_SIZE.

PR middle-end/121000

gcc/c/ChangeLog:

* c-typeck.cc (build_access_with_size_for_counted_by): Update comments.
Pass TYPE_SIZE_UNIT of the element as the 6th argument.

gcc/ChangeLog:

* internal-fn.cc (expand_ACCESS_WITH_SIZE): Update comments.
* internal-fn.def (ACCESS_WITH_SIZE): Update comments.
* tree-object-size.cc (access_with_size_object_size): Update comments.
Get the element_size from the 6th argument directly.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-pr121000.c: New test.

Diff:
---
 gcc/c/c-typeck.cc  | 10 +++--
 gcc/internal-fn.cc | 10 ++---
 gcc/internal-fn.def|  2 +-
 .../gcc.dg/flex-array-counted-by-pr121000.c| 43 ++
 gcc/tree-object-size.cc| 28 +++---
 5 files changed, 69 insertions(+), 24 deletions(-)

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index e24629be918b..de3d6c78db88 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -2983,7 +2983,7 @@ build_counted_by_ref (tree datum, tree subdatum, tree *counted_by_type)
to:
 
(*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1,
-   (TYPE_OF_ARRAY *)0))
+   TYPE_SIZE_UNIT for element)
 
NOTE: The return type of this function is the POINTER type pointing
to the original flexible array type.
@@ -2995,8 +2995,8 @@ build_counted_by_ref (tree datum, tree subdatum, tree *counted_by_type)
The 4th argument of the call is a constant 0 with the TYPE of the
object pointed by COUNTED_BY_REF.
 
-   The 6th argument of the call is a constant 0 with the pointer TYPE
-   to the original flexible array type.
+   The 6th argument of the call is the TYPE_SIZE_UNIT of the element TYPE
+   of the FAM.
 
   */
 static tree
@@ -3007,6 +3007,8 @@ build_access_with_size_for_counted_by (location_t loc, tree ref,
   gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref)));
   /* The result type of the call is a pointer to the flexible array type.  */
   tree result_type = c_build_pointer_type (TREE_TYPE (ref));
+  tree element_size = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (ref)));
+
   tree first_param
 = c_fully_fold (array_to_pointer_conversion (loc, ref), false, NULL);
   tree second_param
@@ -3020,7 +3022,7 @@ build_access_with_size_for_counted_by (location_t loc, tree ref,
build_int_cst (integer_type_node, 1),
build_int_cst (counted_by_type, 0),
build_int_cst (integer_type_node, -1),
-   build_int_cst (result_type, 0));
+   element_size);
   /* Wrap the call with an INDIRECT_REF with the flexible array type.  */
   call = build1 (INDIRECT_REF, TREE_TYPE (ref), call);
   SET_EXPR_LOCATION (call, loc);
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index ed6ef0e4c647..c6e705cb6f5e 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3443,7 +3443,7 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
 
 /* Expand the IFN_ACCESS_WITH_SIZE function:
ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE,
-TYPE_OF_SIZE, ACCESS_MODE)
+TYPE_OF_SIZE, ACCESS_MODE, TYPE_SIZE_UNIT for element)
which returns the REF_TO_OBJ same as the 1st argument;
 
1st argument REF_TO_OBJ: The reference to the object;
@@ -3451,16 +3451,16 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
3rd argument CLASS_OF_SIZE: The size referenced by the REF_TO_SIZE represents
  0: the number of 

[gcc] Deleted branch 'mikael/heads/stabilisation_descriptor_v01' in namespace 'refs/users'

2025-07-10 Thread Mikael Morin via Gcc-cvs
The branch 'mikael/heads/stabilisation_descriptor_v01' in namespace 'refs/users' was deleted.
It previously pointed to:

 7e72a078ae71... fortran: Amend descriptor bounds init if unallocated

Diff:

!!! WARNING: THE FOLLOWING COMMITS ARE NO LONGER ACCESSIBLE (LOST):
---

  7e72a07... fortran: Amend descriptor bounds init if unallocated
  f6115ed... fortran: Delay evaluation of array bounds after reallocatio
  a1e8410... fortran: generate array reallocation out of loops
  41b730b... Correction array_constructor_1


[gcc(refs/users/mikael/heads/stabilisation_descriptor_v01)] fortran: Generate array reallocation out of loops

2025-07-10 Thread Mikael Morin via Gcc-cvs
https://gcc.gnu.org/g:3d82df83f96e3c0bef0fe042bdcf1a2f71b78045

commit 3d82df83f96e3c0bef0fe042bdcf1a2f71b78045
Author: Mikael Morin 
Date:   Thu Jul 10 21:32:46 2025 +0200

fortran: Generate array reallocation out of loops

Regression tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Generate the array reallocation on assignment code before entering the
scalarization loops.  This doesn't move the generated code itself,
which was already put before the outermost loop, but only changes the
current scope at the time the code is generated.  This is a prerequisite
for a followup patch that makes the reallocation code create new
variables.  Without this change the new variables would be declared in
the innermost loop body and couldn't be used outside of it.

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_trans_assignment_1): Generate array
reallocation code before entering the scalarisation loops.

Diff:
---
 gcc/fortran/trans-expr.cc | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 3e0d763d2fb0..760c8c4e72bd 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -12943,6 +12943,7 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
   rhs_caf_attr = gfc_caf_attr (expr2, false, &rhs_refs_comp);
 }
 
+  tree reallocation = NULL_TREE;
   if (lss != gfc_ss_terminator)
 {
   /* The assignment needs scalarization.  */
@@ -13011,6 +13012,15 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
  ompws_flags |= OMPWS_SCALARIZER_WS | OMPWS_SCALARIZER_BODY;
}
 
+  /* F2003: Allocate or reallocate lhs of allocatable array.  */
+  if (realloc_flag)
+   {
+ realloc_lhs_warning (expr1->ts.type, true, &expr1->where);
+ ompws_flags &= ~OMPWS_SCALARIZER_WS;
+ reallocation = gfc_alloc_allocatable_for_assignment (&loop, expr1,
+  expr2);
+   }
+
   /* Start the scalarized loop body.  */
   gfc_start_scalarized_body (&loop, &body);
 }
@@ -13319,15 +13329,8 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
  gfc_add_expr_to_block (&body, tmp);
}
 
-  /* F2003: Allocate or reallocate lhs of allocatable array.  */
-  if (realloc_flag)
-   {
- realloc_lhs_warning (expr1->ts.type, true, &expr1->where);
- ompws_flags &= ~OMPWS_SCALARIZER_WS;
- tmp = gfc_alloc_allocatable_for_assignment (&loop, expr1, expr2);
- if (tmp != NULL_TREE)
-   gfc_add_expr_to_block (&loop.code[expr1->rank - 1], tmp);
-   }
+  if (reallocation != NULL_TREE)
+   gfc_add_expr_to_block (&loop.code[loop.dimen - 1], reallocation);
 
   if (maybe_workshare)
ompws_flags &= ~OMPWS_SCALARIZER_BODY;


[gcc(refs/users/mikael/heads/stabilisation_descriptor_v01)] fortran: Delay evaluation of array bounds after reallocation

2025-07-10 Thread Mikael Morin via Gcc-cvs
https://gcc.gnu.org/g:8e09c2418a0bcba8e170398a6173b2d950b47ac4

commit 8e09c2418a0bcba8e170398a6173b2d950b47ac4
Author: Mikael Morin 
Date:   Thu Jul 10 21:32:57 2025 +0200

fortran: Delay evaluation of array bounds after reallocation

Regression tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Delay the evaluation of bounds, offset, etc after the reallocation,
for the scalarization of allocatable arrays on the left hand side of
assignments.

Before this change, the code preceding the scalarization loop is like:

D.4757 = ref2.offset;
D.4759 = ref2.dim[0].ubound;
D.4762 = ref2.dim[0].lbound;
{
  if (ref2.data == 0B) goto realloc;
  if (ref2.dim[0].lbound + 4 != ref2.dim[0].ubound) goto realloc;
  goto L.10;
  realloc:
  ... change offset and bounds ...
  D.4757 = ref2.offset;
  D.4762 = NON_LVALUE_EXPR ;
  ... reallocation ...
  L.10:;
}
while (1)
  {
... scalarized code ...

so the bounds etc are evaluated first to variables, and the reallocation
code takes care to update the variables during the reallocation.  This
is problematic because the variables' initialization references the
array bounds, which for unallocated arrays are uninitialized at the
evaluation point.  This used to (correctly) cause uninitialized warnings
(see PR fortran/108889), and a workaround for variables was found, that
initializes the bounds of array variables to some value beforehand if
they are unallocated.  For allocatable components, there is no warning
but the problem remains, some uninitialized values are used, even if
discarded later.

After this change the code becomes:

{
  if (ref2.data == 0B) goto realloc;
  if (ref2.dim[0].lbound + 4 != ref2.dim[0].ubound) goto realloc;
  goto L.10;
  realloc:;
  ... change offset and bounds ...
  ... reallocation ...
  L.10:;
}
D.4762 = ref2.offset;
D.4763 = ref2.dim[0].lbound;
D.4764 = ref2.dim[0].ubound;
while (1)
  {
... scalarized code

so the scalarizer avoids storing the values to variables at the time it
evaluates them, if the array is reallocatable on assignment.  Instead,
it keeps expressions with references to the array descriptor fields,
expressions that remain valid through reallocation.  After the
reallocation code has been generated, the expressions stored by the
scalarizer are evaluated in place to variables.

The decision to delay evaluation is based on the existing field
is_alloc_lhs, which requires a few tweaks to be always correct with
respect to what its name suggests.  Namely it should be set even if the
assignment right hand side is an intrinsic function, and it should not
be set if the right hand side is a scalar or if the -fno-realloc-lhs
flag is passed to the compiler.
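
The ordering after this change can be pictured with the following sketch,
written in C++ rather than in the generated GENERIC.  It is only an
illustration under simplifying assumptions: Descriptor, realloc_if_needed and
assign are hypothetical names, the array has rank 1, and the descriptor
fields are given default values here so that the sketch itself never reads
uninitialized memory.

```cpp
#include <cassert>
#include <vector>

// Hypothetical rank-1 array descriptor.
struct Descriptor
{
  std::vector<int> storage;
  bool allocated = false;
  long lbound = 0;
  long ubound = -1;
  long offset = 0;
};

// Reallocate and reset bounds and offset only if the shape does not match.
inline void realloc_if_needed (Descriptor &d, long n)
{
  if (d.allocated && d.ubound - d.lbound + 1 == n)
    return;
  d.storage.assign (n, 0);
  d.allocated = true;
  d.lbound = 1;
  d.ubound = n;
  d.offset = -1;
}

inline void assign (Descriptor &lhs, const std::vector<int> &rhs)
{
  realloc_if_needed (lhs, static_cast<long> (rhs.size ()));
  // Bounds and offset are copied to local variables only now, after the
  // reallocation, so they never hold values from an unallocated array.
  const long lb = lhs.lbound;
  const long ub = lhs.ubound;
  const long off = lhs.offset;
  for (long i = lb; i <= ub; ++i)
    lhs.storage[i + off] = rhs[i - lb];
}
```

Because the local copies are made after realloc_if_needed returns, the
reallocation code has no saved variables to keep in sync, which mirrors the
removal of the D.4757 and D.4762 updates from the realloc block.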

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_conv_ss_descriptor): Don't evaluate
offset and data to a variable if is_alloc_lhs is set.  Move the
existing evaluation decision condition for data...
(save_descriptor_data): ... here as a new predicate.
(evaluate_bound): Add argument save_value.  Omit the evaluation
of the value to a variable if that argument isn't set.
(gfc_conv_expr_descriptor): Update caller.
(gfc_conv_section_startstride): Update caller.  Set save_value
if is_alloc_lhs is not set.  Omit the evaluation of stride to a
variable if save_value isn't set.
(gfc_set_delta): Omit the evaluation of delta to a variable
if is_alloc_lhs is set.
(gfc_is_reallocatable_lhs): Return false if flag_realloc_lhs
isn't set.
(gfc_alloc_allocatable_for_assignment): Don't update
the variables that may be stored in saved_offset, delta, and
data.  Call instead...
(update_reallocated_descriptor): ... this new procedure.
* trans-expr.cc (gfc_trans_assignment_1): Don't omit setting the
is_alloc_lhs flag if the right hand side is an intrinsic
function.  Clear the flag if the right hand side is scalar.

Diff:
---
 gcc/fortran/trans-array.cc | 137 -
 gcc/fortran/trans-expr.cc  |  14 ++---
 2 files changed, 104 insertions(+), 47 deletions(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 7be2d7b11a62..7b83d3fab8d7 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -3420,6 +3420,23 @@ gfc_add_loop_ss_code (gfc_loopinfo * loop, gfc_ss * ss, bool subscript,
 }
 
 
+/* Given an array descriptor expression DESCR and its data pointer D

[gcc(refs/users/mikael/heads/stabilisation_descriptor_v01)] fortran: Amend descriptor bounds init if unallocated

2025-07-10 Thread Mikael Morin via Gcc-cvs
https://gcc.gnu.org/g:a4b9621bfff13aa051078e07f6dec483faeb631b

commit a4b9621bfff13aa051078e07f6dec483faeb631b
Author: Mikael Morin 
Date:   Thu Jul 10 21:33:09 2025 +0200

fortran: Amend descriptor bounds init if unallocated

Regression tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Always generate the conditional initialization of unallocated variables
regardless of the basic variable allocation tracking done in the
frontend and with an additional always false condition.

The scalarizer used to always evaluate array bounds, including in the
case of unallocated arrays on the left hand side of an assignment.  This
was (correctly) causing uninitialized warnings, even if the
uninitialized values were in the end discarded.

Since the fix for PR fortran/108889, an initialization of the descriptor
bounds is added to silence the uninitialized warnings, conditional on the
array being unallocated.  This initialization is not useful in the
execution of the program, and it is removed if the compiler can prove
that the variable is unallocated (in the case of a local variable for
example).  Unfortunately, the compiler is not always able to prove it
and the useless initialization may remain in the final code.
Moreover, the generated code that was causing the evaluation of
uninitialized variables has been changed to avoid them, so we can try to
remove or revisit that unallocated variable bounds initialization tweak.

Unfortunately, just removing the extra initialization restores the
warnings at -O0, as there is no dead code removal at that optimization
level.  Instead, this change keeps the initialization and modifies its
guarding condition with an extra always false variable, so that if
optimizations are enabled the whole initialization block is removed, and
if they are disabled it remains and is sufficient to prevent the
warning.

The new variable requires the code generation to be done earlier in the
function so that the variable declaration and usage are in the same
scope.

As the modified condition guarantees the removal of the block with
optimizations, we can emit it more broadly and remove the basic
allocation tracking that was done in the frontend to limit its emission.

gcc/fortran/ChangeLog:

* gfortran.h (gfc_symbol): Remove field allocated_in_scope.
* trans-array.cc (gfc_array_allocate): Don't set it.
(gfc_alloc_allocatable_for_assignment): Likewise.
Generate the unallocated descriptor bounds initialisation
before the opening of the reallocation code block.  Create a
variable and use it as additional condition to the unallocated
descriptor bounds initialisation.

Diff:
---
 gcc/fortran/gfortran.h |  4 --
 gcc/fortran/trans-array.cc | 92 --
 2 files changed, 49 insertions(+), 47 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 6848bd1762d3..69367e638c5b 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -2028,10 +2028,6 @@ typedef struct gfc_symbol
   /* Set if this should be passed by value, but is not a VALUE argument
  according to the Fortran standard.  */
   unsigned pass_as_value:1;
-  /* Set if an allocatable array variable has been allocated in the current
- scope. Used in the suppression of uninitialized warnings in reallocation
- on assignment.  */
-  unsigned allocated_in_scope:1;
   /* Set if an external dummy argument is called with different argument lists.
  This is legal in Fortran, but can cause problems with autogenerated
  C prototypes for C23.  */
diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 7b83d3fab8d7..1561936daf1c 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -6800,8 +6800,6 @@ gfc_array_allocate (gfc_se * se, gfc_expr * expr, tree status, tree errmsg,
   else
   gfc_add_expr_to_block (&se->pre, set_descriptor);
 
-  expr->symtree->n.sym->allocated_in_scope = 1;
-
   return true;
 }
 
@@ -11495,14 +11493,61 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
   && !expr2->value.function.isym)
 expr2->ts.u.cl->backend_decl = rss->info->string_length;
 
-  gfc_start_block (&fblock);
-
   /* Since the lhs is allocatable, this must be a descriptor type.
  Get the data and array size.  */
   desc = linfo->descriptor;
   gcc_assert (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (desc)));
   array1 = gfc_conv_descriptor_data_get (desc);
 
+  /* If the data is null, set the descriptor bounds and offset.  This suppresses
+ the maybe used uninitialized warning.  Note that the always false variable
+ prevents this block from ever being executed, and makes sure that the
+ optimizers are able to remove it.  Component references ar

[gcc(refs/users/mikael/heads/stabilisation_descriptor_v01)] Correction array_constructor_1

2025-07-10 Thread Mikael Morin via Gcc-cvs
https://gcc.gnu.org/g:41b730b8a79522e8e5a6115f01a02968a571e85b

commit 41b730b8a79522e8e5a6115f01a02968a571e85b
Author: Mikael Morin 
Date:   Sat Jul 5 15:05:20 2025 +0200

Correction array_constructor_1

Diff:
---
 gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90 | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90 b/gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90
index 45eafacd5a67..a0c55076a9ae 100644
--- a/gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90
+++ b/gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90
@@ -9,6 +9,8 @@ program grow_type_array
 
 type(container), allocatable :: list(:)
 
+allocate(list(0))
+
 list = [list, new_elem(5)]
 
 deallocate(list)


[gcc] Created branch 'mikael/heads/stabilisation_descriptor_v01' in namespace 'refs/users'

2025-07-10 Thread Mikael Morin via Gcc-cvs
The branch 'mikael/heads/stabilisation_descriptor_v01' was created in namespace 'refs/users' pointing to:

 a4b9621bfff1... fortran: Amend descriptor bounds init if unallocated


[gcc r16-2182] aarch64: Guard VF-based costing with !m_costing_for_scalar

2025-07-10 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:a1e616955e9971fda54a160a49e6cf70dd838a0c

commit r16-2182-ga1e616955e9971fda54a160a49e6cf70dd838a0c
Author: Richard Sandiford 
Date:   Thu Jul 10 22:00:41 2025 +0100

aarch64: Guard VF-based costing with !m_costing_for_scalar

g:4b47acfe2b626d1276e229a0cf165e934813df6c caused a segfault
in aarch64_vector_costs::analyze_loop_vinfo when costing scalar
code, since we'd end up dividing by a zero VF.

Much of the structure of the aarch64 costing code dates from
a stage 4 patch, when we had to work within the bounds of what
the target-independent code did.  Some of it could do with a
rework now that we're not so constrained.

This patch is therefore an emergency fix rather than the best
long-term solution.  I'll revisit when I have more time to think
about it.

gcc/
* config/aarch64/aarch64.cc (aarch64_vector_costs::add_stmt_cost):
Guard VF-based costing with !m_costing_for_scalar.

Diff:
---
 gcc/config/aarch64/aarch64.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 27c315fc35e8..10b8ed5d3874 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -17932,7 +17932,7 @@ aarch64_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
 
   /* Do one-time initialization based on the vinfo.  */
   loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (m_vinfo);
-  if (!m_analyzed_vinfo)
+  if (!m_analyzed_vinfo && !m_costing_for_scalar)
 {
   if (loop_vinfo)
analyze_loop_vinfo (loop_vinfo);


[gcc r16-2179] testsuite: Fix unallocated array usage in test

2025-07-10 Thread Mikael Morin via Gcc-cvs
https://gcc.gnu.org/g:ca034694757f0fb3461a1d0c22708a3e4c0e40fa

commit r16-2179-gca034694757f0fb3461a1d0c22708a3e4c0e40fa
Author: Mikael Morin 
Date:   Sat Jul 5 15:05:20 2025 +0200

testsuite: Fix unallocated array usage in test

gcc/testsuite/ChangeLog:

* gfortran.dg/asan/array_constructor_1.f90: Allocate array
before using it.

Diff:
---
 gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90 | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90 b/gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90
index 45eafacd5a67..a0c55076a9ae 100644
--- a/gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90
+++ b/gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90
@@ -9,6 +9,8 @@ program grow_type_array
 
 type(container), allocatable :: list(:)
 
+allocate(list(0))
+
 list = [list, new_elem(5)]
 
 deallocate(list)