[gcc r16-1607] Prevent possible overflows in ipa-profile

2025-06-21 Thread Jan Hubicka via Gcc-cvs
https://gcc.gnu.org/g:6e38bef16bbfaa7743d1ec8937ed9dfba669136d

commit r16-1607-g6e38bef16bbfaa7743d1ec8937ed9dfba669136d
Author: Jan Hubicka 
Date:   Sun Jun 22 03:32:29 2025 +0200

Prevent possible overflows in ipa-profile

A bug in scaling the profile of fnsplit-produced clones made
some AFDO counts very large (2^59) during GCC bootstrap.
This made computations in ipa-profile silently overflow,
which caused the hot count to be identified as 1 instead of
a sane value.

While fixing the fnsplit bug prevents the overflow, I think the histogram code
should be made safe too.  sreal is not a good fit here since its mantissa is
only 32 bits, and it is not well suited to many additions of numbers that may
be of very different magnitudes.  So I use widest_int, although 128-bit
arithmetic would be safe (we are summing 60-bit counts multiplied by time
estimates).  I don't think we have a readily available 128-bit type, and the
code is not really time critical since the histogram is computed once.
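To make the overflow concrete, here is a minimal C++ sketch of the working-set cutoff loop changed by the diff below, assuming the histogram is sorted by decreasing count (as the cutoff loop implies). `histogram_entry`, `hot_threshold` and `product_overflows_int64` are hypothetical names, and the GCC/Clang `__int128` extension stands in for GCC's internal `widest_int`; this is an illustration, not the ipa-profile code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one ipa-profile histogram entry.
struct histogram_entry
{
  int64_t count;  // execution count (AFDO counts may approach 2^59)
  int64_t time;   // time estimate
  int64_t size;   // size estimate
};

// 128-bit accumulator (GCC/Clang extension) used instead of widest_int.
typedef __int128 wide_t;

// True if count * time does not fit in int64_t, the pre-patch type.
bool product_overflows_int64 (int64_t count, int64_t time)
{
  int64_t result;
  return __builtin_mul_overflow (count, time, &result);
}

// Return the smallest count whose entries cover PERMILLE of the total
// count * time.  HISTOGRAM must be sorted by decreasing count.
int64_t hot_threshold (const std::vector<histogram_entry> &histogram,
                       int permille)
{
  wide_t overall_time = 0;
  for (const histogram_entry &e : histogram)
    {
      overall_time += (wide_t) e.count * e.time;
      assert (overall_time >= 0);  // mirrors the commit's overflow check
    }

  int64_t threshold = 0;
  if (overall_time != 0)
    {
      wide_t cutoff = (overall_time * permille + 500) / 1000;
      wide_t cumulated = 0;
      for (std::size_t i = 0; cumulated < cutoff && i < histogram.size (); i++)
        {
          cumulated += (wide_t) histogram[i].count * histogram[i].time;
          threshold = histogram[i].count;
        }
    }
  if (!threshold)
    threshold = 1;
  return threshold;
}
```

With a single entry of count 2^59 and time 100, the 64-bit product overflows, while the 128-bit accumulation still yields a threshold of 2^59 rather than collapsing to 1.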

gcc/ChangeLog:

* ipa-profile.cc (ipa_profile): Use widest_int to avoid
possible overflows.

Diff:
---
 gcc/ipa-profile.cc | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/gcc/ipa-profile.cc b/gcc/ipa-profile.cc
index 08638667f65c..c8b8529e38b7 100644
--- a/gcc/ipa-profile.cc
+++ b/gcc/ipa-profile.cc
@@ -764,7 +764,8 @@ ipa_profile (void)
   int order_pos;
   bool something_changed = false;
   int i;
-  gcov_type overall_time = 0, cutoff = 0, cumulated = 0, overall_size = 0;
+  widest_int overall_time = 0, cumulated = 0;
+  gcov_type overall_size = 0;
   struct cgraph_node *n,*n2;
   int nindirect = 0, ncommon = 0, nunknown = 0, nuseless = 0, nconverted = 0;
   int nmismatch = 0, nimpossible = 0;
@@ -775,37 +776,44 @@ ipa_profile (void)
 dump_histogram (dump_file, histogram);
   for (i = 0; i < (int)histogram.length (); i++)
 {
-  overall_time += histogram[i]->count * histogram[i]->time;
+  overall_time += ((widest_int)histogram[i]->count) * histogram[i]->time;
+  /* Watch for overflow.  */
+  gcc_assert (overall_time >= 0);
   overall_size += histogram[i]->size;
 }
   threshold = 0;
-  if (overall_time)
+  if (overall_time != 0)
 {
   gcc_assert (overall_size);
 
-  cutoff = (overall_time * param_hot_bb_count_ws_permille + 500) / 1000;
+  widest_int cutoff = ((overall_time * param_hot_bb_count_ws_permille
+  + 500) / 1000);
+  /* Watch for overflow.  */
+  gcc_assert (cutoff >= 0);
   for (i = 0; cumulated < cutoff; i++)
{
- cumulated += histogram[i]->count * histogram[i]->time;
+ cumulated += ((widest_int)histogram[i]->count) * histogram[i]->time;
   threshold = histogram[i]->count;
}
   if (!threshold)
threshold = 1;
   if (dump_file)
{
- gcov_type cumulated_time = 0, cumulated_size = 0;
+ widest_int cumulated_time = 0;
+ gcov_type cumulated_size = 0;
 
   for (i = 0;
   i < (int)histogram.length () && histogram[i]->count >= threshold;
   i++)
{
- cumulated_time += histogram[i]->count * histogram[i]->time;
+ cumulated_time += ((widest_int)histogram[i]->count)
+* histogram[i]->time;
  cumulated_size += histogram[i]->size;
}
  fprintf (dump_file, "Determined min count: %" PRId64
   " Time:%3.2f%% Size:%3.2f%%\n",
   (int64_t)threshold,
-  cumulated_time * 100.0 / overall_time,
+  (cumulated_time * 1 / overall_time).to_shwi () / 100.0,
   cumulated_size * 100.0 / overall_size);
}


[gcc r16-1605] Add GUESSED_GLOBAL0_AFDO

2025-06-21 Thread Jan Hubicka via Gcc-cvs
https://gcc.gnu.org/g:eb8ee105706569c9a03f3de9519f6ab8006c3f1e

commit r16-1605-geb8ee105706569c9a03f3de9519f6ab8006c3f1e
Author: Jan Hubicka 
Date:   Sun Jun 22 03:12:55 2025 +0200

Add GUESSED_GLOBAL0_AFDO

This patch adds GUESSED_GLOBAL0_AFDO profile quality. It can
be used to preserve local counts of functions which have 0 AFDO
profile.

I originally did not include it, as it was not clear it would be useful, and
it turns quality from 3 bits to 4 bits, which means that we need to steal
another bit from the actual counters.

It is likely not a problem for profile_count since counting up to 2^60 still
takes a while.  However, with profile_probability I ran into a problem: gimple
FE testcases encode probabilities with the current representation, and thus
changing profile_probability::n_bits would require updating all testcases.

Since probabilities never use GLOBAL0 qualities (they are always local),
adding bits is not necessary, but encoding the quality in the data type
required adding accessors.

While working on this I also noticed that GUESSED_GLOBAL0 and
GUESSED_GLOBAL0_ADJUSTED are misordered.  Qualities should be in
increasing order.  This is also fixed.
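The bit budget can be sketched as follows.  The struct below is a hypothetical stand-in, not GCC's `profile_count`; it only assumes what the ChangeLog states (a 60-bit counter and a 4-bit quality field sharing one 64-bit word) and shows why going past 8 quality levels forces a fourth quality bit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical packed counter: 60 count bits + 4 quality bits in one word.
struct packed_count
{
  static constexpr int n_bits = 60;
  static constexpr uint64_t max_count = (uint64_t (1) << n_bits) - 1;

  uint64_t m_val : n_bits;
  uint64_t m_quality : 4;
};

static_assert (sizeof (packed_count) == sizeof (uint64_t),
               "count and quality share one 64-bit word");

// Smallest number of bits able to encode N distinct quality levels.
constexpr int bits_needed (unsigned n)
{
  int bits = 0;
  while ((1u << bits) < n)
    bits++;
  return bits;
}

// Eight levels fit in 3 bits; a ninth level needs a fourth bit.
static_assert (bits_needed (8) == 3 && bits_needed (9) == 4,
               "adding a quality level past 8 costs one more bit");

// Build a packed count, checking that both fields fit.
packed_count make_packed (uint64_t val, unsigned quality)
{
  assert (val <= packed_count::max_count && quality < 16);
  packed_count c;
  c.m_val = val;
  c.m_quality = quality;
  return c;
}
```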

auto-profile will be updated later.

Bootstrapped/regtested x86_64-linux, committed.

Honza

gcc/ChangeLog:

* cgraph.cc (cgraph_node::make_profile_global0): Support
GUESSED_GLOBAL0_AFDO
* ipa-cp.cc (update_profiling_info): Use
GUESSED_GLOBAL0_AFDO.
* profile-count.cc (profile_probability::dump): Use
quality ().
(profile_probability::stream_in): Use m_adjusted_quality.
(profile_probability::stream_out): Use m_adjusted_quality.
(profile_count::combine_with_ipa_count): Use quality ().
(profile_probability::sqrt): Likewise.
* profile-count.h (enum profile_quality): Add
GUESSED_GLOBAL0_AFDO; reorder GUESSED_GLOBAL0_ADJUSTED and
GUESSED_GLOBAL0.
(profile_probability): Add min_quality; replace m_quality
by m_adjusted_quality; add set_quality; update all users
of quality.
(profile_count): Set n_bits to 60; make m_quality 4 bits;
update uses of quality.
(profile_count::afdo_zero, profile_count::global0afdo): New.

Diff:
---
 gcc/cgraph.cc|  15 -
 gcc/ipa-cp.cc|   4 ++
 gcc/profile-count.cc |  22 ---
 gcc/profile-count.h  | 163 ---
 4 files changed, 133 insertions(+), 71 deletions(-)

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index 2441c9e866d5..94a2e6e61058 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -200,7 +200,8 @@ cgraph_node::make_profile_local ()
 }
 
 /* Turn profile to global0.  Walk into inlined functions.
-   QUALITY must be GUESSED_GLOBAL0 or GUESSED_GLOBAL0_ADJUSTED  */
+   QUALITY must be GUESSED_GLOBAL0, GUESSED_GLOBAL0_ADJUSTED
+   or GUESSED_GLOBAL0_AFDO  */
 void
 cgraph_node::make_profile_global0 (profile_quality quality)
 {
@@ -219,6 +220,14 @@ cgraph_node::make_profile_global0 (profile_quality quality)
return;
   count = count.global0adjusted ();
 }
+  else if (quality == GUESSED_GLOBAL0_AFDO)
+{
+  if (count.quality () == GUESSED_GLOBAL0
+ || count.quality () == GUESSED_GLOBAL0_ADJUSTED
+ || count.quality () == GUESSED_GLOBAL0_AFDO)
+   return;
+  count = count.global0afdo ();
+}
   else
 gcc_unreachable ();
   for (cgraph_edge *e = callees; e; e = e->next_callee)
@@ -231,6 +240,8 @@ cgraph_node::make_profile_global0 (profile_quality quality)
e->count = e->count.global0 ();
   else if (quality == GUESSED_GLOBAL0_ADJUSTED)
e->count = e->count.global0adjusted ();
+  else if (quality == GUESSED_GLOBAL0_AFDO)
+   e->count = e->count.global0afdo ();
   else
gcc_unreachable ();
 }
@@ -241,6 +252,8 @@ cgraph_node::make_profile_global0 (profile_quality quality)
   e->count = e->count.global0 ();
 else if (quality == GUESSED_GLOBAL0_ADJUSTED)
   e->count = e->count.global0adjusted ();
+else if (quality == GUESSED_GLOBAL0_AFDO)
+  e->count = e->count.global0afdo ();
 else
   gcc_unreachable ();
 }
diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 3bf0117612e5..3e073af662a6 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -4773,6 +4773,8 @@ update_profiling_info (struct cgraph_node *orig_node,
 to global0adjusted or to local if we have partial training.  */
   if (opt_for_fn (orig_node->decl, flag_profile_partial_training))
orig_node->make_profile_local ();
+  if (new_sum.quality () == AFDO)
+   orig_node->make_profile_global0 (GUESSED_GLOBAL0_AFDO);
   else
orig_node->make_profile_global0 (GUESSED_GLOBAL0_ADJUSTED);
   orig_edges_processed = true;
@@ -480

[gcc r16-1608] [PR modula2/120731] error in Strings.Pos causing sigsegv

2025-06-21 Thread Gaius Mulley via Gcc-cvs
https://gcc.gnu.org/g:fc276742e0db337c4d13e6c474abafd4796a6b69

commit r16-1608-gfc276742e0db337c4d13e6c474abafd4796a6b69
Author: Gaius Mulley 
Date:   Sun Jun 22 04:13:26 2025 +0100

[PR modula2/120731] error in Strings.Pos causing sigsegv

This patch corrects the m2log library procedure function
Strings.Pos, which sliced the wrong component of the source
string.  The incorrect slice could cause a SIGSEGV if
negative slice indices were generated.
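A C++ sketch of the corrected search logic follows.  It is a hypothetical analog of `PosLower`, not the gm2 library code: `std::string` replaces `DynamicStrings`, and the string length stands in for `HIGH (str) + 1` (for a Modula-2 `ARRAY OF CHAR` the two can differ when the array is not full).  The fix mirrored here is that the candidate slice spans `i` to `i + substrLen` and is only taken when it lies inside `str`, whereas the old code passed the substring length as the slice end.  Treating an empty `substr` as not found is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical analog of PosLower: return the first position of SUBSTR
// in STR, or STR.size () (standing in for HIGH (str) + 1) if absent.
std::size_t pos (const std::string &substr, const std::string &str)
{
  const std::size_t not_found = str.size ();
  // Assumption of this sketch: an empty or longer substring is never found.
  if (substr.empty () || substr.size () > str.size ())
    return not_found;

  std::size_t i = 0;
  while (i < str.size ())
    {
      // Jump to the next occurrence of the first character.
      i = str.find (substr[0], i);
      if (i == std::string::npos)
        return not_found;
      // Compare the slice [i, i + substr.size ()) only when it fits in STR.
      if (i + substr.size () <= str.size ()
          && str.compare (i, substr.size (), substr) == 0)
        return i;
      ++i;
    }
  return not_found;
}
```

For example, `pos ("lo", "hello")` is 3, and `pos ("oz", "hello")` returns 5 without ever forming an out-of-range slice.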

gcc/m2/ChangeLog:

PR modula2/120731
* gm2-libs-log/Strings.def (Delete): Rewrite comment.
* gm2-libs-log/Strings.mod (Pos): Rewrite.
(PosLower): New procedure function.

gcc/testsuite/ChangeLog:

PR modula2/120731
* gm2/pimlib/logitech/run/pass/teststrings.mod: New test.

Signed-off-by: Gaius Mulley 

Diff:
---
 gcc/m2/gm2-libs-log/Strings.def|  4 +-
 gcc/m2/gm2-libs-log/Strings.mod| 77 ++
 .../gm2/pimlib/logitech/run/pass/teststrings.mod   | 16 +
 3 files changed, 69 insertions(+), 28 deletions(-)

diff --git a/gcc/m2/gm2-libs-log/Strings.def b/gcc/m2/gm2-libs-log/Strings.def
index aea35f894889..2be4e4236d73 100644
--- a/gcc/m2/gm2-libs-log/Strings.def
+++ b/gcc/m2/gm2-libs-log/Strings.def
@@ -53,7 +53,9 @@ PROCEDURE Delete (VAR str: ARRAY OF CHAR; index: CARDINAL; length: CARDINAL) ;
 
 
 (*
-   Pos - return the first position of, substr, in, str.
+   Pos - return the first position of substr in str.
+ If substr is not found in str then it returns
+ HIGH (str) + 1.
 *)
 
 PROCEDURE Pos (substr, str: ARRAY OF CHAR) : CARDINAL ;
diff --git a/gcc/m2/gm2-libs-log/Strings.mod b/gcc/m2/gm2-libs-log/Strings.mod
index 6046a102a52e..44f47b322505 100644
--- a/gcc/m2/gm2-libs-log/Strings.mod
+++ b/gcc/m2/gm2-libs-log/Strings.mod
@@ -83,39 +83,62 @@ END Delete ;
 
 
 (*
-   Pos - return the first position of, substr, in, str.
+   PosLower - return the first position of substr in str.
 *)
 
-PROCEDURE Pos (substr, str: ARRAY OF CHAR) : CARDINAL ;
+PROCEDURE PosLower (substr, str: ARRAY OF CHAR) : CARDINAL ;
 VAR
-   i, k, l   : INTEGER ;
-   s1, s2, s3: DynamicStrings.String ;
+   i, strLen, substrLen   : INTEGER ;
+   strS, substrS, scratchS: DynamicStrings.String ;
 BEGIN
-   s1 := DynamicStrings.InitString(str) ;
-   s2 := DynamicStrings.InitString(substr) ;
-   k := DynamicStrings.Length(s1) ;
-   l := DynamicStrings.Length(s2) ;
+   strS := DynamicStrings.InitString (str) ;
+   substrS := DynamicStrings.InitString (substr) ;
+   strLen := DynamicStrings.Length (strS) ;
+   substrLen := DynamicStrings.Length (substrS) ;
i := 0 ;
REPEAT
-  i := DynamicStrings.Index(s1, DynamicStrings.char(s2, 0), i) ;
-  IF i>=0
+  i := DynamicStrings.Index (strS, DynamicStrings.char (substrS, 0), i) ;
+  IF i < 0
+  THEN
+ (* No match on first character therefore return now.  *)
+ strS := DynamicStrings.KillString (strS) ;
+ substrS := DynamicStrings.KillString (substrS) ;
+ scratchS := DynamicStrings.KillString (scratchS) ;
+ RETURN( HIGH (str) + 1 )
+  ELSIF i + substrLen <= strLen
   THEN
- s3 := DynamicStrings.Slice(s1, i, l) ;
- IF DynamicStrings.Equal(s3, s2)
+ scratchS := DynamicStrings.Slice (strS, i, i + substrLen) ;
+ IF DynamicStrings.Equal (scratchS, substrS)
  THEN
-s1 := DynamicStrings.KillString(s1) ;
-s2 := DynamicStrings.KillString(s2) ;
-s3 := DynamicStrings.KillString(s3) ;
+strS := DynamicStrings.KillString (strS) ;
+substrS := DynamicStrings.KillString (substrS) ;
+scratchS := DynamicStrings.KillString (scratchS) ;
 RETURN( i )
  END ;
- s3 := DynamicStrings.KillString(s3)
+ scratchS := DynamicStrings.KillString (scratchS)
   END ;
-  INC(i)
-   UNTIL i>=k ;
-   s1 := DynamicStrings.KillString(s1) ;
-   s2 := DynamicStrings.KillString(s2) ;
-   s3 := DynamicStrings.KillString(s3) ;
-   RETURN( HIGH(str)+1 )
+  INC (i)
+   UNTIL i >= strLen ;
+   strS := DynamicStrings.KillString (strS) ;
+   substrS := DynamicStrings.KillString (substrS) ;
+   scratchS := DynamicStrings.KillString (scratchS) ;
+   RETURN( HIGH (str) + 1 )
+END PosLower ;
+
+
+(*
+   Pos - return the first position of substr in str.
+ If substr is not found in str then it returns
+ HIGH (str) + 1.
+*)
+
+PROCEDURE Pos (substr, str: ARRAY OF CHAR) : CARDINAL ;
+BEGIN
+   IF Length (substr) <= Length (str)
+   THEN
+  RETURN PosLower (substr, str)
+   END ;
+   RETURN( HIGH (str) + 1 )
 END Pos ;
 
 
@@ -129,11 +152,11 @@ PROCEDURE Copy (str: ARRAY OF CHAR;
 VAR
s1, s2: DynamicStrings.String ;
 BEGIN
-   s1 := DynamicStrings.InitString(str) ;
-   s2 := DynamicStrings.Slice(s1, index, index+length) ;
-   DynamicStrings.CopyOut(res

[gcc(refs/users/omachota/heads/rtl-ssa-dce)] rtl-ssa-dce: add timevar

2025-06-21 Thread Ondrej Machota via Gcc-cvs
https://gcc.gnu.org/g:eb73be9bde7c752273663162786b60cde41328f2

commit eb73be9bde7c752273663162786b60cde41328f2
Author: Ondřej Machota 
Date:   Sat Jun 21 14:36:30 2025 +0200

rtl-ssa-dce: add timevar

Diff:
---
 gcc/dce.cc | 33 +
 gcc/passes.def |  4 ++--
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/gcc/dce.cc b/gcc/dce.cc
index 77ff0a4d2791..1aa5c6539664 100644
--- a/gcc/dce.cc
+++ b/gcc/dce.cc
@@ -1323,7 +1323,8 @@ make_pass_fast_rtl_dce (gcc::context *ctxt)
   return new pass_fast_rtl_dce (ctxt);
 }
 
-// offset_bitmap is a wrapper around sbitmap which takes care of negative indices
+namespace {
+// offset_bitmap is a wrapper around sbitmap that also handles negative indices from RTL SSA
 struct offset_bitmap
 {
 private:
@@ -1381,7 +1382,7 @@ private:
   void sweep ();
 
   offset_bitmap m_marked;
-  sbitmap mm_marked_phis;
+  sbitmap m_marked_phis;
 };
 
 bool
@@ -1412,7 +1413,7 @@ rtl_ssa_dce::can_delete_call (const_rtx insn)
 bool
 rtl_ssa_dce::is_rtx_prelive (const_rtx insn)
 {
-  gcc_assert (insn != nullptr);
+  gcc_checking_assert (insn != nullptr);
 
   if (CALL_P (insn)
   // We cannot delete pure or const sibling calls because it is
@@ -1481,7 +1482,7 @@ rtl_ssa_dce::is_prelive (insn_info *insn)
   if (def->is_mem ())
return true;
 
-  gcc_assert (def->is_reg ());
+  gcc_checking_assert (def->is_reg ());
 
   // We should not delete the frame pointer because of the dward2frame pass
   if (frame_pointer_needed && def->regno () == HARD_FRAME_POINTER_REGNUM)
@@ -1491,7 +1492,7 @@ rtl_ssa_dce::is_prelive (insn_info *insn)
   if (def->kind () == access_kind::CLOBBER)
continue;
 
-  gcc_assert (def->kind () == access_kind::SET);
+  gcc_checking_assert (def->kind () == access_kind::SET);
 
   // Needed by gcc.c-torture/execute/pr51447.c
   if (HARD_REGISTER_NUM_P (def->regno ()) && global_regs[def->regno ()])
@@ -1572,10 +1573,10 @@ rtl_ssa_dce::mark ()
  phi_info *phi = static_cast (set);
  auto phi_uid = phi->uid ();
  // Skip already visited phi node.
- if (bitmap_bit_p(mm_marked_phis, phi_uid))
+ if (bitmap_bit_p(m_marked_phis, phi_uid))
continue;
 
-bitmap_set_bit (mm_marked_phis, phi_uid);
+bitmap_set_bit (m_marked_phis, phi_uid);
  uses = phi->inputs ();
}
 
@@ -1629,7 +1630,7 @@ rtl_ssa_dce::reset_dead_debug ()
 bool is_parent_marked = false;
 if (parent_insn->is_phi ()) {
   auto phi = static_cast (def);
-  is_parent_marked = bitmap_bit_p(mm_marked_phis, phi->uid ());
+  is_parent_marked = bitmap_bit_p(m_marked_phis, phi->uid ());
 } else {
   is_parent_marked = m_marked.get_bit(parent_insn->uid ());
 }
@@ -1654,7 +1655,8 @@ rtl_ssa_dce::sweep ()
   // Previously created debug instructions won't be visited here
   for (insn_info *insn : crtl->ssa->nondebug_insns ())
 {
-  // Artificial or marked insns should not be deleted.
+  // Artificial (bb_head, bb_end, phi), marked or debug instructions
+  // should not be deleted.
   if (insn->is_artificial () || m_marked.get_bit (insn->uid ()))
continue;
 
@@ -1679,6 +1681,7 @@ rtl_ssa_dce::sweep ()
 unsigned int
 rtl_ssa_dce::execute (function *fn)
 {
+  auto_timevar timer(TV_RTL_SSA_DCE);
   calculate_dominance_info (CDI_DOMINATORS);
   crtl->ssa = new rtl_ssa::function_info (fn);
   int real_max = 0, artificial_min = 0;
@@ -1695,10 +1698,10 @@ rtl_ssa_dce::execute (function *fn)
   unsigned int phi_node_max = 0;
   for (ebb_info *ebb : crtl->ssa->ebbs ())
 for (phi_info *phi : ebb->phis ())
-phi_node_max = std::max (phi_node_max, phi->uid ());
+  phi_node_max = std::max (phi_node_max, phi->uid ());
 
-  mm_marked_phis = sbitmap_alloc(phi_node_max + 1);
-  bitmap_clear(mm_marked_phis);
+  m_marked_phis = sbitmap_alloc(phi_node_max + 1);
+  bitmap_clear(m_marked_phis);
 
   mark ();
   if (MAY_HAVE_DEBUG_BIND_INSNS)
@@ -1709,19 +1712,17 @@ rtl_ssa_dce::execute (function *fn)
   if (crtl->ssa->perform_pending_updates ())
 cleanup_cfg (0);
 
-  sbitmap_free(mm_marked_phis);
+  sbitmap_free(m_marked_phis);
   delete crtl->ssa;
   crtl->ssa = nullptr;
   return 0;
 }
 
-namespace {
-
 const pass_data pass_data_rtl_ssa_dce = {
   RTL_PASS,  /* type */
   "rtl_ssa_dce",  /* name */
   OPTGROUP_NONE,  /* optinfo_flags */
-  TV_DCE,/* tv_id */
+  TV_RTL_SSA_DCE,/* tv_id */
   0, /* properties_required */
   0, /* properties_provided */
   0, /* properties_destroyed */
diff --git a/gcc/passes.def b/gcc/passes.def
index a4cd1eb741b2..9b3b880b4cc9 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -489,7 +489,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_inc_dec);
   NEXT_PASS (pass_initialize_regs);
   NEXT_PASS (pass_rtl_ssa_dce);
-  // __NEXT_PAS (pass_ud_rtl_dce);
+  // NEXT_PASS (
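
The `offset_bitmap` idea in the diff above can be sketched as a bitmap over a signed index range that shifts each index by the range minimum.  This is a hypothetical reconstruction, not the branch's code: it uses `std::vector<bool>` in place of GCC's `sbitmap`, and it assumes (as `artificial_min` in `execute` suggests) that artificial RTL SSA instructions may carry negative uids.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bitmap over the signed range [min_index, max_index].
class offset_bitmap
{
public:
  offset_bitmap (int min_index, int max_index)
    : m_offset (-min_index), m_bits (max_index - min_index + 1, false)
  {
    assert (min_index <= max_index);
  }

  void set_bit (int index) { m_bits[slot (index)] = true; }
  bool get_bit (int index) const { return m_bits[slot (index)]; }

private:
  // Shift a possibly negative index into the 0-based storage range.
  std::size_t slot (int index) const
  {
    int shifted = index + m_offset;
    assert (shifted >= 0 && (std::size_t) shifted < m_bits.size ());
    return (std::size_t) shifted;
  }

  int m_offset;
  std::vector<bool> m_bits;
};
```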

[gcc r16-1602] [modula2] Comment tidyup in gm2-compiler/M2GCCDeclare.mod

2025-06-21 Thread Gaius Mulley via Gcc-cvs
https://gcc.gnu.org/g:7a7cc65b8987b9b05fb8fb75824e2000861e6c30

commit r16-1602-g7a7cc65b8987b9b05fb8fb75824e2000861e6c30
Author: Gaius Mulley 
Date:   Sat Jun 21 17:18:04 2025 +0100

[modula2] Comment tidyup in gm2-compiler/M2GCCDeclare.mod

This patch reformats three comments in the GNU GCC style.

gcc/m2/ChangeLog:

* gm2-compiler/M2GCCDeclare.mod (StartDeclareModuleScopeSeparate):
Reformat statement comments.
(StartDeclareModuleScopeWholeProgram): Ditto.

Signed-off-by: Gaius Mulley 

Diff:
---
 gcc/m2/gm2-compiler/M2GCCDeclare.mod | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/gcc/m2/gm2-compiler/M2GCCDeclare.mod b/gcc/m2/gm2-compiler/M2GCCDeclare.mod
index be80695d3e8a..860a89ac8b0c 100644
--- a/gcc/m2/gm2-compiler/M2GCCDeclare.mod
+++ b/gcc/m2/gm2-compiler/M2GCCDeclare.mod
@@ -2927,14 +2927,12 @@ BEGIN
  DeclareTypesConstantsProcedures(scope) ; (* will resolved TYPEs and CONSTs on the ToDo  *)
(* lists.   *)
   ForeachModuleDo(DeclareProcedure) ;
-  (*
- now that all types have been resolved it is safe to declare
- variables
-  *)
+  (* Now that all types have been resolved it is safe to declare
+ variables.  *)
   AssertAllTypesDeclared(scope) ;
   DeclareGlobalVariables(scope) ;
   ForeachImportedDo(scope, DeclareImportedVariables) ;
-  (* now it is safe to declare all procedures *)
+  (* Now it is safe to declare all procedures.  *)
   ForeachProcedureDo(scope, DeclareProcedure) ;
   ForeachInnerModuleDo(scope, WalkTypesInModule) ;
   ForeachInnerModuleDo(scope, DeclareTypesConstantsProcedures) ;
@@ -2963,14 +2961,12 @@ BEGIN
(* lists.   *)
   ForeachModuleDo(DeclareProcedure) ;
   ForeachModuleDo(DeclareModuleInit) ;
-  (*
- now that all types have been resolved it is safe to declare
- variables
-  *)
+  (* Now that all types have been resolved it is safe to declare
+ variables.  *)
   AssertAllTypesDeclared(scope) ;
   DeclareGlobalVariablesWholeProgram(scope) ;
   ForeachImportedDo(scope, DeclareImportedVariablesWholeProgram) ;
-  (* now it is safe to declare all procedures *)
+  (* Now it is safe to declare all procedures.  *)
   ForeachProcedureDo(scope, DeclareProcedure) ;
   ForeachInnerModuleDo(scope, WalkTypesInModule) ;
   ForeachInnerModuleDo(scope, DeclareTypesConstantsProcedures) ;


[gcc r16-1603] Fix profile after fnsplit

2025-06-21 Thread Jan Hubicka via Gcc-cvs
https://gcc.gnu.org/g:cd589516b12e28ee30aefc4c51500f634f1b888e

commit r16-1603-gcd589516b12e28ee30aefc4c51500f634f1b888e
Author: Jan Hubicka 
Date:   Sat Jun 21 22:29:50 2025 +0200

Fix profile after fnsplit

When splitting functions, tree-inline correctly determined the entry count of
the new function part, but then, if the entry block of the new function part
is in a loop, it scaled the body, which is not supposed to happen.

* tree-inline.cc (copy_cfg_body): Fix profile of split functions.

Diff:
---
 gcc/tree-inline.cc | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
index b25705173433..dee2dfc26206 100644
--- a/gcc/tree-inline.cc
+++ b/gcc/tree-inline.cc
@@ -3084,7 +3084,7 @@ copy_cfg_body (copy_body_data * id,
   /* Register specific tree functions.  */
   gimple_register_cfg_hooks ();
 
-  /* If we are inlining just region of the function, make sure to connect
+  /* If we are offlining region of the function, make sure to connect
  new entry to ENTRY_BLOCK_PTR_FOR_FN (cfun).  Since new entry can be
  part of loop, we must compute frequency and probability of
  ENTRY_BLOCK_PTR_FOR_FN (cfun) based on the frequencies and
@@ -3093,12 +3093,14 @@ copy_cfg_body (copy_body_data * id,
 {
   edge e;
   edge_iterator ei;
-  den = profile_count::zero ();
+  ENTRY_BLOCK_PTR_FOR_FN (cfun)->count = profile_count::zero ();
 
   FOR_EACH_EDGE (e, ei, new_entry->preds)
if (!e->src->aux)
- den += e->count ();
-  ENTRY_BLOCK_PTR_FOR_FN (cfun)->count = den;
+ ENTRY_BLOCK_PTR_FOR_FN (cfun)->count += e->count ();
+  /* Do not scale - the profile of offlined region should
+remain unchanged.  */
+  num = den = profile_count::one ();
 }
 
   profile_count::adjust_for_ipa_scaling (&num, &den);
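
The profile rule this fix implements can be modeled in a few lines.  In this hypothetical sketch, `incoming_edge::src_in_region` plays the role of the `e->src->aux` test in the diff: the entry count of the offlined part is the sum of edge counts entering from outside the region, and because the patch sets `num = den = profile_count::one ()`, scaling leaves the body's counts unchanged.  Plain `uint64_t` replaces `profile_count`, so quality tracking is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical incoming edge of the region's entry block.
struct incoming_edge
{
  bool src_in_region;  // source block belongs to the offlined region
  uint64_t count;      // edge execution count
};

// Entry count of the split function: only edges entering from outside
// the region contribute (loop back edges inside the region do not).
uint64_t split_entry_count (const std::vector<incoming_edge> &preds)
{
  uint64_t entry = 0;
  for (const incoming_edge &e : preds)
    if (!e.src_in_region)
      entry += e.count;
  return entry;
}

// Scale a block count by NUM / DEN; with NUM == DEN it is unchanged.
uint64_t scale_count (uint64_t count, uint64_t num, uint64_t den)
{
  assert (den != 0);
  return count * num / den;
}
```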


[gcc(refs/users/aoliva/heads/testme)] [lra] propagate fp2sp elimination offset after disabling it [PR120424]

2025-06-21 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:f5b3ed79efe40bae658954edd2e94f1a7dd29ec1

commit f5b3ed79efe40bae658954edd2e94f1a7dd29ec1
Author: Alexandre Oliva 
Date:   Sat Jun 21 05:58:22 2025 -0300

[lra] propagate fp2sp elimination offset after disabling it [PR120424]

Deactivating the fp2sp elimination early causes __do_global_ctors_aux
on arm-linux-gnueabihf to be miscompiled (with -fnon-call-exceptions
-fstack-clash-protection) because, after fp2sp is disabled, we'd
choose another elimination and use its -1 prev_offset.

But keeping it active causes e.g. pr103973-18.c to be miscompiled on
x86_64-linux-gnu (with ix86_frame_pointer_required modified to return
true when ix86_get_frame_size () > 0, to exercise the fp2sp
elimination deactivation), because two adjacent pushes of two adjacent
temporaries end up pushing the same slot, since one of them gets
offset by the sp variation.

The solution is to keep the early disabling of the elimination, which
fixes x86_64, but retain the code to propagate the offset to the next
FP elimination, which fixes arm.


(I was concerned that disabling the fp2sp elimination too soon could
drop offsets already applied to fp, that would have to be reverted or
passed on to the next fp elimination, but we know the elimination was
not used, and if there were any insns or offsets to propagate, the
previous round of lra_eliminate, in which update_reg_eliminations
would have selected the fp2sp elimination, would also have set
elimination_fp2sp_occured if any insn used the fp2sp elimination's
offset.  Since we know there weren't any, because
elimination_fp2sp_occured is clear, we know other fp eliminations can
safely use their initial offsets if needed.)

Diff:
---
 gcc/lra-eliminations.cc | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index f8a761e6eeaa..400857587b3a 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -312,6 +312,11 @@ move_plus_up (rtx x)
 /* Flag that we already did frame pointer to stack pointer elimination.  */
 static bool elimination_fp2sp_occured_p = false;
 
+/* Hold the fp2sp elimination that was disabled and inactivated, but
+   whose offsets we wish to propagate to any subsequently-chosen fp
+   elimination.  */
+static class lra_elim_table *disabled_fp2sp_elimination;
+
 /* Scan X and replace any eliminable registers (such as fp) with a
replacement (such as sp) if SUBST_P, plus an offset.  The offset is
a change in the offset between the eliminable register and its
@@ -1185,7 +1190,9 @@ update_reg_eliminate (bitmap insns_with_changed_offsets)
  setup_can_eliminate (ep, false);
  continue;
}
-  if (!ep->can_eliminate && elimination_map[ep->from] == ep)
+  if (!ep->can_eliminate
+ && (elimination_map[ep->from] == ep
+ || disabled_fp2sp_elimination == ep))
{
  /* We cannot use this elimination anymore -- find another
 one.  */
@@ -1233,6 +1240,7 @@ update_reg_eliminate (bitmap insns_with_changed_offsets)
 
   INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, ep->offset);
 }
+  disabled_fp2sp_elimination = NULL;
   setup_elimination_map ();
   result = false;
   CLEAR_HARD_REG_SET (temp_hard_reg_set);
@@ -1319,6 +1327,7 @@ init_elimination (void)
   rtx_insn *insn;
   class lra_elim_table *ep;
 
+  disabled_fp2sp_elimination = NULL;
   init_elim_table ();
   FOR_EACH_BB_FN (bb, cfun)
 {
@@ -1432,7 +1441,13 @@ lra_update_fp2sp_elimination (void)
  during update_reg_eliminate.  */
   ep = elimination_map[FRAME_POINTER_REGNUM];
   if (ep->to == STACK_POINTER_REGNUM)
-setup_can_eliminate (ep, false);
+{
+  /* Avoid using this known-unused elimination when removing
+spilled pseudos.  */
+  elimination_map[FRAME_POINTER_REGNUM] = NULL;
+  disabled_fp2sp_elimination = ep;
+  setup_can_eliminate (ep, false);
+}
   else
 for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
   if (ep->from == FRAME_POINTER_REGNUM && ep->to == STACK_POINTER_REGNUM)


[gcc(refs/users/aoliva/heads/testme)] [lra] simplify disabling of fp2sp elimination [PR120424]

2025-06-21 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:70563b8f84af32bbe52e5759ed36b5bca3180e43

commit 70563b8f84af32bbe52e5759ed36b5bca3180e43
Author: Alexandre Oliva 
Date:   Thu Jun 19 11:05:36 2025 -0300

[lra] simplify disabling of fp2sp elimination [PR120424]

Whether with or without the lra fp2sp elimination accumulated
improvements, building a native arm-linux-gnueabihf toolchain with
{BOOT_CFLAGS,TFLAGS}='-O2 -g -fnon-call-exceptions
-fstack-clash-protection' doesn't get very far: crtbegin.o gets
miscompiled in __do_global_dtors_aux, as spilled pseudos get assigned
stack slots that get incorrectly adjusted after the fp2sp elimination
is disabled.

AFAICT eliminations are reversible before the final round, and ISTM
that deferring spilling of registers in disabled eliminations to
update_reg_eliminate would avoid the incorrect adjustments I saw in
spilling within or after lra_update_fp2sp_elimination: the logic to
deal with them was already there, we just have to set the stage for it
to do its job.

Diff:
---
 gcc/lra-eliminations.cc | 31 +--
 gcc/lra-int.h   |  2 +-
 gcc/lra-spills.cc   | 13 +++--
 3 files changed, 17 insertions(+), 29 deletions(-)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 6663d1c37e8b..f8a761e6eeaa 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -1405,30 +1405,18 @@ process_insn_for_elimination (rtx_insn *insn, bool final_p, bool first_p)
permitted frame pointer elimination and now target reports that we can not
do this elimination anymore.  Record spilled pseudos in SPILLED_PSEUDOS
unless it is null, and return the recorded pseudos number.  */
-int
-lra_update_fp2sp_elimination (int *spilled_pseudos)
+void
+lra_update_fp2sp_elimination (void)
 {
-  int n;
-  HARD_REG_SET set;
   class lra_elim_table *ep;
 
   if (frame_pointer_needed || !targetm.frame_pointer_required ())
-return 0;
+return;
   gcc_assert (!elimination_fp2sp_occured_p);
-  ep = elimination_map[FRAME_POINTER_REGNUM];
-  if (ep->to == STACK_POINTER_REGNUM)
-{
-  elimination_map[FRAME_POINTER_REGNUM] = NULL;
-  setup_can_eliminate (ep, false);
-}
-  else
-ep = NULL;
   if (lra_dump_file != NULL)
 fprintf (lra_dump_file,
 " Frame pointer can not be eliminated anymore\n");
   frame_pointer_needed = true;
-  CLEAR_HARD_REG_SET (set);
-  add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM);
   /* If !lra_reg_spill_p, we likely have incomplete range information
  for pseudos assigned to the frame pointer that will have to be
  spilled, and so we may end up incorrectly sharing them unless we
@@ -1437,12 +1425,19 @@ lra_update_fp2sp_elimination (int *spilled_pseudos)
 /* If lives ranges changed, update the aggregate live ranges in
slots as well before spilling any further pseudos.  */
 lra_recompute_slots_live_ranges ();
-  n = spill_pseudos (set, spilled_pseudos);
-  if (!ep)
+  /* Conceivably, we wouldn't need to disable the fp2sp elimination
+ unless the target says so, but we have to vacate the frame
+ pointer register to make room for the frame pointer proper, and
+ disabling the elimination will spill everything assigned to it
+ during update_reg_eliminate.  */
+  ep = elimination_map[FRAME_POINTER_REGNUM];
+  if (ep->to == STACK_POINTER_REGNUM)
+setup_can_eliminate (ep, false);
+  else
 for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
   if (ep->from == FRAME_POINTER_REGNUM && ep->to == STACK_POINTER_REGNUM)
setup_can_eliminate (ep, false);
-  return n;
+  return;
 }
 
 /* Return true if we have a pseudo assigned to hard frame pointer.  */
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 0cf7266ce646..05d28af7c9e0 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -433,7 +433,7 @@ extern int lra_get_elimination_hard_regno (int);
 extern rtx lra_eliminate_regs_1 (rtx_insn *, rtx, machine_mode,
 bool, bool, poly_int64, bool);
 extern void eliminate_regs_in_insn (rtx_insn *insn, bool, bool, poly_int64);
-extern int lra_update_fp2sp_elimination (int *spilled_pseudos);
+extern void lra_update_fp2sp_elimination (void);
 extern bool lra_fp_pseudo_p (void);
 extern void lra_eliminate (bool, bool);
 
diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index 7603d0dcf163..494ef3c96c8e 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -658,7 +658,7 @@ lra_need_for_spills_p (void)
 void
 lra_spill (void)
 {
-  int i, n, n2, curr_regno;
+  int i, n, curr_regno;
   int *pseudo_regnos;
 
   regs_num = max_reg_num ();
@@ -685,15 +685,8 @@ lra_spill (void)
   for (i = 0; i < n; i++)
 if (pseudo_slots[pseudo_regnos[i]].mem == NULL_RTX)
   assign_mem_slot (pseudo_regnos[i]);
-  if ((n2 = lra_update_fp2sp_elimination (pseudo_regnos)) > 0)
-{
-  /* Assign stack slots to spilled pseudos assigned to fp.  */
-   

[gcc/aoliva/heads/testme] (4 commits) [lra] simplify disabling of fp2sp elimination [PR120424]

2025-06-21 Thread Alexandre Oliva via Gcc-cvs
The branch 'aoliva/heads/testme' was updated to point to:

 70563b8f84af... [lra] simplify disabling of fp2sp elimination [PR120424]

It previously pointed to:

 95913b192448... [lra] simplify disabling of fp2sp elimination [PR120424]

Diff:

!!! WARNING: THE FOLLOWING COMMITS ARE NO LONGER ACCESSIBLE (LOST):
---

  95913b1... [lra] simplify disabling of fp2sp elimination [PR120424]
  6eb0e6b... [lra] recompute ranges upon disabling fp2sp elimination [PR
  329376c... [genoutput] mark scratch outputs as eliminable [PR120424]
  1498558... [lra] inactivate disabled fp2sp elimination [PR120424]


Summary of changes (added commits):
---

  70563b8... [lra] simplify disabling of fp2sp elimination [PR120424]
  ed2c07a... [lra] recompute ranges upon disabling fp2sp elimination [PR
  89620fe... [genoutput] mark scratch outputs as eliminable [PR120424]
  440d462... [lra] inactivate disabled fp2sp elimination [PR120424]


[gcc(refs/users/aoliva/heads/testme)] [genoutput] mark scratch outputs as eliminable [PR120424]

2025-06-21 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:89620fe7db2e9d3360a18a402590ca0c6cfe18e1

commit 89620fe7db2e9d3360a18a402590ca0c6cfe18e1
Author: Alexandre Oliva 
Date:   Wed Jun 18 04:13:19 2025 -0300

[genoutput] mark scratch outputs as eliminable [PR120424]

acats' fdd2a00.read is miscompiled on arm-linux-gnu with -O2
-fstack-clash-protection -march=armv7-a -marm: a clobbered scratch
register in a *iorsi3_compare0_scratch pattern gets initially assigned
to the frame pointer register, but at some point during lra the frame
size grows to nonzero, arm_frame_pointer_required flips to true, and
the fp2sp elimination has to be disabled, so the scratch register gets
spilled to a stack slot.

That stack slot needs to get the sfp elimination at that point, because later
rounds of elimination will assume the previous round's offset has
already been applied.  But since scratch matches are not regarded as
eliminable by genoutput, we don't attempt elimination in the clobbered
stack slot MEM rtx.

Later on, lra issues a reload for that slot, using a new pseudo
allocated to a hard register, which gets stored in the stack slot
after the original insn.  Elimination in that reload store insn
eventually updates the elimination offset, but it's an incremental
update, assuming that the offset so far has already been applied.

Without applying the initial offset, the store ends up overlapping
with the function's register save area, corrupting a caller's
call-saved register.
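The failure mode can be illustrated with a toy model. Everything in it is hypothetical: the function name, the single integer displacement, and the list of per-round offsets. It only sketches the incremental-update behaviour described above and is not LRA's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model, not LRA's real data structures: each round of
// elimination rewrites a frame-pointer-based displacement into a
// stack-pointer-based one by adding only the *change* in the elimination
// offset since the previous round.
std::int64_t eliminate_rounds(std::int64_t disp,
                              const std::vector<std::int64_t> &round_offsets,
                              const std::vector<bool> &attempted) {
  std::int64_t previous_offset = 0;
  for (std::size_t i = 0; i < round_offsets.size(); ++i) {
    // An rtx skipped in this round (e.g. one not marked eliminable)
    // keeps its old displacement...
    if (attempted[i])
      disp += round_offsets[i] - previous_offset;
    // ...but the elimination's recorded offset still moves on, so later
    // rounds assume this delta has already been applied.
    previous_offset = round_offsets[i];
  }
  return disp;
}
```

In a call such as `eliminate_rounds(4, {16, 24}, {false, true})` the rtx skips the first round, as the scratch slot did, and ends up at 12 instead of 28: it is short by exactly the initial 16-byte offset.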

AFAICT the old reload's elimination wouldn't be harmed by allowing
elimination in scratch operands, so I'm marking them as eliminable
regardless.  Should that turn out to make a difference, we could
presumably set a different bit in eliminable so that reload and lra
can tell them apart and behave accordingly.


for  gcc/ChangeLog

PR rtl-optimization/120424
* genoutput.cc (scan_operands): Make MATCH_SCRATCHes eliminable.

Diff:
---
 gcc/genoutput.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/genoutput.cc b/gcc/genoutput.cc
index dd4e7b80c2a9..25d0b8b86467 100644
--- a/gcc/genoutput.cc
+++ b/gcc/genoutput.cc
@@ -478,7 +478,7 @@ scan_operands (class data *d, rtx part, int this_address_p,
   d->operand[opno].n_alternatives
= n_occurrences (',', d->operand[opno].constraint) + 1;
   d->operand[opno].address_p = 0;
-  d->operand[opno].eliminable = 0;
+  d->operand[opno].eliminable = 1;
   return;
 
 case MATCH_OPERATOR:


[gcc(refs/users/aoliva/heads/testme)] [lra] inactivate disabled fp2sp elimination [PR120424]

2025-06-21 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:440d462d321c0e7026a6eb8eb550b1f444eeb932

commit 440d462d321c0e7026a6eb8eb550b1f444eeb932
Author: Alexandre Oliva 
Date:   Fri Jun 6 02:03:31 2025 -0300

[lra] inactivate disabled fp2sp elimination [PR120424]

Even after we disable the fp2sp elimination when it is the active
elimination for the fp, spilling might use it before
update_reg_eliminate runs and inactivates it for good.  If it is used,
update_reg_eliminate will fail the check that fp2sp was not used.

Since we keep track of uses of this specific elimination, and
lra_update_fp2sp_elimination checks it before disabling it, we know it
hasn't been used, so we can inactivate it without any ill effects.

This fixes the pr118591-1.c avr-none regression exposed by the
PR120424 fix.


for  gcc/ChangeLog

PR rtl-optimization/120424
* lra-eliminations.cc (lra_update_fp2sp_elimination):
Inactivate the unused fp2sp elimination right away.

Diff:
---
 gcc/lra-eliminations.cc | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index bb708b007a4e..ff05501dbbbd 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -1415,6 +1415,14 @@ lra_update_fp2sp_elimination (int *spilled_pseudos)
   if (frame_pointer_needed || !targetm.frame_pointer_required ())
 return 0;
   gcc_assert (!elimination_fp2sp_occured_p);
+  ep = elimination_map[FRAME_POINTER_REGNUM];
+  if (ep->to == STACK_POINTER_REGNUM)
+{
+  elimination_map[FRAME_POINTER_REGNUM] = NULL;
+  setup_can_eliminate (ep, false);
+}
+  else
+ep = NULL;
   if (lra_dump_file != NULL)
 fprintf (lra_dump_file,
 " Frame pointer can not be eliminated anymore\n");
@@ -1422,9 +1430,10 @@ lra_update_fp2sp_elimination (int *spilled_pseudos)
   CLEAR_HARD_REG_SET (set);
   add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM);
   n = spill_pseudos (set, spilled_pseudos);
-  for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
-if (ep->from == FRAME_POINTER_REGNUM && ep->to == STACK_POINTER_REGNUM)
-  setup_can_eliminate (ep, false);
+  if (!ep)
+for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
+  if (ep->from == FRAME_POINTER_REGNUM && ep->to == STACK_POINTER_REGNUM)
+   setup_can_eliminate (ep, false);
   return n;
 }


[gcc r16-1601] [RISC-V][PR target/118241] Fix data prefetch predicate/constraint for RISC-V

2025-06-21 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:49199bb29628365fc6c60bd185808a1bad65086d

commit r16-1601-g49199bb29628365fc6c60bd185808a1bad65086d
Author: Jeff Law 
Date:   Sat Jun 21 08:24:58 2025 -0600

[RISC-V][PR target/118241] Fix data prefetch predicate/constraint for RISC-V

The RISC-V prefetch support is broken in a few ways.  This addresses the data
side prefetch problems.  I'd mistakenly thought this BZ was related to
prefetch.i (which has deeper problems).

The basic problem is we were accepting any valid address when in fact there are
restrictions.  This patch more precisely defines the predicate such that we
allow

REG
REG+D

Where D must have the low 5 bits clear.  Note that absolute addresses fall into
the REG+D form using x0 as the register operand, since it always has the value
zero.  The test verifies valid REG, REG+D and ABS addressing modes, as well as
REG+D and ABS addresses that must be reloaded into a REG because the
displacement has low bits set.
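A minimal sketch of the offset rule just described, written as a hypothetical standalone helper (it is not the `prefetch_operand` predicate itself) and assuming a 64-bit signed displacement:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the addressing rule described above (hypothetical helper, not
// GCC's prefetch_operand predicate): a prefetch address is REG+D, where D
// must fit in a signed 12-bit immediate and have its low 5 bits clear.
// A plain REG is the D == 0 case, and an absolute address is x0+D.
bool prefetch_offset_ok(std::int64_t d) {
  const bool fits_simm12 = d >= -2048 && d <= 2047;
  const bool low5_clear = (d & 0x1f) == 0;
  return fits_simm12 && low5_clear;
}
```

Under this rule `(int *)1024` can be addressed as x0+1024 directly, while `(int *)2047` has to be loaded into a register first.  The `prefetch_operand` predicate in the diff below masks with `0xf`; the sketch follows the 5-bit rule stated in the message.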

An earlier version of this patch has gone through testing in my tester on 
rv32
and rv64.  Obviously I'll wait for pre-commit CI to do its thing before 
moving
forward.

This is a good backport candidate after simmering on the trunk for a bit.

PR target/118241
gcc/
* config/riscv/predicates.md (prefetch_operand): New predicate.
* config/riscv/constraints.md (Q): New constraint.
* config/riscv/riscv.md (prefetch): Use new predicate and 
constraint.
(riscv_prefetchi_): Similarly.

gcc/testsuite/
* gcc.target/riscv/pr118241.c: New test.

Diff:
---
 gcc/config/riscv/constraints.md   |  4 
 gcc/config/riscv/predicates.md| 12 
 gcc/config/riscv/riscv.md |  4 ++--
 gcc/testsuite/gcc.target/riscv/pr118241.c | 16 
 4 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 58355cf03f2f..ccab1a2e29df 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -325,3 +325,7 @@
   "A 2-bit unsigned immediate."
   (and (match_code "const_int")
(match_test "IN_RANGE (ival, 0, 3)")))
+
+(define_constraint "Q"
+  "An address operand that is valid for a prefetch instruction"
+  (match_operand 0 "prefetch_operand"))
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 23690792b32e..8072d67fbd97 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -27,6 +27,18 @@
   (ior (match_operand 0 "const_arith_operand")
(match_operand 0 "register_operand")))
 
+;; REG or REG+D where D fits in a simm12 and has the low 4 bits
+;; off.  The REG+D form can be reloaded into a temporary if needed
+;; after FP elimination if that exposes an invalid offset.
+(define_predicate "prefetch_operand"
+  (ior (match_operand 0 "register_operand")
+   (and (match_test "const_arith_operand (op, VOIDmode)")
+   (match_test "(INTVAL (op) & 0xf) == 0"))
+   (and (match_code "plus")
+   (match_test "register_operand (XEXP (op, 0), word_mode)")
+   (match_test "const_arith_operand (XEXP (op, 1), VOIDmode)")
+   (match_test "(INTVAL (XEXP (op, 1)) & 0xf) == 0"))))
+
 (define_predicate "lui_operand"
   (and (match_code "const_int")
(match_test "LUI_OPERAND (INTVAL (op))")))
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 3aed25c25880..3406b50518ed 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4402,7 +4402,7 @@
 )
 
 (define_insn "prefetch"
-  [(prefetch (match_operand 0 "address_operand" "r")
+  [(prefetch (match_operand 0 "prefetch_operand" "Q")
  (match_operand 1 "imm5_operand" "i")
  (match_operand 2 "const_int_operand" "n"))]
   "TARGET_ZICBOP"
@@ -4422,7 +4422,7 @@
  (const_string "4")))])
 
 (define_insn "riscv_prefetchi_"
-  [(unspec_volatile:X [(match_operand:X 0 "address_operand" "r")
+  [(unspec_volatile:X [(match_operand:X 0 "prefetch_operand" "Q")
   (match_operand:X 1 "imm5_operand" "i")]
   UNSPECV_PREI)]
   "TARGET_ZICBOP"
diff --git a/gcc/testsuite/gcc.target/riscv/pr118241.c b/gcc/testsuite/gcc.target/riscv/pr118241.c
new file mode 100644
index ..f1dc44bce0ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr118241.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicbop" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_zicbop" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+void test1() { __builtin_prefetch((int *)2047); }
+void test2() { __builtin_prefetch((int *)1024); }
+void test3(char *x) { __builtin_prefetch(&x); }
+void test4(char *x) { __builtin_prefetch(&x[2]); }
+void test5(char *x) { __builtin_prefetch(&x

[gcc(refs/users/mikael/heads/non_lvalue_v03)] match: Simplify doubled not, negate and conjugate operators to a non_lvalue

2025-06-21 Thread Mikael Morin via Gcc-cvs
https://gcc.gnu.org/g:2eb0523c0d47826c0a2b04b655f6b6bf0a14e0a6

commit 2eb0523c0d47826c0a2b04b655f6b6bf0a14e0a6
Author: Mikael Morin 
Date:   Thu Jul 4 12:59:34 2024 +0200

match: Simplify doubled not, negate and conjugate operators to a non_lvalue

Regression tested on x86_64-linux.  OK for master?

Changes v1 -> v2:
 - Also handle conjugate operator.
 - Don't create the NON_LVALUE_EXPR if there is a type conversion between
   the doubled operators.

-- 8< --

gcc/ChangeLog:

* match.pd (`-(-X)`, `~(~X)`, `conj(conj(X))`): Add a
NON_LVALUE_EXPR wrapper to the simplification of doubled unary
operators NEGATE_EXPR, BIT_NOT_EXPR and CONJ_EXPR.
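To illustrate what the wrapper is for, here is a toy folder with made-up types (nothing here is GCC's tree or match.pd API).  Folding -(-x) to a bare x would turn an rvalue into something assignable; wrapping the result keeps it a non-lvalue, which is the effect the NON_LVALUE_EXPR wrapper aims for in GENERIC.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical toy expression tree (not GCC's tree or match.pd machinery)
// used to show why folding -(-x) to a bare x is wrong in GENERIC:
// the folded result would become an assignable lvalue.
enum class Code { Var, Negate, NonLvalue };

struct Expr {
  Code code;
  std::string name;               // used by Var only
  std::shared_ptr<Expr> operand;  // used by Negate and NonLvalue
};

using ExprPtr = std::shared_ptr<Expr>;

ExprPtr var(const std::string &n) {
  return std::make_shared<Expr>(Expr{Code::Var, n, nullptr});
}
ExprPtr negate(ExprPtr e) {
  return std::make_shared<Expr>(Expr{Code::Negate, "", e});
}
ExprPtr non_lvalue(ExprPtr e) {
  return std::make_shared<Expr>(Expr{Code::NonLvalue, "", e});
}

// Only a bare variable reference is an lvalue in this model.
bool is_lvalue(const ExprPtr &e) { return e->code == Code::Var; }

// Fold -(-X) to X, wrapping the result so it stays a non-lvalue.
ExprPtr fold(const ExprPtr &e) {
  if (e->code == Code::Negate && e->operand->code == Code::Negate)
    return non_lvalue(e->operand->operand);
  return e;
}
```

The real patch additionally restricts the wrapper to GENERIC and to the case where no type conversion sits between the two operators; the toy model has no types, so it omits that check.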

gcc/testsuite/ChangeLog:

* gfortran.dg/non_lvalue_1.f90: New test.

Diff:
---
 gcc/match.pd   | 10 +++---
 gcc/testsuite/gfortran.dg/non_lvalue_1.f90 | 32 ++
 2 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 0f53c162fce3..372d4657baac 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2357,7 +2357,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* ~~x -> x */
 (simplify
   (bit_not (bit_not @0))
-  @0)
+  (non_lvalue @0))
 
 /* zero_one_valued_p will match when a value is known to be either
0 or 1 including constants 0 or 1.
@@ -4037,7 +4037,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (negate (nop_convert? (negate @1)))
   (if (!TYPE_OVERFLOW_SANITIZED (type)
&& !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@1)))
-   (view_convert @1)))
+   (if (GENERIC && type == TREE_TYPE (@1))
+(non_lvalue @1)
+(view_convert @1))))
 
  /* We can't reassociate floating-point unless -fassociative-math
 or fixed-point plus or minus because of saturation to +-Inf.  */
@@ -5767,7 +5769,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (conj (convert? (conj @0)))
  (if (tree_nop_conversion_p (TREE_TYPE (@0), type))
-  (convert @0)))
+  (if (GENERIC && type == TREE_TYPE (@0))
+   (non_lvalue @0)
+   (convert @0))))
 
 /* conj({x,y}) -> {x,-y}  */
 (simplify
diff --git a/gcc/testsuite/gfortran.dg/non_lvalue_1.f90 b/gcc/testsuite/gfortran.dg/non_lvalue_1.f90
new file mode 100644
index ..61dad5a2ce1b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/non_lvalue_1.f90
@@ -0,0 +1,32 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-original" }
+!
+! Check the generation of NON_LVALUE_EXPR expressions in cases where a unary
+! operator expression would simplify to a bare data reference.
+
+! A NON_LVALUE_EXPR is generated for a double negation that would simplify to
+! a bare data reference.
+function f1 (f1_arg1)
+  integer, value :: f1_arg1
+  integer :: f1
+  f1 = -(-f1_arg1)
+end function
+! { dg-final { scan-tree-dump "__result_f1 = NON_LVALUE_EXPR ;" "original" } }
+
+! A NON_LVALUE_EXPR is generated for a double complement that would simplify to
+! a bare data reference.
+function f2 (f2_arg1)
+  integer, value :: f2_arg1
+  integer :: f2
+  f2 = not(not(f2_arg1))
+end function
+! { dg-final { scan-tree-dump "__result_f2 = NON_LVALUE_EXPR ;" "original" } }
+
+! A NON_LVALUE_EXPR is generated for a double complex conjugate that would
+! simplify to a bare data reference.
+function f3 (f3_arg1)
+  complex, value :: f3_arg1
+  complex :: f3
+  f3 = conjg(conjg(f3_arg1))
+end function
+! { dg-final { scan-tree-dump "__result_f3 = NON_LVALUE_EXPR ;" "original" } }


[gcc] Created branch 'mikael/heads/non_lvalue_v03' in namespace 'refs/users'

2025-06-21 Thread Mikael Morin via Gcc-cvs
The branch 'mikael/heads/non_lvalue_v03' was created in namespace 'refs/users' pointing to:

 2eb0523c0d47... match: Simplify doubled not, negate and conjugate operators


[gcc r16-1600] value-range: Use int instead of uint for wi::ctz result [PR120746]

2025-06-21 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:b76779c7cd6b92f167a80f16f11f599eea4dfc67

commit r16-1600-gb76779c7cd6b92f167a80f16f11f599eea4dfc67
Author: Jakub Jelinek 
Date:   Sat Jun 21 16:09:08 2025 +0200

value-range: Use int instead of uint for wi::ctz result [PR120746]

uint is a compatibility typedef in glibc's sys/types.h that is only enabled
in misc/GNU modes, so it doesn't exist on many hosts.
Furthermore, wi::ctz returns int rather than unsigned, and the variable is
only used in a comparison against zero or as the second operand of a left
shift, so I think just using int instead of unsigned is better.
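A rough sketch of that usage pattern follows.  It assumes a 64-bit mask instead of a wide_int and uses a hypothetical `count_trailing_zeros` helper in place of wi::ctz.  The snapping logic is loosely modeled on the idea behind irange::snap and is an assumption, not GCC's implementation; it shows the ctz result only being compared with zero and used as a shift count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for wi::ctz on a 64-bit value: like wi::ctz, it
// returns a plain int.
int count_trailing_zeros(std::uint64_t x) {
  if (x == 0)
    return 64;
  int n = 0;
  while ((x & 1) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Sketch loosely modeled on irange::snap (an assumption, not GCC's code):
// if the low z bits of the mask are clear, and we assume the corresponding
// known value bits are zero, every member is a multiple of 2^z, so round
// lb up and ub down to such multiples.  The int result is only compared
// with zero and used as a shift count, so int is a natural type for it.
bool snap_bounds(std::uint64_t mask, std::uint64_t lb, std::uint64_t ub,
                 std::uint64_t &new_lb, std::uint64_t &new_ub) {
  int z = count_trailing_zeros(mask);
  if (z == 0 || z >= 64)
    return false;
  const std::uint64_t align = std::uint64_t{1} << z;
  new_lb = (lb + align - 1) & ~(align - 1);  // assumes no overflow near UINT64_MAX
  new_ub = ub & ~(align - 1);
  return true;
}
```

For example, a mask with its low 3 bits clear snaps the range [5, 30] to [8, 24].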

2025-06-21  Jakub Jelinek  

PR middle-end/120746
* value-range.cc (irange::snap): Use int type instead of uint.

Diff:
---
 gcc/value-range.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index ce13acc312d2..23a5c66ed5e3 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -2287,7 +2287,7 @@ bool
 irange::snap (const wide_int &lb, const wide_int &ub,
  wide_int &new_lb, wide_int &new_ub)
 {
-  uint z = wi::ctz (m_bitmask.mask ());
+  int z = wi::ctz (m_bitmask.mask ());
   if (z == 0)
 return false;


[gcc r16-1606] Scale up auto-profile counts

2025-06-21 Thread Jan Hubicka via Gcc-cvs
https://gcc.gnu.org/g:dda86c80bca2300a47f91bcfc589951df9c7f1be

commit r16-1606-gdda86c80bca2300a47f91bcfc589951df9c7f1be
Author: Jan Hubicka 
Date:   Sun Jun 22 03:26:36 2025 +0200

Scale up auto-profile counts

This patch makes auto-profile counts scale up when the train run has too
small a maximal count.  This happens even on moderately sized train runs such
as the SPEC2017 ref datasets, and scaling helps avoid rounding errors in
cases where the number of samples is small.
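The scale computed in autofdo_source_profile::read below can be sketched as a standalone function.  `n_bits` stands in for profile_count::n_bits and is passed in rather than hard-coded; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Sketch of the scale computation in autofdo_source_profile::read.
// n_bits stands in for profile_count::n_bits; it is a parameter here
// rather than GCC's actual constant.
std::int64_t afdo_scale_for(std::int64_t sum_max, int n_bits) {
  if (sum_max == 0)
    return 1;  // the patch leaves the scale at its default of 1
  // Aim the largest count at roughly 2^(n_bits/2), leaving the upper bits
  // free in case some counts later grow beyond sum_max.
  const std::int64_t target = std::int64_t{1} << (n_bits / 2);
  return std::max<std::int64_t>(target / sum_max, 1);
}
```

For example, with n_bits = 61 (used here only as an example value) and a maximal count of 1000, the target is 2^30 and the scale is 1073741.  Once sum_max reaches 2^30, the scale stays at 1.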

gcc/ChangeLog:

* auto-profile.cc (autofdo::afdo_count_scale): New.
(autofdo_source_profile::update_inlined_ind_target): Scale
counts.
(autofdo_source_profile::read): Set scale and dump
statistics.
(afdo_indirect_call): Scale.
(afdo_set_bb_count): Scale.
(afdo_find_equiv_class): Fix dumps.
(afdo_annotate_cfg): Scale.

Diff:
---
 gcc/auto-profile.cc | 46 ++
 1 file changed, 38 insertions(+), 8 deletions(-)

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index f4dda8272ed8..202f1083633d 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -353,6 +353,13 @@ static autofdo_source_profile *afdo_source_profile;
 /* gcov_summary structure to store the profile_info.  */
 static gcov_summary *afdo_profile_info;
 
+/* Scaling factor for afdo data.  Compared to normal profiles,
+   AFDO profile counts are much lower, depending on sampling
+   frequency.  We scale data up to reduce effects of roundoff
+   errors.  */
+
+static gcov_type afdo_count_scale = 1;
+
 /* Helper functions.  */
 
 /* Return the original name of NAME: strip the suffix that starts
@@ -650,8 +657,8 @@ function_instance::find_icall_target_map (tree fn, gcall *stmt,
   get_identifier (afdo_string_table->get_name (callee)));
   if (node == NULL)
 continue;
-  (*map)[callee] = iter->second->total_count ();
-  ret += iter->second->total_count ();
+  (*map)[callee] = iter->second->total_count () * afdo_count_scale;
+  ret += iter->second->total_count () * afdo_count_scale;
 }
   return ret;
 }
@@ -826,6 +833,7 @@ autofdo_source_profile::update_inlined_ind_target (gcall *stmt,
   for (icall_target_map::const_iterator iter = old_info.targets.begin ();
iter != old_info.targets.end (); ++iter)
 total += iter->second;
+  total *= afdo_count_scale;
 
   /* Program behavior changed, original promoted (and inlined) target is not
  hot any more. Will avoid promote the original target.
@@ -913,7 +921,7 @@ autofdo_source_profile::get_callsite_total_count (
 != s->name()))
 return 0;
 
-  return s->total_count ();
+  return s->total_count () * afdo_count_scale;
 }
 
 /* Read AutoFDO profile and returns TRUE on success.  */
@@ -972,6 +980,23 @@ autofdo_source_profile::read ()
  map_[fun_id]->merge (s);
}
 }
+  /* Scale up the profile, but leave some bits in case some counts get
+ bigger than sum_max eventually.  */
+  if (afdo_profile_info->sum_max)
+afdo_count_scale
+  = MAX (((gcov_type)1 << (profile_count::n_bits / 2))
+/ afdo_profile_info->sum_max, 1);
+  if (dump_file)
+fprintf (dump_file, "Max count in profile %" PRIu64 "\n"
+   "Setting scale %" PRIu64 "\n"
+   "Scaled max count %" PRIu64 "\n"
+   "Hot count threshold %" PRIu64 "\n",
+(int64_t)afdo_profile_info->sum_max,
+(int64_t)afdo_count_scale,
+(int64_t)(afdo_profile_info->sum_max * afdo_count_scale),
+(int64_t)(afdo_profile_info->sum_max * afdo_count_scale
+  / param_hot_bb_count_fraction));
+  afdo_profile_info->sum_max *= afdo_count_scale;
   g->get_dumps ()->dump_finish (profile_pass_num);
   return true;
 }
@@ -1112,6 +1137,7 @@ afdo_indirect_call (gcall *stmt, const icall_target_map &map,
   if (max_iter == map.end () || max_iter->second < iter->second)
 max_iter = iter;
 }
+  total *= afdo_count_scale;
   struct cgraph_node *direct_call = cgraph_node::get_for_asmname (
   get_identifier (afdo_string_table->get_name (max_iter->first)));
   if (direct_call == NULL)
@@ -1145,7 +1171,7 @@ afdo_indirect_call (gcall *stmt, const icall_target_map &map,
   /* Value */
   hist->hvalue.counters[2] = direct_call->profile_id;
   /* Counter */
-  hist->hvalue.counters[3] = max_iter->second;
+  hist->hvalue.counters[3] = max_iter->second * afdo_count_scale;
 
   if (!direct_call->profile_id)
{
@@ -1313,11 +1339,13 @@ afdo_set_bb_count (basic_block bb)
 
   if (max_count)
 {
-  update_count_by_afdo_count (&bb->count, max_count);
+  update_count_by_afdo_count (&bb->count, max_count * afdo_count_scale);
   if (dump_file)
fprintf (dump_file,
-" Annotated bb %i with count %" PRId64 "\n",
-bb->index, (int6