[gcc r16-1607] Prevent possible overflows in ipa-profile
https://gcc.gnu.org/g:6e38bef16bbfaa7743d1ec8937ed9dfba669136d

commit r16-1607-g6e38bef16bbfaa7743d1ec8937ed9dfba669136d
Author: Jan Hubicka
Date:   Sun Jun 22 03:32:29 2025 +0200

    Prevent possible overflows in ipa-profile

    The bug in scaling the profile of clones produced by fnsplit made some
    afdo counts during gcc bootstrap very large (2^59).  This made
    computations in ipa-profile silently overflow, which caused the hot
    count to be identified as 1 instead of a sane value.

    While fixing the fnsplit bug prevents the overflow, I think the
    histogram code should be made safe too.  sreal is not a good fit here
    since its mantissa is 32 bits, which is not well suited to adding up
    many numbers that may be of very different orders of magnitude.  So I
    use widest_int, although 128-bit arithmetic would be safe (we are
    summing 60-bit counts multiplied by time estimates).  I don't think we
    have a readily available 128-bit type, and the code is not really time
    critical since the histogram is computed once.

    gcc/ChangeLog:

            * ipa-profile.cc (ipa_profile): Use widest_int to avoid
            possible overflows.

Diff:
---
 gcc/ipa-profile.cc | 24
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/gcc/ipa-profile.cc b/gcc/ipa-profile.cc
index 08638667f65c..c8b8529e38b7 100644
--- a/gcc/ipa-profile.cc
+++ b/gcc/ipa-profile.cc
@@ -764,7 +764,8 @@ ipa_profile (void)
   int order_pos;
   bool something_changed = false;
   int i;
-  gcov_type overall_time = 0, cutoff = 0, cumulated = 0, overall_size = 0;
+  widest_int overall_time = 0, cumulated = 0;
+  gcov_type overall_size = 0;
   struct cgraph_node *n,*n2;
   int nindirect = 0, ncommon = 0, nunknown = 0, nuseless = 0, nconverted = 0;
   int nmismatch = 0, nimpossible = 0;
@@ -775,37 +776,44 @@ ipa_profile (void)
     dump_histogram (dump_file, histogram);
   for (i = 0; i < (int)histogram.length (); i++)
     {
-      overall_time += histogram[i]->count * histogram[i]->time;
+      overall_time += ((widest_int)histogram[i]->count) * histogram[i]->time;
+      /* Watch for overflow.  */
+      gcc_assert (overall_time >= 0);
       overall_size += histogram[i]->size;
     }
   threshold = 0;
-  if (overall_time)
+  if (overall_time != 0)
     {
       gcc_assert (overall_size);
 
-      cutoff = (overall_time * param_hot_bb_count_ws_permille + 500) / 1000;
+      widest_int cutoff = ((overall_time * param_hot_bb_count_ws_permille
+                            + 500) / 1000);
+      /* Watch for overflow.  */
+      gcc_assert (cutoff >= 0);
       for (i = 0; cumulated < cutoff; i++)
         {
-          cumulated += histogram[i]->count * histogram[i]->time;
+          cumulated += ((widest_int)histogram[i]->count) * histogram[i]->time;
           threshold = histogram[i]->count;
         }
       if (!threshold)
         threshold = 1;
       if (dump_file)
         {
-          gcov_type cumulated_time = 0, cumulated_size = 0;
+          widest_int cumulated_time = 0;
+          gcov_type cumulated_size = 0;
 
           for (i = 0;
                i < (int)histogram.length () && histogram[i]->count >= threshold;
                i++)
             {
-              cumulated_time += histogram[i]->count * histogram[i]->time;
+              cumulated_time += ((widest_int)histogram[i]->count)
+                                * histogram[i]->time;
               cumulated_size += histogram[i]->size;
             }
           fprintf (dump_file, "Determined min count: %" PRId64
                    " Time:%3.2f%% Size:%3.2f%%\n",
                    (int64_t)threshold,
-                   cumulated_time * 100.0 / overall_time,
+                   (cumulated_time * 1 / overall_time).to_shwi () / 100.0,
                    cumulated_size * 100.0 / overall_size);
         }
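
The overflow described in this commit can be illustrated outside of GCC. The following is a minimal sketch, not GCC code: `u128`, `mul64`, `add128` and `overall_time` are hypothetical names introduced here, and GCC itself uses `widest_int` rather than a hand-rolled 128-bit type. It shows that a single 2^59 count multiplied by a time estimate of 100 no longer fits in 64 bits, while a 128-bit accumulator keeps the full value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit unsigned value (illustration only; GCC uses
// widest_int for this purpose).
struct u128
{
  uint64_t hi;
  uint64_t lo;
};

// Full 64x64 -> 128-bit product, built from 32-bit halves.
u128 mul64 (uint64_t a, uint64_t b)
{
  const uint64_t mask = 0xffffffffu;
  uint64_t a_lo = a & mask, a_hi = a >> 32;
  uint64_t b_lo = b & mask, b_hi = b >> 32;
  uint64_t p0 = a_lo * b_lo;
  uint64_t p1 = a_lo * b_hi;
  uint64_t p2 = a_hi * b_lo;
  uint64_t p3 = a_hi * b_hi;
  // Middle column; cannot overflow 64 bits (at most about 2^34).
  uint64_t mid = (p0 >> 32) + (p1 & mask) + (p2 & mask);
  u128 r;
  r.lo = (mid << 32) | (p0 & mask);
  r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
  return r;
}

// acc += x, carrying from the low word into the high word.
void add128 (u128 &acc, u128 x)
{
  acc.lo += x.lo;
  acc.hi += x.hi + (acc.lo < x.lo ? 1 : 0);
}

// Sum of count * time over a histogram, kept in 128 bits.
u128 overall_time (const uint64_t *counts, const uint64_t *times, int n)
{
  u128 sum = {0, 0};
  for (int i = 0; i < n; i++)
    add128 (sum, mul64 (counts[i], times[i]));
  return sum;
}
```

With a count of 2^59 and a time of 100, the true product is 25 * 2^61, which needs a non-zero high word, whereas the same product truncated to 64 bits is just 2^61.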
[gcc r16-1605] Add GUESSED_GLOBAL0_AFDO
https://gcc.gnu.org/g:eb8ee105706569c9a03f3de9519f6ab8006c3f1e

commit r16-1605-geb8ee105706569c9a03f3de9519f6ab8006c3f1e
Author: Jan Hubicka
Date:   Sun Jun 22 03:12:55 2025 +0200

    Add GUESSED_GLOBAL0_AFDO

    This patch adds the GUESSED_GLOBAL0_AFDO profile quality.  It can be
    used to preserve local counts of functions which have 0 AFDO profile.

    I originally did not include it as it was not clear it would be useful,
    and it turns quality from 3 bits into 4 bits, which means that we need
    to steal another bit from the actual counters.

    It is likely not a problem for profile_count since counting up to 2^60
    still takes a while.  However, with profile_probability I ran into a
    problem: the gimple FE testcases encode probability with the current
    representation, and thus changing profile_probability::n_bits would
    require updating all testcases.

    Since probabilities never use GLOBAL0 qualities (they are always
    local), adding bits is not necessary, but it requires encoding quality
    in the data type, which required adding accessors.

    While working on this I also noticed that GUESSED_GLOBAL0 and
    GUESSED_GLOBAL0_ADJUSTED are misordered.  Qualities should be in
    increasing order.  This is also fixed.

    auto-profile will be updated later.

    Bootstrapped/regtested x86_64-linux, committed.

    Honza

    gcc/ChangeLog:

            * cgraph.cc (cgraph_node::make_profile_global0): Support
            GUESSED_GLOBAL0_AFDO
            * ipa-cp.cc (update_profiling_info): Use GUESSED_GLOBAL0_AFDO.
            * profile-count.cc (profile_probability::dump): Use quality ().
            (profile_probability::stream_in): Use m_adjusted_quality.
            (profile_probability::stream_out): Use m_adjusted_quality.
            (profile_count::combine_with_ipa_count): Use quality ().
            (profile_probability::sqrt): Likewise.
            * profile-count.h (enum profile_quality): Add
            GUESSED_GLOBAL0_AFDO; reorder GUESSED_GLOBAL0_ADJUSTED and
            GUESSED_GLOBAL0.
            (profile_probability): Add min_quality; replace m_quality by
            m_adjusted_quality; add set_quality; update all users of
            quality.
            (profile_count): Set n_bits to 60; make m_quality 4 bits;
            update uses of quality.
            (profile_count::afdo_zero, profile_count::global0afdo): New.

Diff:
---
 gcc/cgraph.cc        |  15 -
 gcc/ipa-cp.cc        |   4 ++
 gcc/profile-count.cc |  22 ---
 gcc/profile-count.h  | 163 ---
 4 files changed, 133 insertions(+), 71 deletions(-)

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index 2441c9e866d5..94a2e6e61058 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -200,7 +200,8 @@ cgraph_node::make_profile_local ()
 }
 
 /* Turn profile to global0.  Walk into inlined functions.
-   QUALITY must be GUESSED_GLOBAL0 or GUESSED_GLOBAL0_ADJUSTED  */
+   QUALITY must be GUESSED_GLOBAL0, GUESSED_GLOBAL0_ADJUSTED
+   or GUESSED_GLOBAL0_AFDO  */
 void
 cgraph_node::make_profile_global0 (profile_quality quality)
 {
@@ -219,6 +220,14 @@ cgraph_node::make_profile_global0 (profile_quality quality)
         return;
       count = count.global0adjusted ();
     }
+  else if (quality == GUESSED_GLOBAL0_AFDO)
+    {
+      if (count.quality () == GUESSED_GLOBAL0
+          || count.quality () == GUESSED_GLOBAL0_ADJUSTED
+          || count.quality () == GUESSED_GLOBAL0_AFDO)
+        return;
+      count = count.global0afdo ();
+    }
   else
     gcc_unreachable ();
   for (cgraph_edge *e = callees; e; e = e->next_callee)
@@ -231,6 +240,8 @@ cgraph_node::make_profile_global0 (profile_quality quality)
         e->count = e->count.global0 ();
       else if (quality == GUESSED_GLOBAL0_ADJUSTED)
         e->count = e->count.global0adjusted ();
+      else if (quality == GUESSED_GLOBAL0_AFDO)
+        e->count = e->count.global0afdo ();
       else
         gcc_unreachable ();
     }
@@ -241,6 +252,8 @@ cgraph_node::make_profile_global0 (profile_quality quality)
         e->count = e->count.global0 ();
       else if (quality == GUESSED_GLOBAL0_ADJUSTED)
         e->count = e->count.global0adjusted ();
+      else if (quality == GUESSED_GLOBAL0_AFDO)
+        e->count = e->count.global0afdo ();
       else
         gcc_unreachable ();
     }
diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 3bf0117612e5..3e073af662a6 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -4773,6 +4773,8 @@ update_profiling_info (struct cgraph_node *orig_node,
          to global0adjusted or to local if we have partial training.  */
       if (opt_for_fn (orig_node->decl, flag_profile_partial_training))
         orig_node->make_profile_local ();
+      if (new_sum.quality () == AFDO)
+        orig_node->make_profile_global0 (GUESSED_GLOBAL0_AFDO);
       else
         orig_node->make_profile_global0 (GUESSED_GLOBAL0_ADJUSTED);
       orig_edges_processed = true;
@@ -480
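
The bit budget discussed in this commit message (a 4-bit quality next to a 60-bit counter inside one 64-bit word) can be sketched as follows. This is only an illustration under stated assumptions: `pack`, `unpack_count` and `unpack_quality` are hypothetical helpers, the quality is treated as a plain number from 0 to 15, and the actual layout of GCC's profile_count is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout for illustration: the low 60 bits hold the count and
// the top 4 bits hold the quality.
constexpr int n_bits = 60;
constexpr uint64_t max_count = (uint64_t (1) << n_bits) - 1;

// Combine a count and a quality into one 64-bit word.
uint64_t pack (uint64_t count, unsigned quality)
{
  assert (count <= max_count && quality < 16);
  return (uint64_t (quality) << n_bits) | count;
}

// Extract the 60-bit count.
uint64_t unpack_count (uint64_t v)
{
  return v & max_count;
}

// Extract the 4-bit quality.
unsigned unpack_quality (uint64_t v)
{
  return unsigned (v >> n_bits);
}
```

Going from 3 to 4 quality bits doubles the number of representable qualities (8 to 16) at the cost of halving the largest representable count (2^61 - 1 down to 2^60 - 1), which is the trade-off the message describes.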
[gcc r16-1608] [PR modula2/120731] error in Strings.Pos causing sigsegv
https://gcc.gnu.org/g:fc276742e0db337c4d13e6c474abafd4796a6b69

commit r16-1608-gfc276742e0db337c4d13e6c474abafd4796a6b69
Author: Gaius Mulley
Date:   Sun Jun 22 04:13:26 2025 +0100

    [PR modula2/120731] error in Strings.Pos causing sigsegv

    This patch corrects the m2log library procedure function Strings.Pos
    which incorrectly sliced the wrong component of the source string.
    The incorrect slice could cause a sigsegv if negative slice indices
    were generated.

    gcc/m2/ChangeLog:

            PR modula2/120731
            * gm2-libs-log/Strings.def (Delete): Rewrite comment.
            * gm2-libs-log/Strings.mod (Pos): Rewrite.
            (PosLower): New procedure function.

    gcc/testsuite/ChangeLog:

            PR modula2/120731
            * gm2/pimlib/logitech/run/pass/teststrings.mod: New test.

    Signed-off-by: Gaius Mulley

Diff:
---
 gcc/m2/gm2-libs-log/Strings.def                    |  4 +-
 gcc/m2/gm2-libs-log/Strings.mod                    | 77 ++
 .../gm2/pimlib/logitech/run/pass/teststrings.mod   | 16 +
 3 files changed, 69 insertions(+), 28 deletions(-)

diff --git a/gcc/m2/gm2-libs-log/Strings.def b/gcc/m2/gm2-libs-log/Strings.def
index aea35f894889..2be4e4236d73 100644
--- a/gcc/m2/gm2-libs-log/Strings.def
+++ b/gcc/m2/gm2-libs-log/Strings.def
@@ -53,7 +53,9 @@ PROCEDURE Delete (VAR str: ARRAY OF CHAR; index: CARDINAL; length: CARDINAL) ;
 
 
 (*
-   Pos - return the first position of, substr, in, str.
+   Pos - return the first position of substr in str.
+         If substr is not found in str then it returns
+         HIGH (str) + 1.
 *)
 
 PROCEDURE Pos (substr, str: ARRAY OF CHAR) : CARDINAL ;
diff --git a/gcc/m2/gm2-libs-log/Strings.mod b/gcc/m2/gm2-libs-log/Strings.mod
index 6046a102a52e..44f47b322505 100644
--- a/gcc/m2/gm2-libs-log/Strings.mod
+++ b/gcc/m2/gm2-libs-log/Strings.mod
@@ -83,39 +83,62 @@ END Delete ;
 
 
 (*
-   Pos - return the first position of, substr, in, str.
+   PosLower - return the first position of substr in str.
 *)
 
-PROCEDURE Pos (substr, str: ARRAY OF CHAR) : CARDINAL ;
+PROCEDURE PosLower (substr, str: ARRAY OF CHAR) : CARDINAL ;
 VAR
-   i, k, l   : INTEGER ;
-   s1, s2, s3: DynamicStrings.String ;
+   i, strLen, substrLen   : INTEGER ;
+   strS, substrS, scratchS: DynamicStrings.String ;
 BEGIN
-   s1 := DynamicStrings.InitString(str) ;
-   s2 := DynamicStrings.InitString(substr) ;
-   k := DynamicStrings.Length(s1) ;
-   l := DynamicStrings.Length(s2) ;
+   strS := DynamicStrings.InitString (str) ;
+   substrS := DynamicStrings.InitString (substr) ;
+   strLen := DynamicStrings.Length (strS) ;
+   substrLen := DynamicStrings.Length (substrS) ;
    i := 0 ;
    REPEAT
-      i := DynamicStrings.Index(s1, DynamicStrings.char(s2, 0), i) ;
-      IF i>=0
+      i := DynamicStrings.Index (strS, DynamicStrings.char (substrS, 0), i) ;
+      IF i < 0
+      THEN
+         (* No match on first character therefore return now.  *)
+         strS := DynamicStrings.KillString (strS) ;
+         substrS := DynamicStrings.KillString (substrS) ;
+         scratchS := DynamicStrings.KillString (scratchS) ;
+         RETURN( HIGH (str) + 1 )
+      ELSIF i + substrLen <= strLen
       THEN
-         s3 := DynamicStrings.Slice(s1, i, l) ;
-         IF DynamicStrings.Equal(s3, s2)
+         scratchS := DynamicStrings.Slice (strS, i, i + substrLen) ;
+         IF DynamicStrings.Equal (scratchS, substrS)
          THEN
-            s1 := DynamicStrings.KillString(s1) ;
-            s2 := DynamicStrings.KillString(s2) ;
-            s3 := DynamicStrings.KillString(s3) ;
+            strS := DynamicStrings.KillString (strS) ;
+            substrS := DynamicStrings.KillString (substrS) ;
+            scratchS := DynamicStrings.KillString (scratchS) ;
             RETURN( i )
          END ;
-         s3 := DynamicStrings.KillString(s3)
+         scratchS := DynamicStrings.KillString (scratchS)
       END ;
-      INC(i)
-   UNTIL i>=k ;
-   s1 := DynamicStrings.KillString(s1) ;
-   s2 := DynamicStrings.KillString(s2) ;
-   s3 := DynamicStrings.KillString(s3) ;
-   RETURN( HIGH(str)+1 )
+      INC (i)
+   UNTIL i >= strLen ;
+   strS := DynamicStrings.KillString (strS) ;
+   substrS := DynamicStrings.KillString (substrS) ;
+   scratchS := DynamicStrings.KillString (scratchS) ;
+   RETURN( HIGH (str) + 1 )
+END PosLower ;
+
+
+(*
+   Pos - return the first position of substr in str.
+         If substr is not found in str then it returns
+         HIGH (str) + 1.
+*)
+
+PROCEDURE Pos (substr, str: ARRAY OF CHAR) : CARDINAL ;
+BEGIN
+   IF Length (substr) <= Length (str)
+   THEN
+      RETURN PosLower (substr, str)
+   END ;
+   RETURN( HIGH (str) + 1 )
 END Pos ;
 
 
@@ -129,11 +152,11 @@ PROCEDURE Copy (str: ARRAY OF CHAR;
 VAR
    s1, s2: DynamicStrings.String ;
 BEGIN
-   s1 := DynamicStrings.InitString(str) ;
-   s2 := DynamicStrings.Slice(s1, index, index+length) ;
-   DynamicStrings.CopyOut(res
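
The contract documented for Pos in this patch (return the first match position, or a value one past the last valid index when there is no match) can be sketched in C++. This is a rough analogue written for illustration, not a translation of the Modula-2 code: `pos` is a hypothetical function, `str.size ()` stands in for `HIGH (str) + 1`, and treating an empty `substr` as not found is an assumption made here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return the first position of substr in str, or str.size () when
// substr does not occur (stand-in for HIGH (str) + 1).
std::size_t pos (const std::string &substr, const std::string &str)
{
  const std::size_t not_found = str.size ();
  // Assumption: an empty or over-long substr is reported as not found.
  if (substr.empty () || substr.size () > str.size ())
    return not_found;
  // Only consider start positions where the whole candidate slice
  // [i, i + substr.size ()) lies inside str.
  for (std::size_t i = 0; i + substr.size () <= str.size (); ++i)
    if (str.compare (i, substr.size (), substr) == 0)
      return i;
  return not_found;
}
```

The loop bound plays the same role as the `i + substrLen <= strLen` test in PosLower: the slice is taken from `i` to `i + substrLen`, never from `i` to `substrLen` as in the old code, so the slice bounds cannot run backwards or past the end.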
[gcc(refs/users/omachota/heads/rtl-ssa-dce)] rtl-ssa-dce: add timevar
https://gcc.gnu.org/g:eb73be9bde7c752273663162786b60cde41328f2

commit eb73be9bde7c752273663162786b60cde41328f2
Author: Ondřej Machota
Date:   Sat Jun 21 14:36:30 2025 +0200

    rtl-ssa-dce: add timevar

Diff:
---
 gcc/dce.cc     | 33 +
 gcc/passes.def |  4 ++--
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/gcc/dce.cc b/gcc/dce.cc
index 77ff0a4d2791..1aa5c6539664 100644
--- a/gcc/dce.cc
+++ b/gcc/dce.cc
@@ -1323,7 +1323,8 @@ make_pass_fast_rtl_dce (gcc::context *ctxt)
   return new pass_fast_rtl_dce (ctxt);
 }
 
-// offset_bitmap is a wrapper around sbitmap which takes care of negative indices
+namespace {
+// offset_bitmap is a wrapper around sbitmap that also handles negative indices from RTL SSA
 struct offset_bitmap
 {
 private:
@@ -1381,7 +1382,7 @@ private:
   void sweep ();
 
   offset_bitmap m_marked;
-  sbitmap mm_marked_phis;
+  sbitmap m_marked_phis;
 };
 
 bool
@@ -1412,7 +1413,7 @@ rtl_ssa_dce::can_delete_call (const_rtx insn)
 bool
 rtl_ssa_dce::is_rtx_prelive (const_rtx insn)
 {
-  gcc_assert (insn != nullptr);
+  gcc_checking_assert (insn != nullptr);
 
   if (CALL_P (insn)
       // We cannot delete pure or const sibling calls because it is
@@ -1481,7 +1482,7 @@ rtl_ssa_dce::is_prelive (insn_info *insn)
       if (def->is_mem ())
         return true;
 
-      gcc_assert (def->is_reg ());
+      gcc_checking_assert (def->is_reg ());
 
       // We should not delete the frame pointer because of the dward2frame pass
       if (frame_pointer_needed && def->regno () == HARD_FRAME_POINTER_REGNUM)
@@ -1491,7 +1492,7 @@ rtl_ssa_dce::is_prelive (insn_info *insn)
       if (def->kind () == access_kind::CLOBBER)
         continue;
 
-      gcc_assert (def->kind () == access_kind::SET);
+      gcc_checking_assert (def->kind () == access_kind::SET);
 
       // Needed by gcc.c-torture/execute/pr51447.c
       if (HARD_REGISTER_NUM_P (def->regno ()) && global_regs[def->regno ()])
@@ -1572,10 +1573,10 @@ rtl_ssa_dce::mark ()
           phi_info *phi = static_cast<phi_info *> (set);
           auto phi_uid = phi->uid ();
           // Skip already visited phi node.
-          if (bitmap_bit_p(mm_marked_phis, phi_uid))
+          if (bitmap_bit_p(m_marked_phis, phi_uid))
             continue;
 
-          bitmap_set_bit (mm_marked_phis, phi_uid);
+          bitmap_set_bit (m_marked_phis, phi_uid);
           uses = phi->inputs ();
         }
 
@@ -1629,7 +1630,7 @@ rtl_ssa_dce::reset_dead_debug ()
       bool is_parent_marked = false;
       if (parent_insn->is_phi ()) {
         auto phi = static_cast (def);
-        is_parent_marked = bitmap_bit_p(mm_marked_phis, phi->uid ());
+        is_parent_marked = bitmap_bit_p(m_marked_phis, phi->uid ());
       } else {
         is_parent_marked = m_marked.get_bit(parent_insn->uid ());
       }
@@ -1654,7 +1655,8 @@ rtl_ssa_dce::sweep ()
   // Previously created debug instructions won't be visited here
   for (insn_info *insn : crtl->ssa->nondebug_insns ())
     {
-      // Artificial or marked insns should not be deleted.
+      // Artificial (bb_head, bb_end, phi), marked or debug instructions
+      // should not be deleted.
       if (insn->is_artificial () || m_marked.get_bit (insn->uid ()))
         continue;
 
@@ -1679,6 +1681,7 @@ rtl_ssa_dce::sweep ()
 unsigned int
 rtl_ssa_dce::execute (function *fn)
 {
+  auto_timevar timer(TV_RTL_SSA_DCE);
   calculate_dominance_info (CDI_DOMINATORS);
   crtl->ssa = new rtl_ssa::function_info (fn);
   int real_max = 0, artificial_min = 0;
@@ -1695,10 +1698,10 @@ rtl_ssa_dce::execute (function *fn)
   unsigned int phi_node_max = 0;
   for (ebb_info *ebb : crtl->ssa->ebbs ())
     for (phi_info *phi : ebb->phis ())
-        phi_node_max = std::max (phi_node_max, phi->uid ());
+      phi_node_max = std::max (phi_node_max, phi->uid ());
 
-  mm_marked_phis = sbitmap_alloc(phi_node_max + 1);
-  bitmap_clear(mm_marked_phis);
+  m_marked_phis = sbitmap_alloc(phi_node_max + 1);
+  bitmap_clear(m_marked_phis);
 
   mark ();
   if (MAY_HAVE_DEBUG_BIND_INSNS)
@@ -1709,19 +1712,17 @@ rtl_ssa_dce::execute (function *fn)
   if (crtl->ssa->perform_pending_updates ())
     cleanup_cfg (0);
 
-  sbitmap_free(mm_marked_phis);
+  sbitmap_free(m_marked_phis);
   delete crtl->ssa;
   crtl->ssa = nullptr;
 
   return 0;
 }
 
-namespace {
-
 const pass_data pass_data_rtl_ssa_dce = {
   RTL_PASS, /* type */
   "rtl_ssa_dce", /* name */
   OPTGROUP_NONE, /* optinfo_flags */
-  TV_DCE,/* tv_id */
+  TV_RTL_SSA_DCE,/* tv_id */
   0, /* properties_required */
   0, /* properties_provided */
   0, /* properties_destroyed */
diff --git a/gcc/passes.def b/gcc/passes.def
index a4cd1eb741b2..9b3b880b4cc9 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -489,7 +489,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_inc_dec);
       NEXT_PASS (pass_initialize_regs);
       NEXT_PASS (pass_rtl_ssa_dce);
-      // __NEXT_PAS (pass_ud_rtl_dce);
+      // NEXT_PASS (
[gcc r16-1602] [modula2] Comment tidyup in gm2-compiler/M2GCCDeclare.mod
https://gcc.gnu.org/g:7a7cc65b8987b9b05fb8fb75824e2000861e6c30

commit r16-1602-g7a7cc65b8987b9b05fb8fb75824e2000861e6c30
Author: Gaius Mulley
Date:   Sat Jun 21 17:18:04 2025 +0100

    [modula2] Comment tidyup in gm2-compiler/M2GCCDeclare.mod

    This patch reformats three comments in the GNU GCC style.

    gcc/m2/ChangeLog:

            * gm2-compiler/M2GCCDeclare.mod (StartDeclareModuleScopeSeparate):
            Reformat statement comments.
            (StartDeclareModuleScopeWholeProgram): Ditto.

    Signed-off-by: Gaius Mulley

Diff:
---
 gcc/m2/gm2-compiler/M2GCCDeclare.mod | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/gcc/m2/gm2-compiler/M2GCCDeclare.mod b/gcc/m2/gm2-compiler/M2GCCDeclare.mod
index be80695d3e8a..860a89ac8b0c 100644
--- a/gcc/m2/gm2-compiler/M2GCCDeclare.mod
+++ b/gcc/m2/gm2-compiler/M2GCCDeclare.mod
@@ -2927,14 +2927,12 @@ BEGIN
       DeclareTypesConstantsProcedures(scope) ; (* will resolved TYPEs and CONSTs on the ToDo *)
                                                (* lists. *)
       ForeachModuleDo(DeclareProcedure) ;
-      (*
-         now that all types have been resolved it is safe to declare
-         variables
-      *)
+      (* Now that all types have been resolved it is safe to declare
+         variables.  *)
       AssertAllTypesDeclared(scope) ;
       DeclareGlobalVariables(scope) ;
       ForeachImportedDo(scope, DeclareImportedVariables) ;
-      (* now it is safe to declare all procedures *)
+      (* Now it is safe to declare all procedures.  *)
       ForeachProcedureDo(scope, DeclareProcedure) ;
       ForeachInnerModuleDo(scope, WalkTypesInModule) ;
       ForeachInnerModuleDo(scope, DeclareTypesConstantsProcedures) ;
@@ -2963,14 +2961,12 @@ BEGIN
                                                (* lists. *)
       ForeachModuleDo(DeclareProcedure) ;
       ForeachModuleDo(DeclareModuleInit) ;
-      (*
-         now that all types have been resolved it is safe to declare
-         variables
-      *)
+      (* Now that all types have been resolved it is safe to declare
+         variables.  *)
       AssertAllTypesDeclared(scope) ;
       DeclareGlobalVariablesWholeProgram(scope) ;
       ForeachImportedDo(scope, DeclareImportedVariablesWholeProgram) ;
-      (* now it is safe to declare all procedures *)
+      (* Now it is safe to declare all procedures.  *)
       ForeachProcedureDo(scope, DeclareProcedure) ;
       ForeachInnerModuleDo(scope, WalkTypesInModule) ;
       ForeachInnerModuleDo(scope, DeclareTypesConstantsProcedures) ;
[gcc r16-1603] Fix profile after fnsplit
https://gcc.gnu.org/g:cd589516b12e28ee30aefc4c51500f634f1b888e

commit r16-1603-gcd589516b12e28ee30aefc4c51500f634f1b888e
Author: Jan Hubicka
Date:   Sat Jun 21 22:29:50 2025 +0200

    Fix profile after fnsplit

    When splitting functions, tree-inline correctly determined the entry
    count of the new function part, but when the entry block of the new
    function part is in a loop it then scaled the body, which is not
    supposed to happen.

            * tree-inline.cc (copy_cfg_body): Fix profile of split functions.

Diff:
---
 gcc/tree-inline.cc | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
index b25705173433..dee2dfc26206 100644
--- a/gcc/tree-inline.cc
+++ b/gcc/tree-inline.cc
@@ -3084,7 +3084,7 @@ copy_cfg_body (copy_body_data * id,
   /* Register specific tree functions.  */
   gimple_register_cfg_hooks ();
 
-  /* If we are inlining just region of the function, make sure to connect
+  /* If we are offlining region of the function, make sure to connect
      new entry to ENTRY_BLOCK_PTR_FOR_FN (cfun).  Since new entry can be
      part of loop, we must compute frequency and probability of
      ENTRY_BLOCK_PTR_FOR_FN (cfun) based on the frequencies and
@@ -3093,12 +3093,14 @@ copy_cfg_body (copy_body_data * id,
     {
       edge e;
       edge_iterator ei;
-      den = profile_count::zero ();
+      ENTRY_BLOCK_PTR_FOR_FN (cfun)->count = profile_count::zero ();
 
       FOR_EACH_EDGE (e, ei, new_entry->preds)
         if (!e->src->aux)
-          den += e->count ();
-      ENTRY_BLOCK_PTR_FOR_FN (cfun)->count = den;
+          ENTRY_BLOCK_PTR_FOR_FN (cfun)->count += e->count ();
+      /* Do not scale - the profile of offlined region should
+         remain unchanged.  */
+      num = den = profile_count::one ();
     }
 
   profile_count::adjust_for_ipa_scaling (&num, &den);
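
The entry-count computation in this hunk can be sketched in isolation. The following is a minimal illustration, not GCC code: `pred_edge` and `region_entry_count` are hypothetical names, plain integers stand in for `profile_count`, and the `src_in_region` flag stands in for the `e->src->aux` test in the diff. The idea is that only edges entering the offlined region from outside contribute to the count of the new function's entry, while edges from inside the region (for example a loop back edge) do not.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical predecessor edge of the new entry block.
struct pred_edge
{
  uint64_t count;      // execution count of the edge
  bool src_in_region;  // true if the source block is itself part of the region
};

// Entry count of the offlined region: the sum over predecessor edges
// whose source lies outside the region.
uint64_t region_entry_count (const std::vector<pred_edge> &preds)
{
  uint64_t entry = 0;
  for (const pred_edge &e : preds)
    if (!e.src_in_region)
      entry += e.count;
  return entry;
}
```

For a new entry block reached 10 times from outside and 90 times through a back edge inside the region, the entry count is 10 even though the block itself runs 100 times. Before the fix the body was rescaled on that basis; after it, the body counts are left as they are (scale 1/1).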
[gcc(refs/users/aoliva/heads/testme)] [lra] propagate fp2sp elimination offset after disabling it [PR120424]
https://gcc.gnu.org/g:f5b3ed79efe40bae658954edd2e94f1a7dd29ec1

commit f5b3ed79efe40bae658954edd2e94f1a7dd29ec1
Author: Alexandre Oliva
Date:   Sat Jun 21 05:58:22 2025 -0300

    [lra] propagate fp2sp elimination offset after disabling it [PR120424]

    Deactivating the fp2sp elimination early causes __do_global_ctors_aux
    on arm-linux-gnueabihf to be miscompiled (with -fnon-call-exceptions
    -fstack-clash-protection) because, after fp2sp is disabled, we'd
    choose another elimination and use its -1 prev_offset.

    But keeping it active causes e.g. pr103973-18.c to be miscompiled on
    x86_64-linux-gnu (with ix86_frame_pointer_required modified to return
    true when ix86_get_frame_size () > 0, to exercise the fp2sp
    elimination deactivation), because two adjacent pushes of two adjacent
    temporaries end up pushing the same slot, because one of them gets
    offset by the sp variation.

    The solution is to keep the early disabling of the elimination, which
    fixes x86_64, but retain the code to propagate the offset to the next
    FP elimination, which fixes arm.

    (I was concerned that disabling the fp2sp elimination too soon could
    drop offsets already applied to fp, that would have to be reverted or
    passed on to the next fp elimination, but we know the elimination was
    not used, and if there were any insns or offsets to propagate, the
    previous round of lra_eliminate, in which update_reg_eliminations
    would have selected the fp2sp elimination, would also have set
    elimination_fp2sp_occured if any insn used the fp2sp elimination's
    offset.  Since we know there weren't any, because
    elimination_fp2sp_occured is clear, we know other fp eliminations can
    safely use their initial offsets if needed.)

Diff:
---
 gcc/lra-eliminations.cc | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index f8a761e6eeaa..400857587b3a 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -312,6 +312,11 @@ move_plus_up (rtx x)
 /* Flag that we already did frame pointer to stack pointer elimination.  */
 static bool elimination_fp2sp_occured_p = false;
 
+/* Hold the fp2sp elimination that was disabled and inactivated, but
+   whose offsets we wish to propagate to any subsequently-chosen fp
+   elimination.  */
+static class lra_elim_table *disabled_fp2sp_elimination;
+
 /* Scan X and replace any eliminable registers (such as fp) with a
    replacement (such as sp) if SUBST_P, plus an offset.  The offset is
    a change in the offset between the eliminable register and its
@@ -1185,7 +1190,9 @@ update_reg_eliminate (bitmap insns_with_changed_offsets)
           setup_can_eliminate (ep, false);
           continue;
         }
-      if (!ep->can_eliminate && elimination_map[ep->from] == ep)
+      if (!ep->can_eliminate
+          && (elimination_map[ep->from] == ep
+              || disabled_fp2sp_elimination == ep))
         {
           /* We cannot use this elimination anymore -- find another
              one.  */
@@ -1233,6 +1240,7 @@ update_reg_eliminate (bitmap insns_with_changed_offsets)
       INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, ep->offset);
     }
 
+  disabled_fp2sp_elimination = NULL;
   setup_elimination_map ();
   result = false;
   CLEAR_HARD_REG_SET (temp_hard_reg_set);
@@ -1319,6 +1327,7 @@ init_elimination (void)
   rtx_insn *insn;
   class lra_elim_table *ep;
 
+  disabled_fp2sp_elimination = NULL;
   init_elim_table ();
   FOR_EACH_BB_FN (bb, cfun)
     {
@@ -1432,7 +1441,13 @@ lra_update_fp2sp_elimination (void)
      during update_reg_eliminate.  */
   ep = elimination_map[FRAME_POINTER_REGNUM];
   if (ep->to == STACK_POINTER_REGNUM)
-    setup_can_eliminate (ep, false);
+    {
+      /* Avoid using this known-unused elimination when removing
+         spilled pseudos.  */
+      elimination_map[FRAME_POINTER_REGNUM] = NULL;
+      disabled_fp2sp_elimination = ep;
+      setup_can_eliminate (ep, false);
+    }
   else
     for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
       if (ep->from == FRAME_POINTER_REGNUM && ep->to == STACK_POINTER_REGNUM)
[gcc(refs/users/aoliva/heads/testme)] [lra] simplify disabling of fp2sp elimination [PR120424]
https://gcc.gnu.org/g:70563b8f84af32bbe52e5759ed36b5bca3180e43

commit 70563b8f84af32bbe52e5759ed36b5bca3180e43
Author: Alexandre Oliva
Date:   Thu Jun 19 11:05:36 2025 -0300

    [lra] simplify disabling of fp2sp elimination [PR120424]

    Whether with or without the lra fp2sp elimination accumulated
    improvements, building a native arm-linux-gnueabihf toolchain with
    {BOOT_CFLAGS,TFLAGS}='-O2 -g -fnon-call-exceptions
    -fstack-clash-protection' doesn't get very far: crtbegin.o gets
    miscompiled in __do_global_dtors_aux, as spilled pseudos get assigned
    stack slots that get incorrectly adjusted after the fp2sp elimination
    is disabled.

    AFAICT eliminations are reversible before the final round, and ISTM
    that deferring spilling of registers in disabled eliminations to
    update_reg_eliminate would avoid the incorrect adjustments I saw in
    spilling within or after lra_update_fp2sp_elimination: the logic to
    deal with them was already there, we just have to set the stage for it
    to do its job.

Diff:
---
 gcc/lra-eliminations.cc | 31 +--
 gcc/lra-int.h           |  2 +-
 gcc/lra-spills.cc       | 13 +++--
 3 files changed, 17 insertions(+), 29 deletions(-)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 6663d1c37e8b..f8a761e6eeaa 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -1405,30 +1405,18 @@ process_insn_for_elimination (rtx_insn *insn, bool final_p, bool first_p)
    permitted frame pointer elimination and now target reports that we can not
    do this elimination anymore.  Record spilled pseudos in SPILLED_PSEUDOS
    unless it is null, and return the recorded pseudos number.  */
-int
-lra_update_fp2sp_elimination (int *spilled_pseudos)
+void
+lra_update_fp2sp_elimination (void)
 {
-  int n;
-  HARD_REG_SET set;
   class lra_elim_table *ep;
 
   if (frame_pointer_needed || !targetm.frame_pointer_required ())
-    return 0;
+    return;
   gcc_assert (!elimination_fp2sp_occured_p);
-  ep = elimination_map[FRAME_POINTER_REGNUM];
-  if (ep->to == STACK_POINTER_REGNUM)
-    {
-      elimination_map[FRAME_POINTER_REGNUM] = NULL;
-      setup_can_eliminate (ep, false);
-    }
-  else
-    ep = NULL;
   if (lra_dump_file != NULL)
     fprintf (lra_dump_file,
              " Frame pointer can not be eliminated anymore\n");
   frame_pointer_needed = true;
-  CLEAR_HARD_REG_SET (set);
-  add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM);
   /* If !lra_reg_spill_p, we likely have incomplete range information
      for pseudos assigned to the frame pointer that will have to be
      spilled, and so we may end up incorrectly sharing them unless we
@@ -1437,12 +1425,19 @@ lra_update_fp2sp_elimination (int *spilled_pseudos)
   /* If lives ranges changed, update the aggregate live ranges in
      slots as well before spilling any further pseudos.  */
   lra_recompute_slots_live_ranges ();
-  n = spill_pseudos (set, spilled_pseudos);
-  if (!ep)
+  /* Conceivably, we wouldn't need to disable the fp2sp elimination
+     unless the target says so, but we have to vacate the frame
+     pointer register to make room for the frame pointer proper, and
+     disabling the elimination will spill everything assigned to it
+     during update_reg_eliminate.  */
+  ep = elimination_map[FRAME_POINTER_REGNUM];
+  if (ep->to == STACK_POINTER_REGNUM)
+    setup_can_eliminate (ep, false);
+  else
     for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
       if (ep->from == FRAME_POINTER_REGNUM && ep->to == STACK_POINTER_REGNUM)
         setup_can_eliminate (ep, false);
-  return n;
+  return;
 }
 
 /* Return true if we have a pseudo assigned to hard frame pointer.  */
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 0cf7266ce646..05d28af7c9e0 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -433,7 +433,7 @@ extern int lra_get_elimination_hard_regno (int);
 extern rtx lra_eliminate_regs_1 (rtx_insn *, rtx, machine_mode,
                                  bool, bool, poly_int64, bool);
 extern void eliminate_regs_in_insn (rtx_insn *insn, bool, bool, poly_int64);
-extern int lra_update_fp2sp_elimination (int *spilled_pseudos);
+extern void lra_update_fp2sp_elimination (void);
 extern bool lra_fp_pseudo_p (void);
 extern void lra_eliminate (bool, bool);
 
diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index 7603d0dcf163..494ef3c96c8e 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -658,7 +658,7 @@ lra_need_for_spills_p (void)
 void
 lra_spill (void)
 {
-  int i, n, n2, curr_regno;
+  int i, n, curr_regno;
   int *pseudo_regnos;
 
   regs_num = max_reg_num ();
@@ -685,15 +685,8 @@ lra_spill (void)
   for (i = 0; i < n; i++)
     if (pseudo_slots[pseudo_regnos[i]].mem == NULL_RTX)
       assign_mem_slot (pseudo_regnos[i]);
-  if ((n2 = lra_update_fp2sp_elimination (pseudo_regnos)) > 0)
-    {
-      /* Assign stack slots to spilled pseudos assigned to fp.  */
-
[gcc/aoliva/heads/testme] (4 commits) [lra] simplify disabling of fp2sp elimination [PR120424]
The branch 'aoliva/heads/testme' was updated to point to:

 70563b8f84af... [lra] simplify disabling of fp2sp elimination [PR120424]

It previously pointed to:

 95913b192448... [lra] simplify disabling of fp2sp elimination [PR120424]

Diff:

!!! WARNING: THE FOLLOWING COMMITS ARE NO LONGER ACCESSIBLE (LOST):
---

  95913b1... [lra] simplify disabling of fp2sp elimination [PR120424]
  6eb0e6b... [lra] recompute ranges upon disabling fp2sp elimination [PR
  329376c... [genoutput] mark scratch outputs as eliminable [PR120424]
  1498558... [lra] inactivate disabled fp2sp elimination [PR120424]

Summary of changes (added commits):
---

  70563b8... [lra] simplify disabling of fp2sp elimination [PR120424]
  ed2c07a... [lra] recompute ranges upon disabling fp2sp elimination [PR
  89620fe... [genoutput] mark scratch outputs as eliminable [PR120424]
  440d462... [lra] inactivate disabled fp2sp elimination [PR120424]
[gcc(refs/users/aoliva/heads/testme)] [genoutput] mark scratch outputs as eliminable [PR120424]
https://gcc.gnu.org/g:89620fe7db2e9d3360a18a402590ca0c6cfe18e1

commit 89620fe7db2e9d3360a18a402590ca0c6cfe18e1
Author: Alexandre Oliva
Date:   Wed Jun 18 04:13:19 2025 -0300

    [genoutput] mark scratch outputs as eliminable [PR120424]

    acats' fdd2a00.read is miscompiled on arm-linux-gnu with -O2
    -fstack-clash-protection -march=armv7-a -marm: a clobbered scratch
    register in a *iorsi3_compare0_scratch pattern gets initially assigned
    to the frame pointer register, but at some point during lra the frame
    size grows to nonzero, arm_frame_pointer_required flips to true, and
    the fp2sp elimination has to be disabled, so the scratch register gets
    spilled to a stack slot.

    It needs to get the sfp elimination at that point, because later
    rounds of elimination will assume the previous round's offset has
    already been applied.  But since scratch matches are not regarded as
    eliminable by genoutput, we don't attempt elimination in the clobbered
    stack slot MEM rtx.

    Later on, lra issues a reload for that slot, using a new pseudo
    allocated to a hardware register, that gets stored in the stack slot
    after the original insn.  Elimination in that reload store insn
    eventually updates the elimination offset, but it's an incremental
    update, assuming that the offset so far has already been applied.

    Without applying the initial offset, the store ends up overlapping
    with the function's register save area, corrupting a caller's
    call-saved register.

    AFAICT the old reload's elimination wouldn't be harmed by allowing
    elimination in scratch operands, so I'm enabling eliminable for them
    regardless.  Should it be found to make a difference, we could
    presumably set a different bit in eliminable to enable reload and lra
    to tell them apart and behave accordingly.

    for  gcc/ChangeLog

            PR rtl-optimization/120424
            * genoutput.cc (scan_operands): Make MATCH_SCRATCHes eliminable.

Diff:
---
 gcc/genoutput.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/genoutput.cc b/gcc/genoutput.cc
index dd4e7b80c2a9..25d0b8b86467 100644
--- a/gcc/genoutput.cc
+++ b/gcc/genoutput.cc
@@ -478,7 +478,7 @@ scan_operands (class data *d, rtx part, int this_address_p,
       d->operand[opno].n_alternatives
         = n_occurrences (',', d->operand[opno].constraint) + 1;
       d->operand[opno].address_p = 0;
-      d->operand[opno].eliminable = 0;
+      d->operand[opno].eliminable = 1;
       return;
 
     case MATCH_OPERATOR:
[gcc(refs/users/aoliva/heads/testme)] [lra] inactivate disabled fp2sp elimination [PR120424]
https://gcc.gnu.org/g:440d462d321c0e7026a6eb8eb550b1f444eeb932 commit 440d462d321c0e7026a6eb8eb550b1f444eeb932 Author: Alexandre Oliva Date: Fri Jun 6 02:03:31 2025 -0300 [lra] inactivate disabled fp2sp elimination [PR120424] Even after we disable the fp2sp elimination when it is the active elimination for the fp, spilling might use it before update_reg_eliminate runs and inactivates it for good. If it is used, update_reg_eliminate will fail the check that fp2sp was not used. Since we keep track of uses of this specific elimination, and lra_update_fp2sp_elimination checks it before disabling it, we know it hasn't been used, so we can inactivate it without any ill effects. This fixes the pr118591-1.c avr-none regression exposed by the PR120424 fix. for gcc/ChangeLog PR rtl-optimization/120424 * lra-eliminations.cc (lra_update_fp2sp_elimination): Inactivate the unused fp2sp elimination right away. Diff: --- gcc/lra-eliminations.cc | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc index bb708b007a4e..ff05501dbbbd 100644 --- a/gcc/lra-eliminations.cc +++ b/gcc/lra-eliminations.cc @@ -1415,6 +1415,14 @@ lra_update_fp2sp_elimination (int *spilled_pseudos) if (frame_pointer_needed || !targetm.frame_pointer_required ()) return 0; gcc_assert (!elimination_fp2sp_occured_p); + ep = elimination_map[FRAME_POINTER_REGNUM]; + if (ep->to == STACK_POINTER_REGNUM) +{ + elimination_map[FRAME_POINTER_REGNUM] = NULL; + setup_can_eliminate (ep, false); +} + else +ep = NULL; if (lra_dump_file != NULL) fprintf (lra_dump_file, " Frame pointer can not be eliminated anymore\n"); @@ -1422,9 +1430,10 @@ lra_update_fp2sp_elimination (int *spilled_pseudos) CLEAR_HARD_REG_SET (set); add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM); n = spill_pseudos (set, spilled_pseudos); - for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++) -if (ep->from == FRAME_POINTER_REGNUM && ep->to == STACK_POINTER_REGNUM) 
- setup_can_eliminate (ep, false); + if (!ep) +for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++) + if (ep->from == FRAME_POINTER_REGNUM && ep->to == STACK_POINTER_REGNUM) + setup_can_eliminate (ep, false); return n; }
[gcc r16-1601] [RISC-V][PR target/118241] Fix data prefetch predicate/constraint for RISC-V
https://gcc.gnu.org/g:49199bb29628365fc6c60bd185808a1bad65086d commit r16-1601-g49199bb29628365fc6c60bd185808a1bad65086d Author: Jeff Law Date: Sat Jun 21 08:24:58 2025 -0600 [RISC-V][PR target/118241] Fix data prefetch predicate/constraint for RISC-V The RISC-V prefetch support is broken in a few ways. This addresses the data side prefetch problems. I'd mistakenly thought this BZ was prefetch.i related (which has deeper problems). The basic problem is we were accepting any valid address when in fact there are restrictions. This patch more precisely defines the predicate such that we allow REG and REG+D, where D must have the low 5 bits clear. Note that absolute addresses fall into the REG+D form using x0 for the register operand since it always has the value zero. The test verifies REG, REG+D, ABS addressing modes that are valid as well as REG+D and ABS which must be reloaded into a REG because the displacement has low bits set. An earlier version of this patch has gone through testing in my tester on rv32 and rv64. Obviously I'll wait for pre-commit CI to do its thing before moving forward. This is a good backport candidate after simmering on the trunk for a bit. PR target/118241 gcc/ * config/riscv/predicates.md (prefetch_operand): New predicate. * config/riscv/constraints.md (Q): New constraint. * config/riscv/riscv.md (prefetch): Use new predicate and constraint. (riscv_prefetchi_): Similarly. gcc/testsuite/ * gcc.target/riscv/pr118241.c: New test. Diff: --- gcc/config/riscv/constraints.md | 4 gcc/config/riscv/predicates.md| 12 gcc/config/riscv/riscv.md | 4 ++-- gcc/testsuite/gcc.target/riscv/pr118241.c | 16 4 files changed, 34 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md index 58355cf03f2f..ccab1a2e29df 100644 --- a/gcc/config/riscv/constraints.md +++ b/gcc/config/riscv/constraints.md @@ -325,3 +325,7 @@ "A 2-bit unsigned immediate." 
(and (match_code "const_int") (match_test "IN_RANGE (ival, 0, 3)"))) + +(define_constraint "Q" + "An address operand that is valid for a prefetch instruction" + (match_operand 0 "prefetch_operand")) diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md index 23690792b32e..8072d67fbd97 100644 --- a/gcc/config/riscv/predicates.md +++ b/gcc/config/riscv/predicates.md @@ -27,6 +27,18 @@ (ior (match_operand 0 "const_arith_operand") (match_operand 0 "register_operand"))) +;; REG or REG+D where D fits in a simm12 and has the low 4 bits +;; off. The REG+D form can be reloaded into a temporary if needed +;; after FP elimination if that exposes an invalid offset. +(define_predicate "prefetch_operand" + (ior (match_operand 0 "register_operand") + (and (match_test "const_arith_operand (op, VOIDmode)") + (match_test "(INTVAL (op) & 0xf) == 0")) + (and (match_code "plus") + (match_test "register_operand (XEXP (op, 0), word_mode)") + (match_test "const_arith_operand (XEXP (op, 1), VOIDmode)") + (match_test "(INTVAL (XEXP (op, 1)) & 0xf) == 0")))) + (define_predicate "lui_operand" (and (match_code "const_int") (match_test "LUI_OPERAND (INTVAL (op))"))) diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index 3aed25c25880..3406b50518ed 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -4402,7 +4402,7 @@ ) (define_insn "prefetch" - [(prefetch (match_operand 0 "address_operand" "r") + [(prefetch (match_operand 0 "prefetch_operand" "Q") (match_operand 1 "imm5_operand" "i") (match_operand 2 "const_int_operand" "n"))] "TARGET_ZICBOP" @@ -4422,7 +4422,7 @@ (const_string "4")))]) (define_insn "riscv_prefetchi_" - [(unspec_volatile:X [(match_operand:X 0 "address_operand" "r") + [(unspec_volatile:X [(match_operand:X 0 "prefetch_operand" "Q") (match_operand:X 1 "imm5_operand" "i")] UNSPECV_PREI)] "TARGET_ZICBOP" diff --git a/gcc/testsuite/gcc.target/riscv/pr118241.c b/gcc/testsuite/gcc.target/riscv/pr118241.c new file mode 100644 
index ..f1dc44bce0ca --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr118241.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gc_zicbop" { target { rv64 } } } */ +/* { dg-options "-march=rv32gc_zicbop" { target { rv32 } } } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +void test1() { __builtin_prefetch((int *)2047); } +void test2() { __builtin_prefetch((int *)1024); } +void test3(char *x) { __builtin_prefetch(&x); } +void test4(char *x) { __builtin_prefetch(&x[2]); } +void test5(char *x) { __builtin_prefetch(&x
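For illustration only (not part of the commit): the displacement rule described in the message above can be sketched in C++ roughly as follows. The helper name prefetch_offset_ok and its low_mask parameter are hypothetical. The commit message speaks of the low 5 bits being clear (mask 0x1f), while the predicate in the diff tests 0xf, so the mask is left as a parameter here rather than fixed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not GCC code: true if displacement D fits in a
// signed 12-bit immediate and has every bit selected by LOW_MASK clear.
constexpr bool prefetch_offset_ok (int64_t d, int64_t low_mask)
{
  return d >= -2048 && d <= 2047 && (d & low_mask) == 0;
}
```

Under this sketch the absolute address 1024 from test2 is accepted with either mask, while 2047 from test1 is rejected and would have to be reloaded into a register first, as the commit message describes.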
[gcc(refs/users/mikael/heads/non_lvalue_v03)] match: Simplify doubled not, negate and conjugate operators to a non_lvalue
https://gcc.gnu.org/g:2eb0523c0d47826c0a2b04b655f6b6bf0a14e0a6 commit 2eb0523c0d47826c0a2b04b655f6b6bf0a14e0a6 Author: Mikael Morin Date: Thu Jul 4 12:59:34 2024 +0200 match: Simplify doubled not, negate and conjugate operators to a non_lvalue Regression tested on x86_64-linux. OK for master? Changes v1 -> v2: - Also handle conjugate operator. - Don't create the NON_LVALUE_EXPR if there is a type conversion between the doubled operators. -- 8< -- gcc/ChangeLog: * match.pd (`-(-X)`, `~(~X)`, `conj(conj(X))`): Add a NON_LVALUE_EXPR wrapper to the simplification of doubled unary operators NEGATE_EXPR, BIT_NOT_EXPR and CONJ_EXPR. gcc/testsuite/ChangeLog: * gfortran.dg/non_lvalue_1.f90: New test. Diff: --- gcc/match.pd | 10 +++--- gcc/testsuite/gfortran.dg/non_lvalue_1.f90 | 32 ++ 2 files changed, 39 insertions(+), 3 deletions(-) diff --git a/gcc/match.pd b/gcc/match.pd index 0f53c162fce3..372d4657baac 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -2357,7 +2357,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) /* ~~x -> x */ (simplify (bit_not (bit_not @0)) - @0) + (non_lvalue @0)) /* zero_one_valued_p will match when a value is known to be either 0 or 1 including constants 0 or 1. @@ -4037,7 +4037,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (negate (nop_convert? (negate @1))) (if (!TYPE_OVERFLOW_SANITIZED (type) && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@1))) - (view_convert @1))) + (if (GENERIC && type == TREE_TYPE (@1)) +(non_lvalue @1) +(view_convert @1)))) /* We can't reassociate floating-point unless -fassociative-math or fixed-point plus or minus because of saturation to +-Inf. */ @@ -5767,7 +5769,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (simplify (conj (convert? 
(conj @0))) (if (tree_nop_conversion_p (TREE_TYPE (@0), type)) - (convert @0))) + (if (GENERIC && type == TREE_TYPE (@0)) + (non_lvalue @0) + (convert @0)))) /* conj({x,y}) -> {x,-y} */ (simplify diff --git a/gcc/testsuite/gfortran.dg/non_lvalue_1.f90 b/gcc/testsuite/gfortran.dg/non_lvalue_1.f90 new file mode 100644 index ..61dad5a2ce1b --- /dev/null +++ b/gcc/testsuite/gfortran.dg/non_lvalue_1.f90 @@ -0,0 +1,32 @@ +! { dg-do compile } +! { dg-additional-options "-fdump-tree-original" } +! +! Check the generation of NON_LVALUE_EXPR expressions in cases where a unary +! operator expression would simplify to a bare data reference. + +! A NON_LVALUE_EXPR is generated for a double negation that would simplify to +! a bare data reference. +function f1 (f1_arg1) + integer, value :: f1_arg1 + integer :: f1 + f1 = -(-f1_arg1) +end function +! { dg-final { scan-tree-dump "__result_f1 = NON_LVALUE_EXPR ;" "original" } } + +! A NON_LVALUE_EXPR is generated for a double complement that would simplify to +! a bare data reference. +function f2 (f2_arg1) + integer, value :: f2_arg1 + integer :: f2 + f2 = not(not(f2_arg1)) +end function +! { dg-final { scan-tree-dump "__result_f2 = NON_LVALUE_EXPR ;" "original" } } + +! A NON_LVALUE_EXPR is generated for a double complex conjugate that would +! simplify to a bare data reference. +function f3 (f3_arg1) + complex, value :: f3_arg1 + complex :: f3 + f3 = conjg(conjg(f3_arg1)) +end function +! { dg-final { scan-tree-dump "__result_f3 = NON_LVALUE_EXPR ;" "original" } }
[gcc] Created branch 'mikael/heads/non_lvalue_v03' in namespace 'refs/users'
The branch 'mikael/heads/non_lvalue_v03' was created in namespace 'refs/users' pointing to: 2eb0523c0d47... match: Simplify doubled not, negate and conjugate operators
[gcc r16-1600] value-range: Use int instead of uint for wi::ctz result [PR120746]
https://gcc.gnu.org/g:b76779c7cd6b92f167a80f16f11f599eea4dfc67 commit r16-1600-gb76779c7cd6b92f167a80f16f11f599eea4dfc67 Author: Jakub Jelinek Date: Sat Jun 21 16:09:08 2025 +0200 value-range: Use int instead of uint for wi::ctz result [PR120746] uint is some compatibility type in glibc sys/types.h enabled in misc/GNU modes, so it doesn't exist on many hosts. Furthermore, wi::ctz returns int rather than unsigned and the var is only used in comparison to zero or as second argument of left shift, so I think just using int instead of unsigned is better. 2025-06-21 Jakub Jelinek PR middle-end/120746 * value-range.cc (irange::snap): Use int type instead of uint. Diff: --- gcc/value-range.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/value-range.cc b/gcc/value-range.cc index ce13acc312d2..23a5c66ed5e3 100644 --- a/gcc/value-range.cc +++ b/gcc/value-range.cc @@ -2287,7 +2287,7 @@ bool irange::snap (const wide_int &lb, const wide_int &ub, wide_int &new_lb, wide_int &new_ub) { - uint z = wi::ctz (m_bitmask.mask ()); + int z = wi::ctz (m_bitmask.mask ()); if (z == 0) return false;
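For illustration only (not part of the commit): a minimal C++ sketch of why a plain int is enough for a trailing-zero count. Both ctz64 and low_zero_step are hypothetical names; ctz64 merely stands in for wi::ctz on a 64-bit value, and low_zero_step does not reproduce the irange::snap logic. It only shows the two uses the commit message mentions, a comparison against zero and a left-shift count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for wi::ctz on a 64-bit value: the number of
// trailing zero bits, or 64 when X is zero.  The result type is int.
int ctz64 (uint64_t x)
{
  if (x == 0)
    return 64;
  int n = 0;
  while ((x & 1) == 0)
    {
      x >>= 1;
      ++n;
    }
  return n;
}

// Illustrative only: Z is compared against zero and then used as a
// shift count, so no unsigned type (and no glibc-specific uint) is needed.
bool low_zero_step (uint64_t mask, uint64_t &step)
{
  int z = ctz64 (mask);
  if (z == 0 || z >= 64)
    return false;
  step = uint64_t (1) << z;
  return true;
}
```

Since the count never exceeds the bit width, a signed int holds every possible result, and the code no longer depends on a typedef that many hosts do not provide.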
[gcc r16-1606] Scale up auto-profile counts
https://gcc.gnu.org/g:dda86c80bca2300a47f91bcfc589951df9c7f1be commit r16-1606-gdda86c80bca2300a47f91bcfc589951df9c7f1be Author: Jan Hubicka Date: Sun Jun 22 03:26:36 2025 +0200 Scale up auto-profile counts This patch scales up auto-profile counts when the maximal count of the train run is too small. This still happens on moderately sized train runs such as SPEC2017 ref datasets; scaling helps to avoid rounding errors in cases where the number of samples is small. gcc/ChangeLog: * auto-profile.cc (autofdo::afdo_count_scale): New. (autofdo_source_profile::update_inlined_ind_target): Scale counts. (autofdo_source_profile::read): Set scale and dump statistics. (afdo_indirect_call): Scale. (afdo_set_bb_count): Scale. (afdo_find_equiv_class): Fix dumps. (afdo_annotate_cfg): Scale. Diff: --- gcc/auto-profile.cc | 46 ++ 1 file changed, 38 insertions(+), 8 deletions(-) diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc index f4dda8272ed8..202f1083633d 100644 --- a/gcc/auto-profile.cc +++ b/gcc/auto-profile.cc @@ -353,6 +353,13 @@ static autofdo_source_profile *afdo_source_profile; /* gcov_summary structure to store the profile_info. */ static gcov_summary *afdo_profile_info; +/* Scaling factor for afdo data. Compared to normal profile + AFDO profile counts are much lower, depending on sampling + frequency. We scale data up to reudce effects of roundoff + errors. */ + +static gcov_type afdo_count_scale = 1; + /* Helper functions. 
*/ /* Return the original name of NAME: strip the suffix that starts @@ -650,8 +657,8 @@ function_instance::find_icall_target_map (tree fn, gcall *stmt, get_identifier (afdo_string_table->get_name (callee))); if (node == NULL) continue; - (*map)[callee] = iter->second->total_count (); - ret += iter->second->total_count (); + (*map)[callee] = iter->second->total_count () * afdo_count_scale; + ret += iter->second->total_count () * afdo_count_scale; } return ret; } @@ -826,6 +833,7 @@ autofdo_source_profile::update_inlined_ind_target (gcall *stmt, for (icall_target_map::const_iterator iter = old_info.targets.begin (); iter != old_info.targets.end (); ++iter) total += iter->second; + total *= afdo_count_scale; /* Program behavior changed, original promoted (and inlined) target is not hot any more. Will avoid promote the original target. @@ -913,7 +921,7 @@ autofdo_source_profile::get_callsite_total_count ( != s->name())) return 0; - return s->total_count (); + return s->total_count () * afdo_count_scale; } /* Read AutoFDO profile and returns TRUE on success. */ @@ -972,6 +980,23 @@ autofdo_source_profile::read () map_[fun_id]->merge (s); } } + /* Scale up the profile, but leave some bits in case some counts gets + bigger than sum_max eventually. 
*/ + if (afdo_profile_info->sum_max) +afdo_count_scale + = MAX (((gcov_type)1 << (profile_count::n_bits / 2)) +/ afdo_profile_info->sum_max, 1); + if (dump_file) +fprintf (dump_file, "Max count in profile %" PRIu64 "\n" + "Setting scale %" PRIu64 "\n" + "Scaled max count %" PRIu64 "\n" + "Hot count threshold %" PRIu64 "\n", +(int64_t)afdo_profile_info->sum_max, +(int64_t)afdo_count_scale, +(int64_t)(afdo_profile_info->sum_max * afdo_count_scale), +(int64_t)(afdo_profile_info->sum_max * afdo_count_scale + / param_hot_bb_count_fraction)); + afdo_profile_info->sum_max *= afdo_count_scale; g->get_dumps ()->dump_finish (profile_pass_num); return true; } @@ -1112,6 +1137,7 @@ afdo_indirect_call (gcall *stmt, const icall_target_map &map, if (max_iter == map.end () || max_iter->second < iter->second) max_iter = iter; } + total *= afdo_count_scale; struct cgraph_node *direct_call = cgraph_node::get_for_asmname ( get_identifier (afdo_string_table->get_name (max_iter->first))); if (direct_call == NULL) @@ -1145,7 +1171,7 @@ afdo_indirect_call (gcall *stmt, const icall_target_map &map, /* Value */ hist->hvalue.counters[2] = direct_call->profile_id; /* Counter */ - hist->hvalue.counters[3] = max_iter->second; + hist->hvalue.counters[3] = max_iter->second * afdo_count_scale; if (!direct_call->profile_id) { @@ -1313,11 +1339,13 @@ afdo_set_bb_count (basic_block bb) if (max_count) { - update_count_by_afdo_count (&bb->count, max_count); + update_count_by_afdo_count (&bb->count, max_count * afdo_count_scale); if (dump_file) fprintf (dump_file, -" Annotated bb %i with count %" PRId64 "\n", -bb->index, (int6
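For illustration only (not part of the commit): the scale computation in autofdo_source_profile::read above can be sketched as a standalone C++ function. The name afdo_scale_sketch is hypothetical, and n_bits is passed in as a parameter; using 61 for profile_count::n_bits in the note below is an assumption, not something the diff shows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the scale choice: aim for a scaled maximum of
// about 2^(n_bits / 2), leaving the upper bits as headroom, and never
// scale below 1.  A zero SUM_MAX keeps the default scale of 1.
int64_t afdo_scale_sketch (int64_t sum_max, int n_bits)
{
  if (sum_max <= 0)
    return 1;
  return std::max<int64_t> ((int64_t (1) << (n_bits / 2)) / sum_max, 1);
}
```

Assuming n_bits is 61, a profile whose largest count is 1000 would get a scale of 2^30 / 1000 = 1073741 under this sketch, while a profile whose largest count already exceeds 2^30 would keep a scale of 1.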