Re: Rewrite assign_discriminators pass

2025-07-12 Thread Jan Hubicka
> > This caused: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121045 I see, -compare-debug actually compares discriminators in dump of final pass. Discriminators do not need to be the same if they are unused and they are consumed only by dwarf2out and by auto-profile, so I think compare-debu

Fix ICE with speculative devirtualization

2025-07-11 Thread Jan Hubicka
Hi, this patch fixes an ICE building lto1 with autoprofiledbootstrap and in pr114790. What happens is that auto-fdo speculatively devirtualizes to a wrong target. This is due to a bug where it mixes up dwarf names and linkage names of inline functions I need to fix as well. Later we clone at WPA time.

make autoprofiledbootstrap with LTO meaningful

2025-07-11 Thread Jan Hubicka
Hello, currently autoprofiled bootstrap produces auto-profiles for cc1 and cc1plus binaries. Those are used to build respective frontend files. For backend cc1plus.fda is used. This does not work well with LTO bootstrap where cc1plus backend is untrained since it is used only for parsing and eal

Re: Rewrite assign_discriminators pass

2025-07-11 Thread Jan Hubicka
> So with this the discriminator we assign might depend on whether > we have debug stmts or not. We output them only to debug info, so > it should in principle not cause compare-debug issues, right? And > we don't use discriminators to affect code generation (hopefully). This is the reason of op

Re: [AutoFDO] Fix get_original_name to strip only names that are generated after auto-profile

2025-07-10 Thread Jan Hubicka
Hi, this patch fixes several issues I noticed in gimple matching and -Wauto-profile warning. One problem is that we mismatched symbols with user names, such as "*strlen" instead of "strlen". I added raw_symbol_name to strip extra '*' which is ok on ELF targets which are only targets we support wit
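
The "*strlen" versus "strlen" mismatch mentioned above comes from GCC's convention that an assembler name beginning with '*' is emitted verbatim, without a target prefix. A minimal sketch of the stripping step follows; the helper name strip_star_prefix is hypothetical and is not the raw_symbol_name from the patch, whose exact behaviour is not shown in this preview.

```cpp
#include <string>

// Hypothetical helper (not the patch's raw_symbol_name): drop a single
// leading '*' so that "*strlen" and "strlen" compare equal.
std::string
strip_star_prefix (const std::string &asm_name)
{
  if (!asm_name.empty () && asm_name[0] == '*')
    return asm_name.substr (1);
  return asm_name;
}
```

Only one leading '*' is removed here, on the assumption that the marker appears at most once at the start of the name.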

Re: [PATCH] [x86] properly compute fp/mode for scalar ops for vectorizer costing

2025-07-10 Thread Jan Hubicka
> The x86 add_stmt_hook relies on the passed vectype to determine > the mode and whether it is FP for a scalar operation. This is > unreliable now for stmts involving patterns and in the future when > there is no vector type passed for scalar operations. > > To be least disruptive I've kept using

Rewrite assign_discriminators pass

2025-07-10 Thread Jan Hubicka
Hi, to assign debug locations to corresponding statements auto-fdo uses discriminators. Documentation says that if given statement belongs to multiple basic blocks, the discriminator distinguishes them. Current implementation however only works for statements that expand into a sequence of gimple

Re: [AutoFDO] Fix get_original_name to strip only names that are generated after auto-profile

2025-07-10 Thread Jan Hubicka
> > > > I tried to implement a workaround to match lost discriminator in cases > > this is uniquely determined, but it is not so easy to do. > > My plan is to figure out how to upstream it and then drop the lost > > discriminator workaround from match. > > > > Do you see warnings with -Wauto-profi

Re: [PATCH 2/2] tree-optimization/109893 - allow more backwards jump threading

2025-07-09 Thread Jan Hubicka
> The following changes the percentage that determines how many > stmts are allowed for backwards jump threading from 50 to 54, > enabling the missed jump threading observed in PR109893. > > Bootstrapped and tested on x86_64-unknown-linux-gnu. It seems that > at least backward threading is prone

Re: [AutoFDO] Fix get_original_name to strip only names that are generated after auto-profile

2025-07-09 Thread Jan Hubicka
> > I am seeing an ICEs in offline pass. > > > during IPA pass: afdo_offline > gmsh/src/mesh/meshGEdge.cpp:979:1: internal compiler error: in > set_call_location, at auto-profile.cc:433 I added location and call_location into function instance that are originally set to UNKNOWN_LOCATION and la

Re: [PATCH 0/1] [RFC][AutoFDO]: Source filename tracking in GCOV

2025-07-08 Thread Jan Hubicka
> Hi Honza, > > > On 8 Jul 2025, at 2:26 am, Jan Hubicka wrote: > > > > External email: Use caution opening links or attachments > > > > > > Hi, > > as discussed also on the autofdo pull request, LLVM solves the same > > p

Fix profile scaling in tree-inline.cc:initialize_cfun

2025-07-07 Thread Jan Hubicka
Hi, initialize_cfun calls profile_count::adjust_for_ipa_scaling (&num, &den); but then the result is never used. This patch fixes it. Overall scaling of entry/exit block is a bit sloppy in tree-inline. I will see if I can clean it up. Bootstrapped/regtested x86_64-linux, committed. * tree-in

Re: [PATCH 0/1] [RFC][AutoFDO]: Source filename tracking in GCOV

2025-07-07 Thread Jan Hubicka
Hi, as discussed also on the autofdo pull request, LLVM solves the same problem using -funique-internal-linkage-names https://reviews.llvm.org/D73307 All non-public functions get their symbol renamed from .__uniq. Decadic is used since demanglers special case numerical suffixes. In addition debug

Re: [AutoFDO] Fix get_original_name to strip only names that are generated after auto-profile

2025-07-07 Thread Jan Hubicka
Hi, there are two bugs in get_original_name. First the for loop walking list of known suffixes uses sizeof (suffixes). It eventually walks to an empty suffix. Second problem is that strcmp may accept suffixes that are longer. I.e. mix up .isra with .israabc. This is probably not a big deal bu
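
Both problems can be illustrated with a small standalone sketch. This is not the code from auto-profile.cc: the suffix list, the helper names matches_suffix and strip_known_suffix, and the rule that a known suffix must be followed by the end of the string or by another '.' are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical list of clone suffixes; the list used by GCC may differ.
static const char *const known_suffixes[]
  = { "isra", "constprop", "part", "cold" };

// Element count, not byte size: looping up to sizeof (known_suffixes)
// would walk past the last real entry.
static const std::size_t n_known_suffixes
  = sizeof (known_suffixes) / sizeof (known_suffixes[0]);

// Return true if the text at P is exactly SUFFIX, followed either by the
// end of the string or by another '.' (as in "isra.0").  A bare prefix
// comparison would also accept "israabc".
static bool
matches_suffix (const char *p, const char *suffix)
{
  std::size_t len = std::strlen (suffix);
  return std::strncmp (p, suffix, len) == 0
	 && (p[len] == '\0' || p[len] == '.');
}

// Strip the first recognised ".suffix" and everything after it from NAME.
std::string
strip_known_suffix (const std::string &name)
{
  for (std::size_t dot = name.find ('.'); dot != std::string::npos;
       dot = name.find ('.', dot + 1))
    for (std::size_t i = 0; i < n_known_suffixes; i++)
      if (matches_suffix (name.c_str () + dot + 1, known_suffixes[i]))
	return name.substr (0, dot);
  return name;
}
```

With these assumptions "foo.isra.0" is reduced to "foo", while "foo.israabc" is left untouched.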

Add cutoff information to profile_info and use it when forcing non-zero value

2025-07-06 Thread Jan Hubicka
Hi, main difference between normal profile feedback and auto-fdo is that with profile feedback every basic block with non-zero profile has an incoming edge with non-zero profile. With auto-profile it is possible that none of predecessors was sampled and also the tool has cutoff parameter which

gcc-patches@gcc.gnu.org

2025-07-06 Thread Jan Hubicka
Hi, this fixes stupid mistake of mine in the overflow check for sreal multiplication. This was introduced this stage1 so unless we want to backport the ipa-cp heuristics bugfixes, this does not need to go to release branches. Regtested and bootstrapped x86_64-linux. Honza gcc/ChangeLog:

Re: [PATCH 2/2] add masked-epilogue tuning

2025-07-04 Thread Jan Hubicka
> The following adds a x86 tuning to enable the use of AVX512 masked > epilogues in cases we heuristically determine it to be not detrimental > by high chance. Basically problematic cases are when there are > data streams that are both stored and loaded from and an outer loop > could end up execut

Fix overflow in ipa-cp heuristics

2025-07-03 Thread Jan Hubicka
Hi, ipa-cp converts sreal times to int, while point of sreal is to accommodate very large values that can happen for loops with large number of iterations and also when profile is inconsistent. This happens with afdo in testsuite where loop preheader is estimated to have 0 executions while loop body

Enable ipa-cp cloning for cold wrappers of hot functions

2025-07-03 Thread Jan Hubicka
Hi, ipa-cp cloning disables itself for all functions not passing opt_for_fn (node->decl, optimize_size) which disables it for cold wrappers of hot functions where we want to propagate. Since we later want to time saved to be considered hot, we do not need to make this early test. The patch also f

Re: [PATCH 1/1] [RFC][AutoFDO] Propagate information to outline copies if not inlined

2025-07-02 Thread Jan Hubicka
> On 02/07/25 07:26, Kugan Vivekanandarajah wrote: > > > > > > > > > > Given the latest few patches that you have committed, is this patch > > > necessary > > > anymore? I have not fully understood the new logic as I was on holiday > > > last > > > week, but it looks like the propagation is oc

Re: AFDO/FDO profile comparator

2025-06-30 Thread Jan Hubicka
> Hi Honza, > > On Sun, Jun 29, 2025 at 10:45 PM Jan Hubicka wrote: > > > > > > > > > > > > On 24 Jun 2025, at 7:43 pm, Jan Hubicka wrote: > > > > > > > > External email: Use caution opening links or attachments > > >

AFDO/FDO profile comparator

2025-06-29 Thread Jan Hubicka
> > > > On 24 Jun 2025, at 7:43 pm, Jan Hubicka wrote: > > > > External email: Use caution opening links or attachments > > > > > > Hi, > > this pass removes early-inlining from afdo pass since all inlining > > should now happen from ear

Re: [AutoFDO] Fix get_original_name to strip only names that are generated after auto-profile

2025-06-27 Thread Jan Hubicka
Hi, I have tested your patch on exchange2 and noticed multiple problems: 1) with LTO the translation from dwarf names to symbol names is disabled since we free lang data sooner. I moved the offline pass upstream which however also may make us miss clones introduced between free lang dat

Re: [AutoFDO] Fix get_original_name to strip only names that are generated after auto-profile

2025-06-27 Thread Jan Hubicka
> Hi Honza, > > So merging the profiles will also lead to inconsistencies making the > > .part variant to seem more hot than it is... > > I am looking into this and will post the patch as a follow up patch. Thanks. Note that now with merging being done recursively to inline instances while offli

Re: Remove early inlining from afdo pass

2025-06-27 Thread Jan Hubicka
Hi, > > We can look into this. We do compare manually the IR dumps from both and it > is not ideal. > What we should do is an additional (optional) pass that runs after > auto-profile to compare the annotations > using the profile-use. We will have to filter out any functions/path that > runs

Re: Remove early inlining from afdo pass

2025-06-26 Thread Jan Hubicka
> > > > On 24 Jun 2025, at 7:43 pm, Jan Hubicka wrote: > > > > External email: Use caution opening links or attachments > > > > > > Hi, > > this pass removes early-inlining from afdo pass since all inlining > > should now happen from ear

Avoid some lost AFDO profiles with LTO

2025-06-26 Thread Jan Hubicka
Hi, This patch fixes some of cases where we lose profile info because we do not perform inlining that happened at train run before AFDO annotation is done. This is a common problem with LTO in the case cross-module inlining happened. I added afdo_offline pass that does two things: 1) collect set

Remove early inlining from afdo pass

2025-06-25 Thread Jan Hubicka
Hi, this pass removes early-inlining from afdo pass since all inlining should now happen from early inliner. I tested this on spec and there are 3 inlines happening here which are blocked at early-inline time by hitting large function growth limit. We probably want to bypass that limit, I will lo

Re: Do not drop discriminator when inlining

2025-06-25 Thread Jan Hubicka
> > What seems to be common now is profile breakage around loops that has > > been fully unrolled or vectorized which is a bit understandable though I > > wonder if we can improve here. I think we can fix problem where profile > > of loop header stmts is partly or fully lost (which seems to be mai

Re: [PATCH v3] x86: Update memcpy/memset inline strategies for -mtune=generic

2025-06-25 Thread Jan Hubicka
> Here is the v3 patch. It no longer uses "rep mov/stos". Lili, can you > measure > its performance impact on Intel and AMD cpus? > > The updated generic has > > Update memcpy and memset inline strategies for -mtune=generic: > > 1. Don't align memory. This looks OK to me (recent microarchs s

Re: Do not drop discriminator when inlining

2025-06-24 Thread Jan Hubicka
> > That is why I checked for loc != UNKNOWN_LOCATION. I did not expect > > UNKNOWN_LOCATION to have discriminators. What they are good for? > > I have no idea, this was simply a defensive review where it's no > longer obvious that inlined_function_outer_scope_p would still work > in all cases.

Re: [AutoFDO] Fix get_original_name to strip only names that are generated after auto-profile

2025-06-24 Thread Jan Hubicka
> > > > With part suffixes we also may want to merge specially, since the > > entry_count of the split part does not correspond to entry_count of the > > original function. > > > > I wonder, does partitioned function work with the google tool? I > > remember it had limitations in this respect. >

Add -fauto-profile-inlining

2025-06-23 Thread Jan Hubicka
Hi, this patch adds -fauto-profile-inlining which can be used to control the auto-profile directed inlining. The feature is quite interesting but also may trigger unexpected code size growth or prevent useful IPA inlining in the profiled binary. Bootstrapped/regtested x86_64. Plan to commit it tom

Re: Do not drop discriminator when inlining

2025-06-23 Thread Jan Hubicka
> On Sun, 22 Jun 2025, Jan Hubicka wrote: > > > Hi, > > auto-fdo is currently confused by a fact that all inlined functions get > > locators with 0 discriminator, so it is not able to distinguish multiple > > inlined calls from single line. > > > > Discr

Re: [PATCH] x86: Add PROCESSOR_XXX comments to processor_cost_table

2025-06-22 Thread Jan Hubicka
> Add a PROCESSOR_XXX comment to each entry in processor_cost_table to > describe which processor the cost entry is applied to. > > * config/i386/i386-options.cc (processor_cost_table): Add a > PROCESSOR_XXX comment to each entry. > > > -- > H.J. > From 8b37db60ec21c1c673eb1e336208dc10a5d86d5c

Add GUESSED_GLOBAL0_AFDO profile quality

2025-06-22 Thread Jan Hubicka
Hi, This patch adds GUESSED_GLOBAL0_AFDO profile quality. It can be used to preserve local counts of functions which have 0 AFDO profile. I originally did not include it as it was not clear it will be useful and it turns quality from 3bits to 4bits which means that we need to steal another bit fro

Fix some problems with afdo propagation

2025-06-22 Thread Jan Hubicka
Hi, This patch fixes problems I noticed by exploring profiles of some hot functions in GCC. In particular the propagation sometimes changed precise 0 to afdo 0 for paths calling abort and sometimes we could propagate more when we accept that some paths has 0 count. Finally there was important bug

Re: [PATCH] x86: Enable *mov_(and|or) for TARGET_SPLIT_LONG_MOVES

2025-06-22 Thread Jan Hubicka
> This contradicts > > /* X86_TUNE_READ_MODIFY_WRITE: Enable use of read modify write instructions >such as "add $1, mem". */ > DEF_TUNE (X86_TUNE_READ_MODIFY_WRITE, "read_modify_write", > ~(m_PENT | m_LAKEMONT)) > > which enables "andl $0, (%edx)" for PentiumPro. "andl $0, (%edx

Re: [PATCH] x86: Enable *mov_(and|or) for TARGET_SPLIT_LONG_MOVES

2025-06-22 Thread Jan Hubicka
> > Since read-modify-write is enabled for PentiumPro: > > /* X86_TUNE_READ_MODIFY_WRITE: Enable use of read modify write instructions >such as "add $1, mem". */ > DEF_TUNE (X86_TUNE_READ_MODIFY_WRITE, "read_modify_write", > ~(m_PENT | m_LAKEMONT)) > > should this > > /* Generate

Re: [PATCH] x86: Enable *mov_(and|or) for TARGET_SPLIT_LONG_MOVES

2025-06-22 Thread Jan Hubicka
> Since there is > > /* X86_TUNE_SPLIT_LONG_MOVES: Avoid instructions moving immediates >directly to memory. */ > DEF_TUNE (X86_TUNE_SPLIT_LONG_MOVES, "split_long_moves", m_PPRO) If I recall correctly, this tune was added for PentiumPro which had problem decoding moves with long immediate an

Do not drop discriminator when inlining

2025-06-22 Thread Jan Hubicka
Hi, auto-fdo is currently confused by a fact that all inlined functions get locators with 0 discriminator, so it is not able to distinguish multiple inlined calls from single line. Discriminator is lost by calling LOCATION_LOCUS before copying it from former call statement. I believe this is only

Handle functions with 0 profile in auto-profile

2025-06-22 Thread Jan Hubicka
Hi, This is the last part of the infrastructure to allow functions with local profiles and 0 global autofdo counts. Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: * auto-profile.cc (afdo_set_bb_count): Dump inline stacks and reasons when lookup failed. (afd

Re: Improve static and AFDO profile combination

2025-06-22 Thread Jan Hubicka
> In addition to working with you on the issues of profile being lost with > LTO, cloning and other cases, my plan is to > 1) finish the VPT reorganization > 2) make AFD reader to scale up the profile since at least in data from > SPEC or profiledbootstrap the counters are quite small integers w

fix profile after fnsplit

2025-06-21 Thread Jan Hubicka
Hi, when splitting functions, tree-inline determined correctly entry count of the new function part, but then in case entry block of new function part is in a loop it scales body which is not supposed to happen. Bootstrapped/regtested x86_64-linux, committed. * tree-inline.cc (copy_cfg_body

Extend afdo inliner to introduce speculative calls

2025-06-20 Thread Jan Hubicka
Hi, this patch makes the AFDO's VPT to happen during early inlining. This should make the einline pass inside afdo pass unnecessary, but some inlining still happens there - I will need to debug why that happens and will try to drop the afdo's inliner incrementally. get_inline_stack_in_node can now

Re: Improve static and AFDO profile combination

2025-06-19 Thread Jan Hubicka
> In an internal application I noticed that the ipa-inliner is quite > sensitive to AFDO counts and that seems to make the performance worse. > Did you notice this? This was before some of your changes. I will try > again. The cases I looked into were mixture of late inlining and ipa-cp cloning be

Implement afdo inliner

2025-06-18 Thread Jan Hubicka
Hi, this patch moves afdo inlining from early inliner into specialized one. The reason is that early inliner is by design non-recursive while afdo inliner needs to recurse. In the past google handled it by increasing early inliner iterations, but it can be done easily and cheaply without it by sim

Re: [AutoFDO] Fix get_original_name to strip only names that are generated after auto-profile

2025-06-18 Thread Jan Hubicka
> > IMO doing this in a loop would have to handle all the above cases and would > make it hard to read. Also, we would have two level for now. Even if this > change in the future, this is not going to be too long. > > Here is the revised patch, > > Is this OK for mainline. > gcc/ChangeLog: >

Improve static and AFDO profile combination

2025-06-17 Thread Jan Hubicka
Hi, this patch makes afdo_adjust_guessed_profile more aggressive on finding scales on the boundaries of connected components with no annotation. Originally I looked for edges into or out of the component with known AFDO counts and I also handled edges from basic block with known AFDO count and known s

Fix cgraph_node::apply_scale

2025-06-17 Thread Jan Hubicka
Hi, while working on auto-FDO I noticed that we may run into ICE because we inline function with count profile_count::zero to a call site with profile_count::zero. What may go wrong is that the caller has local profile while callee may have IPA profiles. We used to turn all such counts to 0, but t

Re: [PATCH 1/1] [RFC][AutoFDO] Propagate information to outline copies if not inlined

2025-06-17 Thread Jan Hubicka
> From: Dhruv Chawla > > This patch modifies afdo_set_bb_count to propagate profile information > to outline copies of functions if they are not inlined. This information > gets lost otherwise. > > Signed-off-by: Dhruv Chawla > > gcc/ChangeLog: > > * gcc/auto-profile.cc (count_info): Ad

Re: [PATCH 0/1] [RFC][AutoFDO] Propagate inline information to outline definitions if not inlined

2025-06-17 Thread Jan Hubicka
> Another problem here is that get_inline_stack returns an empty stack if > no inlining occurred in the corresponding GIMPLE statement. So if an > inline callsite does exist in the profile at the current GIMPLE > statement but no inlining actually occurs during auto-profile, the > information is ju

Re: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit loops [PR117790]

2025-06-16 Thread Jan Hubicka
> > I don't think you kept this logic in the new code? I really apologize for late reply. I missed that you wait for it. > > To be honest, I didn't really follow the logic here. Thinking about the > single-exit case (which the current code is designed to handle), both > the body of the if and the

Re: [PATCH] [RFC][AutoFDO] Source filename tracking in GCOV

2025-06-16 Thread Jan Hubicka
> gcc/ChangeLog: > * auto-profile.cc (AUTO_PROFILE_VERSION): Bump from 2 to 3. > (string_table::get_real_name): Define new member function. > (string_table::get_file_name): Likewise. > (string_table::get_file_name_idx): Likewise. > (string_table::real_names_): Define n

Re: [PATCH] ipa, cgraph: Enable constant propagation to OpenMP kernels

2025-06-16 Thread Jan Hubicka
> On Mon, Jun 16, 2025 at 05:49:19PM +0200, Jan Hubicka wrote: > > > On Wed, Apr 30, 2025 at 08:56:57AM +0200, Jakub Jelinek wrote: > > > > On Mon, Apr 28, 2025 at 07:27:31PM +0200, Josef Melcr wrote: > > > > > As for the attribute, I am honestly not too sur

Re: [PATCH 0/1] [RFC][AutoFDO]: Source filename tracking in GCOV

2025-06-16 Thread Jan Hubicka
Hi, > Introduction > > > Per PR120229 (gcc.gnu.org/PR120229), the auto-profile pass cannot distinguish > profile information for `function_instance's with the same base name, when > suffixes are removed. To fix this, source file names should be tracked in the > GCOV file information t

Re: [PATCH] ipa, cgraph: Enable constant propagation to OpenMP kernels

2025-06-16 Thread Jan Hubicka
> On Wed, Apr 30, 2025 at 08:56:57AM +0200, Jakub Jelinek wrote: > > On Mon, Apr 28, 2025 at 07:27:31PM +0200, Josef Melcr wrote: > > > As for the attribute, I am honestly not too sure about what to do, as > > > clang > > > is > > > not consistent in with its own indexing, be it with the unknown v

Combine static and afdo profile

2025-06-16 Thread Jan Hubicka
Hi, Currently afdo reads the profile and annotates basic blocks containing statements which have samples in profile data. For basic blocks which has been fully optimized out (for example, basic blocks controlling loops that has been fully unrolled) it has no data which it then tries to determine in

Re: [PATCH v2] x86: Update memcpy/memset inline strategies for -mtune=generic

2025-06-15 Thread Jan Hubicka
> > Perhaps someone is interested in the following thread from LKML: > > "[PATCH v2] x86: prevent gcc from emitting rep movsq/stosq for inlined ops" > > https://lore.kernel.org/lkml/20250605164733.737543-1-mjgu...@gmail.com/ > > There are several PRs regarding memcpy/memset linked from the abov

Re: [PATCH 0/1] [RFC][AutoFDO] Propagate inline information to outline definitions if not inlined

2025-06-13 Thread Jan Hubicka
> On 13/06/25 14:51, Jan Hubicka wrote: > > External email: Use caution opening links or attachments > > > > > > > From: Dhruv Chawla > > Hi, > > > > > > For reasons explained in the patch, this patch prevents the loss of > > >

Re: [PATCH 0/1] [RFC][AutoFDO] Propagate inline information to outline definitions if not inlined

2025-06-13 Thread Jan Hubicka
> From: Dhruv Chawla Hi, > > For reasons explained in the patch, this patch prevents the loss of profile > information when inlining occurs in the profiled binary but not in the > auto-profile pass as a decision. As an example, for this code: I was wondering about this problem too > - Annotation

Re: [AutoFDO] Profile merging for clone test

2025-06-10 Thread Jan Hubicka
> Hi, > > > On 4 Jun 2025, at 9:53 pm, Jan Hubicka wrote: > > > > External email: Use caution opening links or attachments > > > > > >> This patch introduces a new testcase to verify the merging of profiles > >> is performed for cloned

Re: [AutoFDO] Profile merging for clone test

2025-06-09 Thread Jan Hubicka
OK, thanks! Honza

Re: [AUTOFDO][AARCH64] Add support for profilebootstrap

2025-06-06 Thread Jan Hubicka
> On 2025-06-06 12:42, Jan Hubicka wrote: > > > Hi, > > > also after fixing this issue my bootstrap fails with: > > > > > > Permission error mapping pages. > > > Consider increasing /proc/sys/kernel/perf_event_mlock_kb, > > > or try agai

Re: [AUTOFDO][AARCH64] Add support for profilebootstrap

2025-06-06 Thread Jan Hubicka
> Hi, > also after fixing this issue my bootstrap fails with: > > Permission error mapping pages. > Consider increasing /proc/sys/kernel/perf_event_mlock_kb, > or try again with a smaller value of -m/--mmap_pages. > (current value: 4294967295,0) > Permission error mapping pages. > Consider increa

More of autofdo 0 fixes

2025-06-06 Thread Jan Hubicka
This patch fixes ICE seen when building spec2k17 with autofdo and enable checking compiler. Because we special case 0 of autofdo to be kind of 1 in IPA scaling, we can now end up with function having global0 profile but producing inline clone with nonzero profile. I think correct way is to extend

Re: [AUTOFDO][AARCH64] Add support for profilebootstrap

2025-06-06 Thread Jan Hubicka
Hi, also after fixing this issue my bootstrap fails with: Permission error mapping pages. Consider increasing /proc/sys/kernel/perf_event_mlock_kb, or try again with a smaller value of -m/--mmap_pages. (current value: 4294967295,0) Permission error mapping pages. Consider increasing /proc/sys/ker

Avoid useless reading of profile data in LTO

2025-06-06 Thread Jan Hubicka
Hi, New auto-profile merging dumps made me notice that we read the afdo data when we are in LTO. This is not necessary since profile is read at compile time and streamed to LTO bytecode. gcc/ChangeLog: * coverage.cc (coverage_init): Return early when in LTO. diff --git a/gcc/coverage.cc

Re: [AUTOFDO][AARCH64] Add support for profilebootstrap

2025-06-06 Thread Jan Hubicka
> Should I go with: > > +autofdo_target > > +autofdo_target="i386" > +case "${target}" in > + aarch64-*-*) > +autofdo_target="aarch64" > +;; > +esac > > As in the first version? I can test and send a patch for review if there is > no other better alternative. This looks OK - I can n

Re: [AUTOFDO] Merge profiles of clones before annotating

2025-06-06 Thread Jan Hubicka
> Hi Honza, > > > On 26 May 2025, at 5:28 pm, Jan Hubicka wrote: > > > > External email: Use caution opening links or attachments > > > > > > Hi, > >> Ping? > > Sorry for the delay. I think I finally got auto-fdo running on my box &

Re: [AUTOFDO][AARCH64] Add support for profilebootstrap

2025-06-06 Thread Jan Hubicka
> Kugan Vivekanandarajah writes: > > Add support for autoprofiledbootstrap in aarch64. > > This is similar to what is done for i386. Added > > gcc/config/aarch64/gcc-auto-profile for aarch64 profile > > creation. > > > > How to run: > > configure --with-build-config=bootstrap-lto > > make autoprof

Re: [AutoFDO] Profile merging for clone test

2025-06-04 Thread Jan Hubicka
> This patch introduces a new testcase to verify the merging of profiles > is performed for cloned functions. > > Since this is invoked very early, before the pass manager, we need to > set up the dumping explicitly. This is similar to the handling in > finish_optimization_passes. > > gcc/ChangeL

Re: [PATCH] Fix crash with constant initializer caused by IPA

2025-05-30 Thread Jan Hubicka
> On Fri, May 30, 2025 at 11:30 AM Jan Hubicka wrote: > > > > Hi, > > > > > > > > Hi, > > > > > > > > the attached Ada testcase compiled with -O2 -gnatn makes the compiler > > > > crash in > > > > vect_ca

Re: [PATCH] Fix crash with constant initializer caused by IPA

2025-05-30 Thread Jan Hubicka
Hi, > > > > Hi, > > > > the attached Ada testcase compiled with -O2 -gnatn makes the compiler crash > > in > > vect_can_force_dr_alignment_p during SLP vectorization: > > > > if (decl_in_symtab_p (decl) > > && !symtab_node::get (decl)->can_increase_alignment_p ()) > > return false; > >

Re: [PATCH] ipa: When inlining, don't combine PT JFs changing signedness (PR120295)

2025-05-29 Thread Jan Hubicka
> Hi, > > in GCC 15 we allowed jump-function generation code to skip over a > type-cast converting one integer to another as long as the latter can > hold all the values of the former or has at least the same precision. > This works well for IPA-CP where we do then evaluate each jump > function as

Re: [AUTOFDO] Fix annotated profile for de-duplicated call

2025-05-29 Thread Jan Hubicka
> > However I do not quite follow the old or new logic here. > So if I have only one unknown edge out (or in) from BB and I know > its count, I can determine count of that edge by Kirchhoff's law. > > But then the old code computes number of edges out of the BB > and if it is only one it updates the
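
The flow-conservation step described in the quoted text can be sketched as follows. This is an illustration only and not the propagation code in auto-profile.cc: the function name infer_single_unknown_edge, the use of std::optional for "count not known", and the clamping at zero are assumptions.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// EDGES holds the counts of all edges leaving (or entering) one basic
// block; an empty optional means the count is not known yet.  If exactly
// one edge is unknown, fill it in as the block count minus the sum of
// the known edges.  Returns true when a count was inferred.
bool
infer_single_unknown_edge (std::int64_t bb_count,
			   std::vector<std::optional<std::int64_t>> &edges)
{
  std::optional<std::int64_t> *unknown = nullptr;
  std::int64_t known_sum = 0;
  for (auto &e : edges)
    {
      if (e)
	known_sum += *e;
      else if (unknown)
	return false;	// More than one unknown edge: cannot infer.
      else
	unknown = &e;
    }
  if (!unknown)
    return false;	// Nothing to infer.
  // Sampled counts may be inconsistent, so clamp at zero (an assumption).
  *unknown = bb_count > known_sum ? bb_count - known_sum : 0;
  return true;
}
```

The same helper applies to incoming edges by passing the predecessor edge counts instead of the successor ones.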

Re: [AUTOFDO] Fix annotated profile for de-duplicated call

2025-05-29 Thread Jan Hubicka
> diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc > index 7e0e8c66124..8a317d85277 100644 > --- a/gcc/auto-profile.cc > +++ b/gcc/auto-profile.cc > @@ -1129,6 +1129,26 @@ afdo_set_bb_count (basic_block bb, const stmt_set > &promoted) >gimple *stmt = gsi_stmt (gsi); >if (gimp

Re: [PATCH] [AUTOFDO] Enable autofdo tests for aarch64

2025-05-29 Thread Jan Hubicka
> I also noticed that some tests are only enabled for x86. I am also seeing: > ./gcc/testsuite/gcc/gcc.sum:UNSUPPORTED: gcc.dg/tree-prof/pr66295.c This is testing a former ifun bug which reproduced with -fprofile-use > ./gcc/testsuite/gcc/gcc.sum:UNSUPPORTED: gcc.dg/tree-prof/split-1.c This is test

Re: [PATCH] [AUTOFDO] Enable autofdo tests for aarch64

2025-05-29 Thread Jan Hubicka
> Hi, > autofdo tests are now running only for x86. This patch makes it > run for aarch64 too. Verified that perf and create_gcov are running > as expected. > > gcc/ChangeLog: > > * config/aarch64/gcc-auto-profile: Make script executable. > > gcc/testsuite/ChangeLog: > > * lib/t

Set znver5 addss cost to 2 again

2025-05-28 Thread Jan Hubicka
Hi, since uses of addss for other purposes than modelling FP addition/subtraction should be gone now, this patch sets addss cost back to 2. Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: PR target/119298 * config/i386/x86-tune-costs.h (struct processor_costs): Set

Do not drop AFDO profile if entry block has count of 0

2025-05-28 Thread Jan Hubicka
Hi, with normal profile feedback checking entry block count to be non-zero is quite reliable check for presence of non-0 profile in the body since the function body can only be executed if the entry block was executed. With autofdo this is not true, since the entry block may just execute too few t

Do not erase static profile by 0 autofdo profile

2025-05-28 Thread Jan Hubicka
Hi, This patch makes auto-fdo more careful about keeping info we have from static profile prediction. If all counters in function are 0, we can keep original auto-fdo profile. Having all 0 profile is not very useful especially because 0 in autofdo is not very informative and the code still may hav

Re: [PATCH] i386: Use Shuffles instead of shifts for Reduction in AMD znver4/5

2025-05-28 Thread Jan Hubicka
> gcc/ChangeLog: > > * config/i386/i386-expand.cc (emit_reduc_half): Use shuffles to > generate reduc half for V4SI, similar modes. > * config/i386/i386.h (TARGET_SSE_REDUCTION_PREFER_PSHUF): New Macro. > * config/i386/x86-tune.def (X86_TUNE_SSE_REDUCTION_PREFER_PSHUF): >

Remove dead code in auto-profile.cc

2025-05-27 Thread Jan Hubicka
Hi, this code to track what locations were used when reading auto-fdo profile seems dead since the initial commit. Removed thus. Committed as obvious. Honza gcc/ChangeLog: * auto-profile.cc (function_instance::mark_annotated): Remove. (function_instance::total_annotated_count): Re

Re: [AUTOFDO] Merge profiles of clones before annotating

2025-05-26 Thread Jan Hubicka
> > > > On 26 May 2025, at 5:34 pm, Jan Hubicka wrote: > > > > External email: Use caution opening links or attachments > > > > > > Hi, > > also, please, can you add an testcase? We should have some coverage for > > auto-fdo specific is

Re: [AUTOFDO] Merge profiles of clones before annotating

2025-05-26 Thread Jan Hubicka
Hi, also, please, can you add a testcase? We should have some coverage for auto-fdo specific issues Honza 0002-AUTOFDO-Merge-profiles-of-clones-before-annotating.patch Description: 0002-AUTOFDO-Merge-profiles-of-clones-before-annotating.patch

Re: [AUTOFDO] Merge profiles of clones before annotating

2025-05-26 Thread Jan Hubicka
Hi, > Ping? Sorry for the delay. I think I finally got auto-fdo running on my box and indeed I see that if a function is cloned later, the profile is lost. There are .suffixes added before the afdo pass (such as openmp offloading or nested functions) and there are .suffixes added after afdo (by ipa clonin

Re: [AUTOFDO] Enable ipa-split for auto-profile

2025-05-22 Thread Jan Hubicka
> > On 9 May 2025, at 11:55 am, Kugan Vivekanandarajah > > wrote: > > > > ipa-split is not now run for auto-profile. IMO this was an oversight. > > This patch enables it similar to PGO runs. > > > > gcc/ChangeLog: > > > >* ipa-split.cc pass_feedback_split_functions::clone (): New. > >

Re: [PATCH 3/5] ipa: Dump cgraph_node UID instead of order into ipa-clones dump file

2025-05-15 Thread Jan Hubicka
> Hi, > > starting with GCC 15 the order is not unique for any symtab_nodes but > m_uid is, I believe we ought to dump the latter in the ipa-clones dump, > if only so that people can reliably match entries about new clones to > those about removed nodes (if any). > > Bootstrapped and tested on x8

Re: [PATCH][x86] Fix regression from x86 multi-epilogue tuning

2025-05-14 Thread Jan Hubicka
> With the avx512_two_epilogues tuning enabled for zen4 and zen5 > the gcc.target/i386/vect-epilogues-5.c testcase below regresses > and ends up using AVX2 sized vectors for the masked epilogue > rather than AVX512 sized vectors. The following patch rectifies > this and adds coverage for the inten

Re: [PATCH v3] Consider frequency in cost estimation when converting scalar to vector.

2025-05-14 Thread Jan Hubicka
> Thanks for review. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > In some benchmark, I notice stv failed due to cost unprofitable, but the igain > is inside the loop, but sse<->integer conversion is outside the loop, current > cost > model doesn't consider the

Re: [PATCH v3] Consider frequency in cost estimation when converting scalar to vector.

2025-05-12 Thread Jan Hubicka
> > gcc/ChangeLog: > > > > * config/i386/i386-features.cc > > (scalar_chain::mark_dual_mode_def): Weight > > n_integer_to_sse/n_sse_to_integer with bb frequency. > > (general_scalar_chain::compute_convert_gain): Ditto, and > > adjust function prototype to ret

Re: i386: Fix some problems in stv cost model

2025-05-12 Thread Jan Hubicka
> > Instructions with latency info are those really different. > > So the unconverted code has sum of latencies 4 and real latency 3. > > Converted code has sum of latencies 4 and real latency 3 > > (vmod+vpmaxsd+vmov). > > So I do not quite see it should be a win. > > Note this was historically d

i386: Fix some problems in stv cost model

2025-05-10 Thread Jan Hubicka
Hi, this patch fixes some of the problems with costing in the scalar to vector pass. In particular 1) the pass uses optimize_insn_for_size which is intended to be used by expanders and splitters and requires the optimization pass to use set_rtl_profile (bb) for currently processed bb. This is n

i386: implement costs for float<->int conversions in ix86_vector_costs::add_stmt_cost

2025-05-07 Thread Jan Hubicka
Hi, This patch adds pattern matching for float<->int conversions both as normal statements and promote_demote. While updating promote_demote I noticed that in cleanups I turned "stmt_cost =" into "int stmt_cost = " which turned the existing FP costing into a NOOP. I also added a comment on how demotes a

Fix i386 bootstrap on non-Windows targets

2025-05-06 Thread Jan Hubicka
Hi, this patch adds an ifdef so we don't get a warning on ix86_tls_index being unused. Bootstrapped x86_64-linux, committed. * config/i386/i386.cc (ix86_tls_index): Add ifdef. diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index f28c92a9d3a..89f518c86b5 100644 --- a/gcc/config/

Improve maybe_hot handling in inliner heuristics

2025-05-03 Thread Jan Hubicka
Hi, the inliner currently applies different heuristics to hot and cold calls (the latter are inlined only if the code size will shrink). It may happen that the call itself is hot, but the significant time is spent in the callee and inlining makes it faster. For this reason we want to check if the anticip

Improve ix86 VEC_MERGE costs

2025-05-02 Thread Jan Hubicka
Hi, ix86_rtx_costs handles VEC_MERGE by special casing AVX512 mask operations and otherwise returning cost->sse_op, completely ignoring costs of the operands. Since VEC_MERGE is also used to represent the scalar variant of SSE/AVX operations, this means that many instructions (such as SSE conversions) are ofte

Re: Make ix86 cost of VEC_SELECT equivalent to SUBREG same as of SUBREG

2025-05-02 Thread Jan Hubicka
> target_insn_cost is used to prevent rpad optimization to be restored by > late_combine1, looks like it's not sufficient for size_cost. > > static int > ix86_insn_cost (rtx_insn *insn, bool speed) > { > int insn_cost = 0; > /* Add extra cost to avoid post_reload late

Re: [PATCH v2] Consider frequency in cost estimation when converting scalar to vector.

2025-04-29 Thread Jan Hubicka
> > so gain is the difference of runtime of integer variant compared to > > vector variant and cost are the extra int->sse and sse->int conversions > > needed? > > > > If you scale everything by a BB frequency, you will get a weird > > behaviour if chain happens to consist only of instructions in
