>
> This caused:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121045
I see, -fcompare-debug actually compares discriminators in the dump of the final
pass. Discriminators do not need to be the same if they are unused,
and they are consumed only by dwarf2out and by auto-profile, so I think
compare-debu
Hi,
this patch fixes an ICE building lto1 with autoprofiledbootstrap and in PR114790.
What happens is that auto-fdo speculatively devirtualizes to a wrong target.
This is due to a bug where it mixes up DWARF names and linkage names of inline
functions, which I need to fix as well.
Later we clone at WPA time.
Hello,
currently autoprofiled bootstrap produces auto-profiles for the cc1 and
cc1plus binaries. Those are used to build the respective frontend files.
For the backend, cc1plus.fda is used. This does not work well with LTO
bootstrap, where the cc1plus backend is untrained since it is used only for
parsing and eal
> So with this the discriminator we assign might depend on whether
> we have debug stmts or not. We output them only to debug info, so
> it should in principle not cause compare-debug issues, right? And
> we don't use discriminators to affect code generation (hopefully).
This is the reason of op
Hi,
this patch fixes several issues I noticed in gimple matching and the -Wauto-profile
warning. One problem is that we mismatched symbols with user names, such as
"*strlen" instead of "strlen". I added raw_symbol_name to strip the extra '*', which
is OK on ELF targets, which are the only targets we support wit
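The '*' convention can be shown in isolation. The sketch below is not GCC's raw_symbol_name; it assumes only that an assembler name starting with '*' is emitted verbatim, so the '*' never appears in the binary or in the profile. The helper names strip_verbatim_marker and profile_name_matches are hypothetical.

```c
#include <string.h>

/* Hypothetical helper: an assembler name starting with '*' is output
   verbatim, so the '*' itself is not part of the symbol that ends up
   in the binary and in the profile.  Strip it before comparing.  */
static const char *
strip_verbatim_marker (const char *asm_name)
{
  return asm_name[0] == '*' ? asm_name + 1 : asm_name;
}

/* Return nonzero if the symbol name known to the compiler matches the
   name recorded in the profile.  */
static int
profile_name_matches (const char *asm_name, const char *profile_name)
{
  return strcmp (strip_verbatim_marker (asm_name), profile_name) == 0;
}
```

Only a single leading '*' is stripped; any user label prefix that a non-ELF target would prepend is deliberately not modelled, in line with the ELF-only restriction.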
> The x86 add_stmt_hook relies on the passed vectype to determine
> the mode and whether it is FP for a scalar operation. This is
> unreliable now for stmts involving patterns and in the future when
> there is no vector type passed for scalar operations.
>
> To be least disruptive I've kept using
Hi,
to assign debug locations to corresponding statements, auto-fdo uses
discriminators. The documentation says that if a given statement belongs to multiple
basic blocks, the discriminator distinguishes them.
The current implementation, however, only works for statements that expand into a
sequence of gimple
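The documented rule (one discriminator per basic block that a source line ends up in) can be sketched on a toy statement list. Everything here is hypothetical: struct toy_stmt and assign_discriminators only model (bb, line) pairs and are not GCC's gimple or location_t machinery.

```c
#include <stddef.h>

/* A toy statement: the basic block it lives in and its source line.
   Illustrative only; GCC's real representation is different.  */
struct toy_stmt
{
  int bb;
  int line;
  int discriminator;  /* filled in by assign_discriminators */
};

/* Give every (line, bb) pair its own discriminator: the first block
   containing a given line uses 0, each further block containing the
   same line gets the next value.  Statements of one block that share a
   line share a discriminator.  Quadratic, which is fine for a sketch.  */
static void
assign_discriminators (struct toy_stmt *stmts, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      int next = 0;
      int found = 0;
      for (size_t j = 0; j < i; j++)
        {
          if (stmts[j].line != stmts[i].line)
            continue;
          if (stmts[j].bb == stmts[i].bb)
            {
              /* Same line already seen in this block: reuse it.  */
              stmts[i].discriminator = stmts[j].discriminator;
              found = 1;
              break;
            }
          /* Same line in another block: must pick a larger value.  */
          if (stmts[j].discriminator >= next)
            next = stmts[j].discriminator + 1;
        }
      if (!found)
        stmts[i].discriminator = next;
    }
}
```

For statements on line 10 in blocks 0, 1, 1 and 3, plus one statement on line 11 in block 2, this assigns 0, 1, 1, 2 to the line-10 statements and 0 to the line-11 one.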
> >
> > I tried to implement a workaround to match lost discriminator in cases
> > this is uniquely determined, but it is not so easy to do.
> > My plan is to figure out how to upstream it and then drop the lost
> > discriminator workaround from match.
> >
> > Do you see warnings with -Wauto-profi
> The following changes the percentage that determines how many
> stmts are allowed for backwards jump threading from 50 to 54,
> enabling the missed jump threading observed in PR109893.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu. It seems that
> at least backward threading is prone
>
> I am seeing ICEs in the offline pass.
>
>
> during IPA pass: afdo_offline
> gmsh/src/mesh/meshGEdge.cpp:979:1: internal compiler error: in
> set_call_location, at auto-profile.cc:433
I added location and call_location to the function instance; they are
originally set to UNKNOWN_LOCATION and la
> Hi Honza,
>
> > On 8 Jul 2025, at 2:26 am, Jan Hubicka wrote:
> >
> > Hi,
> > as discussed also on the autofdo pull request, LLVM solves the same
> > p
Hi,
initialize_cfun calls
profile_count::adjust_for_ipa_scaling (&num, &den);
but then the result is never used. This patch fixes it. Overall, scaling
of the entry/exit blocks is a bit sloppy in tree-inline. I will see if I can clean it up.
Bootstrapped/regtested x86_64-linux, committed.
* tree-in
Hi,
as discussed also on the autofdo pull request, LLVM solves the same
problem using -funique-internal-linkage-names
https://reviews.llvm.org/D73307
All non-public functions get their symbols renamed from
.__uniq.
A decimal suffix is used since demanglers special-case numerical suffixes.
In addition debug
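A minimal sketch of the naming scheme described above, assuming a 64-bit hash of some per-module identifier. FNV-1a is used here only as a simple stand-in hash, not necessarily what LLVM or GCC would use, and make_unique_internal_name is a hypothetical helper.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a over a NUL-terminated string.  Used here purely as a
   stand-in hash of a per-module identifier.  */
static uint64_t
fnv1a_64 (const char *s)
{
  uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
  for (; *s; s++)
    {
      h ^= (unsigned char) *s;
      h *= 1099511628211ULL;              /* FNV prime */
    }
  return h;
}

/* Hypothetical helper: write "<name>.__uniq.<hash>" into BUF, printing
   the hash in decimal since demanglers special-case numeric suffixes.
   Returns what snprintf returns (length without the NUL).  */
static int
make_unique_internal_name (char *buf, size_t size,
                           const char *name, const char *module_id)
{
  return snprintf (buf, size, "%s.__uniq.%" PRIu64,
                   name, fnv1a_64 (module_id));
}
```

The same static function name in two translation units with different module identifiers then gets two distinct, still demangler-friendly symbols.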
Hi,
there are two bugs in get_original_name. First, the for loop walking the
list of known suffixes uses sizeof (suffixes). It eventually walks to
an empty suffix. The second problem is that strcmp may accept suffixes that
are longer, i.e. mix up .isra with .israabc. This is probably not a
big deal bu
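Both fixes can be sketched together: iterate over the number of array elements rather than the number of bytes, and accept a suffix only when it is followed by '.' or the end of the name. The suffix list and the helper names are illustrative assumptions, not the actual contents of auto-profile.cc.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative suffix list; the real list may differ.  */
static const char *const known_suffixes[] = { ".isra", ".constprop", ".part" };

/* Element count, not sizeof bytes.  */
#define ARRAY_LEN(a) (sizeof (a) / sizeof ((a)[0]))

/* Return nonzero if NAME + POS starts with SUFFIX as a whole component,
   i.e. the suffix is followed by '.' or the end of the string.  This
   rejects ".israabc" when looking for ".isra".  */
static int
suffix_at (const char *name, size_t pos, const char *suffix)
{
  size_t len = strlen (suffix);
  if (strncmp (name + pos, suffix, len) != 0)
    return 0;
  char next = name[pos + len];
  return next == '\0' || next == '.';
}

/* Length of the part of NAME preceding the first known suffix, or
   strlen (NAME) if there is none.  */
static size_t
original_name_length (const char *name)
{
  for (size_t pos = 0; name[pos]; pos++)
    if (name[pos] == '.')
      for (size_t i = 0; i < ARRAY_LEN (known_suffixes); i++)
        if (suffix_at (name, pos, known_suffixes[i]))
          return pos;
  return strlen (name);
}
```

With this, "foo.isra.0" maps to "foo", while "foo.israabc" is left whole.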
Hi,
the main difference between normal profile feedback and auto-fdo is that with
profile feedback every basic block with non-zero profile has an incoming edge with
non-zero profile. With auto-profile it is possible that none of the predecessors was
sampled, and also the tool has a cutoff parameter which
Hi,
this fixes a stupid mistake of mine in the overflow check for sreal
multiplication. This was introduced this stage1, so unless we want to
backport the ipa-cp heuristics bugfixes, this does not need to go to the
release branches.
Regtested and bootstrapped x86_64-linux.
Honza
gcc/ChangeLog:
> The following adds a x86 tuning to enable the use of AVX512 masked
> epilogues in cases we heuristically determine it to be not detrimental
> by high chance. Basically problematic cases are when there are
> data streams that are both stored and loaded from and an outer loop
> could end up execut
Hi,
ipa-cp converts sreal times to int, while the point of sreal is to accommodate very
large values that can happen for loops with a large number of iterations and also
when the profile is inconsistent. This happens with afdo in the testsuite where the loop
preheader is estimated to have 0 executions while the loop body
Hi,
ipa-cp cloning disables itself for all functions not passing opt_for_fn
(node->decl, optimize_size), which disables it for cold wrappers of hot
functions where we want to propagate. Since we later want the time saved
to be considered hot, we do not need to perform this test early.
The patch also f
> On 02/07/25 07:26, Kugan Vivekanandarajah wrote:
> >
> >
> > >
> > > Given the latest few patches that you have committed, is this patch
> > > necessary
> > > anymore? I have not fully understood the new logic as I was on holiday
> > > last
> > > week, but it looks like the propagation is oc
> Hi Honza,
>
> On Sun, Jun 29, 2025 at 10:45 PM Jan Hubicka wrote:
> >
> > >
> > >
> > > > On 24 Jun 2025, at 7:43 pm, Jan Hubicka wrote:
> > > >
> > >
>
>
> > On 24 Jun 2025, at 7:43 pm, Jan Hubicka wrote:
> >
> > Hi,
> > this pass removes early-inlining from afdo pass since all inlining
> > should now happen from ear
Hi,
I have tested your patch on exchange2 and noticed multiple problems:
1) with LTO the translation from DWARF names to symbol names is disabled
since we free lang data sooner. I moved the offline pass upstream, which
however also may make us miss clones introduced between free lang dat
> Hi Honza,
> > So merging the profiles will also lead to inconsistencies making the
> > .part variant to seem more hot than it is...
>
> I am looking into this and will post the patch as a follow up patch.
Thanks. Note that now with merging being done recursively to inline
instances while offli
Hi,
>
> We can look into this. We do compare manually the IR dumps from both and it
> is not ideal.
> What we should do is an additional (optional) pass that runs after
> auto-profile to compare the annotations
> using the profile-use. We will have to filter out any functions/path that
> runs
>
>
> > On 24 Jun 2025, at 7:43 pm, Jan Hubicka wrote:
> >
> > Hi,
> > this pass removes early-inlining from afdo pass since all inlining
> > should now happen from ear
Hi,
This patch fixes some of the cases where we lose profile info because we do not
perform inlining that happened in the training run before AFDO annotation is done.
This is a common problem with LTO in the case where cross-module inlining happened.
I added an afdo_offline pass that does two things:
1) collect set
Hi,
this pass removes early inlining from the afdo pass since all inlining
should now happen in the early inliner. I tested this on SPEC and there
are 3 inlines happening here which are blocked at early-inline time by
hitting the large function growth limit. We probably want to bypass that
limit; I will lo
> > What seems to be common now is profile breakage around loops that have
> > been fully unrolled or vectorized, which is a bit understandable, though I
> > wonder if we can improve here. I think we can fix problem where profile
> > of loop header stmts is partly or fully lost (which seems to be mai
> Here is the v3 patch. It no longer uses "rep mov/stos". Lili, can you
> measure
> its performance impact on Intel and AMD cpus?
>
> The updated generic has
>
> Update memcpy and memset inline strategies for -mtune=generic:
>
> 1. Don't align memory.
This looks OK to me (recent microarchs s
> > That is why I checked for loc != UNKNOWN_LOCATION. I did not expect
> > UNKNOWN_LOCATION to have discriminators. What are they good for?
>
> I have no idea, this was simply a defensive review where it's no
> longer obvious that inlined_function_outer_scope_p would still work
> in all cases.
> >
> > With part suffixes we also may want to merge specially, since the
> > entry_count of the split part does not correspond to entry_count of the
> > original function.
> >
> > I wonder, does partitioned function work with the google tool? I
> > remember it had limitations in this respect.
>
Hi,
this patch adds -fauto-profile-inlining which can be used to control
the auto-profile directed inlining. The feature is quite interesting,
but it also may trigger unexpected code size growth or prevent useful
IPA inlining in the profiled binary.
Bootstrapped/regtested x86_64. Plan to commit it tom
> On Sun, 22 Jun 2025, Jan Hubicka wrote:
>
> > Hi,
> > auto-fdo is currently confused by a fact that all inlined functions get
> > locators with a 0 discriminator, so it is not able to distinguish multiple
> > inlined calls from single line.
> >
> > Discr
> Add a PROCESSOR_XXX comment to each entry in processor_cost_table to
> describe which processor the cost entry is applied to.
>
> * config/i386/i386-options.cc (processor_cost_table): Add a
> PROCESSOR_XXX comment to each entry.
>
>
> --
> H.J.
> From 8b37db60ec21c1c673eb1e336208dc10a5d86d5c
Hi,
This patch adds the GUESSED_GLOBAL0_AFDO profile quality. It can
be used to preserve local counts of functions which have a 0 AFDO
profile.
I originally did not include it as it was not clear it would be useful, and
it turns quality from 3 bits to 4 bits, which means that we need to steal another
bit fro
Hi,
This patch fixes problems I noticed by exploring profiles of some hot
functions in GCC. In particular, the propagation sometimes changed
precise 0 to afdo 0 for paths calling abort, and sometimes we could
propagate more when we accept that some paths have 0 count.
Finally, there was an important bug
> This contradicts
>
> /* X86_TUNE_READ_MODIFY_WRITE: Enable use of read modify write instructions
>    such as "add $1, mem". */
> DEF_TUNE (X86_TUNE_READ_MODIFY_WRITE, "read_modify_write",
> ~(m_PENT | m_LAKEMONT))
>
> which enables "andl $0, (%edx)" for PentiumPro. "andl $0, (%edx
>
> Since read-modify-write is enabled for PentiumPro:
>
> /* X86_TUNE_READ_MODIFY_WRITE: Enable use of read modify write instructions
>    such as "add $1, mem". */
> DEF_TUNE (X86_TUNE_READ_MODIFY_WRITE, "read_modify_write",
> ~(m_PENT | m_LAKEMONT))
>
> should this
>
> /* Generate
> Since there is
>
> /* X86_TUNE_SPLIT_LONG_MOVES: Avoid instructions moving immediates
>    directly to memory. */
> DEF_TUNE (X86_TUNE_SPLIT_LONG_MOVES, "split_long_moves", m_PPRO)
If I recall correctly, this tune was added for PentiumPro, which had a
problem decoding moves with long immediate an
Hi,
auto-fdo is currently confused by the fact that all inlined functions get
locators with a 0 discriminator, so it is not able to distinguish multiple
inlined calls from a single line.
The discriminator is lost by calling LOCATION_LOCUS before copying it from
the former call statement. I believe this is only
Hi,
This is the last part of the infrastructure to allow functions with
local profiles and 0 global autofdo counts.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
* auto-profile.cc (afdo_set_bb_count): Dump inline stacks
and reasons when lookup failed.
(afd
> In addition to working with you on the issues of profile being lost with
> LTO, cloning and other cases, my plan is to
> 1) finish the VPT reorganization
> 2) make AFD reader to scale up the profile since at least in data from
> SPEC or profiledbootstrap the counters are quite small integers w
Hi,
when splitting functions, tree-inline correctly determined the entry count of the
new function part, but then, in case the entry block of the new function part is in a
loop, it scales the body, which is not supposed to happen.
Bootstrapped/regtested x86_64-linux, committed.
* tree-inline.cc (copy_cfg_body
Hi,
this patch makes AFDO's VPT happen during early inlining. This should
make the einline pass inside the afdo pass unnecessary, but some inlining still
happens there; I will need to debug why that happens and will try to drop the
afdo's inliner incrementally.
get_inline_stack_in_node can now
> In an internal application I noticed that the ipa-inliner is quite
> sensitive to AFDO counts and that seems to make the performance worse.
> Did you notice this? This was before some of your changes. I will try
> again.
The cases I looked into were a mixture of late inlining and ipa-cp cloning
be
Hi,
this patch moves afdo inlining from the early inliner into a specialized one.
The reason is that the early inliner is by design non-recursive while the afdo
inliner needs to recurse. In the past Google handled it by increasing
early inliner iterations, but it can be done easily and cheaply without
it by sim
>
> IMO doing this in a loop would have to handle all the above cases and would
> make it hard to read. Also, we would have two levels for now. Even if this
> changes in the future, this is not going to be too long.
>
> Here is the revised patch,
>
> Is this OK for mainline.
> gcc/ChangeLog:
>
Hi,
this patch makes afdo_adjust_guessed_profile more aggressive in finding scales
on the boundaries of connected components with no annotation. Originally I
looked for edges into or out of the component with known AFDO counts, and I also
handled edges from basic blocks with known AFDO count and known s
Hi,
while working on auto-FDO I noticed that we may run into an ICE because we inline a
function with count profile_count::zero to a call site with profile_count::zero.
What may go wrong is that the caller has a local profile while the callee may have
IPA profiles.
We used to turn all such counts to 0, but t
> From: Dhruv Chawla
>
> This patch modifies afdo_set_bb_count to propagate profile information
> to outline copies of functions if they are not inlined. This information
> gets lost otherwise.
>
> Signed-off-by: Dhruv Chawla
>
> gcc/ChangeLog:
>
> * gcc/auto-profile.cc (count_info): Ad
> Another problem here is that get_inline_stack returns an empty stack if
> no inlining occurred in the corresponding GIMPLE statement. So if an
> inline callsite does exist in the profile at the current GIMPLE
> statement but no inlining actually occurs during auto-profile, the
> information is ju
> > I don't think you kept this logic in the new code?
I really apologize for the late reply. I missed that you were waiting for it.
>
> To be honest, I didn't really follow the logic here. Thinking about the
> single-exit case (which the current code is designed to handle), both
> the body of the if and the
> gcc/ChangeLog:
> * auto-profile.cc (AUTO_PROFILE_VERSION): Bump from 2 to 3.
> (string_table::get_real_name): Define new member function.
> (string_table::get_file_name): Likewise.
> (string_table::get_file_name_idx): Likewise.
> (string_table::real_names_): Define n
> On Mon, Jun 16, 2025 at 05:49:19PM +0200, Jan Hubicka wrote:
> > > On Wed, Apr 30, 2025 at 08:56:57AM +0200, Jakub Jelinek wrote:
> > > > On Mon, Apr 28, 2025 at 07:27:31PM +0200, Josef Melcr wrote:
> > > > > As for the attribute, I am honestly not too sur
Hi,
> Introduction
>
>
> Per PR120229 (gcc.gnu.org/PR120229), the auto-profile pass cannot distinguish
> profile information for `function_instance's with the same base name, when
> suffixes are removed. To fix this, source file names should be tracked in the
> GCOV file information t
> On Wed, Apr 30, 2025 at 08:56:57AM +0200, Jakub Jelinek wrote:
> > On Mon, Apr 28, 2025 at 07:27:31PM +0200, Josef Melcr wrote:
> > > As for the attribute, I am honestly not too sure about what to do, as
> > > clang
> > > is
> > > not consistent in with its own indexing, be it with the unknown v
Hi,
Currently afdo reads the profile and annotates basic blocks containing
statements which have samples in the profile data. For basic blocks which have been
fully optimized out (for example, basic blocks controlling loops that have been
fully unrolled) it has no data, which it then tries to determine in
>
> Perhaps someone is interested in the following thread from LKML:
>
> "[PATCH v2] x86: prevent gcc from emitting rep movsq/stosq for inlined ops"
>
> https://lore.kernel.org/lkml/20250605164733.737543-1-mjgu...@gmail.com/
>
> There are several PRs regarding memcpy/memset linked from the abov
> On 13/06/25 14:51, Jan Hubicka wrote:
> >
> > > From: Dhruv Chawla
> > Hi,
> > >
> > > For reasons explained in the patch, this patch prevents the loss of
> > >
> From: Dhruv Chawla
Hi,
>
> For reasons explained in the patch, this patch prevents the loss of profile
> information when inlining occurs in the profiled binary but not in the
> auto-profile pass as a decision. As an example, for this code:
I was wondering about this problem too
> - Annotation
> Hi,
>
> > On 4 Jun 2025, at 9:53 pm, Jan Hubicka wrote:
> >
> >
> >> This patch introduces a new testcase to verify the merging of profiles
> >> is performed for cloned
OK,
thanks!
Honza
> On 2025-06-06 12:42, Jan Hubicka wrote:
> > > Hi,
> > > also after fixing this issue my bootstrap fails with:
> > >
> > > Permission error mapping pages.
> > > Consider increasing /proc/sys/kernel/perf_event_mlock_kb,
> > > or try agai
> Hi,
> also after fixing this issue my bootstrap fails with:
>
> Permission error mapping pages.
> Consider increasing /proc/sys/kernel/perf_event_mlock_kb,
> or try again with a smaller value of -m/--mmap_pages.
> (current value: 4294967295,0)
> Permission error mapping pages.
> Consider increa
This patch fixes an ICE seen when building spec2k17 with autofdo and an enable-checking
compiler. Because we special-case 0 of autofdo to be kind of 1 in IPA
scaling, we can now end up with a function having a global0 profile but producing
an inline clone with a nonzero profile.
I think the correct way is to extend
Hi,
also after fixing this issue my bootstrap fails with:
Permission error mapping pages.
Consider increasing /proc/sys/kernel/perf_event_mlock_kb,
or try again with a smaller value of -m/--mmap_pages.
(current value: 4294967295,0)
Permission error mapping pages.
Consider increasing /proc/sys/ker
Hi,
New auto-profile merging dumps made me notice that we read the afdo
data when we are in LTO. This is not necessary since the profile is read
at compile time and streamed to LTO bytecode.
gcc/ChangeLog:
* coverage.cc (coverage_init): Return early when in LTO.
diff --git a/gcc/coverage.cc
> Should I go with:
>
> +autofdo_target
>
> +autofdo_target="i386"
> +case "${target}" in
> + aarch64-*-*)
> +autofdo_target="aarch64"
> +;;
> +esac
>
> As in the first version? I can test and send a patch for review if there is
> no other better alternative.
This looks OK - I can n
> Hi Honza,
>
> > On 26 May 2025, at 5:28 pm, Jan Hubicka wrote:
> >
> >
> > Hi,
> >> Ping?
> > Sorry for the delay. I think I finally got auto-fdo running on my box
> Kugan Vivekanandarajah writes:
> > Add support for autoprofiledbootstrap in aarch64.
> > This is similar to what is done for i386. Added
> > gcc/config/aarch64/gcc-auto-profile for aarch64 profile
> > creation.
> >
> > How to run:
> > configure --with-build-config=bootstrap-lto
> > make autoprof
> This patch introduces a new testcase to verify the merging of profiles
> is performed for cloned functions.
>
> Since this is invoked very early, before the pass manager, we need to
> set up the dumping explicitly. This is similar to the handling in
> finish_optimization_passes.
>
> gcc/ChangeL
> On Fri, May 30, 2025 at 11:30 AM Jan Hubicka wrote:
> >
> > Hi,
> > > >
> > > > Hi,
> > > >
> > > > the attached Ada testcase compiled with -O2 -gnatn makes the compiler
> > > > crash in
> > > > vect_ca
Hi,
> >
> > Hi,
> >
> > the attached Ada testcase compiled with -O2 -gnatn makes the compiler crash
> > in
> > vect_can_force_dr_alignment_p during SLP vectorization:
> >
> > if (decl_in_symtab_p (decl)
> > && !symtab_node::get (decl)->can_increase_alignment_p ())
> > return false;
> >
> Hi,
>
> in GCC 15 we allowed jump-function generation code to skip over a
> type-cast converting one integer to another as long as the latter can
> hold all the values of the former or has at least the same precision.
> This works well for IPA-CP where we do then evaluate each jump
> function as
>
> However I do not quite follow the old or new logic here.
> So if I have only one unknown edge out (or in) from BB and I know
> its count, I can determine the count of that edge by Kirchhoff's law.
>
> But then the old code computes number of edges out of the BB
> and if it is only one it updates the
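The Kirchhoff step quoted above is small enough to sketch. The representation (a flat array of edge counts with -1 meaning unknown) and the helper infer_single_unknown_edge are hypothetical; clamping a negative remainder to 0 is a design choice for inconsistent sampled profiles, not something taken from auto-profile.cc.

```c
#include <stddef.h>
#include <stdint.h>

#define COUNT_UNKNOWN (-1)

/* Flow conservation: BB_COUNT equals the sum of the counts of its
   outgoing (or incoming) edges.  If exactly one edge count is unknown,
   fill it in and return its index; otherwise return -1 and leave the
   array untouched.  A negative remainder, possible with inconsistent
   sampled profiles, is clamped to 0.  */
static long
infer_single_unknown_edge (int64_t bb_count, int64_t *edge_counts, size_t n)
{
  long unknown = -1;
  int64_t known_sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (edge_counts[i] == COUNT_UNKNOWN)
        {
          if (unknown != -1)
            return -1;  /* more than one unknown edge */
          unknown = (long) i;
        }
      else
        known_sum += edge_counts[i];
    }
  if (unknown != -1)
    {
      int64_t rest = bb_count - known_sum;
      edge_counts[unknown] = rest > 0 ? rest : 0;
    }
  return unknown;
}
```

For a block with count 100 and edges 30, unknown and 20, the unknown edge gets 50.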
> diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
> index 7e0e8c66124..8a317d85277 100644
> --- a/gcc/auto-profile.cc
> +++ b/gcc/auto-profile.cc
> @@ -1129,6 +1129,26 @@ afdo_set_bb_count (basic_block bb, const stmt_set &promoted)
>gimple *stmt = gsi_stmt (gsi);
>if (gimp
> I also noticed that some tests are only enabled for x86. I am also seeing:
> ./gcc/testsuite/gcc/gcc.sum:UNSUPPORTED: gcc.dg/tree-prof/pr66295.c
This is testing a former ifun bug which reproduced with -fprofile-use
> ./gcc/testsuite/gcc/gcc.sum:UNSUPPORTED: gcc.dg/tree-prof/split-1.c
This is test
> Hi,
> autofdo tests are now running only for x86. This patch makes it
> run for aarch64 too. Verified that perf and create_gcov are running
> as expected.
>
> gcc/ChangeLog:
>
> * config/aarch64/gcc-auto-profile: Make script executable.
>
> gcc/testsuite/ChangeLog:
>
> * lib/t
Hi,
since uses of addss for purposes other than modelling FP addition/subtraction
should be gone now, this patch sets the addss cost back to 2.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
PR target/119298
* config/i386/x86-tune-costs.h (struct processor_costs): Set
Hi,
with normal profile feedback, checking that the entry block count is non-zero is quite a
reliable check for the presence of a non-0 profile in the body, since the function
body can only be executed if the entry block was executed. With autofdo this
is not true, since the entry block may just execute too few t
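A minimal sketch of a check that does not rely on the entry block: look for any block with a non-zero count. The flat bb_counts array and profile_has_nonzero_counts are hypothetical stand-ins for GCC's CFG and profile_count types.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* With instrumented feedback a zero entry count implies a zero body;
   with sampled profiles the entry block may simply have missed the
   samples, so scan every block instead.  */
static bool
profile_has_nonzero_counts (const uint64_t *bb_counts, size_t n_bbs)
{
  for (size_t i = 0; i < n_bbs; i++)
    if (bb_counts[i] != 0)
      return true;
  return false;
}
```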
Hi,
This patch makes auto-fdo more careful about keeping the info we have
from static profile prediction.
If all counters in a function are 0, we can keep the original auto-fdo profile.
Having an all-0 profile is not very useful, especially because 0 in autofdo is not
very informative and the code still may hav
> gcc/ChangeLog:
>
> * config/i386/i386-expand.cc (emit_reduc_half): Use shuffles to
> generate reduc half for V4SI, similar modes.
> * config/i386/i386.h (TARGET_SSE_REDUCTION_PREFER_PSHUF): New Macro.
> * config/i386/x86-tune.def (X86_TUNE_SSE_REDUCTION_PREFER_PSHUF):
>
Hi,
this code to track which locations were used when reading the auto-fdo profile
seems to have been dead since the initial commit, so I removed it.
Committed as obvious.
Honza
gcc/ChangeLog:
* auto-profile.cc (function_instance::mark_annotated): Remove.
(function_instance::total_annotated_count): Re
>
>
> > On 26 May 2025, at 5:34 pm, Jan Hubicka wrote:
> >
> >
> > Hi,
> > also, please, can you add a testcase? We should have some coverage for
> > auto-fdo specific is
Hi,
also, please, can you add a testcase? We should have some coverage for
auto-fdo specific issues.
Honza
0002-AUTOFDO-Merge-profiles-of-clones-before-annotating.patch
Hi,
> Ping?
Sorry for the delay. I think I finally got auto-fdo running on my box,
and indeed I see that if a function is cloned later, the profile is lost.
There are .suffixes added before the afdo pass (such as OpenMP offloading or
nested functions) and there are .suffixes added after afdo (by ipa
clonin
> > On 9 May 2025, at 11:55 am, Kugan Vivekanandarajah
> > wrote:
> >
> > ipa-split is not now run for auto-profile. IMO this was an oversight.
> > This patch enables it similar to PGO runs.
> >
> > gcc/ChangeLog:
> >
> >* ipa-split.cc pass_feedback_split_functions::clone (): New.
> >
> Hi,
>
> starting with GCC 15 the order is not unique for any symtab_nodes but
> m_uid is, I believe we ought to dump the latter in the ipa-clones dump,
> if only so that people can reliably match entries about new clones to
> those about removed nodes (if any).
>
> Bootstrapped and tested on x8
> With the avx512_two_epilogues tuning enabled for zen4 and zen5
> the gcc.target/i386/vect-epilogues-5.c testcase below regresses
> and ends up using AVX2 sized vectors for the masked epilogue
> rather than AVX512 sized vectors. The following patch rectifies
> this and adds coverage for the inten
> Thanks for the review.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> On some benchmarks, I noticed stv failed because it was deemed unprofitable, but the igain
> is inside the loop while the sse<->integer conversion is outside the loop; the current
> cost model doesn't consider the
> > gcc/ChangeLog:
> >
> > * config/i386/i386-features.cc
> > (scalar_chain::mark_dual_mode_def): Weight
> > n_integer_to_sse/n_sse_to_integer with bb frequency.
> > (general_scalar_chain::compute_convert_gain): Ditto, and
> > adjust function prototype to ret
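The idea in the ChangeLog above, weighting both the per-instruction gain and the conversion costs by block frequency, can be illustrated with plain integers. struct weighted_cost and weighted_chain_gain are hypothetical; the real code works on scalar_chain and GCC's own profile types.

```c
#include <stddef.h>

/* Hypothetical record for one instruction or one int<->SSE conversion:
   its cost delta and the relative execution frequency of its block.  */
struct weighted_cost
{
  int cost;
  int freq;
};

/* Weighted gain of converting a chain: each instruction's gain and
   each conversion's cost are scaled by the frequency of the block
   they sit in, instead of being summed unweighted.  */
static long
weighted_chain_gain (const struct weighted_cost *insn_gains, size_t n_insns,
                     const struct weighted_cost *conversions, size_t n_conv)
{
  long gain = 0;
  for (size_t i = 0; i < n_insns; i++)
    gain += (long) insn_gains[i].cost * insn_gains[i].freq;
  for (size_t i = 0; i < n_conv; i++)
    gain -= (long) conversions[i].cost * conversions[i].freq;
  return gain;
}
```

With a gain of 1 in a block of frequency 100 and two conversions of cost 4 in a block of frequency 1, the weighted gain is 100 - 8 = 92, while the unweighted sum 1 - 8 = -7 would reject the chain.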
> > Instructions with latency info are the ones that are really different.
> > So the unconverted code has sum of latencies 4 and real latency 3.
> > Converted code has sum of latencies 4 and real latency 3
> > (vmod+vpmaxsd+vmov).
> > So I do not quite see why it should be a win.
>
> Note this was historically d
Hi,
this patch fixes some of the problems with costing in the scalar-to-vector pass.
In particular:
1) the pass uses optimize_insn_for_size, which is intended to be used by
expanders and splitters and requires the optimization pass to use
set_rtl_profile (bb) for the currently processed bb.
This is n
Hi,
This patch adds pattern matching for float<->int conversions, both as normal
statements and as promote_demote. While updating promote_demote I noticed that
in cleanups I turned "stmt_cost =" into "int stmt_cost = ", which turned
the existing FP costing into a NOOP. I also added a comment on how demotes a
Hi,
this patch adds an #ifdef so we don't get a warning about ix86_tls_index being
unused.
Bootstrapped x86_64-linux, committed.
* config/i386/i386.cc (ix86_tls_index): Add ifdef.
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index f28c92a9d3a..89f518c86b5 100644
--- a/gcc/config/
Hi,
The inliner currently applies different heuristics to hot and cold calls (the
latter are inlined only if the code size will shrink). It may happen that the
call itself is hot, but significant time is spent in the callee and inlining
makes it faster. For this reason we want to check if the anticip
Hi,
ix86_rtx_costs costs VEC_MERGE by special-casing AVX512 mask operations and otherwise
returning cost->sse_op, completely ignoring the costs of the operands. Since
VEC_MERGE is also used to represent the scalar variant of an SSE/AVX operation, this
means that many instructions (such as SSE conversions) are ofte
> target_insn_cost is used to prevent rpad optimization to be restored by
> late_combine1, looks like it's not sufficient for size_cost.
>
> static int
> ix86_insn_cost (rtx_insn *insn, bool speed)
> {
>   int insn_cost = 0;
>   /* Add extra cost to avoid post_reload late
> > so gain is the difference of runtime of integer variant compared to
> > vector variant and cost are the extra int->sse and sse->int conversions
> > needed?
> >
> > If you scale everything by a BB frequency, you will get a weird
> > behaviour if chain happens to consist only of instructions in