> > So I think all we can hope for is merging memcpy with the extra write of 0.
>
> That's not actually clear.
>
> It would be reasonable to assume that foo isn't likely to change the string
> and have the inlined destructor for a string that was initialized as a short
> string like here do somet
> different issue from the one that is raised in the PR. (Unless we think that
> -O2 and -O3 should always have the same inlining heuristics henceforward, but
> that seems unlikely.)
Yes, I think the point of -O3 is to let the compiler be more aggressive than
what seems desirable for your average dist
> Is the option supposed to be only about the standard global scope operator
> new/delete (_Znam etc.) or also user operator new/delete class methods? If
> the
> former, then I agree it is a global property (or at least a per shared
> library/binary property, one can arrange stuff with symbol vis
This patch attempts to add __builtin_operator_new/delete. So far they
are not optimized, which will need to be done by an extra flag of BUILT_IN_
code. Also, the decl.cc code can be refactored to be less cut&paste,
and I guess the has_builtin hack to return the proper value needs to be moved
to the C++ FE.
How
There is still a problem with loop bounds. I am testing a patch for that, and
then we should (finally) be safe.
> Note GCC has not retuned its -Os heuristics for a long time because it has been
> decent enough for most folks and corner cases like this almost never come
> up.
There were quite a few changes to -Os heuristics :)
One of the bigger challenges is that we do see more and more C++ code built
with -Os wh
Looking at the prototype patch, why do the splitters also need to change?
My original goal was to use splitters to expand to faster code sequences
while having patterns necessary for both variants. This makes it
possible to use optimize_insn_for_size/speed and make decisions using BB
profile, since
> > I guess PTA gets around by tracking points-to set also for non-pointer
> > types and consequently it also gives up on any such addition.
>
> It does. But note it does _not_ for POINTER_PLUS where it treats
> the offset operand as non-pointer.
>
> > I think it is ipa-prop.c::unadjusted_ptr_an
> Confirm. But option save/restore has been always implemented:
>
> .section .gnu.lto_.opts,"",@progbits
> .ascii "'-fno-openmp' '-fno-openacc' '-fno-pie' '-fcf-protection"
> .ascii "=none' '-mabi=lp64d' '-march=loongarch64' '-mfpu=64' '-m"
> .ascii "simd=lasx' '-mcmodel=nor
> But adds a return with a value. And then the inliner inlines foo into foo2 but
> we still have the return with a value around ...
I guess ICF can special-case an unused return value, but why is this not
taken care of by ipa-sra?
> > If I comment it out as above patch, then O3/PGO can get 16% and 12%
> > performance
> > improvement compared to O2 on x86.
> >
> >                 O2             O3             PGO
> > cycles          2,497,674,824  2,104,993,224  2,199,753,593
> > instructions1
> This heuristic wants to catch
>
>
> if (foo) abort ();
>
>
> and avoid sinking "too far" across a path with "similar enough"
> execution count (I think the original motivation was to fix some
> spilling / register pressure issue). The loop depth test
> should be !(bb_loop_depth (best_b
> I suspect this is most likely the profile updates changes ...
Quite possibly. The goal of this exercise is to figure out if there are
some bugs in the profile estimate, whether passes somehow prefer a broken
profile, or if it is just bad luck.
Looking at sphinx and fatigue it seems that LRA really
>
> why disallow caller->indirect_calls?
See testcase in comment #9
>
> > + return false;
> > + for (cgraph_edge *e2 = callee->callees; e2; e2 = e2->next_callee)
>
> I don't think this flies - it looks quadratic. Can we compute this
> in the inline summary once instead?
I guess I can
Just so it is recorded somewhere, here is a testcase showing that we can't
inline leaf functions into always_inlines unless we do some tracking of what
calls were formerly indirect calls.
We really overloaded always_inline from the original semantics "drop
inlining heuristics" into "be sure that result is inlined" w
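
The testcase itself is not reproduced in this snippet. Purely as an illustration of the kind of call under discussion (this is not the testcase from the PR, and the names leaf, apply and caller are made up), here is a minimal sketch of a call that is indirect as written and can only become direct once the always_inline wrapper has been inlined and the function pointer propagated:

```c
/* Hypothetical leaf function marked always_inline.  */
static inline __attribute__ ((always_inline)) int
leaf (int x)
{
  return x + 1;
}

/* Hypothetical always_inline wrapper.  As written, the call through fn
   is an indirect call.  */
static inline __attribute__ ((always_inline)) int
apply (int (*fn) (int), int x)
{
  return fn (x);
}

/* After apply is inlined here, fn is known to be leaf, so the formerly
   indirect call can be turned into a direct call to an always_inline
   function.  */
int
caller (int x)
{
  return apply (leaf, x);
}
```

Whether the inner call ends up inlined depends on the optimization level and on the order in which the inliner sees the edges, which is presumably where tracking formerly indirect calls comes in.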
>
> There is no guarantee that std::vector::max_size() is PTRDIFF_MAX. It
> depends on the Allocator type, A. A user-defined allocator could have
> max_size() == 100.
If the inliner sees a path to the throw functions, it will not determine
_M_check_len as early inlinable.
Perhaps we can __builtin_con
> > Indeed it is quite a long-standing problem with clang not building with lifetime
> > DSE and strict aliasing. I wonder why this is not fixed on clang side?
>
> Because the problems were not communicated? I knew that Firefox needed
> -flifetime-dse=1, but it's the first time I hear that any such pro
>
> Do you mean we should fix modeling of divisions there as well? I don't have
> latency/throughput measurements for those CPUs, nor access so I can run
> experiments myself, unfortunately.
>
> I guess you mean just making a patch to model division units separately,
> leaving latency/throughput
> > For this one it's PRE hoisting *b across the endless loop (PRE handles
> > calls as possibly not returning but not loops as possibly not
> > terminating...)
> > So it's a different bug.
>
> Btw, C++ requiring forward progress makes the testcase undefined.
In my understanding access to volatil
> > My guess is that the
> > BUILD_BUG();
> > line is the sole thing that is wrong, it should be just break;
> > as the memory_is_poisoned_n(addr, size); will handle all the sizes,
> > regardless if they are constants or not.
>
> Sure, I'm going to suggest such a change.
To me it looked like a pro
> To me, all of these do the same thing and should generate the same code.
> As nobody else can see removeme, and we aren't leaking its address, shouldn't
> the compiler be able to deduce that all accesses to removeme are
> inconsequential and can be removed?
>
> My gcc 11.3 generates a condition
> I would say so. It saves code size and also uop space unless the two
> can magically fuse to a immediate to %xmm move (I doubt that).
I made a simple benchmark
double a=10;
int
main()
{
long int i;
double sum,val1,val2,val3,val4;
for (i=0;i<10;i++)
{
#if
> > According to znver2_cost
> >
> > Cost of sse_to_integer is a little bit less than fp_store, maybe increasing
> > sse_to_integer cost (to more than fp_store) can help RA choose memory
> > instead of GPR.
>
> That sounds reasonable - GPR<->xmm is cheaper than GPR -> stack -> xmm
> but GPR<->xmm s
> > bool
> Since the pass issues a bunch other warnings (e.g., -Wstringop-overflow,
> -Wuse-after-free, etc.) the gate doesn't seem right. But since #pragma GCC
> diagnostic can re-enable warnings disabled by -w (or turn them into errors)
> any
> gate that considers the global option setting will
So I assume that this is due to the new pass_waccess which was added into
early optimizations. I think this is not really an ipa component but
tree-optimization.
> So nothing to see? I guess our unit growth limit doesn't trigger because it's
> a small (benchmark) unit?
Yep, unit growth limits do not apply to very small units. The ipa-cp heuristics
still IMO need work and should be based on relative speedups rather than
absolute ones for the cutoffs.
>
> Sure - I just remember (falsely?) that we finally decided to do it :)
I do not recall this, but I may have forgotten :))
> If we don't run IPA inline we don't figure we failed to inline the
> always_inline either ;) And IPA inline can expose more indirect
> always-inlines we only discover a
> You can not disable an IPA pass because then we will mishandle
> optimize attributes. I think you simply want to set
>
> flag_inline_small_functions = 0
> flag_inline_functions_called_once = 0
Actually I forgot, we have flag_no_inline which makes
tree_inlinable_function_p to return false for
> --- Comment #6 from Richard Biener ---
> Honza, -Og was supposed to not do so much work, I intended to disable IPA
> inlining but there's no knob for that. I wonder where to best put such
> guard? I set flag_inline_small_functions to zero for -Og but we still
> run inline_small_functions ().
On zen2 and zen3 with -flto the speedup seems to be circa 12% for both -O2
and -Ofast -march=native, which is very nice!
Zen1 for some reason sees less improvement, about 6%.
With PGO it is 3.8%.
Overall it seems a win, but there are a few noteworthy issues.
I also see a 6.69% regression on x64 with
>
> Well, I'm specifically speaking about:
> error: the control flow of function ‘BZ2_compressBlock’ does not match its
> profile data (counter ‘arcs’)
>
> this type of error should not happen even in multi-threaded programs.
There are some cases where I see even those on clang build - I am
The patch passed testing on x86_64-linux.
This is a slightly modified patch I am testing. I added pre-computation of the
number of accesses, enabled the path for const functions (in case they
have a memory operand), initialized alias sets and clarified the logic
around every_* and global_memory_accesses
PR tree-optimization/103168
> (The -fno-semantic-interposition thing is probably the biggest performance gap
> between gcc -fpic and clang -fpic.)
Yep, it is often confusing to users (who do not understand what ELF
interposition is) that clang and gcc disagree on default flags here.
Recently -Ofast was extended to imply -fno-
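
For readers unfamiliar with ELF interposition, here is a minimal sketch of what the default-flag difference is about. The function names are hypothetical, and the comments describe the behaviour expected when the file is built into a shared library with -fpic:

```c
/* Default visibility, non-static: in a shared library built with -fpic
   the dynamic linker may bind calls to a different definition of
   get_value (for example one supplied via LD_PRELOAD).  */
int
get_value (void)
{
  return 42;
}

/* While semantic interposition is honoured, the compiler has to assume
   the call below may reach another get_value, so it should not inline
   it or rely on its body.  With -fno-semantic-interposition it may.  */
int
use_value (void)
{
  return get_value () + 1;
}

/* Hidden visibility: the symbol cannot be interposed from outside the
   library, so the call can be inlined regardless of the flag.  */
__attribute__ ((visibility ("hidden"))) int
get_value_hidden (void)
{
  return 42;
}

int
use_value_hidden (void)
{
  return get_value_hidden () + 1;
}
```

Comparing the assembly of use_value built with and without -fno-semantic-interposition (at -O2 -fpic) is one way to see the difference.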
Needs -O2 -floop-unroll-and-jam --param early-inlining-insns=14
to fail, so I guess it may be an issue with unroll-and-jam.
> @@ -1,4 +1,3 @@
> -static int
> __attribute__ ((noinline,const))
> infinite (int p)
> {
Just for the record, it crashes with or without static int here for me :)
I ran across it because the code tracking must-access in ipa-sra is IMO
conceptually wrong. I noticed that because ipa-modref solves
Aha, but here is a better example (reproduces the same way).
In the former one I forgot the const attribute, which makes it invalid.
The testcase tests that ipa-sra is missing an ECF_LOOPING_CONST_OR_PURE
check
static int
__attribute__ ((noinline))
infinite (int p)
{
if (p)
while (1);
return p;
}
__attr
Works for me even with the 3 warnings.
hubicka@lomikamen:/aux/hubicka/trunk/build-lto2/gcc$ cat >tt.c
__attribute__ ((noinline,const))
infinite (int p)
{
if (p)
while (1);
return p;
}
__attribute__ ((noinline))
static void
test(int p, int *a)
{
int v = infinite (p);
if (*a && v)
__
> [659] %
> [659] % gcctk -O0 -w small.c
> [660] %
> [660] % gcctk -O1 -w small.c
> [661] % gcctk -O1 -w small.c
> [662] % gcctk -O1 -w small.c
> gcctk: internal compiler error: Segmentation fault signal terminated program
> cc1
> Please submit a full bug report,
> with preprocessed source if app
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103230
>
> --- Comment #2 from Martin Liška ---
> > How do you build ubsan compiler?
>
> F="-O0 -g -fsanitize=undefined" ; make -j16 all-host -k CFLAGS="$F"
> CXXFLAGS="$F" LDFLAGS="$F"
>
> is the fastest approach.
Thanks, it is similar to what I
> Happens with UBSAN compiler for:
>
> $ gcc gcc/testsuite/gcc.c-torture/execute/pr71494.c -O1 -flto
> ...
> /home/marxin/Programming/gcc/gcc/ipa-modref-tree.h:550:33: runtime error: load
> of value 255, which is not a valid value for type 'bool'
> #0 0x18acc38 in modref_tree::merge(modref_tr
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103211
>
> --- Comment #2 from Martin Liška ---
> Optimized dump differs for couple of functions in the same way:
>
> diff -u good bad
> --- good 2021-11-12 17:42:36.995947103 +0100
> +++ bad 2021-11-12 17:41:56.728194961 +0100
> @@ -38,7 +38
The sanity check verifies that functions accessing a parameter indirectly
also read the parameter (otherwise the indirect reference can not
happen). This patch moves the check earlier and removes some overactive
flag cleaning on function call boundaries which introduces the nonsensical
situation. I g
Note that it still seems to me that the crossed_loop_header handling is
overly conservative. We have:
@@ -2771,6 +2771,7 @@ jt_path_registry::cancel_invalid_paths
(vec &path)
bool seen_latch = false;
int loops_crossed = 0;
bool crossed_latch = false;
+ bool crossed_loop_header = false;
>
> This PR is still open, at least for slowdown in the threader with LTO. The
> issue is ranger wide, so it may also cause slowdowns on non-LTO builds for
> WRF, though I haven't checked.
I just wanted to record the fact somewhere since I was looking up the
revision range mostly to figure out i
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102943
>
> Aldy Hernandez changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>          Depends on|                            |103058
>
> --- Comment #
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103040
>
> --- Comment #15 from Iain Buclaw ---
> Got it. The difference between D and C++ is a matter of early inlining.
>
> The C++ example Jakub posted fails in the same way that D does if you compile
> with: -O1 -fno-inline
Great, I will take a
> See above comments from Iain, even if that pre-initialization is removed it is
> still miscompiled. And, the testcase fails not because of the padding bits
> not
> being zero, but because the address of self stored into one of the fields
> isn't
> there or modref thinks it can't be changed or
> Not seen on Haswell (but w/o PGO). Is this PGO specific? There's another
> large jump visible end of 2019.
It is between 2019-11-15 and 18 but the revisions do not exist in git
- perhaps they refer to the old git mirror. Martin will know better.
In that range there are many of Richard's vec
>
> fixup_cfg already removes write-only stores so that seems fit for that
> purpose.
>
> Btw,
>
> static int x = 1;
>
> int main()
> {
> x = 1;
> }
>
> should ideally be handled as well as maybe the more common(?)
>
> static int x[128];
>
> int main()
> {
> memset (x, 0, 128*4);
> }
>
> Started with r5-6477-g3620b606822f80863488ca4883542d848d41f9f9
This only affects early inlining decisions, so it may be useful to
bisect this with --param early-inlining-insns=14
Honza
> Any *.opt changes can break the streaming of optimization or target option
> nodes.
> And from experience with gcc plugins we have such changes ~ each month even on
> release branches.
It may make sense to add a simple test to our regular testers that
either the new revision can consume old objec
> At -O3 the unused 'c' remains. Likely different (recursive?) inlining makes
> us
> process a cgraph cycle in different order and thus fail to elide the output
> of 'c' (it's output first at -O3).
>
> Fixing that would need processing cgraph SCCs with an extra IPA phase in main
> optimization s
> FYI, I have today bootstrapped it as well in rpm build on
> {x86_64,i686,powerpc64le}-linux, both your patch and just trunk without the
> workaround I've been using before. The latter failed to bootstrap on i686
> and passed it on x86_64 and powerpc64le, the former passed bootstrap on all
> arch
> Ah, yeah, that will make a big difference.
> So clang is using 'make check', running a test-suite for a PGO build, right?
It uses
make check-llvm
make check-clang
and then it rebuilds whole llvm with the instrumented compiler.
Honza
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99105
>
> --- Comment #8 from Martin Liška ---
> This is what I see for GCC PGO in train stage. It's from perf top:
>
> 4.33% cc1plus [.] __gcov_indirect_call_profiler_v4
>2.28
> A small improvement can be achieved by the removal of libgcov I/O buffering:
> https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=5a17015c096012b9e43a8dd45768a8d5fb3a3aee
So it effectively replaces gcov's own buffered I/O by stdio. First I am
not sure how safe it is (as we had a lot of fun about usin
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99097
>
> --- Comment #5 from Martin Liška ---
> (In reply to Jan Hubicka from comment #3)
> > > I've just tried to reproduce it:
> > > ../configure --with-build-config=bootstrap-lto --enable-checking=release
>
> I've just tried to reproduce it:
> ../configure --with-build-config=bootstrap-lto --enable-checking=release
> --disable-plugin
>
> But the build is fine for me.
On our dhcp230 (zen III machine) it works if you make system linker ld,
if system linker is gold (from tumbleweed) it fails
GNU gold (
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98330
>
> --- Comment #4 from Richard Biener ---
> So modref allocates a fnspec_summary for an unknown indirect call (NULL
> callee)
> but then in compute_parm_map calls function_or_virtual_thunk_symbol on
> that NULL callee unconditionally. We hav
> @Marek: The callgraph checking error is correct.
> If you disable it, you will likely see duplicate assembler names in GAS. And
> that's the error that 2 symbol names clash.
Indeed, there are two lambdas, but I think C++ FE should assign them
different symbol names.
Honza
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97172
>
> --- Comment #18 from Martin Sebor ---
> Let me explain how this works. The VLA bounds in function parameters are used
> in two ways:
> 1) in the front end, to check function redeclarations involving arrays and
> VLAs
> for equivalence,
>
Hi,
this ought to be fixed by g:0862d007b564eca8c9a48fca0e689dd3f90db828
sorry for the breakage. OBJ_TYPE_REF in obj-C frontend is odd.
This patch fixes the issue by making the conflict with C type sticky via
clearing the CXX bit. I checked that it recovers profiledbootstrap,
however I want to look into the code a bit more tomorrow to be sure that
it does not disable more than it should.
Honza
diff --git a/gcc/ipa-utils.h b/gcc/ip
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97840
>
> --- Comment #14 from Martin Sebor ---
> Created attachment 49572
> --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49572&action=edit
> Patch under test.
>
> The attached patch avoids the warning on aarch64. Let me finish testing it
The checking enabled build ICEs for me at same spot as for you
0x01475505 <+165>: punpcklqdq %xmm2,%xmm3
0x01475509 <+169>: movaps %xmm3,0x30(%rsp)
0x0147550e <+174>: callq 0x10949d0
::iterator::slide()>
0x01475513 <+179>: mov%r12,0x20(%rsp
> I agree we should just rename default_is_empty_type to is_empty_type, export
> it, declare in tree.h and use it instead that complicated test. TYPE_EMPTY_P
> isn't something tree-ssa-uninit.c should care about, that is just whether the
> backend decided it will not be passed at all.
OK, perhaps
> Note i686-linux bootstrap is still broken in r11-5062 - the PR97853 error.
Yes, as discussed earlier (but perhaps lost in other comments) we need a
fix for the targetm.calls.empty_record_p (type) divergence. It is not
clear to me if simply calling the default implementation instead of the
rather com
It seems to crash in quite a few locations, but always related to indirect
calls. So perhaps there is some sort of weird relation to indirect call
profiling or devirtualization...
I am going to move my build to a faster machine.
Honza
> > Yep, I already worked out it is ipa-icf...
> > Do you have easy way to bisect what merge is causing the failure?
>
> Working on that will send details soon.
Great, thanks. In meantime I will check if I can isolate one of the paths
(constant access merging, variable access merging on the two o
> I see a similar bootstrap failure that's with:
>
> ../configure --enable-languages=c,c++,lto --prefix=/home/marxin/bin/gcc
> --disable-multilib --without-isl --disable-libsanitizer
> --with-build-config=bootstrap-lto-lean && make profiledbootstrap
> 'STAGE1_CFLAGS=-g -O2'
>
> started with r11-4
will clean it up incrementally.
gcc/ChangeLog:
2020-11-03 Jan Hubicka
* cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Fix ICE with
in dumping code.
(cgraph_node::remove): Save clone info before releasing it and pass it
to unregister.
* cgraph.h
> It needs to refer to the DW_TAG_formal_parameter DIEs, and only the PARM_DECLs
> map to those.
It has a problem with the partitioning (if we call a callee from a different
partition) and also if the callee is compiled before the caller (as it
should) we will call cgraph_node::release_body and that will l
Hi,
this patch fixes the ICE, though I think we do have a design issue here
while producing debug info across ltrans boundary.
Martin, Jakub: as discussed on IRC it would be nice to add a predicate
for when the body is really needed and avoid materializing it if it is not.
Can you add one?
Something like
> Hi,
> this is a patch that moves updates to WPA time. Does it work for you?
Actually it won't help, since it updates only the non-lto summary. I am
testing a better patch, sorry for that.
Honza
Hi,
this is a patch that moves updates to WPA time. Does it work for you?
Honza
2020-10-27 Jan Hubicka
* ipa-modref.c (modref_summaries_lto::duplicate): Check that no clones
happens after modref.
(modref_transform): Rename to ...
(update_signature): ... this
> So the _bfd_safe_read_leb128.constprop removes the first unused argument:
>
...
>
> But the analysis is bogus:
>
> ipa-modref: call to _bfd_safe_read_leb128.constprop/17919 does not clobber
> ref:
> bytes_read alias sets: 7->7
>
> The &bytes_read is always modified in the function (if it's n
still need the stronger hint though.
gcc/ChangeLog:
2020-10-20 Jan Hubicka
PR c/97445
* ipa-fnsummary.c (ipa_dump_hints): Handle
INLINE_HINT_builtin_constant_p.
(ipa_fn_summary::~ipa_fn_summary): Free builtin_constant_p_parms.
(ipa_fn_summary_t
>
> Original asm is:
>
> __attribute__ ((noinline))
> int fls64(__u64 x)
> {
> int bitpos = -1;
> asm("bsrq %1,%q0"
> : "+r" (bitpos)
> : "rm" (x));
> return bitpos + 1;
> }
>
> There seems to be bug in bsr{q} pattern. I can make GCC produce same
> code with:
>
> __attribute__ ((n
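
For comparison with the inline-asm fls64 quoted above, here is a minimal asm-free sketch. It assumes GCC's __builtin_clzll and that returning 0 for a zero input is the intended behaviour (the asm version starts from bitpos = -1). The name fls64_noasm is made up, and this is not the variant posted in the PR:

```c
#include <stdint.h>

/* Hypothetical asm-free counterpart of the quoted fls64: returns the
   1-based index of the highest set bit, or 0 when x is 0.  */
static int
fls64_noasm (uint64_t x)
{
  /* __builtin_clzll is undefined for 0, so handle that case explicitly.  */
  if (x == 0)
    return 0;
  /* Assumes unsigned long long is 64 bits wide, as on the targets
     discussed here.  */
  return 64 - __builtin_clzll (x);
}
```

Because there is no asm statement, the optimizers can see through the function, for instance when folding it for constant arguments.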
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
>
> --- Comment #33 from Jakub Jelinek ---
> (In reply to Jan Hubicka from comment #32)
> > get_order is a wrapper around ffs64. This can be implemented w/o asm
> > statement as follows:
> > int
> > m
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
>
> --- Comment #31 from Segher Boessenkool ---
> (In reply to Jan Hubicka from comment #27)
> > It is because --param inline-insns-single was reduced for -O2 from 200
> > to 70. GCC 10 has newly different set of pa
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
>
> --- Comment #23 from Christophe Leroy ---
> (In reply to Jan Hubicka from comment #19)
> >
> > It is always possible to always_inline functions that are intended to be
> > always inlined.
> > Honza
>
>
> They have the very same problem when I disable a statically pre-allocated
> buffers with -mllvm -vp-static-alloc=0:
>
> Program received signal SIGILL, Illegal instruction.
> 0x004014e6 in calloc (nmemb=1, size=8) at pr97461.c:103
> 103 if (malloc_depth != 0) __builtin_trap();
> No. The only thing we support is a recursive malloc as seen in:
> ./gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-malloc.c
>
> It was added in g:bc2b1a232b1825b421a1aaa21a0865b2d1e4e08c as we use a
> statically allocated buffer when we recursively entry allocate_gcov_kvp.
>
> However this is d
Hi,
the following patch should let us pinpoint the wrong disambiguation.
With -fdump-tree-all-details we should also see the difference in the dump
file.
Honza
diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index cf8775b2b66..07946a85ecc 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -171,6 +17
Recursion is handled in normal compilation (we analyze the function and
while hitting the recursive call we skip the summary). I suppose here
the problem is missing LTO and offloading.
With LTO lto summaries (that include types) are streamed out while they
are turned into non-lto summaries at ltr
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96794
>
> --- Comment #4 from Martin Liška ---
> > > For jobserver they are still running even though they sleep.
> > > Aha, so it is an extra locking mechanism we add without jobserver
> > knowledge.
>
> It's unrelated to jobserver, one can enable it wi
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96794
>
> --- Comment #2 from Martin Liška ---
> (In reply to Jan Hubicka from comment #1)
> > > As seen
> > > here:https://gist.githubusercontent.com/marxin/223890df4d8d8e490b6b2918b77dacad/raw/7e0363da60dcddbfde4ab6
> As seen
> here:https://gist.githubusercontent.com/marxin/223890df4d8d8e490b6b2918b77dacad/raw/7e0363da60dcddbfde4ab68fa3be755515166297/gcc-10-with-zstd.svg
>
> each blocking linking of a GCC front-end leads to a wasted jobserver worker.
Hmm, I am not sure how to interpret the graph. I can see th
> I think, this inliner change needs to be reverted. People expect -O2 to
> produce
> decently optimized binaries, and starting with gcc 10.x it doesn't deliver.
> -O3
> traditionally enabled optimizations that may or may not improve performance
> (and historically, sometimes even break code), so
>
> Maybe you want to use same GCC version as phoronix used (GCC 10.2)?
OK, I will give it a try, but there are no inliner changes in gcc 10.2
compared to 10.1.
Honza
> I think Honza ran into this himself.
Yep, I converted the code to use wide-ints. But it is nice to have a short
testcase.
Honza
> Which ARM target has 16-bit int?
> I don't see INT_TYPE_SIZE nor SHORT_TYPE_SIZE defined in config/arm/*, neither
> BITS_PER_WORD, so all depends on UNITS_PER_WORD, which is 4 and thus short is
> 16-bit and int is 32-bit.
Hmm, you are right - I messed up target triplets. With arm-linux-gnueabi
I
Ok,
I managed to reproduce the crash locally (it was not that easy).
At the point of failure the node passes verification, and I suppose the
problem is that the call stmt hash contains an indirect call while it is
supposed to contain a direct call.
Edge removal code probably replaces direct edge by indirect
> xxx.localalias is gcc-generated as a noninterposable alias to xxx. But I guess
> target node returned by xxx.localalias->function_symbol() is not xxx. A simple
That ought to return xxx unless the target of localalias is a thunk that
is not recursive.
> thing we can do is to write a simple case to f
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93084
>
> --- Comment #6 from fxue at gcc dot gnu.org ---
> Could you share how you build clang with PGO, and train workload?
It needs a lot of patience. If you have patch I can try it since I
still have the train data and corresponding gcc tree.
I
>
> I don't think so. But I don't know much about that bug, it is something
> with AVX I think? If you are talking about PR79224.
I see, we have separate PR for that, good ;)
>
> > Also with profile feedback perhaps you have enough info to tell that the
> > speculative path is almost as likely
> Scheduling should never move very expensive instructions to places they
> are executed more frequently. This patch fixes that, reducing the
> execution time of c-ray by over 40% (I tested on a BE Power7 system).
>
> This introduces a new target hook sched.can_speculate_insn which returns
> whet
Can you please compile with --verbose --save-temps and attach the output +
temporary files produced?
(in particular I wonder about resolution file that should be named *.res)
Thanks,
32-bit eon runs improved today, though I am not 100% sure whether it is due to
vectorization or the unit growth change
http://gcc.opensuse.org/SPEC/CINT/sb-frescobaldi.suse.de-head-64-32o-32bit/252_eon_recent_big.png
Overall we had better scores on 32bit eon in the past however
http://gcc.opensuse