Re: not computable at load time
On Fri, May 25, 2018 at 8:05 PM Paul Koning wrote:
> One of my testsuite failures for the pdp11 back end is
> gcc.c-torture/compile/930326-1.c which is:
>
>   struct
>   {
>     char a, b, f[3];
>   } s;
>
>   long i = s.f-&s.b;
>
> It fails with "error: initializer element is not computable at load time".
> I don't understand why because it seems to be a perfectly reasonable
> compile time constant; "load time" doesn't enter into the picture that
> I can see.

It means there's no relocation that can express the result of 's.f - &s.b'
and the frontend doesn't consider this a constant expression (likely because
of the conversion).

> If I replace "long" by "short" it works correctly. So presumably it has
> something to do with the fact that Pmode == HImode. But how that translates
> into this failure I don't know.
>
>	paul
Re: Enabling -ftree-slp-vectorize on -O2/Os
On Sat, May 26, 2018 at 12:36 PM Richard Biener wrote:
> On May 26, 2018 11:32:29 AM GMT+02:00, Allan Sandfeld Jensen <li...@carewolf.com> wrote:
> > I brought this subject up earlier, and was told to suggest it again for
> > gcc 9, so I have attached the preliminary changes.
> >
> > My studies have shown that with generic x86-64 optimization it reduces
> > binary size by around 0.5%, and when optimizing for x64 targets with
> > SSE4 or better, it reduces binary size by 2-3% on average. The
> > performance changes are negligible however*, and I haven't been able to
> > detect changes in compile time big enough to penetrate general noise on
> > my platform, but perhaps someone has a better setup for that?
> >
> > * I believe that is because it currently works best on non-optimized
> > code; it is better at big basic blocks doing all kinds of things than
> > tightly written inner loops.
> >
> > Anything else I should test or report?
>
> If you have access to SPEC CPU I'd like to see performance, size and
> compile-time effects of the patch on that. Embedded folks may want to run
> their favorite benchmark and report results as well.

So I did a -O2 -march=haswell [-ftree-slp-vectorize] SPEC CPU 2006 compile
and run, and the compile-time effect, where measurable (SPEC records on a
second granularity), is within one second per benchmark apart from
410.bwaves (from 3s to 5s) and 481.wrf (76s to 78s). Performance-wise I
notice significant slowdowns for SPEC FP and some for SPEC INT (I only did
a train run so far). I'll re-run with ref input now and will post those
numbers.

Binary size numbers show an increase for 403.gcc, 433.milc and 444.namd,
and otherwise decreases or no changes. The changes are in the
sub-percentage area of course.

Overall 12583 "BBs" are vectorized. I need to improve that reporting for
multiple (non-)overlapping instances.
I realize that combining -O2 with -march=haswell might not be what people
do, but I tried to increase the number of vectorized BBs.

Richard.

> Richard.
>
> > Best regards
> > 'Allan
> >
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index beba295bef5..05851229354 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -7612,6 +7612,7 @@ also turns on the following optimization flags:
> >  -fstore-merging @gol
> >  -fstrict-aliasing @gol
> >  -ftree-builtin-call-dce @gol
> > +-ftree-slp-vectorize @gol
> >  -ftree-switch-conversion -ftree-tail-merge @gol
> >  -fcode-hoisting @gol
> >  -ftree-pre @gol
> > @@ -7635,7 +7636,6 @@ by @option{-O2} and also turns on the following optimization flags:
> >  -floop-interchange @gol
> >  -floop-unroll-and-jam @gol
> >  -fsplit-paths @gol
> > --ftree-slp-vectorize @gol
> >  -fvect-cost-model @gol
> >  -ftree-partial-pre @gol
> >  -fpeel-loops @gol
> > @@ -8932,7 +8932,7 @@ Perform loop vectorization on trees. This flag is enabled by default at
> >  @item -ftree-slp-vectorize
> >  @opindex ftree-slp-vectorize
> >  Perform basic block vectorization on trees. This flag is enabled by default at
> > -@option{-O3} and when @option{-ftree-vectorize} is enabled.
> > +@option{-O2} or higher, and when @option{-ftree-vectorize} is enabled.
> >
> >  @item -fvect-cost-model=@var{model}
> >  @opindex fvect-cost-model
> > diff --git a/gcc/opts.c b/gcc/opts.c
> > index 33efcc0d6e7..11027b847e8 100644
> > --- a/gcc/opts.c
> > +++ b/gcc/opts.c
> > @@ -523,6 +523,7 @@ static const struct default_options default_options_table[] =
> >      { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 },
> >      { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 },
> >      { OPT_LEVELS_2_PLUS, OPT_fstore_merging, NULL, 1 },
> > +    { OPT_LEVELS_2_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
> >
> >      /* -O3 optimizations.  */
> >      { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
> > @@ -539,7 +540,6 @@ static const struct default_options default_options_table[] =
> >      { OPT_LEVELS_3_PLUS, OPT_floop_unroll_and_jam, NULL, 1 },
> >      { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
> >      { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
> > -    { OPT_LEVELS_3_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
> >      { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },
> >      { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
> >      { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
RISC-V problem with weak function references and -mcmodel=medany
Hello,

I try to build a 64-bit RISC-V tool chain for RTEMS. RTEMS doesn't use
virtual memory. The reference chips for 64-bit RISC-V such as the FU540-C000
locate the RAM at 0x8000_. This forces me to use -mcmodel=medany in 64-bit
mode. The crtbegin.o contains this code (via crtstuff.c):

extern void *__deregister_frame_info (const void *) __attribute__ ((weak));
...
# 370 "libgcc/crtstuff.c"
static void __attribute__((used))
__do_global_dtors_aux (void)
{
  static _Bool completed;
  if (__builtin_expect (completed, 0))
    return;
# 413 "libgcc/crtstuff.c"
  deregister_tm_clones ();
# 423 "libgcc/crtstuff.c"
  if (__deregister_frame_info)
    __deregister_frame_info (__EH_FRAME_BEGIN__);
  completed = 1;
}

Which is:

	.text
	.align	1
	.type	__do_global_dtors_aux, @function
__do_global_dtors_aux:
	lbu	a5,completed.3298
	bnez	a5,.L22
	addi	sp,sp,-16
	sd	ra,8(sp)
	call	deregister_tm_clones
	lla	a5,__deregister_frame_info
	beqz	a5,.L17
	lla	a0,__EH_FRAME_BEGIN__
	call	__deregister_frame_info
.L17:
	ld	ra,8(sp)
	li	a5,1
	sb	a5,completed.3298,a4
	addi	sp,sp,16
	jr	ra
.L22:
	ret

If I link an executable I get this:

/opt/rtems/5/lib64/gcc/riscv64-rtems5/9.0.0/../../../../riscv64-rtems5/bin/ld:
/opt/rtems/5/lib64/gcc/riscv64-rtems5/9.0.0/crtbegin.o: in function `.L0 ':
crtstuff.c:(.text+0x72): relocation truncated to fit: R_RISCV_CALL against
undefined symbol `__deregister_frame_info'

I guess that the resolution of the weak reference to the undefined symbol
__deregister_frame_info somehow sets __deregister_frame_info to the absolute
address 0, which is illegal in the following "call __deregister_frame_info"?
Is this construct with weak references and -mcmodel=medany supported on
RISC-V at all?
If I change crtstuff.c like this, using weak function definitions,

diff --git a/libgcc/crtstuff.c b/libgcc/crtstuff.c
index 5e894455e16..770e3420c92 100644
--- a/libgcc/crtstuff.c
+++ b/libgcc/crtstuff.c
@@ -177,13 +177,24 @@ call_ ## FUNC (void) \
 /* References to __register_frame_info and __deregister_frame_info should
    be weak in this file if at all possible.  */
-extern void __register_frame_info (const void *, struct object *)
-				  TARGET_ATTRIBUTE_WEAK;
+extern void __register_frame_info (const void *, struct object *);
+TARGET_ATTRIBUTE_WEAK void __register_frame_info (const void *unused, struct object *unused2)
+{
+  (void)unused;
+  (void)unused2;
+}
+
 extern void __register_frame_info_bases (const void *, struct object *, void *, void *)
 				  TARGET_ATTRIBUTE_WEAK;
-extern void *__deregister_frame_info (const void *)
-				  TARGET_ATTRIBUTE_WEAK;
+
+extern void *__deregister_frame_info (const void *);
+TARGET_ATTRIBUTE_WEAK void *__deregister_frame_info (const void *unused)
+{
+  (void)unused;
+  return 0;
+}
+
 extern void *__deregister_frame_info_bases (const void *)
 				  TARGET_ATTRIBUTE_WEAK;
 extern void __do_global_ctors_1 (void);

then the example program links.

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.
Re: PR80155: Code hoisting and register pressure
On Sat, 26 May 2018, Bin.Cheng wrote:
> On Fri, May 25, 2018 at 5:54 PM, Richard Biener wrote:
>> On May 25, 2018 6:57:13 PM GMT+02:00, Jeff Law wrote:
>>> On 05/25/2018 03:49 AM, Bin.Cheng wrote:
>>>> On Fri, May 25, 2018 at 10:23 AM, Prathamesh Kulkarni wrote:
>>>>> On 23 May 2018 at 18:37, Jeff Law wrote:
>>>>>> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote:
>>>>>>> On 23 May 2018 at 13:58, Richard Biener wrote:
>>>>>>>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote:
>>>>>>>>> Hi,
>>>>>>>>> I am trying to work on PR80155, which exposes a problem with code
>>>>>>>>> hoisting and register pressure on a leading embedded benchmark for
>>>>>>>>> ARM cortex-m7, where code-hoisting causes an extra register spill.
>>>>>>>>>
>>>>>>>>> I have attached two test-cases which (hopefully) are representative
>>>>>>>>> of the original test-case. The first one (trans_dfa.c) is bigger
>>>>>>>>> and somewhat similar to the original test-case, and trans_dfa_2.c
>>>>>>>>> is a hand-reduced version of trans_dfa.c. There are 2 spills caused
>>>>>>>>> with trans_dfa.c and one spill with trans_dfa_2.c due to the
>>>>>>>>> smaller number of cases. The test-cases in the PR are probably not
>>>>>>>>> relevant.
>>>>>>>>>
>>>>>>>>> Initially I thought the spill was happening because of "too many
>>>>>>>>> hoistings" taking place in the original test-case, thus increasing
>>>>>>>>> the register pressure, but it seems the spill is possibly caused
>>>>>>>>> because the expression gets hoisted out of a block that is on a
>>>>>>>>> loop exit.
>>>>>>>>> For example, the following hoistings take place with
>>>>>>>>> trans_dfa_2.c:
>>>>>>>>>
>>>>>>>>> (1) Inserting expression in block 4 for code hoisting:
>>>>>>>>> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005)
>>>>>>>>>
>>>>>>>>> (2) Inserting expression in block 4 for code hoisting:
>>>>>>>>> {plus_expr,_4,1} (0006)
>>>>>>>>>
>>>>>>>>> (3) Inserting expression in block 4 for code hoisting:
>>>>>>>>> {pointer_plus_expr,s_33,1} (0023)
>>>>>>>>>
>>>>>>>>> (4) Inserting expression in block 3 for code hoisting:
>>>>>>>>> {pointer_plus_expr,s_33,1} (0023)
>>>>>>>>>
>>>>>>>>> The issue seems to be the hoisting of (*tab + 1), which consists of
>>>>>>>>> the first two hoistings into block 4 from blocks 5 and 9, and
>>>>>>>>> causes the extra spill. I verified that by disabling hoisting into
>>>>>>>>> block 4, which resulted in no extra spills.
>>>>>>>>>
>>>>>>>>> I wonder if that's because the expression (*tab + 1) is getting
>>>>>>>>> hoisted from blocks 5 and 9, which are on loop exit? So the
>>>>>>>>> expression that was previously computed in a block on loop exit
>>>>>>>>> gets hoisted outside that block, which possibly makes the allocator
>>>>>>>>> more defensive? Similarly, disabling hoisting of expressions which
>>>>>>>>> appeared in blocks on loop exit in the original test-case prevented
>>>>>>>>> the extra spill. The other hoistings didn't seem to matter.
>>>>>>>>
>>>>>>>> I think that's simply co-incidence. The only thing that makes a
>>>>>>>> block that also exits from the loop special is that an expression
>>>>>>>> could be sunk out of the loop and hoisting (commoning with another
>>>>>>>> path) could prevent that. But that isn't what is happening here, and
>>>>>>>> it would be a pass ordering issue as the sinking pass runs only
>>>>>>>> after hoisting (no idea why exactly, but I guess there are cases
>>>>>>>> where we want to prefer CSE over sinking). So you could try if
>>>>>>>> re-ordering PRE and sinking helps your testcase.
>>>>>>>
>>>>>>> Thanks for the suggestions. Placing the sink pass before PRE works
>>>>>>> for both these test-cases! Sadly it still causes the spill for the
>>>>>>> benchmark -:(
>>>>>>> I will try to create a better approximation of the original
>>>>>>> test-case.
>>>>>>>> What I do see is a missed opportunity to merge the successors of
>>>>>>>> BB 4. After PRE we have
>>>>>>>>
>>>>>>>>   [local count: 159303558]:
>>>>>>>>   pretmp_123 = *tab_37(D);
>>>>>>>>   _87 = pretmp_123 + 1;
>>>>>>>>   if (c_36 == 65)
>>>>>>>>     goto ; [34.00%]
>>>>>>>>   else
>>>>>>>>     goto ; [66.00%]
>>>>>>>>
>>>>>>>>   [local count: 54163210]:
>>>>>>>>   *tab_37(D) = _87;
>>>>>>>>   _96 = MEM[(char *)s_57 + 1B];
>>>>>>>>   if (_96 != 0)
>>>>>>>>     goto ; [89.00%]
>>>>>>>>   else
>>>>>>>>     goto ; [11.00%]
>>>>>>>>
>>>>>>>>   [local count: 105140348]:
>>>>>>>>   *tab_37(D) = _87;
>>>>>>>>   _56 = MEM[(char *)s_57 + 1B];
>>>>>>>>   if (_56 != 0)
>>>>>>>>     goto ; [89.00%]
>>>>>>>>   else
>>>>>>>>     goto ; [11.00%]
>>>>>>>>
>>>>>>>> here at least the stores and loads can be hoisted. Note this may
>>>>>>>> also point at the real issue of the code hoisting, which is tearing
>>>>>>>> apart the RMW operation?
>>>>>>>
>>>>>>> Indeed, this possibility seems much more likely than the block being
>>>>>>> on loop exit.
>>>>>>> I will try to "hardcode" the load/store hoists into block 4 for th
Re: Enabling -ftree-slp-vectorize on -O2/Os
On Monday, 28 May 2018 12:58:20 CEST Richard Biener wrote:
> compile-time effects of the patch on that. Embedded folks may want to run
> their favorite benchmark and report results as well.
>
> So I did a -O2 -march=haswell [-ftree-slp-vectorize] SPEC CPU 2006 compile
> and run, and the compile-time effect, where measurable (SPEC records on a
> second granularity), is within one second per benchmark apart from
> 410.bwaves (from 3s to 5s) and 481.wrf (76s to 78s). Performance-wise I
> notice significant slowdowns for SPEC FP and some for SPEC INT (I only did
> a train run so far). I'll re-run with ref input now and will post those
> numbers.

If you continue to see slowdowns, could you check with either no AVX, or
with -mprefer-avx128? The occasional AVX256 instructions might be
downclocking the CPU. But yes, that would be a problem for this change on
its own.

'Allan
Re: not computable at load time
On May 28 2018, Richard Biener wrote:

> It means there's no relocation that can express the result of 's.f - &s.b'
> and the frontend doesn't consider this a constant expression (likely
> because of the conversion).

Shouldn't the frontend notice that s.f - &s.b by itself is a constant?

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
Re: not computable at load time
On May 28, 2018 12:45:04 PM GMT+02:00, Andreas Schwab wrote:
> On May 28 2018, Richard Biener wrote:
>
> > It means there's no relocation that can express the result of
> > 's.f - &s.b' and the frontend doesn't consider this a constant
> > expression (likely because of the conversion).
>
> Shouldn't the frontend notice that s.f - &s.b by itself is a constant?

Sure - the question is whether it is required to and why it doesn't.

Richard.

> Andreas.
Re: not computable at load time
> On May 28, 2018, at 12:03 PM, Richard Biener wrote:
>
> On May 28, 2018 12:45:04 PM GMT+02:00, Andreas Schwab wrote:
> > On May 28 2018, Richard Biener wrote:
> >
> > > It means there's no relocation that can express the result of
> > > 's.f - &s.b' and the frontend doesn't consider this a constant
> > > expression (likely because of the conversion).
> >
> > Shouldn't the frontend notice that s.f - &s.b by itself is a constant?
>
> Sure - the question is whether it is required to and why it doesn't.

This is a test case in the C torture test suite. The only reason I can see
for it being there is to verify that GCC resolves this as a compile-time
constant.

The issue can be masked by changing the "long" in that test case to a
ptrdiff_t, which eliminates the conversion. Should I do that? It would make
the test pass, at the expense of masking this glitch.

By the way, I get the same error if I change the "long" to a "long long"
and then compile for 32-bit Intel.

	paul
Re: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction
Ok, thanks for the clarification Jakub.

Umesh

On Mon, May 7, 2018, 2:08 PM Jakub Jelinek wrote:
> On Mon, May 07, 2018 at 01:58:48PM +0530, Umesh Kalappa wrote:
> > CCed Jakub,
> >
> > Agree that float division doesn't touch memory, but the fdiv result
> > (stack register) is stored back to memory, i.e. fResult.
>
> That doesn't really matter. It is stored to a stack spill slot, something
> that doesn't have its address taken and that other code (e.g. in other
> threads) can't access in a valid program. That is not considered memory
> for the inline-asm; only objects that must live in memory count.
>
> Jakub