Re: not computable at load time
On Fri, May 25, 2018 at 8:05 PM Paul Koning wrote:
> One of my testsuite failures for the pdp11 back end is
> gcc.c-torture/compile/930326-1.c which is:
>
>   struct
>   {
>     char a, b, f[3];
>   } s;
>
>   long i = s.f-&s.b;
>
> It fails with "error: initializer element is not computable at load time".
> I don't understand why because it seems to be a perfectly reasonable
> compile time constant; "load time" doesn't enter into the picture that
> I can see.

It means there's no relocation that can express the result of 's.f - &s.b'
and the frontend doesn't consider this a constant expression (likely because
of the conversion).

> If I replace "long" by "short" it works correctly. So presumably it has
> something to do with the fact that Pmode == HImode. But how that translates
> into this failure I don't know.
>
>	paul
Re: Enabling -ftree-slp-vectorize on -O2/Os
On Sat, May 26, 2018 at 12:36 PM Richard Biener wrote:
> On May 26, 2018 11:32:29 AM GMT+02:00, Allan Sandfeld Jensen <li...@carewolf.com> wrote:
> > I brought this subject up earlier, and was told to suggest it again for
> > gcc 9, so I have attached the preliminary changes.
> >
> > My studies have shown that with generic x86-64 optimization it reduces
> > binary size by around 0.5%, and when optimizing for x64 targets with
> > SSE4 or better, it reduces binary size by 2-3% on average. The
> > performance changes are negligible however*, and I haven't been able to
> > detect changes in compile time big enough to penetrate general noise on
> > my platform, but perhaps someone has a better setup for that?
> >
> > * I believe that is because it currently works best on non-optimized
> > code; it is better at big basic blocks doing all kinds of things than
> > tightly written inner loops.
> >
> > Anything else I should test or report?
>
> If you have access to SPEC CPU I'd like to see performance, size and
> compile-time effects of the patch on that. Embedded folks may want to run
> their favorite benchmark and report results as well.

So I did a -O2 -march=haswell [-ftree-slp-vectorize] SPEC CPU 2006 compile
and run, and the compile-time effect, where measurable (SPEC records on a
second granularity), is within one second per benchmark apart from
410.bwaves (from 3s to 5s) and 481.wrf (76s to 78s). Performance-wise I
notice significant slowdowns for SPEC FP and some for SPEC INT (I only did
a train run so far). I'll re-run with ref input now and will post those
numbers.

Binary size numbers show an increase for 403.gcc, 433.milc and 444.namd,
and otherwise decreases or no changes. The changes are in the
sub-percentage area of course.

Overall 12583 "BBs" are vectorized. I need to improve that reporting for
multiple (non-)overlapping instances.
I realize that combining -O2 with -march=haswell might not be what people
do, but I tried to increase the number of vectorized BBs.

Richard.

> Richard.
>
> > Best regards
> > 'Allan
> >
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index beba295bef5..05851229354 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -7612,6 +7612,7 @@ also turns on the following optimization flags:
> >  -fstore-merging @gol
> >  -fstrict-aliasing @gol
> >  -ftree-builtin-call-dce @gol
> > +-ftree-slp-vectorize @gol
> >  -ftree-switch-conversion -ftree-tail-merge @gol
> >  -fcode-hoisting @gol
> >  -ftree-pre @gol
> > @@ -7635,7 +7636,6 @@ by @option{-O2} and also turns on the following optimization flags:
> >  -floop-interchange @gol
> >  -floop-unroll-and-jam @gol
> >  -fsplit-paths @gol
> > --ftree-slp-vectorize @gol
> >  -fvect-cost-model @gol
> >  -ftree-partial-pre @gol
> >  -fpeel-loops @gol
> > @@ -8932,7 +8932,7 @@ Perform loop vectorization on trees. This flag is enabled by default at
> >  @item -ftree-slp-vectorize
> >  @opindex ftree-slp-vectorize
> >  Perform basic block vectorization on trees. This flag is enabled by default at
> > -@option{-O3} and when @option{-ftree-vectorize} is enabled.
> > +@option{-O2} or higher, and when @option{-ftree-vectorize} is enabled.
> >
> >  @item -fvect-cost-model=@var{model}
> >  @opindex fvect-cost-model
> > diff --git a/gcc/opts.c b/gcc/opts.c
> > index 33efcc0d6e7..11027b847e8 100644
> > --- a/gcc/opts.c
> > +++ b/gcc/opts.c
> > @@ -523,6 +523,7 @@ static const struct default_options default_options_table[] =
> >      { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 },
> >      { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 },
> >      { OPT_LEVELS_2_PLUS, OPT_fstore_merging, NULL, 1 },
> > +    { OPT_LEVELS_2_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
> >
> >      /* -O3 optimizations.  */
> >      { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
> > @@ -539,7 +540,6 @@ static const struct default_options default_options_table[] =
> >      { OPT_LEVELS_3_PLUS, OPT_floop_unroll_and_jam, NULL, 1 },
> >      { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
> >      { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
> > -    { OPT_LEVELS_3_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
> >      { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },
> >      { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
> >      { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
RISC-V problem with weak function references and -mcmodel=medany
Hello,

I try to build a 64-bit RISC-V tool chain for RTEMS. RTEMS doesn't use
virtual memory. The reference chips for 64-bit RISC-V such as the FU540-C000
locate the RAM at 0x8000_. This forces me to use -mcmodel=medany in 64-bit
mode. The crtbegin.o contains this code (via crtstuff.c):

extern void *__deregister_frame_info (const void *) __attribute__ ((weak));
...
# 370 "libgcc/crtstuff.c"
static void __attribute__((used))
__do_global_dtors_aux (void)
{
  static _Bool completed;
  if (__builtin_expect (completed, 0))
    return;
# 413 "libgcc/crtstuff.c"
  deregister_tm_clones ();
# 423 "libgcc/crtstuff.c"
  if (__deregister_frame_info)
    __deregister_frame_info (__EH_FRAME_BEGIN__);
  completed = 1;
}

Which is:

	.text
	.align	1
	.type	__do_global_dtors_aux, @function
__do_global_dtors_aux:
	lbu	a5,completed.3298
	bnez	a5,.L22
	addi	sp,sp,-16
	sd	ra,8(sp)
	call	deregister_tm_clones
	lla	a5,__deregister_frame_info
	beqz	a5,.L17
	lla	a0,__EH_FRAME_BEGIN__
	call	__deregister_frame_info
.L17:
	ld	ra,8(sp)
	li	a5,1
	sb	a5,completed.3298,a4
	addi	sp,sp,16
	jr	ra
.L22:
	ret

If I link an executable I get this:

/opt/rtems/5/lib64/gcc/riscv64-rtems5/9.0.0/../../../../riscv64-rtems5/bin/ld:
/opt/rtems/5/lib64/gcc/riscv64-rtems5/9.0.0/crtbegin.o: in function `.L0 ':
crtstuff.c:(.text+0x72): relocation truncated to fit: R_RISCV_CALL against
undefined symbol `__deregister_frame_info'

I guess that the resolution of the weak reference to the undefined symbol
__deregister_frame_info somehow sets __deregister_frame_info to the absolute
address 0, which is illegal in the following "call __deregister_frame_info"?
Is this construct with weak references and -mcmodel=medany supported on
RISC-V at all?
If I change crtstuff.c like this, using weak function definitions,

diff --git a/libgcc/crtstuff.c b/libgcc/crtstuff.c
index 5e894455e16..770e3420c92 100644
--- a/libgcc/crtstuff.c
+++ b/libgcc/crtstuff.c
@@ -177,13 +177,24 @@ call_ ## FUNC (void) \
 /* References to __register_frame_info and __deregister_frame_info should
    be weak in this file if at all possible.  */
-extern void __register_frame_info (const void *, struct object *)
-				  TARGET_ATTRIBUTE_WEAK;
+extern void __register_frame_info (const void *, struct object *);
+TARGET_ATTRIBUTE_WEAK void __register_frame_info (const void *unused, struct object *unused2)
+{
+  (void)unused;
+  (void)unused2;
+}
+
 extern void __register_frame_info_bases (const void *, struct object *, void *, void *)
 				  TARGET_ATTRIBUTE_WEAK;
-extern void *__deregister_frame_info (const void *)
-				  TARGET_ATTRIBUTE_WEAK;
+
+extern void *__deregister_frame_info (const void *);
+TARGET_ATTRIBUTE_WEAK void *__deregister_frame_info (const void *unused)
+{
+  (void)unused;
+  return 0;
+}
+
 extern void *__deregister_frame_info_bases (const void *)
 				  TARGET_ATTRIBUTE_WEAK;
 extern void __do_global_ctors_1 (void);

then the example program links.

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.
Re: PR80155: Code hoisting and register pressure
On Sat, 26 May 2018, Bin.Cheng wrote:
> On Fri, May 25, 2018 at 5:54 PM, Richard Biener wrote:
>> On May 25, 2018 6:57:13 PM GMT+02:00, Jeff Law wrote:
>>> On 05/25/2018 03:49 AM, Bin.Cheng wrote:
>>>> On Fri, May 25, 2018 at 10:23 AM, Prathamesh Kulkarni wrote:
>>>>> On 23 May 2018 at 18:37, Jeff Law wrote:
>>>>>> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote:
>>>>>>> On 23 May 2018 at 13:58, Richard Biener wrote:
>>>>>>>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote:
>>>>>>>>> Hi,
>>>>>>>>> I am trying to work on PR80155, which exposes a problem with code
>>>>>>>>> hoisting and register pressure on a leading embedded benchmark for
>>>>>>>>> ARM cortex-m7, where code-hoisting causes an extra register spill.
>>>>>>>>>
>>>>>>>>> I have attached two test-cases which (hopefully) are representative
>>>>>>>>> of the original test-case. The first one (trans_dfa.c) is bigger
>>>>>>>>> and somewhat similar to the original test-case, and trans_dfa_2.c
>>>>>>>>> is a hand-reduced version of trans_dfa.c. There are 2 spills caused
>>>>>>>>> with trans_dfa.c and one spill with trans_dfa_2.c due to the
>>>>>>>>> smaller number of cases. The test-cases in the PR are probably not
>>>>>>>>> relevant.
>>>>>>>>>
>>>>>>>>> Initially I thought the spill was happening because of "too many
>>>>>>>>> hoistings" taking place in the original test-case, thus increasing
>>>>>>>>> the register pressure, but it seems the spill is possibly caused
>>>>>>>>> because the expression gets hoisted out of a block that is on a
>>>>>>>>> loop exit.
>>>>>>>>> For example, the following hoistings take place with
>>>>>>>>> trans_dfa_2.c:
>>>>>>>>>
>>>>>>>>> (1) Inserting expression in block 4 for code hoisting:
>>>>>>>>> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005)
>>>>>>>>>
>>>>>>>>> (2) Inserting expression in block 4 for code hoisting:
>>>>>>>>> {plus_expr,_4,1} (0006)
>>>>>>>>>
>>>>>>>>> (3) Inserting expression in block 4 for code hoisting:
>>>>>>>>> {pointer_plus_expr,s_33,1} (0023)
>>>>>>>>>
>>>>>>>>> (4) Inserting expression in block 3 for code hoisting:
>>>>>>>>> {pointer_plus_expr,s_33,1} (0023)
>>>>>>>>>
>>>>>>>>> The issue seems to be the hoisting of (*tab + 1), which consists of
>>>>>>>>> the first two hoistings into block 4 from blocks 5 and 9, and
>>>>>>>>> causes the extra spill. I verified that by disabling hoisting into
>>>>>>>>> block 4, which resulted in no extra spills.
>>>>>>>>>
>>>>>>>>> I wonder if that's because the expression (*tab + 1) is getting
>>>>>>>>> hoisted from blocks 5 and 9, which are on loop exit? So the
>>>>>>>>> expression that was previously computed in a block on loop exit
>>>>>>>>> gets hoisted outside that block, which possibly makes the allocator
>>>>>>>>> more defensive? Similarly, disabling hoisting of expressions which
>>>>>>>>> appeared in blocks on loop exit in the original test-case prevented
>>>>>>>>> the extra spill. The other hoistings didn't seem to matter.
>>>>>>>>
>>>>>>>> I think that's simply co-incidence. The only thing that makes a
>>>>>>>> block that also exits from the loop special is that an expression
>>>>>>>> could be sunk out of the loop and hoisting (commoning with another
>>>>>>>> path) could prevent that. But that isn't what is happening here, and
>>>>>>>> it would be a pass ordering issue as the sinking pass runs only
>>>>>>>> after hoisting (no idea why exactly, but I guess there are cases
>>>>>>>> where we want to prefer CSE over sinking). So you could try if
>>>>>>>> re-ordering PRE and sinking helps your testcase.
>>>>>>>
>>>>>>> Thanks for the suggestions. Placing the sink pass before PRE works
>>>>>>> for both these test-cases! Sadly it still causes the spill for the
>>>>>>> benchmark -:(
>>>>>>> I will try to create a better approximation of the original
>>>>>>> test-case.
>>>>>>>> What I do see is a missed opportunity to merge the successors of
>>>>>>>> BB 4. After PRE we have
>>>>>>>>
>>>>>>>>   [local count: 159303558]:
>>>>>>>>   pretmp_123 = *tab_37(D);
>>>>>>>>   _87 = pretmp_123 + 1;
>>>>>>>>   if (c_36 == 65)
>>>>>>>>     goto ; [34.00%]
>>>>>>>>   else
>>>>>>>>     goto ; [66.00%]
>>>>>>>>
>>>>>>>>   [local count: 54163210]:
>>>>>>>>   *tab_37(D) = _87;
>>>>>>>>   _96 = MEM[(char *)s_57 + 1B];
>>>>>>>>   if (_96 != 0)
>>>>>>>>     goto ; [89.00%]
>>>>>>>>   else
>>>>>>>>     goto ; [11.00%]
>>>>>>>>
>>>>>>>>   [local count: 105140348]:
>>>>>>>>   *tab_37(D) = _87;
>>>>>>>>   _56 = MEM[(char *)s_57 + 1B];
>>>>>>>>   if (_56 != 0)
>>>>>>>>     goto ; [89.00%]
>>>>>>>>   else
>>>>>>>>     goto ; [11.00%]
>>>>>>>>
>>>>>>>> here at least the stores and loads can be hoisted. Note this may
>>>>>>>> also point at the real issue of the code hoisting, which is tearing
>>>>>>>> apart the RMW operation?
>>>>>>>
>>>>>>> Indeed, this possibility seems much more likely than the block being
>>>>>>> on loop exit.
>>>>>>> I will try to "hardcode" the load/store hoists into block 4 for th
Re: Enabling -ftree-slp-vectorize on -O2/Os
On Monday, 28 May 2018 12:58:20 CEST Richard Biener wrote:
> compile-time effects of the patch on that. Embedded folks may want to run
> their favorite benchmark and report results as well.
>
> So I did a -O2 -march=haswell [-ftree-slp-vectorize] SPEC CPU 2006 compile
> and run, and the compile-time effect, where measurable (SPEC records on a
> second granularity), is within one second per benchmark apart from
> 410.bwaves (from 3s to 5s) and 481.wrf (76s to 78s). Performance-wise I
> notice significant slowdowns for SPEC FP and some for SPEC INT (I only did
> a train run so far). I'll re-run with ref input now and will post those
> numbers.

If you continue to see slowdowns, could you check with either no AVX, or
with -mprefer-avx128? The occasional AVX256 instructions might be
downclocking the CPU. But yes, that would be a problem for this change on
its own.

'Allan
Re: not computable at load time
On May 28 2018, Richard Biener wrote:

> It means there's no relocation that can express the result of 's.f - &s.b'
> and the frontend doesn't consider this a constant expression (likely
> because of the conversion).

Shouldn't the frontend notice that s.f - &s.b by itself is a constant?

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
Re: not computable at load time
On May 28, 2018 12:45:04 PM GMT+02:00, Andreas Schwab wrote:
> On May 28 2018, Richard Biener wrote:
>
> > It means there's no relocation that can express the result of
> > 's.f - &s.b' and the frontend doesn't consider this a constant
> > expression (likely because of the conversion).
>
> Shouldn't the frontend notice that s.f - &s.b by itself is a constant?

Sure - the question is whether it is required to and why it doesn't.

Richard.

> Andreas.
Re: not computable at load time
> On May 28, 2018, at 12:03 PM, Richard Biener wrote:
>
> On May 28, 2018 12:45:04 PM GMT+02:00, Andreas Schwab wrote:
> > On May 28 2018, Richard Biener wrote:
> >
> > > It means there's no relocation that can express the result of
> > > 's.f - &s.b' and the frontend doesn't consider this a constant
> > > expression (likely because of the conversion).
> >
> > Shouldn't the frontend notice that s.f - &s.b by itself is a constant?
>
> Sure - the question is whether it is required to and why it doesn't.

This is a test case in the C torture test suite. The only reason I can see
for it being there is to verify that GCC resolves this as a compile-time
constant.

The issue can be masked by changing the "long" in that test case to a
ptrdiff_t, which eliminates the conversion. Should I do that? It would make
the test pass, at the expense of masking this glitch.

By the way, I get the same error if I change the "long" to a "long long"
and then compile for 32-bit Intel.

	paul
Re: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction
Ok, thanks for the clarification Jakub.

Umesh

On Mon, May 7, 2018, 2:08 PM Jakub Jelinek wrote:
> On Mon, May 07, 2018 at 01:58:48PM +0530, Umesh Kalappa wrote:
> > CCed Jakub,
> >
> > Agree that float division doesn't touch memory, but the fdiv result
> > (stack register) is stored back to memory, i.e. fResult.
>
> That doesn't really matter. It is stored to a stack spill slot, something
> that doesn't have its address taken and that other code (e.g. in other
> threads) can't access in a valid program. That is not considered memory
> for the inline-asm; only objects that must live in memory count.
>
> Jakub