Re: Enabling -ftree-slp-vectorize on -O2/Os

2018-05-29 Thread Richard Biener
On Mon, May 28, 2018 at 5:50 PM Allan Sandfeld Jensen wrote:

> On Monday, 28 May 2018 12:58:20 CEST Richard Biener wrote:
> > compile-time effects of the patch on that. Embedded folks may want to run
> > their favorite benchmark and report results as well.
> >
> > So I did a -O2 -march=haswell [-ftree-slp-vectorize] SPEC CPU 2006 compile
> > and run, and the compile-time effect where measurable (SPEC records on a
> > second granularity) is within one second per benchmark, apart from
> > 410.bwaves (from 3s to 5s) and 481.wrf (76s to 78s).  Performance-wise I
> > notice significant slowdowns for SPEC FP and some for SPEC INT (I only did
> > a train run so far).  I'll re-run with ref input now and will post those
> > numbers.
> >
> If you continue to see slowdowns, could you check with either no AVX, or with
> -mprefer-avx128? The occasional AVX256 instructions might be downclocking the
> CPU. But yes, that would be a problem for this change on its own.

So here's a complete two-run with ref input; peak is -O2 -march=haswell
-ftree-slp-vectorize.  It confirms the slowdowns in SPEC FP but not in SPEC
INT.  You are right that using AVX256 (or AVX512) might be problematic on its
own, but that is not restricted to -O2 -ftree-slp-vectorize; it applies to -O3
as well.  I will re-benchmark the SPEC FP part with -mprefer-avx128 to see if
that is the issue.  Note I did not use any -ffast-math flags in the
experiment - those are as "unlikely" as using -march=native together with
-O2.  In theory another issue is the ability to debug code.

                 Base     Base     Base       Peak     Peak     Peak
Benchmarks       Ref. Run Time    Ratio       Ref. Run Time    Ratio
-------------- ------ --------  -------     ------ --------  -------
410.bwaves      13590      362     37.5 *    13590      370     36.7 *
410.bwaves      13590      365     37.2 S    13590      377     36.0 S
416.gamess      19580      558     35.1 *    19580      598     32.7 *
416.gamess      19580      560     35.0 S    19580      600     32.6 S
433.milc         9180      331     27.8 S     9180      374     24.6 *
433.milc         9180      331     27.8 *     9180      383     24.0 S
434.zeusmp       9100      301     30.2 S     9100      301     30.2 *
434.zeusmp       9100      301     30.2 *     9100      302     30.1 S
435.gromacs      7140      300     23.8 S     7140      303     23.6 S
435.gromacs      7140      298     23.9 *     7140      301     23.8 *
436.cactusADM   11950      495     24.1 S    11950      482     24.8 *
436.cactusADM   11950      486     24.6 *    11950      484     24.7 S
437.leslie3d     9400      289     32.5 *     9400      288     32.6 *
437.leslie3d     9400      301     31.3 S     9400      289     32.5 S
444.namd         8020      301     26.6 *     8020      301     26.6 *
444.namd         8020      301     26.6 S     8020      301     26.6 S
447.dealII      11440      255     44.9 *    11440      252     45.3 *
447.dealII      11440      255     44.9 S    11440      253     45.3 S
450.soplex       8340      212     39.4 S     8340      213     39.1 S
450.soplex       8340      211     39.5 *     8340      211     39.5 *
453.povray       5320      111     47.9 S     5320      113     47.0 S
453.povray       5320      111     48.0 *     5320      113     47.2 *
454.calculix     8250      748     11.0 *     8250      835     9.88 *
454.calculix     8250      748     11.0 S     8250      835     9.88 S
459.GemsFDTD    10610      324     32.8 S    10610      324     32.8 S
459.GemsFDTD    10610      323     32.9 *    10610      323     32.9 *
465.tonto        9840      449     21.9 S     9840      469     21.0 *
465.tonto        9840      446     22.0 *     9840      469     21.0 S
470.lbm         13740      253     54.3 *    13740      255     53.9 S
470.lbm         13740      253     54.2 S    13740      254     54.2 *
481.wrf         11170      415     26.9 *    11170      416     26.9 S
481.wrf         11170      417     26.8 S    11170      416     26.9 *
482.sphinx3     19490      456     42.7 *    19490      465     41.9 *
482.sphinx3     19490      464     42.0 S    19490      468     41.6 S
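
For reading the table: SPEC defines the ratio as the reference time divided by the measured run time, so a lower peak ratio means the -ftree-slp-vectorize build ran slower. A minimal sketch of that arithmetic follows. The function names are made up for illustration, and since the run times shown here are rounded to whole seconds, a recomputed ratio can differ from the printed one in the last digit.

```c
#include <assert.h>

/* SPEC ratio: reference time divided by measured run time,
   both in seconds.  Higher is better.  */
double
spec_ratio (double ref_seconds, double run_seconds)
{
  assert (run_seconds > 0.0);
  return ref_seconds / run_seconds;
}

/* Relative change in run time from base to peak, in percent.
   Positive values mean the peak configuration is slower.  */
double
slowdown_percent (double base_seconds, double peak_seconds)
{
  assert (base_seconds > 0.0);
  return 100.0 * (peak_seconds - base_seconds) / base_seconds;
}
```

For 433.milc, slowdown_percent (331, 374) comes out at roughly 13%, which is the kind of SPEC FP regression the table shows.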

                 Base     Base     Base       Peak     Peak     Peak
Benchmarks       Ref. Run Time    Ratio       Ref. Run Time    Ratio
-------------- ------ --------  -------     ------ --------  -------
400.perlbench    9770      251     38.9 S     9770      252     38.8 S
400.perlbench    9770      250     39.1 *     9770      251     39.0 *
401.bzip2        9650      399     24.2 S     9650      397     24.3 S
401.bzip2        9650      395     24.4 *     9650      395     24.4 *
403.gcc

Re: not computable at load time

2018-05-29 Thread Richard Biener
On Mon, May 28, 2018 at 8:34 PM Paul Koning  wrote:



> > On May 28, 2018, at 12:03 PM, Richard Biener wrote:
> >
> > On May 28, 2018 12:45:04 PM GMT+02:00, Andreas Schwab wrote:
> >> On May 28 2018, Richard Biener wrote:
> >>
> >>> It means there's no relocation that can express the result of
> >>> 's.f - &s.b' and the frontend doesn't consider this a constant
> >>> expression (likely because of the conversion).
> >>
> >> Shouldn't the frontend notice that s.f - &s.b by itself is a constant?
> >
> > Sure - the question is whether it is required to and why it doesn't.

> This is a test case in the C torture test suite.  The only reason
> I can see for it being there is to verify that GCC resolves this as
> a compile time constant.

> The issue can be masked by changing the "long" in that test case to
> a ptrdiff_t, which eliminates the conversion.  Should I do that?
> It would make the test pass, at the expense of masking this glitch.

> By the way, I get the same error if I change the "long" to a "long long"
> and then compile for 32-bit Intel.

The testcase dates back to some repository creation rev. (egcs?) and
I'm not sure we may compute the difference of addresses of structure
members.  So that GCC accepts this is probably not required.  Joseph
may have a definitive answer here.
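
As an aside, a member distance of this kind can also be written with offsetof, which the C standard defines as an integer constant expression, so it can be used in a static initializer without relying on address arithmetic. A minimal sketch; struct sample and every name in it are hypothetical, since the actual torture-suite test case is not quoted in this thread:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout standing in for the struct in the test case.  */
struct sample
{
  char b;
  int f;
};

/* offsetof yields an integer constant expression, so this static
   initializer needs neither a relocation nor a pointer conversion.  */
static const long member_distance
  = (long) (offsetof (struct sample, f) - offsetof (struct sample, b));

/* The same distance computed at run time from member addresses.  */
long
runtime_distance (const struct sample *s)
{
  assert (s != NULL);
  return (long) ((const char *) &s->f - (const char *) &s->b);
}
```

Both forms should agree for any object of the type; only the offsetof form is guaranteed to be accepted as a constant initializer.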

It might be a "regression" with the POINTER_DIFF_EXPR introduction.
You can debug this with gdb when you break on 'pointer_diff'.  For me
on x86_64 this builds a POINTER_DIFF_EXPR: (char *) &s.f - &s.b
of ptrdiff_t.  That a conversion breaks the simplification tells us that
somewhere we possibly fail to simplify it (maybe even during assembling).

You might want to file a bug for the 'long long' issue.

Richard.


>  paul


Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP

2018-05-29 Thread Richard Sandiford
Jeff Law  writes:
> Now that we're in stage1 I do want to revisit the CLOBBER_HIGH stuff.
> When we left things I think we were trying to decide between
> CLOBBER_HIGH and clobbering the appropriate subreg.  The problem with
> the latter is the dataflow we compute is inaccurate (overly pessimistic)
> so that'd have to be fixed.

The clobbered part of the register in this case is a high-part subreg,
which is ill-formed for single registers.  It would also be difficult
to represent in terms of the mode, since there are no defined modes for
what can be stored in the high part of an SVE register.  For 128-bit
SVE that mode would have zero bits. :-)

I thought the alternative suggestion was instead to have:

   (set (reg:M X) (reg:M X))

when X is preserved in mode M but not in wider modes.  But that seems
like too much of a special case to me, both in terms of the source and
the destination:

- On the destination side, a SET normally provides something for later
  instructions to use, whereas here the effect is intended to be the
  opposite: the instruction has no effect at all on a value of mode M
  in X.  As you say, this would pessimise df without specific handling.
  But I think all optimisations that look for the definition of a value
  would need to be taught to "look through" this set to find the real
  definition of (reg:M X) (or any value of a mode no larger than M in X).
  Very few passes use the df def-use chains for this due to their high cost.

- On the source side, the instruction doesn't actually care what's in X,
  but nevertheless appears to use it.  This means that most passes would
  need to be taught that a use of X on the rhs of a no-op SET is special
  and should usually be ignored.

  More fundamentally, it should be possible in RTL to express an
  instruction J that *does* read X in mode M and clobbers its high part.
  If we use the SET above to represent the clobber, and treat the rhs use
  as special, then presumably J would need two uses of X, one "dummy" one
  on the no-op SET and one "real" one on some other SET (or perhaps in a
  top-level USE).  Having the number of uses determine this seems
  a bit awkward.

IMO CLOBBER and SET have different semantics for good reason: CLOBBER
represents an optimisation barrier for things that care about the value
of a certain rtx object, while SET represents a productive effect or
side-effect.  The effect we want here is the same as a normal clobber,
except that the clobber is mode-dependent.

Thanks,
Richard


Re: RISC-V problem with weak function references and -mcmodel=medany

2018-05-29 Thread Sebastian Huber

Changing the code to something like this

void f(void) __attribute__((__weak__));

void _start(void)
{
    void (*g)(void) = f;

    if (g != 0) {
        (*g)();
    }
}

doesn't work either, since this is optimized to

    .option nopic
    .text
    .align  1
    .globl  _start
    .type   _start, @function
_start:
    lla a5,f
    beqz    a5,.L1
    tail    f
.L1:
    ret
    .size   _start, .-_start
    .weak   f

Why doesn't the RISC-V port generate trampoline code to call far functions?

The non-optimized example code with "tail f" replaced by "jalr a5" links 
well:


    .option nopic
    .text
    .align  1
    .globl  _start
    .type   _start, @function
_start:
    addi    sp,sp,-32
    sd  ra,24(sp)
    sd  s0,16(sp)
    addi    s0,sp,32
    lla a5,f
    sd  a5,-24(s0)
    ld  a5,-24(s0)
    beqz    a5,.L3
    ld  a5,-24(s0)
    jalr    a5
.L3:
    nop
    ld  ra,24(sp)
    ld  s0,16(sp)
    addi    sp,sp,32
    jr  ra
    .size   _start, .-_start
    .weak   f

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.



Re: not computable at load time

2018-05-29 Thread Paul Koning



> On May 29, 2018, at 5:49 AM, Richard Biener wrote:
> ...
> It might be a "regression" with the POINTER_DIFF_EXPR introduction.
> You can debug this with gdb when you break on 'pointer_diff'.  For me
> on x86_64 this builds a POINTER_DIFF_EXPR: (char *) &s.f - &s.b
> of ptrdiff_t.  That a conversion breaks the simplification tells us that
> somewhere we possibly fail to simplify it (maybe even during assembling).
> 
> You might want to file a bug for the 'long long' issue.

Done, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85974

paul



Re: Enabling -ftree-slp-vectorize on -O2/Os

2018-05-29 Thread Richard Biener
On Tue, May 29, 2018 at 11:32 AM Richard Biener wrote:

> On Mon, May 28, 2018 at 5:50 PM Allan Sandfeld Jensen wrote:

> > On Monday, 28 May 2018 12:58:20 CEST Richard Biener wrote:
> > > compile-time effects of the patch on that. Embedded folks may want to run
> > > their favorite benchmark and report results as well.
> > >
> > > So I did a -O2 -march=haswell [-ftree-slp-vectorize] SPEC CPU 2006
> > > compile and run, and the compile-time effect where measurable (SPEC
> > > records on a second granularity) is within one second per benchmark,
> > > apart from 410.bwaves (from 3s to 5s) and 481.wrf (76s to 78s).
> > > Performance-wise I notice significant slowdowns for SPEC FP and some
> > > for SPEC INT (I only did a train run so far).  I'll re-run with ref
> > > input now and will post those numbers.
> > >
> > If you continue to see slowdowns, could you check with either no AVX, or
> > with -mprefer-avx128? The occasional AVX256 instructions might be
> > downclocking the CPU. But yes, that would be a problem for this change on
> > its own.

> So here's a complete two-run with ref input; peak is -O2 -march=haswell
> -ftree-slp-vectorize.  It confirms the slowdowns in SPEC FP but not in
> SPEC INT.  You are right that using AVX256 (or AVX512) might be
> problematic on its own, but that is not restricted to -O2
> -ftree-slp-vectorize; it applies to -O3 as well.  I will re-benchmark the
> SPEC FP part with -mprefer-avx128 to see if that is the issue.  Note I did
> not use any -ffast-math flags in the experiment - those are as "unlikely"
> as using -march=native together with -O2.  In theory another issue is the
> ability to debug code.


Numbers with -mprefer-avx128:

                 Base     Base     Base       Peak     Peak     Peak
Benchmarks       Ref. Run Time    Ratio       Ref. Run Time    Ratio
-------------- ------ --------  -------     ------ --------  -------

Re: Enabling -ftree-slp-vectorize on -O2/Os

2018-05-29 Thread Allan Sandfeld Jensen
On Tuesday, 29 May 2018 16:57:56 CEST Richard Biener wrote:
>
> so the situation improves but isn't fully fixed (STLF issues maybe?)
>

That raises the question of whether it helps in these cases even at -O3.

Anyway, it doesn't look good for it. Did the binary size at least improve with
-mprefer-avx128, or was that also worse or insignificant?


'Allan




[GSOC] LTO dump tool project

2018-05-29 Thread Hrishikesh Kulkarni
Hi,

My exams have finally ended and I have started working on the GSOC project.
I have forked the GCC mirror (https://github.com/hrisearch/gcc) and
created an option for dumping the functions and variables used in the IL.
Please find the patch attached herewith.

Regards,
Hrishikesh
diff --git a/gcc/lto/lang.opt b/gcc/lto/lang.opt
index 1083f9b..ae66c06 100644
--- a/gcc/lto/lang.opt
+++ b/gcc/lto/lang.opt
@@ -66,7 +66,11 @@ Whole program analysis (WPA) mode with number of parallel jobs specified.
 
 fdump
 LTO Var(flag_lto_dump)
-Call the dump function
+Call the dump function.
+
+fdump-lto-list
+LTO Var(flag_lto_dump_list)
+Call the dump function for variables and function in IL.
 
 fresolution=
 LTO Joined
diff --git a/gcc/lto/lto-dump.c b/gcc/lto/lto-dump.c
index b6a8b45..5e4d069 100644
--- a/gcc/lto/lto-dump.c
+++ b/gcc/lto/lto-dump.c
@@ -38,4 +38,21 @@ along with GCC; see the file COPYING3.  If not see
 void dump()
 {
 	fprintf(stderr, "\nHello World!\n");
+}
+
+void dump_list()
+{
+
+	fprintf (stderr, "Call Graph:\n\n");
+	cgraph_node *cnode;
+	FOR_EACH_FUNCTION (cnode)
+cnode->dump (stderr);
+fprintf(stderr, "\n\n" );
+
+	fprintf (stderr, "Varpool:\n\n");
+	varpool_node *vnode;
+FOR_EACH_VARIABLE (vnode)
+	vnode->dump (stderr);
+fprintf(stderr, "\n\n" );
+
 }
\ No newline at end of file
diff --git a/gcc/lto/lto-dump.h b/gcc/lto/lto-dump.h
index 4a06217..5ee71c6 100644
--- a/gcc/lto/lto-dump.h
+++ b/gcc/lto/lto-dump.h
@@ -21,5 +21,6 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_LTO_DUMP_H_
 
 void dump();
+void dump_list();
 
 #endif
\ No newline at end of file
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 9c79242..93ef52b 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -3360,6 +3360,11 @@ lto_main (void)
 dump();
   }
 
+  if (flag_lto_dump_list)
+  {
+dump_list();
+  }
+
   timevar_stop (TV_PHASE_STREAM_IN);
 
   if (!seen_error ())


Re: [GSOC] LTO dump tool project

2018-05-29 Thread Prathamesh Kulkarni
On 29 May 2018 at 22:33, Hrishikesh Kulkarni  wrote:
> Hi,
>
> My exams have finally ended and I have started working on the GSOC project.
> I have forked the GCC mirror (https://github.com/hrisearch/gcc) and
> created an option for dumping the functions and variables used in the IL.
> Please find the patch attached herewith.
diff --git a/gcc/lto/lang.opt b/gcc/lto/lang.opt
index 1083f9b..ae66c06 100644
--- a/gcc/lto/lang.opt
+++ b/gcc/lto/lang.opt
@@ -66,7 +66,11 @@ Whole program analysis (WPA) mode with number of parallel jobs specified.

 fdump
 LTO Var(flag_lto_dump)
-Call the dump function
+Call the dump function.
+
+fdump-lto-list
+LTO Var(flag_lto_dump_list)
+Call the dump function for variables and function in IL.

Instead of making separate options -fdump and -fdump-lto-list,
would it be a good idea to make it a "sub option" to -fdump like
lto1 -fdump,-l which would list all symbols within the LTO object file ?

 fresolution=
 LTO Joined
diff --git a/gcc/lto/lto-dump.c b/gcc/lto/lto-dump.c
index b6a8b45..5e4d069 100644
--- a/gcc/lto/lto-dump.c
+++ b/gcc/lto/lto-dump.c
@@ -38,4 +38,21 @@ along with GCC; see the file COPYING3.  If not see
 void dump()
 {
  fprintf(stderr, "\nHello World!\n");
+}
+
+void dump_list()
+{
+
+ fprintf (stderr, "Call Graph:\n\n");
+ cgraph_node *cnode;
+ FOR_EACH_FUNCTION (cnode)
+cnode->dump (stderr);
+fprintf(stderr, "\n\n" );
+
+ fprintf (stderr, "Varpool:\n\n");
+ varpool_node *vnode;
+FOR_EACH_VARIABLE (vnode)
+ vnode->dump (stderr);
+fprintf(stderr, "\n\n" );
+
 }
\ No newline at end of file
Formatting nit - Add comments for the newly added functions.

diff --git a/gcc/lto/lto-dump.h b/gcc/lto/lto-dump.h
index 4a06217..5ee71c6 100644
--- a/gcc/lto/lto-dump.h
+++ b/gcc/lto/lto-dump.h
@@ -21,5 +21,6 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_LTO_DUMP_H_

 void dump();
+void dump_list();

 #endif
\ No newline at end of file
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 9c79242..93ef52b 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -3360,6 +3360,11 @@ lto_main (void)
 dump();
   }

+  if (flag_lto_dump_list)
+  {
+dump_list();
+  }
+
Formatting nit - Avoid braces for single statement within if.
Shouldn't fdump-lto-list be enabled only if fdump is enabled  ?

Thanks,
Prathamesh

   timevar_stop (TV_PHASE_STREAM_IN);

   if (!seen_error ())
>
> Regards,
> Hrishikesh


Re: [GSOC] LTO dump tool project

2018-05-29 Thread Martin Liška

On 05/29/2018 07:03 PM, Hrishikesh Kulkarni wrote:

Hi,

My exams have finally ended and I have started working on the GSOC project.
I have forked the GCC mirror (https://github.com/hrisearch/gcc) and
created an option for dumping the functions and variables used in the IL.
Please find the patch attached herewith.


Hello.

Good start. You branched the repository, but you forgot to push the commit
you sent as an attachment. The second issue is that the patch is not against
GCC trunk, but against your local branch, so one can't apply it.

About the options:
- when you send new functionality, it's good to paste some sample output
- for now I would remove the dummy flag_lto_dump flag
- for -fdump-lto-list I would expect something like what nm does:

$ nm main.o
 T main
 T mystring
 C pole

Then of course you can add some level of verbosity which can print what you
have.
It would also be handy, over time, to come up with some sorting, but that can
wait.
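
To make the nm-style suggestion concrete, here is a rough standalone sketch of a sorted listing. It does not touch GCC's symbol table at all: struct sym_entry and list_symbols are hypothetical stand-ins, and a real implementation would presumably walk the cgraph and varpool nodes instead of a plain array.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a symbol read from an LTO object file.  */
struct sym_entry
{
  const char *name;
  char kind;			/* nm-style letter, e.g. 'T' or 'C'.  */
};

/* qsort comparator: order entries alphabetically by name.  */
static int
compare_by_name (const void *a, const void *b)
{
  const struct sym_entry *x = a;
  const struct sym_entry *y = b;
  return strcmp (x->name, y->name);
}

/* Sort SYMS by name in place and print one "<kind> <name>" line each.  */
void
list_symbols (FILE *out, struct sym_entry *syms, size_t count)
{
  assert (out != NULL);
  if (count == 0)
    return;
  assert (syms != NULL);
  qsort (syms, count, sizeof *syms, compare_by_name);
  for (size_t i = 0; i < count; i++)
    fprintf (out, "%c %s\n", syms[i].kind, syms[i].name);
}
```

Other sort keys (size, kind) would only need a different comparator, which is one reason to keep the sorting separate from the printing.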

That said, the direction is fine. Please carry on.

Thanks,
Martin



Regards,
Hrishikesh



Re: [GSOC] LTO dump tool project

2018-05-29 Thread Martin Liška

On 05/29/2018 07:17 PM, Prathamesh Kulkarni wrote:

Shouldn't fdump-lto-list be enabled only if fdump is enabled  ?


The option is a dummy, and eventually all these options will be moved
to a separate tool called lto-dump. Thus all the prefixed '-fdump-lto-foo'
options will be replaced with '-foo', I guess.

Martin


Re: [GSOC] LTO dump tool project

2018-05-29 Thread Martin Liška

On 05/29/2018 07:38 PM, Martin Liška wrote:

$ nm main.o
 T main
 T mystring
 C pole


Or we can be inspired by readelf:

$ readelf -s a.out
[snip]
Symbol table '.symtab' contains 74 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
    66: 00601250     0 NOTYPE  GLOBAL DEFAULT   24 _end
    67: 004004b0    43 FUNC    GLOBAL DEFAULT   13 _start
    68: 00601038     0 NOTYPE  GLOBAL DEFAULT   24 __bss_start
    69: 00400582    70 FUNC    GLOBAL DEFAULT   13 main
    70:              0 FUNC    GLOBAL DEFAULT  UND fwrite@@GLIBC_2.2.5

Martin


Re: RISC-V ELF multilibs

2018-05-29 Thread Jim Wilson

On 05/26/2018 06:04 AM, Sebastian Huber wrote:

Why is the default multilib and a variant identical?


This is supposed to be a single multilib, with two names.  We use 
MULTILIB_REUSE to map the two names to a single multilib.


rohan:1030$ ./xgcc -B./ -march=rv64imafdc -mabi=lp64d --print-libgcc
./rv64imafdc/lp64d/libgcc.a
rohan:1031$ ./xgcc -B./ -march=rv64gc -mabi=lp64d --print-libgcc
./rv64imafdc/lp64d/libgcc.a
rohan:1032$ ./xgcc -B./ --print-libgcc
./libgcc.a
rohan:1033$

So this is working right when the -march option is given, but not when 
no -march is given.  I'd suggest a bug report so I can track this, if 
you haven't already filed one.



Most variants include the C extension. Would it be possible to add -march=rv32g 
and -march=rv64g variants?


The expectation is that most implementations will include the C 
extension.  It reduces code size, improves performance, and I think I 
read somewhere that it takes only 400 gates to implement.


It isn't practical to try to support every possible combination of 
architecture and ABI here, as there are too many possible combinations. 
But if there is a major RISC-V target that is rv32g or rv64g then we 
should consider it.  You can of course define your own set of multilibs.


Jim



Re: RISC-V problem with weak function references and -mcmodel=medany

2018-05-29 Thread Jim Wilson

On 05/28/2018 06:32 AM, Sebastian Huber wrote:
I guess, that the resolution of the weak reference to the undefined 
symbol __deregister_frame_info somehow sets __deregister_frame_info to 
the absolute address 0 which is illegal in the following "call 
__deregister_frame_info"? Is this construct with weak references and a 
-mcmodel=medany supported on RISC-V at all?


Yes.  It works for me.  Given a simple testcase

extern void *__deregister_frame_info (const void *)
  __attribute__ ((weak));
void * foo;
int
main (void)
{
  if (__deregister_frame_info)
    __deregister_frame_info (foo);
  return 0;
}

and compiling with -mcmodel=medany -O -Ttext=0x8000, I get

    8158:   8097        auipc   ra,0x8
    815c:   ea8080e7    jalr    -344(ra) # 0 <_start-0x8000>


for the weak call.  It isn't clear what you are doing differently.

Jim


Adding a libgcc file

2018-05-29 Thread Paul Koning
Question about proper target maintainer procedures...

The pdp11 target needs udivhi3 in libgcc.  There's udivsi3, and it's really 
easy to tweak those files for HImode.  And that works.
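
For readers who have not looked at those files, the following is a rough sketch of what a 16-bit shift-and-subtract division could look like. It is not the actual patch: the names are hypothetical, and the loop structure is only modeled on how I understand the generic SImode helper in libgcc to work.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-subtract division on 16-bit operands.  Returns the
   quotient, or the remainder when WANT_REMAINDER is nonzero.
   Hypothetical name; assumes DEN is nonzero.  */
uint16_t
udivmodhi4_sketch (uint16_t num, uint16_t den, int want_remainder)
{
  uint16_t bit = 1;
  uint16_t res = 0;

  assert (den != 0);

  /* Align the divisor with the dividend, stopping before the
     divisor's top bit would be shifted out.  */
  while (den < num && bit && !(den & 0x8000u))
    {
      den = (uint16_t) (den << 1);
      bit = (uint16_t) (bit << 1);
    }

  /* Subtract the shifted divisor wherever it still fits.  */
  while (bit)
    {
      if (num >= den)
	{
	  num = (uint16_t) (num - den);
	  res |= bit;
	}
      bit >>= 1;
      den >>= 1;
    }

  return want_remainder ? num : res;
}

/* Quotient-only wrapper, in the spirit of a udivhi3 entry point.  */
uint16_t
udivhi3_sketch (uint16_t a, uint16_t b)
{
  return udivmodhi4_sketch (a, b, 0);
}
```

The only width-specific parts are the operand type and the top-bit mask, which is why the routine is not really tied to one target.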

Should I add the HI files to the libgcc directory, or under config/pdp11?  
There's nothing target specific about them, though I don't know of other 
targets that might want this.

And would this change fall under target maintainer write privileges, or should 
I get the patch reviewed first?

paul



Re: RISC-V problem with weak function references and -mcmodel=medany

2018-05-29 Thread Jim Wilson

On 05/29/2018 04:19 AM, Sebastian Huber wrote:

Changing the code to something like this

void f(void) __attribute__((__weak__));

void _start(void)
{
    void (*g)(void) = f;

    if (g != 0) {
        (*g)();
    }
}


This testcase works for me also, using -mcmodel=medany -O tmp.c 
-Ttext=0x8000 -nostdlib -nostartfiles.


I need enough info to reproduce your problem in order to look at it.

One thing you can try is adding -Wl,--noinhibit-exec, which will produce 
an executable even though there was a linker error, and then you can 
disassemble the binary to see what you have for the weak call.  That 
might give a clue as to what is wrong.



Why doesn't the RISC-V port generate trampoline code to call far functions?


RISC-V is a new target.  The answer to questions like this is that we 
haven't needed it yet, and hence haven't implemented it yet.  But I 
don't see any need for trampolines to support a call to 0.  We can reach 
anywhere in the low 32-bit address space with auipc/jalr.  We can also 
use zero-relative addressing via the x0 register if necessary.  We 
already have some linker relaxation support for that, but it doesn't 
seem to be triggering for this testcase.


Jim


Re: RISC-V problem with weak function references and -mcmodel=medany

2018-05-29 Thread Sebastian Huber
Hello Jim,

On 29 May 2018 at 20:27, Jim Wilson j...@sifive.com wrote:

> On 05/28/2018 06:32 AM, Sebastian Huber wrote:
>> I guess, that the resolution of the weak reference to the undefined
>> symbol __deregister_frame_info somehow sets __deregister_frame_info to
>> the absolute address 0 which is illegal in the following "call
>> __deregister_frame_info"? Is this construct with weak references and a
>> -mcmodel=medany supported on RISC-V at all?
> 
> Yes.  It works for me.  Given a simple testcase
> 
> extern void *__deregister_frame_info (const void *)
>  __attribute__ ((weak));
> void * foo;
> int
> main (void)
> {
>   if (__deregister_frame_info)
>     __deregister_frame_info (foo);
>   return 0;
> }
> 
> and compiling with -mcmodel=medany -O -Ttext=0x8000, I get

would you mind trying this with -Ttext=0x9000?

Please have a look at:

https://sourceware.org/bugzilla/show_bug.cgi?id=23244

https://sourceware.org/ml/binutils/2018-05/msg00296.html


Re: RISC-V problem with weak function references and -mcmodel=medany

2018-05-29 Thread Jim Wilson
On Tue, May 29, 2018 at 11:43 AM, Sebastian Huber wrote:
> would you mind trying this with -Ttext=0x9000?

This gives me for the weak call

9014: 7097  auipc ra,0x7
9018: fec080e7  jalr -20(ra) # 0 <__global_pointer$+0x6fffe7d4>

> Please have a look at:
> https://sourceware.org/bugzilla/show_bug.cgi?id=23244
> https://sourceware.org/ml/binutils/2018-05/msg00296.html

OK.  I'm still catching up on mailing lists after the US holiday weekend.

Jim


Project Ranger

2018-05-29 Thread Andrew MacLeod
I'd like to introduce a project we've been working on for the past year
and a half.


The original project goal was to see if we could derive accurate range
information from the IL without requiring much effort on the client
side. The idea being that a pass could simply ask "what is the range of
this ssa_name on this statement?" and the compiler would go figure it out.


After lots of experimenting and prototyping the project evolved into 
what we are introducing here. I call it the Ranger.


Existing range infrastructure in the compiler works from the top down.
It walks through the IL computing all ranges and propagates these values
forward in case they are needed.  For the most part, other passes are
required to either use global information, or process things in
dominator order and work in lockstep with EVRP to get more
context-sensitive ranges.


The Ranger's model is purely on-demand, and designed to have minimal
overhead.  When a range is requested, the Ranger walks backwards
through use-def chains to determine what ranges it can find relative to
the name being requested.  This means it only looks at statements which
are deemed necessary to evaluate a range.  This can result in some
significant speedups when a pass is only interested in a few specific
cases, as is demonstrated in some of the pass conversions we have
performed.  We have also implemented a "quick and dirty" vrp-like pass
using the ranger to demonstrate that it can also handle much heavier
duty range work and still perform well.
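
To illustrate the on-demand idea only, here is a toy model in plain C. None of it is Ranger code: the definition kinds, struct ssa_def and query_range are all invented for this sketch, overflow is ignored, and the real implementation handles control flow and many more statement forms. A query follows the use-def chain backwards from the requested name and never looks at definitions that do not feed it.

```c
#include <assert.h>

/* Toy definition kinds; invented for this sketch.  */
enum def_kind { DEF_INPUT, DEF_CONST, DEF_ADD_CONST };

struct ssa_def
{
  enum def_kind kind;
  int operand;	/* DEF_ADD_CONST: index of the name being added to.  */
  long value;	/* DEF_CONST: the constant; DEF_ADD_CONST: the addend.  */
  long lo, hi;	/* DEF_INPUT: bounds known for an incoming value.  */
};

struct value_range
{
  long lo, hi;
};

/* Compute the range of NAME by walking its use-def chain backwards.
   *VISITED counts how many definitions were examined.  */
struct value_range
query_range (const struct ssa_def *defs, int name, int *visited)
{
  struct value_range r = { 0, 0 };

  assert (defs != 0 && visited != 0 && name >= 0);
  const struct ssa_def *d = &defs[name];
  ++*visited;

  switch (d->kind)
    {
    case DEF_INPUT:
      r.lo = d->lo;
      r.hi = d->hi;
      break;
    case DEF_CONST:
      r.lo = r.hi = d->value;
      break;
    case DEF_ADD_CONST:
      /* Only the operand's definition is needed; recurse into it.  */
      r = query_range (defs, d->operand, visited);
      r.lo += d->value;
      r.hi += d->value;
      break;
    }
  return r;
}
```

With four definitions where only two feed the queried name, the visit counter ends at two, which is the property the on-demand model relies on.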


The code is located on an svn branch, ssa-range.  It is based on trunk
at revision 259405, circa mid-April 2018.  The branch currently
bootstraps with no regressions.  The top-level ranger class is called
'path_ranger' and is found in the file ssa-range.h.  It has 4 primary APIs:


 * bool path_range_edge (irange& r, tree name, edge e);
 * bool path_range_entry (irange& r, tree name, basic_block bb);
 * bool path_range_on_stmt (irange& r, tree name, gimple *g);
 * bool path_range_stmt (irange& r, gimple *g);

This allows queries for a range on an edge, on entry to a block, as an
operand on a specific statement, or to calculate the range of the
result of a statement.  There are no prerequisites to use it; simply
create a path ranger and start using the API.  There is even a
lightweight function available which can be called without any
knowledge of the ranger:


   static inline bool
   on_demand_get_range_on_stmt (irange &r, tree ssa, gimple *stmt)
   {
     path_ranger ranger;
     return ranger.path_range_on_stmt (r, ssa, stmt);
   }

The Ranger consists of 3 primary components:

 * range.[ch] - A new range representation based purely on wide-int,
   which allows ranges to consist of multiple non-overlapping sub-ranges.
 * range-op.[ch] - Implements centralized tree-code operations on the
   irange class (allowing adding, masking, multiplying, etc).
 * ssa-range*.[ch]  - Files containing a set of classes which implement
   the Ranger.
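
As a rough picture of what "multiple non-overlapping sub-ranges" means, here is a toy version in plain C. It is not the irange class: it uses long where the real code uses wide-int, a small fixed number of pairs, and invented names throughout.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAIRS 4

/* A range made of up to MAX_PAIRS sorted, non-overlapping
   [lo, hi] pairs.  Invented for this sketch.  */
struct multi_range
{
  size_t count;
  long lo[MAX_PAIRS];
  long hi[MAX_PAIRS];
};

/* Return nonzero if V lies inside any sub-range of R.  */
int
multi_range_contains (const struct multi_range *r, long v)
{
  assert (r != NULL && r->count <= MAX_PAIRS);
  for (size_t i = 0; i < r->count; i++)
    if (v >= r->lo[i] && v <= r->hi[i])
      return 1;
  return 0;
}

/* Intersect every sub-range of R with [LO, HI], dropping the
   pairs that become empty.  */
void
multi_range_clamp (struct multi_range *r, long lo, long hi)
{
  size_t kept = 0;

  assert (r != NULL && r->count <= MAX_PAIRS && lo <= hi);
  for (size_t i = 0; i < r->count; i++)
    {
      long new_lo = r->lo[i] > lo ? r->lo[i] : lo;
      long new_hi = r->hi[i] < hi ? r->hi[i] : hi;
      if (new_lo <= new_hi)
	{
	  r->lo[kept] = new_lo;
	  r->hi[kept] = new_hi;
	  kept++;
	}
    }
  r->count = kept;
}
```

A single [lo, hi] pair could not represent something like "0 to 5 or 20 to 30" without also admitting everything in between, which is the precision the multi-pair form keeps.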

We have set up a project page on the wiki which contains documentation 
for the approach as well as some converted pass info and a to-do list here:


https://gcc.gnu.org/wiki/AndrewMacLeod/Ranger

We would like to include the ranger in GCC for this release, and realize
there are still numerous things to be done before it's ready for
integration. It has been in prototype mode until now, so we have not
prepared the code for a merge yet.  No real performance analysis has 
been done on it either, but there is an integration page where you will 
find information about the 4 passes that have been converted to use the 
Ranger and the performance of those:


https://gcc.gnu.org/wiki/AndrewMacLeod/RangerIntegration

One of the primary tasks over the next month or two is to improve the
sharing of operation code between the VRPs and the Ranger. We haven't
done a very good job of that so far.  This is included along with a
list of known issues we need to look at on the to-do page:


https://gcc.gnu.org/wiki/AndrewMacLeod/RangerToDo

The Ranger is far enough along now that we have confidence in both its 
approach and ability to perform, and would like to solicit feedback on 
what you think of it,  any questions, possible uses,  as well as 
potential requirements to integrate with trunk later this stage.


Please visit the project page and have a look.  We've put as much 
documentation, comments, and to-dos there as we could think of.  We will 
try to keep it up-to-date.


Andrew, Aldy and Jeff