Re: Enabling -ftree-slp-vectorize on -O2/Os
On Mon, May 28, 2018 at 5:50 PM Allan Sandfeld Jensen wrote:
> On Montag, 28. Mai 2018 12:58:20 CEST Richard Biener wrote:
> > compile-time effects of the patch on that. Embedded folks may want to run
> > their favorite benchmark and report results as well.
> >
> > So I did a -O2 -march=haswell [-ftree-slp-vectorize] SPEC CPU 2006 compile
> > and run and the compile-time effect where measurable (SPEC records on a
> > second granularity) is within one second per benchmark apart from
> > 410.bwaves (from 3s to 5s) and 481.wrf (76s to 78s). Performance-wise I
> > notice significant slowdowns for SPEC FP and some for SPEC INT (I only did
> > a train run so far). I'll re-run with ref input now and will post those
> > numbers.
> >
> If you continue to see slowdowns, could you check with either no avx, or with
> -mprefer-avx128? The occasional AVX256 instructions might be downclocking the
> CPU. But yes, that would be a problem for this change on its own.

So here's a complete two-run with ref input; peak is -O2 -march=haswell
-ftree-slp-vectorize. It confirms the slowdowns in SPEC FP but not in SPEC INT.
You are right that using AVX256 (or AVX512) might be problematic on its own,
but that is not restricted to -O2 -ftree-slp-vectorize - it affects -O3 as
well. I will re-benchmark the SPEC FP part with -mprefer-avx128 to see if that
is the issue. Note I did not use any -ffast-math flags in the experiment -
those are as "unlikely" as using -march=native together with -O2. In theory
another issue is the ability to debug code.

                Base    Base      Base     Peak    Peak      Peak
Benchmarks      Ref.  Run Time   Ratio     Ref.  Run Time   Ratio
-------------- ------ -------- --------- ------ -------- ---------
410.bwaves      13590      362     37.5 *  13590      370     36.7 *
410.bwaves      13590      365     37.2 S  13590      377     36.0 S
416.gamess      19580      558     35.1 *  19580      598     32.7 *
416.gamess      19580      560     35.0 S  19580      600     32.6 S
433.milc         9180      331     27.8 S   9180      374     24.6 *
433.milc         9180      331     27.8 *   9180      383     24.0 S
434.zeusmp       9100      301     30.2 S   9100      301     30.2 *
434.zeusmp       9100      301     30.2 *   9100      302     30.1 S
435.gromacs      7140      300     23.8 S   7140      303     23.6 S
435.gromacs      7140      298     23.9 *   7140      301     23.8 *
436.cactusADM   11950      495     24.1 S  11950      482     24.8 *
436.cactusADM   11950      486     24.6 *  11950      484     24.7 S
437.leslie3d     9400      289     32.5 *   9400      288     32.6 *
437.leslie3d     9400      301     31.3 S   9400      289     32.5 S
444.namd         8020      301     26.6 *   8020      301     26.6 *
444.namd         8020      301     26.6 S   8020      301     26.6 S
447.dealII      11440      255     44.9 *  11440      252     45.3 *
447.dealII      11440      255     44.9 S  11440      253     45.3 S
450.soplex       8340      212     39.4 S   8340      213     39.1 S
450.soplex       8340      211     39.5 *   8340      211     39.5 *
453.povray       5320      111     47.9 S   5320      113     47.0 S
453.povray       5320      111     48.0 *   5320      113     47.2 *
454.calculix     8250      748     11.0 *   8250      835     9.88 *
454.calculix     8250      748     11.0 S   8250      835     9.88 S
459.GemsFDTD    10610      324     32.8 S  10610      324     32.8 S
459.GemsFDTD    10610      323     32.9 *  10610      323     32.9 *
465.tonto        9840      449     21.9 S   9840      469     21.0 *
465.tonto        9840      446     22.0 *   9840      469     21.0 S
470.lbm         13740      253     54.3 *  13740      255     53.9 S
470.lbm         13740      253     54.2 S  13740      254     54.2 *
481.wrf         11170      415     26.9 *  11170      416     26.9 S
481.wrf         11170      417     26.8 S  11170      416     26.9 *
482.sphinx3     19490      456     42.7 *  19490      465     41.9 *
482.sphinx3     19490      464     42.0 S  19490      468     41.6 S

                Base    Base      Base     Peak    Peak      Peak
Benchmarks      Ref.  Run Time   Ratio     Ref.  Run Time   Ratio
-------------- ------ -------- --------- ------ -------- ---------
400.perlbench    9770      251     38.9 S   9770      252     38.8 S
400.perlbench    9770      250     39.1 *   9770      251     39.0 *
401.bzip2        9650      399     24.2 S   9650      397     24.3 S
401.bzip2        9650      395     24.4 *   9650      395     24.4 *
403.gcc
Re: not computable at load time
On Mon, May 28, 2018 at 8:34 PM Paul Koning wrote:
> > On May 28, 2018, at 12:03 PM, Richard Biener wrote:
> >
> > On May 28, 2018 12:45:04 PM GMT+02:00, Andreas Schwab wrote:
> >> On Mai 28 2018, Richard Biener wrote:
> >>
> >>> It means there's no relocation that can express the result of
> >>> '&s.f - &s.b' and the frontend doesn't consider this a constant
> >>> expression (likely because of the conversion).
> >>
> >> Shouldn't the frontend notice that &s.f - &s.b by itself is a constant?
> >
> > Sure - the question is whether it is required to and why it doesn't.
>
> This is a test case in the C torture test suite. The only reason I can see
> for it being there is to verify that GCC resolves this as a compile-time
> constant.
>
> The issue can be masked by changing the "long" in that test case to a
> ptrdiff_t, which eliminates the conversion. Should I do that? It would make
> the test pass, at the expense of masking this glitch.
>
> By the way, I get the same error if I change the "long" to a "long long"
> and then compile for 32-bit Intel.

The testcase dates back to some repository creation rev. (egcs?) and I'm not
sure we may compute the difference of addresses of structure members. So that
GCC accepts this is probably not required. Joseph may have a definitive
answer here.

It might be a "regression" with the POINTER_DIFF_EXPR introduction. You can
debug this with gdb when you break on 'pointer_diff'. For me on x86_64 this
builds a POINTER_DIFF_EXPR: (char *) &s.f - &s.b of ptrdiff_t. That a
conversion breaks the simplification tells us that somewhere we possibly fail
to simplify it (maybe even during assembling).

You might want to file a bug for the 'long long' issue.

Richard.

> paul
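[Editorial note: a minimal sketch of the kind of construct under discussion -
illustrative only, not the actual torture test. It shows a difference of
member addresses used as a static initializer, with a conversion from
ptrdiff_t to long in the way; whether this must be folded to a load-time
constant is exactly the open question above.]

struct S { char a, b, c, d, e, f; };
struct S s;

/* &s.f - &s.b has type ptrdiff_t; the implicit conversion to long is
   what the thread suggests may defeat the constant folding.  */
static long d = &s.f - &s.b;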
Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
Jeff Law writes:
> Now that we're in stage1 I do want to revisit the CLOBBER_HIGH stuff. When
> we left things I think we were trying to decide between CLOBBER_HIGH and
> clobbering the appropriate subreg. The problem with the latter is the
> dataflow we compute is inaccurate (overly pessimistic) so that'd have to be
> fixed.

The clobbered part of the register in this case is a high-part subreg, which
is ill-formed for single registers. It would also be difficult to represent
in terms of the mode, since there are no defined modes for what can be stored
in the high part of an SVE register. For 128-bit SVE that mode would have
zero bits. :-)

I thought the alternative suggestion was instead to have:

  (set (reg:M X) (reg:M X))

when X is preserved in mode M but not in wider modes. But that seems like too
much of a special case to me, both in terms of the source and the destination:

- On the destination side, a SET normally provides something for later
  instructions to use, whereas here the effect is intended to be the
  opposite: the instruction has no effect at all on a value of mode M in X.
  As you say, this would pessimise df without specific handling. But I think
  all optimisations that look for the definition of a value would need to be
  taught to "look through" this set to find the real definition of (reg:M X)
  (or any value of a mode no larger than M in X). Very few passes use the df
  def-use chains for this due to its high cost.

- On the source side, the instruction doesn't actually care what's in X, but
  nevertheless appears to use it. This means that most passes would need to
  be taught that a use of X on the rhs of a no-op SET is special and should
  usually be ignored.

More fundamentally, it should be possible in RTL to express an instruction J
that *does* read X in mode M and clobbers its high part. If we use the SET
above to represent the clobber, and treat the rhs use as special, then
presumably J would need two uses of X, one "dummy" one on the no-op SET and
one "real" one on some other SET (or perhaps in a top-level USE). Having the
number of uses determine this seems a bit awkward.

IMO CLOBBER and SET have different semantics for good reason: CLOBBER
represents an optimisation barrier for things that care about the value of a
certain rtx object, while SET represents a productive effect or side-effect.
The effect we want here is the same as a normal clobber, except that the
clobber is mode-dependent.

Thanks,
Richard
Re: RISC-V problem with weak function references and -mcmodel=medany
Changing the code to something like this

void f(void) __attribute__((__weak__));

void _start(void)
{
  void (*g)(void) = f;
  if (g != 0) {
    (*g)();
  }
}

doesn't work either, since this is optimized to

	.option	nopic
	.text
	.align	1
	.globl	_start
	.type	_start, @function
_start:
	lla	a5,f
	beqz	a5,.L1
	tail	f
.L1:
	ret
	.size	_start, .-_start
	.weak	f

Why doesn't RISC-V generate trampoline code to call far functions?

The non-optimized example code with "tail f" replaced by "jalr a5" links well:

	.option	nopic
	.text
	.align	1
	.globl	_start
	.type	_start, @function
_start:
	addi	sp,sp,-32
	sd	ra,24(sp)
	sd	s0,16(sp)
	addi	s0,sp,32
	lla	a5,f
	sd	a5,-24(s0)
	ld	a5,-24(s0)
	beqz	a5,.L3
	ld	a5,-24(s0)
	jalr	a5
.L3:
	nop
	ld	ra,24(sp)
	ld	s0,16(sp)
	addi	sp,sp,32
	jr	ra
	.size	_start, .-_start
	.weak	f

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.
Re: not computable at load time
> On May 29, 2018, at 5:49 AM, Richard Biener wrote:
> ...
> It might be a "regression" with the POINTER_DIFF_EXPR introduction. You can
> debug this with gdb when you break on 'pointer_diff'. For me on x86_64 this
> builds a POINTER_DIFF_EXPR: (char *) &s.f - &s.b of ptrdiff_t. That a
> conversion breaks the simplification tells us that somewhere we possibly
> fail to simplify it (maybe even during assembling).
>
> You might want to file a bug for the 'long long' issue.

Done, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85974

paul
Re: Enabling -ftree-slp-vectorize on -O2/Os
On Tue, May 29, 2018 at 11:32 AM Richard Biener wrote:
> On Mon, May 28, 2018 at 5:50 PM Allan Sandfeld Jensen wrote:
> > On Montag, 28. Mai 2018 12:58:20 CEST Richard Biener wrote:
> > > compile-time effects of the patch on that. Embedded folks may want to
> > > run their favorite benchmark and report results as well.
> > >
> > > So I did a -O2 -march=haswell [-ftree-slp-vectorize] SPEC CPU 2006
> > > compile and run and the compile-time effect where measurable (SPEC
> > > records on a second granularity) is within one second per benchmark
> > > apart from 410.bwaves (from 3s to 5s) and 481.wrf (76s to 78s).
> > > Performance-wise I notice significant slowdowns for SPEC FP and some
> > > for SPEC INT (I only did a train run so far). I'll re-run with ref
> > > input now and will post those numbers.
> > >
> > If you continue to see slowdowns, could you check with either no avx, or
> > with -mprefer-avx128? The occasional AVX256 instructions might be
> > downclocking the CPU. But yes, that would be a problem for this change on
> > its own.
>
> So here's a complete two-run with ref input; peak is -O2 -march=haswell
> -ftree-slp-vectorize. It confirms the slowdowns in SPEC FP but not in SPEC
> INT. You are right that using AVX256 (or AVX512) might be problematic on
> its own, but that is not restricted to -O2 -ftree-slp-vectorize - it
> affects -O3 as well. I will re-benchmark the SPEC FP part with
> -mprefer-avx128 to see if that is the issue. Note I did not use any
> -ffast-math flags in the experiment - those are as "unlikely" as using
> -march=native together with -O2. In theory another issue is the ability to
> debug code.
>
> [SPEC CPU 2006 results table quoted from the previous message snipped]

Numbers with -mprefer-avx128:

                Base    Base      Base     Peak    Peak      Peak
Benchmarks      Ref.  Run Time   Ratio     Ref.  Run Time   Ratio
-------------- ------ -------- --------- ------ -------- ---------
Re: Enabling -ftree-slp-vectorize on -O2/Os
On Dienstag, 29. Mai 2018 16:57:56 CEST Richard Biener wrote:
>
> so the situation improves but isn't fully fixed (STLF issues maybe?)
>

That raises the question if it helps in these cases even in -O3? Anyway it
doesn't look good for it. Did the binary size at least improve with
prefer-avx128, or was that also worse or insignificant?

'Allan
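[Editorial note: for readers unfamiliar with basic-block SLP, here is a toy
example - illustrative only, not taken from SPEC - of the straight-line code
that -ftree-slp-vectorize targets at -O2. With -march=haswell the four
independent additions can be merged into a single 256-bit vector add, or into
two 128-bit adds under -mprefer-avx128.]

void
add4 (double *restrict a, const double *restrict b)
{
  /* Four independent scalar operations in one basic block; the SLP
     vectorizer can combine them into vector operations.  */
  a[0] += b[0];
  a[1] += b[1];
  a[2] += b[2];
  a[3] += b[3];
}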
[GSOC] LTO dump tool project
Hi,

My exams have finally ended and I have started working on the GSOC project.
I have forked the GCC mirror (https://github.com/hrisearch/gcc) and created
an option for dumping functions and variables used in the IL. Please find
the patch attached herewith.

Regards,
Hrishikesh

diff --git a/gcc/lto/lang.opt b/gcc/lto/lang.opt
index 1083f9b..ae66c06 100644
--- a/gcc/lto/lang.opt
+++ b/gcc/lto/lang.opt
@@ -66,7 +66,11 @@ Whole program analysis (WPA) mode with number of parallel jobs specified.
 
 fdump
 LTO Var(flag_lto_dump)
-Call the dump function
+Call the dump function.
+
+fdump-lto-list
+LTO Var(flag_lto_dump_list)
+Call the dump function for variables and function in IL.
 
 fresolution=
 LTO Joined
diff --git a/gcc/lto/lto-dump.c b/gcc/lto/lto-dump.c
index b6a8b45..5e4d069 100644
--- a/gcc/lto/lto-dump.c
+++ b/gcc/lto/lto-dump.c
@@ -38,4 +38,21 @@ along with GCC; see the file COPYING3.  If not see
 void dump()
 {
   fprintf(stderr, "\nHello World!\n");
+}
+
+void dump_list()
+{
+
+  fprintf (stderr, "Call Graph:\n\n");
+  cgraph_node *cnode;
+  FOR_EACH_FUNCTION (cnode)
+    cnode->dump (stderr);
+  fprintf (stderr, "\n\n");
+
+  fprintf (stderr, "Varpool:\n\n");
+  varpool_node *vnode;
+  FOR_EACH_VARIABLE (vnode)
+    vnode->dump (stderr);
+  fprintf (stderr, "\n\n");
+
+}
\ No newline at end of file
diff --git a/gcc/lto/lto-dump.h b/gcc/lto/lto-dump.h
index 4a06217..5ee71c6 100644
--- a/gcc/lto/lto-dump.h
+++ b/gcc/lto/lto-dump.h
@@ -21,5 +21,6 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_LTO_DUMP_H_
 
 void dump();
+void dump_list();
 
 #endif
\ No newline at end of file
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 9c79242..93ef52b 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -3360,6 +3360,11 @@ lto_main (void)
       dump();
     }
 
+  if (flag_lto_dump_list)
+    {
+      dump_list();
+    }
+
   timevar_stop (TV_PHASE_STREAM_IN);
 
   if (!seen_error ())
Re: [GSOC] LTO dump tool project
On 29 May 2018 at 22:33, Hrishikesh Kulkarni wrote:
> Hi,
>
> My exams have finally ended and I have started working on the GSOC project.
> I have forked the GCC mirror (https://github.com/hrisearch/gcc) and created
> an option for dumping functions and variables used in the IL. Please find
> the patch attached herewith.

diff --git a/gcc/lto/lang.opt b/gcc/lto/lang.opt
index 1083f9b..ae66c06 100644
--- a/gcc/lto/lang.opt
+++ b/gcc/lto/lang.opt
@@ -66,7 +66,11 @@ Whole program analysis (WPA) mode with number of parallel jobs specified.
 
 fdump
 LTO Var(flag_lto_dump)
-Call the dump function
+Call the dump function.
+
+fdump-lto-list
+LTO Var(flag_lto_dump_list)
+Call the dump function for variables and function in IL.

Instead of making separate options -fdump and -fdump-lto-list, would it be a
good idea to make it a "sub option" to -fdump, like lto1 -fdump,-l, which
would list all symbols within the LTO object file ?

 fresolution=
 LTO Joined
diff --git a/gcc/lto/lto-dump.c b/gcc/lto/lto-dump.c
index b6a8b45..5e4d069 100644
--- a/gcc/lto/lto-dump.c
+++ b/gcc/lto/lto-dump.c
@@ -38,4 +38,21 @@ along with GCC; see the file COPYING3.  If not see
 void dump()
 {
   fprintf(stderr, "\nHello World!\n");
+}
+
+void dump_list()
+{
+
+  fprintf (stderr, "Call Graph:\n\n");
+  cgraph_node *cnode;
+  FOR_EACH_FUNCTION (cnode)
+    cnode->dump (stderr);
+  fprintf (stderr, "\n\n");
+
+  fprintf (stderr, "Varpool:\n\n");
+  varpool_node *vnode;
+  FOR_EACH_VARIABLE (vnode)
+    vnode->dump (stderr);
+  fprintf (stderr, "\n\n");
+
+}
\ No newline at end of file

Formatting nit - Add comments for the newly added functions.

diff --git a/gcc/lto/lto-dump.h b/gcc/lto/lto-dump.h
index 4a06217..5ee71c6 100644
--- a/gcc/lto/lto-dump.h
+++ b/gcc/lto/lto-dump.h
@@ -21,5 +21,6 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_LTO_DUMP_H_
 
 void dump();
+void dump_list();
 
 #endif
\ No newline at end of file
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 9c79242..93ef52b 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -3360,6 +3360,11 @@ lto_main (void)
       dump();
     }
 
+  if (flag_lto_dump_list)
+    {
+      dump_list();
+    }
+

Formatting nit - Avoid braces for single statement within if.

Shouldn't fdump-lto-list be enabled only if fdump is enabled ?

Thanks,
Prathamesh

   timevar_stop (TV_PHASE_STREAM_IN);
 
   if (!seen_error ())
>
> Regards,
> Hrishikesh
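[Editorial note: to spell out the brace style nit raised above, here is a
small sketch of the usual GNU convention - not part of the patch itself.]

  /* GNU style: no braces around a single statement.  */
  if (flag_lto_dump_list)
    dump_list ();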
Re: [GSOC] LTO dump tool project
On 05/29/2018 07:03 PM, Hrishikesh Kulkarni wrote:
> Hi,
>
> My exams have finally ended and I have started working on the GSOC project.
> I have forked the GCC mirror (https://github.com/hrisearch/gcc) and created
> an option for dumping functions and variables used in the IL. Please find
> the patch attached herewith.

Hello.

Good start. You branched the repository but you forgot to push the commit you
sent as an attachment. The second issue is that the patch is not against GCC
trunk, but against your local branch. Thus one can't apply it.

About the options:

- once you send a new functionality, it's fine to paste a sample output
- for now I would remove the dummy flag_lto_dump flag
- I would expect for -fdump-lto-list something like what nm does:

$ nm main.o
T main
T mystring
C pole

Then of course you can add some level of verbosity which can print what you
have. It would also be handy over time to come up with some sorting, but that
can wait.

That said, the direction is fine. Please carry on.

Thanks,
Martin

> Regards,
> Hrishikesh
Re: [GSOC] LTO dump tool project
On 05/29/2018 07:17 PM, Prathamesh Kulkarni wrote:
> Shouldn't fdump-lto-list be enabled only if fdump is enabled ?

The option is dummy, and eventually all the options will be moved to a
separate tool called lto-dump. Thus all the prefixed '-fdump-lto-foo' options
will be replaced with '-foo', I guess.

Martin
Re: [GSOC] LTO dump tool project
On 05/29/2018 07:38 PM, Martin Liška wrote:
> $ nm main.o
> T main
> T mystring
> C pole

Or we can be inspired by readelf:

$ readelf -s a.out
[snip]
Symbol table '.symtab' contains 74 entries:
   Num:    Value   Size Type    Bind   Vis      Ndx Name
    66: 00601250      0 NOTYPE  GLOBAL DEFAULT   24 _end
    67: 004004b0     43 FUNC    GLOBAL DEFAULT   13 _start
    68: 00601038      0 NOTYPE  GLOBAL DEFAULT   24 __bss_start
    69: 00400582     70 FUNC    GLOBAL DEFAULT   13 main
    70:        0      0 FUNC    GLOBAL DEFAULT  UND fwrite@@GLIBC_2.2.5

Martin
Re: RISC-V ELF multilibs
On 05/26/2018 06:04 AM, Sebastian Huber wrote:
> Why is the default multilib and a variant identical?

This is supposed to be a single multilib, with two names. We use
MULTILIB_REUSE to map the two names to a single multilib.

rohan:1030$ ./xgcc -B./ -march=rv64imafdc -mabi=lp64d --print-libgcc
./rv64imafdc/lp64d/libgcc.a
rohan:1031$ ./xgcc -B./ -march=rv64gc -mabi=lp64d --print-libgcc
./rv64imafdc/lp64d/libgcc.a
rohan:1032$ ./xgcc -B./ --print-libgcc
./libgcc.a
rohan:1033$

So this is working right when the -march option is given, but not when no
-march is given. I'd suggest a bug report so I can track this, if you haven't
already filed one.

> Most variants include the C extension. Would it be possible to add
> -march=rv32g and -march=rv64g variants?

The expectation is that most implementations will include the C extension.
It reduces code size, improves performance, and I think I read somewhere that
it takes only 400 gates to implement.

It isn't practical to try to support every possible combination of
architecture and ABI here, as there are too many possible combinations. But
if there is a major RISC-V target that is rv32g or rv64g then we should
consider it.

You can of course define your own set of multilibs.

Jim
Re: RISC-V problem with weak function references and -mcmodel=medany
On 05/28/2018 06:32 AM, Sebastian Huber wrote:
> I guess, that the resolution of the weak reference to the undefined symbol
> __deregister_frame_info somehow sets __deregister_frame_info to the
> absolute address 0 which is illegal in the following "call
> __deregister_frame_info"? Is this construct with weak references and a
> -mcmodel=medany supported on RISC-V at all?

Yes. It works for me. Given a simple testcase

extern void *__deregister_frame_info (const void *) __attribute__ ((weak));
void * foo;

int
main (void)
{
  if (__deregister_frame_info)
    __deregister_frame_info (foo);
  return 0;
}

and compiling with -mcmodel=medany -O -Ttext=0x8000, I get

    8158:	8097			auipc	ra,0x8
    815c:	ea8080e7		jalr	-344(ra) # 0 <_start-0x8000>

for the weak call. It isn't clear what you are doing differently.

Jim
Adding a libgcc file
Question about proper target maintainer procedures... The pdp11 target needs udivhi3 in libgcc. There's udivsi3, and it's really easy to tweak those files for HImode. And that works. Should I add the HI files to the libgcc directory, or under config/pdp11? There's nothing target specific about them, though I don't know of other targets that might want this. And would this change fall under target maintainer write privileges, or should I get the patch reviewed first? paul
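[Editorial note: for context, the missing routine has roughly the following
shape. This is an illustrative sketch only - the real libgcc version would be
produced by adapting the udivsi3 sources as described above, not written like
this. The type and function names follow the usual libgcc conventions but are
assumptions here.]

/* Naive HImode (16-bit) unsigned division.  Behaviour on division by zero
   is undefined, as for the other libgcc division routines.  */
typedef unsigned int UHItype __attribute__ ((mode (HI)));

UHItype
__udivhi3 (UHItype num, UHItype den)
{
  UHItype quot = 0;

  /* Repeated subtraction; correct but slow, for illustration only.  */
  while (num >= den)
    {
      num -= den;
      quot++;
    }
  return quot;
}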
Re: RISC-V problem with weak function references and -mcmodel=medany
On 05/29/2018 04:19 AM, Sebastian Huber wrote:
> Changing the code to something like this
>
> void f(void) __attribute__((__weak__));
>
> void _start(void)
> {
>   void (*g)(void) = f;
>   if (g != 0) {
>     (*g)();
>   }
> }

This testcase works for me also, using -mcmodel=medany -O tmp.c
-Ttext=0x8000 -nostdlib -nostartfiles. I need enough info to reproduce your
problem in order to look at it.

One thing you can try is adding -Wl,--noinhibit-exec, which will produce an
executable even though there was a linker error, and then you can disassemble
the binary to see what you have for the weak call. That might give a clue as
to what is wrong.

> Why doesn't the RISC-V generate a trampoline code to call far functions?

RISC-V is a new target. The answer to questions like this is that we haven't
needed it yet, and hence haven't implemented it yet.

But I don't see any need for trampolines to support a call to 0. We can reach
anywhere in the low 32-bit address space with auipc/jalr. We can also use
zero-relative addressing via the x0 register if necessary. We already have
some linker relaxation support for that, but it doesn't seem to be triggering
for this testcase.

Jim
Re: RISC-V problem with weak function references and -mcmodel=medany
Hello Jim,

- On May 29, 2018, at 20:27, Jim Wilson j...@sifive.com wrote:

> On 05/28/2018 06:32 AM, Sebastian Huber wrote:
>> I guess, that the resolution of the weak reference to the undefined
>> symbol __deregister_frame_info somehow sets __deregister_frame_info to
>> the absolute address 0 which is illegal in the following "call
>> __deregister_frame_info"? Is this construct with weak references and a
>> -mcmodel=medany supported on RISC-V at all?
>
> Yes. It works for me. Given a simple testcase
>
> extern void *__deregister_frame_info (const void *) __attribute__ ((weak));
> void * foo;
>
> int
> main (void)
> {
>   if (__deregister_frame_info)
>     __deregister_frame_info (foo);
>   return 0;
> }
>
> and compiling with -mcmodel=medany -O -Ttext=0x8000, I get

would you mind trying this with -Ttext=0x9000? Please have a look at:

https://sourceware.org/bugzilla/show_bug.cgi?id=23244
https://sourceware.org/ml/binutils/2018-05/msg00296.html
Re: RISC-V problem with weak function references and -mcmodel=medany
On Tue, May 29, 2018 at 11:43 AM, Sebastian Huber wrote:
> would you mind trying this with -Ttext=0x9000?

This gives me for the weak call

    9014:	7097			auipc	ra,0x7
    9018:	fec080e7		jalr	-20(ra) # 0 <__global_pointer$+0x6fffe7d4>

> Please have a look at:
> https://sourceware.org/bugzilla/show_bug.cgi?id=23244
> https://sourceware.org/ml/binutils/2018-05/msg00296.html

OK. I'm still catching up on mailing lists after the US holiday weekend.

Jim
Project Ranger
I'd like to introduce a project we've been working on for the past year and a
half. The original project goal was to see if we could derive accurate range
information from the IL without requiring much effort on the client side. The
idea being that a pass could simply ask "what is the range of this ssa_name on
this statement?" and the compiler would go figure it out.

After lots of experimenting and prototyping, the project evolved into what we
are introducing here. I call it the Ranger.

Existing range infrastructure in the compiler works from the top down. It
walks through the IL computing all ranges and propagates these values forward
in case they are needed. For the most part, other passes are required to
either use global information, or process things in dominator order and work
in lockstep with EVRP to get more context-sensitive ranges.

The Ranger's model is purely on-demand, and designed to have minimal overhead.
When a range is requested, the Ranger walks backwards through use-def chains
to determine what ranges it can find relative to the name being requested.
This means it only looks at statements which are deemed necessary to evaluate
a range. This can result in some significant speedups when a pass is only
interested in a few specific cases, as is demonstrated in some of the pass
conversions we have performed. We have also implemented a "quick and dirty"
vrp-like pass using the ranger to demonstrate that it can also handle much
heavier duty range work and still perform well.

The code is located on an svn branch ssa-range. It is based on trunk at
revision 259405, circa mid-April 2018. The branch currently bootstraps with
no regressions.

The top level ranger class is called 'path_ranger' and is found in the file
ssa-range.h. It has 4 primary APIs:

 * bool path_range_edge (irange& r, tree name, edge e);
 * bool path_range_entry (irange& r, tree name, basic_block bb);
 * bool path_range_on_stmt (irange& r, tree name, gimple *g);
 * bool path_range_stmt (irange& r, gimple *g);

This allows queries for a range on an edge, on entry to a block, as an operand
on a specific statement, or to calculate the range of the result of a
statement. There are no prerequisites to use it; simply create a path_ranger
and start using the API. There is even an available function which can be
lightly called and doesn't require knowledge of the ranger:

static inline bool
on_demand_get_range_on_stmt (irange &r, tree ssa, gimple *stmt)
{
  path_ranger ranger;
  return ranger.path_range_on_stmt (r, ssa, stmt);
}

The Ranger consists of 3 primary components:

 * range.[ch] - A new range representation purely based on wide-int, which
   allows ranges to consist of multiple non-overlapping sub-ranges.
 * range-op.[ch] - Implements centralized tree-code operations on the irange
   class (allowing adding, masking, multiplying, etc).
 * ssa-range*.[ch] - Files containing a set of classes which implement the
   Ranger.

We have set up a project page on the wiki which contains documentation for
the approach as well as some converted pass info and a to-do list here:

https://gcc.gnu.org/wiki/AndrewMacLeod/Ranger

We would like to include the ranger in GCC for this release, and realize
there are still numerous things to be done before it's ready for integration.
It has been in prototype mode until now, so we have not prepared the code for
a merge yet.
No real performance analysis has been done on it either, but there is an
integration page where you will find information about the 4 passes that have
been converted to use the Ranger and the performance of those:

https://gcc.gnu.org/wiki/AndrewMacLeod/RangerIntegration

One of the primary tasks over the next month or two is to improve the sharing
of operation code between the VRPs and the Ranger. We haven't done a very
good job of that so far. This is included along with a list of known issues
we need to look at on the to-do page:

https://gcc.gnu.org/wiki/AndrewMacLeod/RangerToDo

The Ranger is far enough along now that we have confidence in both its
approach and its ability to perform, and we would like to solicit feedback on
what you think of it, any questions, possible uses, as well as potential
requirements to integrate with trunk later this stage.

Please visit the project page and have a look. We've put as much
documentation, comments, and to-dos there as we could think of. We will try
to keep it up-to-date.

Andrew, Aldy and Jeff
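[Editorial note: as an illustration of the on-demand model described above -
an editorial example, not taken from the branch - consider what a client
asking only about 'x' at the return statement would trigger. The SSA names in
the comments are hypothetical.]

/* If a pass asks the Ranger for the range of x at the first return, only the
   use-def chain and the controlling condition are examined: from 'a & 0xff'
   the Ranger knows x_1 is in [0, 255], and on the branch where x_1 > 10
   holds it can narrow the result to [11, 255] - without computing ranges
   for anything else in the function.  */
int
f (int a)
{
  int x = a & 0xff;   /* x_1 in [0, 255] */
  if (x > 10)
    return x;         /* here x is in [11, 255] */
  return 0;
}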