SPEC2000 comparison of LLVM-3.2 and coming GCC4.8 on x86/x86-64
I've added pages comparing LLVM-3.2 and the coming GCC 4.8 at http://vmakarov.fedorapeople.org/spec/.

The pages are accessible through the links named GCC-LLVM comparison, 2013, x86 and x86-64 SPEC2000 under the link named 2013. You can find these links at the bottom of the left frame.

If you prefer email for reading the comparison, here is a copy of the page accessible through the link named 2013:

Comparison of GCC and LLVM in 2013.

This year the comparison is done on the coming *GCC 4.8* and *LLVM 3.2*, which was released at the very end of 2012.

As usual, I focus mostly on comparing the compilers as *optimizing* compilers on the major platform x86/x86-64. I don't consider other aspects of the compilers, such as the quality of debug information (especially in optimization modes), supported languages, standards and extensions (e.g. OMP), supported targets and ABIs, support for just-in-time compilation, etc.

This year I did the comparison using the following major options, which are equivalent from my point of view:

o *-O0 -g, -Os, -O1, -O2, -O3, -O4* for LLVM3.2
o *-O0 -g, -Os, -O1, -O2, -O3, -Ofast -flto* for GCC4.8

I tried to decrease the number of graphs, though there are still too many. Therefore I removed the data for -O0 -g and -Os from the graphs, but I still post some data about these modes below. If you need exact numbers, you should look at the tables from which the graphs were generated.

I had to use -O0 for compilation of SPECInt2000 254.gap for both compilers, as LLVM3.2 cannot generate correct code in any optimization mode for this test.

Here are my conclusions from analyzing the data:

o LLVM regressed in its set of supported non-experimental languages, which makes a performance comparison much harder for me. Earlier, LLVM was able to use GCC front ends (although old ones), including the Fortran front end. Now the *CLANG* driver, when it processes Fortran programs, just calls the GCC Fortran compiler. So a comparison of CLANG LLVM and GCC on SPECFP2000 makes no sense (it would just be a comparison of GCC 4.8 and the version of GCC normally used on a given machine), although you can find such comparisons on the Internet (e.g. on phoronix.com).

Therefore I had to use *Dragonegg* (a GCC plugin which uses the LLVM backend instead of the GCC backend) to generate the Fortran benchmarks with LLVM.

Although CLANG made LLVM less dependent on GCC, *LLVM is still heavily dependent on GCC and, more generally, on other GNU projects* (GOLD, binutils, etc.). Industrial compilers (including the Intel compilers, SUN Studio compilers, OPEN64, and Pathscale) usually support the triad of languages C, C++, and Fortran. It is a pretty big investment to implement a Fortran front end, especially with language-dependent optimizations.

o The difference between LLVM and GCC on the integer benchmarks is only about 8% for -O3 and 3-4% for 32- and 64-bit peak performance (when LTO is used by both compilers). On the floating point benchmarks, the difference is 3% and 9% for -O3 in 32- and 64-bit modes respectively, and 6% and 12% for peak performance.

To put this in perspective, the performance difference between LLVM2.9 and GCC4.7 reached 20% (on SPECFP2000 in 32- and 64-bit modes for -O3). So *LLVM has made significant progress* from the performance point of view since version 2.9.

I believe this progress is achieved mostly because of the *new RA* introduced in LLVM 3.0 and *auto-vectorization*.
By the way, although the new LLVM RA is much better than the old one, I think it is a mistake that the new RA still does not use a graph-coloring-based RA, which has the potential to improve performance even more.

o In 2011, I used LLVM with the GCC front end and showed that the *common opinion "LLVM is a faster compiler than GCC" is a myth* when you compare the compilers in modes generating the same code quality. This is still close to true for LLVM with the CLANG front end. For example, in the case of 32-bit SPECInt2000, the code quality generated by GCC4.8 in -O1 mode is 16% better than that generated by LLVM3.2 in -O1 mode and 1% better than the code generated by LLVM3.2 in -O2 mode, while the GCC compiler in -O1 mode is 2% and 10% faster than LLVM3.2 in -O1 and -O2 mode respectively. This means that GCC -O1 is closer to CLANG LLVM3.2 -O2 from the performance and compile-speed point of view. Where GCC really is slower (2.5 times) than CLANG LLVM3.2 is in LTO mode.

o *GCC has better code size optimizations (-Os)*: GCC4.8 generates on average 6-7% smaller code (text + data segments) for SPECInt2000 than LLVM3.2.

o In the widely used debugging mode (-O0 -g), GCC4.8 is only about 5% slower than LLVM3.2, but it generates about 16% and 13% smaller and 18% and 10% faster SPECInt2000 code in 32-bit and 64-bit mode respectively.

o Despite the fact that LLVM supports many targets, LLVM is focused mostly on the development of two of them: x86/x86-64 and ARM. I see two pieces of evidence supporting this thesis. One is that Dragonegg supports only the two mentioned targets. You cannot even benchmark SPEC
Re: SPEC2000 comparison of LLVM-3.2 and coming GCC4.8 on x86/x86-64
On Thu, Feb 7, 2013 at 4:26 PM, Vladimir Makarov wrote:
> I've added pages comparing LLVM-3.2 and the coming GCC 4.8 at
> http://vmakarov.fedorapeople.org/spec/.
>
> [...]
>
> This year I did the comparison using the following major options,
> which are equivalent from my point of view:
>
> o *-O0 -g, -Os, -O1, -O2, -O3, -O4* for LLVM3.2
> o *-O0 -g, -Os, -O1, -O2, -O3, -Ofast -flto* for GCC4.8

On the web page you say that you use -Ofast -fno-fast-math (because that is what LLVM does with -O4). For GCC that's equivalent to -O3 (well, apart from the fact that you enable -flto), so you could just as well say you tested -O3 -flto.

For 32-bit you used -mtune=corei7 -march=i686: did you disable CPU features like SSE on purpose? Vectorization at -O3+ should have used them (though without -ffast-math, FP vectorization is seriously restricted). It would be nice to see -O3 -ffast-math vs. whatever LLVM equivalent is available.

Also note that for SPEC, -funroll-loops helps GCC (yes ... we don't enable that by default at -O3; we probably should).

I don't know whether LLVM with -O4 creates fat objects as we do (you can link them without -flto). If not, then for compile time you should use -fno-fat-lto-objects.

Does LLVM parallelize the LTO link stage? If so, you should compare with -flto=jobserver or -flto=number-of-available-cores. If not, you should compare with -flto-partition=none (that will save some I/O and processing time).

As a general note, we don't pay much attention to SPEC 2000 performance these days but instead look at SPEC CPU 2006 ...

Thanks for the comparison!

Richard.
Re: SPEC2000 comparison of LLVM-3.2 and coming GCC4.8 on x86/x86-64
On 02/07/2013 11:09 AM, Richard Biener wrote:
> On the web page you say that you use -Ofast -fno-fast-math (because
> that is what LLVM does with -O4). For GCC that's equivalent to -O3
> (well, apart from the fact that you enable -flto), so you could just
> as well say you tested -O3 -flto.

I guess -Ofast -fno-fast-math is not just -O3, but you are right that it is pretty close.

> For 32-bit you used -mtune=corei7 -march=i686: did you disable CPU
> features like SSE on purpose? Vectorization at -O3+ should have used
> them (though without -ffast-math, FP vectorization is seriously
> restricted).

Yes, I did it on purpose. Some 32-bit Linux distributions (e.g. Fedora) use this architecture. Another reason is that I'd like to see how well the compilers work with the fp stack (I got the impression that LLVM generates fewer fp stack register shuffles, so I think we could improve regstack.c). Although it is a dying architecture, and we should probably pay more attention to SSE architectures.

> It would be nice to see -O3 -ffast-math vs. whatever LLVM equivalent
> is available.

That was my first intention. Unfortunately, a few SPECFP tests do not generate the expected results (and SPEC fails).

> Also note that for SPEC, -funroll-loops helps GCC (yes ... we don't
> enable that by default at -O3; we probably should).

I should try this too. My intention was to use the most commonly known and used options, from the point of view of the average GCC user. I believe that GCC has more potential for an advanced user to improve code for specific tests, as it has many more options and parameters to control the optimizations.

> I don't know whether LLVM with -O4 creates fat objects as we do (you
> can link them without -flto). If not, then for compile time you
> should use -fno-fat-lto-objects.

Please see the comments above.

> Does LLVM parallelize the LTO link stage?

No, I don't think so.

> If so, you should compare with -flto=jobserver or
> -flto=number-of-available-cores. If not, you should compare with
> -flto-partition=none (that will save some I/O and processing time).

I wanted to write about this: GCC's LTO is well parallelized. But I did not use it, as my scripts measure CPU time.

> As a general note, we don't pay much attention to SPEC 2000
> performance these days but instead look at SPEC CPU 2006 ...

I'd like to, but it still requires too much CPU time even on the fastest machines. Besides, IMHO some SPEC2006 tests are too memory-bound and do not show much optimization potential. But maybe next time.
> Thanks for the comparison!
>
> Richard.

Richard, probably I am wrong, but I felt your frustration a bit. Benchmarking is evil. Every time I post some benchmark results, I always see comments about benchmark credibility, wrong comparisons (used options), etc. There is always some truth in these comments. But I still believe benchmarks can help us to see where we stand and where we are moving.

I am also a believer that GCC is a much better, more refined, and more reliable compiler, working evenly well on practically any platform, and I prefer to work on GCC rather than LLVM (e.g. I thought about doing some RA work on LLVM long ago, as their old RA was quite pathetic, but I preferred to work on GCC instead).

And thanks for the comments.
Re: var-tracking wrt. leaf regs on sparc
On Wed, Feb 06, 2013 at 03:18:27PM -0500, David Miller wrote:
> From: Eric Botcazou
> Date: Wed, 06 Feb 2013 11:13:30 +0100
>
> > I think testing crtl->uses_only_leaf_regs is sufficient here (and
> > while you're at it, you could also test the value of
> > HAVE_window_save, which can be 0 if -mflat is passed on the SPARC),
> > so
> >
> > #ifdef HAVE_window_save
> >   if (HAVE_window_save && !crtl->uses_only_leaf_regs)
> >     {
> >
> >     }
> > #endif
>
> Yes, this works perfectly. Jakub, any objections?

Perhaps some progress, but not fully working. I guess you should start by deciding when the regs should be remapped. Consider even a simple testcase like (-O2 -g -dA):

int foo (int a, int b)
{
  int c = a;
  int d = a + b;
  int e = a + b;
  return e;
}

Before *.vartrack, all debug_insns as well as normal insns refer to %i0 and %i1. Before your patch, some NOTE_INSN_VAR_LOCATION notes were referring to the %o[01] registers and others to the %i[01] registers; with your patch, all refer to %i[01]. leaf_renumber_regs isn't performed on notes (so neither NOTE_INSN_VAR_LOCATION nor NOTE_INSN_CALL_ARG_LOCATION is adjusted). Then supposedly somewhere in dwarf2out we do some adjustment, but we still end up with a d/e loclist of:

.LLST2:
	.uaxword .LVL0-.Ltext0	! Location list begin address (*.LLST2)
	.uaxword .LVL1-.Ltext0	! Location list end address (*.LLST2)
	.uahalf	0x6		! Location expression size
	.byte	0x88		! DW_OP_breg24
	.byte	0		! sleb128 0
	.byte	0x89		! DW_OP_breg25
	.byte	0		! sleb128 0
	.byte	0x22		! DW_OP_plus
	.byte	0x9f		! DW_OP_stack_value
	.uaxword .LVL1-.Ltext0	! Location list begin address (*.LLST2)
	.uaxword .LFE0-.Ltext0	! Location list end address (*.LLST2)
	.uahalf	0x1		! Location expression size
	.byte	0x58		! DW_OP_reg8
	.uaxword 0		! Location list terminator begin (*.LLST2)
	.uaxword 0		! Location list terminator end (*.LLST2)

where I'd expect breg8/breg9 instead.

	Jakub
Re: SPEC2000 comparison of LLVM-3.2 and coming GCC4.8 on x86/x86-64
On Thu, Feb 7, 2013 at 11:09 AM, Richard Biener wrote:
> Also note that for SPEC, -funroll-loops helps GCC (yes ... we don't
> enable that by default at -O3; we probably should).

Richi,

Are you suggesting enabling -funroll-loops by default at -O3? When I checked earlier this year, GCC was too aggressive with loop unrolling on non-numerically-intensive code and that option was not uniformly beneficial, unfortunately. It would be a good change if GCC's unrolling heuristics were better.

Thanks, David
Re: SPEC2000 comparison of LLVM-3.2 and coming GCC4.8 on x86/x86-64
On Thu, Feb 7, 2013 at 9:28 AM, David Edelsohn wrote:
> On Thu, Feb 7, 2013 at 11:09 AM, Richard Biener wrote:
>
>> Also note that for SPEC, -funroll-loops helps GCC (yes ... we don't
>> enable that by default at -O3; we probably should).
>
> Richi,
>
> Are you suggesting enabling -funroll-loops by default at -O3? When I
> checked earlier this year, GCC was too aggressive with loop unrolling
> on non-numerically-intensive code and that option was not uniformly
> beneficial, unfortunately. It would be a good change if GCC's
> unrolling heuristics were better.

Yes. GCC's unroller behavior is at two extremes: the tree-level full unroller is too conservative, while the RTL unroller is too aggressive (e.g. in cases where ICC unrolls 2 times, GCC unrolls 9 times, even when the body contains conditional branches and the trip count is a small constant such as 100).

For comparison, ICC turns on loop unrolling at O3, but it does so conservatively compared with GCC's behavior with -funroll-loops. Full unrolling is turned on at O2 and above for all of ICC, LLVM, and GCC, but GCC's is almost useless: it does not allow any code growth at O2, which means it only kicks in for tiny loops with very small trip counts. Both ICC and LLVM are more aggressive at O2.

David
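[As a concrete illustration of the two regimes described above: a hedged C sketch, not taken from the thread; the function and array names are invented.

  int a[100], b[100];

  /* A tiny loop with a small constant trip count: fully unrolling it
     costs essentially no code growth, so the GIMPLE-level full
     unroller can already handle loops like this at -O2.  */
  void
  tiny (void)
  {
    int i;
    for (i = 0; i < 4; i++)
      a[i] = b[i];
  }

  /* Trip count 100 with a conditional branch in the body: left alone
     by default, but per the discussion the RTL unroller enabled by
     -funroll-loops may unroll a loop like this aggressively, e.g. by
     a factor of 9.  */
  void
  bigger (void)
  {
    int i;
    for (i = 0; i < 100; i++)
      if (b[i] > 0)
        a[i] = b[i];
  }

]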
Re: SPEC2000 comparison of LLVM-3.2 and coming GCC4.8 on x86/x86-64
Hi Vladimir, thanks for these numbers.

...
> Therefore I had to use *Dragonegg* (a GCC plugin which uses the LLVM
> backend instead of the GCC backend) to generate the Fortran benchmarks
> with LLVM.
...
> I believe this progress is achieved mostly because of the *new RA*
> introduced in LLVM 3.0 and *auto-vectorization*.

I don't think it can be auto-vectorization, because I forgot to turn on the LLVM auto-vectorizer in dragonegg-3.2 (oops!).

Ciao, Duncan.
Re: SPEC2000 comparison of LLVM-3.2 and coming GCC4.8 on x86/x86-64
On Thu, Feb 7, 2013 at 7:16 PM, Xinliang David Li wrote:
> On Thu, Feb 7, 2013 at 9:28 AM, David Edelsohn wrote:
>> Are you suggesting enabling -funroll-loops by default at -O3? When I
>> checked earlier this year, GCC was too aggressive with loop unrolling
>> on non-numerically-intensive code and that option was not uniformly
>> beneficial, unfortunately. It would be a good change if GCC's
>> unrolling heuristics were better.
>
> Yes. GCC's unroller behavior is at two extremes: the tree-level full
> unroller is too conservative, while the RTL unroller is too aggressive
> (e.g. in cases where ICC unrolls 2 times, GCC unrolls 9 times, even
> when the body contains conditional branches and the trip count is a
> small constant such as 100).
>
> For comparison, ICC turns on loop unrolling at O3, but it does so
> conservatively compared with GCC's behavior with -funroll-loops. Full
> unrolling is turned on at O2 and above for all of ICC, LLVM, and GCC,
> but GCC's is almost useless: it does not allow any code growth at O2,
> which means it only kicks in for tiny loops with very small trip
> counts. Both ICC and LLVM are more aggressive at O2.

I meant enabling the unroller at the GIMPLE level, to the extent it is currently enabled only with -funroll-loops. I agree the RTL-level unroller is too aggressive.

Richard.
Re: var-tracking wrt. leaf regs on sparc
From: Jakub Jelinek
Date: Thu, 7 Feb 2013 18:22:32 +0100

> Then supposedly somewhere in dwarf2out we do some adjustment, but we
> still end up with a d/e loclist of:
>
> [...]
>
> where I'd expect breg8/breg9 instead.

The fix for this is trivial, just a missing leaf renumbering in dwarf2out.c:

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 06cfb18..765d5c5 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -10864,7 +10864,16 @@ based_loc_descr (rtx reg, HOST_WIDE_INT offset,
 	}
     }

-  regno = DWARF_FRAME_REGNUM (REGNO (reg));
+  regno = REGNO (reg);
+#ifdef LEAF_REG_REMAP
+  if (crtl->uses_only_leaf_regs)
+    {
+      int leaf_reg = LEAF_REG_REMAP (regno);
+      if (leaf_reg != -1)
+	regno = (unsigned) leaf_reg;
+    }
+#endif
+  regno = DWARF_FRAME_REGNUM (regno);

   if (!optimize && fde
       && (fde->drap_reg == regno || fde->vdrap_reg == regno))
Re: var-tracking wrt. leaf regs on sparc
On Thu, Feb 07, 2013 at 02:38:18PM -0500, David Miller wrote:
> The fix for this is trivial, just a missing leaf renumbering in dwarf2out.c:
>
> [...]

This and the earlier patch are OK if they bootstrap/regtest fine and a suitable ChangeLog entry is provided. Running the gdb testsuite before and after wouldn't hurt, though.

	Jakub
Re: var-tracking wrt. leaf regs on sparc
From: David Miller
Date: Thu, 07 Feb 2013 14:38:18 -0500 (EST)

> From: Jakub Jelinek
> Date: Thu, 7 Feb 2013 18:22:32 +0100
>
>> Then supposedly somewhere in dwarf2out we do some adjustment, but we
>> still end up with a d/e loclist of:
>>
>> [...]
>>
>> where I'd expect breg8/breg9 instead.
>
> The fix for this is trivial, just a missing leaf renumbering in dwarf2out.c:

So the combined patch is below; any objections?

Here is the testsuite diff:

@@ -155,8 +148,8 @@
 FAIL: gcc.dg/guality/vla-2.c -O2 -flto

 === gcc Summary ===

-# of expected passes		2128
-# of unexpected failures	122
+# of expected passes		2135
+# of unexpected failures	115
 # of unexpected successes	31
 # of expected failures		17
 # of unsupported tests		136

This is undoubtedly an improvement.

gcc/

2013-02-07  David S. Miller

	* dwarf2out.c (based_loc_descr): Perform leaf register remapping
	on 'reg'.
	* var-tracking.c (vt_add_function_parameter): Test the presence of
	HAVE_window_save properly and do not remap argument registers when
	we have a leaf function.

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 06cfb18..765d5c5 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -10864,7 +10864,16 @@ based_loc_descr (rtx reg, HOST_WIDE_INT offset,
 	}
     }

-  regno = DWARF_FRAME_REGNUM (REGNO (reg));
+  regno = REGNO (reg);
+#ifdef LEAF_REG_REMAP
+  if (crtl->uses_only_leaf_regs)
+    {
+      int leaf_reg = LEAF_REG_REMAP (regno);
+      if (leaf_reg != -1)
+	regno = (unsigned) leaf_reg;
+    }
+#endif
+  regno = DWARF_FRAME_REGNUM (regno);

   if (!optimize && fde
       && (fde->drap_reg == regno || fde->vdrap_reg == regno))

diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c
index 714acb69..0db1562 100644
--- a/gcc/var-tracking.c
+++ b/gcc/var-tracking.c
@@ -9502,31 +9502,34 @@ vt_add_function_parameter (tree parm)
   /* DECL_INCOMING_RTL uses the INCOMING_REGNO of parameter registers.
      If the target machine has an explicit window save instruction, the
      actual entry value is the corresponding OUTGOING_REGNO instead.  */
-  if (REG_P (incoming)
-      && HARD_REGISTER_P (incoming)
-      && OUTGOING_REGNO (REGNO (incoming)) != REGNO (incoming))
+  if (HAVE_window_save && !crtl->uses_only_leaf_regs)
     {
-      parm_reg_t p;
-      p.incoming = incoming;
-      incoming
-	= gen_rtx_REG_offset (incoming, GET_MODE (incoming),
-			      OUTGOING_REGNO (REGNO (incoming)), 0);
-      p.outgoing = incoming;
-      vec_safe_push (windowed_parm_regs, p);
-    }
-  else if (MEM_P (incoming)
-	   && REG_P (XEXP (incoming, 0))
-	   && HARD_REGISTER_P (XEXP (incoming, 0)))
-    {
-      rtx reg = XEXP (incoming, 0);
-      if (OUTGOING_REGNO (REGNO (reg)) != REGNO (reg))
+      if (REG_P (incoming)
+	  && HARD_REGISTER_P (incoming)
+	  && OUTGOING_REGNO (REGNO (incoming)) != REGNO (incoming))
 	{
 	  parm_reg_t p;
-	  p.incoming = reg;
-	  reg = gen_raw_REG (GET_MODE (reg), OUTGOING_REGNO (REGNO (reg)));
-	  p.outgoing = reg;
+	  p.incoming = incoming;
+	  incoming
+	    = gen_rtx_REG_offset (incoming, GET_MODE (incoming),
+				  OUTGOING_REGNO (REGNO (incoming)), 0);
+	  p.outgoing = incoming;
 	  vec_safe_push (windowed_parm_regs, p);
-	  incoming = replace_equiv_address_nv (incoming, reg);
+	}
+      else if (MEM_P (incoming)
+	       && REG_P (XEXP (incoming, 0))
+	       && HARD_REGISTER_P (XEXP (incoming, 0)))
+	{
+	  rtx reg = XEXP (incoming, 0);
+	  if (OUTGOING_REGNO (REGNO (reg)) != REGNO (reg))
+	    {
+	      parm_reg_t p;
+	      p.incoming = reg;
+	      reg = gen_raw_REG (GET_MODE (reg), OUTGOING_REGNO (REGNO (reg)));
+	      p.outgoing = reg;
+	      vec_safe_push (windowed_parm_regs, p);
+	      incoming = replace_equiv_address_nv (incoming, reg);
+	    }
 	}
     }
 #endif
Re: SPEC2000 comparison of LLVM-3.2 and coming GCC4.8 on x86/x86-64
On Thu, Feb 07, 2013 at 07:51:20PM +0100, Duncan Sands wrote:
> Hi Vladimir, thanks for these numbers.
>
> ...
>> Therefore I had to use *Dragonegg* (a GCC plugin which uses the LLVM
>> backend instead of the GCC backend) to generate the Fortran benchmarks
>> with LLVM.
> ...
>> I believe this progress is achieved mostly because of the *new RA*
>> introduced in LLVM 3.0 and *auto-vectorization*.
>
> I don't think it can be auto-vectorization, because I forgot to turn on
> the LLVM auto-vectorizer in dragonegg-3.2 (oops!).

Duncan,

I was under the impression that the vectorization in llvm 3.2 was still rather experimental compared to the current state in llvm 3.3svn (where it is now enabled for clang at -O3). Perhaps it would be more fruitful to compare gcc trunk and llvm svn.

Jack

P.S. Regarding LTO, my understanding is that, at least for darwin, LTO in llvm is still primarily limited to dead-code elimination. When I last asked about this, Chris Lattner said they felt more was to be gained from improving the IPA in the compiler than from implementing the rest of LTO. Of course, that was a couple of years back.
Re: SPEC2000 comparison of LLVM-3.2 and coming GCC4.8 on x86/x86-64
On 02/07/2013 01:51 PM, Duncan Sands wrote:
> Hi Vladimir, thanks for these numbers.
>
> ...
>> Therefore I had to use *Dragonegg* (a GCC plugin which uses the LLVM
>> backend instead of the GCC backend) to generate the Fortran benchmarks
>> with LLVM.
> ...
>> I believe this progress is achieved mostly because of the *new RA*
>> introduced in LLVM 3.0 and *auto-vectorization*.
>
> I don't think it can be auto-vectorization, because I forgot to turn on
> the LLVM auto-vectorizer in dragonegg-3.2 (oops!).

Thanks for pointing this out. I'll correct it. I checked that the auto-vectorizer is not switched on for CLANG either. As I understand it, this stuff is experimental. I see on my small benchmarks that it sometimes generates wrong code, and in many cases slower code too.
Re: var-tracking wrt. leaf regs on sparc
From: Jakub Jelinek
Date: Thu, 7 Feb 2013 20:43:32 +0100

> This and the earlier patch are OK if they bootstrap/regtest fine and a
> suitable ChangeLog entry is provided. Running the gdb testsuite before
> and after wouldn't hurt, though.

I've done all of this, and committed to trunk and the gcc-4.7 branch, thanks.

In looking at the remaining failures, several have to do with an early clobber of the first incoming argument register. The issue is that this is where return values are placed, so we run into a situation where that incoming argument value can't be reconstituted in any way by the variable-tracking code, and thus gdb says that it has been optimized out. Many non-x86 CPUs are going to run into this problem.

For example, from pr36728-1.c:

foo:
	save	%sp, -96, %sp
	add	%sp, -40, %sp
	mov	2, %g2
	add	%sp, 123, %g1
	mov	25, %g4
	and	%g1, -32, %g1
	sethi	%hi(b), %g3
	st	%g2, [%g1]
	ld	[%fp+92], %g2
	nop
	ld	[%g1], %i0
	add	%g2, 14, %g2
	and	%g2, -8, %g2
	sub	%sp, %g2, %sp
	stb	%g4, [%sp+96]
	add	%sp, 96, %g2
	sethi	%hi(a), %g4
	nop
	return	%i7+8
	 nop

Here %i0 is written early, and then the tests can't view 'arg1' properly later in the function.

Also, I noticed that the calculation of the on-stack address of values with alignment regressed in gcc-4.8 vs. gcc-4.7. Again, in pr36728-1.c, 'y' can be printed properly in gcc-4.7, but in gcc-4.8 it cannot. I think it might be getting the base register wrong; I'll look more deeply if I get a chance.
Marking nodes as addressable
I have a GIMPLE_CALL statement and I want to mark the left-hand-side value as being addressable (mark_addressable()). I am trying to force the result to be stored on the stack, and not in a register. I know the return value of a call on 64-bit x86 is passed back to the caller in the rax register. I want the return value to be immediately moved onto the stack from rax after the caller resumes execution.

When I do mark the LHS of the call as addressable, the SSA expansion fails, as the updated node is not in the var_partition when get_rtx_for_ssa_name() is called. How can I tease the return value of a call into being stored on the stack, in a temporary variable, instead of lying around in a register or being passed on to other free registers?

-Matt
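[For reference, a minimal sketch of the setup described above, assuming it runs inside a GIMPLE optimization pass of that era; the traversal boilerplate and variable names are illustrative, not from the original message.

  /* Inside a GIMPLE pass (GCC 4.8-era API): find each call statement
     and mark its left-hand side addressable, as described above.  */
  basic_block bb;
  gimple_stmt_iterator gsi;

  FOR_EACH_BB (bb)
    for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
      {
        gimple stmt = gsi_stmt (gsi);
        if (is_gimple_call (stmt))
          {
            tree lhs = gimple_call_lhs (stmt);
            /* Marking the LHS addressable is the step that reportedly
               breaks expansion later, when get_rtx_for_ssa_name ()
               cannot find the updated node in the variable partition.  */
            if (lhs != NULL_TREE)
              mark_addressable (lhs);
          }
      }

]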