SPEC2000 comparison of LLVM-3.2 and coming GCC4.8 on x86/x86-64

2013-02-07 Thread Vladimir Makarov
I've added pages comparing LLVM-3.2 and the upcoming GCC 4.8 at
http://vmakarov.fedorapeople.org/spec/.


The pages are accessible by links named GCC-LLVM comparison, 2013, x86 
and x86-64 SPEC2000 under link named 2013. You can find these links at 
the bottom of the left frame.


If you prefer reading the comparison by email, here is a copy of the page
accessible through the link named 2013:



Comparison of GCC and LLVM in 2013.

This year the comparison covers the upcoming *GCC 4.8* and *LLVM 3.2*,
which was released at the very end of 2012.

As usual, I focus mostly on comparing the compilers as
*optimizing* compilers on the major x86/x86-64 platform.  I don't consider
other aspects of the compilers, such as the quality of debug information
(especially in optimization modes), supported languages, standards
and extensions (e.g. OMP), supported targets and ABIs, support for
just-in-time compilation, etc.

This year I did the comparison using the following major options, which
are equivalent from my point of view:

o *-O0 -g, -Os, -O1, -O2, -O3, -O4* for LLVM3.2
o *-O0 -g, -Os, -O1, -O2, -O3, -Ofast -flto* for GCC4.8

I tried to decrease the number of graphs, but there are still too many.
Therefore I removed the data for -O0 -g and -Os from the graphs, although
I still post some data about these modes below.  If you need exact numbers,
you should look at the tables from which the graphs were generated.

I had to use -O0 to compile SPECInt2000 254.gap with both
compilers, as LLVM3.2 cannot generate correct code in any optimization
mode for this test.

Here are my conclusions from analyzing the data:

o LLVM has regressed in its supported non-experimental languages, which
  makes a performance comparison much harder for me.  Earlier, LLVM was
  able to use GCC front ends (although old ones), including the Fortran
  front end.  Now the *CLANG* driver, when it processes Fortran programs,
  just calls the GCC Fortran compiler.  So a comparison of CLANG LLVM and
  GCC on SPECFP2000 makes no sense (it would be just a comparison of GCC
  4.8 and the version of GCC that is standard on a given machine),
  although you can find such comparisons on the Internet (e.g. on
  phoronix.com).

  Therefore I had to use *Dragonegg* (a GCC plugin which uses the LLVM
  backend instead of the GCC backend) to compile the Fortran benchmarks
  with LLVM.

  Although CLANG made LLVM less dependent on GCC, *LLVM is still
  heavily dependent on GCC and more generally on other GNU projects*
  (GOLD, binutils, etc).  Industrial compilers (including the Intel
  compilers, SUN Studio compilers, OPEN64, Pathscale) usually support
  the triad of languages C, C++, and Fortran.  It is a pretty big
  investment to implement a Fortran front end, especially with
  language-dependent optimizations.


o The difference between LLVM and GCC on integer benchmarks is only
  about 8% for -O3 and 3-4% for 32- and 64-bit peak performance (when
  LTO is used by both compilers).  On floating point benchmarks, the
  difference for -O3 is 3% and 9% in 32- and 64-bit modes respectively,
  and 6% and 12% for the peak performance.

  To put this in perspective, the performance difference between LLVM2.9
  and GCC4.7 reached 20% (on SPECFP2000 in 32- and 64-bit modes for -O3).
  So *LLVM has made significant progress* in performance since
  version 2.9.

  I believe this progress was achieved mostly because of the *new RA*
  (register allocator) introduced in LLVM 3.0 and *auto-vectorization*.
  By the way, although the new LLVM RA is much better than the old one,
  I think it is a mistake that it still does not use graph-coloring
  based allocation, which has the potential to improve performance even
  more.

o In 2011, I used LLVM with the GCC front end and showed that the *common
  opinion "LLVM is a faster compiler than GCC" is a myth* when you
  compare the compilers in modes generating the same code quality.

  This is still close to true for LLVM with the CLANG front end.  For
  example, for 32-bit SPECInt2000 the code generated by
  GCC4.8 in -O1 mode is 16% better than that generated by LLVM3.2 in
  -O1 mode and 1% better than the code generated by LLVM3.2 in -O2 mode,
  but the GCC compiler in -O1 mode is 2% and 10% faster than LLVM3.2
  in -O1 and -O2 mode respectively.  This means that GCC -O1 is
  closer to CLANG LLVM3.2 -O2 in terms of both performance and compiler
  speed.

  Where GCC is really slower (2.5 times) than CLANG LLVM3.2 is in LTO
  mode.

o *GCC has better code size optimizations (-Os)*: GCC4.8 generates on
  average 6-7% smaller SPECInt2000 code (text + data segments) than
  LLVM3.2.

o In the widely used debugging mode (-O0 -g), GCC4.8 is only about 5%
  slower than LLVM3.2 but generates about 16% and 13% smaller and 18%
  and 10% faster SPECInt2000 code in 32-bit and 64-bit mode
  respectively.

o Although LLVM supports many targets, it is focused mostly on the
  development of two of them, x86/x86-64 and ARM.  I see two pieces of
  evidence supporting this thesis.

  One is that dragonegg supports only the two mentioned targets.  You
  even can not benchmark SPEC

Re: SPEC2000 comparison of LLVM-3.2 and coming GCC4.8 on x86/x86-64

2013-02-07 Thread Richard Biener
On Thu, Feb 7, 2013 at 4:26 PM, Vladimir Makarov  wrote:
> I've add pages comparing LLVM-3.2 and coming GCC 4.8 on
> http://vmakarov.fedorapeople.org/spec/.
>
> The pages are accessible by links named GCC-LLVM comparison, 2013, x86 and
> x86-64 SPEC2000 under link named 2013. You can find these links at the
> bottom of the left frame.
>
> If you prefer email for reading the comparison, here is the copy of page
> accessible by link named 2013:
>
>
> Comparison of GCC and LLVM in 2013.
>
> This year the comparison is done on coming *GCC 4.8* and *LLVM 3.2*
> which was released at the very end of 2012.
>
> As usually I am focused mostly on the compiler comparison as
> *optimizing* compilers on major platform x86/x86-64.  I don't consider
> other aspects of the compilers as quality of debug information
> (especially in optimizations modes), supported languages, standards
> and extensions (e.g. OMP), supported targets and ABI, support of
> just-in-time compilation etc.
>
> This year I did the comparison using following major options
> equivalent with my point of view:
>
> o *-O0 -g, -Os, -O1, -O2, -O3, -O4* for LLVM3.2
> o *-O0 -g, -Os, -O1, -O2, -O3, -Ofast -flto* for GCC4.8

On the web-page you say that you use -Ofast -fno-fast-math (because
that is what LLVM does with -O4).  For GCC that's equivalent to -O3
(well, apart from that you enable -flto).  So you can as well say you
tested -O3 -flto.

For 32bit you used -mtune=corei7 -march=i686 - did you disable
CPU features like SSE on purpose?  Vectorization at -O3+ should
have used those (though without -ffast-math FP vectorization is
seriously restricted).
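
As background on why FP vectorization is restricted without -ffast-math:
vectorizing a floating-point sum changes the order of the additions, and FP
addition is not associative, so the compiler may only do it when
reassociation is permitted.  A minimal sketch in C (the function names are
made up for illustration; the second function only imitates by hand what a
2-lane vectorized loop would compute):

```c
#include <stddef.h>

/* Sum strictly left to right, the order the C source specifies.  */
double
sum_in_order (const double *v, size_t n)
{
  double s = 0.0;
  for (size_t i = 0; i < n; i++)
    s += v[i];
  return s;
}

/* Sum with two independent accumulators, roughly what a 2-lane
   vectorized loop computes: even and odd elements are added
   separately and the two partial sums are combined at the end.  */
double
sum_two_lanes (const double *v, size_t n)
{
  double s0 = 0.0, s1 = 0.0;
  size_t i;
  for (i = 0; i + 1 < n; i += 2)
    {
      s0 += v[i];
      s1 += v[i + 1];
    }
  if (i < n)
    s0 += v[i];   /* leftover element when n is odd */
  return s0 + s1;
}
```

For the input { 1e40, 1.0, -1e40, 1.0 } the in-order sum is 1.0 while the
two-lane sum is 2.0, because the 1.0 added directly to 1e40 is lost to
rounding in the first case only.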

It would be nice to see -O3 -ffast-math vs. whatever LLVM equivalent
is available.

Also note that for SPEC -funroll-loops helps GCC (yes ... we don't
enable that by default at -O3, we probably should).

I don't know whether LLVM with -O4 creates fat objects as we do
(you can link them without -flto).  If not, then for compile-time
you should use -fno-fat-lto-objects.  Does LLVM parallelize the
LTO link stage?  If so you should compare with -flto=jobserver
or -flto=number-of-available-cores.  If not you should compare with
-flto-partition=none (that will save some I/O and processing time).

As a general note - we don't pay much attention to SPEC 2000
performance these days but instead look at SPEC CPU 2006 ...

Thanks for the comparison!
Richard.

> I tried to decrease the number of graphs which are still too many.
> Therefore I removed data for -O0 -g and -Os from the graphs but still
> I post some data about these modes below.  If you need exact numbers
> you should look at the tables from which the graphs were generated.
>
> I had to use -O0 for compilation of SPECInt2000 254.gap for both
> compilers as LLVM3.2 can not generate correct code in any optimization
> mode for this test.
>
> Here are my conclusions from analyzing the data:
>
> o LLVM made a regress in supported non-experimental languages which
>   makes a performance comparison much harder for me.  Earlier LLVM was
>   able to use GCC frontends (although old ones) including Fortran
>   front-end.  Now *CLANG* driver when it processes Fortran programs
>   just calls GCC Fortran compiler.  So comparison of CLANG LLVM and
>   GCC on SPECFP2000 has no sense (it would be just a comparison of GCC
>   4.8 and version of GCC standardly used on a given machine) although
>   you can find such comparisons on the Internet (e.g. on phoronix.com)
>
>   Therefore I had to use *Dragonegg* (a GCC plugin which uses LLVM
>   backend instead of GCC backend) for generation of Fortran benchmarks
>   by LLVM.
>
>   Although CLANG made LLVM less dependent on GCC, *still LLVM is
>   heavily dependent on GCC and more generally on other GNU projects*
>   (GOLD, binutils etc).  Industrial compilers (including Intel
>   compilers, SUN studio compilers, OPEN64, Pathscale) usually support
>   triad of languages C, C++, and Fortran.  It is a pretty big
>   investment to implement Fortran front-end especially with
>   language-dependent optimizations.
>
>
> o The difference between LLVM and GCC on integer benchmarks is only
>   about 8% for -O3 and 3-4% for 32- and 64-bit peak performance (when
>   LTO is used by both compilers).  On floating point benchmarks, the
>   difference is 3% and 9% for -O3 correspondingly for 32- and 64-bit
>   modes and 6% and 12% for the peak performance.
>
>   To see a perspective, the performance difference between LLVM2.9 and
>   GCC4.7 reached 20% (on SPECFP2000 in 32- and 64-bit modes for -O3).
>   So *LLVM made a significant progress* with the performance point of
>   view since 2.9 version.
>
>   I believe such progress is achieved mostly because of a *new RA*
>   introduced in LLVM 3.0 and *auto-vectorization*.  By the way,
>   although new LLVM RA is much better than the old one, I think it is
>   a mistake that the new RA still does not use graph-coloring based RA
>   which has a potential to improve performance even more
>
> o In 2011, 

Re: SPEC2000 comparison of LLVM-3.2 and coming GCC4.8 on x86/x86-64

2013-02-07 Thread Vladimir Makarov

On 02/07/2013 11:09 AM, Richard Biener wrote:

> On Thu, Feb 7, 2013 at 4:26 PM, Vladimir Makarov  wrote:
>> I've add pages comparing LLVM-3.2 and coming GCC 4.8 on
>> http://vmakarov.fedorapeople.org/spec/.
>>
>> The pages are accessible by links named GCC-LLVM comparison, 2013, x86 and
>> x86-64 SPEC2000 under link named 2013. You can find these links at the
>> bottom of the left frame.
>>
>> If you prefer email for reading the comparison, here is the copy of page
>> accessible by link named 2013:
>>
>>
>> Comparison of GCC and LLVM in 2013.
>>
>> This year the comparison is done on coming *GCC 4.8* and *LLVM 3.2*
>> which was released at the very end of 2012.
>>
>> As usually I am focused mostly on the compiler comparison as
>> *optimizing* compilers on major platform x86/x86-64.  I don't consider
>> other aspects of the compilers as quality of debug information
>> (especially in optimizations modes), supported languages, standards
>> and extensions (e.g. OMP), supported targets and ABI, support of
>> just-in-time compilation etc.
>>
>> This year I did the comparison using following major options
>> equivalent with my point of view:
>>
>> o *-O0 -g, -Os, -O1, -O2, -O3, -O4* for LLVM3.2
>> o *-O0 -g, -Os, -O1, -O2, -O3, -Ofast -flto* for GCC4.8
>
> On the web-page you say that you use -Ofast -fno-fast-math (because
> that is what LLVM does with -O4).  For GCC that's equivalent to -O3
> (well, apart from that you enable -flto).  So you can as well say you
> tested -O3 -flto.

I guess -Ofast -fno-fast-math is not just -O3, but you are right, it is
pretty close.

> For 32bit you used -mtune=corei7 -march=i686 - did you disable
> CPU features like SSE on purpose?  Vectorization at -O3+ should
> have used those (though without -ffast-math FP vectorization is
> seriously restricted).

Yes, I did it on purpose.  Some 32-bit Linux distributions (e.g. Fedora)
use this architecture.  Another reason is that I'd like to see how well
the compilers work with the fp stack (I got the impression that LLVM
generates fewer fp stack register shuffles, so I think we could improve
reg-stack.c).  It is a dying architecture, though, and we should probably
pay more attention to SSE architectures.

> It would be nice to see -O3 -ffast-math vs. whatever LLVM equivalent
> is available.

That was my first intention.  Unfortunately, a few SPECFP tests then do not
produce the expected results (and SPEC fails).



> Also note that for SPEC -funroll-loops helps GCC (yes ... we don't
> enable that by default at -O3, we probably should).

I should try this too.  My intention was to use the most commonly known and
used options from the point of view of an average GCC user.  I believe that
GCC has more potential for an advanced user to improve code for specific
tests, as it has many more options and parameters to control the
optimizations.

> I don't know whether LLVM with -O4 creates fat objects as we do
> (you can link them without -flto).  If not, then for compile-time
> you should use -fno-fat-lto-objects.

Please, see comments above.

> Does LLVM parallelize the LTO link stage?

No, I don't think so.

> If so you should compare with -flto=jobserver
> or -flto=number-of-available-cores.  If not you should compare with
> -flto-partition=none (that will save some I/O and processing time).

I wanted to write about it.  LTO is well parallelized.  But I did not do
this as my scripts use CPU time.

> As a general note - we don't pay much attention to SPEC 2000
> performance these days but instead look at SPEC CPU 2006 ...

I'd like to, but it still requires too much CPU time even on the fastest
machines.  Besides, IMHO some SPEC2006 tests are too memory bound and
do not show much optimization potential.  But maybe next time.


  
> Thanks for the comparison!
>
> Richard.

Richard, probably I am wrong, but I felt your frustration a bit.
Benchmarking is evil.  Every time I post some benchmark results, I always
see comments about benchmark credibility, wrong comparisons (the options
used) etc.  There is always some truth in these comments.  But I still
believe they can help us to see where we stand and where we are moving.


I am also a believer that GCC is a much better, more refined, and more
reliable compiler, working equally well on practically any platform, and
I prefer to work on GCC, not LLVM (e.g. I thought about doing some RA
work on LLVM long ago, as their old RA was quite pathetic, but I preferred
to work on GCC instead).


And thanks for the comments.



Re: var-tracking wrt. leaf regs on sparc

2013-02-07 Thread Jakub Jelinek
On Wed, Feb 06, 2013 at 03:18:27PM -0500, David Miller wrote:
> From: Eric Botcazou 
> Date: Wed, 06 Feb 2013 11:13:30 +0100
> 
> > I think testing crtl->uses_only_leaf_regs is sufficient here (and
> > while you're at it, you could also test the value of
> > HAVE_window_save, which can be 0 if -mflat is passed on the SPARC),
> > so
> > 
> > #ifdef HAVE_window_save
> > if (HAVE_window_save && !crtl->uses_only_leaf_regs)
> >   {
> > 
> >   }
> > #endif
> 
> Yes, this works perfectly, Jakub any objections?

Perhaps some progress, but not fully working.  I guess you should start
with deciding when the regs should be remapped.  Consider even
simple testcase like (-O2 -g -dA):

int
foo (int a, int b)
{
  int c = a;
  int d = a + b;
  int e = a + b;
  return e;
}

Before *.vartrack, all debug_insn as well as normal insns refer to
%i0 and %i1, before your patch some NOTE_INSN_VAR_LOCATION were referring
to %o[01] registers, others to %i[01] registers, with your patch all refer
to %i[01] registers.  leaf_renumber_regs isn't performed on notes (so,
neither NOTE_INSN_VAR_LOCATION nor NOTE_INSN_CALL_ARG_LOCATION are
adjusted).  Then supposedly somewhere in dwarf2out we do some adjustment,
but still end up with d/e loclist of:
.LLST2:
        .uaxword        .LVL0-.Ltext0   ! Location list begin address (*.LLST2)
        .uaxword        .LVL1-.Ltext0   ! Location list end address (*.LLST2)
        .uahalf 0x6     ! Location expression size
        .byte   0x88    ! DW_OP_breg24
        .byte   0       ! sleb128 0
        .byte   0x89    ! DW_OP_breg25
        .byte   0       ! sleb128 0
        .byte   0x22    ! DW_OP_plus
        .byte   0x9f    ! DW_OP_stack_value
        .uaxword        .LVL1-.Ltext0   ! Location list begin address (*.LLST2)
        .uaxword        .LFE0-.Ltext0   ! Location list end address (*.LLST2)
        .uahalf 0x1     ! Location expression size
        .byte   0x58    ! DW_OP_reg8
        .uaxword        0       ! Location list terminator begin (*.LLST2)
        .uaxword        0       ! Location list terminator end (*.LLST2)
where I'd expect breg8/breg9 instead.
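
To make the byte values in the dump concrete, here is a small sketch in C.
The DWARF opcode bases (DW_OP_reg0 = 0x50, DW_OP_breg0 = 0x70) come from the
DWARF standard; leaf_remap is a simplified, hypothetical stand-in for the
SPARC LEAF_REG_REMAP macro that only handles the %i registers, not GCC's
actual table.

```c
/* DWARF opcode bases, from the DWARF standard.  */
#define DW_OP_reg0  0x50   /* DW_OP_reg0 .. DW_OP_reg31   */
#define DW_OP_breg0 0x70   /* DW_OP_breg0 .. DW_OP_breg31 */

/* Simplified, hypothetical stand-in for SPARC leaf register
   remapping: in a leaf function no register window is saved, so the
   incoming registers %i0-%i7 (24-31) are really the caller's
   %o0-%o7 (8-15).  %fp (30) gets no leaf equivalent here; all other
   registers are returned unchanged (an assumption of this sketch).  */
int
leaf_remap (int regno)
{
  if (regno == 30)
    return -1;
  if (regno >= 24 && regno <= 31)
    return regno - 16;
  return regno;
}

/* Opcode byte for DW_OP_breg<regno>.  */
unsigned char
dw_op_breg (int regno)
{
  return (unsigned char) (DW_OP_breg0 + regno);
}

/* Opcode byte for DW_OP_reg<regno>.  */
unsigned char
dw_op_reg (int regno)
{
  return (unsigned char) (DW_OP_reg0 + regno);
}
```

With these definitions, dw_op_breg (24) and dw_op_breg (25) give 0x88 and
0x89, the bytes seen in the first location expression, whereas
dw_op_breg (leaf_remap (24)) and dw_op_breg (leaf_remap (25)) give 0x78 and
0x79, i.e. the expected DW_OP_breg8/DW_OP_breg9.  dw_op_reg (8) is the 0x58
of the second entry.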

Jakub


Re: SPEC2000 comparison of LLVM-3.2 and coming GCC4.8 on x86/x86-64

2013-02-07 Thread David Edelsohn
On Thu, Feb 7, 2013 at 11:09 AM, Richard Biener
 wrote:

> Also note that for SPEC -funroll-loops helps GCC (yes ... we don't
> enable that by default at -O3, we probably should).

Richi,

Are you suggesting enabling -funroll-loops by default at -O3?  When I
checked earlier this year, GCC was too aggressive with loop unrolling
on non-numerically-intensive code and that option was not uniformly
beneficial, unfortunately.  It would be a good change if GCC's
unrolling heuristics were better.

Thanks, David


Re: SPEC2000 comparison of LLVM-3.2 and coming GCC4.8 on x86/x86-64

2013-02-07 Thread Xinliang David Li
On Thu, Feb 7, 2013 at 9:28 AM, David Edelsohn  wrote:
> On Thu, Feb 7, 2013 at 11:09 AM, Richard Biener
>  wrote:
>
>> Also note that for SPEC -funroll-loops helps GCC (yes ... we don't
>> enable that by default at -O3, we probably should).
>
> Richi,
>
> Are you suggesting enabling -funroll-loops by default at -O3?  When I
> checked earlier this year, GCC was too aggressive with loop unrolling
> on non-numerically-intensive code and that option was not uniformly
> beneficial, unfortunately.  It would be a good change if GCC's
> unrolling heuristics were better.

yes, GCC's unroller behavior is at two extremes -- the tree level full
unroller is too conservative, while the rtl unroller is too aggressive
(e.g., in cases where ICC unrolls 2 times, GCC unrolls by 9 -- even when
the body contains conditional branches and the trip count is a small
constant, e.g. 100).

For comparison, ICC turns on loop unrolling at O3 -- but it does it
conservatively compared with GCC's behavior with -funroll-loops.  Full
unrolling is turned on at O2 and above for all of ICC, LLVM, and GCC, but
GCC's is almost useless -- it does not allow any code growth at O2,
which means it only kicks in for tiny loops with very small trip
counts. Both ICC and LLVM are more aggressive at O2.
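
To illustrate the kind of loop in question, here is a hypothetical C example:
a loop with a small constant trip count (100) and a conditional branch in
the body, next to the same loop unrolled by 2 by hand.  Unrolling by 9
instead would replicate the branch nine times and also need a remainder
loop, since 100 is not divisible by 9.  The function names are made up for
illustration.

```c
#define N 100

/* Original loop: small constant trip count, branch in the body.  */
int
count_positive (const int *a)
{
  int n = 0;
  for (int i = 0; i < N; i++)
    if (a[i] > 0)
      n++;
  return n;
}

/* The same loop unrolled by 2 by hand.  N is even, so no
   remainder loop is needed.  */
int
count_positive_unroll2 (const int *a)
{
  int n = 0;
  for (int i = 0; i < N; i += 2)
    {
      if (a[i] > 0)
        n++;
      if (a[i + 1] > 0)
        n++;
    }
  return n;
}
```

Both functions return the same count for any array of N elements; only the
code size and the number of loop-control branches executed differ.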

David

>
> Thanks, David


Re: SPEC2000 comparison of LLVM-3.2 and coming GCC4.8 on x86/x86-64

2013-02-07 Thread Duncan Sands

Hi Vladimir, thanks for these numbers.

...

   Therefore I had to use *Dragonegg* (a GCC plugin which uses LLVM
   backend instead of GCC backend) for generation of Fortran benchmarks
   by LLVM.

...

   I believe such progress is achieved mostly because of a *new RA*
   introduced in LLVM 3.0 and *auto-vectorization*.


I don't think it can be auto-vectorization, because I forgot to turn on the
LLVM auto-vectorizer in dragonegg-3.2 (oops!).

Ciao, Duncan.


Re: SPEC2000 comparison of LLVM-3.2 and coming GCC4.8 on x86/x86-64

2013-02-07 Thread Richard Biener
On Thu, Feb 7, 2013 at 7:16 PM, Xinliang David Li  wrote:
> On Thu, Feb 7, 2013 at 9:28 AM, David Edelsohn  wrote:
>> On Thu, Feb 7, 2013 at 11:09 AM, Richard Biener
>>  wrote:
>>
>>> Also note that for SPEC -funroll-loops helps GCC (yes ... we don't
>>> enable that by default at -O3, we probably should).
>>
>> Richi,
>>
>> Are you suggesting enabling -funroll-loops by default at -O3?  When I
>> checked earlier this year, GCC was too aggressive with loop unrolling
>> on non-numerically-intensive code and that option was not uniformly
>> beneficial, unfortunately.  It would be a good change if GCC's
>> unrolling heuristics were better.
>
> yes, GCC's unroller behavior is at two extremes -- the tree level full
> unroller is too conservative, while the rtl unroller is too aggressive
> (e.g., in cases where ICC unrolls 2 times, GCC unrolls by 9 -- even when
> the body contains conditional branches and the trip count is a small
> constant, e.g. 100).
>
> For comparison, ICC turns on loop unrolling at O3 -- but it does it
> conservatively compared with GCC's behavior with -funroll-loops.  Full
> unrolling is turned on at O2 and above for all of ICC, LLVM, and GCC, but
> GCC's is almost useless -- it does not allow any code growth at O2,
> which means it only kicks in for tiny loops with very small trip
> counts. Both ICC and LLVM are more aggressive at O2.

I meant enabling the unroller at the GIMPLE level by default to the extent
it is currently enabled only with -funroll-loops.  I agree the RTL level
unroller is too aggressive.

Richard.

> David
>
>>
>> Thanks, David


Re: var-tracking wrt. leaf regs on sparc

2013-02-07 Thread David Miller
From: Jakub Jelinek 
Date: Thu, 7 Feb 2013 18:22:32 +0100

> Then supposedly somewhere in dwarf2out we do some adjustment,
> but still end up with d/e loclist of:
> .LLST2:
>         .uaxword        .LVL0-.Ltext0   ! Location list begin address (*.LLST2)
>         .uaxword        .LVL1-.Ltext0   ! Location list end address (*.LLST2)
>         .uahalf 0x6     ! Location expression size
>         .byte   0x88    ! DW_OP_breg24
>         .byte   0       ! sleb128 0
>         .byte   0x89    ! DW_OP_breg25
>         .byte   0       ! sleb128 0
>         .byte   0x22    ! DW_OP_plus
>         .byte   0x9f    ! DW_OP_stack_value
>         .uaxword        .LVL1-.Ltext0   ! Location list begin address (*.LLST2)
>         .uaxword        .LFE0-.Ltext0   ! Location list end address (*.LLST2)
>         .uahalf 0x1     ! Location expression size
>         .byte   0x58    ! DW_OP_reg8
>         .uaxword        0       ! Location list terminator begin (*.LLST2)
>         .uaxword        0       ! Location list terminator end (*.LLST2)
> where I'd expect breg8/breg9 instead.

The fix for this is trivial, just a missing leaf renumbering in dwarf2out.c:

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 06cfb18..765d5c5 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -10864,7 +10864,16 @@ based_loc_descr (rtx reg, HOST_WIDE_INT offset,
         }
     }
 
-  regno = DWARF_FRAME_REGNUM (REGNO (reg));
+  regno = REGNO (reg);
+#ifdef LEAF_REG_REMAP
+  if (crtl->uses_only_leaf_regs)
+    {
+      int leaf_reg = LEAF_REG_REMAP (regno);
+      if (leaf_reg != -1)
+        regno = (unsigned) leaf_reg;
+    }
+#endif
+  regno = DWARF_FRAME_REGNUM (regno);
 
   if (!optimize && fde
       && (fde->drap_reg == regno || fde->vdrap_reg == regno))


Re: var-tracking wrt. leaf regs on sparc

2013-02-07 Thread Jakub Jelinek
On Thu, Feb 07, 2013 at 02:38:18PM -0500, David Miller wrote:
> The fix for this is trivial, just a missing leaf renumbering in dwarf2out.c:
> 
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index 06cfb18..765d5c5 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -10864,7 +10864,16 @@ based_loc_descr (rtx reg, HOST_WIDE_INT offset,
>          }
>      }
>  
> -  regno = DWARF_FRAME_REGNUM (REGNO (reg));
> +  regno = REGNO (reg);
> +#ifdef LEAF_REG_REMAP
> +  if (crtl->uses_only_leaf_regs)
> +    {
> +      int leaf_reg = LEAF_REG_REMAP (regno);
> +      if (leaf_reg != -1)
> +        regno = (unsigned) leaf_reg;
> +    }
> +#endif
> +  regno = DWARF_FRAME_REGNUM (regno);
>  
>    if (!optimize && fde
>        && (fde->drap_reg == regno || fde->vdrap_reg == regno))

This and earlier patch are ok, if it bootstraps/regtests fine, and suitable
ChangeLog entry is provided.
Running gdb testsuite before and after wouldn't hurt though.

Jakub


Re: var-tracking wrt. leaf regs on sparc

2013-02-07 Thread David Miller
From: David Miller 
Date: Thu, 07 Feb 2013 14:38:18 -0500 (EST)

> From: Jakub Jelinek 
> Date: Thu, 7 Feb 2013 18:22:32 +0100
> 
>> Then supposedly somewhere in dwarf2out we do some adjustment,
>> but still end up with d/e loclist of:
>> .LLST2:
>>         .uaxword        .LVL0-.Ltext0   ! Location list begin address (*.LLST2)
>>         .uaxword        .LVL1-.Ltext0   ! Location list end address (*.LLST2)
>>         .uahalf 0x6     ! Location expression size
>>         .byte   0x88    ! DW_OP_breg24
>>         .byte   0       ! sleb128 0
>>         .byte   0x89    ! DW_OP_breg25
>>         .byte   0       ! sleb128 0
>>         .byte   0x22    ! DW_OP_plus
>>         .byte   0x9f    ! DW_OP_stack_value
>>         .uaxword        .LVL1-.Ltext0   ! Location list begin address (*.LLST2)
>>         .uaxword        .LFE0-.Ltext0   ! Location list end address (*.LLST2)
>>         .uahalf 0x1     ! Location expression size
>>         .byte   0x58    ! DW_OP_reg8
>>         .uaxword        0       ! Location list terminator begin (*.LLST2)
>>         .uaxword        0       ! Location list terminator end (*.LLST2)
>> where I'd expect breg8/breg9 instead.
> 
> The fix for this is trivial, just a missing leaf renumbering in dwarf2out.c:

So the combined patch is below, any objections?

Here is the testsuite diff:

@@ -155,8 +148,8 @@ FAIL: gcc.dg/guality/vla-2.c  -O2 -flto

=== gcc Summary ===

-# of expected passes   2128
-# of unexpected failures   122
+# of expected passes   2135
+# of unexpected failures   115
 # of unexpected successes  31
 # of expected failures 17
 # of unsupported tests 136

This is undoubtedly an improvement.

gcc/

2013-02-07  David S. Miller  

* dwarf2out.c (based_loc_descr): Perform leaf register remapping
on 'reg'.
* var-tracking.c (vt_add_function_parameter): Test the presence of
HAVE_window_save properly and do not remap argument registers when
we have a leaf function.

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 06cfb18..765d5c5 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -10864,7 +10864,16 @@ based_loc_descr (rtx reg, HOST_WIDE_INT offset,
         }
     }
 
-  regno = DWARF_FRAME_REGNUM (REGNO (reg));
+  regno = REGNO (reg);
+#ifdef LEAF_REG_REMAP
+  if (crtl->uses_only_leaf_regs)
+    {
+      int leaf_reg = LEAF_REG_REMAP (regno);
+      if (leaf_reg != -1)
+        regno = (unsigned) leaf_reg;
+    }
+#endif
+  regno = DWARF_FRAME_REGNUM (regno);
 
   if (!optimize && fde
       && (fde->drap_reg == regno || fde->vdrap_reg == regno))
diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c
index 714acb69..0db1562 100644
--- a/gcc/var-tracking.c
+++ b/gcc/var-tracking.c
@@ -9502,31 +9502,34 @@ vt_add_function_parameter (tree parm)
   /* DECL_INCOMING_RTL uses the INCOMING_REGNO of parameter registers.
      If the target machine has an explicit window save instruction, the
      actual entry value is the corresponding OUTGOING_REGNO instead.  */
-  if (REG_P (incoming)
-      && HARD_REGISTER_P (incoming)
-      && OUTGOING_REGNO (REGNO (incoming)) != REGNO (incoming))
+  if (HAVE_window_save && !crtl->uses_only_leaf_regs)
     {
-      parm_reg_t p;
-      p.incoming = incoming;
-      incoming
-        = gen_rtx_REG_offset (incoming, GET_MODE (incoming),
-                              OUTGOING_REGNO (REGNO (incoming)), 0);
-      p.outgoing = incoming;
-      vec_safe_push (windowed_parm_regs, p);
-    }
-  else if (MEM_P (incoming)
-           && REG_P (XEXP (incoming, 0))
-           && HARD_REGISTER_P (XEXP (incoming, 0)))
-    {
-      rtx reg = XEXP (incoming, 0);
-      if (OUTGOING_REGNO (REGNO (reg)) != REGNO (reg))
+      if (REG_P (incoming)
+          && HARD_REGISTER_P (incoming)
+          && OUTGOING_REGNO (REGNO (incoming)) != REGNO (incoming))
         {
           parm_reg_t p;
-          p.incoming = reg;
-          reg = gen_raw_REG (GET_MODE (reg), OUTGOING_REGNO (REGNO (reg)));
-          p.outgoing = reg;
+          p.incoming = incoming;
+          incoming
+            = gen_rtx_REG_offset (incoming, GET_MODE (incoming),
+                                  OUTGOING_REGNO (REGNO (incoming)), 0);
+          p.outgoing = incoming;
           vec_safe_push (windowed_parm_regs, p);
-          incoming = replace_equiv_address_nv (incoming, reg);
+        }
+      else if (MEM_P (incoming)
+               && REG_P (XEXP (incoming, 0))
+               && HARD_REGISTER_P (XEXP (incoming, 0)))
+        {
+          rtx reg = XEXP (incoming, 0);
+          if (OUTGOING_REGNO (REGNO (reg)) != REGNO (reg))
+            {
+              parm_reg_t p;
+              p.incoming = reg;
+              reg = gen_raw_REG (GET_MODE (reg), OUTGOING_REGNO (REGNO (reg)));
+              p.outgoing = reg;
+              vec_safe_push (windowed_parm_regs, p);
+              incoming = replace_equiv_address_nv (incoming, reg);
+            }
         }
     }
 #endif


Re: SPEC2000 comparison of LLVM-3.2 and coming GCC4.8 on x86/x86-64

2013-02-07 Thread Jack Howarth
On Thu, Feb 07, 2013 at 07:51:20PM +0100, Duncan Sands wrote:
> Hi Vladimir, thanks for these numbers.
>
> ...
>>Therefore I had to use *Dragonegg* (a GCC plugin which uses LLVM
>>backend instead of GCC backend) for generation of Fortran benchmarks
>>by LLVM.
> ...
>>I believe such progress is achieved mostly because of a *new RA*
>>introduced in LLVM 3.0 and *auto-vectorization*.
>
> I don't think it can be auto-vectorization, because I forgot to turn on the
> LLVM auto-vectorizer in dragonegg-3.2 (oops!).
>

Duncan,
   I was under the impression that the vectorization in llvm 3.2 was still
rather experimental compared to the current state in llvm 3.3svn (where it is
now enabled for clang at -O3). Perhaps it would be more fruitful to compare
gcc trunk and llvm svn.
Jack
ps Regarding LTO, my understanding is that, at least for darwin, the LTO
in llvm is still primarily limited to dead-code elimination. When I last
asked about this, Chris Lattner said that they felt more was to be gained
from improving the IPA in the compiler than implementing the rest of LTO. Of
course that was a couple years back.

> Ciao, Duncan.


Re: SPEC2000 comparison of LLVM-3.2 and coming GCC4.8 on x86/x86-64

2013-02-07 Thread Vladimir Makarov

On 02/07/2013 01:51 PM, Duncan Sands wrote:

> Hi Vladimir, thanks for these numbers.
>
> ...
>> Therefore I had to use *Dragonegg* (a GCC plugin which uses LLVM
>> backend instead of GCC backend) for generation of Fortran benchmarks
>> by LLVM.
> ...
>> I believe such progress is achieved mostly because of a *new RA*
>> introduced in LLVM 3.0 and *auto-vectorization*.
>
> I don't think it can be auto-vectorization, because I forgot to turn on the
> LLVM auto-vectorizer in dragonegg-3.2 (oops!).


Thanks for pointing this out.  I'll correct this.

I checked, and it is not switched on for CLANG either.  As I understand it,
this stuff is experimental.  On my small benchmarks I see that it sometimes
generates wrong code and in many cases slower code too.




Re: var-tracking wrt. leaf regs on sparc

2013-02-07 Thread David Miller
From: Jakub Jelinek 
Date: Thu, 7 Feb 2013 20:43:32 +0100

> This and earlier patch are ok, if it bootstraps/regtests fine, and suitable
> ChangeLog entry is provided.
> Running gdb testsuite before and after wouldn't hurt though.

I've done all of this, and committed to trunk and the gcc-4.7 branch,
thanks.

In looking at the remaining failures, several have to do with
an early clobber of the first incoming argument register.

The issue is that this is where return values are placed, so we run
into a situation where that incoming argument value can't be
reconstituted in any way by the variable tracking code and thus gdb
says that it has been optimized out.

Many non-x86 cpus are going to run into this problem.  For example,
from pr36728-1.c:

foo:
        save    %sp, -96, %sp
        add     %sp, -40, %sp
        mov     2, %g2
        add     %sp, 123, %g1
        mov     25, %g4
        and     %g1, -32, %g1
        sethi   %hi(b), %g3
        st      %g2, [%g1]
        ld      [%fp+92], %g2
        nop
        ld      [%g1], %i0
        add     %g2, 14, %g2
        and     %g2, -8, %g2
        sub     %sp, %g2, %sp
        stb     %g4, [%sp+96]
        add     %sp, 96, %g2
        sethi   %hi(a), %g4
        nop
        return  %i7+8
         nop

Here %i0 is written early, and then the tests can't view 'arg1'
properly later in the function.

Also, I noticed that calculation of the on-stack address of values
with alignment regressed in gcc-4.8 vs. gcc-4.7.  Again, in pr36728-1.c,
'y' can be printed properly in gcc-4.7 but in gcc-4.8 it cannot.

I think it might be getting the base register wrong, I'll look
more deeply if I get a chance.


Help, I Have a Business Partner

2013-02-07 Thread Lic. Areliz Massanges
Don't let the clash of egos and roles interfere with your interests! Learn
the unmatched advantages of putting your professional relationships on a
solid footing.

HELP! I HAVE A PARTNER!

Where will it take place?
Place: Your computer.
Date: March 11, 2013.
Duration: 3 hours.
Time: 10:00 a.m. to 1:00 p.m. (Central Mexico time).

Human relationships are complex and complicated... even more so when
economic or professional interests are involved. A large percentage of
failures in organizations and companies stem from conflicts and poor
communication between partners, which in most cases ruin projects that were
promising in the short, medium, and long term.

Through this practical seminar, learn about, analyze, and implement
techniques that will allow your business to grow by substantially securing
the human, psychological, and legal aspects of relationships, which are key
to achieving satisfied partnerships able to reach the keys to success
harmoniously. Learn to build lasting and profitable professional
relationships that give your business stability.

Get the complete brochure with no obligation; just reply to this email with
the following information:
Name (required):
Company:
Telephone numbers (required):
E-mail: g...@gnu.org

or call our Centro Nacional de Atencion Telefonica: 01 - 800 - 212
- 9393

Warm regards,
Lic. Areliz Massanges
Project Leader

To unsubscribe and stop receiving any messages from our company, send an
email with the subject Nomasinfo




Marking nodes as addressable

2013-02-07 Thread Matt Davis
I have a GIMPLE_CALL statement and I want to mark the left-hand-side
value as being addressable (mark_addressable()).  I am trying to force
the result to be stored on the stack, and not in a register.  I know
the return value of a call on 64-bit x86 is passed back to the caller in
the rax register.  I want the return value to be moved onto the stack
from rax immediately after the caller resumes execution.  When I do
mark the LHS of the call as being addressable, the ssa-expansion
fails, as the updated node is not in the var_partition when
get_rtx_for_ssa_name() is called.  How can I force the return value of a
call to be stored on the stack, in a temporary variable, instead of
lying around in a register, or being passed to other free registers?

-Matt
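
For reference, the source-level effect being asked for looks roughly like the
C sketch below: the call result goes into a volatile local, so the compiler
has to store it to memory (in practice, a stack slot) right after the call
and reload it on each use.  This only illustrates the intended outcome in C
source; it is not the GIMPLE-level mark_addressable() approach, and
get_value and use_value are hypothetical names.

```c
/* Hypothetical callee; on x86-64 its int result comes back in eax/rax.  */
int
get_value (void)
{
  return 42;
}

/* Stores the call result in a volatile local, which must live in
   memory, then reads it back from there.  */
int
use_value (void)
{
  volatile int tmp = get_value ();
  return tmp + 1;
}
```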