New size data -- hopefully it is sane this time.
Changes in the experiment:
1) shared libstdc++ is used with trunk gcc;
2) the bfd linker is used with both trunk and the patched 4.4.3 compiler (which
previously used gold).
The size comparison for all C benchmarks in the previous report is still
valid. The following is the corr
On Thu, Nov 18, 2010 at 4:12 PM, Jan Hubicka wrote:
> Hi,
>> I'll get back to you with our local inlining changes. We're looking to move
>> development closer to trunk to reduce this divergence in the future.
>>
>> Our tuning was done primarily on big c++ programs. A significant size
>> improvem
Hi,
> I'll get back to you with our local inlining changes. We're looking to move
> development closer to trunk to reduce this divergence in the future.
>
> Our tuning was done primarily on big c++ programs. A significant size
> improvement came from aggressively inlining functions which might b
I found an error in my size experiment setup (shared vs. non-shared
libstdc++). Please discard the size numbers; I will remeasure.
Thanks,
David
On Thu, Nov 18, 2010 at 4:02 AM, Jan Hubicka wrote:
> Hi,
> and for size, could you please also do -Os comparisons? I am aware that -O2
> inline
On Thu, Nov 18, 2010 at 3:58 AM, Jan Hubicka wrote:
>> Some text size measurement.
>>
>> Summary:
>> 1) LTO with -O3 bloats up code considerably;
> Yes, you need either -fwhole-program or -fuse-linker-plugin to make it behave
> sanely.
>
> For Mozilla I have had the best experience with -fuse-linker-plugi
Hi,
and for size, could you please also do -Os comparisons? I am aware that the -O2
inliner is tuned somewhat toward C++. This is because we have a C++
benchmark suite that we use to monitor inlining.
http://gcc.opensuse.org/c++bench-frescobaldi/
Programs there are a lot more aggressive on abs
> Some text size measurement.
>
> Summary:
> 1) LTO with -O3 bloats up code considerably;
Yes, you need either -fwhole-program or -fuse-linker-plugin to make it behave
sanely.
For Mozilla I have had the best experience with -fuse-linker-plugin --param
inline-unit-growth=5. That gives me about 16% code s
Some text size measurement.
Summary:
1) LTO with -O3 bloats up code considerably;
2) LTO with -O2 reduces text size compared with -O2
3) Google 4.4.3 based compiler is really effective in reducing C++
program size -- this is where the focus of the tuning was done.
As witnessed by eon in SPEC2k and al
On Tue, Nov 16, 2010 at 6:35 AM, Jan Hubicka wrote:
>> More FDO related performance numbers
>>
>> Experiment 1: trunk gcc O2 + FDO vs O2: FDO improves performance
>> by 5% geomean
>> Experiment 2: our internal gcc compiler (4.4.3 based with many local
>> patches) O2 + FDO vs O2 (trunk gcc):
> More FDO related performance numbers
>
> Experiment 1: trunk gcc O2 + FDO vs O2: FDO improves performance
> by 5% geomean
> Experiment 2: our internal gcc compiler (4.4.3 based with many local
> patches) O2 + FDO vs O2 (trunk gcc): FDO improves perf by 6.6%
> geomean
> Experiment 3: our
2010/11/16 Jan Hubicka :
>> On Mon, Nov 15, 2010 at 5:39 PM, Jan Hubicka wrote:
>> >> > Fortunately linker plugin solves the problem here and this is why I
>> >> > want to
>> >> > have it by default. GCC then can do effectively -fwhole-program for
>> >> > binaries
>> >> > (since linker knows wh
More FDO related performance numbers
Experiment 1: trunk gcc O2 + FDO vs O2: FDO improves performance
by 5% geomean
Experiment 2: our internal gcc compiler (4.4.3 based with many local
patches) O2 + FDO vs O2 (trunk gcc): FDO improves perf by 6.6%
geomean
Experiment 3: our internal gcc (4.
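The FDO and unrolling results in this thread are quoted as a geometric mean over the per-benchmark SPEC ratios. A minimal C sketch of that arithmetic, assuming each input is a speedup ratio new_score / old_score; `geomean` and `ipow` are hypothetical helper names, not part of any SPEC tool. It bisects for the n-th root instead of calling exp/log only to avoid a libm dependency in the sketch.

```c
#include <stddef.h>

/* x raised to a non-negative integer power by repeated multiplication. */
static double ipow(double x, size_t n)
{
    double r = 1.0;
    while (n--)
        r *= x;
    return r;
}

/* Geometric mean of n > 0 positive speedup ratios (new_score / old_score).
 * The result lies between the smallest and largest ratio, so we bisect on
 * that interval for the x with x^n == product of all ratios.
 * (exp(mean(log(ratio))) from <math.h> is the usual formulation.) */
double geomean(const double *ratio, size_t n)
{
    double prod = 1.0, lo = ratio[0], hi = ratio[0];
    for (size_t i = 0; i < n; i++) {
        prod *= ratio[i];
        if (ratio[i] < lo) lo = ratio[i];
        if (ratio[i] > hi) hi = ratio[i];
    }
    for (int iter = 0; iter < 200; iter++) {
        double mid = 0.5 * (lo + hi);
        if (ipow(mid, n) < prod)
            lo = mid;   /* root is above mid */
        else
            hi = mid;   /* root is at or below mid */
    }
    return 0.5 * (lo + hi);
}
```

A "+5% geomean" result corresponds to `geomean(...)` returning about 1.05, i.e. `(g - 1) * 100` percent. Recomputing the per-benchmark percentages from the rounded scores listed in these messages will not reproduce them exactly.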
More performance data:
-O2 -funroll-all-loops vs O2: +1.1% geomean
Benchmark   O2     O2 unroll-all-loops   Change
164.gzip    1324   1336                   0.94%
175.vpr     1694   1670                  -1.44%
> On Mon, Nov 15, 2010 at 5:39 PM, Jan Hubicka wrote:
> >> > Fortunately linker plugin solves the problem here and this is why I want
> >> > to
> >> > have it by default. GCC then can do effectively -fwhole-program for
> >> > binaries
> >> > (since linker knows what will be bound elsewhere) and
On Mon, Nov 15, 2010 at 5:39 PM, Jan Hubicka wrote:
>> > Fortunately linker plugin solves the problem here and this is why I want to
>> > have it by default. GCC then can do effectively -fwhole-program for
>> > binaries
>> > (since linker knows what will be bound elsewhere) and take advantage of
> > Fortunately linker plugin solves the problem here and this is why I want to
> > have it by default. GCC then can do effectively -fwhole-program for
> > binaries
> > (since linker knows what will be bound elsewhere) and take advantage of
> > visibility((hidden)) hints for shared libraries same
> Fortunately linker plugin solves the problem here and this is why I want to
> have it by default. GCC then can do effectively -fwhole-program for binaries
> (since linker knows what will be bound elsewhere) and take advantage of
> visibility((hidden)) hints for shared libraries same way. Most o
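For readers unfamiliar with the visibility hint: it is GCC's `visibility` function attribute. A minimal sketch, assuming the file is built into a shared library (for example with `gcc -O2 -fPIC -shared`); the function names are hypothetical.

```c
/* Hidden: not exported from the shared library, so it cannot be
 * interposed by another object at run time and GCC may inline it
 * into its callers within the library. */
__attribute__((visibility("hidden")))
int scale_hidden(int x)
{
    return 3 * x;
}

/* Default visibility: exported, part of the library's public ABI. */
__attribute__((visibility("default")))
int api_entry(int x)
{
    return scale_hidden(x) + 1;
}
```

Without such hints, every non-static function in a -fPIC build is exported and interposable, which is the information the linker plugin supplies automatically for binaries.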
> On Mon, Nov 15, 2010 at 4:25 PM, Jan Hubicka wrote:
> >> This means O3 level inlining should be turned on also for lto build by
> >> default -- as -O2 lto performance is too unimpressive.
> >
> > I am just re-tuning the inliner and hope to get more speedups for smaller
> > costs than we get rig
On Mon, Nov 15, 2010 at 4:25 PM, Jan Hubicka wrote:
>> This means O3 level inlining should be turned on also for lto build by
>> default -- as -O2 lto performance is too unimpressive.
>
> I am just re-tuning the inliner and hope to get more speedups for smaller
> costs than we get right now. I h
> > This means O3 level inlining should be turned on also for lto build by
> > default -- as -O2 lto performance is too unimpressive.
>
> I am just re-tuning the inliner and hope to get more speedups for smaller
> costs than we get right now. I however don't think we can reasonably enable it
> as
> This means O3 level inlining should be turned on also for lto build by
> default -- as -O2 lto performance is too unimpressive.
I am just re-tuning the inliner and hope to get more speedups for smaller
costs than we get right now. However, I don't think we can reasonably enable it
as it is at LT
This means O3-level inlining should also be turned on by default for LTO
builds, as -O2 LTO performance is unimpressive.
David
On Mon, Nov 15, 2010 at 3:36 PM, Xinliang David Li wrote:
> Just measured: lto +O3 improves over O2 by a decent 4.8% geomean. More
> data come later.
>
>
Just measured: LTO with O3 improves over O2 by a decent 4.8% geomean. More
data to come later.
164.gzip    1324   1322   -0.10%
175.vpr     1694   1703    0.51%
176.gcc     2293   2347
> I did some measurement (64bit).
>
> Experiment 1:
>
> -O2 -funroll-loops vs -O2
>
> It improves performance (geomean) by 0.56%, not too much:
> Benchmark   O2     O2 unroll-loops   Change
> 164.gzip    1324   1331              0.56%
I did some measurements (64-bit).
Experiment 1:
-O2 -funroll-loops vs -O2
It improves performance (geomean) by 0.56%, not too much:
Benchmark   O2     O2 unroll-loops   Change
164.gzip    1324   1331              0.56%
175.v
2010/11/15 Jan Hubicka :
>> For peak, FDO is the most effective option. It can boost performance
>> by 7-10% depending on the program. The options you suggested probably
>> won't make too big a dent. -funroll-loops can hurt performance
>> without profiling. More aggressive inlining, ipa-cp, unswi
> For peak, FDO is the most effective option. It can boost performance
> by 7-10% depending on the program. The options you suggested probably
> won't make too big a dent. -funroll-loops can hurt performance
> without profiling. More aggressive inlining, ipa-cp, unswitching etc
-funroll-loops ov
For peak, FDO is the most effective option. It can boost performance
by 7-10% depending on the program. The options you suggested probably
won't make too big a dent. -funroll-loops can hurt performance
without profiling. More aggressive inlining, ipa-cp, unswitching etc
enabled by O3 may help a l
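To make the unrolling trade-off concrete, here is a hand-written illustration of unrolling a simple loop by 4. It is not GCC's actual output, and the function names are hypothetical.

```c
#include <stddef.h>

/* Plain loop: one add, one compare and one branch per element. */
long sum_simple(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Unrolled by 4: one compare/branch per four elements in the main loop,
 * at the cost of a larger body plus a remainder loop. When n is usually
 * small (something only a profile reveals), the extra code and the
 * remainder handling can cost more than the saved branches. */
long sum_unrolled4(const int *v, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)          /* remainder: 0..3 elements */
        s += v[i];
    return s;
}
```

Both functions compute the same sum; the difference is only in code size and branch count, which is why unrolling without profile data is a gamble.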
Hello,
On 14.11.2010 0:08, Xinliang David Li wrote:
I re-measured the performance difference using trunk gcc and trunk
clang/llvm on a core-2 box. -fno-strict-aliasing is added to gcc
because clang/llvm's type-based aliasing is incomplete and not
enabled by default. I also added -fomit-fram
Thanks, this works.
gcc vs llvm
176.gcc: +3.7%
252.eon: +6.1%
David
On Sat, Nov 13, 2010 at 3:14 PM, H.J. Lu wrote:
> On Sat, Nov 13, 2010 at 1:08 PM, Xinliang David Li wrote:
>>
>> Though gcc leads LLVM in performance overall, there are a couple of
>> benchmarks gcc is worse: vpr and crafty
On Sat, Nov 13, 2010 at 1:08 PM, Xinliang David Li wrote:
>
> Though gcc leads LLVM in performance overall, there are a couple of
> benchmarks gcc is worse: vpr and crafty (64bit and 32bit), parser and
> twolf (32bit), vortex (64bit). This needs to be triaged. gcc
> miscompiles gcc and eon in
On Sat, Nov 13, 2010 at 2:39 PM, Paolo Bonzini wrote:
> On 11/13/2010 10:08 PM, Xinliang David Li wrote:
>>
>> Though gcc leads LLVM in performance overall, there are a couple of
>> benchmarks gcc is worse: vpr and crafty (64bit and 32bit), parser and
>> twolf (32bit), vortex (64bit). This needs
On 11/13/2010 10:08 PM, Xinliang David Li wrote:
Though gcc leads LLVM in performance overall, there are a couple of
benchmarks where gcc is worse: vpr and crafty (64bit and 32bit), parser and
twolf (32bit), vortex (64bit). This needs to be triaged. gcc
miscompiles gcc and eon in 32bit -- is there
I re-measured the performance difference using trunk gcc and trunk
clang/llvm on a core-2 box. -fno-strict-aliasing is added to gcc
because clang/llvm's type-based aliasing is incomplete and not
enabled by default. I also added -fomit-frame-pointer to clang/llvm as
this is gcc's default. The b
On Mon, Sep 27, 2010 at 11:04:10AM -0700, Neil Vachharajani wrote:
> On Thu, Apr 29, 2010 at 4:07 PM, Steven Bosscher wrote:
> > 2010/4/30 Jan Hubicka :
> >> Yep, I read that page (and saw some of implementation too). Just was not
> >> able
> >> to follow the precise feature set of LIPO (i.e. if it
On Fri, Apr 30, 2010 at 12:07 PM, Xinliang David Li wrote:
>
> On Fri, Apr 30, 2010 at 11:12 AM, Jan Hubicka wrote:
> >> >
> >> > Interesting. My plan for profiling with LTO is to ultimately make it
> >> > linktime
> >> > transform. This will be more difficult with WHOPR (i.e. instrumenting
>
On Thu, Apr 29, 2010 at 4:07 PM, Steven Bosscher wrote:
> 2010/4/30 Jan Hubicka :
>> Yep, I read that page (and saw some of implementation too). Just was not
>> able
>> to follow the precise feature set of LIPO (i.e. if it gets better SPEC
>> results
>> than LTO+FDO then why)
>
> OK, that's an
> On Sun, May 2, 2010 at 6:45 AM, Jan Hubicka wrote:
> That depends. The following cases exist in vortex:
>
> 1) the value is runtime constant -- it is read from input file but
> never changed -- e.g.: QueBug. Nothing can be done by the compiler in
> this case;
>
> 2) Global variable written onl
On Sun, May 2, 2010 at 6:45 AM, Jan Hubicka wrote:
>> On Sat, May 1, 2010 at 2:36 AM, Jan Hubicka wrote:
>> >>
>> >> Vortex needs -fno-strict-aliasing. It casts between two record types
>> >> with one record being a 'prefix' of another.
>> >
>> > So today runs are complete. Thanks to Richi who
> On Sat, May 1, 2010 at 2:36 AM, Jan Hubicka wrote:
> >>
> >> Vortex needs -fno-strict-aliasing. It casts between two record types
> >> with one record being a 'prefix' of another.
> >
> > So today runs are complete. Thanks to Richi who fixed ICE in symtab
> > merging
> > that affected perl a
On Sat, May 1, 2010 at 2:36 AM, Jan Hubicka wrote:
>>
>> Vortex needs -fno-strict-aliasing. It casts between two record types
>> with one record being a 'prefix' of another.
>
> So today runs are complete. Thanks to Richi who fixed ICE in symtab merging
> that affected perl and GCC. With vorte
>
> Vortex needs -fno-strict-aliasing. It casts between two record types
> with one record being a 'prefix' of another.
So today's runs are complete. Thanks to Richi, who fixed an ICE in symtab
merging that affected perl and GCC. With vortex, the problem was that in
addition to -fno-strict-aliasing it i
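A minimal sketch of the "prefix record" pattern described for vortex, with hypothetical types (not vortex's real ones). The code shows the conforming way to share a prefix, embedding it as the first member, and describes the vortex-style cast in a comment.

```c
/* Hypothetical types for illustration only. */
struct header {
    int kind;
    int size;
};

/* Conforming layout: the shared prefix is the first member, so a pointer
 * to struct record, suitably converted, points to its header
 * (C11 6.7.2.1p15), and accesses through struct header are well-defined. */
struct record {
    struct header hdr;
    double payload;
};

/* The vortex-style alternative instead repeats the fields:
 *   struct record2 { int kind; int size; double payload; };
 * and casts between struct header * and struct record2 *. The layouts
 * match, but GCC's type-based alias analysis may assume accesses through
 * the two unrelated struct types never alias, so such code needs
 * -fno-strict-aliasing to be compiled safely. */

int header_kind(const struct header *h)
{
    return h->kind;
}

int record_kind(const struct record *r)
{
    /* Same address as (const struct header *)r, and well-defined. */
    return header_kind(&r->hdr);
}
```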
On Fri, Apr 30, 2010 at 11:12 AM, Jan Hubicka wrote:
>> >
>> > Interesting. My plan for profiling with LTO is to ultimately make it
>> > linktime
>> > transform. This will be more difficult with WHOPR (i.e. instrumenting need
>> > function bodies that are not available at WPA time), but I belie
> >
> > Interesting. My plan for profiling with LTO is to ultimately make it
> > linktime
> > transform. This will be more difficult with WHOPR (i.e. instrumenting need
> > function bodies that are not available at WPA time), but I believe it is
> > solvable: just assign uids to the edges and do
On Fri, Apr 30, 2010 at 1:37 AM, Jan Hubicka wrote:
>> In theory, LIPO should not generate better results than LTO+FDO. What
>> makes LIPO attractive is that it allows distributed build from the
>> beginning. Its integration with large distributed build system is also
>> easy. Another point is th
> In theory, LIPO should not generate better results than LTO+FDO. What
> makes LIPO attractive is that it allows distributed build from the
> beginning. Its integration with large distributed build system is also
> easy. Another point is that LIPO can be decoupled from FDO as well.
The integrati
On Thu, Apr 29, 2010 at 4:03 PM, Jan Hubicka wrote:
>> 2010/4/30 Jan Hubicka :
>> >> Thanks for the suggestion. Raksit currently is busy with merging trunk
>> >> changes back to lw-ipo branch which can be a daunting task. After that
>> >> this can be done. (Our internal release is based on 4.4).
2010/4/30 Jan Hubicka :
> Yep, I read that page (and saw some of implementation too). Just was not able
> to follow the precise feature set of LIPO (i.e. if it gets better SPEC results
> than LTO+FDO then why)
OK, that's an interesting question. The first question (if...) is
something you'll have
> 2010/4/30 Jan Hubicka :
> >> Thanks for the suggestion. Raksit currently is busy with merging trunk
> >> changes back to lw-ipo branch which can be a daunting task. After that
> >> this can be done. (Our internal release is based on 4.4).
> >
> > I must say that LIPO is something I always intend
2010/4/30 Jan Hubicka :
>> Thanks for the suggestion. Raksit currently is busy with merging trunk
>> changes back to lw-ipo branch which can be a daunting task. After that
>> this can be done. (Our internal release is based on 4.4).
>
> I must say that LIPO is something I always intend to look int
> Thanks for the suggestion. Raksit currently is busy with merging trunk
> changes back to lw-ipo branch which can be a daunting task. After that
> this can be done. (Our internal release is based on 4.4).
I must say that LIPO is something I have always intended to look into but didn't
seriously find ti
On Thu, Apr 29, 2010 at 12:25:15PM -0400, Vladimir Makarov wrote:
>
> Currently Graphite gives small improvements on x86 (one exception is
> 2% for peak x86 SPECFP2000) and mostly degradation on x86_64 (with
> maximum one more than 10% for SPECFP2000 because of big degradations
> on mgrid and
Thanks for the suggestion. Raksit currently is busy with merging trunk
changes back to lw-ipo branch which can be a daunting task. After that
this can be done. (Our internal release is based on 4.4).
David
On Thu, Apr 29, 2010 at 2:38 PM, Steven Bosscher wrote:
> On Thu, Apr 29, 2010 at 11:27 P
> I noticed eon's peak options do not include FDO, is that intended?
I think it is just a bug in the page header, but I will double-check.
Base and peak should match otherwise.
Honza
On Thu, Apr 29, 2010 at 11:27 PM, Jan Hubicka wrote:
> It would be interesting to know if same improvement happens with LTO and if
> not what LIPO does. I will unbreak vortex on our tester.
Perhaps you can add a LIPO tester? It looks like a very interesting
and promising approach.
Ciao!
Steven
I noticed eon's peak options do not include FDO, is that intended?
David
On Thu, Apr 29, 2010 at 2:27 PM, Jan Hubicka wrote:
>> Thanks for the comments. FDO will probably improve SPEC2000 score.
>> Although it is not obvious for some tests because the train data sets
>> for them are different
On Thu, Apr 29, 2010 at 2:27 PM, Jan Hubicka wrote:
>> Thanks for the comments. FDO will probably improve SPEC2000 score.
>> Although it is not obvious for some tests because the train data sets
>> for them are different from the reference data sets and it might
>> actually mislead the compiler.
BTW we are also tracking SPEC2k6 with and without LTO (not FDO runs)
http://gcc.opensuse.org/SPEC/CINT/sb-barbella.suse.de-ai-64/recent.html
http://gcc.opensuse.org/SPEC/CINT/sb-barbella.suse.de-head-64-2006/recent.html
Not all 2k6 tests pass with LTO, so comparing results will need a bit of care.
> Thanks for the comments. FDO will probably improve SPEC2000 score.
> Although it is not obvious for some tests because the train data sets
> for them are different from the reference data sets and it might
> actually mislead the compiler.
There are several studies on the topic and it is
Point well put. The benchmark suite should have good mixture of
programs with different sizes. SPEC2k programs cluster at the lower
end of the spectrum though.
David
On Thu, Apr 29, 2010 at 12:43 PM, Vladimir Makarov wrote:
> Xinliang David Li wrote:
>>>
>>> Thanks for the comments. FDO will pr
Xinliang David Li wrote:
Thanks for the comments. FDO will probably improve SPEC2000 score.
Although it is not obvious for some tests because the train data sets for
them are different from the reference data sets and it might actually
mislead the compiler.
FDO is important for optimizations
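A sketch of how FDO obtains the branch probabilities under discussion, using GCC's -fprofile-generate and -fprofile-use options; the function and the input file names are hypothetical. `__builtin_expect` is shown as the static, hand-written counterpart of what FDO measures, which is also where a train input unlike the reference input can mislead.

```c
#include <stddef.h>

/* FDO workflow (file names are hypothetical):
 *   gcc -O2 -fprofile-generate prog.c -o prog   instrumented build
 *   ./prog < train.in                           writes profile data
 *   gcc -O2 -fprofile-use prog.c -o prog        optimized with the profile
 * If train.in takes the v[i] > 0 branch far more (or less) often than the
 * reference input does, decisions based on the profile can be wrong.
 *
 * __builtin_expect below is a manual guess that the branch is usually
 * taken; FDO replaces such guesses with measured counts. */
long sum_positive(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (__builtin_expect(v[i] > 0, 1))
            s += v[i];
    }
    return s;
}
```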
>>
>
> Thanks for the comments. FDO will probably improve SPEC2000 score.
> Although it is not obvious for some tests because the train data sets for
> them are different from the reference data sets and it might actually
> mislead the compiler.
>
> FDO is important for optimizations where all p
Xinliang David Li wrote:
On Thu, Apr 29, 2010 at 9:25 AM, Vladimir Makarov wrote:
GCC-4.5.0 and LLVM-2.7 were released recently. To understand
where we stand after releasing GCC-4.5.0 I benchmarked it on SPEC2000
for x86/x86-64 and posted the comparison of it with the
previous GCC releases
On Thu, Apr 29, 2010 at 9:25 AM, Vladimir Makarov wrote:
> GCC-4.5.0 and LLVM-2.7 were released recently. To understand
> where we stand after releasing GCC-4.5.0 I benchmarked it on SPEC2000
> for x86/x86-64 and posted the comparison of it with the
> previous GCC releases and LLVM-2.7.
>
> Eve
Vladimir Makarov wrote:
Jan Hubicka wrote:
This seems to be sensitive to the setup. In our daily benchmarking, LTO is
faster on wupwise (2116 compared to 1600), and facerec is 2003 compared to
2041 (so about the same).
http://gcc.opensuse.org/SPEC/CFP/sb-frescobaldi.suse.de-ai-64/list.html
ht
Jan Hubicka wrote:
GCC-4.5.0 and LLVM-2.7 were released recently. To understand
where we stand after releasing GCC-4.5.0 I benchmarked it on SPEC2000
for x86/x86-64 and posted the comparison of it with the
previous GCC releases and LLVM-2.7.
Even benchmarking SPEC2000 takes a lot of time on t
> GCC-4.5.0 and LLVM-2.7 were released recently. To understand
> where we stand after releasing GCC-4.5.0 I benchmarked it on SPEC2000
> for x86/x86-64 and posted the comparison of it with the
> previous GCC releases and LLVM-2.7.
>
> Even benchmarking SPEC2000 takes a lot of time on the fastest
GCC-4.5.0 and LLVM-2.7 were released recently. To understand
where we stand after releasing GCC-4.5.0 I benchmarked it on SPEC2000
for x86/x86-64 and posted the comparison of it with the
previous GCC releases and LLVM-2.7.
Even benchmarking SPEC2000 takes a lot of time on the fastest
machine I