Re: Why not contribute? (to GCC)

2010-04-29 Thread Paolo Bonzini

On 04/28/2010 12:33 AM, Alfred M. Szmidt wrote:

1) The back-and-forth is too much for casual contributors. If it is
more effort to do the legal work than to submit the first patch,
then they will never submit any patch at all.

Please do not exaggerate.  If people have time for threads like these,
then they have time to send a short email with some questions, or to wait
a few days for a piece of paper to sign.


People are fine with spending time to improve things.

I find it harder to understand why one should argue to maintain the 
status quo.


Paolo


Re: LTO question

2010-04-29 Thread Richard Guenther
2010/4/29 Jan Hubicka :
>> > On 4/28/10 10:26 , Manuel López-Ibáñez wrote:
>> >  Not yet, I mistakenly thought -fwhole-program is the same as -fwhopr
>> >  and it is just for solving scaling issue of large program.(These two
>> >  options do look similar :-). I shall try next.
>> > >>>
>> > >>> Yep, -fwhopr is not ideal name, but I guess there is not much
>> > >>> to do about it.
>> > >
>> > > It is marked as experimental, so if it is going to stay for GCC 4.6,
>> > > then we should change the name. I think one possibility discussed
>> > > somewhere is that LTO scales back automatically, so the option would
>> > > be not necessary.
>> >
>> > Yes.  I think we should just keep -flto and make it use split
>> > compilation if needed.  -fwhopr is only needed to explicitly enable it.
>> >  My suggestion is to just keep -flto and invoke whopr with -flto=split
>> > or -flto=big (until the automatic threshold is added).
>>
>> Yep, I like this idea too.  I hope to be able to drop "experimental" status
>> from mainline whopr soonish (basically I need to implement references and 
>> then
>> I will burn a lot of time fixing how clones are streamed to enable ipa-cp).
>
> And do something about parallelizing the whopr build.  I guess it means storing
> the ltrans partition list into a file and making collect2 execute the
> compilations and re-invent the Makefile code?
> It would be great if someone took a look at this; I am not at all familiar with
> that code and in a way I would prefer it to stay that way ;))

I will look at moving the LTRANS driving to the driver; it should be
easy to do parallel execs from it, and it should hopefully make debugging
WPA/LTRANS less of a headache.
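As a rough idea of what "parallel execs from the driver" means here, this is a minimal C sketch; `run_ltrans_jobs` and the partition file names are invented for illustration, and the child simply execs `/bin/true` where a real driver would run the compiler on each LTRANS partition:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Spawn one job per LTRANS partition and wait for all of them.
   Returns the number of jobs that failed.  The exec'd command is a
   placeholder; a real driver would invoke the compiler on each
   partition taken from the WPA-produced partition list. */
int run_ltrans_jobs(const char *const *units, int n)
{
    pid_t *pids = malloc(sizeof(pid_t) * (size_t)n);
    int failed = 0;

    for (int i = 0; i < n; ++i) {
        pids[i] = fork();
        if (pids[i] == 0) {
            /* Child: stand-in for "cc1 a.ltransN.s ..." */
            execl("/bin/true", "true", units[i], (char *)NULL);
            _exit(127);                 /* exec failed */
        }
    }
    for (int i = 0; i < n; ++i) {
        int status;
        waitpid(pids[i], &status, 0);
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ++failed;
    }
    free(pids);
    return failed;
}
```

A caller would hand it the partition list and check the failure count before proceeding to the final link.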

Richard.

> Honza
>


Re: LTO vs static library archives [was Re: lto1: internal compiler error: in lto_symtab_merge_decls_1, at lto-symtab.c:549]

2010-04-29 Thread Richard Guenther
On Thu, Apr 29, 2010 at 6:11 AM, Dave Korn
 wrote:
> On 26/04/2010 10:46, Richard Guenther wrote:
>> On Mon, Apr 26, 2010 at 4:25 AM, Dave Korn wrote:
>
>>>  If I understand correctly, what we farm out to gold/lto-plugin is the task
>>> of identifying a) which archive members are required to be pulled into the
>>> final link depending on the undefined references found in the existing 
>>> object
>>> files and b) what offsets those members begin at.
>
>> That's correct.  Now I delayed hacking this all into collect2
>> (because I don't think that's the correct place to do it).  I first
>> wanted to cleanup the driver <-> lto1 interface so we do _not_
>> rely on collect2 to identify LTO objects but instead have lto1
>> do that work and communicate final link objects to the driver
>> back via a response file (same for -fwhopr WPA / LTRANS
>> stage - do not exec lto1 from lto1 but rather tell the driver
>> the set of LTRANS files).
>>
>> That's also the only easy way to get rid of the .comm __gnu_lto_v2
>> marker and simply check the availability of one of the always
>> present LTO sections.  For ar archieves it is easy to iterate
>> through them and we need to re-write them anyway if we want
>> to support partly LTO / non-LTO objects inside an archive.
>>
>> Now of course if ar support ends up to look easy and sane in
>> the collect2 framework then we can choose to not wait for
>> somebody doing the above suggested cleanups ...
>
>  Actually, I'm about to argue that that's the correct place to do it, anyway.

Correct (I'll be working on that soon).

>  Isn't there going to be a problem that if we teach lto1 to look inside
> archives and extract members, it doesn't have the knowledge that the linker
> would have (cf. gold plugin) of which archive members will actually be needed
> in the final link, and that therefore it would have to assume any of the
> member objects might be needed and effectively recompile the entire library
> every link time?

Well, we'd then need to re-architect the symbol merging and
LTO unit read-in to properly honor linking semantics (drop
an LTO unit from an archive if it doesn't resolve any unresolved
symbols).  I don't know how easy that will be, but it shouldn't
be impossible at least.
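The linking semantics described above (pull in an archive member only if it satisfies a still-unresolved symbol, iterating because each pulled member can introduce new references) can be sketched like this; the data layout and function name are invented for illustration:

```python
def select_members(initial_undefined, members):
    """Pick the archive members a link actually needs.

    `members` is a list of (name, defined_syms, referenced_syms)
    tuples -- an invented layout for illustration.  A member is pulled
    in only if it defines a symbol that is still unresolved; its own
    references may in turn pull in further members, so iterate to a
    fixpoint, like a classic linker walking an archive.
    """
    undefined = set(initial_undefined)
    defined = set()
    selected = []
    changed = True
    while changed:
        changed = False
        for name, defines, refs in members:
            if name in selected:
                continue
            if undefined & set(defines):
                selected.append(name)
                defined |= set(defines)
                undefined |= set(refs)
                undefined -= defined
                changed = True
    return selected
```

Members that resolve nothing are never selected, which is exactly the "drop an LTO unit from an archive" behavior discussed above.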

>  I'm sketching a plan where collect2 invokes 'ld' as if to do an ordinary
> non-LTO link, but passes it a flag like "--lto-assist" which causes it to
> output a list of just the archive members that it actually needs to complete
> the link, in response file "filenam...@offset" format.  ISTM that this is the
> simplest method to avoid recompiling entire archives (sort of building a
> linker into the compiler!), and I guess I should also make it check for an LTO
> marker (whether symbol or section) and only output those members that actually
> contain any LTO data.

Yes - that would be basically a linker plugin without plugin support.
And I'd go even further and have LD provide a complete symbol
resolution set like we get from the gold linker-plugin.

That wouldn't help for old or non-gnu LDs of course.

>  Making lto1 understand archives seems logical at first, but I don't think
> it's much use without knowing which archive members we want in advance, and in
> that case the existing code that reads a single archive member by pretending
> it's an ordinary object file with a constant offset from the start of file
> marker already does all we need, or so it seems to me.

I think we should try without lto1 understanding archives first
(or we are basically re-implementing a linker in lto1).

Richard.

>    cheers,
>      DaveK
>
>


Re: LTO question

2010-04-29 Thread Jan Hubicka
> 2010/4/29 Jan Hubicka :
> >> > On 4/28/10 10:26 , Manuel López-Ibáñez wrote:
> >> >  Not yet, I mistakenly thought -fwhole-program is the same as -fwhopr
> >> >  and it is just for solving scaling issue of large program.(These two
> >> >  options do look similar :-). I shall try next.
> >> > >>>
> >> > >>> Yep, -fwhopr is not ideal name, but I guess there is not much
> >> > >>> to do about it.
> >> > >
> >> > > It is marked as experimental, so if it is going to stay for GCC 4.6,
> >> > > then we should change the name. I think one possibility discussed
> >> > > somewhere is that LTO scales back automatically, so the option would
> >> > > be not necessary.
> >> >
> >> > Yes.  I think we should just keep -flto and make it use split
> >> > compilation if needed.  -fwhopr is only needed to explicitly enable it.
> >> >  My suggestion is to just keep -flto and invoke whopr with -flto=split
> >> > or -flto=big (until the automatic threshold is added).
> >>
> >> Yep, I like this idea too.  I hope to be able to drop "experimental" status
> >> from mainline whopr soonish (basically I need to implement references and 
> >> then
> >> I will burn a lot of time fixing how clones are streamed to enable ipa-cp).
> >
> > And do something about parallelizing the whopr build.  I guess it means
> > storing the ltrans partition list into a file and making collect2 execute
> > the compilations and re-invent the Makefile code?
> > It would be great if someone took a look at this; I am not at all familiar
> > with that code and in a way I would prefer it to stay that way ;))
> 
> I will look at moving the LTRANS driving to the driver, it should be
> easy to do parallel execs from it and hopefully make debugging
> WPA/LTRANS less of a headache.

That would be great, thanks!
It would also be great to get the parallel build working, but I guess that can
be done incrementally.

One problem is that we output .o files via assembly.  We produce a lot of
temporary data, and producing all the temporary .s files and processing them
later from collect2 will increase memory use etc...
So probably running the assembler from WPA itself is needed to avoid this
bottleneck.

Honza
> 
> Richard.
> 
> > Honza
> >


Re: LTO vs static library archives [was Re: lto1: internal compiler error: in lto_symtab_merge_decls_1, at lto-symtab.c:549]

2010-04-29 Thread Steven Bosscher
On Thu, Apr 29, 2010 at 10:57 AM, Richard Guenther
 wrote:
> Yes - that would be basically a linker plugin without plugin support.
> And I'd go even further and have LD provide a complete symbol
> resolution set like we get from the gold linker-plugin.
>
> That wouldn't help for old or non-gnu LDs of course.

Right. The way this seems to be going, we're looking at LTO support
for archives only on targets where GNU binutils is used. But what are
the alternatives? You have to somehow know which symbols you want to
extract from an archive, without implementing ld again.

What would be helpful is if things get set up in such a way that
binutils ld is just one tool that can give you this resolution file,
while leaving the option open to call another tool. That would allow us to
write a special tool for targets without binutils. I'm thinking of
course of my latest pet project, LTO for Mach-O. There is no working
Mach-O linker in binutils (or at least it's not the standard ld) but
it may be possible to just write a separate tool for Mach-O that
generates the resolution file.

Users would still need to install the extra tool, but at least it
would be possible to make things work.
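As a sketch of one piece such a separate tool would need for the common System V `ar` format: walking the archive and reporting each member's name, data offset, and size (the header layout is the standard one; the function name and output convention are just illustrative, and extended/long-name members are ignored for brevity):

```python
def ar_members(data):
    """Return (name, data_offset, size) for each member of a classic
    System V 'ar' archive given as bytes.  A resolution-helper tool
    could walk this list and emit the name-at-offset entries the LTO
    driver wants for members that contain LTO sections."""
    if data[:8] != b"!<arch>\n":
        raise ValueError("not an ar archive")
    members, pos = [], 8
    while pos + 60 <= len(data):
        hdr = data[pos:pos + 60]                      # fixed 60-byte header
        name = hdr[0:16].decode().rstrip().rstrip("/")
        size = int(hdr[48:58].decode().strip())
        members.append((name, pos + 60, size))
        pos += 60 + size + (size & 1)                 # data is 2-byte aligned
    return members
```

The symbol-resolution half of such a tool is the hard part; this only covers locating the candidate objects.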

Ciao!
Steven


Re: LTO vs static library archives [was Re: lto1: internal compiler error: in lto_symtab_merge_decls_1, at lto-symtab.c:549]

2010-04-29 Thread Richard Guenther
On Thu, Apr 29, 2010 at 11:19 AM, Steven Bosscher  wrote:
> On Thu, Apr 29, 2010 at 10:57 AM, Richard Guenther
>  wrote:
>> Yes - that would be basically a linker plugin without plugin support.
>> And I'd go even further and have LD provide a complete symbol
>> resolution set like we get from the gold linker-plugin.
>>
>> That wouldn't help for old or non-gnu LDs of course.
>
> Right. The way this seems to be going, we're looking at LTO support
> for archives only for targets where GNU binutils is used. But what are
> the alternatives? You have to somehow know what symbols you want to
> extract from an archive, without implementing ld again.
>
> What would be helpful, is when things get set up in such a way that
> binutils ld is just one tool that can give you this resolution file,
> but leave the option open to call another tool. That would allow us to
> write a special tool for targets without binutils. I'm thinking of
> course of my latest pet project, LTO for Mach-O. There is no working
> Mach-O linker in binutils (or at least it's not the standard ld) but
> it may be possible to just write a separate tool for Mach-O that
> generates the resolution file.
>
> Users would still need to install the extra tool, but at least it
> would be possible to make things work.

Indeed.  That extra tool is usually collect2, though (so if you
can write such a tool in a very portable way ...)

Richard.

> Ciao!
> Steven
>


Re: pattern "s_" not used when generating rtl for float comparison on mips?

2010-04-29 Thread Amker.Cheng
> Indeed, looking at GCC 4.5 there's no cstore expander for floating-point
> variables.  Maybe you can make a patch! :-)
>
Yes, it seems GCC always generates a set/compare/jump/set sequence,
then optimizes it out in the if-conversion pass. Maybe this was left
behind by early MIPS I, which had no conditional move instructions.

This is somewhat related to my current work; I'll try to see if I can
help with it after more study.
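For reference, the kind of function where this matters is a floating-point comparison whose boolean result is used as a value rather than as a branch condition (illustrative example only):

```c
/* A float comparison used as a value.  Without a floating-point
   cstore expander, the compiler expands this as compare +
   conditional branch + set of 0/1, and relies on if-conversion to
   collapse it back into a straight-line sequence. */
int flt_lt(float a, float b)
{
    return a < b;
}
```

Comparing the generated assembly before and after if-conversion for a function like this is a quick way to see the sequence being discussed.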

Thanks.

-- 
Best Regards.


Re: LTO vs static library archives [was Re: lto1: internal compiler error: in lto_symtab_merge_decls_1, at lto-symtab.c:549]

2010-04-29 Thread Jan Hubicka
> Well, we'd then need to re-architect the symbol merging and
> LTO unit read-in to properly honor linking semantics (drop
> a LTO unit from an archive if it doesn't resolve any unresolved
> symbols).  I don't know how easy that will be, but it shouldn't
> be impossible at least.

We also should keep in mind that we really ought to be able to produce
LTO .o files without actual assembly in them, so either we should not tie
this too much to the linking process, or we need to output fake symbols
into the LTO .o file when assembly is not done.
(I guess one can just output empty variables and functions, but then the
.o will link without LTO and lead to wrong code.)

This is IMO quite an important feature; we don't want to double compile
times forever.

Honza
> 
> >  I'm sketching a plan where collect2 invokes 'ld' as if to do an ordinary
> > non-LTO link, but passes it a flag like "--lto-assist" which causes it to
> > output a list of just the archive members that it actually needs to complete
> > the link, in response file "filenam...@offset" format.  ISTM that this is 
> > the
> > simplest method to avoid recompiling entire archives (sort of building a
> > linker into the compiler!), and I guess I should also make it check for an 
> > LTO
> > marker (whether symbol or section) and only output those members that 
> > actually
> > contain any LTO data.
> 
> Yes - that would be basically a linker plugin without plugin support.
> And I'd go even further and have LD provide a complete symbol
> resolution set like we get from the gold linker-plugin.
> 
> That wouldn't help for old or non-gnu LDs of course.
> 
> >  Making lto1 understand archives seems logical at first, but I don't think
> > it's much use without knowing which archive members we want in advance, and 
> > in
> > that case the existing code that reads a single archive member by pretending
> > it's an ordinary object file with a constant offset from the start of file
> > marker already does all we need, or so it seems to me.
> 
> I think we should try without lto1 understanding archives first
> (or we are basically re-implementing a linker in lto1).
> 
> Richard.
> 
> >    cheers,
> >      DaveK
> >
> >


Re: LTO vs static library archives [was Re: lto1: internal compiler error: in lto_symtab_merge_decls_1, at lto-symtab.c:549]

2010-04-29 Thread Ian Lance Taylor
Richard Guenther  writes:

> Well, we'd then need to re-architect the symbol merging and
> LTO unit read-in to properly honor linking semantics (drop
> a LTO unit from an archive if it doesn't resolve any unresolved
> symbols).  I don't know how easy that will be, but it shouldn't
> be impossible at least.

Yes, as I understand it, it was the need to replicate linking
semantics that led to the gold plugin framework in the first place.
(There is nothing stopping anybody from adding the same plugin
framework to GNU ld, by the way, it's just that nobody has done the
work.)  Of course all the linker semantics can be copied into the
compiler.  But it's moderately complex.

If you go this route, note that different object file formats have
different rules for when to include an object from an archive, so make
the code flexible.  E.g., in BFD, the code to select an object from an
archive is completely different for COFF and for ELF.

The Darwin linker supports a different plugin framework, by the way.
As far as I know it would be possible to adapt lto-plugin for that
framework.

Ian


Re: GCC 4.4.4 Release Candidate available from gcc.gnu.org

2010-04-29 Thread Rainer Orth
Jakub Jelinek  writes:

> The branch is now frozen and all checkins until after the final release
> of GCC 4.4.4 require explicit RM approval.
>
> If all goes well, I'd like to release 4.4.4 next week.

I've got a couple of patches I might backport to the 4.4 branch after it
is unfrozen again, but this would make sense only if there are concrete
plans for another release after 4.4.4.  Any word on this?

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


GCC viewcvs issue

2010-04-29 Thread Jie Zhang

This URL

http://gcc.gnu.org/viewcvs/branches/gcc-4_4-branch/gcc/tree-ssa-alias.c?annotate=155646

which tries to annotate the latest revision of tree-ssa-alias.c on the
4.4 branch gives


An Exception Has Occurred
Python Traceback

Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/viewvc/lib/viewvc.py", line 4317, in main
    request.run_viewvc()
  File "/usr/lib/python2.3/site-packages/viewvc/lib/viewvc.py", line 397, in run_viewvc
    self.view_func(self)
  File "/usr/lib/python2.3/site-packages/viewvc/lib/viewvc.py", line 1769, in view_annotate
    markup_or_annotate(request, 1)
  File "/usr/lib/python2.3/site-packages/viewvc/lib/viewvc.py", line 1696, in markup_or_annotate
    path[-1], mime_type)
  File "/usr/lib/python2.3/site-packages/viewvc/lib/viewvc.py", line 1589, in markup_stream_pygments
    encoding='utf-8'), ps)
  File "/usr/lib/python2.3/site-packages/pygments/__init__.py", line 85, in highlight
    return format(lex(code, lexer), formatter, outfile)
  File "/usr/lib/python2.3/site-packages/pygments/__init__.py", line 68, in format
    formatter.format(tokens, outfile)
  File "/usr/lib/python2.3/site-packages/pygments/formatter.py", line 92, in format
    return self.format_unencoded(tokensource, outfile)
  File "/usr/lib/python2.3/site-packages/pygments/formatters/html.py", line 704, in format_unencoded
    for t, piece in source:
  File "/usr/lib/python2.3/site-packages/pygments/formatters/html.py", line 611, in _format_lines
    for ttype, value in tokensource:
  File "/usr/lib/python2.3/site-packages/pygments/lexer.py", line 162, in streamer
    for i, t, v in self.get_tokens_unprocessed(text):
  File "/usr/lib/python2.3/site-packages/pygments/lexers/compiled.py", line 155, in get_tokens_unprocessed
    for index, token, value in \
  File "/usr/lib/python2.3/site-packages/pygments/lexer.py", line 479, in get_tokens_unprocessed
    m = rexmatch(text, pos)
RuntimeError: maximum recursion limit exceeded


Similar issue for the 4.3 branch.  Trunk, 4.2 and 4.1 are OK.


Regards,
--
Jie Zhang
CodeSourcery
(650) 331-3385 x735


Re: LTO vs static library archives [was Re: lto1: internal compiler error: in lto_symtab_merge_decls_1, at lto-symtab.c:549]

2010-04-29 Thread Richard Guenther
2010/4/29 Jan Hubicka :
>> Well, we'd then need to re-architect the symbol merging and
>> LTO unit read-in to properly honor linking semantics (drop
>> a LTO unit from an archive if it doesn't resolve any unresolved
>> symbols).  I don't know how easy that will be, but it shouldn't
>> be impossible at least.
>
> We also should keep in mind that we really ought to be able to produce LTO .o 
> files
> without actual assembly in, so either we should not tie this too much with 
> linking
> process anyway or we need to output fake symbols into the LTO .o file when 
> assembly
> is not done.
> (I guess one can just output empty variables and functions, but then .o will 
> link
> without LTO and lead to wrong code).
>
> This is IMO quite important feature, we don't want to double compile times 
> forever.

Well, what we should do anyway is short-cut compilation after
LTO bytecode output and go directly to expansion.  Otherwise
we risk having different sets of symbols in the intermediate
object files, and thus symbol resolution with GOLD does not
work completely (there are bugs about this already).

So compile time wouldn't be _that_ bad (just RTL opts and
assembling).

Richard.

> Honza
>>
>> >  I'm sketching a plan where collect2 invokes 'ld' as if to do an ordinary
>> > non-LTO link, but passes it a flag like "--lto-assist" which causes it to
>> > output a list of just the archive members that it actually needs to 
>> > complete
>> > the link, in response file "filenam...@offset" format.  ISTM that this is 
>> > the
>> > simplest method to avoid recompiling entire archives (sort of building a
>> > linker into the compiler!), and I guess I should also make it check for an 
>> > LTO
>> > marker (whether symbol or section) and only output those members that 
>> > actually
>> > contain any LTO data.
>>
>> Yes - that would be basically a linker plugin without plugin support.
>> And I'd go even further and have LD provide a complete symbol
>> resolution set like we get from the gold linker-plugin.
>>
>> That wouldn't help for old or non-gnu LDs of course.
>>
>> >  Making lto1 understand archives seems logical at first, but I don't think
>> > it's much use without knowing which archive members we want in advance, 
>> > and in
>> > that case the existing code that reads a single archive member by 
>> > pretending
>> > it's an ordinary object file with a constant offset from the start of file
>> > marker already does all we need, or so it seems to me.
>>
>> I think we should try without lto1 understanding archives first
>> (or we are basically re-implementing a linker in lto1).
>>
>> Richard.
>>
>> >    cheers,
>> >      DaveK
>> >
>> >
>


Re: LTO vs static library archives [was Re: lto1: internal compiler error: in lto_symtab_merge_decls_1, at lto-symtab.c:549]

2010-04-29 Thread Jan Hubicka
> 2010/4/29 Jan Hubicka :
> >> Well, we'd then need to re-architect the symbol merging and
> >> LTO unit read-in to properly honor linking semantics (drop
> >> a LTO unit from an archive if it doesn't resolve any unresolved
> >> symbols).  I don't know how easy that will be, but it shouldn't
> >> be impossible at least.
> >
> > We also should keep in mind that we really ought to be able to produce LTO 
> > .o files
> > without actual assembly in, so either we should not tie this too much with 
> > linking
> > process anyway or we need to output fake symbols into the LTO .o file when 
> > assembly
> > is not done.
> > (I guess one can just output empty variables and functions, but then .o 
> > will link
> > without LTO and lead to wrong code).
> >
> > This is IMO quite important feature, we don't want to double compile times 
> > forever.
> 
> Well, what we should do anyway is short-cut compilation after
> LTO bytecode output and go directly to expansion.  Otherwise
> we risk to have different sets of symbols with the intermediate
> object files and thus symbol resultion with GOLD does not
> work compeltely (there are bugs about this already).
> 
> So compile-time wouldn't be _that_ bad (just RTL opts and
> assembling).

Well, but in that case the resulting assembly would be completely
useless anyway.  RTL opts and expansion are an important factor, so it
would be nice to avoid those.

Honza


gcc 4.5.0 with Graphite - building & testing

2010-04-29 Thread Solar Designer
Hi,

I wrote a lengthy wiki page with step-by-step instructions and demos on
building/installing/using gcc 4.5.0 with Graphite under a non-root user
account (my test system only had gcc 3.4.5 installed).  I think those
instructions could be useful to some users of gcc, so feel free to copy
them or link to them.  At least I haven't seen similarly straightforward
yet very specific and complete instructions elsewhere.  The wiki page is:

http://openwall.info/wiki/internal/gcc-local-build

This wiki page also shows the following (as it goes through the
parallelization demos):

I encountered several shortcomings of gcc while testing its
parallelization capabilities.  This includes unreasonable non-use of
SSE2 with unsigned ints (whereas SSE2 is used just fine with signed
ones), and unreasonable non-parallelization when the loop index variable
is declared unsigned (auto-parallelization of the same loop does occur
when the index variable is signed).

I do not file bug reports for these just yet because I do not know what
the intended behavior is - maybe there is in fact some reason (unknown
to me) for avoiding SSE2 and parallelization for unsigned ints.
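To make the report concrete, here is the kind of loop trio being compared; the function names are invented, and whether each version is vectorized or auto-parallelized depends on the GCC version and flags (the report above concerns 4.5.0 with Graphite):

```c
#define N 1024

/* Identical work; only the signedness of the element type and of the
   loop index differs.  Inspecting -fdump-tree-vect output (or the
   auto-parallelization dumps) for each function shows which ones GCC
   actually transforms. */
void scale_signed_elems(int *a)        { for (int i = 0; i < N; ++i) a[i] *= 3;  }
void scale_unsigned_elems(unsigned *a) { for (int i = 0; i < N; ++i) a[i] *= 3u; }
void scale_unsigned_index(unsigned *a) { for (unsigned i = 0; i < N; ++i) a[i] *= 3u; }
```

All three compute the same result; only the compiler's treatment of them is reported to differ.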

Also, I was not able to trigger an equivalent of OpenMP reduction
without in fact using an OpenMP directive.  Maybe this is as intended
for the time being, although perhaps this is an area for improvement.
Without this, auto-parallelization appears to be of little practical
use, because most (all?) loops that do get parallelized involve writes,
which, in my testing, almost implies that either their data does not fit
in cache or they incur cache coherence overhead.
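As an illustration of the reduction case (the function is mine, not from the wiki page; compile with -fopenmp to enable the directive, and without it the pragma is simply ignored and the loop stays sequential but correct):

```c
/* A classic sum reduction.  With the explicit directive, OpenMP
   parallelizes it; the question raised above is whether GCC's
   auto-parallelizer can recognize the same reduction without
   the pragma. */
double dot(const double *a, const double *b, int n)
{
    double sum = 0.0;
#pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```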

I am not complaining (in fact, I am happy to see gcc gain these
features), nor am I asking for any help with this.  I am merely sharing
my findings in hope that someone will find this feedback useful. :-)

Thanks,

Alexander


Re: LTO question

2010-04-29 Thread Xinliang David Li
Just curious, what is the base line size of your comparison? Did you
turn on GC (-ffunction-sections -fdata-sections -Wl,--gc-sections)?

David

On Wed, Apr 28, 2010 at 2:44 AM, Bingfeng Mei  wrote:
> Thanks, I will check what I can do with collect2. LTO
> seems to save 6-9% code size for applications I tested
> and should be very useful for us.
>
> Bingfeng
>
>> -Original Message-
>> From: Richard Guenther [mailto:richard.guent...@gmail.com]
>> Sent: 28 April 2010 10:33
>> To: Bingfeng Mei
>> Cc: gcc@gcc.gnu.org
>> Subject: Re: LTO question
>>
>> On Tue, Apr 27, 2010 at 6:30 PM, Bingfeng Mei
>>  wrote:
>> > Hello,
>> > I have been playing with LTO. I notice that LTO doesn't work when
>> > object files are achived into static library files and the final
>> > binary is linked against them, although these object files
>> are compiled
>> > with -flto and I can see all the lto related sections in .a files.
>> > Is this what is described in LTO Wiki page?
>> >
>> > "As an added feature, LTO will take advantage of the plugin feature
>> > in gold. This allows the compiler to pick up object files that may
>> > have been stored in library archives. "
>> >
>> > So do I have to use gold to solve this issue?
>>
>> Yes.  Or you fix collect2 to do processing of archives and hand
>> lto1 the required information (it expects archive components
>> with LTO bytecode like archiv...@offset with offset being the
>> offset of the .o file with LTO bytecode inside the archive).  See
>> lto/lto-elf.c:lto_obj_file_open for "details".
>>
>> Richard.
>>
>> > Many thanks,
>> > Bingfeng
>> >
>>
>>
>


GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Vladimir Makarov

 GCC-4.5.0 and LLVM-2.7 were released recently.  To understand
where we stand after releasing GCC-4.5.0, I benchmarked it on SPEC2000
for x86/x86-64 and posted a comparison with the previous GCC releases
and LLVM-2.7.

 Even benchmarking SPEC2000 takes a lot of time on the fastest
machine I have, so I don't plan to use SPEC2006 for this in the near
future.

 You can find the comparison on
http://vmakarov.fedorapeople.org/spec/ (please just click links at the
bottom of the left frame starting with link "GCC release comparison").

 If you need exact numbers, please use the tables (the links to them
are also given) which were used to generate the corresponding bar
graphs.


 In general, GCC-4.5.0 became faster (up to 10%) in -O2 mode.  This is
the first considerable compilation-speed improvement since GCC-4.2.
GCC-4.5.0 also generates better code (1-2% on average, up to 4% for
x86-64 SPECFP2000 in -O2 mode) in comparison with the previous
release.  That is not counting LTO and Graphite, which can give even
more (especially LTO) in many cases.

 GCC-4.5.0 has big new optimizations, LTO and Graphite (more
accurately, Graphite was introduced in the previous release).
Therefore I ran additional benchmarks to test them.

 LTO is a promising technology, especially for integer benchmarks, for
which it results in smaller and faster code.  But it can result in
degradations too on SPECFP2000, mainly because of big degradations on a
few benchmarks like wupwise or facerec.  Another annoying thing about
LTO is that it considerably slows down the compiler.

 Currently Graphite gives small improvements on x86 (one exception is
2% for peak x86 SPECFP2000) and mostly degradations on x86_64 (at most
a bit more than 10% for SPECFP2000, because of big degradations on
mgrid and swim).  So further work is needed on the project, because it
does not seem mature yet.

 As for LLVM, LLVM became slower (e.g. by 15%-50% on x86-64 in
comparison with llvm-2.5).  So the gap between the compilation speed of
GCC and LLVM has decreased, and is sometimes as small as 4% on x86_64
and 8% on x86 (both for SPECInt2000 in -O2 mode).  Maybe I am wrong,
but I don't think CLANG will improve this situation significantly (in
-O2 and -O3 mode), because optimizations still take most of the time of
any serious optimizing compiler.

 LLVM made progress in code performance, especially on floating-point
benchmarks.  But the gap between LLVM-2.7 and GCC-4.5 in peak
performance (not including GCC LTO and Graphite) is still 6-7% on
SPECInt2000 and 13-17% on SPECFP2000.

 In general, IMHO GCC-4.5.0 is a good and promising release.



RE: LTO question

2010-04-29 Thread Bingfeng Mei
I turned on -ffunction-sections and compiled with -Os.
The size gain at -O2 is smaller, though.

Bingfeng

> -Original Message-
> From: Xinliang David Li [mailto:davi...@google.com] 
> Sent: 29 April 2010 17:17
> To: Bingfeng Mei
> Cc: Richard Guenther; gcc@gcc.gnu.org
> Subject: Re: LTO question
> 
> Just curious, what is the base line size of your comparison? Did you
> turn on GC (-ffunction-sections -fdata-sections -Wl,--gc-sections)?
> 
> David
> 
> On Wed, Apr 28, 2010 at 2:44 AM, Bingfeng Mei 
>  wrote:
> > Thanks, I will check what I can do with collect2. LTO
> > seems to save 6-9% code size for applications I tested
> > and should be very useful for us.
> >
> > Bingfeng
> >
> >> -Original Message-
> >> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> >> Sent: 28 April 2010 10:33
> >> To: Bingfeng Mei
> >> Cc: gcc@gcc.gnu.org
> >> Subject: Re: LTO question
> >>
> >> On Tue, Apr 27, 2010 at 6:30 PM, Bingfeng Mei
> >>  wrote:
> >> > Hello,
> >> > I have been playing with LTO. I notice that LTO doesn't work when
> >> > object files are achived into static library files and the final
> >> > binary is linked against them, although these object files
> >> are compiled
> >> > with -flto and I can see all the lto related sections in 
> .a files.
> >> > Is this what is described in LTO Wiki page?
> >> >
> >> > "As an added feature, LTO will take advantage of the 
> plugin feature
> >> > in gold. This allows the compiler to pick up object 
> files that may
> >> > have been stored in library archives. "
> >> >
> >> > So do I have to use gold to solve this issue?
> >>
> >> Yes.  Or you fix collect2 to do processing of archives and hand
> >> lto1 the required information (it expects archive components
> >> with LTO bytecode like archiv...@offset with offset being the
> >> offset of the .o file with LTO bytecode inside the archive).  See
> >> lto/lto-elf.c:lto_obj_file_open for "details".
> >>
> >> Richard.
> >>
> >> > Many thanks,
> >> > Bingfeng
> >> >
> >>
> >>
> >
> 
> 


Re: LTO question

2010-04-29 Thread Xinliang David Li
On Thu, Apr 29, 2010 at 9:28 AM, Bingfeng Mei  wrote:
> I turned on -ffunction-sections and compiled with -Os.
> The size gain at -O2 is less though.

Interesting.

Thanks,

David
>
> Bingfeng
>
>> -Original Message-
>> From: Xinliang David Li [mailto:davi...@google.com]
>> Sent: 29 April 2010 17:17
>> To: Bingfeng Mei
>> Cc: Richard Guenther; gcc@gcc.gnu.org
>> Subject: Re: LTO question
>>
>> Just curious, what is the base line size of your comparison? Did you
>> turn on GC (-ffunction-sections -fdata-sections -Wl,--gc-sections)?
>>
>> David
>>
>> On Wed, Apr 28, 2010 at 2:44 AM, Bingfeng Mei
>>  wrote:
>> > Thanks, I will check what I can do with collect2. LTO
>> > seems to save 6-9% code size for applications I tested
>> > and should be very useful for us.
>> >
>> > Bingfeng
>> >
>> >> -Original Message-
>> >> From: Richard Guenther [mailto:richard.guent...@gmail.com]
>> >> Sent: 28 April 2010 10:33
>> >> To: Bingfeng Mei
>> >> Cc: gcc@gcc.gnu.org
>> >> Subject: Re: LTO question
>> >>
>> >> On Tue, Apr 27, 2010 at 6:30 PM, Bingfeng Mei
>> >>  wrote:
>> >> > Hello,
>> >> > I have been playing with LTO. I notice that LTO doesn't work when
>> >> > object files are archived into static library files and the final
>> >> > binary is linked against them, although these object files are
>> >> > compiled with -flto and I can see all the LTO-related sections in
>> >> > .a files. Is this what is described in the LTO Wiki page?
>> >> >
>> >> > "As an added feature, LTO will take advantage of the plugin feature
>> >> > in gold. This allows the compiler to pick up object files that may
>> >> > have been stored in library archives. "
>> >> >
>> >> > So do I have to use gold to solve this issue?
>> >>
>> >> Yes.  Or you fix collect2 to do processing of archives and hand
>> >> lto1 the required information (it expects archive components
>> >> with LTO bytecode like archiv...@offset with offset being the
>> >> offset of the .o file with LTO bytecode inside the archive).  See
>> >> lto/lto-elf.c:lto_obj_file_open for "details".
>> >>
>> >> Richard.
>> >>
>> >> > Many thanks,
>> >> > Bingfeng
>> >> >
>> >>
>> >>
>> >
>>
>>
>


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Jan Hubicka
>  GCC-4.5.0 and LLVM-2.7 were released recently.  To understand
> where we stand after releasing GCC-4.5.0 I benchmarked it on SPEC2000
> for x86/x86-64 and posted the comparison of it with the
> previous GCC releases and LLVM-2.7.
>
>  Even benchmarking SPEC2000 takes a lot of time on the fastest
> machine I have. So I don't plan to use SPEC2006 for this in the near
> future.
>
>  You can find the comparison on
> http://vmakarov.fedorapeople.org/spec/ (please just click links at the
> bottom of the left frame starting with link "GCC release comparison").
>
>  If you need exact numbers, please use the tables (the links to them
> are also given) which were used to generate the corresponding bar
> graphs.
>
>
>  In general, GCC-4.5.0 became faster (up to 10%) in -O2 mode.  This is
> the first considerable compilation speed improvement since GCC-4.2.
> GCC-4.5.0 also generates better code (1-2% on average, up to 4% for
> x86-64 SPECFP2000 in -O2 mode) in comparison with the previous
> release.  That does not include LTO and Graphite, which can give even
> more (especially LTO) in many cases.
>
>  GCC-4.5.0 has new big optimizations, LTO and Graphite (more
> accurately, Graphite was introduced in the previous release).
> Therefore I ran additional benchmarks to test them.
>
>  LTO is a promising technology, especially for integer benchmarks, for
> which it results in smaller and faster code.  But it might also result
> in degradations on SPECFP2000, mainly because of big degradations on a
> few benchmarks like wupwise or facerec.  Another annoying thing about
> LTO is that it considerably slows down the compiler.

Seems like something sensitive to setup.  In our daily benchmarking LTO
is faster on wupwise (2116 compared to 1600), and facerec is 2003 compared
to 2041 (so about the same).

http://gcc.opensuse.org/SPEC/CFP/sb-frescobaldi.suse.de-ai-64/list.html
http://gcc.opensuse.org/SPEC/CFP/sb-frescobaldi.suse.de-ipa-64/list.html

Did you test with -fwhole-program?

Honza


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Vladimir Makarov

Jan Hubicka wrote:

 GCC-4.5.0 and LLVM-2.7 were released recently.  To understand
where we stand after releasing GCC-4.5.0 I benchmarked it on SPEC2000
for x86/x86-64 and posted the comparison of it with the
previous GCC releases and LLVM-2.7.

 Even benchmarking SPEC2000 takes a lot of time on the fastest
machine I have. So I don't plan to use SPEC2006 for this in the near
future.

 You can find the comparison on
http://vmakarov.fedorapeople.org/spec/ (please just click links at the
bottom of the left frame starting with link "GCC release comparison").

 If you need exact numbers, please use the tables (the links to them
are also given) which were used to generate the corresponding bar
graphs.


 In general, GCC-4.5.0 became faster (up to 10%) in -O2 mode.  This is
the first considerable compilation speed improvement since GCC-4.2.
GCC-4.5.0 also generates better code (1-2% on average, up to 4% for
x86-64 SPECFP2000 in -O2 mode) in comparison with the previous
release.  That does not include LTO and Graphite, which can give even
more (especially LTO) in many cases.

 GCC-4.5.0 has new big optimizations, LTO and Graphite (more
accurately, Graphite was introduced in the previous release).
Therefore I ran additional benchmarks to test them.

 LTO is a promising technology, especially for integer benchmarks, for
which it results in smaller and faster code.  But it might also result
in degradations on SPECFP2000, mainly because of big degradations on a
few benchmarks like wupwise or facerec.  Another annoying thing about
LTO is that it considerably slows down the compiler.



Seems like something sensitive to setup.  In our daily benchmarking LTO
is faster on wupwise (2116 compared to 1600), and facerec is 2003 compared
to 2041 (so about the same).

http://gcc.opensuse.org/SPEC/CFP/sb-frescobaldi.suse.de-ai-64/list.html
http://gcc.opensuse.org/SPEC/CFP/sb-frescobaldi.suse.de-ipa-64/list.html

Did you test with -fwhole-program?
  
Yes, I used -flto -fwhole-program.  All this info is on the page.  The
test machines are also not experimental ones (both are Dell machines).


I used the released sources; maybe the reason for the difference is the
different sources.  In any case, I'll check the current trunk on these
machines.




GIMPLE Front End (GSOC 2010)

2010-04-29 Thread Sandeep Soni
-- Forwarded message --
From: Diego Novillo 
Date: Thu, Apr 29, 2010 at 7:05 PM
Subject: Re: [LTO] Open items in the ToDo list
To: Sandeep Soni 
Cc: Andi Hellmund 


Thanks Sandeep.  Could you please add your proposal to the GCC wiki?
Creating a new wiki page is fairly easy.  You add your new page and
then link it from the main page, put it somewhere in the index for
'Current projects' (keep it alphabetized, please).  You can copy the
format from another existing project page.


Thanks.  Diego.


Hi,

I added the following page to the wiki.

http://gcc.gnu.org/wiki/GimpleFrontEnd

Any comments/suggestions or ideas related to the project are welcome.

Thanks.

--
cheers
sandy



-- 
Cheers
Sandy


Problem with SSA form usign cgraph_nodes and push_cfun

2010-04-29 Thread Massimo Nazaria
Hi everybody!
I am working on a gcc-pass which processes every statement using this code:

  for (node = cgraph_nodes; node; node = node->next)
    {
      if (node->analyzed && cgraph_is_master_clone (node))
        {
          push_cfun (DECL_STRUCT_FUNCTION (node->decl));

          FOR_EACH_BB (bb)
            {
              // Here I would like to use
              // SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt));

With the code above I can't use SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt)) 
(I get 'segmentation fault'). I think the reason is that the statements are not 
in SSA-form.

Instead, if I use

  FOR_EACH_BB (bb) { ... }

without the "for (node = cgraph_nodes; ..." loop, I have the statements in
SSA form and SSA_NAME_DEF_STMT is OK.

Unfortunately, with this solution, I cannot process every function at once...

What can I do to use SSA_NAME_DEF_STMT while processing every function?

Thank you!

Max
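[Editor's note: one possible shape for the loop, sketched against GCC 4.5-era internal APIs and untested. Scheduling the pass after SSA construction (properties_required = PROP_ssa in the pass descriptor) is what guarantees SSA form is available; a gimple_in_ssa_p guard makes that explicit, and note that the snippet above also never calls pop_cfun.]

```c
/* Sketch only: assumes GCC 4.5-era internal APIs, not standalone code.  */
struct cgraph_node *node;
basic_block bb;
gimple_stmt_iterator gsi;

for (node = cgraph_nodes; node; node = node->next)
  if (node->analyzed && cgraph_is_master_clone (node))
    {
      push_cfun (DECL_STRUCT_FUNCTION (node->decl));
      if (gimple_in_ssa_p (cfun))  /* only touch SSA operands in SSA form */
        FOR_EACH_BB (bb)
          for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
            {
              gimple stmt = gsi_stmt (gsi);
              if (is_gimple_assign (stmt)
                  && TREE_CODE (gimple_assign_rhs1 (stmt)) == SSA_NAME)
                {
                  gimple def = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt));
                  /* ... use def ... */
                }
            }
      pop_cfun ();  /* the original snippet leaks the cfun stack */
    }
```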






Re: GIMPLE Front End (GSOC 2010)

2010-04-29 Thread Manuel López-Ibáñez
On 29 April 2010 19:25, Sandeep Soni  wrote:
> I added the following page to the wiki.
>
> http://gcc.gnu.org/wiki/GimpleFrontEnd
>
> Any comments/suggestions or ideas related to the project are welcome.

Hi Sandy,

It may be helpful to take a look to wiki pages of previous SoC
projects, such as
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings, for
formatting/structure ideas.

Cheers,

Manuel.


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Vladimir Makarov

Vladimir Makarov wrote:

Jan Hubicka wrote:


Seems like something sensitive to setup.  In our daily benchmarking LTO
is faster on wupwise (2116 compared to 1600), and facerec is 2003
compared to 2041 (so about the same).

http://gcc.opensuse.org/SPEC/CFP/sb-frescobaldi.suse.de-ai-64/list.html
http://gcc.opensuse.org/SPEC/CFP/sb-frescobaldi.suse.de-ipa-64/list.html

Did you test with -fwhole-program?
  
Yes, I used -flto -fwhole-program.  All this info is on the page.  The
test machines are also not experimental ones (both are Dell machines).


I used the released sources; maybe the reason for the difference is the
different sources.  In any case, I'll check the current trunk on these
machines.




The following is what I got on today's trunk for x86_64 (2.93 GHz Core i7):

 wupwise
-O3                                    2670
-O3 -flto -fwhole-program              2211
-O3 -ffast-math                        2753
-O3 -flto -fwhole-program -ffast-math  4325


So nothing is wrong with my test machine.  We simply measure different 
things.  You use -ffast-math, I don't use it.


For the comparison I used a simple combination of options for GCC and
LLVM.  For me it is obvious that GCC's results can be improved more than
LLVM's by finding the right options, because it has many more optimizations.


Still, it would be nice to fix the LTO SPEC2000 degradations when -ffast-math
is not used.





Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Xinliang David Li
On Thu, Apr 29, 2010 at 9:25 AM, Vladimir Makarov  wrote:
>  GCC-4.5.0 and LLVM-2.7 were released recently.  To understand
> where we stand after releasing GCC-4.5.0 I benchmarked it on SPEC2000
> for x86/x86-64 and posted the comparison of it with the
> previous GCC releases and LLVM-2.7.
>
>  Even benchmarking SPEC2000 takes a lot of time on the fastest
> machine I have. So I don't plan to use SPEC2006 for this in the near
> future.
>
>  You can find the comparison on
> http://vmakarov.fedorapeople.org/spec/ (please just click links at the
> bottom of the left frame starting with link "GCC release comparison").
>
>  If you need exact numbers, please use the tables (the links to them
> are also given) which were used to generate the corresponding bar
> graphs.
>
>
>  In general, GCC-4.5.0 became faster (up to 10%) in -O2 mode.  This is
> the first considerable compilation speed improvement since GCC-4.2.
> GCC-4.5.0 also generates better code (1-2% on average, up to 4% for
> x86-64 SPECFP2000 in -O2 mode) in comparison with the previous
> release.  That does not include LTO and Graphite, which can give even
> more (especially LTO) in many cases.
>
>  GCC-4.5.0 has new big optimizations, LTO and Graphite (more
> accurately, Graphite was introduced in the previous release).
> Therefore I ran additional benchmarks to test them.
>
>  LTO is a promising technology, especially for integer benchmarks, for
> which it results in smaller and faster code.  But it might also result
> in degradations on SPECFP2000, mainly because of big degradations on a
> few benchmarks like wupwise or facerec.  Another annoying thing about
> LTO is that it considerably slows down the compiler.


The LTO improvement on SPECint2000 is only 1.86%:

             4.5    4.5+lto  Improvement
164.gzip     955    950      -0.52%   <-- degrade
175.vpr      588    594       1.02%
176.gcc     1211   1216       0.41%
181.mcf      699    698      -0.14%
186.crafty  1011    987      -2.37%   <--- degrade
197.parser   792    813       2.65%
252.eon     1026   1023      -0.29%   <-- degrade
253.perlbmk 1312   1294      -1.37%   <-- degrade
254.gap     1021   1037       1.57%
255.vortex  1123   1319      17.45%
256.bzip2    737    768       4.21%
300.twolf    773    779       0.78%
-------------------------------------
SPECint2000  913    930       1.86%


This matches our previous observation that to bring the best out of
LTO, FDO is also needed. (As a reference, LIPO improves over plain FDO
by ~4.5%, vortex improves 23%.)  You will probably see an even smaller
improvement in SPEC2006.

It would be great if there is number collected comparing LTO + FDO vs
plain FDO in the same setup.

Thanks,

David




>
>  Currently Graphite gives small improvements on x86 (one exception is
> 2% for peak x86 SPECFP2000) and mostly degradations on x86_64 (with a
> maximum of more than 10% for SPECFP2000 because of big degradations on
> mgrid and swim).  So further work is needed on the project, because it
> does not seem mature yet.
>
>  As for LLVM, LLVM became slower (e.g. by 15%-50% for x86-64 in
> comparison with llvm-2.5).  So the gap between the compilation speed of
> GCC and LLVM decreased, and is now sometimes 4% on x86_64 and 8% on x86
> (both for SPECInt2000 in -O2 mode).  Maybe I am wrong, but I don't
> think CLANG will improve this situation significantly (in -O2 and -O3
> mode) because optimizations still take most of the time of any serious
> optimizing compiler.
>
>  LLVM made progress in code performance, especially for floating
> point benchmarks.  But the gap between LLVM-2.7 and GCC-4.5 in peak
> performance (not including GCC LTO and Graphite) is still 6-7% on
> SPECInt2000 and 13-17% on SPECFP2000.
>
>  In general, IMHO GCC-4.5.0 is a good and promising release.
>
>


going from SunOS 5/SparcWorks -> Linux/gcc

2010-04-29 Thread Brian Hill
I have a 15-year-old C program (which I didn't write) that compiles and
runs fine with SparcWorks cc on Sun SPARC with SunOS 5.10.


It compiles on CentOS 5 64-bit with gcc 4.1.2 but core dumps all over 
the place.


Switching to 32-bit compile doesn't help much.

I did as much debugging as I could, but it seems to come down to liberal
use of memory that the SparcWorks compiler accommodates and gcc doesn't
by default.


Is there some simple compiler option or other measure I can take to 
compile/run the code with gcc?


Rather than get into the details of the code, I figured I'd start with
this angle.


Thanks!

Brian


Re: going from SunOS 5/SparcWorks -> Linux/gcc

2010-04-29 Thread Steven Bosscher
On Thu, Apr 29, 2010 at 8:25 PM, Brian Hill  wrote:
> I have a 15-year-old C program (which I didn't write) that compiles and
> runs fine with SparcWorks cc on Sun SPARC with SunOS 5.10.
>
> It compiles on CentOS 5 64-bit with gcc 4.1.2 but core dumps all over the
> place.
>
> Switching to 32-bit compile doesn't help much.
>
> I did as much debugging as I could, but it seems to come down to liberal use
> of memory that the SparcWorks compiler accommodates and gcc doesn't by
> default.
>
> Is there some simple compiler option or other measure I can take to
> compile/run the code with gcc?
>
> Rather than get into the details of the code, I figured I'd start with this
> angle.

This kind of question really doesn't belong here but on gcc-help. But
while we're here...

The standard 1st questions are:
1) Did you compile with -Wall -Wextra and solve all warnings?
2) Did you try with -fno-strict-aliasing?

There is just not enough information in your question to be more helpful.

Good luck,

Ciao!
Steven


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Vladimir Makarov

Xinliang David Li wrote:

On Thu, Apr 29, 2010 at 9:25 AM, Vladimir Makarov  wrote:
  

 GCC-4.5.0 and LLVM-2.7 were released recently.  To understand
where we stand after releasing GCC-4.5.0 I benchmarked it on SPEC2000
for x86/x86-64 and posted the comparison of it with the
previous GCC releases and LLVM-2.7.

 Even benchmarking SPEC2000 takes a lot of time on the fastest
machine I have. So I don't plan to use SPEC2006 for this in the near
future.

 You can find the comparison on
http://vmakarov.fedorapeople.org/spec/ (please just click links at the
bottom of the left frame starting with link "GCC release comparison").

 If you need exact numbers, please use the tables (the links to them
are also given) which were used to generate the corresponding bar
graphs.


 In general, GCC-4.5.0 became faster (up to 10%) in -O2 mode.  This is
the first considerable compilation speed improvement since GCC-4.2.
GCC-4.5.0 also generates better code (1-2% on average, up to 4% for
x86-64 SPECFP2000 in -O2 mode) in comparison with the previous
release.  That does not include LTO and Graphite, which can give even
more (especially LTO) in many cases.

 GCC-4.5.0 has new big optimizations, LTO and Graphite (more
accurately, Graphite was introduced in the previous release).
Therefore I ran additional benchmarks to test them.

 LTO is a promising technology, especially for integer benchmarks, for
which it results in smaller and faster code.  But it might also result
in degradations on SPECFP2000, mainly because of big degradations on a
few benchmarks like wupwise or facerec.  Another annoying thing about
LTO is that it considerably slows down the compiler.




The LTO improvement on SPECint2000 is only 1.86%:

             4.5    4.5+lto  Improvement
164.gzip     955    950      -0.52%   <-- degrade
175.vpr      588    594       1.02%
176.gcc     1211   1216       0.41%
181.mcf      699    698      -0.14%
186.crafty  1011    987      -2.37%   <--- degrade
197.parser   792    813       2.65%
252.eon     1026   1023      -0.29%   <-- degrade
253.perlbmk 1312   1294      -1.37%   <-- degrade
254.gap     1021   1037       1.57%
255.vortex  1123   1319      17.45%
256.bzip2    737    768       4.21%
300.twolf    773    779       0.78%
-------------------------------------
SPECint2000  913    930       1.86%


This matches our previous observation that to bring the best out of
LTO, FDO is also needed. (As a reference, LIPO improves over plain FDO
by ~4.5%, vortex improves 23%.)  You will probably see an even smaller
improvement in SPEC2006.

  
Thanks for the comments.  FDO will probably improve the SPEC2000 score,
although it is not obvious for some tests because their train data sets
are different from the reference data sets and might actually mislead
the compiler.

FDO is important for optimizations where all possible data sets do not
change the branch probability distribution much.  IMHO that is why FDO
is not widely used by most developers (although I am sure that for
Google applications it is extremely important), and therefore I don't
measure it and it is not so interesting for me.  Although a bigger
reason not to use FDO is that it is inconvenient for the regular
compiler user.

As for the vortex FDO improvement, vortex contains a moderate-size loop
in which most of the time is spent.  The loop has an if-then-else at
the top loop level.  On all SPEC2000 data sets, one if-branch is taken
practically always (like 1 to 1,000,000).  So it is not surprising to
me that FDO gives such an improvement for vortex.

It would be great if there is number collected comparing LTO + FDO vs
plain FDO in the same setup.

  
Usually after posting such comparisons I get a lot of requests.  I'd
like to do all of them, but unfortunately the runs and the result
preparation take a lot of my time.  Maybe I'll do such a comparison
next year.




Re: going from SunOS 5/SparcWorks -> Linux/gcc

2010-04-29 Thread Paweł Sikora
On Thursday 29 April 2010 20:35:23 Steven Bosscher wrote:

> The standard 1st questions are:
> 1) Did you compile with -Wall -Wextra and solve all warnings?
> 2) Did you try with -fno-strict-aliasing?

For legacy code, '-fwrapv' could be helpful.


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Xinliang David Li
>>
>
> Thanks for the comments.  FDO will probably improve the SPEC2000 score,
> although it is not obvious for some tests because their train data sets
> are different from the reference data sets and might actually mislead
> the compiler.
>
> FDO is important for optimizations where all possible data sets do not
> change the branch probability distribution much.  IMHO that is why FDO
> is not widely used by most developers (although I am sure that for
> Google applications it is extremely important), and therefore I don't
> measure it and it is not so interesting for me.  Although a bigger
> reason not to use FDO is that it is inconvenient for the regular
> compiler user.
>
> As for the vortex FDO improvement, vortex contains a moderate-size loop
> in which most of the time is spent.  The loop has an if-then-else at
> the top loop level.  On all SPEC2000 data sets, one if-branch is taken
> practically always (like 1 to 1,000,000).  So it is not surprising to
> me that FDO gives such an improvement for vortex.

Actually what I was trying to say is that LTO will be more powerful
when combined with FDO. In other words, I expect LTO + FDO to improve
over plain FDO by more than 1.86%.


>>
>> It would be great if there is number collected comparing LTO + FDO vs
>> plain FDO in the same setup.
>>
>>
>
> Usually after posting such comparisons I get a lot of requests.  I'd
> like to do all of them, but unfortunately the runs and the result
> preparation take a lot of my time.  Maybe I'll do such a comparison
> next year.

Ok. Another comment is that using SPEC2000 for performance testing
won't be indicative of today's real world program size. Even
SPEC2006's largest C++ programs are not that big.

Thanks,

David

>
>


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Vladimir Makarov

Xinliang David Li wrote:

Thanks for the comments.  FDO will probably improve the SPEC2000 score,
although it is not obvious for some tests because their train data sets
are different from the reference data sets and might actually mislead
the compiler.

FDO is important for optimizations where all possible data sets do not
change the branch probability distribution much.  IMHO that is why FDO
is not widely used by most developers (although I am sure that for
Google applications it is extremely important), and therefore I don't
measure it and it is not so interesting for me.  Although a bigger
reason not to use FDO is that it is inconvenient for the regular
compiler user.

As for the vortex FDO improvement, vortex contains a moderate-size loop
in which most of the time is spent.  The loop has an if-then-else at
the top loop level.  On all SPEC2000 data sets, one if-branch is taken
practically always (like 1 to 1,000,000).  So it is not surprising to
me that FDO gives such an improvement for vortex.



Actually what I was trying to say is that LTO will be more powerful
when combined with FDO. In other words, I expect LTO + FDO to improve
over plain FDO by more than 1.86%.


  

It would be great if there is number collected comparing LTO + FDO vs
plain FDO in the same setup.


  

Usually after posting such comparisons I get a lot of requests.  I'd
like to do all of them, but unfortunately the runs and the result
preparation take a lot of my time.  Maybe I'll do such a comparison
next year.



Ok. Another comment is that using SPEC2000 for performance testing
won't be indicative of today's real world program size. Even
SPEC2006's largest C++ programs are not that big.


  
It is very subjective what counts as today's real-world program size.
Usually it reflects what you are working on.  I understand that Google
applications are huge and their speed is important for saving money (or
energy) for their employees.  Firefox is big enough, but for a regular
desktop user a 1% improvement may be invisible or unimportant if it is
already fast enough.

A math-physics program can be small but its speed may be really
important because it takes hours or days on a fast machine.  Even big
and intensively used applications like a logistics system can have
small program parts (e.g. an ILP solver, or compression algorithms like
gzip for speeding up Internet communication) whose optimization matters
most for the application, and SPEC contains such calculation-intensive
code (a lot of NP-complete task solvers and math-physics programs).  So
I would not say that using SPEC for performance testing is unimportant
for improving today's real-world-size programs.  Of course it is not as
important as testing the program you are working on.  In other words,
that program is the most important benchmark for you, but probably not
for others.

As for me, GCC itself is a very important program and SPEC contains it
(an old version in SPEC2000 and a more recent one in SPEC2006).  So
SPEC is pretty important and good for me (not perfect, of course, at
least because it is not free), although it is not the only benchmark I
care about.





Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Xinliang David Li
Point well put. The benchmark suite should have a good mixture of
programs with different sizes. SPEC2k programs cluster at the lower
end of the spectrum, though.

David

On Thu, Apr 29, 2010 at 12:43 PM, Vladimir Makarov  wrote:
> Xinliang David Li wrote:
>>>
>>> Thanks for the comments.  FDO will probably improve the SPEC2000 score,
>>> although it is not obvious for some tests because their train data sets
>>> are different from the reference data sets and might actually mislead
>>> the compiler.
>>>
>>> FDO is important for optimizations where all possible data sets do not
>>> change the branch probability distribution much.  IMHO that is why FDO
>>> is not widely used by most developers (although I am sure that for
>>> Google applications it is extremely important), and therefore I don't
>>> measure it and it is not so interesting for me.  Although a bigger
>>> reason not to use FDO is that it is inconvenient for the regular
>>> compiler user.
>>>
>>> As for the vortex FDO improvement, vortex contains a moderate-size loop
>>> in which most of the time is spent.  The loop has an if-then-else at
>>> the top loop level.  On all SPEC2000 data sets, one if-branch is taken
>>> practically always (like 1 to 1,000,000).  So it is not surprising to
>>> me that FDO gives such an improvement for vortex.
>>>
>>
>> Actually what I was trying to say is that LTO will be more powerful
>> when combined with FDO. In other words, I expect LTO + FDO improves
>> over plain FDO more than 1.86%.
>>
>>
>>

 It would be great if there is number collected comparing LTO + FDO vs
 plain FDO in the same setup.



>>>
>>> Usually after posting such comparisons I get a lot of requests.  I'd
>>> like to do all of them, but unfortunately the runs and the result
>>> preparation take a lot of my time.  Maybe I'll do such a comparison
>>> next year.
>>>
>>
>> Ok. Another comment is that using SPEC2000 for performance testing
>> won't be indicative of today's real world program size. Even
>> SPEC2006's largest C++ programs are not that big.
>>
>>
>>
>
> It is very subjective what counts as today's real-world program size.
> Usually it reflects what you are working on.  I understand that Google
> applications are huge and their speed is important for saving money (or
> energy) for their employees.  Firefox is big enough, but for a regular
> desktop user a 1% improvement may be invisible or unimportant if it is
> already fast enough.
>
> A math-physics program can be small but its speed may be really
> important because it takes hours or days on a fast machine.  Even big
> and intensively used applications like a logistics system can have
> small program parts (e.g. an ILP solver, or compression algorithms like
> gzip for speeding up Internet communication) whose optimization matters
> most for the application, and SPEC contains such calculation-intensive
> code (a lot of NP-complete task solvers and math-physics programs).  So
> I would not say that using SPEC for performance testing is unimportant
> for improving today's real-world-size programs.  Of course it is not as
> important as testing the program you are working on.  In other words,
> that program is the most important benchmark for you, but probably not
> for others.
>
> As for me, GCC itself is a very important program and SPEC contains it
> (an old version in SPEC2000 and a more recent one in SPEC2006).  So
> SPEC is pretty important and good for me (not perfect, of course, at
> least because it is not free), although it is not the only benchmark I
> care about.
>
>
>


ARM Neon Tests Failing on non-Neon Target

2010-04-29 Thread Joel Sherrill

Hi,

I am looking at the arm-rtems test results for the
trunk and noticing that most failures appear to be
for neon tests.

[j...@rtbf64a gcc]$ grep ^FAIL gcc.log.sent  | wc -l
2203
[j...@rtbf64a gcc]$ grep ^FAIL gcc.log.sent | grep /neon/  | wc -l
1986

http://gcc.gnu.org/ml/gcc-testresults/2010-04/msg02780.html

I see that there is an arm_neon_ok check in lib/target-supports.exp
but it only checks that neon code compiles.  It will compile but the
target can't run it.

Any ideas other than a "no_neon" setting in the board file?

--
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherr...@oarcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985




Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Jan Hubicka
> Thanks for the comments.  FDO will probably improve SPEC2000 score.   
> Although it is not obvious for some tests because the train data sets  
> for them are different from the reference data sets and it might  
> actually mislead the  compiler.

There are several studies on the topic, and it is not that bad in practice.
In the vast majority of cases even pretty bad training runs get a
significant portion of the improvement you can get from training on the
final benchmark data.  In SPEC's case FDO improves pretty much all
benchmarks.

I think FDO is relatively little used because it is relatively hard
to use (i.e. the user has to modify makefiles and learn how the feature
works) and also because there is very little support for it (e.g. in
automake and such).
> As for vortex FDO improvement, vortex contains a moderate size loop in  
> which most of time is spent.  The loop has if-then-else on the top loop  
> level.  On all SPEC2000 data sets, one if-branch  is  taken practically  
> always  (like 1 to  1,000,000).   So it is not amazing for me that FDO  
> gives such improvement for vortex.

It would be interesting to know if the same improvement happens with LTO
and, if not, what LIPO does.  I will unbreak vortex on our tester.

Honza
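[Editor's note: the makefile surgery Honza refers to is roughly a double build with a training run in between. A hypothetical Makefile sketch, using GCC's documented FDO flags; target and file names are invented:]

```make
# Hypothetical two-stage FDO build (GCC 4.5-era flags).
CFLAGS = -O2

fdo:
	$(MAKE) clean && $(MAKE) CFLAGS="$(CFLAGS) -fprofile-generate" app
	./app < train.input          # training run writes .gcda profile files
	$(MAKE) clean && $(MAKE) CFLAGS="$(CFLAGS) -fprofile-use" app
```

Both compile and link steps need the profile flags, which is exactly the kind of build-system change that makes FDO inconvenient without automake support.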


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Jan Hubicka

BTW we are also tracking SPEC2k6 with and without LTO (not FDO runs)
http://gcc.opensuse.org/SPEC/CINT/sb-barbella.suse.de-ai-64/recent.html
http://gcc.opensuse.org/SPEC/CINT/sb-barbella.suse.de-head-64-2006/recent.html

Not all 2k6 tests pass with LTO, so it will need a bit of care to compare results.

Honza
> 
> Honza


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Xinliang David Li
On Thu, Apr 29, 2010 at 2:27 PM, Jan Hubicka  wrote:
>> Thanks for the comments.  FDO will probably improve SPEC2000 score.
>> Although it is not obvious for some tests because the train data sets
>> for them are different from the reference data sets and it might
>> actually mislead the  compiler.
>
> There are several studies on the topic and it is not that bad in practice.
> In the vast majority of cases even pretty bad training runs get a significant
> portion of the improvement you can get from training on the final benchmark
> data.  In SPEC case FDO improves pretty much all benchmarks.

Agree.

>
> I think the FDO is relatively little used because it is relatively hard
> to use (i.e. user has to modify makefiles and learn how the feature works)
> and also because there is very little support for it (i.e. in automake and 
> such)
>> As for vortex FDO improvement, vortex contains a moderate size loop in
>> which most of time is spent.  The loop has if-then-else on the top loop
>> level.  On all SPEC2000 data sets, one if-branch  is  taken practically
>> always  (like 1 to  1,000,000).   So it is not amazing for me that FDO
>> gives such improvement for vortex.
>
> It would be interesting to know if the same improvement happens with LTO
> and, if not, what LIPO does.  I will unbreak vortex on our tester.
>

Vortex needs -fno-strict-aliasing.  It casts between two record types
with one record being a 'prefix' of another.

David



> Honza
>


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Xinliang David Li
I noticed eon's peak options do not include FDO, is that intended?

David


On Thu, Apr 29, 2010 at 2:27 PM, Jan Hubicka  wrote:
>> Thanks for the comments.  FDO will probably improve SPEC2000 score.
>> Although it is not obvious for some tests because the train data sets
>> for them are different from the reference data sets and it might
>> actually mislead the  compiler.
>
> There are several studies on the topic and it is not that bad in practice.
> In the vast majority of cases even pretty bad training runs get a significant
> portion of the improvement you can get from training on the final benchmark
> data.  In the SPEC case FDO improves pretty much all benchmarks.
>
> I think FDO is relatively little used because it is relatively hard
> to use (i.e. the user has to modify makefiles and learn how the feature
> works) and also because there is very little support for it (i.e. in
> automake and such).
>> As for vortex's FDO improvement, vortex contains a moderate-size loop in
>> which most of the time is spent.  The loop has an if-then-else at the top
>> loop level.  On all SPEC2000 data sets, one branch is taken practically
>> always (like 1 to 1,000,000).  So it is not surprising to me that FDO
>> gives such an improvement for vortex.
>
> It would be interesting to know if same improvement happens with LTO and if
> not what LIPO does.  I will unbreak vortex on our tester.
>
> Honza
>


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Steven Bosscher
On Thu, Apr 29, 2010 at 11:27 PM, Jan Hubicka  wrote:
> It would be interesting to know if same improvement happens with LTO and if
> not what LIPO does.  I will unbreak vortex on our tester.

Perhaps you can add a LIPO tester? It looks like a very interesting
and promising approach.

Ciao!
Steven


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Jan Hubicka
> I noticed eon's peak options do not include FDO, is that intended?
I think it is just a bug in the page header, but I will double-check.
Base and peak should match otherwise.

Honza


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Xinliang David Li
Thanks for the suggestion. Raksit currently is busy with merging trunk
changes back to lw-ipo branch which can be a daunting task. After that
this can be done.  (Our internal release is based on 4.4).

David

On Thu, Apr 29, 2010 at 2:38 PM, Steven Bosscher  wrote:
> On Thu, Apr 29, 2010 at 11:27 PM, Jan Hubicka  wrote:
>> It would be interesting to know if same improvement happens with LTO and if
>> not what LIPO does.  I will unbreak vortex on our tester.
>
> Perhaps you can add a LIPO tester? It looks like a very interesting
> and promising approach.
>
> Ciao!
> Steven
>


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Jack Howarth
On Thu, Apr 29, 2010 at 12:25:15PM -0400, Vladimir Makarov wrote:

>
>  Currently Graphite gives small improvements on x86 (one exception is
> 2% for peak x86 SPECFP2000) and mostly degradation on x86_64 (with a
> maximum of more than 10% for SPECFP2000, because of big degradations
> on mgrid and swim).  So further work is needed on the project, because
> it does not seem mature yet.
>

Vladimir,
Keep in mind that -fgraphite-identity currently still causes
vectorization opportunities to be missed. Once that is fixed,
the higher-level graphite optimizations may look a lot better.
 Jack


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Jan Hubicka
> Thanks for the suggestion. Raksit currently is busy with merging trunk
> changes back to lw-ipo branch which can be a daunting task. After that
> this can be done.  (Our internal release is based on 4.4).

I must say that LIPO is something I have always intended to look into but
haven't seriously found time for yet (well, I hope that submitting my thesis
will make this easier).
What are LIPO's features that are missing from -flto -fprofile-use?

Honza
> 
> David
> 
> On Thu, Apr 29, 2010 at 2:38 PM, Steven Bosscher  
> wrote:
> > On Thu, Apr 29, 2010 at 11:27 PM, Jan Hubicka  wrote:
> >> It would be interesting to know if same improvement happens with LTO and if
> >> not what LIPO does.  I will unbreak vortex on our tester.
> >
> > Perhaps you can add a LIPO tester? It looks like a very interesting
> > and promising approach.
> >
> > Ciao!
> > Steven
> >


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Steven Bosscher
2010/4/30 Jan Hubicka :
>> Thanks for the suggestion. Raksit currently is busy with merging trunk
>> changes back to lw-ipo branch which can be a daunting task. After that
>> this can be done.  (Our internal release is based on 4.4).
>
> I must say that LIPO is something I have always intended to look into but
> haven't seriously found time for yet (well, I hope that submitting my
> thesis will make this easier).
> What are LIPO's features that are missing from -flto -fprofile-use?

LIPO is a completely different approach, basically independent of LTO.
There is a good explanation of it on the wiki, see
http://gcc.gnu.org/wiki/LightweightIpo.

Ciao!
Steven


gcc-4.5-20100429 is now available

2010-04-29 Thread gccadmin
Snapshot gcc-4.5-20100429 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20100429/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch 
revision 158911

You'll find:

gcc-4.5-20100429.tar.bz2  Complete GCC (includes all of below)

gcc-core-4.5-20100429.tar.bz2 C front end and core compiler

gcc-ada-4.5-20100429.tar.bz2  Ada front end and runtime

gcc-fortran-4.5-20100429.tar.bz2  Fortran front end and runtime

gcc-g++-4.5-20100429.tar.bz2  C++ front end and runtime

gcc-java-4.5-20100429.tar.bz2 Java front end and runtime

gcc-objc-4.5-20100429.tar.bz2 Objective-C front end and runtime

gcc-testsuite-4.5-20100429.tar.bz2  The GCC testsuite

Diffs from 4.5-20100422 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Jan Hubicka
> 2010/4/30 Jan Hubicka :
> >> Thanks for the suggestion. Raksit currently is busy with merging trunk
> >> changes back to lw-ipo branch which can be a daunting task. After that
> >> this can be done.  (Our internal release is based on 4.4).
> >
> > I must say that LIPO is something I have always intended to look into but
> > haven't seriously found time for yet (well, I hope that submitting my
> > thesis will make this easier).
> > What are LIPO's features that are missing from -flto -fprofile-use?
> 
> LIPO is a completely different approach, basically independent of LTO.
> There is a good explanation of it on the wiki, see
> http://gcc.gnu.org/wiki/LightweightIpo.

Yep, I read that page (and saw some of the implementation too).  I just was
not able to follow the precise feature set of LIPO (i.e. if it gets better
SPEC results than LTO+FDO, then why).

Honza
> 
> Ciao!
> Steven


Re: ARM Neon Tests Failing on non-Neon Target

2010-04-29 Thread Joseph S. Myers
On Thu, 29 Apr 2010, Joel Sherrill wrote:

> Hi,
> 
> I am looking at the arm-rtems test results for the
> trunk and noticing that most failures appear to be
> for neon tests.
> 
> [j...@rtbf64a gcc]$ grep ^FAIL gcc.log.sent  | wc -l
> 2203
> [j...@rtbf64a gcc]$ grep ^FAIL gcc.log.sent | grep /neon/  | wc -l
> 1986
> 
> http://gcc.gnu.org/ml/gcc-testresults/2010-04/msg02780.html
> 
> I see that there is an arm_neon_ok check in lib/target-supports.exp
> but it only checks that neon code compiles.  It will compile but the
> target can't run it.

The vast bulk of NEON tests are compilation-only tests, not execution
tests.  Since your failures are generally "test for excess errors" and
"internal compiler error", that indicates compilation failures that have
nothing to do with whether your target can run the code; you need to
investigate the actual errors seen.

It looks like arm-rtems may not be an EABI target.  NEON has probably not 
been tested much if at all for non-EABI targets.

Vectorization tests will already test whether NEON code can be executed on 
the target before trying to execute it.  The arm_neon_hw effective-target 
serves that purpose for tests that genuinely need to execute NEON code.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Steven Bosscher
2010/4/30 Jan Hubicka :
> Yep, I read that page (and saw some of the implementation too).  I just was
> not able to follow the precise feature set of LIPO (i.e. if it gets better
> SPEC results than LTO+FDO, then why).

OK, that's an interesting question. The first question (if...) is
something you'll have to try yourself, I suppose :-)

BTW will the CGO presentation about LIPO and sampled FDO be published
somewhere in the open?

Ciao!
Steven


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Xinliang David Li
On Thu, Apr 29, 2010 at 4:03 PM, Jan Hubicka  wrote:
>> 2010/4/30 Jan Hubicka :
>> >> Thanks for the suggestion. Raksit currently is busy with merging trunk
>> >> changes back to lw-ipo branch which can be a daunting task. After that
>> >> this can be done.  (Our internal release is based on 4.4).
>> >
>> > I must say that LIPO is something I have always intended to look into but
>> > haven't seriously found time for yet (well, I hope that submitting my
>> > thesis will make this easier).
>> > What are LIPO's features that are missing from -flto -fprofile-use?
>>
>> LIPO is a completely different approach, basically independent of LTO.
>> There is a good explanation of it on the wiki, see
>> http://gcc.gnu.org/wiki/LightweightIpo.
>
> Yep, I read that page (and saw some of the implementation too).  I just was
> not able to follow the precise feature set of LIPO (i.e. if it gets better
> SPEC results than LTO+FDO, then why).
>

In theory, LIPO should not generate better results than LTO+FDO.  What
makes LIPO attractive is that it allows distributed builds from the
beginning.  Its integration with large distributed build systems is also
easy.  Another point is that LIPO can be decoupled from FDO as well.
The reason is that cross-module call clusters do not change that much
and can be determined statically, or determined once using sample
profiling information.  The grouping info can then be used for regular
O2 builds.  This will remove the need for people to move functions into
header files, which tends to penalize compile time unnecessarily.

If there is a performance difference, the following things unique to
LIPO may contribute to it (I have not validated them):

1) LIPO supports tracking indirect call targets across modules.  This
is not feasible for plain FDO, as there would be cgraph pid conflicts.
LIPO uses a unique function id == (module_id << 32) + func_def_no, which
makes it possible.
2) comdat function resolution -- since LIPO uses aux module functions
for inlining purposes only, it has the freedom to choose which copy to
use.  The current scheme prefers the copy in the current module, for
better profile-data context sensitivity (see below).
3) in the profile-gen phase, allow more inlining of comdat functions (in
einline2 and ipa-inline) -- this causes profile data to be tracked with
module sensitivity (note that the counters are not in a comdat group).
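The unique-id scheme in point 1 can be sketched as follows (a hypothetical
illustration of the idea; the function names are made up, not the actual
LIPO implementation):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: pack a 32-bit module id and the per-module function
   definition number into one 64-bit id, so function ids never
   collide across modules.  */
static inline uint64_t
lipo_global_func_id (uint32_t module_id, uint32_t func_def_no)
{
  return ((uint64_t) module_id << 32) + func_def_no;
}

/* Recover the module id from the high 32 bits.  */
static inline uint32_t
lipo_module_of (uint64_t id)
{
  return (uint32_t) (id >> 32);
}

/* Recover the per-module function number from the low 32 bits.  */
static inline uint32_t
lipo_func_of (uint64_t id)
{
  return (uint32_t) id;
}
```

Because the module id occupies the high half, function number 1 of module
0 and function number 0 of module 1 map to distinct ids.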

Thanks,

David



> Honza
>>
>> Ciao!
>> Steven
>


Parallelized loads and widening mults cont:ed (was: Re: GCC porting tutorials)

2010-04-29 Thread Hans-Peter Nilsson
> Date: Thu, 29 Apr 2010 08:55:56 +0200 (CEST)
> From: "Jonas Paulsson" 

> It feels good to know that the widening mults issue has been
> resolved

Yes, nice, and as late as last week too, though the patch was
from February.

> as
> it was a bit of a disappointment when I noted the erratic behaviour with
> GCC 4.4.1. Perhaps you would care to comment on what to expect as a user
> now, then?

IIUC, it should Just Work.  No, I haven't checked.  Note that
the fix was somewhat along the lines of what you wrote in your
thesis IIUC; adding a specific pass to fix up separated
operations.  See
 and
.  BTW,
my observation was from the 4.3 era.  It's a regression, which
explains why I hadn't noticed it with the 3.x version I used
before that.  A pity it was deemed too invasive to fix for 4.5.

> Another issue that gave me porting problems was SIMD memory accesses,
> e.g. doing a wide load into two adjacent narrow registers with one
> instruction. This was resolved earlier on the mailing list as not being
> handleable in RTL, so I wonder now if anything has been done about this,
> as it too seems rather reasonable, just like the widening loads?

You wanted to load adjacent data in a wider mode that was then
to be separately used in a mode half that size, but the
registers had to be adjacent too?  That's kind of the opposite
problem to what's usually needed!  If the use of the data was
actually for the obvious wider mode (SI or V2HI), you'd just
have to define the movsi or movv2hi pattern and it would be
used, but that unfortunately seems not applicable in any way.
I'm not sure that problem is of common interest I'm afraid, but
if it can be resolved with a target-specific pass, there'd be
reason to add a hook somewhat like
TARGET_MACHINE_DEPENDENT_REORG, but earlier.

But, did you check whether combine tried to match RTL that
looked somewhat like:

(parallel
 [(set (reg:HI 1) (mem:HI (plus:SI (reg:SI 3) (const_int 2))))
  (set (reg:HI 2) (mem:HI (plus:SI (reg:SI 3) (const_int 4))))])

I.e. a parallel with the two loads where the addresses were
adjacent?  From gdb you can inspect the calls to try_combine (IIRC).
That insn could have been matched to a pattern like:

(define_insn "*load_wide"
 [(set (match_operand:HI 0 "register_operand" "=d0,d1,d2")
   (match_operand:HI 1 "reg_plus_const_memory_operand" "m,m,m"))
  (set (match_operand:HI 2 "register_operand" "=d1,d2,d3")
   (match_operand:HI 3 "reg_plus_const_memory_operand" "m,m,m"))]
 "rtx_equal_p (XEXP (operands[3], 0),
   plus_constant (XEXP (operands[1], 0), 2))"
 "load_wide %0,%1")

Just a WAG, there are reasons this would not match in the
general case (for one, you'd want to try to match the opposite
order too).  Don't pay too much attention to the exact matching
predicates, constraints and condition above.  The point is just
whether combine tried to generate and match a parallel with two
valid loads, given source where there was obvious opportunity
for it.

That insn *could* then be caught with a pattern which would,
through the right constraints, coerce register allocation to make
the right choices for the (initially separate) registers.  In
the example above, four registers are assumed to be valid as
the destination, with the matching singleton constraints d0..d3.

brgds, H-P


Re: GIMPLE Front End (GSOC 2010)

2010-04-29 Thread Sandeep Soni
On Thu, Apr 29, 2010 at 11:01 PM, Manuel López-Ibáñez
 wrote:
> On 29 April 2010 19:25, Sandeep Soni  wrote:
>> I added the following page to the wiki.
>>
>> http://gcc.gnu.org/wiki/GimpleFrontEnd
>>
>> Any comments/suggestions or ideas related to the project are welcome.
>
> Hi Sandy,
>
> It may be helpful to take a look to wiki pages of previous SoC
> projects, such as
> http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings, for
> formatting/structure ideas.
>

Indeed. The above-mentioned page is extremely well detailed.

I will be adding the granular details about the project as I work on
it during this period and keep the wiki page updated, so I will
incorporate the structuring present on that page.

Thanks indeed for the help.

> Cheers,
>
> Manuel.
>



-- 
Cheers
Sandy


split lui_movf pattern on mips?

2010-04-29 Thread Amker.Cheng
Hi,
   There is a comment on lui_movf in mips.md like the following:

;; because we don't split it.  FIXME: we should split instead.

I can split it into a move and a condmove (movesi_on_cc) insn, like:

(define_split
 [(set (match_operand:CC 0 "d_operand" "")
   (match_operand:CC 1 "fcc_reload_operand" ""))]
 "reload_completed && ISA_HAS_8CC && TARGET_HARD_FLOAT && ISA_HAS_CONDMOVE
  && !CANNOT_CHANGE_MODE_CLASS (CCmode, SImode,
                                REGNO_REG_CLASS (REGNO (operands[0])))"
 [(set (match_dup 2) (match_dup 3))
  (set (match_dup 2)
   (if_then_else:SI
  (eq:SI (match_dup 1)
 (match_dup 4))
  (match_dup 2)
  (match_dup 4)))]
 "
 {
   operands[2] = gen_rtx_REG(SImode, REGNO(operands[0]));
   operands[3] = GEN_INT(0x3f80);
   operands[4] = const0_rtx;
 }
 ")

But I have two questions.

Firstly, the lui_movf pattern is output as
"lui\t%0,0x3f80\n\tmovf\t%0,%.,%1" in mips_output_move;
why 0x3f80?  Is it some kind of magic number, or something else important?

Secondly, I have to change the mode of operands[0] to SImode when
splitting, otherwise there is no insn pattern matching the first
generated insn.  Since no new REG is generated I assume the mode
change is OK here; any suggestions?

Thanks.
-- 
Best Regards.