Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Rafael Espindola
> Is there a specific reason you don't use the LLVM LTO interface?  It seems
> to be roughly the same as your proposed interface:
>
> a) it has a simple C interface like your proposed one
> b) it is already implemented in one system linker (Apple's), so GCC would
> just provide its own linker plugin and it would work on apple platforms
> c) it is richer than your interface
> d) it is battle tested, and exists today
> e) it is completely independent of llvm (by design)
> f) it is fully documented: http://llvm.org/docs/LinkTimeOptimization.html
>
> Is there something specific you don't like about the LLVM interface?


We are still discussing how we are going to implement this, so the API is
not yet final. Some things that have been pointed out:

*) Plugins could have other uses, and the naming used in the LLVM LTO
interface is LTO-specific.
*) We have a normal symbol table in the .o files. It is not clear
whether we should assume that this will always be the case. If so, we
don't need the part of the API that handles that.
*) How do you handle the case of multiple symbols with the same name
(say a weak and a strong one)? lto_codegen_add_must_preserve_symbol
has a char* argument. How does it know which symbol we are talking
about?
*) To save memory, one option is to have the plugin exec WPA and WPA
exec the linker again with the new objects. In this case the API
should be a bit different.


> -Chris
>


Cheers,
-- 
Rafael Avila de Espindola

Google Ireland Ltd.
Gordon House
Barrow Street
Dublin 4
Ireland

Registered in Dublin, Ireland
Registration Number: 368047


Re: why 6Gb RAM not enough to compile a 14Mb source [MELT]?

2008-06-04 Thread Richard Guenther
On Wed, Jun 4, 2008 at 8:31 AM, Basile STARYNKEVITCH
<[EMAIL PROTECTED]> wrote:
> Hello All,
>
> my MELT branch http://gcc.gnu.org/wiki/MiddleEndLispTranslator has a big
> source file in it warm-basilys-0.c. It is "self" generated, about 14Mbytes &
> almost 280KLOC (in rev136334). It ends with a big initialization routine of
> 100KLOC which mostly fills a 5000 member structure (each member being itself
> a small structure) and calls a few routines. This initialization routine has
> a simple control structure (no deeply nested blocks or loops).
>
> But gcc (either gcc-4.1 or 4.2 or 4.3 from Debian, or the bootstrapped trunk
> rev136331) can compile this file without any optimisation, i.e. with -O0 -g3,
> in about 16 seconds and less than 1Gb RAM.
>
> But on my 6 Gbytes machine (Core2, 2400MHz, Debian/Sid/AMD64) the cc1
> process with -O2 (either 4.2, 4.3 or the trunk) eats nearly 10Gb of virtual
> memory and thrashes (using 4.8Gb of RAM, 1% cpu time, waiting for the swap
> IO). The same happens with -O1. -Os is a bit better.
>
> The time to run the
> ./built-melt-cc-script warm-basilys-0.c warm-basilys-0.so
> which compiles warm-basilys-0.c with -O2 -fPIC is
>
> (you can set the MELT_EXTRACFLAGS environment variable to pass
>
> real    84m23.594s
> user    6m23.496s
> sys     1m5.032s
>
> I am attaching the -ftime-report output for information. One of the most
> demanding passes is the tree operand scan.
>
> I find this report misleading on the memory consumption total (1591718kB =
> 1.6Gb). The top command shows that cc1 needs nearly 10Gb of process space,
> and uses nearly 5Gb (and thrashes).
>
> I won't be annoyed by this for long, since I'll soon split the
> warm-basilys.bysl file (and hence the generated files) into several distinct
> files. Until then, -O0 is enough for me.
>
> Are there any specific flags to pass to gcc to lower the RAM consumption
> (even at the expense of generated code quality)?
>
> Are there any pragmas to disable (or lower) optimisation of a single
> routine?
>
> My intuition (and experience) is that gcc -O2 (or even -O1) time and space
> consumption is nearly quadratic in the size of the longest routine.
>
> Thanks for reading.

If it does structure initialization you can try --param
max-fields-for-field-sensitive=0 --param max-aliased-vops=0
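
For example, something along these lines (a sketch; apart from the two
--param settings above, the flags are just the -O2 -fPIC compilation
Basile describes):

  gcc -O2 -fPIC --param max-fields-for-field-sensitive=0 \
      --param max-aliased-vops=0 -c warm-basilys-0.c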

Otherwise, can you file a bug report and attach the testcase there?
(bonus points if you have one that doesn't max out at 10GB but
maybe 2GB ;))

Thanks,
Richard.

>
> --
> Basile STARYNKEVITCH http://starynkevitch.net/Basile/
> email: basilestarynkevitchnet mobile: +33 6 8501 2359
> 8, rue de la Faiencerie, 92340 Bourg La Reine, France
> *** opinions {are only mines, sont seulement les miennes} ***
>
>
> Execution times (seconds)
>  garbage collection          :   7.16 ( 2%) usr   0.45 ( 1%) sys  47.16 ( 1%) wall       0 kB ( 0%) ggc
>  callgraph construction      :  16.83 ( 4%) usr   0.10 ( 0%) sys  16.87 ( 0%) wall   41478 kB ( 3%) ggc
>  callgraph optimization      :   9.82 ( 3%) usr   0.11 ( 0%) sys   9.95 ( 0%) wall    9184 kB ( 1%) ggc
>  ipa reference               :   0.25 ( 0%) usr   0.02 ( 0%) sys   0.26 ( 0%) wall      52 kB ( 0%) ggc
>  ipa pure const              :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
>  cfg cleanup                 :   2.76 ( 1%) usr   0.03 ( 0%) sys   2.91 ( 0%) wall    5120 kB ( 0%) ggc
>  CFG verifier                :  11.22 ( 3%) usr   0.69 ( 1%) sys 177.08 ( 3%) wall       0 kB ( 0%) ggc
>  trivially dead code         :   0.75 ( 0%) usr   0.00 ( 0%) sys   0.80 ( 0%) wall       0 kB ( 0%) ggc
>  df reaching defs            :   3.01 ( 1%) usr   0.49 ( 1%) sys  34.85 ( 1%) wall       0 kB ( 0%) ggc
>  df live regs                :   3.46 ( 1%) usr   0.06 ( 0%) sys   3.57 ( 0%) wall       0 kB ( 0%) ggc
>  df live&initialized regs    :   2.12 ( 1%) usr   0.00 ( 0%) sys   2.16 ( 0%) wall       0 kB ( 0%) ggc
>  df use-def / def-use chains :   1.61 ( 0%) usr   0.02 ( 0%) sys   1.75 ( 0%) wall       0 kB ( 0%) ggc
>  df reg dead/unused notes    :   1.07 ( 0%) usr   0.04 ( 0%) sys   1.10 ( 0%) wall   15075 kB ( 1%) ggc
>  register information        :   0.51 ( 0%) usr   0.01 ( 0%) sys   0.45 ( 0%) wall       0 kB ( 0%) ggc
>  alias analysis              :   1.05 ( 0%) usr   0.01 ( 0%) sys   0.91 ( 0%) wall   19781 kB ( 1%) ggc
>  register scan               :   0.25 ( 0%) usr   0.01 ( 0%) sys   0.23 ( 0%) wall     163 kB ( 0%) ggc
>  rebuild jump labels         :   0.53 ( 0%) usr   0.00 ( 0%) sys   0.53 ( 0%) wall       0 kB ( 0%) ggc
>  preprocessing               :   1.24 ( 0%) usr   0.56 ( 1%) sys   1.93 ( 0%) wall   46597 kB ( 3%) ggc
>  lexical analysis            :   0.30 ( 0%) usr   0.81 ( 1%) sys   1.29 ( 0%) wall       0 kB ( 0%) ggc
>  parser                      :   1.70 ( 0%) usr   0.49 ( 1%) sys   2.24 ( 0%) wall  123365 kB ( 8%) ggc
>  inline heuristics           :   0.63 ( 0%) usr   0.01 ( 0%) sys   0.62 ( 0%) wall    5491 kB ( 0%) ggc
>  integration                 :   2.11 ( 1%) usr   0.22 ( 0%) sys   2.25 ( 0%) wall  168932 kB (11%) ggc
>  tre

MULTILIB_OSDIRNAMES trouble

2008-06-04 Thread Andreas Krebbel
Hello,

I'm experimenting with a hardware dfp libgcc_s for S/390.  The goal
is to have an additional variant of that lib built using the multilib
machinery.

Regarding this I'm wondering how to set MULTILIB_OSDIRNAMES correctly.
This variable is usually set in a makefile fragment in the back end
directory to tell GCC about the location of the correct libc and its
crt files - right?!  Besides this it is also used for the target
location of libgcc_s.so.  Unfortunately in my case I think I need two
distinct directories here.

Whether libgcc_s is compiled with hw dfp or not does not affect the
ABI.  So I don't need - and don't want - an additional variant of libc
for dfp but I want the new libgcc_s variant to be installed in a
separate subdirectory.  As I've written above, setting
MULTILIB_OSDIRNAMES affects both :( If I set MULTILIB_OSDIRNAMES to
"../lib64 ../lib dfp", libgcc_s would get installed correctly, but
the link step of the 64-bit version fails, since no crt files can be
found under /usr/lib64/dfp, which makes ld default to /usr/lib/, which
does not contain the 64-bit crt files.

If I don't set MULTILIB_OSDIRNAMES for dfp, just leaving it as
"../lib64 ../lib", the build and link steps work fine, but the resulting
hardware and software dfp variants of libgcc_s are all installed in the
lib64 and lib dirs, overwriting each other.

Does anybody have a hint on how to solve this?  Perhaps some spec file
magic using the SPECS variable in the makefile fragment?

Bye,

-Andreas-


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Diego Novillo
On Tue, Jun 3, 2008 at 22:26, Chris Lattner <[EMAIL PROTECTED]> wrote:

> and whopr here.  Is LTO the mode "normal people" will use, and whopr the
> mode that "people with huge clusters" will use?  Will LTO/whopr support
> useful optimization on common multicore machines?

As Ollie said, WHOPR is just an extension of the LTO framework to
cater for scalability when building large applications.  As such, when
building large applications we expect not to be able to apply IPA
passes that rely on having the whole program callgraph and bodies
loaded in memory.

However, WHOPR does not limit IPA passes to summary-only.  That's why
you see the distinction between IPA_PASS and SIMPLE_IPA_PASS in the
pass manager.

> Are you focusing on inlining here as a specific example, or is this the only
> planned IPA optimization that can use summaries?  It seems unfortunate to

No.  Just the first pass that we are going to concentrate on for the
initial implementation.

>> == Design Philosophy ==
>> * The implementation provides complete transparency. Developers
>> should be able to take advantage of LTO without having to modify
>> existing build systems and/or Makefiles, all that's needed is to add
>> an LTO option (-flto).
>
> Ok.  How do you handle merging of optimization info?   If I build one .o
> file with -Os and one with -O3 who wins or what does this mean?  If I build
> one with -ffast-math and one without, does the right thing happen?

Right now, mixed optimization flags will likely cause trouble.  We
have not really talked about this issue in detail.  I expect many/most
of these issues will be orthogonal to the driver, though.  We've
talked a bit about different ways of encoding the options into the IR,
but there is nothing concrete yet.  It's in my list of things to
discuss at the next BoF.

> Also, where does debug info (i.e. DWARF for -g) get stored?  I'm not talking
> about people debugging the compiler, I'm talking about people who want to
> build an executable with debug info.

In the .o file.  We are generating regular .o files (for now).

> Is there a specific reason you don't use the LLVM LTO interface?  It seems
> to be roughly the same as your proposed interface:

Not really.  This is mostly the first iteration.  Rafael and Robert
will be able to tell you much more about this.  I'm not directly
working on this aspect.


Diego.


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Ian Lance Taylor
Chris Lattner <[EMAIL PROTECTED]> writes:

> Is there a specific reason you don't use the LLVM LTO interface?  It
> seems to be roughly the same as your proposed interface:
>
> a) it has a simple C interface like your proposed one
> b) it is already implemented in one system linker (Apple's), so GCC
> would just provide its own linker plugin and it would work on apple
> platforms
> c) it is richer than your interface
> d) it is battle tested, and exists today
> e) it is completely independent of llvm (by design)
> f) it is fully documented: http://llvm.org/docs/LinkTimeOptimization.html
>
> Is there something specific you don't like about the LLVM interface?

(I didn't design the proposed linker interface, and I'm not sure my
earlier comments were included in the proposal sent to the list.  I'm
going to reply to that next.)

When I look at the LLVM interface as described on that web page, I see
these issues, all fixable:

* No support for symbol versioning.
* The return value of lto_module_get_symbol_attributes is not
  defined.
* lto_codegen_set_debug_model and lto_codegen_set_pic_model appear to
  be underspecified--don't they need an additional parameter?
* Interfaces like lto_module_get_symbol_name and
  lto_codegen_add_must_preserve_symbol are inefficient when dealing
  with large symbol tables.

A more general problem is that the main reason I see to use a linker
plugin is to let the linker handle symbol resolution.  The LLVM
interface does not do that.  Suppose the linker is invoked on a
sequence of object files, some with LTO information, some
without, all interspersed.  Suppose some symbols are defined in
multiple .o files, through the use of common symbols, weak symbols,
and/or section groups.  The LLVM interface simply passes each object
file to the plugin.  The result is that the plugin is required to do
symbol resolution itself.  This 1) loses one of the benefits of having
the linker around; 2) will yield incorrect results when some non-LTO
object is linked in between LTO objects but redefines some earlier
weak symbol.

Also, returning a single object file restricts the possibilities.  The
design of WHOPR, as I understand it, permits creating several
different object files in parallel based on a fast analysis of which
code should be compiled together.  When the linker supports concurrent
linking, it will be desirable to be able to provide it with each
object file as it is completed.

Ian


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Ian Lance Taylor
Diego Novillo <[EMAIL PROTECTED]> writes:

I have a feeling that the comments I wrote within Google about the
linker interface were lost.  I am going to try to recreate them here.


> The linker, upon start, examines a configuration file at a known
> location relative to its own location. If this file exists, it
> extracts the location of linker plugins (shared libraries) and loads
> those.  A fixed set of function interfaces needs to be implemented in
> the plugin, these functions are described below. One of many possible
> plugins is a plugin that controls LTO.
>
> Another way to locate a plugin would be via command-line.  This would
> make it easier for two different compilers (and therefore two
> different plugins) to use the same linker.

I think the plugin should always be specified on the command line, and
the linker should never search for it.  The plugin is inherently a
property of the compiler, not the linker.  We already expect that the
linker will always be invoked via the gcc driver program.  It is
trivial for the driver program to pass an option specifying the plugin
or the plugin directory.


> The linker performs regular symbol resolution. For each object file it
> touches, it calls a specific function in the plugin (int
> ldplugin_claim_file(const char *fname, size_t offset)). This
> function returns 1 if it intends to claim a file (e.g. it contains
> IR), and 0 if it doesn't.   The offset is used in the case of an
> archive file. This way the plugin doesn't need to understand archives.

There should be an interface to pass a pointer to the contents of the
file rather than the filename.  Otherwise each file has to be opened
twice, which is pointless.
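
As a sketch, keeping the existing hook name but changing its arguments
(nothing here is a settled API):

  /* The linker passes the bytes it has already mapped -- possibly an
     archive member's -- so the plugin never reopens the file.  */
  int ldplugin_claim_file (const char *name,  /* for diagnostics only */
                           const void *data,  /* file or member contents */
                           size_t size);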


> The linker also creates a list of all externally referenced symbols
> and passes these to the plugin via the function
> ldplugin_add_external_symbol(const char *mangled_name).
>
> '''TODO''': Would it be better to pass an abstract object to
> ldplugin_add_external_symbol? What should we pass to it if there are
> two symbols in IL files with the same name?  One strong and one weak
> for example.

"Externally referenced" is a bad term.  I think that is meant here is
"referenced by some part of the program which the plugin did not
claim".

There needs to be a way to specify the symbol version.

The interface should not require a separate function call for each
symbol.  This is inefficient.  Some executables have hundreds of
thousands of symbols.  There should be a way to pass a list of
symbols.

More seriously, this interface is much too simple.  In the general
case, for each input file, we need to specify the exact disposition of
each symbol.  If we don't provide a way for the linker to communicate
that to the plugin, then the plugin is forced to do symbol resolution
itself.  That is what we want to get away from.

My assumption is that the symbol table in an LTO object is fully
correct: it correctly reports weak symbols, section groups, etc.
Given that, the linker should be determining the symbol resolution.
For each defined symbol in the symbol table, the linker should say
whether that symbol should be included in the link.  For each
undefined symbol, the linker should say where the definition of that
symbol may be found--it could be in an LTO file or a non-LTO file.
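
As a sketch of the kind of per-symbol answer meant here (all names are
hypothetical, not a proposal):

  /* Reported by the linker to the plugin for each symbol in a claimed
     file, after the linker has performed normal symbol resolution.  */
  enum ldplugin_resolution
  {
    LDPLUGIN_DEF_KEPT,            /* this file's definition is the one used  */
    LDPLUGIN_DEF_PREEMPTED,       /* overridden by a definition elsewhere    */
    LDPLUGIN_REF_RESOLVED_IR,     /* reference satisfied by another LTO file */
    LDPLUGIN_REF_RESOLVED_NATIVE  /* reference satisfied by a non-LTO file   */
  };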


> At this point, the linker calls the main entry point to the plugin
> (ldplugin_main(int argc, char *argv[])), passing its own arguments.
> It's the plugin's responsibility to extract its related {{{-Wx,...}}}
> values.

This does not make sense.  The linker options are complex and varied.
We do not want to require the plugin to understand how to parse them.
We need to define a different approach for sending options to the
plugin.


> '''TODO''': How do we handle symbols defined in more than one file?
> Should ldplugin_add_external_symbol take an abstract pointer/index into
> the linker symbol table?

Yes, this is required.


> '''TODO''': What is passed to ldplugin_claim_file if the file is in a
> .a file?

We should pass a buffer, not a file name.


> '''TODO''': Are we assuming that the files with IL contain a
> normal symbol table? Should we make it possible for the plugin to call
> back into the linker to add symbols? This would make it possible to
> support a "full custom" file format for the IL files.

If the LTO files do not contain a normal symbol table, then the plugin
will have to provide one for the linker.  The symbol table provided by
the plugin will have to include symbol names and versions, weak
vs. strong, defined vs. common vs. undefined, symbol visibility,
symbol type, section group information.
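
Concretely, such a plugin-provided symbol table could carry one record
like this per symbol (a hypothetical sketch covering the fields listed
above, not a proposed API):

  struct ldplugin_symbol_info
  {
    const char *name;
    const char *version;       /* NULL if unversioned */
    int is_weak;               /* weak vs. strong binding */
    int definition;            /* defined vs. common vs. undefined */
    int visibility;
    int type;                  /* function, object, ... */
    const char *comdat_group;  /* section group signature, or NULL */
  };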


> == Final Link - ld ==
> After all real object files have been generated, these files, along
> with the rest of the originally passed real object files, need to be
> passed to the linker. There are a few ways to do this:
>
>  * Call a plugin / linker interface which allows to explicitly add
>  files to the linker's internal data structures. '''TODO''': Unclear

Re: How to reserve an Elf e_machine value

2008-06-04 Thread Michael Meissner
On Tue, Jun 03, 2008 at 08:46:44AM -0700, Stephen Andieta wrote:
> 
> I am working on a compiler kit for an in-house processor that uses Elf as
> object file format. Since this compiler will be released to external
> customers, I need to reserve an 'official' e_machine value for this
> processor. Somehow I am unable to find out how to reserve such a value. How
> should I do this?
> Thanks, Stephen.

This is a binutils problem, not a GCC one.

The problem is that the company that assigns the official numbers (SCO) is
rapidly spinning out of control, and may not be responsive any more.  When I
registered EM_MEP in 2003, the address used was [EMAIL PROTECTED]

If you can't get an official number, there is this comment in elf/common.h:

/* If it is necessary to assign new unofficial EM_* values, please pick large
   random numbers (0x8523, 0xa7f2, etc.) to minimize the chances of collision
   with official or non-GNU unofficial values.

   NOTE: Do not just increment the most recent number by one.
   Somebody else somewhere will do exactly the same thing, and you
   will have a collision.  Instead, pick a random number.

   Normally, each entity or maintainer responsible for a machine with an
   unofficial e_machine number should eventually ask [EMAIL PROTECTED] for
   an officially blessed number to be added to the list above.  */
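
In the spirit of that comment, a hypothetical unofficial definition would
look like this (EM_MYCPU is made up; pick your own random value rather
than reusing one of the comment's examples):

  /* Unofficial, randomly chosen; not "most recent official value + 1".  */
  #define EM_MYCPU 0x8523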

-- 
Michael Meissner, AMD
90 Central Street, MS 83-29, Boxborough, MA, 01719, USA
[EMAIL PROTECTED]



Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Diego Novillo
On Wed, Jun 4, 2008 at 10:44, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:

> I have a feeling that the comments I wrote within Google about the
> linker interface were lost.  I am going to try to recreate them here.

Sorry.  I should've been more careful when I transcribed it over.


Diego.


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Kenneth Zadeck

Diego Novillo wrote:

> On Tue, Jun 3, 2008 at 22:26, Chris Lattner <[EMAIL PROTECTED]> wrote:
>
>> and whopr here.  Is LTO the mode "normal people" will use, and whopr the
>> mode that "people with huge clusters" will use?  Will LTO/whopr support
>> useful optimization on common multicore machines?
>
> As Ollie said, WHOPR is just an extension of the LTO framework to
> cater for scalability when building large applications.  As such, when
> building large applications we expect not to be able to apply IPA
> passes that rely on having the whole program callgraph and bodies
> loaded in memory.
>
> However, WHOPR does not limit IPA passes to summary-only.  That's why
> you see the distinction between IPA_PASS and SIMPLE_IPA_PASS in the
> pass manager.
>
>> Are you focusing on inlining here as a specific example, or is this the only
>> planned IPA optimization that can use summaries?  It seems unfortunate to
>
> No.  Just the first pass that we are going to concentrate on for the
> initial implementation.


I think that one thing that the gcc community should understand is that
to a great extent whopr is a google thing.  All of the documents are
drafted by google people, in meetings that are only open to google
people, and it is only after these documents have been drafted that the
people outside of google who are working on lto, like Honza and
myself, see the documents and get to comment.  The gcc community never
sees the constraints, deadlines, needs, or benchmarks that are
motivating the decisions that are made in the whopr documents.


Honza and I plan, and are implementing, a system where most, but
probably not all, of the ipa passes will be able to work in an
environment where the entire call graph and all of the decls and types
are available, i.e. only the function bodies are missing.  In this
environment, we plan to do all of the interprocedural analysis and
generate work orders that will be applied to each function.

In a distributed environment, these "work orders" can then be streamed 
out to the machines that are actually going to read the function bodies 
and compile them. 

It is certainly not going to be possible to do this for all ipa passes;
in particular, any pass that requires the function body to be reanalyzed
as part of the analysis pass will not be done, or will be degraded so
that it does not use this mechanism.  But for a large number of passes
this will work.


How this scales to google sized applications will have to be seen.  The
point is that there is a rich space with a complex set of tradeoffs to be
explored with lto.  The decision to farm off the function bodies to
other processors because we "cannot" have all of the function bodies in
memory will have a dramatic effect on what gcc/lto/whopr compilation
will be able to achieve.  We did not make this decision just because gcc
is fat; we made it because we wanted to be able to compile larger
programs that could not fit into memory even if we did go on a real diet.

However, in other lto systems, like IBM's and (I believe) LLVM, where the
link time compilation is done with everything in memory, you can do a
lot more transformation, because you can iterate and propagate
information discovered from improvements in one function to another.
IBM seems to sell 64 processor machines with up to 28tb of memory.  I
do not know whether they can compile all of db2 at one time on this box;
the last time I talked to them, a year ago, they could not (or at least
did not).  But they are able to do several rounds that consist of global
analysis and local analysis/transformation.  This is certainly the way
to squeeze out everything that static compilation has to offer.  However,
it is unlikely that many in the gcc community are going to have this
kind of horsepower available (balrog is a toy compared to one of these
monsters).


The bet (guess) that we are making in gcc is that doing weaker analysis
over a larger context is going to win.  In the initial whopr
proposal/implementation, this is taken to the extreme, to say that
inlining is the only ipa transformation, but it is going to be applied to
the entire code base of some monster app.  The rest of the gcc community
may not see the need to go here, and in fact I would guess (an
uninformed guess from an outsider) that even google will not need this
for all of their apps either.  In particular, as consumer machines get
larger memories and more processors, the assumption that we cannot see
all of the function bodies gets more questionable, especially for
modest sized apps that are the staple of the gcc community.


In particular, Google may be willing to compile the "entire" app, even
sucking in the code from shared libraries, if it provides any benefit.
Users in the gcc community will most likely rarely go there, since it
makes the process of doing updates almost impossible.

There is also a rich set of choices that need to be made to 

Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Rafael Espindola
2008/6/4 Ian Lance Taylor <[EMAIL PROTECTED]>:
> Diego Novillo <[EMAIL PROTECTED]> writes:
>
> I have a feeling that the comments I wrote within Google about the
> linker interface were lost.  I am going to try to recreate them here.

I have added them to the gcc wiki.

I have also removed some of the TODOs that are now obsolete (passing
all of the linker options to the plugin, passing only the symbol name).

I created an abstract type ldplugin_symbol_t. We need to define some
inline functions that the plugin can use to extract data from it.

Thanks a lot,
-- 
Rafael Avila de Espindola

Google Ireland Ltd.
Gordon House
Barrow Street
Dublin 4
Ireland

Registered in Dublin, Ireland
Registration Number: 368047


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Ian Lance Taylor
Kenneth Zadeck <[EMAIL PROTECTED]> writes:

> I think that one thing that the gcc community should understand is
> that to a great extent whopr is a google thing.  All of the documents
> are drafted by google people, in meetings that are only open to google
> people, and it is only after these documents have been drafted that the
> people outside of google who are working on lto, like Honza
> and myself, see the documents and get to comment.  The gcc community
> never sees the constraints, deadlines, needs, or benchmarks that are
> motivating the decisions that are made in the whopr documents.

Every new gcc development starts that way.  Somebody has to put
together the initial proposal.  How many people were invited to work
on the initial LTO proposal before it was sent out?  Did anybody
outside of Red Hat see the tree-ssa proposal before it was sent out?

The WHOPR document has been out there for some time, and it was sent
out before any implementation work started.  There is no Google cabal
pushing it.  There is no secret information behind it, no constraints
or deadlines or benchmarks.  We did have the advantage of talking to
Google employees about their experience with LTO-style work done at
Intel and HP and Transmeta.  Some of the people we talked to have no
plans or interest in working on gcc, and it would not be fair to rope
them into the conversation further.  Google's needs are clear: we have
large programs.

Let's deal with these issues on the technical merits, not on
organizational issues.  If Google were dumping code on gcc, you would
have a legitimate complaint.  Here Google is proposing plans before
any work is started.  You seem to be complaining that the community
should have seen the plans at an earlier stage.  That makes no sense.
They are still just plans, they were based on all of two days of
meetings and discussions, and they are still completely open to
discussion and change.


> Honza and I plan, and are implementing, a system where most, but
> probably not all, of the ipa passes will be able to work in an
> environment where the entire call graph and all of the decls and types
> are available, i.e. only the function bodies are missing.  In this
> environment, we plan to do all of the interprocedural analysis and
> generate work orders that will be applied to each function.

I don't see that as being opposed to the WHOPR ideas.  It's not like
WHOPR will prohibit that approach.  It's a limiting case.


> In particular, as consumer
> machines get larger memories and more processors, the assumption that
> we cannot see all of the function bodies gets more questionable,
> especially for modest sized apps that are the staple of the gcc
> community.

I question that assumption, and I especially question any assumption
that gcc should only work for modest sized apps.

Ian


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Chris Lattner


On Jun 4, 2008, at 8:27 AM, Kenneth Zadeck wrote:

> It is certainly not going to be possible to do this for all ipa
> passes, in particular any pass that requires the function body to be
> reanalyzed as part of the analysis pass will not be done, or will be
> degraded so that it does not use this mechanism.  But for a large
> number of passes this will work.
>
> How this scales to google sized applications will have to be seen.
> The point is that there is a rich space with a complex set of tradeoffs
> to be explored with lto.  The decision to farm off the function
> bodies to other processors because we "cannot" have all of the
> function bodies in memory will have a dramatic effect on what
> gcc/lto/whopr compilation will be able to achieve.


I agree with a lot of the sentiment that you express here, Kenny.  In
LLVM, we've intentionally taken a very incremental approach:

1) start with all code in memory and see how far you can get.  It
seems that on reasonable developer machines (e.g. 2GB memory) we
can handle C programs on the order of a million lines of code, or C++
code on the order of 400K lines of code without a problem with LLVM.


2) start leaving function bodies on disk, use lazy accesses, and a
cache manager to keep things in memory when needed.  I think this will
let us scale to tens or hundreds of millions of lines of code.  I
see no reason to take a whopr approach just to be able to handle large
programs.


Independent of program size is the efficiency of LTO.  To me, allowing
lto to scale and work well on 2- to 16-way shared memory machines is the
first interesting order of business, just because that is what many
developers have on their desks.  Once that issue is nailed, going
across a cluster is an interesting next step.


In the world I deal with, most code is built out of a large number of
moderate sized libraries/plugins, not as a gigantic monolithic a.out
file.  I admit that this shifts the emphasis we've been placing onto
making things integration-transparent, supporting LTO across code
bases with pieces missing, etc., and away from support for ridiculously
huge code bases.


I guess one difference between the LLVM and GCC approaches stems from
the "constant factor" order-of-magnitude efficiency difference
between llvm and gcc.  If you can't reasonably hold a few hundred
thousand lines of code in memory, then you need more advanced
techniques in order to be generally usable for moderate-sized code
bases.


-Chris


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Chris Lattner

On Jun 4, 2008, at 7:22 AM, Ian Lance Taylor wrote:

> Chris Lattner <[EMAIL PROTECTED]> writes:
>
>> Is there a specific reason you don't use the LLVM LTO interface?  It
>> seems to be roughly the same as your proposed interface:
>>
>> a) it has a simple C interface like your proposed one
>> b) it is already implemented in one system linker (Apple's), so GCC
>> would just provide its own linker plugin and it would work on apple
>> platforms
>> c) it is richer than your interface
>> d) it is battle tested, and exists today
>> e) it is completely independent of llvm (by design)
>> f) it is fully documented: http://llvm.org/docs/LinkTimeOptimization.html
>>
>> Is there something specific you don't like about the LLVM interface?
>
> (I didn't design the proposed linker interface, and I'm not sure my
> earlier comments were included in the proposal sent to the list.  I'm
> going to reply to that next.)
>
> When I look at the LLVM interface as described on that web page, I see
> these issues, all fixable:
> * No support for symbol versioning.


Very true.  I think it would be great to work from a common model that
can be extended to support both compilers.  Having a unified interface
would be very useful, and we are happy to evolve the interface to suit
more general needs.



> * The return value of lto_module_get_symbol_attributes is not
>   defined.


Ah, sorry about that.  Most of the details are actually in the public  
header.  The result of this function is a 'lto_symbol_attributes'  
bitmask.  This should be more useful and revealing:

http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/lto.h?revision=HEAD&view=markup


> * lto_codegen_set_debug_model and lto_codegen_set_pic_model appear to
>   be underspecified--don't they need an additional parameter?


These are actually likely to change.  We are currently working on
extending the model to better handle the case when translation units
are compiled with different flags.  I expect this to subsume the debug
and pic handling, which are pretty ad-hoc right now.  There should be
a proposal going out to llvmdev in the next few days on this.



> * Interfaces like lto_module_get_symbol_name and
>   lto_codegen_add_must_preserve_symbol are inefficient when dealing
>   with large symbol tables.


The intended model is for the linker to query the LTO plugin for its  
symbol list and build up its own linker-specific hash table.  This way  
you don't need to force the linker to use the plugin's data structure  
or the plugin to use the linker data structure.  We converged on this  
approach after trying it the other way.


Does this make sense, do you have a better idea?
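
For reference, the linker side of that model looks roughly like this
against the lto.h linked above (a sketch; record_symbol is a stand-in
for whatever linker-specific hash insertion is used):

  #include <llvm-c/lto.h>

  extern void record_symbol (const char *name, lto_symbol_attributes attrs);

  static void
  scan_lto_module (const char *path)
  {
    lto_module_t mod = lto_module_create (path);
    unsigned int i, n;
    if (mod == NULL)
      return;
    /* Walk the module's symbol table once; the linker keeps its own copy.  */
    n = lto_module_get_num_symbols (mod);
    for (i = 0; i < n; i++)
      record_symbol (lto_module_get_symbol_name (mod, i),
                     lto_module_get_symbol_attributes (mod, i));
    lto_module_dispose (mod);
  }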


> A more general problem is that the main reason I see to use a linker
> plugin is to let the linker handle symbol resolution.


There is that, but also it lets the linker handle things like export
maps, visibility, strange platform-specific options, etc.  As you
know, linkers are very complex :)



> The LLVM interface does not do that.


Yes it does, the linker fully handles symbol resolution in our model.


> Suppose the linker is invoked on a
> sequence of object files, some with LTO information, some
> without, all interspersed.  Suppose some symbols are defined in
> multiple .o files, through the use of common symbols, weak symbols,
> and/or section groups.  The LLVM interface simply passes each object
> file to the plugin.


No, the native linker handles all the native .o files.


> The result is that the plugin is required to do
> symbol resolution itself.  This 1) loses one of the benefits of having
> the linker around; 2) will yield incorrect results when some non-LTO
> object is linked in between LTO objects but redefines some earlier
> weak symbol.


In the LLVM LTO model, the plugin only needs to know about its .o  
files, and the linker uses this information to reason about symbol  
merging etc.  The Mac OS X linker can even do dead code stripping  
across Macho .o files and LLVM .bc files.


Further, other pieces of the toolchain (nm, ar, etc.) also use the same
interface, so that they can return useful information about LLVM LTO
files.



> Also, returning a single object file restricts the possibilities.  The
> design of WHOPR, as I understand it, permits creating several
> different object files in parallel based on a fast analysis of which
> code should be compiled together.  When the linker supports concurrent
> linking, it will be desirable to be able to provide it with each
> object file as it is completed.


This sounds like a natural and easy extension once whopr gets farther  
along.


This is our second major revision of the LTO interfaces, and the  
interface continues to slowly evolve.  I think it would be great to  
work with you guys to extend the design to support GCC's needs.


-Chris



Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Mark Mitchell

Diego Novillo wrote:

> We've started working on the driver and WPA components for whopr.
> These are some of our initial thoughts and implementation strategy.  I
> have linked these to the WHOPR page as well.  I'm hoping we can
> discuss these at the Summit BoF, so I'm posting them now to start the
> discussion.
>
> == Repackaging ==
> Under this proposal, WPA repackages its input files.


FWIW, I'd suggest going this way.  I agree that this is probably the way 
to go in the long term, and avoiding the throw-away stage seems beneficial.


--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Chris Lattner


On Jun 4, 2008, at 12:27 AM, Rafael Espindola wrote:

>> Is there a specific reason you don't use the LLVM LTO interface?  It seems
>> to be roughly the same as your proposed interface:
>>
>> a) it has a simple C interface like your proposed one
>> b) it is already implemented in one system linker (Apple's), so GCC would
>> just provide its own linker plugin and it would work on apple platforms
>> c) it is richer than your interface
>> d) it is battle tested, and exists today
>> e) it is completely independent of llvm (by design)
>> f) it is fully documented: http://llvm.org/docs/LinkTimeOptimization.html
>>
>> Is there something specific you don't like about the LLVM interface?
>
> We are still discussing how we are going to implement this, so the API is
> still not final. Some things that have been pointed out:


Hey Rafael!


> *) Plugins could have other uses and the naming used on the LLVM LTO
> interface is LTO specific.


The LLVM interface uses an lto_ prefix.  This interface is used by
nm/ar/etc as well as the linker.  Is there something specific about
lto_ that is bad?

http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/lto.h?revision=HEAD&view=markup


> *) We have a normal symbol table on the .o files. It is not clear if
> we should assume that it will always be the case. If so, we don't need
> the API part that handles that.


This seems like a pretty minor point, but it would be easy to either:

1) make this an optional interface
2) make the plugin implement the symtab interfaces, but query the ELF  
symbol table instead of the LTO symbol table if possible.



> *) How do you handle the case of multiple symbols with the same name
> (say a weak and a strong one)? lto_codegen_add_must_preserve_symbol
> has a char* argument. How does it know which symbol we are talking
> about?


The lto_symbol_attributes enum specifies linkage.


> *) To save memory, one option is to have the plugin exec WPA and WPA
> exec the linker again with the new objects. In this case the API
> should be a bit different.


That's an interesting idea, but it is very unclear to me whether it  
would save a significant amount of memory.  Operating system VM  
systems are pretty good at paging out data that isn't used (e.g.  
the .o files the linker loaded into memory that exist when WPA is  
going on).


-Chris


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Chris Lattner


On Jun 4, 2008, at 9:29 AM, Chris Lattner wrote:


>> Suppose the linker is invoked on a
>> sequence of object files, some with LTO information, some
>> without, all interspersed.  Suppose some symbols are defined in
>> multiple .o files, through the use of common symbols, weak symbols,
>> and/or section groups.  The LLVM interface simply passes each object
>> file to the plugin.
>
> No, the native linker handles all the native .o files.


Incidentally, this is very easy to verify, as you can download this
today and try it out.  LTO works fine in the Xcode 3.1 beta, which is
available off developer.apple.com, including when you mix and match
LLVM-compiled LTO .o files with GCC-compiled ones.


For example, this works fine and does LTO across a.c/b.cpp/c.m:

llvm-gcc a.c   -O4 -o a.o -c
llvm-g++ b.cpp -O4 -o b.o -c
llvm-gcc c.m   -O4 -o c.o -c
gcc      d.m   -O3 -o d.o -c
g++ a.o b.o c.o d.o -o a.out

-Chris


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Rafael Espindola
> Hey Rafael!

Hello!

>> *) Plugins could have other uses and the naming used on the LLVM LTO
>> interface is LTO specific.
>
> The LLVM interface uses an lto_ prefix.  This interface is used by nm/ar/etc
> as well as the linker.  Is there something specific about lto_ that is bad?
> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/lto.h?revision=HEAD&view=markup

This is a minor comment. All of the design is done based on what LTO
needs, but since it could be possible to use plugins for other things,
the proposed API uses a generic prefix.

>> *) We have a normal symbol table on the .o files. It is not clear if
>> we should assume that it will always be the case. If so, we don't need
>> the API part that handles that.
>
> This seems like a pretty minor point, but it would be easy to either:
>
> 1) make this an optional interface
> 2) make the plugin implement the symtab interfaces, but query the ELF symbol
> table instead of the LTO symbol table if possible.

Sure. There is just the issue of the many function calls. I am not sure
how expensive this is. Maybe have the plugin construct a symbol table
with everything that is in the file?

>> *) How do you handle the case of multiple symbols with the same name
>> (say a weak and a strong one)? lto_codegen_add_must_preserve_symbol
>> has a char* argument. How does it know which symbol we are talking
>> about?
>
> The lto_symbol_attributes enum specifies linkage.

That allows the plugin to pass information to the linker. But if there
are two symbols named "foo", how can the linker instruct the plugin to
generate code for only one? The function that LLVM uses is

lto_codegen_add_must_preserve_symbol(lto_code_gen_t, const char*)

right? Maybe adding an lto_symbol_attributes argument would be enough,
but having an abstract object is probably better.
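
A sketch of the abstract-object variant (the handle type and the
2-suffixed function are hypothetical; lto_module_t and lto_code_gen_t
are the existing lto.h types):

  typedef struct lto_symbol *lto_symbol_t;  /* opaque handle, hypothetical */

  /* One handle per symbol-table entry, so a weak "foo" and a strong
     "foo" stay distinguishable.  */
  lto_symbol_t lto_module_get_symbol (lto_module_t mod, unsigned int index);
  void lto_codegen_add_must_preserve_symbol2 (lto_code_gen_t cg,
                                              lto_symbol_t sym);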

>> *) To save memory, one option is to have the plugin exec WPA and WPA
>> exec the linker again with the new objects. In this case the api
>> should be a bit different.
>
> That's an interesting idea, but it is very unclear to me whether it would
> save a significant amount of memory.  Operating system VM systems are pretty
> good at paging out data that isn't used (e.g. the .o files the linker loaded
> into memory that exist when WPA is going on).

Sure. Again, the document is at an early stage, and we listed most of
the options. Restarting the linker is not a very popular option, but
might be worth trying.

> -Chris
>


Cheers,
-- 
Rafael Avila de Espindola

Google Ireland Ltd.
Gordon House
Barrow Street
Dublin 4
Ireland

Registered in Dublin, Ireland
Registration Number: 368047


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Kenneth Zadeck

Ian Lance Taylor wrote:

> Kenneth Zadeck <[EMAIL PROTECTED]> writes:
>
>> I think that one thing that the gcc community should understand is
>> that to a great extent whopr is a google thing.  All of the documents
>> are drafted by google people, in meetings that are only open to google
>> people, and it is only after these documents have been drafted that the
>> people outside of google who are working on lto, like Honza
>> and myself, see the documents and get to comment.  The gcc community
>> never sees the constraints, deadlines, needs, or benchmarks that are
>> motivating the decisions that are made in the whopr documents.
>
> Every new gcc development starts that way.  Somebody has to put
> together the initial proposal.  How many people were invited to work
> on the initial LTO proposal before it was sent out?  Did anybody
> outside of Red Hat see the tree-ssa proposal before it was sent out?
>
> The WHOPR document has been out there for some time, and it was sent
> out before any implementation work started.  There is no Google cabal
> pushing it.  There is no secret information behind it, no constraints
> or deadlines or benchmarks.  We did have the advantage of talking to
> Google employees about their experience with LTO-style work done at
> Intel and HP and Transmeta.  Some of the people we talked to have no
> plans or interest in working on gcc, and it would not be fair to rope
> them into the conversation further.  Google's needs are clear: we have
> large programs.
>
> Let's deal with these issues on the technical merits, not on
> organizational issues.  If Google were dumping code on gcc, you would
> have a legitimate complaint.  Here Google is proposing plans before
> any work is started.  You seem to be complaining that the community
> should have seen the plans at an earlier stage.  That makes no sense.
> They are still just plans, they were based on all of two days of
> meetings and discussions, and they are still completely open to
> discussion and change.

Ian, I am not dumping on google.  But there is a particular perspective
that you have which is driven by your legitimate need to handle very
large applications.  This perspective may not be shared by the rest of
the gcc community.  I was really only pointing that out.  In
particular, there are a lot of decisions being made in whopr to support
very large applications at the expense of compiling modest and even
large applications.  I do not necessarily disagree with these
decisions, but I think that it is very important to get that out in
front of everyone and let the community come to an informed consensus.

>> Honza and I plan, and are implementing, a system where most, but
>> probably not all, of the ipa passes will be able to work in an
>> environment where the entire call graph and all of the decls and types
>> are available, i.e. only the function bodies are missing.  In this
>> environment, we plan to do all of the interprocedural analysis and
>> generate work orders that will be applied to each function.
>
> I don't see that as being opposed to the WHOPR ideas.  It's not like
> WHOPR will prohibit that approach.  It's a limiting case.
>
>> In particular, as consumer
>> machines get larger memories and more processors, the assumption that
>> we cannot see all of the function bodies gets more questionable,
>> especially for modest sized apps that are the staple of the gcc
>> community.
>
> I question that assumption, and I especially question any assumption
> that gcc should only work for modest sized apps.

Ian, there are tradeoffs here.  My point is that there are a lot of
things that can be done with modest sized apps or libraries that cannot
be done on google sized applications.  Remember that the majority of the
world outside of google neither has google sized applications nor could
compile them if they did.

While I agree that some form of lto needs to support monster apps, that
should not inhibit us from supporting transformations or models of
compilation that are only practical with 100k line programs.




Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Diego Novillo
On Wed, Jun 4, 2008 at 12:50, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:

> While i agree that some form of lto needs to support monster apps, that
> should not inhibit us from supporting transformations or models of
> compilation that are only practical with 100k line programs.

Of course not.  That was never the intent.  Supporting small/medium
sized applications is inherent in the WHOPR design.  If we can't
handle that efficiently, then we have a design/implementation bug.

While we (Google) are mostly interested in summary-based IPA for very
large applications, we do not want the design to negate other uses of
LTO.  WHOPR is designed to support the whole spectrum, from
small/medium sized applications where the whole program fits in
memory, to extremely large applications where memory/computing
requirements are prohibitive for a single machine.

In practice, the full distributed model that WHOPR offers will not
need to be triggered for small applications, only very large ones.
Ideally, we should be able to hide all that behind 'gcc -flto' and let
the compiler decide how to operate.

The natural restriction is that passes of type SMALL_IPA will not be
able to run when the full distributed version is being used.  Again,
this is something I expect the compiler to be able to figure out for
itself.


Diego.


Mirror

2008-06-04 Thread Alex Korolev
Hello,

  Karl Berry (GNU webmaster) asked me to contact you about a new GCC mirror.
  It's up already. Please check http://gcc.releasenotes.org/

  Let me know if you need anything else.

  Thanks.

Alex Korolev
[EMAIL PROTECTED]



Is this a typo in setup_incoming_varargs_64?

2008-06-04 Thread H.J. Lu
Hi,

setup_incoming_varargs_64 in i386.c has

  /* Compute address to jump to :
     label - 5*eax + nnamed_sse_arguments*5  */
  tmp_reg = gen_reg_rtx (Pmode);
  nsse_reg = gen_reg_rtx (Pmode);
  emit_insn (gen_zero_extendqidi2 (nsse_reg, gen_rtx_REG (QImode, AX_REG)));
  emit_insn (gen_rtx_SET (VOIDmode, tmp_reg,
                          gen_rtx_MULT (Pmode, nsse_reg,
                                        GEN_INT (4))));
  if (cum->sse_regno)
    emit_move_insn
      (nsse_reg,
       gen_rtx_CONST (DImode,
                      gen_rtx_PLUS (DImode,
                                    label_ref,
                                    GEN_INT (cum->sse_regno * 4))));
  else
    emit_move_insn (nsse_reg, label_ref);
  emit_insn (gen_subdi3 (nsse_reg, nsse_reg, tmp_reg));

The comments don't match the code.  Should the comments be

  /* Compute address to jump to :
     label - 4*eax + nnamed_sse_arguments*4  */

Thanks.

-- 
H.J.


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Ian Lance Taylor
Kenneth Zadeck <[EMAIL PROTECTED]> writes:

> In particular, there are a lot of decisions that are being made in
> whopr to support very large applications that are done so at the
> expense of compiling modest and even large applications.  I do not
> necessarily disagree with these decisions, but I think that it is very
> important to get that out in front of everyone and let the community
> come to an informed consensus.  

If WHOPR does not work efficiently on small programs, then that is
clearly a problem.  But I don't see that in the design.

Ian


sshproxy.sourceware.org down?

2008-06-04 Thread Uros Bizjak

Hello!

Is there something wrong with the connection to sshproxy.sourceware.org
[1]? The host has been unreachable for a couple of days.


[1] http://gcc.gnu.org/ml/gcc/2005-10/msg00475.html


Uros.


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Devang Patel
> Also, returning a single object file restricts the possibilities.  The
> design of WHOPR, as I understand it, permits creating several
> different object files in parallel based on a fast analysis of which
> code should be compiled together.  When the linker supports concurrent
> linking, it will be desirable to be able to provide it with each
> object file as it is completed.


By definition, an inter-modular optimizer (aka lto) blurs object file
boundaries. Typically, it will construct and walk a combined call graph
instead of dividing work based on input files. It does not add much
value to preserve a one-to-one relationship between optimizer input
files and output files. I agree, it makes sense to have an additional
interface to incrementally feed the linker optimized chunks of code to
take advantage of concurrent linking supported by the linker.


-
Devang





Fwd: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Ollie Wild
Reposting to the gcc list since my first email got bounced.

Ollie



On Tue, Jun 3, 2008 at 7:26 PM, Chris Lattner <[EMAIL PROTECTED]> wrote:

> This is a very interesting design, and a very nice evolution from the
> previous proposal.  I'm not completely clear on the difference between LTO
> and whopr here.  Is LTO the mode "normal people" will use, and whopr the
> mode that "people with huge clusters" will use?  Will LTO/whopr support
> useful optimization on common multicore machines?

WHOPR is just an extension of the original LTO proposal.  It seeks to
augment the LTO design by providing a mechanism for parallelizing the
final (link-time) optimization phase.  The design has been based on a
distcc-like distributed compilation model, so it should be beneficial
even to those with small to moderate sized clusters.  This doesn't
preclude parallelization on multi-core machines (and that has been
discussed to some degree), but I at least have treated that as a
secondary consideration.  A good example of this is in the WPA
discussion below.  On a multicore machine, repackaging doesn't make a
lot of sense because the compiler can efficiently cherry-pick function
bodies from different files.  However, in a distcc compiler farm, the
entirety of a file must be transferred, so this would result in a lot
of excess network overhead.

> Are you focusing on inlining here as a specific example, or is this the only 
> planned IPA optimization that can use summaries?  It seems unfortunate to 
> design a system where inlining is the only real IPO transformation you can 
> do.  Does adding new interprocedural optimizations require adding whole new 
> phases?

The WPA document is a cleaned up transcription of an internal document
I wrote.  During the transcription, some context got lost.  It's not
meant to be a description of a final implementation but rather a
pro/con comparison between two possible draft implementations.  The
goal is to get some basic infrastructure in place so that we can start
experimenting with it and better parallelize additional work.
Inlining is chosen as an initial feature because it's relatively easy
to implement and can be (coarsely) handled without support for
serializing IPA summary information.  Other IPA passes (e.g.
inter-procedural constant propagation) require additional
serialization capabilities (which Kenneth Zadeck is working on now).

Ollie


Fwd: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Ollie Wild
Reposting this as well.

Ollie


On Wed, Jun 4, 2008 at 9:14 AM, Chris Lattner <[EMAIL PROTECTED]> wrote:
>
> 1) start with all code in memory and see how far you can get.  It seems that
> on reasonable developer machines (e.g. 2GB memory) we can handle C
> programs on the order of a million lines of code, or C++ code on the order of
> 400K lines of code without a problem with LLVM.

This is essentially what the lto branch does today, and I don't see
any reason to disable this feature.  In the language of the WHOPR
design, the lto branch supports LGEN + LTRANS, with WPA bypassed
completely.  For implementing WPA, my intention is to add a new flag
(-fpartition or whatever else people think is suitable) to instruct
the lto1 front end to perform partitioning (aka repackaging) of .o
files, execute summary IPA analyses, and kick off a separate LTRANS
phase.

This gives us two modes of operation: one in which all object files
are loaded into memory and optimized using the full array of passes
available to GCC; and one which does some high-level analysis on the
whole program, partitions the program into smaller pieces, and does
more detailed analysis + grunt work on the smaller pieces.

>
> 2) start leaving function bodies on disk, use lazy accesses, and a cache
> manager to keep things in memory when needed.  I think this will let us scale
> to tens or hundreds of million line code bases.  I see no reason to take
> a whopr approach just to be able to handle large programs.

In addition to memory consumption, there is also the question of time
consumption.  Alternative LTO implementations by HP, Intel, and others
follow this model and spend multiple hours optimizing even moderately
large programs.

Ollie


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Ian Lance Taylor
Chris Lattner <[EMAIL PROTECTED]> writes:

>> * The return value of lto_module_get_symbol_attributes is not
>>  defined.
>
> Ah, sorry about that.  Most of the details are actually in the public
> header.  The result of this function is a 'lto_symbol_attributes'
> bitmask.  This should be more useful and revealing:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/lto.h?revision=HEAD&view=markup

From an ELF perspective, this doesn't seem to have a way to indicate a
common symbol, and it doesn't provide the symbol's type.  It also
doesn't have a way to indicate section groups.

(How do section groups work in Mach-O?  Example is a C++ template
function with a static constant array which winds up in the .rodata
section.  Section groups permit discarding the array when we discard
the function code.)


>> * Interfaces like lto_module_get_symbol_name and
>>  lto_codegen_add_must_preserve_symbol are inefficient when dealing
>>  with large symbol tables.
>
> The intended model is for the linker to query the LTO plugin for its
> symbol list and build up its own linker-specific hash table.  This way
> you don't need to force the linker to use the plugin's data structure
> or the plugin to use the linker data structure.  We converged on this
> approach after trying it the other way.
>
> Does this make sense, do you have a better idea?

In gcc's LTO approach, I think the linker will already have access to
the symbol table anyhow.  But my actual point here is that requiring a
function call for every symbol is inefficient.  These functions should
take an array and a count.  There can be hundreds of thousands of
entries in a symbol table, and the interface should scale accordingly.
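
To make that concrete, a bulk variant might be declared something like
this (a sketch only: lto_module_get_symbols and its out-parameters are
made-up names; just the lto_module_t and lto_symbol_attributes types
come from the existing lto.h):

#include "llvm-c/lto.h"

/* Hypothetical bulk query: fill caller-provided arrays with up to
   MAX_COUNT symbol names and attribute bitmasks in a single call,
   returning the number of entries written.  The linker can size the
   arrays from lto_module_get_num_symbols() and then build its own
   hash table in one pass instead of making one call per symbol.  */
extern unsigned int
lto_module_get_symbols (lto_module_t mod,
                        const char **names,
                        lto_symbol_attributes *attrs,
                        unsigned int max_count);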


>> The LLVM
>> interface does not do that.
>
> Yes it does, the linker fully handles symbol resolution in our model.
>
>> Suppose the linker is invoked on a
>> sequence of object files, some with with LTO information, some
>> without, all interspersed.  Suppose some symbols are defined in
>> multiple .o files, through the use of common symbols, weak symbols,
>> and/or section groups.  The LLVM interface simply passes each object
>> file to the plugin.
>
> No, the native linker handles all the native .o files.
>
>>  The result is that the plugin is required to do
>> symbol resolution itself.  This 1) loses one of the benefits of having
>> the linker around; 2) will yield incorrect results when some non-LTO
>> object is linked in between LTO objects but redefines some earlier
>> weak symbol.
>
> In the LLVM LTO model, the plugin only needs to know about its .o
> files, and the linker uses this information to reason about symbol
> merging etc.  The Mac OS X linker can even do dead code stripping
> across Macho .o files and LLVM .bc files.

To be clear, when I said object file here, I meant any input file.
You may have understood that.

In ELF you have to think about symbol overriding.  Let's say you link
a.o b.o c.o.  a.o has a reference to symbol S.  b.o has a strong
definition.  c.o has a weak definition.  a.o and c.o have LTO
information, b.o does not.  ELF requires that a.o call the symbol from
b.o, not the symbol from c.o.  I don't see how to make that work with
the LLVM interface.

This is not a particularly likely example, of course.  People rely on
this sort of symbol overriding quite a bit, but it's unlikely that a.o
and c.o would have LTO information while b.o would not.  However,
given that we are designing an interface, I think we should design it
so that correctness is possible.


> Further other pieces of the toolchain (nm, ar, etc) also use the same
> interface so that they can return useful information about LLVM LTO
> files.

Useful, but as I understand it gcc's LTO files will have that
information anyhow.


> This is our second major revision of the LTO interfaces, and the
> interface continues to slowly evolve.  I think it would be great to
> work with you guys to extend the design to support GCC's needs.

Agreed.

Ian


Re: [lto] Streaming out language-specific DECL/TYPEs

2008-06-04 Thread Mark Mitchell

Jan Hubicka wrote:


Sure, if it works.  We should be lowering the types during
gimplification so we don't need to store all this in memory...
But the C++ FE still uses its local data later in stuff like thunks;
we will need to cgraphize them anyway.


I agree.  The only use of language-specific DECLs and TYPEs after 
gimplification should be for generating debug information.  And if 
that's already been done, then you shouldn't need it at all.


--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Ollie Wild
On Wed, Jun 4, 2008 at 12:33 PM, Chris Lattner <[EMAIL PROTECTED]> wrote:
>
> Right, I understand that you design "stacks" on LTO.  It just seems strange
> to work on the advanced stuff before the basic GCC LTO stuff is close to
> being useful.

To some degree, we're scratching our own itch here.  Basic LTO doesn't
help us much.  Obviously, though, we want to implement this in a way
which is generally useful to the external community.  The scalability
techniques required to work with distcc are different from the
techniques which are useful on a single machine.

> I don't know anything about the HP or Intel LTO
> implementations, but it sounds like there is much room for improvement there.
>  With LLVM LTO, we see a compile-time slowdown on the order of 30-50% switching
> from O3 to O4, not an order of magnitude.  There is also still much room for
> improvement in the LLVM implementation of course.

I think we're working from different baselines.  We use distributed
techniques for compiling individual .o files.  With a tool like
distcc, you can get something on the order of 20x speedup.  Linking
becomes 20% or more of total execution time.  LTO *is* an order of
magnitude increase compared to a basic link operation.

Ollie


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Kenneth Zadeck

Ollie Wild wrote:
> On Wed, Jun 4, 2008 at 9:14 AM, Chris Lattner <[EMAIL PROTECTED]> wrote:
>
>> 1) start with all code in memory and see how far you can get.  It
>> seems that on reasonable developer machines (e.g. 2GB memory) we can
>> handle C programs on the order of a million lines of code, or C++
>> code on the order of 400K lines of code without a problem with LLVM.
>
> This is essentially what the lto branch does today, and I don't see
> any reason to disable this feature.  In the language of the WHOPR
> design, the lto branch supports LGEN + LTRANS, with WPA bypassed
> completely.  For implementing WPA, my intention is to add a new flag
> (-fpartition or whatever else people think is suitable) to instruct
> the lto1 front end to perform partitioning (aka repackaging) of .o
> files, execute summary IPA analyses, and kick off a separate LTRANS
> phase.

This is what lto does today because this was the easiest thing to do
to be able to continue to develop and test the other parts of the
system.  It is stupidly implemented - it required only five lines of
code (two of them being curly braces, according to the gcc coding
standards) - so it allowed us to work on other things.


However, this was not the point of my mail.  The point of my mail was
whopr's design that seems to basically sacrifice almost all
interprocedural analysis and transformation except for inlining in
order to scale so as to be able to compile programs of such size that
most of the gcc community (including the distros) will never see.  I
realize that there is handwaving that, sure, this or that could
possibly be implemented by someone else for programs of modest scale,
but that is not what whopr is all about.

I do not want to imply that google's needs are not real and that they
should not use gcc to fulfill them.  I only want to raise the point
that whopr is at one end of a spectrum in which REAL tradeoffs are
being made in the quality of compilation vs the size of program
handled, and there is a real possibility that being able to handle an
entire program with these tradeoffs is not going to yield the fastest
program or a reasonable compilation time.


kenny


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Ian Lance Taylor
Kenneth Zadeck <[EMAIL PROTECTED]> writes:

> I do not want to imply that google's needs are not real and that they
> should not use gcc to fulfill them.  I only want to raise the point
> that whopr is at one end of a spectrum in which REAL tradeoffs are
> being made in the quality of compilation vs the size of program
> handled, and there is a real possibility that being able to handle an
> entire program with these tradeoffs is not going to yield the fastest
> program or a reasonable compilation time.

What you need to ask is whether WHOPR is going to slow down or prevent
handling an entire program.  If so, then why, and how can we avoid
that?

(Your answer should not be something along the lines of "people will
be working on WHOPR rather than something else."  People will work on
what they find to be important.)

Ian


Re: sshproxy.sourceware.org down?

2008-06-04 Thread Ian Lance Taylor
Uros Bizjak <[EMAIL PROTECTED]> writes:

> Is there something wrong with the connection to
> sshproxy.sourceware.org? The host is unreachable for a couple of
> days.

I had to change the IP address.  It should be working at the new IP
address (64.13.131.149).  You can wait a few more hours, or you can
flush your DNS cache.  Sorry about the inconvenience.

Ian


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Nick Kledzik


On Jun 4, 2008, at 12:44 PM, Ian Lance Taylor wrote:

> Chris Lattner <[EMAIL PROTECTED]> writes:
>
>>> * The return value of lto_module_get_symbol_attributes is not
>>>  defined.
>>
>> Ah, sorry about that.  Most of the details are actually in the public
>> header.  The result of this function is a 'lto_symbol_attributes'
>> bitmask.  This should be more useful and revealing:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/lto.h?revision=HEAD&view=markup
>
> From an ELF perspective, this doesn't seem to have a way to indicate a
> common symbol, and it doesn't provide the symbol's type.

The current lto interface does return whether a symbol is REGULAR,
TENTATIVE, WEAK_DEF, or UNDEFINED.  There is also CODE vs DATA which
could be used to indicate STT_FUNC vs STT_OBJECT.

> It also doesn't have a way to indicate section groups.
>
> (How do section groups work in Mach-O?  Example is a C++ template
> function with a static constant array which winds up in the .rodata
> section.  Section groups permit discarding the array when we discard
> the function code.)

Neither Mach-O nor LLVM has comdat groups.  We just rely on dead code
stripping.

If the template function was coalesced away, there would no longer be
a reference to that static const array, so it would get dead stripped.

Dead stripping is an important pass in LTO.

>>> * Interfaces like lto_module_get_symbol_name and
>>>  lto_codegen_add_must_preserve_symbol are inefficient when dealing
>>>  with large symbol tables.
>>
>> The intended model is for the linker to query the LTO plugin for its
>> symbol list and build up its own linker-specific hash table.  This way
>> you don't need to force the linker to use the plugin's data structure
>> or the plugin to use the linker data structure.  We converged on this
>> approach after trying it the other way.
>>
>> Does this make sense, do you have a better idea?
>
> In gcc's LTO approach, I think the linker will already have access to
> the symbol table anyhow.  But my actual point here is that requiring a
> function call for every symbol is inefficient.  These functions should
> take an array and a count.  There can be hundreds of thousands of
> entries in a symbol table, and the interface should scale accordingly.

I see you have your gold hat on here!  The current interface is simple
and clean.  If it does turn out that repeated calls to
lto_module_get_symbol* are really a bottleneck, we could add a "bulk"
function.

>>> The LLVM
>>> interface does not do that.
>>
>> Yes it does, the linker fully handles symbol resolution in our model.
>>
>>> Suppose the linker is invoked on a
>>> sequence of object files, some with LTO information, some
>>> without, all interspersed.  Suppose some symbols are defined in
>>> multiple .o files, through the use of common symbols, weak symbols,
>>> and/or section groups.  The LLVM interface simply passes each object
>>> file to the plugin.
>>
>> No, the native linker handles all the native .o files.
>>
>>> The result is that the plugin is required to do
>>> symbol resolution itself.  This 1) loses one of the benefits of having
>>> the linker around; 2) will yield incorrect results when some non-LTO
>>> object is linked in between LTO objects but redefines some earlier
>>> weak symbol.
>>
>> In the LLVM LTO model, the plugin only needs to know about its .o
>> files, and the linker uses this information to reason about symbol
>> merging etc.  The Mac OS X linker can even do dead code stripping
>> across Macho .o files and LLVM .bc files.
>
> To be clear, when I said object file here, I meant any input file.
> You may have understood that.
>
> In ELF you have to think about symbol overriding.  Let's say you link
> a.o b.o c.o.  a.o has a reference to symbol S.  b.o has a strong
> definition.  c.o has a weak definition.  a.o and c.o have LTO
> information, b.o does not.  ELF requires that a.o call the symbol from
> b.o, not the symbol from c.o.  I don't see how to make that work with
> the LLVM interface.

This does work.  There are two parts to it.  First the linker's master
symbol table sees the strong definition of S in b.o and the weak in c.o
and decides to use the strong one from b.o.  Second (because of that)
the linker calls lto_codegen_add_must_preserve_symbol("S").  The LTO
engine then sees it has a weak global function S and it cannot inline
those.  Put together, the LTO engine does generate a copy of S, but the
linker throws it away and uses the one from b.o.
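
In code, the flow just described might look roughly like this on the
linker side (a sketch: resolve_in_master_table and its result values
stand in for the linker's own resolution machinery; only the lto.h
calls are from the actual interface):

#include "llvm-c/lto.h"

enum prevailing_def { PREVAILS_IN_LTO, PREVAILS_OUTSIDE_LTO };

/* Stand-in for a lookup in the linker's master symbol table.  */
extern enum prevailing_def resolve_in_master_table (const char *name);

/* Walk a module's symbols; for any symbol whose prevailing definition
   lives outside the LTO sphere (the strong S in b.o above), tell the
   LTO engine it must keep its copy out of line and not inline it.  */
static void
mark_preserved_symbols (lto_code_gen_t cg, lto_module_t mod)
{
  unsigned int i, n = lto_module_get_num_symbols (mod);

  for (i = 0; i < n; i++)
    {
      const char *name = lto_module_get_symbol_name (mod, i);

      if (resolve_in_master_table (name) == PREVAILS_OUTSIDE_LTO)
        lto_codegen_add_must_preserve_symbol (cg, name);
    }
}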

-Nick



Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Ian Lance Taylor
Nick Kledzik <[EMAIL PROTECTED]> writes:

> On Jun 4, 2008, at 12:44 PM, Ian Lance Taylor wrote:
>> Chris Lattner <[EMAIL PROTECTED]> writes:
>>
>>>> * The return value of lto_module_get_symbol_attributes is not
>>>>  defined.
>>>
>>> Ah, sorry about that.  Most of the details are actually in the public
>>> header.  The result of this function is a 'lto_symbol_attributes'
>>> bitmask.  This should be more useful and revealing:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/lto.h?revision=HEAD&view=markup
>>
>> From an ELF perspective, this doesn't seem to have a way to indicate a
>> common symbol, and it doesn't provide the symbol's type.
> The current lto interface does return whether a  symbol is
> REGULAR, TENTATIVE, WEAK_DEF, or UNDEFINED.  There is also
> CODE vs DATA which could be used to indicate STT_FUNC vs STT_OBJECT.

By "type" I mean STT_FUNC or STT_OBJECT.  I took CODE vs. DATA to
refer to the section in which the symbol is defined (SHF_EXECINSTR
vs. SHF_WRITE).  But, you're right, with appropriate squinting CODE
vs. DATA is probably adequate.
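
For what it's worth, the squinting could be spelled out like this (a
sketch; the LTO_SYMBOL_PERMISSIONS_* enumerators follow the lto.h
linked above, though the exact names should be checked against the
header revision in use):

#include <elf.h>
#include "llvm-c/lto.h"

/* Map the permission bits of an lto_symbol_attributes bitmask onto
   ELF symbol types: CODE becomes STT_FUNC, DATA/RODATA become
   STT_OBJECT.  */
static unsigned char
elf_symbol_type (lto_symbol_attributes attrs)
{
  switch (attrs & LTO_SYMBOL_PERMISSIONS_MASK)
    {
    case LTO_SYMBOL_PERMISSIONS_CODE:
      return STT_FUNC;
    case LTO_SYMBOL_PERMISSIONS_DATA:
    case LTO_SYMBOL_PERMISSIONS_RODATA:
      return STT_OBJECT;
    default:
      return STT_NOTYPE;
    }
}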


> I see you have your gold hat on here!  The current interface is
> simple and clean.  If it does turn out that repeated calls to
> lto_module_get_symbol*
> are really a bottleneck, we could add a "bulk" function.

I would like to add the bulk function now, because I know that we will
want it.


>>>> The LLVM
>>>> interface does not do that.
>>>
>>> Yes it does, the linker fully handles symbol resolution in our model.
>>>
>>>> Suppose the linker is invoked on a
>>>> sequence of object files, some with LTO information, some
>>>> without, all interspersed.  Suppose some symbols are defined in
>>>> multiple .o files, through the use of common symbols, weak symbols,
>>>> and/or section groups.  The LLVM interface simply passes each object
>>>> file to the plugin.
>>>
>>> No, the native linker handles all the native .o files.
>>>
>>>> The result is that the plugin is required to do
>>>> symbol resolution itself.  This 1) loses one of the benefits of
>>>> having the linker around; 2) will yield incorrect results when some
>>>> non-LTO object is linked in between LTO objects but redefines some
>>>> earlier weak symbol.
>>>
>>> In the LLVM LTO model, the plugin only needs to know about its .o
>>> files, and the linker uses this information to reason about symbol
>>> merging etc.  The Mac OS X linker can even do dead code stripping
>>> across Macho .o files and LLVM .bc files.
>>
>> To be clear, when I said object file here, I meant any input file.
>> You may have understood that.
>>
>> In ELF you have to think about symbol overriding.  Let's say you link
>> a.o b.o c.o.  a.o has a reference to symbol S.  b.o has a strong
>> definition.  c.o has a weak definition.  a.o and c.o have LTO
>> information, b.o does not.  ELF requires that a.o call the symbol from
>> b.o, not the symbol from c.o.  I don't see how to make that work with
>> the LLVM interface.
> This does work.  There are two parts to it.  First the linker's master
> symbol
> table sees the strong definition of S in b.o and the weak in c.o and
> decides to use the strong one from b.o.  Second (because of that) the
> linker
> calls  lto_codegen_add_must_preserve_symbol("S"). The LTO engine then
> sees it has a weak global function S and it cannot inline those.  Put
> together
> the LTO engine does generate a copy of S, but the linker throws it away
> and uses the one from b.o.

OK, for that case.  But are you asserting that this works in all
cases?  Should I come up with other examples of mixing LTO objects
with non-LTO objects using different types of symbols?

Ian


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Diego Novillo
On Wed, Jun 4, 2008 at 16:03, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:

> However this was not the point of my mail. The point of my mail was whopr's
> design that seems to basically sacrifice almost all interprocedural analysis
> and transformation except for inlining in order to scale so as to be able to
> compile programs of such size that most of the gcc community (including the
> distros) will never see.

There is absolutely nothing in WHOPR's design that sacrifices IPA
transformations.  I have tried to explain this several times, but I
seem to have failed.

I will try one more time.

Suppose that you have a program with a callgraph in the million node
range and no way to hold it in memory.  With the current design, you
either can't run IPA because of memory/computing limitations or you
can start loading and unloading function bodies, types, symbols
on-demand as you go in and out of each node in the callgraph.

WHOPR simply adds another alternative: if you are willing to only run
summary-based transformations, we can split the analysis and
transformation phases in two such that you can parallelize the work
over a cluster or a large SMP.  That's it.  Nothing more.

All the other transformations may still be executed; nothing in the
design prohibits this.  If the program is small enough to fit on one
machine, WHOPR simply runs the way LTO operates today.  The only case
where that can't happen is when you've committed to spreading this out
over a cluster.


> I only want to raise the point that whopr is
> at one end of a spectrum in which REAL tradeoffs are being made in the
> quality of compilation vs the size of program handled, and there is a
> real possibility that being able to handle an entire program with these
> tradeoffs is not going to yield the fastest program or a reasonable
> compilation time.

How is this detrimental to the rest of LTO?  Your point seems moot.
We are simply adding a new feature on top of the basic LTO machinery
that we are also helping to build.  I still don't see what you find so
objectionable about this.


Diego.


Re: [MELT] branch Melt- bootstrapped & questions...

2008-06-04 Thread Tom Tromey
> "Basile" == Basile STARYNKEVITCH <[EMAIL PROTECTED]> writes:

Basile> 1. Should I avoid committing warm-basilys-0.c frequently to
Basile> lower the Subversion server disk consumption?

I don't know the answer to this one.

Basile> 2. I cannot figure out if the GCC bugzilla can be used for work on a
Basile> branch, not only on the trunk (or older releases).

Apparently you can get a branch added as a "version" in bugzilla, then
file bugs against that version.  I see a few branches in there.  I
don't know if this is an ongoing practice or just historical, though.

Basile> 3. Are shared libraries *.la obtained by libtool usable inside
Basile> gcc/Makefile.in?

Not directly, but I suppose you can modify the Makefile to run libtool.

Tom


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Nick Kledzik


On Jun 4, 2008, at 1:45 PM, Ian Lance Taylor wrote:

>>>>> The result is that the plugin is required to do
>>>>> symbol resolution itself.  This 1) loses one of the benefits of
>>>>> having the linker around; 2) will yield incorrect results when some
>>>>> non-LTO object is linked in between LTO objects but redefines some
>>>>> earlier weak symbol.
>>>>
>>>> In the LLVM LTO model, the plugin only needs to know about its .o
>>>> files, and the linker uses this information to reason about symbol
>>>> merging etc.  The Mac OS X linker can even do dead code stripping
>>>> across Macho .o files and LLVM .bc files.
>>>
>>> To be clear, when I said object file here, I meant any input file.
>>> You may have understood that.
>>>
>>> In ELF you have to think about symbol overriding.  Let's say you link
>>> a.o b.o c.o.  a.o has a reference to symbol S.  b.o has a strong
>>> definition.  c.o has a weak definition.  a.o and c.o have LTO
>>> information, b.o does not.  ELF requires that a.o call the symbol from
>>> b.o, not the symbol from c.o.  I don't see how to make that work with
>>> the LLVM interface.
>>
>> This does work.  There are two parts to it.  First the linker's master
>> symbol table sees the strong definition of S in b.o and the weak in
>> c.o and decides to use the strong one from b.o.  Second (because of
>> that) the linker calls lto_codegen_add_must_preserve_symbol("S").  The
>> LTO engine then sees it has a weak global function S and it cannot
>> inline those.  Put together, the LTO engine does generate a copy of S,
>> but the linker throws it away and uses the one from b.o.
>
> OK, for that case.  But are you asserting that this works in all
> cases?  Should I come up with other examples of mixing LTO objects
> with non-LTO objects using different types of symbols?

I don't claim our current implementation is bug free, but the lto
interface matches the Apple linker internal model, so we don't expect
and have not encountered any problems mixing mach-o and llvm bitcode
files.

-Nick



Development process for i386 machine descriptions

2008-06-04 Thread Ty Smith

Hello,
   I'm a new developer to GCC and have been tasked with building a
machine description for an x86 processor.  I have documentation
regarding the pipeline stages, instruction latency, and even a number
of special case optimization possibilities.  I have been adding small
changes to i386.h/i386.c and have created a new machine description
file.  In building the machine description, I have attempted to model
the pipeline as best as I understand it from the documentation given,
but a number of aspects of GCC development elude me because I'm new to
the development process.  Here is a list of questions I have that are
specific to i386 machine descriptions.

1.) The processor_costs structure seems very limited but very easy to
"fill in".  Are these costs supposed to be best or worst case?  For
instance, many instructions vary in latency with different sized
operands.

2.) I don't understand the meaning of the stringop_algs, scalar,
vector, and branching costs at the end of the processor_costs
structure.  Could someone give me an accurate description?

3.) The processor I am currently attempting to model is
single-issue/in-order with a simple pipeline.  Stalls can occasionally
occur in fetch/decode/translate, but the core issue is the latency of
instructions in the functional units in the execute stage.  What
recommendations can anyone make for designing the DFA?  Should it just
directly model the functional units' latencies for certain insn types?


Because I'm new at this, any recommendations or assistance in this vein 
of development would be greatly appreciated.


Thank you,
Ty Smith



gcc-4.2-20080604 is now available

2008-06-04 Thread gccadmin
Snapshot gcc-4.2-20080604 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.2-20080604/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.2 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_2-branch 
revision 136377

You'll find:

gcc-4.2-20080604.tar.bz2  Complete GCC (includes all of below)

gcc-core-4.2-20080604.tar.bz2 C front end and core compiler

gcc-ada-4.2-20080604.tar.bz2  Ada front end and runtime

gcc-fortran-4.2-20080604.tar.bz2  Fortran front end and runtime

gcc-g++-4.2-20080604.tar.bz2  C++ front end and runtime

gcc-java-4.2-20080604.tar.bz2 Java front end and runtime

gcc-objc-4.2-20080604.tar.bz2 Objective-C front end and runtime

gcc-testsuite-4.2-20080604.tar.bz2The GCC testsuite

Diffs from 4.2-20080528 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.2
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Diego Novillo
On Wed, Jun 4, 2008 at 15:33, Chris Lattner <[EMAIL PROTECTED]> wrote:

> Right, I understand that you design "stacks" on LTO.  It just seems strange
> to work on the advanced stuff before the basic GCC LTO stuff is close to
> being useful.

Not at all.  We are working on both fronts.


Diego.


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Ian Lance Taylor
Nick Kledzik <[EMAIL PROTECTED]> writes:

> I don't claim our current implementation is bug free, but the lto
> interface matches the Apple linker internal model, so we don't expect
> and have not encountered any problems mixing mach-o and llvm bitcode
> files.

Hmmm, OK, how about this example:

a.o: contains LTO information, refers to S
b.o: no LTO information, defines S
c.o: contains LTO information, defines S at version V, S/V is not hidden

In the absence of b.o, the reference to S in a.o will be resolved
against the definition of S in c.o.  In the presence of b.o, the
reference to S in a.o will be resolved against the definition of S in
b.o.

I suppose we could refuse to inline versioned symbols, but that
doesn't seem desirable since it is normally fine.

Ian


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Nick Kledzik


On Jun 4, 2008, at 5:00 PM, Ian Lance Taylor wrote:

> Nick Kledzik <[EMAIL PROTECTED]> writes:
>
>> I don't claim our current implementation is bug free, but the lto
>> interface matches the Apple linker internal model, so we don't expect
>> and have not encountered any problems mixing mach-o and llvm bitcode
>> files.
>
> Hmmm, OK, how about this example:
>
> a.o: contains LTO information, refers to S
> b.o: no LTO information, defines S
> c.o: contains LTO information, defines S at version V, S/V is not
> hidden
>
> In the absence of b.o, the reference to S in a.o will be resolved
> against the definition of S in c.o.  In the presence of b.o, the
> reference to S in a.o will be resolved against the definition of S in
> b.o.
>
> I suppose we could refuse to inline versioned symbols, but that
> doesn't seem desirable since it is normally fine.

As Chris mentioned earlier today, the Apple tool chain does not
support versioned symbols.  But if versioned symbols are a naming
convention (that is, everything is encoded in the symbol name), then
this would work the same as your previous example.  Namely, the linker
would coalesce away S in c.o, which in turn tells the LTO engine that
it can't inline/optimize away c.o's S, and after LTO is done, the
linker throws away the LTO generated S and uses b.o's S instead.

-Nick


On Jun 4, 2008, at 9:29 AM, Chris Lattner wrote:

>> When I look at the LLVM interface as described on that web page, I see
>> these issues, all fixable:
>> * No support for symbol versioning.
>
> Very true.  I think it would be great to work from a common model that
> can be extended to support both compilers.  Having a unified interface
> would be very useful, and we are happy to evolve the interface to suit
> more general needs.




Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Ian Lance Taylor
Nick Kledzik <[EMAIL PROTECTED]> writes:

> On Jun 4, 2008, at 5:00 PM, Ian Lance Taylor wrote:
>> Nick Kledzik <[EMAIL PROTECTED]> writes:
>>
>>> I don't claim our current implementation is bug free, but the lto
>>> interface
>>> matches the Apple linker internal model, so we don't expect and have
>>> not encountered any problems mixing mach-o and llvm bitcode files.
>>
>> Hmmm, OK, how about this example:
>>
>> a.o: contains LTO information, refers to S
>> b.o: no LTO information, defines S
>> c.o: contains LTO information, defines S at version V, S/V is not
>> hidden
>>
>> In the absence of b.o, the reference to S in a.o will be resolved
>> against the definition of S in c.o.  In the presence of b.o, the
>> reference to S in a.o will be resolved against the definition of S in
>> b.o.
>>
>> I suppose we could refuse to inline versioned symbols, but that
>> doesn't seem desirable since it is normally fine.
>
>
> As Chris mentioned earlier today, the Apple tool chain does not
> support versioned symbols.  But if versioned symbols are a naming
> convention (that is, everything is encoded in the symbol name), then
> this would work the same as your previous example.  Namely, the
> linker would coalesce away S in c.o, which in turn tells the LTO
> engine that it can't inline/optimize away c.o's S, and after LTO is
> done, the linker throws away the LTO generated S and uses b.o's S
> instead.

Versioned symbols are not a naming convention, but they aren't all
that different from one.  Basically every symbol may have an optional
version, and when a symbol has a version the version may be hidden or
not.  A symbol definition with a hidden version may only be matched by
a symbol reference with that exact version.  A symbol definition with
a non-hidden version definition may be matched by a symbol reference
with the same name without a version.  This is most interesting in the
dynamic linker, of course.

How does the linker inform the plugin that the plugin is not permitted
to use c.o's S?

Ian


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Nick Kledzik


On Jun 4, 2008, at 5:39 PM, Ian Lance Taylor wrote:

> Nick Kledzik <[EMAIL PROTECTED]> writes:
>
>> On Jun 4, 2008, at 5:00 PM, Ian Lance Taylor wrote:
>>
>>> Nick Kledzik <[EMAIL PROTECTED]> writes:
>>>
>>>> I don't claim our current implementation is bug free, but the lto
>>>> interface matches the Apple linker internal model, so we don't
>>>> expect and have not encountered any problems mixing mach-o and llvm
>>>> bitcode files.
>>>
>>> Hmmm, OK, how about this example:
>>>
>>> a.o: contains LTO information, refers to S
>>> b.o: no LTO information, defines S
>>> c.o: contains LTO information, defines S at version V, S/V is not
>>> hidden
>>>
>>> In the absence of b.o, the reference to S in a.o will be resolved
>>> against the definition of S in c.o.  In the presence of b.o, the
>>> reference to S in a.o will be resolved against the definition of S in
>>> b.o.
>>>
>>> I suppose we could refuse to inline versioned symbols, but that
>>> doesn't seem desirable since it is normally fine.
>>
>> As Chris mentioned earlier today, the Apple tool chain does not
>> support versioned symbols.  But if versioned symbols are a naming
>> convention (that is, everything is encoded in the symbol name), then
>> this would work the same as your previous example.  Namely, the
>> linker would coalesce away S in c.o, which in turn tells the LTO
>> engine that it can't inline/optimize away c.o's S, and after LTO is
>> done, the linker throws away the LTO generated S and uses b.o's S
>> instead.
>
> Versioned symbols are not a naming convention, but they aren't all
> that different from one.  Basically every symbol may have an optional
> version, and when a symbol has a version the version may be hidden or
> not.  A symbol definition with a hidden version may only be matched by
> a symbol reference with that exact version.  A symbol definition with
> a non-hidden version definition may be matched by a symbol reference
> with the same name without a version.  This is most interesting in the
> dynamic linker, of course.
>
> How does the linker inform the plugin that the plugin is not permitted
> to use c.o's S?

In the previous case where S was weak, the call to
lto_codegen_add_must_preserve_symbol("S") caused the LTO engine to
know it could not inline S (because it was a weak definition and used
outside the LTO usage sphere).  And then after LTO was done, the
linker threw away the LTO produced S and used the one from b.o
instead.

In this case S is a regular symbol, so the previous trick won't work.
Probably the best solution would be to add a new lto_ API to tell the
LTO engine to ignore a definition it already has.  It would make more
sense to use this new API in the weak case too.
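
As a sketch, the addition might be declared like this
(lto_codegen_ignore_symbol_definition is exactly the hypothetical API
being proposed here; it is not in lto.h today):

#include "llvm-c/lto.h"

/* Hypothetical new entry point: the linker tells the LTO engine that
   the prevailing definition of SYMBOL lives outside the LTO sphere,
   so the engine must neither emit nor inline its own copy.  */
extern void
lto_codegen_ignore_symbol_definition (lto_code_gen_t cg,
                                      const char *symbol);

/* For the a.o/b.o/c.o example, after resolving S to b.o the linker
   would call:
     lto_codegen_ignore_symbol_definition (cg, "S");  */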

-Nick



Re: [MELT] branch Melt- bootstrapped & questions...

2008-06-04 Thread Daniel Berlin
On Tue, Jun 3, 2008 at 5:41 AM, Basile STARYNKEVITCH
<[EMAIL PROTECTED]> wrote:
> Hello All,
>
> See http://gcc.gnu.org/wiki/MiddleEndLispTranslator for MELT
>
> The MELT branch bootstrapped, in the sense that the Lisp compiler is able to
> compile itself to C code. It is not the bootstrap in the usual GCC sense (a
> GCC being able to compile itself - currently MELT GCC behaves like the trunk
> in this respect). MELT branch is closely following the trunk: I am doing
> svnmerge merge more than once a week (without any issues so far)
>
> I've got some questions to the list:
>
> first, MELT is generating itself, so the generated C code
> gcc/warm-basilys-0.c file is committed to SVN. In that respect, it is like
> the configure script (a generated file which is in the SVN repository).
>
> second, the generated warm-basilys-0.c is quite big (more than 250KLOC or
> 13Mbytes), and a small change (e.g. one line) in melt/warm-basilys.bysl
> triggers many changes (e.g. a thousand lines changed) in the generated file
> warm-basilys-{0,1,2,3}.c
>

Don't worry about it.

> I still feel that I have to commit it frequently, to make traceable and
> reproducible all my changes.
>
> Some questions:
>
> 1. Should I avoid committing warm-basilys-0.c frequently to lower the
> Subversion server disk consumption? I could do that, but then my changes
> would be less reproducible (in the sense that applying the diff between two
> commits to a source tree would not be enough to make it recompilable). My
> perception is that disk space is on svn://gcc.gnu.org/ cheap (but then, I am
> not paying it!).

I'll tell you if it starts causing a problem.


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Devang Patel


On Jun 4, 2008, at 6:09 PM, Nick Kledzik wrote:

> On Jun 4, 2008, at 5:39 PM, Ian Lance Taylor wrote:
>
>> Nick Kledzik <[EMAIL PROTECTED]> writes:
>>
>>> On Jun 4, 2008, at 5:00 PM, Ian Lance Taylor wrote:
>>>
>>>> Nick Kledzik <[EMAIL PROTECTED]> writes:
>>>>
>>>>> I don't claim our current implementation is bug free, but the lto
>>>>> interface matches the Apple linker internal model, so we don't
>>>>> expect and have not encountered any problems mixing mach-o and
>>>>> llvm bitcode files.
>>>>
>>>> Hmmm, OK, how about this example:
>>>>
>>>> a.o: contains LTO information, refers to S
>>>> b.o: no LTO information, defines S
>>>> c.o: contains LTO information, defines S at version V, S/V is not
>>>> hidden
>>>>
>>>> In the absence of b.o, the reference to S in a.o will be resolved
>>>> against the definition of S in c.o.  In the presence of b.o, the
>>>> reference to S in a.o will be resolved against the definition of S
>>>> in b.o.
>>>>
>>>> I suppose we could refuse to inline versioned symbols, but that
>>>> doesn't seem desirable since it is normally fine.
>>>
>>> As Chris mentioned earlier today, the Apple tool chain does not
>>> support versioned symbols.  But if versioned symbols are a naming
>>> convention (that is, everything is encoded in the symbol name), then
>>> this would work the same as your previous example.  Namely, the
>>> linker would coalesce away S in c.o, which in turn tells the LTO
>>> engine that it can't inline/optimize away c.o's S, and after LTO is
>>> done, the linker throws away the LTO generated S and uses b.o's S
>>> instead.
>>
>> Versioned symbols are not a naming convention, but they aren't all
>> that different from one.  Basically every symbol may have an optional
>> version, and when a symbol has a version the version may be hidden or
>> not.  A symbol definition with a hidden version may only be matched by
>> a symbol reference with that exact version.  A symbol definition with
>> a non-hidden version definition may be matched by a symbol reference
>> with the same name without a version.  This is most interesting in the
>> dynamic linker, of course.
>>
>> How does the linker inform the plugin that the plugin is not permitted
>> to use c.o's S?
>
> In the previous case where S was weak, the call to
> lto_codegen_add_must_preserve_symbol("S") caused the LTO engine to
> know it could not inline S (because it was a weak definition and used
> outside the LTO usage sphere).

The weak definition is the deciding factor.  The optimizer can inline
a function at the call site irrespective of whether it is used outside
the LTO usage sphere or not.  The outside-the-LTO-sphere use determines
whether to preserve the function body when the function is inlined
everywhere inside the LTO usage sphere.

> And then after LTO was done, the linker threw away the LTO produced S
> and used the one from b.o instead.
>
> In this case S is a regular symbol, so the previous trick won't work.
> Probably the best solution would be to add a new lto_ API to tell the
> LTO engine to ignore a definition it already has.  It would make more
> sense to use this new API in the weak case too.

If the optimizer can handle the symbol versioning case when one
definition with version is defined in the same source file as the
reference, then you don't need a new API.

For example,

a.o : refers to S and defines S at version V.
b.o : defines S.

Is the inliner, at compile time, allowed to inline uses of S in a.o
using the definition it has?


-
Devang



How to insert nops

2008-06-04 Thread Mohamed Shafi
Hello all,

For the big endian 16bit target that i am porting to gcc 4.1.2 a nop
is needed after a load instruction if the destination register of the
load instruction is used as the source in the next instruction. So

load R0, R3[2]
add R2, R0

needs a nop inserted in between the instructions. I have issues when
the operation is that of 32bit data types. The target doesn't have any
32bit instructions. All the 32bit move instructions are split after
reload. The following is an example where i am having issues

(set (reg:HI 2 R2)
     (mem/s:HI (reg/f:HI 8 R8)))

(set (reg:HI 3 R3)
     (mem/s:HI (plus:HI (reg/f:HI 8 R8)
                        (const_int 2 [0x2]))))

(set (reg:SI 0 R0)
     (minus:SI (reg:SI 0 R0)
               (reg:SI 2 R2)))


load R2, R8
load R3, R8[2]
sub R1, R3
subc R0, R2

For the above case no nop is inserted, but because of the endianness
the source register gets used in the next instruction.  How do I solve
this?  I do nop insertion in the reorg pass, where I first do delay
slot scheduling.  The following is what I have in reorg() for nop
insertion:

  attr = get_attr_type (insn);
  if (next_insn && attr == TYPE_LOAD)
    {
      if (insn_true_dependent_p (insn, next_insn))
        emit_insn_after (gen_nop (), insn);
    }



/* Return true if insn Y reads a register that insn X sets.  */
static bool
insn_true_dependent_p (rtx x, rtx y)
{
  rtx tmp;

  if (! INSN_P (x) || ! INSN_P (y))
    return 0;

  /* Walk every store in X's pattern; the callback clears TMP when a
     stored register is mentioned in Y's pattern.  */
  tmp = PATTERN (y);
  note_stores (PATTERN (x), insn_dependent_p_1, &tmp);
  return (tmp == NULL_RTX);
}

/* note_stores callback: DATA points at the pattern of the second
   insn; clear it if register X, stored by the first insn, appears
   there.  */
static void
insn_dependent_p_1 (rtx x, rtx pat ATTRIBUTE_UNUSED, void *data)
{
  rtx * pinsn = (rtx *) data;

  if (*pinsn && reg_mentioned_p (x, *pinsn))
    *pinsn = NULL_RTX;
}


I think that, apart from the above cases, I will also have cases where
a nop gets inserted when it's not really required.
How will it be possible to solve this issue?

Regards,
Shafi


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Ian Lance Taylor
[ trimming the CC list ]

Devang Patel <[EMAIL PROTECTED]> writes:

> If the optimizer can handle the symbol versioning case when one
> definition with version is defined in the same source file as the
> reference, then you don't need a new API.
>
> For example,
>
> a.o : refers to S and defines S at version V.
> b.o : defines S.
>
> Is the inliner, at compile time, allowed to inline uses of S in a.o
> using the definition it has?

The compiler doesn't know about symbol versions.  The way they work is
that you give the symbol a name like S_V, and then use an assembly
level .symver directive to say that S_V is really S at version V.  So
false inlining doesn't really arise in a single source file, unless
you do something rather odd.
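
For example, with gas on an ELF system the directive is typically
emitted from C via a toplevel asm (S_V, S, and V here are just the
placeholder names from above; the version node V must also be declared
in the linker version script):

/* S_V is the actual definition; the .symver directive exports it to
   the linker as symbol S at version V.  The "@@" form marks the
   default, non-hidden version; a single "@" would make it hidden.  */
int S_V (void) { return 1; }
__asm__ (".symver S_V, S@@V");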

Ian


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-04 Thread Ian Lance Taylor
Nick Kledzik <[EMAIL PROTECTED]> writes:

> In this case S is a regular symbol, so the previous trick won't work.
> Probably the best solution would be to add a new lto_ API to tell the
> LTO engine to ignore a definition it already has.  It would make more
> sense to use this new API in the weak case too.

I would like to propose a farther reaching change: for each undefined
symbol reference, tell LTO the location of the symbol definition which
should be used.  The linker has to develop this information anyhow
during the course of the link.
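
A sketch of what that could look like (every name here is
hypothetical; this is the proposal, not an existing interface):

#include "llvm-c/lto.h"

/* Hypothetical: where the linker resolved an undefined reference.  */
typedef enum
{
  LTO_RESOLVED_IN_THIS_MODULE,     /* definition visible to the plugin */
  LTO_RESOLVED_IN_OTHER_LTO_FILE,  /* another module in the LTO sphere */
  LTO_RESOLVED_OUTSIDE_LTO         /* e.g. a native .o; must not inline */
} lto_symbol_resolution;

/* For each undefined symbol reference in MOD, the linker reports the
   resolution it has already computed during the course of the link.  */
extern void
lto_codegen_set_symbol_resolution (lto_code_gen_t cg,
                                   lto_module_t mod,
                                   const char *symbol,
                                   lto_symbol_resolution res);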

Ian


Re: [whopr] plugin interface design

2008-06-04 Thread Chris Lattner

On Jun 4, 2008, at 10:39 PM, Ian Lance Taylor wrote:

> Devang Patel <[EMAIL PROTECTED]> writes:
>
>> If the optimizer can handle the symbol versioning case when one
>> definition with version is defined in the same source file as the
>> reference, then you don't need a new API.
>>
>> For example,
>>
>> a.o : refers to S and defines S at version V.
>> b.o : defines S.
>>
>> Is the inliner, at compile time, allowed to inline uses of S in a.o
>> using the definition it has?
>
> The compiler doesn't know about symbol versions.  The way they work is
> that you give the symbol a name like S_V, and then use an assembly
> level .symver directive to say that S_V is really S at version V.  So
> false inlining doesn't really arise in a single source file, unless
> you do something rather odd.


If you plan to do link-time optimization, you need to be able to  
capture all "assembler-level" features in your IR, somehow.  Magic  
that gets splatted out by the assembly printer will ideally be changed  
to update the IR in some form.


LLVM LTO does exactly this.  The front-end produces LLVM IR and does  
no .s file printing at all.  This IR goes through optimizations and at  
-O3 and lower is then run through the code generator which produces  
a .s file.


At -O4, the difference is that the code generator is not run, so LLVM  
IR is written to disk.  When LTO is run, we then load the LLVM IR for  
all the LTO'able files and then run an LLVM Linker across them.  This  
does an LLVM IR level link step, which is aware of the semantics of  
weak symbols, and many other details that come up when linking two  
files together (however, it has no idea where those two files came  
from, no idea about archive resolution, etc).


I don't know if LLVM properly supports symbol versions on ELF systems,  
I would guess not yet.  Since symbol versions affect linking, the LLVM  
linker would have to have enough information to "do the right thing".


Once a fully linked LLVM IR file is produced, the total result is sent  
through LLVM optimization passes, which then do interprocedural and  
intraprocedural optimizations to continue improving the code.  After  
that, the normal LLVM code generator is run to produce a native form  
of the LTO'd module and the system linker uses that to continue linking.



I don't know how closely your plans follow this model.  If you think  
this approach is reasonable, you really do need to reflect things like  
symbol versions in your IR somehow.  This compiler must know about  
versions, and when it does, it is easy to avoid optimizations that are  
invalid for them.


-Chris