Re: [gomp4] Building binaries for offload.

2013-10-18 Thread Jakub Jelinek
On Tue, Oct 15, 2013 at 02:03:48PM +0400, Kirill Yukhin wrote:
> Let me summarize the current understanding of
> host binary linking, as well as target binary building/linking.
> 
> We put code which is supposed to be offloaded into dedicated sections,
> with names starting with gnu.target_lto_
> 
> At link time (I mean, link time of host app):
>   1. Generate a dedicated data section in each binary (executable or DSO),
>  which'll be a placeholder for the offloading stuff.
> 
>   2. Generate an __OPENMP_TARGET__ (weak, hidden) symbol,
>  which'll point to the start of the section mentioned in the previous item.
> 
> This section should contain at least:
>   1. Number of targets
>   2. Size of the offload symbols table
> 
>   [ Repeat `number of targets' times: ]
>   3. Name of the target
>   4. Offset to the beginning of the image to offload to that target
>   5. Size of the image
> 
>   6. Offload symbols table
> 
> The offloading symbols table will contain information about the addresses
> of offloadable symbols, in order to create the mapping of host<->target
> addresses at runtime.
> 
> To get the list of target addresses we need a dedicated interface call
> into the libgomp plugin, something like getTargetAddresses (), which will
> query the target for the list of addresses (accompanied by symbol names).
> To provide this information, the target DSO should contain a similar table
> mapping symbols to addresses.

No, IMHO it is enough if the linker plugin finds the array of the target
addresses in the shared library it is going to embed (e.g. using some magic
symbol lookup, or a named section) and just puts a pointer to that place in
the payload into the __OPENMP_TARGET__ header structure, or whatever other
way will be best to provide that info to libgomp.
Say, if the { host_address, size } pairs are put into the .gnu.target_addr
section in the host code, and we arrange for the addresses to be put into
vars in the .gnu.target_addr section in the .gnu.target_lto* IL for the
target, in the end there will be a table of the target addresses in the
.gnu.target_addr section in the target shared library.  So, either the
__OPENMP_TARGET__ header entry for the corresponding target (MIC in your
case) would contain both the host .gnu.target_addr table and a pointer to
the .gnu.target_addr in the payload, or the plugin could copy it over and
create a table with { host_addr, size, target_addr_nonrelocated }, and
libgomp would just add the load bias of the target shared library to the
target address.
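As a rough illustration of that second option, the plugin-built table would
pair each host address with the matching non-relocated target address, and
libgomp would only have to add the load bias at runtime.  A minimal sketch;
the struct and function names here are assumptions, not the actual libgomp
layout:

```cpp
#include <cstdint>
#include <cstddef>

// Hypothetical row of the copied-over mapping table: the linker plugin
// pairs a host address with the matching non-relocated target address.
struct addr_triplet
{
  uintptr_t host_addr;              // symbol address in the host image
  size_t    size;                   // object size in bytes (1 for functions)
  uintptr_t target_addr_nonreloc;   // address before the target DSO is loaded
};

// At runtime libgomp only has to add the load bias of the target shared
// library to obtain the real target address.
static inline uintptr_t
relocate_target_addr (const addr_triplet &t, uintptr_t load_bias)
{
  return t.target_addr_nonreloc + load_bias;
}
```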

> The application is going to have a single instance of libgomp, which
> in turn means that we'll have a single splay tree holding the mapping
> information (host -> target) for all DSOs and the executable.

One splay tree per device, in particular for devices without a shared
address space.

> We have at least 2 approaches to solving the host->target mapping.
> 
> I. Preserve the order of symbol appearance.
>    Table row: [ address, size ]
>    For routines, the size is 1.
> 
>    In order to initialize the table we need to get two arrays:
>    one of host and one of target addresses. The order of appearance of
>    objects in these arrays must be the same. Having this makes mapping easy.
>    We just need to find the index of a given address in the array of host
>    addrs and then index the array of target addresses with the index found.
> 
>    The problem is that this is unlikely to work when LTO of the host is on.
>    I am also not sure that the order of handling objects on the target is
>    the same as on the host.

I don't see why it wouldn't work; it will be the duty of the linker plugin
not to reorder the objects.
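Under approach I the lookup reduces to finding the index of the host address
and reading the same slot in the parallel target array.  A sketch, assuming
the linker plugin has kept both arrays in the same order; the function names
are illustrative, not from any actual implementation:

```cpp
#include <cstdint>
#include <cstddef>

// Find the index of ADDR in the host-address array, or -1 if absent.
static ptrdiff_t
find_host_index (const uintptr_t *host_addrs, size_t n, uintptr_t addr)
{
  for (size_t i = 0; i < n; i++)
    if (host_addrs[i] == addr)
      return (ptrdiff_t) i;
  return -1;
}

// Because both arrays preserve the order of symbol appearance, the
// target address lives at the same index as the host address.
static uintptr_t
host_to_target (const uintptr_t *host_addrs, const uintptr_t *target_addrs,
                size_t n, uintptr_t addr)
{
  ptrdiff_t i = find_host_index (host_addrs, n, addr);
  return i < 0 ? 0 : target_addrs[i];
}
```

This is exactly why the ordering guarantee matters: any reordering between
the two arrays silently maps a host symbol to the wrong target address.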

> II. Store a symbol identifier along with the address.
>   Table row: [ symbol_name, address, size ]
>   For routines, the size is 1.
> 
>   To construct the table of host addresses, at link
>   time we put all symbol addresses (for symbols marked at compile time with
>   a dedicated attribute) into the table, accompanied by symbol names
>   (they'll serve as keys).
> 
>   During initialization of the table we create the host->target address
>   mapping using symbol names as keys.

No, this is not going to work; as I said earlier, names aren't necessarily
unique for static functions.
> 
> The last thing I wanted to summarize: compiling target code.
> 
> We have 2 approaches here:
> 
>    1. Perform WPA and extract sections marked as target into a separate
>       object file.  Then call the target compiler on that object file to
>       produce the binary.
> 
>   As mentioned by Jakub, this approach will complicate debugging.
> 
>    2. Pass fat object files directly to the target compiler (one CU at a
>       time).  So, for every object file we are going to call GCC twice:
>      - Host GCC, which will compile all host code for every CU
>      - Target GCC, which will compile all target code for every CU
> 
> I vote for option #2, since the WPA-based approach complicates debugging.
> What do you guys think?

One needs to think about ld -r; the linker plugin might actually see
multiple CUs in one object file, so perhaps the target compiler will need to
be run on the same *.o file several times.

[RFC] Include file structuring.

2013-10-18 Thread Andrew MacLeod
The tree-flow.h restructuring now brings us to the larger question of 
exactly how we want includes organized.  All the remaining includes in 
tree-ssa.h are required by numerous other .c files. The actual number of 
.c files which will need to #include any given file is:


(roughly calculated by the number of .o files which don't compile when the
header is removed from tree-ssa.h)

19  bitmap.h
77  gimple.h
61  gimple-ssa.h
17  cgraph.h
72  tree-cfg.h
46  tree-phinodes.h
69  ssa-iterators.h
82  tree-ssanames.h
38  tree-ssa-loop.h
37  tree-into-ssa.h
35  tree-dfa.h


The question is... Do we allow a .h file like this to be an aggregator, 
meaning a file can just include tree-ssa.h and get all this, or do we 
push it all down to the .c file, and actually include what each one 
needs.  Or do we pick a specific subset for tree-ssa.h...


So far I've removed the less commonly included files, ie, those needed by
fewer than 10 or so .c files.  That also gave me the opportunity to analyze 
and restructure the exports in those files a bit. That is a much larger 
job for these commonly included files, so I don't plan to do that sort of 
analysis.  Yet, anyway.


Current practice is that every .c file should start with
#include "config.h"
#include "system.h"
#include "coretypes.h"

I also think every .c file should then have
#include "tree.h"
#include "gimple.h"   // Only assuming it is a gimple based file

These are basic implementation files and I think it's reasonable for 
each .c file to include them as a basis if they are to be used.


Beyond that I am a bit torn... It seems reasonable to have module 
aggregators like tree-ssa.h, which a .c file can include to get all the 
"stuff" an ssa pass will commonly require.  I can also see the argument 
for the "include what you use" paradigm.


At a minimum, I do think that if a .h file *requires* another .h file to 
compile, it should include it.  ie, if gimple-ssa.h is included, it 
won't compile unless tree-ssa-operands.h has already been included, so 
it seems reasonable to include that directly in gimple-ssa.h.  Otherwise 
one needs to add the file, compile, and then figure out what other 
file is needed.  That seems silly to me.
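As a concrete sketch of that guideline, the header would state the one
dependency it cannot compile without, guarded as usual.  The contents below
are invented purely for illustration, not the real gimple-ssa.h:

```cpp
/* gimple-ssa.h -- illustrative fragment only.  */
#ifndef GCC_GIMPLE_SSA_H
#define GCC_GIMPLE_SSA_H

/* Required, not optional: the struct below won't compile without it,
   so the dependency is stated here rather than in every .c file.  */
#include "tree-ssa-operands.h"

struct gimple_df
{
  struct ssa_operands ssa_ops;  /* uses a type from the required include */
};

#endif /* GCC_GIMPLE_SSA_H */
```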


Perhaps using that as the guideline, we should just push all includes 
down into the .c files?  I think I favour that approach.


What are other thoughts?

Andrew



FAIL: g++.dg/guality/pr55665.C

2013-10-18 Thread Paolo Carlini

Hi,

these FAILs:

FAIL: g++.dg/guality/pr55665.C  -O2  line 23 p == 40
FAIL: g++.dg/guality/pr55665.C  -O3 -fomit-frame-pointer  line 23 p == 40
FAIL: g++.dg/guality/pr55665.C  -O3 -g  line 23 p == 40

apparently are here to stay, at least on x86_64-linux... Seriously, do we 
know what's going on? Do we have a Bugzilla entry tracking the issue?


Thanks,
Paolo.


Re: [wide-int] int_traits

2013-10-18 Thread Kenneth Zadeck
I am a little confused here.  What is the reason for doing the 
is_sign_extended thing?
Is the point of deferring the canonicalization that we can avoid the use of 
the scratch array until the value is actually used, and then we 
canonicalize on demand?


That seems like it is going to require several bits of state, like one 
to say whether we have canonicalized or not.   All of these will then have to 
be checked on every operation.


However, I guess if it keeps the alias analyzer sane then it is 
ok, but it makes everything else a little more complex.


Kenny




On 10/18/2013 10:57 AM, Richard Sandiford wrote:

[off-list]

Kenneth Zadeck  writes:

Richi,

Do you want me to back out the patch that changes the rep for unsigned
tree-csts?

kenny

Doesn't look like you're on IRC, so FWIW:

 richi: just to check, you still want the scratch array to be
   separate from the other fields, to avoid the other fields escaping,
   is that right? [11:13]
 will try that today if so
 rsandifo: that would be nice, but I'm not settled yet on what to do on
::decompose fully - the idea of a ::is_sign_extended storage-ref flag
popped up [11:17]
 rsandifo: so eventually we don't need a scratch member anymore (which
would be best)
 rsandifo: so if you can give that idea a take instead ... ;) [11:18]
 today I'll try to clear my patch review pipeline ...
 yeah.  Just to be sure: we still need the extra zero HWI for the
   large unsigned constants though, right?
 no, in that case we wouldn't need that
 we'd have extra (conditional) sext() operations in all sign-dependent
ops though [11:19]
 thus delay the canonicalization until the first use
 I thought the ::is_sign_extended was dealing with a different case
   (small_prec).  The problem with the extra zero HWI is different.
 no, it would deal with the case in general and say "if you need a
sign-extended rep you have to sign-extend" [11:20]
 but the point of the extra HWI is that if you extend a 64-bit tree
   to 128 bits, you have two significant HWIs.
 for the fixed_wide_int rep we then want to combine the extension with
the copy in its constructor [11:21]
 For unsigned constants that extra HWI is zero.  For signed
   constants it's minus one.
 The choice matters in that case, since it affects the 128-bit
   result of any arithmetic
 but here the interface is just what limits us - the fact that
decompose allows precision != xprecision [11:22]
 that case should be split out to a different "decompose"
 it's just needed for fixed_wide_int AFAIK?
 But the interface was defined that way to avoid constructing
   fixed_wide_ints for x and y when doing x + y -> fixed_wide_int
 we could instead do addition with three different precisions (x, y
   and the result), but that'd get complicated... [11:23]
 hmm, I see
 well, then leave the extra zeros alone for now [11:24]
 ok, works for me, thanks
 so for this ::is_sign_extended thing, we basically go back to the
   original "arbitrary upper bits" as the default (conservative)
   assumption, but add optimisations for the case where
   ::is_sign_extended is a compile-time true? [11:25]
 yes
 OK
 "arbitrary upper bits" would be either sign- or zero-extended though
(but it probably doesn't make a difference to arbitrary)
 on a tree wide_int_ref would have that "arbitrary upper bits"
 that in as far as I can see should get rid of scratch (fingers
crossing) [11:27]
 s/on/only/
 Right -- that copy wasn't there in Kenny's original version, it
   only came in because of the "small_prec having defined upper bits".
 I agree ::is_sign_extended sounds like a nice compromise [11:28]
 let's hope it will work out and improve code as desired ;)
 also the parts Mike suggested, adding more compile-time known
optimizations [11:29]
 yeah
 and making tree.c predicates use the tree rep directly
 though then integer_zerop will be magically more efficient than t == 0
... [11:30]
 would be nice if we could wi:: to be efficient enough so that we
   can keep the tree.c bits as-is [11:32]
 ...could get... [11:33]
 with CONSTANT (...) tricks or whatever
 realise that might be a bit optimistic though...
 yeah [12:41]





Re: [wide-int] int_traits

2013-10-18 Thread Richard Biener
Kenneth Zadeck  wrote:
>I am a little confused here.  What is the reason for doing the 
>is_sign_extended thing?
>Is the point of deferring the canonicalization that we can avoid the use of
>the scratch array until the value is actually used, and then we 
>canonicalize on demand?

Yes.

>That seems like it is going to require several bits of state, like one 
>to say whether we have canonicalized or not.   All of these will then
>have to be checked on every operation.

We simply statically assign this property to an int_traits kind, so
we can evaluate it at compile time.

>However, I guess if it keeps the alias analyzer sane then it is 
>ok, but it makes everything else a little more complex.

It's indeed solely to make the generated code sane and thus reduce the
overhead of the wide-int branch.

Richard.





Re: [wide-int] int_traits

2013-10-18 Thread Richard Sandiford
Kenneth Zadeck  writes:
> I am a little confused here.what is the reason for doing the the 
> is_sign_extended thing?
> is the point of deferring the canonization that we can avoid the use of 
> the scratch array until the value is actually used.   then we 
> canonicalize on demand?

The idea is that we conservatively treat upper bits as undefined when
reading, just like you did originally.  But (assuming I understood the
idea correctly) the traits class has an ::is_sign_extended member that
may be able to tell you at compile time that the upper bits are actually
sign-extended.  This would be true for wide_int and rtx, but not tree.
We can then use ::is_sign_extended to add fast paths to the functions
that are better with sign-extended inputs (like eq_p and lts_p) without
adding any extra run-time checks.
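A minimal sketch of how such a compile-time flag can add a fast path with no
run-time checks; the names here (canonical_low, the boolean template
parameter standing in for the traits member) are invented for illustration
and are not the actual wide-int interface:

```cpp
#include <cstdint>

typedef int64_t HWI;

// Sign-extend the value held in the low PREC bits of X (0 < PREC <= 64).
static inline HWI
sext_hwi (HWI x, unsigned prec)
{
  unsigned shift = 64 - prec;
  return (HWI) ((uint64_t) x << shift) >> shift;
}

// IS_SIGN_EXTENDED models the traits flag: true for storage kinds whose
// excess upper bits are guaranteed sign-extended (wide_int, rtx), false
// where they are undefined (tree).  Being a template parameter, the
// branch below is resolved at compile time, so the fast path costs
// nothing at run time.
template <bool is_sign_extended>
HWI
canonical_low (HWI x, unsigned prec)
{
  if (is_sign_extended)
    return x;                  // fast path: upper bits already valid
  return sext_hwi (x, prec);   // conservative path: extend on demand
}
```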

Thanks,
Richard



Re: [RFC] Include file structuring.

2013-10-18 Thread Jeff Law

On 10/18/13 08:00, Andrew MacLeod wrote:

> The tree-flow.h restructuring now brings us to the larger question of
> exactly how we want includes organized.  All the remaining includes in
> tree-ssa.h are required by numerous other .c files. The actual number of
> .c files which will need to #include any given file is:
> 
> (roughly calculated by the number of .o files which don't compile when
> removed from tree-ssa.h)
> 19  bitmap.h
> 77  gimple.h
> 61  gimple-ssa.h
> 17  cgraph.h
> 72  tree-cfg.h
> 46  tree-phinodes.h
> 69  ssa-iterators.h
> 82  tree-ssanames.h
> 38  tree-ssa-loop.h
> 37  tree-into-ssa.h
> 35  tree-dfa.h
> 
> The question is... Do we allow a .h file like this to be an aggregator,
> meaning a file can just include tree-ssa.h and get all this, or do we
> push it all down to the .c file, and actually include what each one
> needs.  Or do we pick a specific subset for tree-ssa.h...
I thought we had decided we weren't going to allow this -- ie, if you 
need ssa-iterators.h, then you include it rather than assuming you get 
it from tree-ssa.h.


ISTM explicit including rather than aggregation makes it clearer to the 
reader which modules a particular file has to interact with.  And when 
we see something "odd" in a particular file's include list, it's a good 
indicator that we need to look a little more closely to see if we've 
got code in the wrong place or bad separation of components.






> So far I've removed the less commonly included files, ie, less than 10
> or so .c files need it.  That also gave me the opportunity to analyze
> and restructure the exports in those files a bit. That is a much larger
> job on these commonly included files, so I don't plan to do that sort of
> analysis.  Yet anyway.
> 
> Current practice is that every .c file should start with
> #include "config.h"
> #include "system.h"
> #include "coretypes.h"

Yes.  Let's just go ahead and set this in stone now.

I know someone will argue that if they're the same, then there should be 
an aggregator which includes all three so that each .c file doesn't muck 
it up.  I see their point, I just prefer to move to a more explicit model.






> I also think every .c file should also then have
> #include "tree.h"
> #include "gimple.h"   // Only assuming it is a gimple based file
> 
> These are basic implementation files and I think it's reasonable for
> each .c file to include them as a basis if they are to be used.
Well, presumably you're talking about files that care about trees. 
Ideally we'll get to a place where the RTL optimizers (for example) 
don't have to include tree.h.



> At a minimum, I do think that if a .h file *requires* another .h file to
> compile, that it should include it.  ie, if gimple-ssa.h is included, it
> won't compile unless tree-ssa-operands.h has already been included, so
> that seems reasonable to include directly in gimple-ssa.h.  Otherwise
> one needs to add the file, compile, and then figure out what other
> file you need.  That seems silly to me.
It's a bit silly, but does result in a minimal set of #includes :-)  Or 
it results in folks just including a bunch of headers because they 
copied them from somewhere else.




> Perhaps using that as the guideline, we should just push all includes
> down into the .c files?  I think I favour that approach.
To the extent possible, I favor this as well.  Mostly because I see this 
as an early pass filter when we look at modularity problems, both in 
existing sources and in future patches.




jeff



Re: [RFC] Include file structuring.

2013-10-18 Thread Andrew MacLeod

On 10/18/2013 12:55 PM, Jeff Law wrote:

> On 10/18/13 08:00, Andrew MacLeod wrote:



>> The question is... Do we allow a .h file like this to be an aggregator,
>> meaning a file can just include tree-ssa.h and get all this, or do we
>> push it all down to the .c file, and actually include what each one
>> needs.  Or do we pick a specific subset for tree-ssa.h...
> I thought we had decided we weren't going to allow this -- ie, if you
> need ssa-iterators.h, then you include it rather than assuming you get
> it from tree-ssa.h.




Well, we specifically disallowed prototypes from some other .c file 
being in a .h.  ie, file.h is to have only the prototypes for file.c 
in it.

I don't think we ever discussed/decided what an include file should or 
needs to include from other include files...  but I could be wrong :-)






>> So far I've removed the less commonly included files, ie, less than 10
>> or so .c files need it.  That also gave me the opportunity to analyze
>> and restructure the exports in those files a bit. That is a much larger
>> job on these commonly included files, so I don't plan to do that sort of
>> analysis.  Yet anyway.
>>
>> Current practice is that every .c file should start with
>> #include "config.h"
>> #include "system.h"
>> #include "coretypes.h"
>
> Yes.  Let's just go ahead and set this in stone now.
>
> I know someone will argue that if they're the same, then there should
> be an aggregator which includes all three so that each .c file doesn't
> muck it up.  I see their point, I just prefer to move to a more
> explicit model.



Yes, one can argue that the "implementation" file, be it tree.h, gimple.h, 
or rtl.h, could include these very things... since ultimately, we'd need 
one of these 3 files to start every .c file.







>> I also think every .c file should also then have
>> #include "tree.h"
>> #include "gimple.h"   // Only assuming it is a gimple based file
>>
>> These are basic implementation files and I think it's reasonable for
>> each .c file to include them as a basis if they are to be used.
> Well, presumably you're talking about files that care about trees.
> Ideally we'll get to a place where the RTL optimizers (for example)
> don't have to include tree.h.


Indeed.  At the moment, every file needs trees.  When I work on the 
wrappers, tree.h will be trimmed out of those files which get 
converted.   rtl will also at that point only require some aspects of 
gimple (the symtabs and whatever else we discover), and presumably have 
an rtl.h that those files include.


The point being, after those first 3 files, the .c file should include 
whichever "implementation" files it requires... be it tree.h, gimple.h 
and/or rtl.h.  Eventually, it will only be one of those 3.  I hope :-)




>> At a minimum, I do think that if a .h file *requires* another .h file to
>> compile, that it should include it.  ie, if gimple-ssa.h is included, it
>> won't compile unless tree-ssa-operands.h has already been included, so
>> that seems reasonable to include directly in gimple-ssa.h.  Otherwise
>> one needs to add the file, compile, and then figure out what other
>> file you need.  That seems silly to me.
> It's a bit silly, but does result in a minimal set of #includes :-)
> Or it results in folks just including a bunch of headers because they
> copied them from somewhere else.


perhaps...  As I was writing up my argument for why this #include in the 
include file is good, I convinced myself of the opposite :-P


If we follow this and include other files which are required by this .h, 
some files will indeed be included many times.  In particular, most of 
the tree-ssa*.h files which include inline functions are going to need a 
certain subset of other ssa .h files, like operands, iterators, etc.  So 
this path slowly leads to include aggregators again...


So I retract that comment.   I think perhaps no include file should 
include any other include file... and the .c files can order them. The 
worst case we end up with is too many includes in a .c file... and that 
can be addressed.


I have already created a little tool that runs through the source base 
removing each include one at a time from a .c file and checking if it 
still compiles ok.   I was saving that to run a bit later once I 
have things a bit more processed... maybe in a week or two. I did a dry 
run last week and it removed 600+ includes from just the middle/back-end 
.c files :-P.   We could easily run it just before ending every stage 
1 to clean things up a bit.  (It also ends up removing duplicates... 
you'd be surprised how many times the same file appears in a file's 
include list :-)


That should at least help with the cut'n'paste approach that we've all 
used to get a .c file started :-) Just use 'em all and get the tool to 
reduce it for you!


So I think I am in favour of no includes in .h files... It may make it 
more obvious when a file is using some other inappropriate file for 
something, and it is easier for my simple analysis tools to find poor 
export candidates.


I will also note th