Re: RFC: Inlines, LTO and GCC

2013-09-10 Thread David Brown
On 10/09/13 04:44, Jeff Law wrote:
> On 09/09/2013 02:45 PM, Andrew MacLeod wrote:
>> A number of header files have inline functions declared in them. Some of
>> these functions are actually quite large, and I doubt that inlining them
>> is the right thing.   For instance, tree-flow-inline.h has some quite
>> large functions.  Many of the op_iter* functions are 30-40 lines long,
>> and get_addr_base_and_unit_offset_1() is 130 lines.  Doesn't seem like
>> it should be static inline! :-P
>>
>> During the process of re-factoring header files, it could be worthwhile
>> to also move  functions like this to a .c file...
>>
>> I know a lot of work has gone into the inliner and LTO, and I was
>> wondering what its state is with regard to the current gcc source base.
>>
>> My questions are:
>>
>> 1) is everyone in favour of moving these largish inlines out of header
>> files and making them not inline?
>> 2) what size of function is reasonable for inlining?  Small ones obviously,
>> but where does the line start to get vague, and what would be a good
>> litmus test for an inline?  Functions which "do a lot" and
>> look like they would use a number of registers seem like candidates to
>> move...   I think we have a lot of functions that end up being compiled
>> quite large because they inline functions which inline functions which
>> inline functions...
>> 3) The significance of moving these out would be greatly reduced if GCC
>> were produced with LTO.. have we tried or considered doing this and
>> possibly releasing gcc compiled this way?  It seems to me we could have
>> significantly  less stuff in header files tagged as inline, but still
>> see the benefit in our final product...   maybe all we'd need is the
>> teeny tiny ones... and let the machinery figure it all out.  Now that
>> would be sweet...
> Unless we have evidence to show inlining a nontrivial function is a
> performance win, my inclination is not to have them in .h files decorated
> with inline directives.  Instead put them back in a .c
> file where they belong and let LTO do its thing.
> 
> I haven't done any research, but I suspect once you go beyond the
> trivial functions size is no longer a good indicator of whether or not
> something should be inlined.  Instead I suspect the question should be,
> if I inline this nontrivial code, how much code either in the caller or
> the inlined callee gets simplified away.   Of course, that's not always
> an easy question to answer :-)
> 
> Jeff
> 

This last point is crucial.  I haven't looked at the code in question,
but one point to check is how the functions are called.  If they are
often called with constant values, then they may be very much simplified
due to constant propagation.  Secondly, if a function is inlined, the
compiler has full knowledge of the effects of the function "call" and
can thus optimise better (keeping data in registers over the "call",
shuffling around loads and stores, etc.).  Finally, if the functions are
called in loops or other time-critical code, it can be worth spending
more code section space for a faster result (but sometimes smaller code
is faster due to caches, branch prediction buffers, etc.).

The ideal situation is that LTO figures this out for you, and the code
can go as normal non-inline functions in a C file.  But if you are not
compiling with LTO (or not yet, anyway), then check the usage of the
functions before moving them.

David




Re: RFC: Inlines, LTO and GCC

2013-09-10 Thread Jakub Jelinek
On Tue, Sep 10, 2013 at 10:06:04AM +0200, David Brown wrote:
> This last point is crucial.  I haven't looked at the code in question,
> but one point to check is how the functions are called.  If they are
> often called with constant values, then they may be very much simplified
> due to constant propagation.  Secondly, if a function is inlined, the
> compiler has full knowledge of the effects of the function "call" and
> can thus optimise better (keeping data in registers over the "call",
> shuffling around loads and stores, etc.).  Finally, if the functions are
> called in loops or other time-critical code, it can be worth spending
> more code section space for a faster result (but sometimes smaller code
> is faster due to caches, branch prediction buffers, etc.).
> 
> The ideal situation is that LTO figures this out for you, and the code

At least as long as LTO keeps ending up with unusable or hardly usable debug
info, effectively requiring LTO for good compiler performance is a
non-starter.  And, the inliner we have is not dumb: if it sees an inline
function, but it is too large, it will usually not inline it.
So, rather than counting the lines of inline functions in headers, IMHO it
is better to look at the inliner's decisions about those functions; if it
never inlines them, then presumably moving them out of line is reasonable.

Jakub


Re: RFC: Inlines, LTO and GCC

2013-09-10 Thread David Brown
On 10/09/13 10:11, Jakub Jelinek wrote:
> On Tue, Sep 10, 2013 at 10:06:04AM +0200, David Brown wrote:
>> This last point is crucial.  I haven't looked at the code in question,
>> but one point to check is how the functions are called.  If they are
>> often called with constant values, then they may be very much simplified
>> due to constant propagation.  Secondly, if a function is inlined, the
>> compiler has full knowledge of the effects of the function "call" and
>> can thus optimise better (keeping data in registers over the "call",
>> shuffling around loads and stores, etc.).  Finally, if the functions are
>> called in loops or other time-critical code, it can be worth spending
>> more code section space for a faster result (but sometimes smaller code
>> is faster due to caches, branch prediction buffers, etc.).
>>
>> The ideal situation is that LTO figures this out for you, and the code
> 
> At least as long as LTO keeps ending up with unusable or hardly usable debug
> info, effectively requiring LTO for good compiler performance is a
> non-starter.  And, the inliner we have is not dumb: if it sees an inline
> function, but it is too large, it will usually not inline it.
> So, rather than counting the lines of inline functions in headers, IMHO it
> is better to look at the inliner's decisions about those functions; if it
> never inlines them, then presumably moving them out of line is reasonable.
> 
>   Jakub
> 

That should be easy enough with "-Winline", I think.

David



Re: [RFC] Vectorization of indexed elements

2013-09-10 Thread Richard Biener
On Mon, 9 Sep 2013, Marc Glisse wrote:

> On Mon, 9 Sep 2013, Vidya Praveen wrote:
> 
> > Hello,
> > 
> > This post details some thoughts on an enhancement to the vectorizer that
> > could take advantage of SIMD instructions that allow an indexed element
> > as an operand, thus reducing the need for duplication and possibly improving
> > reuse of previously loaded data.
> > 
> > Appreciate your opinion on this.
> > 
> > ---
> > 
> > A phrase like this:
> > 
> > for(i=0;i<4;i++)
> >   a[i] = b[i] <op> c[2];
> > 
> > is usually vectorized as:
> > 
> >  va:V4SI = a[0:3]
> >  vb:V4SI = b[0:3]
> >  t = c[2]
> >  vc:V4SI = { t, t, t, t } // typically expanded as vec_duplicate at vec_init
> >  ...
> >  va:V4SI = vb:V4SI <op> vc:V4SI
> > 
> > But this could be simplified further if a target has instructions that
> > support
> > indexed element as a parameter. For example an instruction like this:
> > 
> >  mul v0.4s, v1.4s, v2.4s[2]
> > 
> > can perform multiplication of each element of v1.4s with the third element
> > of v2.4s (specified as v2.4s[2]) and store the results in the corresponding
> > elements of v0.4s.
> > 
> > For this to happen, the vectorizer needs to understand this idiom and treat
> > the operand c[2] specially (taking into consideration whether the machine
> > supports an indexed element as an operand for <op>, through a target hook or
> > macro) and consider this a vectorizable statement without having to
> > duplicate the elements explicitly.
> > 
> > There are a few ways this could be represented at gimple:
> > 
> >  ...
> >  va:V4SI = vb:V4SI <op> VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
> >  ...
> > 
> > or by allowing the vectorizer to treat an indexed element as a valid
> > operand in a vectorizable statement:
> 
> Might as well allow any scalar then...

I agree.  The VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2)) form
would necessarily be two extra separate statements and thus subject
to CSE obfuscating it enough for RTL expansion to no longer notice it.

That said, allowing mixed scalar/vector ops isn't very nice and
your scheme can be simplified by just using

  vc:V4SI = VEC_DUPLICATE_EXPR <...>
  va:V4SI = vb:V4SI <op> vc:V4SI

where the expander only has to see that vc:V4SI is defined by
a duplicate.

> >  ...
> >  va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 2)
> >  ...
> > 
> > For the sake of explanation, the above two representations assume that
> > c[0:3] is loaded in vc for some other use and reused here.  But when c[2]
> > is the only use of 'c', then it may be safer to just load one element and
> > use it like this:
> > 
> >  vc:V4SI[0] = c[2]
> >  va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 0)
> > 
> > This could also mean that expressions involving scalar could be treated
> > similarly. For example,
> > 
> >  for(i=0;i<4;i++)
> >   a[i] = b[i] <op> c
> > 
> > could be vectorized as:
> > 
> >  vc:V4SI[0] = c
> >  va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 0)
> > 
> > Such a change would also require new standard pattern names to be defined
> > for each <op>.
> > 
> > Alternatively, having something like this:
> > 
> >  ...
> >  vt:V4SI = VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
> >  va:V4SI = vb:V4SI <op> vt:V4SI
> >  ...
> > 
> > would remove the need to introduce several new standard pattern names,
> > having just one to represent vec_duplicate(vec_select()), but of course
> > this will expect the target to have combiner patterns.
> 
> The cost estimation wouldn't be very good, but aren't combine patterns enough
> for the whole thing? Don't you model your mul instruction as:
> 
> (mult:V4SI
>   (match_operand:V4SI)
>   (vec_duplicate:V4SI (vec_select:SI (match_operand:V4SI
> 
> anyway? Seems that combine should be able to handle it. What currently happens
> that we fail to generate the right instruction?
> 
> In gimple, we already have BIT_FIELD_REF for vec_select and CONSTRUCTOR for
> vec_duplicate, adding new nodes is always painful.

True, though CONSTRUCTOR isn't a good vec_duplicate primitive.  But yes,
we have it that way at the moment and indeed adding new nodes is always
painful.

> > This enhancement could possibly help further optimizing larger scenarios
> > such
> > as linear systems.

Given that the vectorizer already handles all the cases you quote and
just the expansion doesn't use the target's special abilities - can't
you just teach the expander to look up the definition of the
vectors and see if it is a uniform CONSTRUCTOR?

Richard.


Re: RFC: Inlines, LTO and GCC

2013-09-10 Thread Richard Biener
On Tue, Sep 10, 2013 at 10:11 AM, Jakub Jelinek  wrote:
> On Tue, Sep 10, 2013 at 10:06:04AM +0200, David Brown wrote:
>> This last point is crucial.  I haven't looked at the code in question,
>> but one point to check is how the functions are called.  If they are
>> often called with constant values, then they may be very much simplified
>> due to constant propagation.  Secondly, if a function is inlined, the
>> compiler has full knowledge of the effects of the function "call" and
>> can thus optimise better (keeping data in registers over the "call",
>> shuffling around loads and stores, etc.).  Finally, if the functions are
>> called in loops or other time-critical code, it can be worth spending
>> more code section space for a faster result (but sometimes smaller code
>> is faster due to caches, branch prediction buffers, etc.).
>>
>> The ideal situation is that LTO figures this out for you, and the code
>
> At least as long as LTO keeps ending up with unusable or hardly usable debug
> info, effectively requiring LTO for good compiler performance is a
> non-starter.  And, the inliner we have is not dumb: if it sees an inline
> function, but it is too large, it will usually not inline it.
> So, rather than counting the lines of inline functions in headers, IMHO it
> is better to look at the inliner's decisions about those functions; if it
> never inlines them, then presumably moving them out of line is reasonable.

I mostly agree.  Though please also factor in IPA-CP - the reason that
get_addr_base_and_unit_offset_1 is inline is that it receives a callback
which at all call sites is constant or NULL, and optimizing that function
is crucial (it will usually be called once in each TU and thus either inlined
or cloned for IPA-CP).

Also the operand iterators are very important to optimize, so outlining
them is a non-starter.

There isn't much code I'd expect we want to put back into .c files; instead,
given that we now use C++, I'd expect more code to be moved into header
files because of the use of templates.

That's the way of life with C++ - you got what you wanted ;)

Richard.


> Jakub


Re: RFC: Inlines, LTO and GCC

2013-09-10 Thread Jan Hubicka
> On 10/09/13 10:11, Jakub Jelinek wrote:
> > On Tue, Sep 10, 2013 at 10:06:04AM +0200, David Brown wrote:
> >> This last point is crucial.  I haven't looked at the code in question,
> >> but one point to check is how the functions are called.  If they are
> >> often called with constant values, then they may be very much simplified
> >> due to constant propagation.  Secondly, if a function is inlined, the
> >> compiler has full knowledge of the effects of the function "call" and
> >> can thus optimise better (keeping data in registers over the "call",
> >> shuffling around loads and stores, etc.).  Finally, if the functions are
> >> called in loops or other time-critical code, it can be worth spending
> >> more code section space for a faster result (but sometimes smaller code
> >> is faster due to caches, branch prediction buffers, etc.).
> >>
> >> The ideal situation is that LTO figures this out for you, and the code
> > 
> > At least as long as LTO keeps ending up with unusable or hardly usable
> > debug info, effectively requiring LTO for good compiler performance is a
> > non-starter.  And, the inliner we have is not dumb: if it sees an inline
> > function, but it is too large, it will usually not inline it.
> > So, rather than counting the lines of inline functions in headers, IMHO it
> > is better to look at the inliner's decisions about those functions; if it
> > never inlines them, then presumably moving them out of line is reasonable.
> > 
> > Jakub
> > 
> 
> That should be easy enough with "-Winline", I think.

What I also do from time to time is an LTO profiledbootstrap, and then look
into the inline dump.  Then you have inlining ordered by a pretty realistic
measure of benefits, so you see what matters and what does not.

The inliner knows that constant callbacks are good to inline and will bump up
its limits.

One problem is that we do have those functions as static inlines, so they end
up duplicated; with C++ we may consider moving them to comdat.

Honza
> 
> David


Re: [ping] [buildrobot] gcc/config/linux-android.c:40:7: error: ‘OPTION_BIONIC’ was not declared in this scope

2013-09-10 Thread Jan-Benedict Glaw
On Tue, 2013-09-10 12:01:34 +1200, Maxim Kuvyrkov  wrote:
> On 7/09/2013, at 1:31 AM, Jan-Benedict Glaw wrote:
> > This however still seems to have issues in a current build:
> > 
> > http://toolchain.lug-owl.de/buildbot/showlog.php?id=10520&mode=view
> 
> Mn10300-linux does not appear to support Linux.  The mn10300-linux
> target specifier expands into mn10300-unknown-linux-gnu, where *-gnu
> implies using the glibc library, which doesn't have an mn10300 port.

Uh, I would have expected an error message then (from configury), but
you may well be right. I changed over to mn10300-elf for now.

MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481




Re: RFC: Inlines, LTO and GCC

2013-09-10 Thread Richard Biener
On Tue, Sep 10, 2013 at 11:06 AM, Jan Hubicka  wrote:
>> On 10/09/13 10:11, Jakub Jelinek wrote:
>> > On Tue, Sep 10, 2013 at 10:06:04AM +0200, David Brown wrote:
>> >> This last point is crucial.  I haven't looked at the code in question,
>> >> but one point to check is how the functions are called.  If they are
>> >> often called with constant values, then they may be very much simplified
>> >> due to constant propagation.  Secondly, if a function is inlined, the
>> >> compiler has full knowledge of the effects of the function "call" and
>> >> can thus optimise better (keeping data in registers over the "call",
>> >> shuffling around loads and stores, etc.).  Finally, if the functions are
>> >> called in loops or other time-critical code, it can be worth spending
>> >> more code section space for a faster result (but sometimes smaller code
>> >> is faster due to caches, branch prediction buffers, etc.).
>> >>
>> >> The ideal situation is that LTO figures this out for you, and the code
>> >
>> > At least as long as LTO keeps ending up with unusable or hardly usable
>> > debug info, effectively requiring LTO for good compiler performance is a
>> > non-starter.  And, the inliner we have is not dumb: if it sees an inline
>> > function, but it is too large, it will usually not inline it.
>> > So, rather than counting the lines of inline functions in headers, IMHO it
>> > is better to look at the inliner's decisions about those functions; if it
>> > never inlines them, then presumably moving them out of line is reasonable.
>> >
>> > Jakub
>> >
>>
>> That should be easy enough with "-Winline", I think.
>
> What I also do from time to time is an LTO profiledbootstrap, and then look
> into the inline dump.  Then you have inlining ordered by a pretty realistic
> measure of benefits, so you see what matters and what does not.
>
> The inliner knows that constant callbacks are good to inline and will bump up
> its limits.
>
> One problem is that we do have those functions as static inlines, so they end
> up duplicated; with C++ we may consider moving them to comdat.

But then inlining / cloning is no longer cheap, no?  And will be
disabled at -O2?

Richard.

> Honza
>>
>> David


Re: RFC: Inlines, LTO and GCC

2013-09-10 Thread Jan Hubicka
> 
> But then inlining / cloning is no longer cheap, no?  And will be
> disabled at -O2?

If you declare it "inline" and not "static inline" it will be inlined pretty
much as before, only it will get unified if it ends up out of line in multiple
units.

Main difference between static and non-static is in the logic deciding when
inlining into all callers will lead to removing the offline copy from the
program.  This is controllable by the comdat-sharing-probability parameter.

We won't clone at -O2 unless we know code will shrink; that may be something
to revisit.  I think it would be reasonable to enable cloning at -O2 for
functions declared inline.

I was also playing with the idea of adding a GCC decision to function
mangling, so comdat clones can be inlined.  I.e. having something like
mangled_foo->mangled_foo.__gcc_cprop_clone.arg0_0.arg1_17.  It would need
inventing unique textual representations of all/most of our substitutions,
which can get tricky.

Honza


Re: RFC: Inlines, LTO and GCC

2013-09-10 Thread Jan Hubicka
> > 
> > But then inlining / cloning is no longer cheap, no?  And will be
> > disabled at -O2?
> 
> If you declare it "inline" and not "static inline" it will be inlined pretty
> much as before, only it will get unified if it ends up out of line in
> multiple units.
> 
> Main difference between static and non-static is in the logic deciding when
> inlining into all callers will lead to removing the offline copy from the
> program.  This is controllable by the comdat-sharing-probability parameter.
> 
> We won't clone at -O2 unless we know code will shrink; that may be something
> to revisit.  I think it would be reasonable to enable cloning at -O2 for
> functions declared inline.
> 
> I was also playing with the idea of adding a GCC decision to function
> mangling, so comdat clones can be inlined.  I.e. having something like
Here "inlined" means unified by the comdat sharing machinery.

Honza
> mangled_foo->mangled_foo.__gcc_cprop_clone.arg0_0.arg1_17.  It would need
> inventing unique textual representations of all/most of our substitutions,
> which can get tricky.
> 
> Honza


Re: RFC: Inlines, LTO and GCC

2013-09-10 Thread Andrew MacLeod

On 09/10/2013 04:44 AM, Richard Biener wrote:

> On Tue, Sep 10, 2013 at 10:11 AM, Jakub Jelinek  wrote:
>> On Tue, Sep 10, 2013 at 10:06:04AM +0200, David Brown wrote:
>>> This last point is crucial.  I haven't looked at the code in question,
>>> but one point to check is how the functions are called.  If they are
>>> often called with constant values, then they may be very much simplified
>>> due to constant propagation.  Secondly, if a function is inlined, the
>>> compiler has full knowledge of the effects of the function "call" and
>>> can thus optimise better (keeping data in registers over the "call",
>>> shuffling around loads and stores, etc.).  Finally, if the functions are
>>> called in loops or other time-critical code, it can be worth spending
>>> more code section space for a faster result (but sometimes smaller code
>>> is faster due to caches, branch prediction buffers, etc.).
>>>
>>> The ideal situation is that LTO figures this out for you, and the code
>>
>> At least as long as LTO keeps ending up with unusable or hardly usable
>> debug info, effectively requiring LTO for good compiler performance is a
>> non-starter.  And, the inliner we have is not dumb: if it sees an inline
>> function, but it is too large, it will usually not inline it.
>> So, rather than counting the lines of inline functions in headers, IMHO it
>> is better to look at the inliner's decisions about those functions; if it
>> never inlines them, then presumably moving them out of line is reasonable.
>
> I mostly agree.  Though please also factor in IPA-CP - the reason that
> get_addr_base_and_unit_offset_1 is inline is that it receives a callback
> which at all call sites is constant or NULL, and optimizing that function
> is crucial (it will usually be called once in each TU and thus either
> inlined or cloned for IPA-CP).
>
> Also the operand iterators are very important to optimize, so outlining
> them is a non-starter.
>
> There isn't much code I'd expect we want to put back into .c files; instead,
> given that we now use C++, I'd expect more code to be moved into header
> files because of the use of templates.
>
> That's the way of life with C++ - you got what you wanted ;)



:-) Gonna be a lot of little inlines, that is for sure.

I think it's probably more worthwhile to focus on getting functions into
appropriate places than trying to make decisions about what should or
shouldn't be inlined at this point.





Re: [RFC] Offloading Support in libgomp

2013-09-10 Thread Jakub Jelinek
On Tue, Sep 10, 2013 at 07:01:26PM +0400, Michael V. Zolotukhin wrote:
> I continued playing with plugins for libgomp, and I have several questions
> regarding that:
> 
> 1) Would it be ok, at least for the beginning, if we'd look for plugins in a
> folder specified by some environment variable?  A plugin would be considered
> suitable if it's named "*.so" and if dlsym finds a certain set of functions
> in it (e.g. "device_available", "offload_function" - names are subject to
> change of course).

Trying to dlopen random libraries is bad, so when libgomp dlopens something,
it had better be a plugin and not something else.
I'd suggest that the name should match libgomp-plugin-*.so.1 or a
similar wildcard.

> 2) We need to perform all libgomp initialization once at the first entry to
> libgomp.  Should we add corresponding checks to all GOMP_* routines or should
> the compiler add calls to GOMP_init (which also needs to be introduced) by
> itself before all other calls to libgomp?

Why?  If this is the plugin stuff, then IMNSHO it should be initialized only
on the first call to GOMP_target{,_data,_update} or omp_get_num_devices.
Just use pthread_once to initialize it just once.

> 3) Also, would it be ok if we store libgomp status (already initialized or
> not) in some static variable?  I haven't seen such examples in the existing
> code base, so I'm not sure it is a good way to go.

Sure.

> 4) We'll need to store some information about available devices:
>   - a search tree with data about mapping

For the search tree, I was going to actually implement it myself, but got
interrupted this week with work on UDRs again.  I wanted to write, just
temporarily, a dummy device that would execute on the host, but remap all
memory to something allocated elsewhere in the same address space by malloc.
Sure, #pragma omp declare target vars wouldn't work that way, but otherwise
it could work fine.  Each device that has a flag set saying it doesn't
have shared address space between host and device (I believe HSAIL might have
shared address space, host fallback of course has shared address space,
the rest do not?) would have its own splay tree plus some host mutex to
guard accesses to the tree.

>   - corresponding plugin handler
>   - handlers for functions from the corresponding plugin
>   - maybe some other info

> I guess it's a bad idea to store all this data in some static-sized global
> variables, and it's better to dynamically allocate memory for that.  But it
> implies that we need to care about deallocation, which would have to happen
> at some point at program end.  Shouldn't we introduce something like
> GOMP_deinitialize and insert calls to it during the compilation?

We don't need to care about deallocation, if it is not per-host-thread
stuff, but per-device stuff.  If we wanted, we could add some magic function
for valgrind that could be called (like e.g. glibc has), but it is
definitely not very important and we don't do it right now for parallels
etc.

> 5) We mentioned something similar to a tree data-structure for storing info
> about the mapping.  Am I getting it correctly that currently there is no such
> data-structure at all and we need to design and implement it from scratch?

See above.

Jakub


Re: [RFC] Offloading Support in libgomp

2013-09-10 Thread Michael V. Zolotukhin
Hi Jakub,
I continued playing with plugins for libgomp, and I have several questions
regarding that:

1) Would it be ok, at least for the beginning, if we'd look for plugins in a
folder specified by some environment variable?  A plugin would be considered
suitable if it's named "*.so" and if dlsym finds a certain set of functions
in it (e.g. "device_available", "offload_function" - names are subject to
change of course).

2) We need to perform all libgomp initialization once at the first entry to
libgomp.  Should we add corresponding checks to all GOMP_* routines or should
the compiler add calls to GOMP_init (which also needs to be introduced) by
itself before all other calls to libgomp?

3) Also, would it be ok if we store libgomp status (already initialized or not)
in some static variable?  I haven't seen such examples in the existing code
base, so I'm not sure it is a good way to go.

4) We'll need to store some information about available devices:
  - a search tree with data about mapping
  - corresponding plugin handler
  - handlers for functions from the corresponding plugin
  - maybe some other info
I guess it's a bad idea to store all this data in some static-sized global
variables, and it's better to dynamically allocate memory for that.  But it
implies that we need to care about deallocation, which would have to happen at
some point at program end.  Shouldn't we introduce something like
GOMP_deinitialize and insert calls to it during the compilation?

5) We mentioned something similar to a tree data-structure for storing info
about the mapping.  Am I getting it correctly that currently there is no such
data-structure at all and we need to design and implement it from scratch?

--
Thanks, Michael


Re: [RFC] Offloading Support in libgomp

2013-09-10 Thread Michael V. Zolotukhin
> Trying to dlopen random libraries is bad, so when libgomp dlopens something,
> it had better be a plugin and not something else.
> I'd suggest that the name should match libgomp-plugin-*.so.1 or a
> similar wildcard.
Ok, sounds reasonable.

> Why?  If this is the plugin stuff, then IMNSHO it should be initialized only
> on the first call to GOMP_target{,_data,_update} or omp_get_num_devices.
> Just use pthread_once to initialize it just once.
Ok, once we don't care about deallocation, that seems reasonable too.

> > 4) We'll need to store some information about available devices:
> >   - a search tree with data about mapping
> 
> For the search tree, I was going to actually implement it myself, but got
> interrupted this week with work on UDRs again.  I wanted to write just
> temporarily a dummy device that would execute on the host, but remap all
> memory to something allocated elsewhere in the same address space by malloc.
> Sure, #pragma omp declare target vars wouldn't work that way, but otherwise
> it could work fine.  Each device that would have a flag set that it doesn't
> have shared address space between host and device (I belive HSAIL might have
> shared address space, host fallback of course has shared address space,
> the rest do not?) would have its own splay tree plus some host mutex to
> guard accesses to the tree.
Ok.  Do you need all the plugin infrastructure ready for that, or could you
experiment with the dummy device without plugins?

Michael
>   Jakub


Re: [RFC] Offloading Support in libgomp

2013-09-10 Thread Michael V. Zolotukhin
> I don't need that infrastructure for that; I meant just a hack where, say,
> for OMP_DEFAULT_DEVICE=257 I'd use this hackish device, and store the splay
> tree root and lock in a global var with a comment that in the future they
> will belong in the per-device structure.
Okay, hopefully I will have something committable soon on the infrastructure
side as well.

Michael
>   Jakub


Caroline Tice appointed VTV maintainer

2013-09-10 Thread David Edelsohn
I am pleased to announce that the GCC Steering Committee has
appointed Caroline Tice as VTV (libvtv) maintainer.

Please join me in congratulating Caroline on her new role.
Please update your listing in the MAINTAINERS file.

Happy hacking!
David



DJ Delorie and Nick Clifton appointed as MSP430 port maintainers

2013-09-10 Thread Jeff Law

I am pleased to announce that the GCC Steering Committee has
appointed DJ Delorie and Nick Clifton as maintainers for the MSP430 port

Please join me in congratulating DJ and Nick on their new role.
Please update your listing in the MAINTAINERS file.


Jeff



Re: [Suggestion] about h8/300 architecture in gcc and binutils

2013-09-10 Thread Michael Schewe

Hello Maintainers,

if you would like to drop h8/300 support in the Linux kernel, that's OK for
me.  But I would like to see it still supported in gcc & binutils; I have some
projects and know companies using this architecture in embedded applications,
bare metal without an OS.  These products have lifetimes in the range of
10...20 years and need toolchain support for software updates.


Michael

Please note for answers: I am only subscribed to the binutils mailing list.

Chen Gang schrieb:

On 09/10/2013 10:19 AM, Jeff Law wrote:

On 09/09/2013 07:13 PM, Chen Gang wrote:

Hello Maintainers:

After a Google search and a check of the Linux kernel, H8/300 is dead, but
gcc-4.9.0 and binutils-2.23.2 still have h8300; do we still need it for
another OS?

Any suggestions or corrections are welcome, thanks.


The related information in linux kernel next tree:

commit d02babe847bf96b82b12cc4e4e90028ac3fac73f
Author: Guenter Roeck
Date:   Fri Aug 30 06:01:49 2013 -0700

Drop support for Renesas H8/300 (h8300) architecture

H8/300 has been dead for several years, and the kernel for it
has not compiled for ages. Drop support for it.

Cc: Yoshinori Sato
Acked-by: Greg Kroah-Hartman
Signed-off-by: Guenter Roeck


The related information in gcc/binutils:

We can successfully build an h8300 cross-compiler for the Linux kernel,
but it has many bugs when building the Linux kernel with -Os.
If we still need h8300 for another OS, is it still valuable to send
these bugs to Bugzilla (although they were found under Linux)?

It is still useful to send code generation bugs for the H8/300 series to
the GCC folks.



OK, thanks. I will wait 1-2 days, which may bring in other members'
opinions for discussion.

If there are no additional opinions, I will report them to Bugzilla, and I
will try to continue working with the related members (although I am a
newbie at compiler and binutils programming).


jeff





Thanks.


Re: DJ Delorie and Nick Clifton appointed as MSP430 port maintainers

2013-09-10 Thread David Brown

On 10/09/13 20:12, Jeff Law wrote:

 I am pleased to announce that the GCC Steering Committee has
appointed DJ Delorie and Nick Clifton as maintainers for the MSP430 port

 Please join me in congratulating DJ and Nick on their new role.
Please update your listing in the MAINTAINERS file.


Jeff



May I congratulate the two of them, along with Red Hat, TI, and of course 
the "old" msp430 gcc guys (Peter Bigot and his helpers and predecessors), 
on finally getting the msp430 port far enough into mainline gcc 
that the MSP430 port maintainer positions exist?  Well done, guys!


David




Re: mips16 LRA vs reload - Excess reload registers

2013-09-10 Thread Vladimir Makarov
On 09/09/2013 03:49 PM, Matthew Fortune wrote:
>
>> -Original Message-
>> From: Vladimir Makarov [mailto:vmaka...@redhat.com]
>> Sent: 08 September 2013 17:51
>> To: Matthew Fortune
>> Cc: gcc@gcc.gnu.org; ber...@codesourcery.com
>> Subject: Re: mips16 LRA vs reload - Excess reload registers
>>
>> On 13-08-23 5:26 AM, Matthew Fortune wrote:
>>> Hi Vladimir,
>>>
>>> I've been working on code size improvements for mips16 and have been
>> pleased to see some improvement when switching to use LRA instead of
>> classic reload. At the same time though I have also seen some differences
>> between reload and LRA in terms of how efficiently reload registers are
>> reused.
>>> The trigger for LRA to underperform compared with classic reload is when
>> IRA allocates inappropriate registers and thus puts a lot of stress on
>> reloading. Mips16 showed this because it can only access a small subset of
>> the MIPS registers for general instructions. The remaining MIPS registers are
>> still available as they can be accessed by some special instructions and used
>> via move instructions as temporaries. In the current mips16 backend,
>> register move costings lead IRA to determine that although the preferred
>> class for most pseudos is M16_REGS, the allocno class ends up as GR_REGS.
>> IRA then resorts to allocating registers outside of M16_REGS more and more
>> as register pressure increases, even though this is fairly stupid.
>>> When using classic reload the inappropriate register allocations are
>> effectively reverted as the reload pseudos that get invented tend to all
>> converge on the same hard register completely removing the original
>> pseudo. For LRA the reloads tend to diverge and different hard registers are
>> assigned to the reload pseudos leaving us with two new pseudos and the
>> original. Two extra move instructions and two extra hard registers used.
>> While I'm not saying it is LRA's fault for not fixing this situation
>> perfectly, it does seem that classic reload is better at it.
>>> I have found a potential solution to the original IRA register allocation
>> problem but I think there may still be something to address in LRA to
>> improve this scenario anyway. My proposed solution to the IRA problem for
>> mips16 is to adjust register move costings such that the total of moving
>> between M16_REGS and GR_REGS and back is more expensive than memory,
>> but moving from GR_REGS to GR_REGS is cheaper than memory (even
>> though this is a bit weird as you have to go through an M16_REG to move
>> from one GR_REG to another GR_REG).
>>> GR_REGS to GR_REGS has to be cheaper than memory as it needs to be a
>> candidate pressure class but the additional cost for M16->GR->M16 means
>> that IRA does not use GR_REGS as an alternative class and the allocno class 
>> is
>> just M16_REGS as desired. This feels a bit like a hack but may be the best
>> solution. The hard register costings used when allocating registers from an
>> allocno class just don't seem to be strong enough to prevent poor register
>> allocation in this case; I don't know whether the hard register costs are
>> supposed to resolve this issue or whether they are just for fine-tuning.
>>> With the fix in place, LRA outperforms classic reload which is fantastic!
>>>
>>> I have a small(ish) test case for this and dumps for IRA, LRA and classic
>> reload along with the patch to enable LRA for mips16. I can also provide the
>> fix to register costing that effectively avoids/hides this problem for 
>> mips16.
>> Should I post them here or put them in a bugzilla ticket?
>>> Any advice on which area needs fixing would be welcome, and I am quite
>> happy to work on this given some direction. I suspect these issues are
>> relevant for any architecture that is not 100% orthogonal (which is pretty
>> much all of them), and particularly important for compressed instruction sets.
>> Sorry again that I did not find time to answer you earlier, Matt.
>>
>> Your hack could work.  And I guess it is always worth posting the patch
>> publicly with examples of the generated code before and after the patch.
>> Maybe the collective mind will help figure out what more to do with the
>> patch.
> I'll post that shortly.
>  
>> But I guess there is still a thing to do. After constraining allocation only 
>> to
>> MIPS16 regs we still could use non-MIPS16 GR_REGS for storing values of
>> less frequently used pseudos (as storing them in non-MIPS16 GR_REGS is
>> better than in memory).  E.g. x86-64 LRA can use SSE regs for storing values
>> of less frequently used pseudos requiring GENERAL_REGS.
>> Please look at spill_class target hook and its implementation for x86-64.
> I have indeed implemented that for mips16 and found that not only does it
> help to enable the use of non-mips16 registers as spill_class registers, but
> including the mips16 call-clobbered registers is also worthwhile. It seems
> that the spill_class logic is able to find some instances where spilled
> pse
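
The costing hack described in this thread can be reduced to a tiny numeric model. The sketch below is purely illustrative C, not GCC code: the class names and cost numbers are invented, and real IRA weighs per-class costs across all allocnos and uses. It only demonstrates the arithmetic of the trick: once the M16 -> GR -> M16 round trip costs more than a memory spill, GR_REGS stops being attractive as an allocno class.

```c
#include <assert.h>

/* Toy model of the allocno-class decision discussed above.
   Names and costs are invented for illustration; this is not GCC code.  */

enum alloc_class { M16_REGS, GR_REGS };

/* Widen the allocno class from M16_REGS to GR_REGS only if shuffling a
   value through a non-MIPS16 register and back is cheaper than spilling
   it to memory.  The mips16 costing hack makes the round trip deliberately
   more expensive than memory, so the allocno class stays M16_REGS.  */
static enum alloc_class
choose_allocno_class (int m16_to_gr_cost, int gr_to_m16_cost, int mem_cost)
{
  if (m16_to_gr_cost + gr_to_m16_cost < mem_cost)
    return GR_REGS;
  return M16_REGS;
}
```

With, say, move costs of 4 each way against a memory cost of 6, the round trip (8) loses to memory and the class stays M16_REGS, which is the behaviour the costing change aims for.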

Re: [ping] [buildrobot] gcc/config/linux-android.c:40:7: error: ‘OPTION_BIONIC’ was not declared in this scope

2013-09-10 Thread Joseph S. Myers
On Tue, 10 Sep 2013, Maxim Kuvyrkov wrote:

> Mn10300-linux does not appear to support Linux.  The mn10300-linux 
> target specifier expands into mn10300-unknown-linux-gnu, where *-gnu 
> implies using the glibc library, which doesn't have an mn10300 port.

It's called am33, and the GCC port is also called am33_2.0-*-linux*.  
(But given the lack of any updates to the glibc patches sent by the 
prospective port maintainer over a year ago, following my 
review comments, I'm inclined to think it's time to remove the very 
bitrotten glibc port.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC] Offloading Support in libgomp

2013-09-10 Thread Jakub Jelinek
On Tue, Sep 10, 2013 at 07:30:53PM +0400, Michael V. Zolotukhin wrote:
> > > 4) We'll need to store some information about available devices:
> > >   - a search tree with data about mapping
> > 
> > For the search tree, I was going to actually implement it myself, but got
> > interrupted this week with work on UDRs again.  I wanted to write just
> > temporarily a dummy device that would execute on the host, but remap all
> > memory to something allocated elsewhere in the same address space by malloc.
> > Sure, #pragma omp declare target vars wouldn't work that way, but otherwise
> > it could work fine.  Each device that has a flag set saying it doesn't
> > have a shared address space between host and device (I believe HSAIL might
> > have a shared address space, host fallback of course has a shared address
> > space, the rest do not?) would have its own splay tree plus some host mutex
> > to guard accesses to the tree.
> Ok.  Do you need all the plugin infrastructure ready for that, or could you
> experiment with a dummy device without plugins?

I don't need that infrastructure for that; I meant just a hack where, say, for
OMP_DEFAULT_DEVICE=257 I'd use this hackish device, and store the splay tree
root and lock in a global var with a comment that in the future they will
belong in the per-device structure.

Jakub
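
As a rough illustration of the per-device state sketched in this thread, the following self-contained C is purely hypothetical: a linked list stands in for the splay tree, and every name (`struct device`, `device_translate`, etc.) is invented here, not libgomp's actual code. It only shows the shape of the idea: a device either shares the host address space or owns a mapping structure guarded by its own lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of per-device host-to-device address mapping.
   A linked list stands in for the splay tree discussed above.  */

struct map_entry
{
  uintptr_t host_start, host_end;   /* host address range */
  uintptr_t tgt_start;              /* where the range lives on the device */
  struct map_entry *next;
};

struct device
{
  int shared_address_space;         /* e.g. host fallback: no remapping */
  pthread_mutex_t lock;             /* guards the mapping structure */
  struct map_entry *map;            /* stand-in for the splay tree root */
};

/* Translate a host address to a device address, taking the per-device
   lock around the lookup; returns 0 if the address is not mapped.  */
static uintptr_t
device_translate (struct device *dev, uintptr_t host_addr)
{
  if (dev->shared_address_space)
    return host_addr;

  uintptr_t result = 0;
  pthread_mutex_lock (&dev->lock);
  for (struct map_entry *e = dev->map; e != NULL; e = e->next)
    if (host_addr >= e->host_start && host_addr < e->host_end)
      {
        result = e->tgt_start + (host_addr - e->host_start);
        break;
      }
  pthread_mutex_unlock (&dev->lock);
  return result;
}
```

The shared-address-space case returns addresses unchanged, matching the observation above that the host fallback needs no remapping; a real implementation would replace the list walk with a splay-tree lookup keyed on the address range.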


Re: [Suggestion] about h8/300 architecture in gcc and binutils

2013-09-10 Thread Chen Gang
On 09/11/2013 03:55 AM, Michael Schewe wrote:
> Hello Maintainers,
> 
> If you would like to drop h8/300 support in the Linux kernel, that's OK with me.

OK, thanks.

> But I would like to see it still supported in gcc & binutils; I have
> some projects, and know of companies, using this architecture in embedded
> applications, bare metal without an OS. These products have lifetimes in
> the range of 10...20 years and need toolchain support for software updates.
> 

OK, thank you for your valuable information.

And it seems the h8/300 issues found while compiling the Linux kernel are
still worth focusing on, just as Jeff Law said. :-)

> Michael
> 
> Please note for answers: I am only subscribed to the binutils mailing list.
> 

Excuse me, my English is not very good, and I am also a newbie on the
binutils and gcc mailing lists. I guess your meaning is:

  When sending h8/300-related mails, it is better to always include binut...@sourceware.org 
(even if the mail is only about a gcc issue)?

Is that correct? (If it is correct, there is no need to reply.)


Thanks.

> Chen Gang schrieb:
>> On 09/10/2013 10:19 AM, Jeff Law wrote:
>>> On 09/09/2013 07:13 PM, Chen Gang wrote:
 Hello Maintainers:

 After a Google search and a check of the Linux kernel, H8/300 is dead,
 yet gcc-4.9.0 and binutils-2.23.2 still have h8300. Do we still need it
 for another OS?

 Any suggestions or corrections are welcome, thanks.


 The related information in linux kernel next tree:

 commit d02babe847bf96b82b12cc4e4e90028ac3fac73f
 Author: Guenter Roeck
 Date:   Fri Aug 30 06:01:49 2013 -0700

 Drop support for Renesas H8/300 (h8300) architecture

 H8/300 has been dead for several years, and the kernel for it
 has not compiled for ages. Drop support for it.

 Cc: Yoshinori Sato
 Acked-by: Greg Kroah-Hartman
 Signed-off-by: Guenter Roeck


 The related information in gcc/binutils:

 We can successfully build an h8300 cross-compiler for the Linux kernel,
 but it has many bugs when building the Linux kernel with -Os.
 If we still need h8300 for another OS, is it still valuable to send
 these bugs to Bugzilla (although they were found under Linux)?
>>> It is still useful to send code generation bugs for the H8/300 series to
>>> the GCC folks.
>>>
>>
>> OK, thanks. I will wait 1-2 days, which may bring in other members'
>> opinions for discussion.
>>
>> If there are no additional opinions, I will report them to Bugzilla, and I
>> will try to continue working with the related members (although I am a
>> newbie at compiler and binutils programming).
>>
>>> jeff
>>>
>>>
>>>
>>
>> Thanks.


-- 
Chen Gang


Re: RFC: SIMD pragma independent of Cilk Plus / OpenMPv4

2013-09-10 Thread Andi Kleen
Tobias Burnus  writes:
>
> Those require -fcilkplus and -fopenmp, respectively, and activate much
> more. The question is whether it makes sense to provide a means to ask
> the compiler for SIMD vectorization without enabling all the other things
> of Cilk Plus/OpenMP. What's your opinion?

If you don't use OpenMP pragmas or the Cilk Plus keywords, they should
be no-ops as far as I know.  So I don't really see a problem with
enabling them.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only