Problem with accessing built-ins from Fortran front-end

2014-09-26 Thread FX
Hi,

I’m trying to make the Fortran front-end emit calls to some builtins we don’t 
currently use (isfinite, isnormal). However, trying to use the same code as 
isnan doesn’t work at all. Our gfc_define_builtin does three things:

  decl = add_builtin_function (name, type, code, BUILT_IN_NORMAL, library_name, 
NULL_TREE);
  set_call_expr_flags (decl, attr);
  set_builtin_decl (code, decl, true);

While doing so works with isnan, it fails with isfinite or isnormal. When I try 
to, I get an ICE in tree checking:

  * frame #0: 0x000100c02338 f951`build_call_expr_loc_array(loc=0, 
fndecl=0x, n=2, argarray=0x7fff5fbfeb20) + 24 at tree.h:2846
frame #1: 0x000100c0259a f951`build_call_expr(fndecl=, 
n=) + 186 at tree.c:10550
frame #2: 0x00010046f671 
f951`fold_builtin_interclass_mathfn(loc=, fndecl=, 
arg=0x00014340c990) + 337 at builtins.c:9393
frame #3: 0x000100485e34 f951`fold_builtin_1(loc=260, 
fndecl=0x000143449360, arg0=0x00014340c990, ignore=) + 
2964 at builtins.c:10050
frame #4: 0x00010049030c f951`fold_builtin_n(loc=, 
fndecl=, args=, nargs=, 
ignore=) + 1116 at builtins.c:10409
frame #5: 0x000100491c33 f951`fold_builtin_call_array(loc=260, 
type=0x000143405690, fn=0x00014351fc40, n=1, 
argarray=0x7fff5fbfeeb0) + 355 at builtins.c:10575
frame #6: 0x000100c024c5 f951`build_call_expr_loc(loc=, 
fndecl=, n=) + 181 at tree.c:10533


I don’t understand how the middle-end builtins decl should work, would someone 
have a hint of how to fix this?

Thanks,
FX


PS: the attached patch emits isfinite instead of isnan, as a minimal test of
the built-in changes. With that patch, the attached Fortran source triggers
the ICE above.




Attachments: problem.diff, b.f90


Re: cgraph_node::verify - quite strong condition that was met by IPA-ICF

2014-09-26 Thread Richard Biener
On Fri, Sep 26, 2014 at 12:00 AM, Jan Hubicka  wrote:
>> Hello.
>>
>> I've been finalizing IPA ICF testing process and I met a condition
>> for lto-bootstrap, where cgraph_node::verify encounters error:
>>
>> In WPA, I prove that gen_vec_initv16qi can be merged with
>> gen_vec_initv2sf. In the following case, all local calls are
>> redirected:
>>
>>   while (alias->callers)
>>     {
>>       cgraph_edge *e = alias->callers;
>>       e->redirect_callee (local_original);
>>       push_cfun (DECL_STRUCT_FUNCTION (e->caller->decl));
>>
>>       if (e->call_stmt)
>>         e->redirect_call_stmt_to_callee ();
>>
>>       pop_cfun ();
>>       redirected = true;
>>     }
>>
>> To be more precise, the modification plays a role in WPA, where
>> e->call_stmt == NULL (bodies are not loaded).
>> And an LTRANS produces the following error:
>>
>> In function ‘gen_rotlv16qi3’:
>> lto1: error: edge points to wrong declaration:
>>  > ...
>>  Instead of: 
> Yep, this check basically verifies that the edge redirections are ones that we
> legally do (currently, by the inliner and ipa-cp).  We need to either drop
> this check or extend it to accept ICF merging.
>
> In the longer term, I am quite sure we will want to record functions that ICF
> proved equivalent to a given function for debug info purposes.  For now, what
> about an "icf_merged" flag on the node that will disable this check?
> (the check is quite useful to catch various strange errors in the code as
> Martin J. knows quite well ;)
>
> With a complete list we can of course check that the call was previously
> targeting one of the functions that were proved to be equivalent.
>>
>> As I talked to Martin Jambor, problematic BT:
>> #0  cgraph_node::verify_node (this=0x7483b5e0) at
>> ../../gcc/cgraph.c:2837
>> #1  0x006d5a40 in symtab_node::verify (this=0x7483b5e0)
>> at ../../gcc/symtab.c:1158
>> #2  0x00b00c7c in expand_call_inline (bb=0x72875340,
>> stmt=0x72870990, id=0x7fffda70) at
>> ../../gcc/tree-inline.c:4234
>> #3  0x00b018a3 in gimple_expand_calls_inline
>> (bb=0x72875340, id=0x7fffda70) at
>> ../../gcc/tree-inline.c:4522
>> #4  0x00b01e69 in optimize_inline_calls (fn=0x7624a870)
>> at ../../gcc/tree-inline.c:4663
>>
>> Checking code is called from tree_inline, where checking is called
>> right after: cg_edge->callee->get_body ();
>> That means redirect_call_stmt_to_callee () hasn't been called
>> for the edge.
>>
>> Possible solutions:
>> 1) We can call get_body in WPA for each caller where I do a redirection,
>> so that call_stmt is fixed for the edge (ugly solution).
>
> We do not want to do this.
>
>> 2) We can somehow fix edges in tree_inline, I tried:
>
> Actually the code is intended to run before redirection happens, just to check
> that cgraph and the actual IL seem to be in sync.  Let's just relax it for
> ICF merged functions for now.

Btw, isn't cgraph edge redirection a "transform" step?  Thus why is
it performed at WPA time at all?  Shouldn't it be performed at LTRANS
time the same time we materialize clones and inline?

Richard.

> Honza
>>
>>   cg_edge->callee->get_body ();
>>
>>   if (cg_edge->callee->decl != id->dst_node->decl)
>>     {
>>       xxx = cg_edge->callee->callees;
>>       while (xxx)
>>         {
>>           push_cfun (DECL_STRUCT_FUNCTION (cg_edge->callee->decl));
>>           xxx->redirect_call_stmt_to_callee ();
>>           pop_cfun ();
>>           xxx = xxx->next_callee;
>>         }
>>
>>       cg_edge->callee->verify ();
>>     }
>>
>> But there's an ICE:
>>  0x82e03e gsi_start_bb
>> ../../gcc/gimple-iterator.h:118
>> 0x82f3f2 gsi_for_stmt(gimple_statement_base*)
>> ../../gcc/gimple-iterator.c:623
>> 0x6dcb1e cgraph_edge::redirect_call_stmt_to_callee()
>> ../../gcc/cgraph.c:1375
>>
>> 3) We can somehow reduce checking code or remove this sub-condition?
>>
>> Thank you,
>> Martin


Re: Problem with accessing built-ins from Fortran front-end

2014-09-26 Thread Richard Biener
On Fri, Sep 26, 2014 at 10:29 AM, FX  wrote:
> Hi,
>
> I’m trying to make the Fortran front-end emit calls to some builtins we don’t 
> currently use (isfinite, isnormal). However, trying to use the same code as 
> isnan doesn’t work at all. Our gfc_define_builtin does three things:
>
>   decl = add_builtin_function (name, type, code, BUILT_IN_NORMAL, 
> library_name, NULL_TREE);
>   set_call_expr_flags (decl, attr);
>   set_builtin_decl (code, decl, true);
>
> While doing so works with isnan, it fails with isfinite or isnormal. When I 
> try to, I get an ICE in tree checking:
>
>   * frame #0: 0x000100c02338 f951`build_call_expr_loc_array(loc=0, 
> fndecl=0x, n=2, argarray=0x7fff5fbfeb20) + 24 at 
> tree.h:2846

fndecl is NULL, which means that fold_builtin_interclass_mathfn doesn't
expect only some of the classification builtins to exist.  In this case,
probably:

  CASE_FLT_FN (BUILT_IN_FINITE):
  case BUILT_IN_ISFINITE:
    {
      /* isfinite(x) -> islessequal(fabs(x),DBL_MAX).  */
      tree const isle_fn = builtin_decl_explicit (BUILT_IN_ISLESSEQUAL);

You can either amend the transforms in builtins.c to check for
the availability of the decls (simply return NULL_TREE if !isle_fn)
or make sure the fortran frontend defines all of the FP classification
and compare builtins.
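Richard's first option (bail out of the fold when a helper decl is missing) can be modelled outside GCC. The sketch below is a hypothetical, self-contained model: `builtin_decls`, `struct decl`, `struct call`, `fold_isfinite` and the `BI_*` codes are illustrative names, not GCC's tree API. In builtins.c itself the equivalent guard would be the `return NULL_TREE` if `!isle_fn` that Richard describes, placed before the call is built.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a builtin-decl table; these names are
   illustrative only and are not GCC's real tree API.  */
enum builtin_code { BI_ISFINITE, BI_ISLESSEQUAL, BI_COUNT };

struct decl { const char *name; };

/* Entries stay NULL unless the front end defined that builtin.  */
static const struct decl *builtin_decls[BI_COUNT];

/* A folded call: which builtin it calls and with how many arguments.  */
struct call { const struct decl *fn; int nargs; };

/* Model of folding isfinite(x) into islessequal(fabs(x), DBL_MAX).
   Returns NULL, meaning "do not fold, keep the original call", when
   the helper builtin was never defined, instead of building a call
   through a NULL decl as in the crash FX reported.  */
static const struct call *fold_isfinite (void)
{
  static struct call folded;
  const struct decl *isle_fn = builtin_decls[BI_ISLESSEQUAL];

  if (!isle_fn)
    return NULL;
  folded.fn = isle_fn;
  folded.nargs = 2;
  return &folded;
}
```

The second option (the front end defining every classification and comparison builtin) corresponds to filling in `builtin_decls[BI_ISLESSEQUAL]` before any folding happens.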

Richard.



Re: Problem with accessing built-ins from Fortran front-end

2014-09-26 Thread Jakub Jelinek
On Fri, Sep 26, 2014 at 10:29:57AM +0200, FX wrote:
> Hi,
> 
> I’m trying to make the Fortran front-end emit calls to some builtins we don’t 
> currently use (isfinite, isnormal). However, trying to use the same code as 
> isnan doesn’t work at all. Our gfc_define_builtin does three things:
> 
>   decl = add_builtin_function (name, type, code, BUILT_IN_NORMAL, 
> library_name, NULL_TREE);
>   set_call_expr_flags (decl, attr);
>   set_builtin_decl (code, decl, true);
> 
> While doing so works with isnan, it fails with isfinite or isnormal. When I 
> try to, I get an ICE in tree checking:
> 
> [...]
> 
> 
> I don’t understand how the middle-end builtins decl should work, would 
> someone have a hint of how to fix this?

Just look at what fold_builtin_interclass_mathfn does:
isfinite(x) -> islessequal(fabs(x),DBL_MAX)
isnormal(x) -> isgreaterequal(fabs(x),DBL_MIN) & islessequal(fabs(x),DBL_MAX)

Thus, the middle-end assumes that if you have __builtin_{isfinite,isnormal},
you also have __builtin_is{less,greater}equal builtins too.
Just create those too and you'll be fine. They are also typegeneric, like
__builtin_is{finite,normal}, i.e. they take ... arguments, and the FE is
supposed to verify that the arguments are OK (or, if the FE creates the calls
artificially, to ensure that itself).
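The two rewrites quoted above can be checked in plain C99, using the standard `<math.h>` comparison macros `islessequal` and `isgreaterequal` as stand-ins for the corresponding builtins. This is a minimal sketch for `double` only, and the helper names are hypothetical. It shows why folding isfinite/isnormal drags in the comparison builtins: the folded form calls them. These macros do not raise the invalid exception for quiet NaN operands, which is presumably why the folding uses them rather than a plain `<=`.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* isfinite(x) -> islessequal(fabs(x), DBL_MAX)
   NaN compares unordered, so islessequal yields false for it.  */
static bool isfinite_via_compare (double x)
{
  return islessequal (fabs (x), DBL_MAX);
}

/* isnormal(x) -> isgreaterequal(fabs(x), DBL_MIN) & islessequal(fabs(x), DBL_MAX)
   Zero and subnormals fail the first test; Inf and NaN fail the second.
   && gives the same result as & here because neither side has side effects.  */
static bool isnormal_via_compare (double x)
{
  double ax = fabs (x);
  return isgreaterequal (ax, DBL_MIN) && islessequal (ax, DBL_MAX);
}
```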

Jakub


Re: Problem with accessing built-ins from Fortran front-end

2014-09-26 Thread FX
> Thus, the middle-end assumes that if you have __builtin_{isfinite,isnormal},
> you also have __builtin_is{less,greater}equal builtins too.

Many thanks to both of you! I wasn't looking in the right place at all. I
defined them, and now it works.

One related question: __builtin_signbit is not type generic; is there any
reason for it not to be? I ask for two reasons: 1. it would make my life
easier, and 2. it's generated in the folding of BUILT_IN_ISINF_SIGN, which is
weird because the latter is type generic.

Thanks again,
FX

Re: Problem with accessing built-ins from Fortran front-end

2014-09-26 Thread Jakub Jelinek
On Fri, Sep 26, 2014 at 11:07:28AM +0200, FX wrote:
> > Thus, the middle-end assumes that if you have __builtin_{isfinite,isnormal},
> > you also have __builtin_is{less,greater}equal builtins too.
> 
> Many thanks to both of you! I wasn’t looking into the right place at all. I 
> defined them, and now it works.
> 
> One related question: the __builtin_signbit is not type generic, is there
> any reason for it not to be?  I ask

I guess it's history: builtins that were added earlier, when we didn't have
any typegeneric builtins, were all *{f,,l} etc. suffixed.
As they are used quite heavily in user code, it is too late to change that.
Just look at what glibc has been doing for years:
/* Return nonzero value if sign of X is negative.  */
# ifdef __NO_LONG_DOUBLE_MATH
#  define signbit(x) \
 (sizeof (x) == sizeof (float) ? __signbitf (x) : __signbit (x))
# else
#  define signbit(x) \
 (sizeof (x) == sizeof (float) \
  ? __signbitf (x) \
  : sizeof (x) == sizeof (double) \
  ? __signbit (x) : __signbitl (x))
# endif
We can't stop supporting that.

Jakub


Re: cgraph_node::verify - quite strong condition that was met by IPA-ICF

2014-09-26 Thread Martin Jambor
Hi,

On Fri, Sep 26, 2014 at 10:34:16AM +0200, Richard Biener wrote:
>

...

> 
> Btw, isn't cgraph edge redirection a "transform" step?  Thus why is
> it performed at WPA time at all?  Shouldn't it be performed at LTRANS
> time the same time we materialize clones and inline?
> 

No, not really, changing the callee of an edge is how at WPA stage we
record the decision to redirect the given call to a different target
(an ipa-cp clone or now a semantic equivalent).  The gimple statement
change (which is the real "transform" here, I suppose) is done at
LTRANS, but it is verified as if it were an ipa-cp redirection, and fails.
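Martin's distinction (WPA only records the new callee on the edge; the call statement is rewritten later, at LTRANS) and Jan's proposed relaxation of the verifier can be sketched as a toy model. Everything here is hypothetical: `struct node`, `struct edge`, an `icf_merged` flag placed on the merged-away node, and the three functions are illustrative only and are not GCC's cgraph API. The real verifier also accepts inliner and ipa-cp redirections, which the model omits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a call-graph edge; not GCC's cgraph API.  */
struct node
{
  const char *name;
  bool icf_merged;   /* hypothetical flag: body was folded into another node */
};

struct edge
{
  struct node *callee;       /* decision recorded on the edge at WPA */
  struct node *stmt_target;  /* function the call statement still names */
};

/* WPA: bodies are not loaded, so only the edge is redirected.  */
static void wpa_redirect_callee (struct edge *e, struct node *target)
{
  e->callee = target;
}

/* Verifier: edge and statement must agree, unless the statement still
   names a function that ICF merged away (the proposed relaxation).  */
static bool verify_edge (const struct edge *e)
{
  return e->callee == e->stmt_target || e->stmt_target->icf_merged;
}

/* LTRANS transform step: rewrite the statement to match the edge.  */
static void ltrans_redirect_stmt (struct edge *e)
{
  e->stmt_target = e->callee;
}
```

In the model, checking an edge between the WPA redirection and the LTRANS statement rewrite fails unless the flag is set, which mirrors the "edge points to wrong declaration" error reported earlier in the thread.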

Martin


Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work

2014-09-26 Thread David Wohlferd


On 9/25/2014 12:36 AM, Yury Gribov wrote:
> On 09/24/2014 12:31 PM, Richard Biener wrote:
>> On Wed, Sep 24, 2014 at 9:43 AM, David Wohlferd wrote:
>>>
>>> Hans-Peter Nilsson: I should have listened to you back when you raised
>>> concerns about this.  My apologies for ever doubting you.
>>>
>>> In summary:
>>>
>>> - The "trick" in the docs for using an arbitrarily sized struct to force
>>>   register flushes for inline asm does not work.
>>> - Placing the inline asm in a separate routine can sometimes mask the
>>>   problem with the trick not working.
>>> - The sample that has been in the docs forever performs an unhelpful,
>>>   unexpected, and probably unwanted stack allocation + memcpy.
>>>
>>> Details:
>>>
>>> Here is the text from the docs:
>>>
>>> ---
>>> One trick to avoid [using the "memory" clobber] is available if the size
>>> of the memory being accessed is known at compile time. For example, if
>>> accessing ten bytes of a string, use a memory input like:
>>>
>>>  "m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )
>>
>> Well - this can't work because you essentially are using a _value_
>> here (looking at the GIMPLE - I'm not sure if a statement expression
>> evaluates to an lvalue).
>>
>> It should work if you simply do this without a stmt expression:
>>
>>    "m" (*(struct { char x[10]; } *)ptr)
>>
>> because that's clearly an lvalue (and the GIMPLE correctly says so):
>>
>>    :
>>    c.a = 1;
>>    c.b = 2;
>>    __asm__ __volatile__("rep; stosb" : "=D" Dest_4, "=c" Count_5 : "0"
>> &c, "a" 0, "m" MEM[(struct foo *)&c], "1" 8);
>>    printf ("%u %u\n", 1, 2);
>>
>> note that we still constant propagated 1 and 2 for the reason that
>> the asm didn't get any VDEF.  That's because you do not have any
>> memory output!  So while it keeps 'c' live it doesn't consider it
>> modified by the asm.  You'd still need to clobber the memory,
>> but "m" clobbers are not supported, only "memory".
>>
>> Thus fixed asm:
>>
>>    __asm__ __volatile__ ("rep; stosb"
>>        : "=D" (Dest), "+c" (Count)
>>        : "0" (&c), "a" (0),
>>          "m" (*( struct foo { char x[8]; } *)&c)
>>        : "memory"
>>    );
>>
>> where I'm not 100% sure if the "m" input is now pointless (that is,
>> if a "memory" clobber also constitutes a use of all memory).
>
> Or maybe even
>   __asm__ __volatile__ ("rep; stosb"
>       : "=D" (Dest), "+c" (Count), "+m" (*(struct foo { char x[8]; } *)&c)
>       : "0" (&c), "a" (0)
>   );
> to avoid the big-hammer memory clobber?
>
> -Y



Thank you both for the responses.  At this point I've started composing 
some replacement text for the docs (below), cuz clearly what's there is 
both inadequate and wrong.  All comments are welcome.


While the code in Richard's response will always produce the correct 
results, the intent here is to use memory constraints to *avoid* the 
performance penalties of the memory clobber.  The existing docs say this 
should work, and I've seen a number of places using it (linux kernel, 
glibc, etc).  If this does work, we should update the docs to say how 
it's done.  If this doesn't work, we should say that too.


Looking at Yury's response, that code does work (in C, not C++).  At 
least it does sometimes.  The problem is that sometimes gcc can "lose 
track" of what a pointer points to.  And when it does, gcc can get 
confused about what to flush.  Here's a simple example that shows this 
(4.9.0, x64, -O2, C):


#include <stdio.h>

inline void *memset( void * Dest, int c, size_t Count) {
   void *dummy;

   __asm__ (
  "rep stosb"
  : "=D" (dummy), "+c" (Count), "=m" (*( struct foo { char x[8]; } *)Dest)
  : "0" (Dest), "a" (c)
  : "cc"//, "memory"
   );

   return Dest;
}

int main() {
   struct {
 int a;
   } c;

   void *v;
   asm volatile("#" : "=r" (v) : "0" (&c)  );

   c.a = 0x30303030;

   //memset(&c, 0, sizeof(c));
   memset(v, 0, sizeof(c));
   printf("%x\n", c.a);
}

This code will work if you pass &c to memset.  But it will fail if you 
use v.  Oh, the wonders of aliasing.


And this is why I'm having a problem doc'ing this.  I love the potential 
benefits of using this.  But if you are writing general-purpose 
routines, how can you hope to know whether the code calling you will 
pass you a pointer that will work with this trick? This kind of thing 
can introduce *horribly* hard to track down problems.  Note that the 
memory clobber always works, but potentially with a performance penalty. 



I could simply skip describing the whole problem with aliasing, but that 
just hides the problem.  I hate when docs do that.


That said, here's what I've written so far.  Any improvements or 
corrections are very welcome.


==
Flushing registers to memory has performance implications and may be an
issue for time-sensitive code. One trick to avoid flushing more registers
than is absolutely required is to use a memory constraint.

For example, this i386 code works for both C and C++:

@example
inline void *memset(void *Dest, int c, size_t Count)
@{
   struct _reallybigstruct @{ char x[SSIZE_MAX]; @};
   void *dummy;

   asm (
  "rep stosb"
  : "=

Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work

2014-09-26 Thread Richard Biener
On Fri, Sep 26, 2014 at 12:37 PM, David Wohlferd wrote:
>
> Looking at Yury's response, that code does work (in c, not c++).  At least
> it does sometimes.  The problem is that sometimes gcc can "lose track" of
> what a pointer points to.  And when it does, gcc can get confused about what
> to flush.  Here's a simple example that shows this (4.9.0, x64, -O2, 'c'):
>
> #include <stdio.h>
>
> inline void *memset( void * Dest, int c, size_t Count) {
>void *dummy;
>
>__asm__ (
>   "rep stosb"
>   : "=D" (dummy), "+c" (Count), "=m" (*( struct foo { char x[8]; } *)Dest)
>   : "0" (Dest), "a" (c)
>   : "cc"//, "memory"
>);
>
>return Dest;
> }
>
> int main() {
>struct {
>  int a;
>} c;
>
>void *v;
>asm volatile("#" : "=r" (v) : "0" (&c)  );
>
>c.a = 0x30303030;
>
>//memset(&c, 0, sizeof(c));
>memset(v, 0, sizeof(c));
>printf("%x\n", c.a);
> }
>
> This code will work if you pass &c to memset.  But it will fail if you use
> v.  Oh, the wonders of aliasing.

Yeah, in this case because *(struct foo { char x[8]; } *)Dest uses alias-set
'4' but c.a is accessed with alias-set '3'.

You can't expect a struct with a char array to end up using alias-set
zero (which probably was intended).  You want

"=m" (*( struct foo  { char x[8]; } __attribute__((may_alias)) *)Dest)

which makes it work.  Or use *(int *)Dest as you are later accessing
it with int, so you even do not lose TBAA.
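Richard's may_alias fix can be folded back into David's example. The sketch below is x86-only and assumes GCC or Clang (for `__attribute__ ((may_alias))` and extended asm). `bytes8` and `memset8` are hypothetical names, and the size is fixed at 8 bytes because the memory operand needs a compile-time size. The `=m` output tells the compiler exactly which bytes the asm writes, and `may_alias` exempts that access from type-based alias analysis, so it is assumed to conflict with the `int` stores to `c`, even when the pointer has been laundered through an asm as in David's test.

```c
#include <assert.h>
#include <stddef.h>

#if !defined __i386__ && !defined __x86_64__
#error "this sketch uses the x86 rep stosb instruction"
#endif

/* Hypothetical 8-byte block type.  may_alias exempts accesses through
   it from type-based alias analysis, so the asm's memory output is
   assumed to overlap stores of any type to the same bytes.  */
typedef struct { char x[8]; } __attribute__ ((may_alias)) bytes8;

/* Fill the 8 bytes at DEST with the low byte of C, without a
   "memory" clobber: the "=m" operand names exactly what is written.  */
static inline void *memset8 (void *dest, int c)
{
  void *dummy;
  size_t count = sizeof (bytes8);

  __asm__ ("rep stosb"
           : "=D" (dummy), "+c" (count), "=m" (*(bytes8 *) dest)
           : "0" (dest), "a" (c));
  return dest;
}
```

Richard's other suggestion, an `int` lvalue such as `*(int *)Dest`, keeps type-based alias analysis but only fits callers that really access the memory as `int`.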

Richard.

> And this is why I'm having a problem doc'ing this.  I love the potential
> benefits of using this.  But if you are writing general-purpose routines,
> how can you hope to know whether the code calling you will pass you a
> pointer that will work with this t

Re: Autotuning parameters/heuristics within gcc - best place to start?

2014-09-26 Thread Andi Kleen
Robert Stevenson <9goli...@gmail.com> writes:

> I am planning to begin a project to explore the space of tuning gcc
> internals in an effort to increase performance. I am wondering if
> anyone can point me to any parameterizations, heuristics, or
> priority functions within gcc that can be tuned/adjusted. Some
> general areas I am considering include register allocation and data
> prefetching.

gcc has lots of params (see gcc -v --help=params) for different
areas that can be tuned. If you need more, there's a generic
parameters framework to add them. And of course there are lots
of command line flags (see --help).

One example of an existing autotuner is the gccflags tuner in opentuner.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only


MIPS Maintainers

2014-09-26 Thread Jeff Law


Sorry this has taken so long; the delays have been entirely mine, in not
following up to get votes and then tally them from the steering committee.


I'm pleased to announce that Catherine Moore and Matthew Fortune have 
been appointed as maintainers for the MIPS port.


Catherine & Matthew, please update the MAINTAINERS file appropriately.

Thanks for everyone's patience,
Jeff


Re: MIPS Maintainers

2014-09-26 Thread Eric Christopher
Congratulations guys!

-eric



Re: Problem with accessing built-ins from Fortran front-end

2014-09-26 Thread Joseph S. Myers
On Fri, 26 Sep 2014, Jakub Jelinek wrote:

> > One related question: the __builtin_signbit is not type generic, is there
> > any reason for it not to be?  I ask
> 
> I guess history, builtins that were added earlier when we didn't have any
> typegeneric builtins were all *{f,,l} etc. suffixed.
> As they are used quite heavily in user code, it is too late to change that.

I see no reason not to make __builtin_signbit type-generic, and think such 
a change would be a good idea (PR 36757).

> Just look at what glibc  does for years:
> /* Return nonzero value if sign of X is negative.  */
> # ifdef __NO_LONG_DOUBLE_MATH
> #  define signbit(x) \
>  (sizeof (x) == sizeof (float) ? __signbitf (x) : __signbit (x))
> # else
> #  define signbit(x) \
>  (sizeof (x) == sizeof (float) \
>   ? __signbitf (x) \
>   : sizeof (x) == sizeof (double) \
>   ? __signbit (x) : __signbitl (x))
> # endif
> We can't stop supporting that.

Such code would be completely unaffected by __builtin_signbit being made 
type-generic.  It would continue to work, but future glibc versions could 
just use __builtin_signbit conditional on a suitable __GNUC_PREREQ check, 
so eliminating a function call (on architectures not already defining 
__signbit etc. in bits/mathinline.h) and avoiding the macro expanding its 
argument more than once (something that causes exponential blowup of the 
source code size when nested macro calls are involved).
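The sizeof dispatch Jakub quotes, and the repeated expansion Joseph objects to, can be seen in a small model. This is a sketch assuming IEEE binary32/binary64 `float`/`double`. `sb_float`, `sb_double` and `MY_SIGNBIT` are hypothetical stand-ins for glibc's `__signbitf`, `__signbit` and `signbit`, in the two-way form of the `__NO_LONG_DOUBLE_MATH` branch. The argument appears three times in the expansion, so nesting such macros multiplies the source size. However, `sizeof` does not evaluate its operand and only one arm of `?:` runs, so side effects in the argument still happen once.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers standing in for glibc's __signbitf/__signbit:
   each reads the IEEE sign bit directly, so -0.0 and negative NaNs
   report a set sign and no libm call is needed.  */
static int sb_float (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  return (int) (bits >> 31);
}

static int sb_double (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return (int) (bits >> 63);
}

/* Same shape as the __NO_LONG_DOUBLE_MATH branch: X is spelled three
   times, but evaluated at most once at run time.  */
#define MY_SIGNBIT(x) \
  (sizeof (x) == sizeof (float) ? sb_float (x) : sb_double (x))
```

A type-generic `__builtin_signbit`, as Joseph proposes, would let such a macro mention its argument only once.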

It's true that code using __builtin_signbit on a long double argument 
would lose the conversion to double (not affecting the result, but 
possibly losing exceptions arising from the conversion) and similarly 
calls for a float argument would also lose the conversion to double (so 
losing exceptions for signaling NaN arguments).  In view of the many open 
bugs relating to floating-point exceptions, I don't think such changes 
would be problematic.

-- 
Joseph S. Myers
jos...@codesourcery.com


Issue with __builtin_remainder expansion on i386

2014-09-26 Thread FX
Hi all,

I have just submitted a patch emitting some new floating-point code from the 
Fortran front-end, improving our IEEE support there: 
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02444.html

However, in one of the cases where we emit a call to __builtin_remainderf(), we 
get wrong code generation on i386. It is peculiar because:

 - the wrong code occurs at all optimization levels, and the following flags 
(any or all) do not affect it: -mfpmath=sse -msse2 
-fno-unsafe-math-optimizations -frounding-math -fsignaling-nans
 - the wrong code does not occur with -ffloat-store
 - the generated code looks fine in every respect I could check. I could not
generate a direct C testcase, unfortunately, but it is equivalent to:

  D.2303 = __builtin_remainderf (ieee_value_4 (&C.2299, &D.2301), 
1.0e+0);

where ieee_value_4() is a function that returns a positive infinity (for the 
specific values of arguments passed). However, if I split it in:

  x = ieee_value_4 (&C.2289, &D.2293);
  D.2297 = __builtin_remainderf (x, 1.0e+0);

then the wrong result does not occur anymore either. What’s bugging me is that, 
with and without -ffloat-store, the same dump trees are generated. So I don’t 
know how to get further on debugging this. I do think it’s something wrong with 
the expansion of __builtin_remainderf(), especially as it depends on 
-ffloat-store.


I attach the source code triggering it (built on trunk with my patch at 
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02444.html), the tree dumps, and 
the assembler outputs (a.s without -ffloat-store, and a.s.floatstore with it). 
I can clearly see, from the assembly diff, that something is indeed changing 
but I cannot understand what.

Thanks in advance if you have any hint to share, or get a minute to look at the 
dumps/assembly.

FX






Attachments: a.f90, a.f90.003t.original, a.f90.174t.optimized, a.s, a.s.floatstore