IVs optimization issue

2012-02-29 Thread Aurelien Buhrig
Hi,

I'm porting a gcc backend (4.6.1) for a 16-bit MCU with PSI pmode, and
SI ptr_mode.

I have a QoR problem with loops: the chosen IVs are often not good.
I looked at tree-ssa-loop-ivopts.c but it is hard to understand that
code. So sorry if my questions are a bit confused but I would like to
understand what happens.

First of all, I checked many times and the rtx_cost function is right.

It seems that the choice of IVs is made according to the cost of the IV
candidates themselves, but also their uses, register pressure, etc., so
it is difficult for me to understand why one candidate is preferred
over another.
But what I "feel" is that gcc tries to use "important" candidates to
satisfy all uses. For example in a simple copy from an int array to
another ( for (i=0; i

Re: IVs optimization issue

2012-02-29 Thread Richard Guenther
On Wed, Feb 29, 2012 at 9:11 AM, Aurelien Buhrig
 wrote:
> Hi,
>
> I'm porting a gcc backend (4.6.1) for a 16-bit MCU with PSI pmode, and
> SI ptr_mode.
>
> I have a QoR problem with loops: the chosen IVs are often not good.
> I looked at tree-ssa-loop-ivopts.c but it is hard to understand that
> code. So sorry if my questions are a bit confused but I would like to
> understand what happens.
>
> First of all, I checked many times and the rtx_cost function is right.
>
> It seems that the choice of IVs is made according to the cost of the IV
> candidates themselves, but also their uses, register pressure, etc., so
> it is difficult for me to understand why one candidate is preferred
> over another.
> But what I "feel" is that gcc tries to use "important" candidates to
> satisfy all uses. For example in a simple copy from an int array to
> another ( for (i=0; i (ptr_mode), addresses are computed in SImode from i, and then truncated
> into PSImode. When modifying the code so that the IV is made explicit as a
> pointer (ex: for (ptr1=M1; ptr1 reduced by 20%.
>
> Moreover, in loop intensive computations, setting the
> iv-max-considered-uses=2 (so preventing optimization on complex loops)
> can make code size much better (in -Os), up to a 30% reduction! So it
> seems that, in such test cases, trying to optimize loops is worse than
> doing nothing.
>
>
> Here are my questions:
>
> - Is there a likely explanation for such behavior when optimizing loops?
>
> - Is there a document (other than gccint) describing loops and their
> optimization?
>
> - It seems that keeping computations and IVs in PSI is often preferable,
> but there is no Pmode in tree representation, right? So when/where is
> the choice for the mode around pointer operations made (ptr_mode vs Pmode) ?
>
> - PSImode is used as Pmode in only a very few backends (m32c). Is its use
> really optimized by the middle-end algorithms/heuristics?
>
> - Looking at the code, it seems there are different sets of IVs, for
> instance in find_optimal_ivs_set with origset and set. Sometimes,
> forcing one (often origset) generates better code. But what is the
> difference between origset and set ?
>
> - And finally, is there something I can do from the back-end to make
> loop code better?

The issue is most probably that on GIMPLE we only deal with ptr_mode,
not Pmode, and IVOPTs thinks that pointer induction variables will
have ptr_mode.  To fix this the cost computation would need to take
into account ptr_mode to Pmode conversions _and_ would need to
consider Pmode IVs in the first place (I'm not sure that will be easy).
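The rewrite Aurelien describes can be sketched in C (a reconstruction: the original mail was truncated by the archive, so the names N, M1, M2 and the element type are assumptions):

```c
#include <assert.h>

#define N 64
static int M1[N], M2[N];

/* Index-based loop: on the target in question, i is SImode (ptr_mode),
   so each address M1 + i*sizeof(int) is computed in SImode and then
   truncated to PSImode (Pmode). */
static void copy_by_index(void) {
  for (int i = 0; i < N; i++)
    M2[i] = M1[i];
}

/* Pointer-based loop: the IVs are pointers, which the backend can keep
   in PSImode (Pmode) throughout -- reportedly ~20% smaller code. */
static void copy_by_pointer(void) {
  for (int *p1 = M1, *p2 = M2; p1 < M1 + N; p1++, p2++)
    *p2 = *p1;
}
```

Both loops are observationally identical; the point is only which induction variable the middle end keeps, and in which mode.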

Richard.

> Thank you in advance!
> Aurélien


Re: RFC: Handle conditional expression in sccvn/fre/pre

2012-02-29 Thread Bin.Cheng
On Mon, Jan 2, 2012 at 10:54 PM, Richard Guenther
 wrote:
> On Mon, Jan 2, 2012 at 3:09 PM, Amker.Cheng  wrote:
>> On Mon, Jan 2, 2012 at 9:37 PM, Richard Guenther
>>  wrote:
>>
>>> Well, with
>>>
>>> Index: gcc/tree-ssa-pre.c
>>> ===
>>> --- gcc/tree-ssa-pre.c  (revision 182784)
>>> +++ gcc/tree-ssa-pre.c  (working copy)
>>> @@ -4335,16 +4335,23 @@ eliminate (void)
>>>             available value-numbers.  */
>>>          else if (gimple_code (stmt) == GIMPLE_COND)
>>>            {
>>> -             tree op0 = gimple_cond_lhs (stmt);
>>> -             tree op1 = gimple_cond_rhs (stmt);
>>> +             tree op[2];
>>>              tree result;
>>> +             vn_nary_op_t nary;
>>>
>>> -             if (TREE_CODE (op0) == SSA_NAME)
>>> -               op0 = VN_INFO (op0)->valnum;
>>> -             if (TREE_CODE (op1) == SSA_NAME)
>>> -               op1 = VN_INFO (op1)->valnum;
>>> +             op[0] = gimple_cond_lhs (stmt);
>>> +             op[1] = gimple_cond_rhs (stmt);
>>> +             if (TREE_CODE (op[0]) == SSA_NAME)
>>> +               op[0] = VN_INFO (op[0])->valnum;
>>> +             if (TREE_CODE (op[1]) == SSA_NAME)
>>> +               op[1] = VN_INFO (op[1])->valnum;
>>>              result = fold_binary (gimple_cond_code (stmt), 
>>> boolean_type_node,
>>> -                                   op0, op1);
>>> +                                   op[0], op[1]);
>>> +             if (!result)
>>> +               result = vn_nary_op_lookup_pieces (2, gimple_cond_code 
>>> (stmt),
>>> +                                                  boolean_type_node,
>>> +                                                  op, &nary);
>>> +
>>>              if (result && TREE_CODE (result) == INTEGER_CST)
>>>                {
>>>                  if (integer_zerop (result))
>>> @@ -4354,6 +4361,13 @@ eliminate (void)
>>>                  update_stmt (stmt);
>>>                  todo = TODO_cleanup_cfg;
>>>                }
>>> +             else if (result && TREE_CODE (result) == SSA_NAME)
>>> +               {
>>> +                 gimple_cond_set_code (stmt, NE_EXPR);
>>> +                 gimple_cond_set_lhs (stmt, result);
>>> +                 gimple_cond_set_rhs (stmt, boolean_false_node);
>>> +                 update_stmt (stmt);
>>> +               }
>>>            }
>>>          /* Visit indirect calls and turn them into direct calls if
>>>             possible.  */
>>>
>>> you get the CSE (too simple patch, you need to check leaders properly).
>>> You can then add similar lookups for an inverted conditional.
>>
>> Thanks for your explanation. One shortcoming of this method is that it
>> cannot find/take the cond_expr (and the implicitly defined variable) as the
>> leader in PRE. I guess this is why you said it can handle only a subset of
>> all cases in your previous message?
>
> Yes.  It won't handle
>
>  if (x > 1)
>   ...
>  tem = x > 1;
>
> or
>
>  if (x > 1)
>   ...
>  if (x > 1)
>
> though maybe we could teach PRE to do the insertion by properly
> putting x > 1 into EXP_GEN in compute_avail (but not into AVAIL_OUT).
> Not sure about this though.  Currently we don't do anything to
> GIMPLE_COND operands (which seems odd anyway, we should
> at least add the operands to EXP_GEN).
I spent some time on this issue again.
Yes, we can insert the compare EXPR into EXP_GEN (but not AVAIL_OUT).
We can also teach PRE to insert the recorded compare EXPR in the
previously mentioned special cases to solve the problem.
But this way it is no different from factoring the compare EXPR out of
GIMPLE_COND, since insertion/elimination is itself a kind of rewriting.
Thus I think the problem should be fixed by rewriting/factoring the
compare EXPR out of GIMPLE_COND.
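At the source level, the cases under discussion boil down to one comparison feeding both a branch and a value; a minimal sketch (not from the thread's test cases) of the pattern the proposed factoring would let FRE/PRE unify:

```c
#include <assert.h>

/* Both the branch condition and `tem` evaluate x > 1.  With the compare
   factored out of GIMPLE_COND into its own SSA name, value numbering can
   see that both occurrences are the same expression and reuse one result. */
static int redundant_compare(int x) {
  int r = 0;
  if (x > 1)        /* first occurrence: feeds the GIMPLE_COND */
    r = 10;
  int tem = x > 1;  /* second occurrence: same value, ideally reused */
  return r + tem;
}
```

Without the factoring, the first `x > 1` lives only inside the GIMPLE_COND and is invisible to the elimination phase as a leader for the second one.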

Second point: as you said, PRE often gets confused and moves the compare
EXPR far from the jump statement. Could we rely on register
rematerialization to handle this, or is there another solution?

I would like to learn more about this case, so do you have any opinion on
how this should be fixed for now?

Thanks
-- 
Best Regards.


Re: RFC: Handle conditional expression in sccvn/fre/pre

2012-02-29 Thread Richard Guenther
On Wed, Feb 29, 2012 at 3:50 PM, Bin.Cheng  wrote:
> On Mon, Jan 2, 2012 at 10:54 PM, Richard Guenther
>  wrote:
>> On Mon, Jan 2, 2012 at 3:09 PM, Amker.Cheng  wrote:
>>> On Mon, Jan 2, 2012 at 9:37 PM, Richard Guenther
>>>  wrote:
>>>
 Well, with

 Index: gcc/tree-ssa-pre.c
 ===
 --- gcc/tree-ssa-pre.c  (revision 182784)
 +++ gcc/tree-ssa-pre.c  (working copy)
 @@ -4335,16 +4335,23 @@ eliminate (void)
             available value-numbers.  */
          else if (gimple_code (stmt) == GIMPLE_COND)
            {
 -             tree op0 = gimple_cond_lhs (stmt);
 -             tree op1 = gimple_cond_rhs (stmt);
 +             tree op[2];
              tree result;
 +             vn_nary_op_t nary;

 -             if (TREE_CODE (op0) == SSA_NAME)
 -               op0 = VN_INFO (op0)->valnum;
 -             if (TREE_CODE (op1) == SSA_NAME)
 -               op1 = VN_INFO (op1)->valnum;
 +             op[0] = gimple_cond_lhs (stmt);
 +             op[1] = gimple_cond_rhs (stmt);
 +             if (TREE_CODE (op[0]) == SSA_NAME)
 +               op[0] = VN_INFO (op[0])->valnum;
 +             if (TREE_CODE (op[1]) == SSA_NAME)
 +               op[1] = VN_INFO (op[1])->valnum;
              result = fold_binary (gimple_cond_code (stmt), 
 boolean_type_node,
 -                                   op0, op1);
 +                                   op[0], op[1]);
 +             if (!result)
 +               result = vn_nary_op_lookup_pieces (2, gimple_cond_code 
 (stmt),
 +                                                  boolean_type_node,
 +                                                  op, &nary);
 +
              if (result && TREE_CODE (result) == INTEGER_CST)
                {
                  if (integer_zerop (result))
 @@ -4354,6 +4361,13 @@ eliminate (void)
                  update_stmt (stmt);
                  todo = TODO_cleanup_cfg;
                }
 +             else if (result && TREE_CODE (result) == SSA_NAME)
 +               {
 +                 gimple_cond_set_code (stmt, NE_EXPR);
 +                 gimple_cond_set_lhs (stmt, result);
 +                 gimple_cond_set_rhs (stmt, boolean_false_node);
 +                 update_stmt (stmt);
 +               }
            }
          /* Visit indirect calls and turn them into direct calls if
             possible.  */

 you get the CSE (too simple patch, you need to check leaders properly).
 You can then add similar lookups for an inverted conditional.
>>>
>>> Thanks for your explanation. One shortcoming of this method is that it
>>> cannot find/take the cond_expr (and the implicitly defined variable) as the
>>> leader in PRE. I guess this is why you said it can handle only a subset of
>>> all cases in your previous message?
>>
>> Yes.  It won't handle
>>
>>  if (x > 1)
>>   ...
>>  tem = x > 1;
>>
>> or
>>
>>  if (x > 1)
>>   ...
>>  if (x > 1)
>>
>> though maybe we could teach PRE to do the insertion by properly
>> putting x > 1 into EXP_GEN in compute_avail (but not into AVAIL_OUT).
>> Not sure about this though.  Currently we don't do anything to
>> GIMPLE_COND operands (which seems odd anyway, we should
>> at least add the operands to EXP_GEN).
> I spent some time on this issue again.
> Yes, we can insert the compare EXPR into EXP_GEN (but not AVAIL_OUT).
> We can also teach PRE to insert the recorded compare EXPR in the
> previously mentioned special cases to solve the problem.
> But this way it is no different from factoring the compare EXPR out of
> GIMPLE_COND, since insertion/elimination is itself a kind of rewriting.
> Thus I think the problem should be fixed by rewriting/factoring the
> compare EXPR out of GIMPLE_COND.

Yes, see http://gcc.gnu.org/ml/gcc-patches/2010-08/msg02098.html
unfortunately it is a non-trivial task ;)

> Second point: as you said, PRE often gets confused and moves the compare
> EXPR far from the jump statement. Could we rely on register
> rematerialization to handle this, or is there another solution?

Well, a simple kind of solution would be to preprocess the IL before
redundancy elimination, separating the predicate computations from
their uses, and then as a follow-up combine the predicates back (tree
forwprop would do that, for example, even for multiple uses).  The
question is what you gain in the end.

> I would like to learn more about this case, so do you have any opinion on
> how this should be fixed for now?

The GIMPLE IL should be better here, especially if you consider that
we force away predicate computation that may trap for -fnon-call-exceptions
already.  So, simplifying the IL is still the way to go IMHO.  But as I said
above - it's a non-trivial task with possibly much fallout.

Richard.

> Thanks
> --
> Best Regards.


Re: IVs optimization issue

2012-02-29 Thread Aurelien Buhrig

> The issue is most probably that on GIMPLE we only deal with ptr_mode,
> not Pmode, and IVOPTs thinks that pointer induction variables will
> have ptr_mode.  To fix this the cost computation would need to take
> into account ptr_mode to Pmode conversions _and_ would need to
> consider Pmode IVs in the first place (I'm not sure that will be easy).


Thank you, Richard, for your reply.

I guess such an issue is not among the top-priority tasks of the main
developers, so I think I'll have to look at it myself, if I feel confident
enough to carry out such a job (I've never worked at the tree level).

My main question is about Pmode IVs: since the GIMPLE representation only
deals with ptr_mode, what differentiates a Pmode IV from a ptr_mode one?

BTW, this question is not limited to IVs. What controls the choice of
Pmode vs ptr_mode when mapping to RTL?

Thanks,
Aurélien



Re: IVs optimization issue

2012-02-29 Thread Richard Guenther
On Wed, Feb 29, 2012 at 4:08 PM, Aurelien Buhrig
 wrote:
>
>> The issue is most probably that on GIMPLE we only deal with ptr_mode,
>> not Pmode, and IVOPTs thinks that pointer induction variables will
>> have ptr_mode.  To fix this the cost computation would need to take
>> into account ptr_mode to Pmode conversions _and_ would need to
>> consider Pmode IVs in the first place (I'm not sure that will be easy).
>
>
> Thank you, Richard, for your reply.
>
> I guess such an issue is not among the top-priority tasks of the main
> developers, so I think I'll have to look at it myself, if I feel confident
> enough to carry out such a job (I've never worked at the tree level).
>
> My main question is about Pmode IVs: since the GIMPLE representation only
> deals with ptr_mode, what differentiates a Pmode IV from a ptr_mode one?

Its TREE_TYPE.  In your case you'd have a POINTER_TYPE with
PSImode for Pmode and a POINTER_TYPE with SImode for ptr_mode
pointers.  They will differ in TYPE_MODE and TYPE_PRECISION.

> BTW, this question is not limited to IVs. What controls the choice of
> Pmode vs ptr_mode when mapping to RTL?

ptr_mode is the C language specified mode for all pointers.  Pmode is
the mode used for pointers in address operands of CPU instructions.
Usually they are the same.  When mapping to RTL all ptr_mode uses
for memory accesses are translated to Pmode while operations on
the value of ptr_mode quantities are done on ptr_mode (IIRC).

Richard.

> Thanks,
> Aurélien
>


Re: IVs optimization issue

2012-02-29 Thread Aurelien Buhrig
On 29/02/2012 16:15, Richard Guenther wrote:
> On Wed, Feb 29, 2012 at 4:08 PM, Aurelien Buhrig
>  wrote:
>>
>>> The issue is most probably that on GIMPLE we only deal with ptr_mode,
>>> not Pmode, and IVOPTs thinks that pointer induction variables will
>>> have ptr_mode.  To fix this the cost computation would need to take
>>> into account ptr_mode to Pmode conversions _and_ would need to
>>> consider Pmode IVs in the first place (I'm not sure that will be easy).
>>
>>
>> Thank you, Richard, for your reply.
>>
>> I guess such an issue is not among the top-priority tasks of the main
>> developers, so I think I'll have to look at it myself, if I feel confident
>> enough to carry out such a job (I've never worked at the tree level).
>>
>> My main question is about Pmode IVs: since the GIMPLE representation only
>> deals with ptr_mode, what differentiates a Pmode IV from a ptr_mode one?
> 
> Its TREE_TYPE.  In your case you'd have a POINTER_TYPE with
> PSImode for Pmode and a POINTER_TYPE with SImode for ptr_mode
> pointers.  They will differ in TYPE_MODE and TYPE_PRECISION.

Thanks, I will look at it.

>> BTW, this question is not limited to IVs. What controls the choice of
>> Pmode vs ptr_mode when mapping to RTL?
> 
> ptr_mode is the C language specified mode for all pointers.  Pmode is
> the mode used for pointers in address operands of CPU instructions.
> Usually they are the same.  When mapping to RTL all ptr_mode uses
> for memory accesses are translated to Pmode while operations on
> the value of ptr_mode quantities are done on ptr_mode (IIRC).

Another point that is not optimal for my backend is the computation of
the address of an array element (M[i]). Currently, both the address of M
and i are extended to ptr_mode and the sum is truncated to Pmode, whereas
it would be much better to extend i to Pmode and then perform the add in
Pmode. So if I understand correctly, the latter option cannot be
generated. Right?
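The equivalence Aurelien relies on is that truncation to Pmode commutes with addition, so extending i once and adding in the narrow mode yields the same address; a small numeric model (Pmode/PSImode modeled here as 24 bits and ptr_mode/SImode as 32 bits; the real PSImode width is target-specific, and these helper names are made up for illustration):

```c
#include <stdint.h>

/* Model: ptr_mode = SImode (32 bits), Pmode = PSImode (here 24 bits). */
#define PMODE_MASK 0x00FFFFFFu

/* Current lowering: extend base and index to SImode, add, then truncate
   the sum to Pmode. */
static uint32_t addr_via_simode(uint32_t base, uint16_t i) {
  return (uint32_t)((base + (uint32_t)i * sizeof(int16_t)) & PMODE_MASK);
}

/* Desired lowering: truncate/extend to Pmode first, add in Pmode.
   Because x -> x mod 2^24 is a ring homomorphism, the result is the same. */
static uint32_t addr_via_pmode(uint32_t base, uint16_t i) {
  uint32_t pbase = base & PMODE_MASK;
  uint32_t poff  = (uint32_t)((uint32_t)i * sizeof(int16_t)) & PMODE_MASK;
  return (pbase + poff) & PMODE_MASK;
}
```

The two lowerings always agree; the difference is purely in how many wide (SImode) operations the backend has to emit before the final truncation.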

> Richard.
> 
>> Thanks,
>> Aurélien
>>



Re: IVs optimization issue

2012-02-29 Thread Richard Guenther
On Wed, Feb 29, 2012 at 4:41 PM, Aurelien Buhrig
 wrote:
> On 29/02/2012 16:15, Richard Guenther wrote:
>> On Wed, Feb 29, 2012 at 4:08 PM, Aurelien Buhrig
>>  wrote:
>>>
 The issue is most probably that on GIMPLE we only deal with ptr_mode,
 not Pmode, and IVOPTs thinks that pointer induction variables will
 have ptr_mode.  To fix this the cost computation would need to take
 into account ptr_mode to Pmode conversions _and_ would need to
 consider Pmode IVs in the first place (I'm not sure that will be easy).
>>>
>>>
>>> Thank you, Richard, for your reply.
>>>
>>> I guess such an issue is not among the top-priority tasks of the main
>>> developers, so I think I'll have to look at it myself, if I feel confident
>>> enough to carry out such a job (I've never worked at the tree level).
>>>
>>> My main question is about Pmode IVs: since the GIMPLE representation only
>>> deals with ptr_mode, what differentiates a Pmode IV from a ptr_mode one?
>>
>> Its TREE_TYPE.  In your case you'd have a POINTER_TYPE with
>> PSImode for Pmode and a POINTER_TYPE with SImode for ptr_mode
>> pointers.  They will differ in TYPE_MODE and TYPE_PRECISION.
>
> Thanks, I will look at it.
>
>>> BTW, this question is not limited to IVs. What controls the choice of
>>> Pmode vs ptr_mode when mapping to RTL?
>>
>> ptr_mode is the C language specified mode for all pointers.  Pmode is
>> the mode used for pointers in address operands of CPU instructions.
>> Usually they are the same.  When mapping to RTL all ptr_mode uses
>> for memory accesses are translated to Pmode while operations on
>> the value of ptr_mode quantities are done on ptr_mode (IIRC).
>
> Another point that is not optimal for my backend is the computation of
> the address of an array element (M[i]). Currently, both the address of M
> and i are extended to ptr_mode and the sum is truncated to Pmode, whereas
> it would be much better to extend i to Pmode and then perform the add in
> Pmode. So if I understand correctly, the latter option cannot be
> generated. Right?

Not by IVOPTs at least.  There is also the long-standing issue that
POINTER_PLUS_EXPR only accepts sizetype offsets - that may cause
issues if your target does not define sizetype having the same mode as
ptr_mode.  (And of course complicates using Pmode on the gimple level)

Richard.

>> Richard.
>>
>>> Thanks,
>>> Aurélien
>>>
>


Re: IVs optimization issue

2012-02-29 Thread Aurelien Buhrig
On 29/02/2012 17:08, Richard Guenther wrote:
> On Wed, Feb 29, 2012 at 4:41 PM, Aurelien Buhrig
>  wrote:
>> On 29/02/2012 16:15, Richard Guenther wrote:
>>> On Wed, Feb 29, 2012 at 4:08 PM, Aurelien Buhrig
>>>  wrote:

> The issue is most probably that on GIMPLE we only deal with ptr_mode,
> not Pmode, and IVOPTs thinks that pointer induction variables will
> have ptr_mode.  To fix this the cost computation would need to take
> into account ptr_mode to Pmode conversions _and_ would need to
> consider Pmode IVs in the first place (I'm not sure that will be easy).


 Thank you, Richard, for your reply.

 I guess such an issue is not among the top-priority tasks of the main
 developers, so I think I'll have to look at it myself, if I feel confident
 enough to carry out such a job (I've never worked at the tree level).

 My main question is about Pmode IVs: since the GIMPLE representation only
 deals with ptr_mode, what differentiates a Pmode IV from a ptr_mode one?
>>>
>>> Its TREE_TYPE.  In your case you'd have a POINTER_TYPE with
>>> PSImode for Pmode and a POINTER_TYPE with SImode for ptr_mode
>>> pointers.  They will differ in TYPE_MODE and TYPE_PRECISION.
>>
>> Thanks, I will look at it.
>>
 BTW, this question is not limited to IVs. What controls the choice of
 Pmode vs ptr_mode when mapping to RTL?
>>>
>>> ptr_mode is the C language specified mode for all pointers.  Pmode is
>>> the mode used for pointers in address operands of CPU instructions.
>>> Usually they are the same.  When mapping to RTL all ptr_mode uses
>>> for memory accesses are translated to Pmode while operations on
>>> the value of ptr_mode quantities are done on ptr_mode (IIRC).
>>
>> Another point that is not optimal for my backend is the computation of
>> the address of an array element (M[i]). Currently, both the address of M
>> and i are extended to ptr_mode and the sum is truncated to Pmode, whereas
>> it would be much better to extend i to Pmode and then perform the add in
>> Pmode. So if I understand correctly, the latter option cannot be
>> generated. Right?
> 
> Not by IVOPTs at least.  There is also the long-standing issue that
> POINTER_PLUS_EXPR only accepts sizetype offsets - that may cause
> issues if your target does not define sizetype having the same mode as
> ptr_mode.  (And of course complicates using Pmode on the gimple level)

Sorry, it wasn't related to ivopts, but to the use of Pmode from GIMPLE,
and especially to computing an M[i] address. (My ptr_mode and SIZE_TYPE
mode are the same.) Can you confirm that it's not possible to compute
the address of M[i] in Pmode without truncating from ptr_mode? Mapping
POINTER_PLUS_EXPR directly to Pmode would also be (together with ivopts
PSImode support) a great improvement for Pmode=PSImode targets.

Thanks for your help,
Aurélien

> Richard.
> 
>>> Richard.
>>>
 Thanks,
 Aurélien

>>



The state of glibc libm

2012-02-29 Thread Joseph S. Myers
I've reviewed many (not yet all) of glibc's open "math" component bugs.  I 
hope some actual summary information on what the current state of libm 
looks like may be of interest to the people involved in the various past 
discussions of better libm for GCC or glibc - and those interested in 
fixing things, whether through patches to existing code, new 
implementations or both.

I would say the actual main issues with the present glibc libm are the 
following (if there are others, they are not well-reflected in the open 
bugs).  When I refer to a function by name I'm generally referring to all
of the float, double and various long double implementations (x86 
extended, IEEE binary128, IBM double+double) - and fixes will often need 
to fix multiple versions of a function in similar ways.

(a) Most libm functions are not correctly rounded - and do not make an 
attempt at being correctly rounded.

A full fix would likely require new (automatically generated and tuned) 
implementations such as proposed at 
.  As I understand it, 
correct rounding (proved correct) is generally feasible for functions of 
one binary32, binary64 or x86 extended argument.  For functions of two 
arguments, or one binary128 argument, it may not be feasible to search for 
worst cases for correct rounding, although it may be possible to produce 
implementations that are "probably" correctly rounding.  For functions of 
complex arguments or IBM long double, correct rounding may be less 
feasible (even complex multiplication and division, and all of +-*/ on IBM 
long double, are not correctly rounding, and correct rounding isn't so 
well-defined for IBM long double with its possibility of discontiguous 
mantissa bits).

Known inaccuracies in functions are indicated by the libm-test-ulps files 
in glibc.  Given my patch 
 to reduce some 
old figures that look like they were left behind after the tests or the
library were fixed, all listed errors are 24ulp or below (9ulp or below 
for x86 and x86_64; I haven't tested the larger errors on other 
architectures to see if they are actually still applicable).
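The ulp figures quoted here can be reproduced with a small distance function over double bit patterns (a sketch, valid for finite values of similar magnitude; the names are made up for illustration):

```c
#include <math.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Map a double's bit pattern onto a number line on which consecutive
   integers are consecutive representable values (finite values only). */
static int64_t ordered_bits(double x) {
  int64_t i;
  memcpy(&i, &x, sizeof i);           /* reinterpret bits, no conversion */
  return i < 0 ? INT64_MIN - i : i;   /* fold negative patterns below zero */
}

/* Error in ulps between two doubles of similar magnitude. */
static int64_t ulp_distance(double a, double b) {
  return llabs(ordered_bits(a) - ordered_bits(b));
}
```

For example, `ulp_distance(exp(x), (double)expl((long double)x))` estimates the error of `exp` against a long double reference, assuming the latter is substantially more accurate than the function under test.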

(b) Where functions do make attempts at being correctly rounded 
(especially the IBM Accurate Mathematical Library functions), they tend to 
be sufficiently slow that the slowness attracts bug reports.  Again, this 
would likely be addressed by new implementations that use careful error 
bounds and information about worst cases to reduce the cost of being 
correctly rounding.

(c) Various functions do not set errno correctly (many cases) or raise the 
proper floating-point exceptions (a smaller number of cases - both 
spurious exceptions where not permitted by ISO C, and failing to raise 
required overflow/underflow exceptions).  In general this is a separate 
bug for each function (filed as many separate bugs in glibc Bugzilla) and 
can be fixed by a separate local patch for each function (adding a 
testcase, of course - note that glibc's main libm-test.inc presently only 
tests invalid and divide-by-zero exceptions, so if working on these error 
handling issues it might be useful to extend it to cover other exceptions 
as well as errno values).

(d) There are some specific bugs filed for functions such as nexttoward 
whose results are precisely specified by ISO C; in general these should be 
fixable by local patches.

(e) Various functions, mainly IBM Accurate Mathematical Library ones, 
produce wildly wrong results or crash in non-default rounding modes.  I 
have a patch for exp pending review at 
 and I expect 
others can be fixed similarly.

(f) Various trigonometrical functions are inaccurate on x86 and x86_64 
(see glibc bug 13658 and recent discussions).

(g) Bessel function implementations handle large inputs in different ways 
to other functions, as I discuss at 
.

(h) Various complex functions have problems such as inaccuracy or wrong 
branch cuts.

(i) Some real functions have particular issues (which should be fixable by 
local changes short of new correctly rounded implementations):

  - erfc (my patch 
 is pending 
review).

  - pow (bugs 369, 706, 2678, 3866).  The assembly implementations may 
complicate fixing these issues, though it's probably possible to fix 
only some bugs (in all relevant implementations, with plenty of 
testcases) rather than a patch needing to fix both all issues and all 
implementations at once.

  - lgamma, in cases where the result is close to 0 and there is a lot of 
cancellation in the present calculations.

  - tgamma, in cases of results of large magnitude (where the approach of 
using exp (lgamma) leads to large errors).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: The state of glibc libm

2012-02-29 Thread David Miller
From: "Joseph S. Myers" 
Date: Wed, 29 Feb 2012 17:17:17 + (UTC)

Thanks for looking into all of these issues.

> (c) Various functions do not set errno correctly (many cases) or raise the 
> proper floating-point exceptions (a smaller number of cases - both 
> spurious exceptions where not permitted by ISO C, and failing to raise 
> required overflow/underflow exceptions).  In general this is a separate 
> bug for each function (filed as many separate bugs in glibc Bugzilla) and 
> can be fixed by a separate local patch for each function (adding a 
> testcase, of course - note that glibc's main libm-test.inc presently only 
> tests invalid and divide-by-zero exceptions, so if working on these error 
> handling issues it might be useful to extend it to cover other exceptions 
> as well as errno values).

This reminds me that there are math tests local to the powerpc port
that therefore only run on powerpc.  For example, I've looked at
sysdeps/powerpc/test-arith{,f}.c and they don't seem so non-portable
that we couldn't run them everywhere with just some small tweaks.


Re: random commentary on -fsplit-stack (and a bug report)

2012-02-29 Thread Ian Lance Taylor
"Jay Freeman (saurik)"  writes:

>> As you know, I wanted to allow for future expansion.  I agree that it
>> would be possible to avoid storing MORESTACK_SEGMENTS--that would trade
>> off space for time, since it would mean that setcontext would have to
>> walk up the list.  I think CURRENT_STACK is required for
>> __splitstack_find_context.  And __splitstack_find_context is required
>> for Go's garbage collector.  At least, it's not obvious to me how to
>> avoid requiring CURRENT_STACK for that case.
>
> The basis of that suggestion was not just that the items in the
> context could be removed, but that the underlying state used by split
> stacks might not need the values at all. In this case, I am not
> certain why __morestack_segments is needed: it seems to only come in
> to play when __morestack_current_segment is NULL (and I'm not certain
> how that would happen) and while deallocating dynamic blocks (which is
> already linear).

I think I see what you mean.  If you can eliminate __morestack_segments
entirely, that is fine with me.


>> > 7) Using the linker to handle the transition between split-stack and
>> > non-split-stack code seems like a good way to solve the problem of "we
>> > need large stacks when hitting external code", but in staring at the
>> > resulting code I have in my project I'm seeing that it isn't reliable:
>> > if you have a pointer to a function the linker will not know what you
>> > are calling. In my case, this is coming up often due to using
>> > std::function.
>> 
>> Yes, good point.  I think I had some plan for handling that but I no
>> longer recall what it was.
>
> After getting more sleep, I realize that this problem is actually much
> more pervasive than I had previously thought. Almost any vaguely
> object-oriented library is going to have tons of function pointers in
> it, and you often interact solely with those function pointers (as in,
> you have no actual symbol references anywhere). A simple example: in
> the case of C++, any call to a non-split-stack virtual function will
> fail.

Certainly true in principle, but unlikely in practice.  Why would you
compile part of your C++ program with split-stack and part without?
Implementing child classes that define virtual methods for classes
defined in a precompiled library seems like an unusual case to me.


> """Function pointers are a tricky case. In general we don't know
> whether a function pointer points to split-stack code. Therefore, all
> calls through a function pointer will be modified to call (or jump to)
> a special function __fnptr_morestack. This will use a target specific
> function calling sequence, and will be implemented as though it were
> itself a function call instruction. That is, all the parameters will
> be set up, and then the code will jump to __fnptr_morestack. The
> __fnptr_morestack function takes two parameters: the function pointer
> to call, and the number of bytes of arguments pushed on the
> stack. (This is not yet implemented.)"""
>
> That paragraph is from your design document (SplitStacks on the GCC
> wiki). I presume that this solution would only work if
> __fnptr_morestack always assumed that the target did not support
> split-stack? Alternatively, I can see having that stub look at the
> function to see if its first instruction was a comparison to the TCB
> stack limit entry (using similar logic to that used by the linker)?
> [also, see below in this e-mail]

So at least I did have a plan, even if I didn't really flesh it out or
actually implement it.  Yes, looking at the first instruction seems like
a good way to tell whether a large stack must be allocated.  I think
this could be fairly efficient when using a function pointer to call
split-stack code, something like 8 extra instructions and a memory load.


>> > More awkwardly, split-stack functions that mention (but do not call)
>> > non-split-stack functions (such as to return their address) are being
>> > mis-flagged by the linker. Honestly, I question whether the linker
>> > fundamentally has enough information about what is going on to be able
>> > to make sufficiently accurate decisions with regards to stack
>> > constraints to warrant the painful abstraction breakage that
>> > split-stack uses. :(
>> 
>> You're right that the linker doesn't really have enough information.
>> But is a split-stack function that returns the address of a
>> non-split-stack function really so frequent that it's worth worrying
>> about?
>
> I guess the question I have is: is one of the goals to make this
> option "safe to turn on for a random project"? Given the abstraction
> break that was made between the compiler and the linker, it would seem
> like this was a rather critically important goal (as now both the
> linker and the compiler are less modular and more difficult to
> modify), but in fact the result doesn't manage to solve seemingly
> simple corner cases.

The abstraction break exists not because I thought it was a good idea,
but bec