Re: weird optimization in sin+cos, x86 backend

2012-02-10 Thread Richard Guenther
On Thu, Feb 9, 2012 at 8:16 PM, Geert Bosch  wrote:
>
> On Feb 9, 2012, at 12:55, Joseph S. Myers wrote:
>
>> No, that's not the case.  Rather, the point would be that both GCC's
>> library and glibc's end up being based on the new GNU project (which might
>> take some code from glibc and some from elsewhere - and quite possibly
>> write some from scratch, taking care to ensure new code is properly
>> commented and all constants are properly explained with free software code
>> available to calculate all the precomputed tables).  (Is
>> sysdeps/ieee754/dbl-64/uatan.tbl in glibc - a half-megabyte file of
>> precomputed numbers - *really* free software?  No doubt it wouldn't be
>> hard to work out what the numbers are from the comment "x0,cij for
>> (1/16,1)" and write a program using MPFR to compute them - but you
>> shouldn't need such an exercise to figure it out, such generated tables
>> should come with free software to generate them since the table itself
>> certainly isn't the preferred form for modification.)
>
> I don't agree having such a libm is the ultimate goal. It could be
> a first step along the way, addressing correctness issues. This
> would be great progress, but does not remove the need for having
> at least versions of common elementary functions directly integrated
> with GCC.
>
> In particular, it would not address the issue of performance. For
> that we need at least a permissive license to allow full inlining,
> but it seems unlikely to me that we'd be able to get glibc code
> under those conditions any time soon.

I don't buy the argument that inlining math routines (apart from those
we already handle) would improve performance.  What will improve
performance is to have separate entry points to the routines
to skip errno handling, NaN/Inf checking or rounding mode selection
when certain compilation flags are set.  That as well as a more
sane calling convention for, for example sincos, or in general
on x86_64 (have at least _some_ callee-saved XMM registers).

The issue with libm in glibc here is that Drepper absolutely does
not want new ABIs in libm - he believes that for example vectorized
routines do not belong there (nor the SSE calling-convention variants
for i686 I tried to push once).

> I'd rather start collecting suitable code in GCC now. If/when a
> libm project materializes, it can take our freely licensed code
> and integrate it. I don't see why we need to wait.
> Still, licensing is not the only issue for keeping this in GCC.

True.  I think we can ignore glibc and rather think of it as something
newlib might use if we're going to put it in toplevel src.

> From a technical point of view, I see many reasons to tightly couple
> implementations of elementary functions with the compiler like we
> do now for basic arithmetic in libgcc. On some targets we will
> want to directly implement elementary functions in the back end,
> such as we do for sqrt on ia64 today.  In other cases we may want
> to do something similar in machine independent code.
>
> I think nowadays it makes as little sense to have sqrt, sin, cos,
> exp and ln and other common simple elementary functions in an
> external library, as it would be for multiplication and division to
> be there. Often we'll want to generate these functions inline taking
> advantage of all knowledge we have about arguments, context, rounding
> modes, required accuracy, etc.
>
> So we need both an accurate libm, and routines with a permissive
> license for integration with GCC.

And a license, that for example allows installing a static library variant
with LTO bytecode embedded so we indeed _can_ inline and re-optimize it.
Of course I expect the fastest paths to be architecture specific assembly
anyway ...

Richard.

>
>  -Geert


Re: Building gcc on Ubuntu 11.10

2012-02-10 Thread Andreas Schwab
Russ Allbery  writes:

> For example, suppose I'm doing development on an amd64 box targeting armel
> and I want to use Kerberos libraries in my armel application.  I'd like to
> be able to install the armel Kerberos libraries on my Debian system using
> regular package management commands, just like any other package.

Just add a --sysroot option to the packager (that also transparently
translates symlinks), case closed.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: Building gcc on Ubuntu 11.10

2012-02-10 Thread Andrew Haley
On 02/10/2012 10:15 AM, Andreas Schwab wrote:
> Russ Allbery  writes:
> 
>> For example, suppose I'm doing development on an amd64 box targeting armel
>> and I want to use Kerberos libraries in my armel application.  I'd like to
>> be able to install the armel Kerberos libraries on my Debian system using
>> regular package management commands, just like any other package.
> 
> Just add a --sysroot option to the packager (that also transparently
> translates symlinks), case closed.

Exactly: the packager should install packages inside the sysroot.
This is so obvious that it's hard to believe anyone would do it
differently.

Andrew.


Re: weird optimization in sin+cos, x86 backend

2012-02-10 Thread Andrew Haley
On 02/10/2012 10:07 AM, Richard Guenther wrote:
> 
> The issue with libm in glibc here is that Drepper absolutely does
> not want new ABIs in libm - he believes that for example vectorized
> routines do not belong there (nor the SSE calling-convention variants
> for i686 I tried to push once).

That's a good reason why we can't do this in glibc.  I accept
the point; it'll have to be GNU libm.

Andrew.


register used as both FP and GP register when -Os switch used

2012-02-10 Thread Paul S
I'm porting gcc 4.6.2 to a 16-bit CPU that has four GP registers. I've 
chosen to allocate R3 as the frame pointer when one is needed.


In line with GCC Internals info on FIXED_REGISTERS ("except on machines 
where that can be used as a general register when no frame pointer is 
needed") I have not marked R3 as fixed. The problem I'm describing below 
doesn't occur when FP (R3) is marked as FIXED.


#define FIXED_REGISTERS \
  { \
     /* r0  r1  r2  r3  sp  pc  cc .*/ \
        0,  0,  0,  0,  1,  1,  1,  1 \
  }

and FP is marked as ELIMINABLE

#define ELIMINABLE_REGS \
  { \
{ ARG_POINTER_REGNUM,   STACK_POINTER_REGNUM}, \
{ ARG_POINTER_REGNUM,   FRAME_POINTER_REGNUM}, \
{ FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM}  \
  }

TARGET_FUNCTION_ARG only allows parameter passing in R0 & R1.

This works perfectly for all optimization levels (-O0...-O3) and both 
-fomit-frame-pointer and -fno-omit-frame-pointer, except for "-Os 
-fno-omit-frame-pointer", where the following test snippet


struct ts
{
int x,z,y;
};

int f( struct ts *s)
{
while( --s->y )
{
s->x *= s->z/s->x;
}
return s->x;
}

produces ..

...
.L3:
        mov     r1,r2           ;# 13   *movhihi/1      [length = 2]
-->     mov     r3,[r3+2]       ;# 51   *movhihi/3      [length = 4]
        mov     r0,[r3+2]       ;# 14   *movhihi/3      [length = 4]
        call    ___divhi3       ;# 15   call_value_label/1      [length = 4]
        mul     r2,r0           ;# 17   mulhi3/2        [length = 2]
        mov     r0,[r3+2]       ;# 52   *movhihi/3      [length = 4]
        str     r2,[r0]         ;# 18   *movhihi/4      [length = 4]
.L2:
        inc     [r3],#-1        ;# 65   inchi3_cc       [length = 2]
-->     mov     r3,[r3+2]       ;# 55   *movhihi/3      [length = 4]
        mov     r2,[r3]         ;# 41   *movhihi/3      [length = 4]
        bne     .L3             ;# 24   cbranchcc4      [length = 6]
...

I have marked with --> the lines that use r3 as a GP register and 
clobber it. The corresponding RTL is ...


(insn 51 13 14 3 (set (reg:HI 3)
        (mem/c:HI (plus:HI (reg/f:HI 3)
                (const_int 2 [0x2])) [4 %sfp+2 S2 A16])) t.c:14 88 {*movhihi}
     (nil))
(insn 14 51 15 3 (set (reg:HI 0)
        (mem/s:HI (plus:HI (reg:HI 3)
                (const_int 2 [0x2])) [2 s_1(D)->z+0 S2 A16])) t.c:14 88 {*movhihi}
     (nil))
(call_insn/u 15 14 16 3 (set (reg:HI 0)
        (call (mem:HI (symbol_ref:HI ("__divhi3") [flags 0x41]) [0 S2 A8])

When -O2 is used, the offending fragment is correct ...


.L4:
        mov     r1,[r3]         ;# 23   *movhihi/3      [length = 4]
        mov     r0,[r3+2]       ;# 24   *movhihi/3      [length = 4]
        call    ___divhi3       ;# 25   call_value_label/1      [length = 4]
        mov     r1,[r3]         ;# 64   *movhihi/3      [length = 4]
        mul     r1,r0           ;# 28   mulhi3/2        [length = 2]
        str     r1,[r3]         ;# 65   *movhihi/4      [length = 4]
        add     r2,#-1          ;# 29   addhi3_cc/3     [length = 2]
        bne     .L4             ;# 32   cbranchcc4      [length = 6]


I wonder if anyone can provide some hints before I waste time hunting 
down an optimisation bug that is really a problem with my configuration?


Cheers, Paul


Re: weird optimization in sin+cos, x86 backend

2012-02-10 Thread James Courtier-Dutton
On 10 February 2012 10:42, Andrew Haley  wrote:
> On 02/10/2012 10:07 AM, Richard Guenther wrote:
>>
>> The issue with libm in glibc here is that Drepper absolutely does
>> not want new ABIs in libm - he believes that for example vectorized
>> routines do not belong there (nor the SSE calling-convention variants
>> for i686 I tried to push once).
>
> That's a good reason why we can't do this in glibc.  I accept
> the point; it'll have to be GNU libm.
>
> Andrew.

I think a starting point would be at least documenting correctly the
accuracy of the current libm, because what is currently in the
documents is obviously wrong.
It certainly does not document that sin() is currently more accurate
than sinl() on x86_64 platforms, even with -O0.
I think the documentation should be along the lines of:
Function name: sin(x)
-pi/2 < x < pi/2, accuracy is +-N
pi/2 < x < large number, accuracy is +-N
large number < x < max_float, accuracy is +-N
If a performance figure could also be added, so much the better. We
could then know that implementation 1 is quicker than implementation 2
and the programmer can make the trade-offs.

And have the figures repeated for various options such as -ffast-math
etc, and also repeated for different CPUs etc.

Once we have the documentation, we can then ensure, using automated
testing, that the libm actually performs as designed.
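
A rough sketch of such a test (the helper is mine, not existing code;
assumes MPFR is installed, link with -lmpfr -lgmp -lm) could measure a
function's error in ulps against a high-precision reference:

#include <math.h>
#include <stdio.h>
#include <gmp.h>
#include <mpfr.h>

/* Error of sin(x) in ulps, measured against a 200-bit MPFR reference. */
static double
sin_ulp_error (double x)
{
  mpfr_t t;
  double got, err, ulp;

  mpfr_init2 (t, 200);
  mpfr_set_d (t, x, MPFR_RNDN);
  mpfr_sin (t, t, MPFR_RNDN);          /* high-precision reference */

  got = sin (x);
  mpfr_sub_d (t, t, got, MPFR_RNDN);   /* reference minus libm result */
  err = mpfr_get_d (t, MPFR_RNDN);
  ulp = nextafter (got, INFINITY) - got;  /* one ulp at the result */
  mpfr_clear (t);
  return fabs (err) / ulp;
}

int
main (void)
{
  printf ("sin(0.5):  %.3g ulp\n", sin_ulp_error (0.5));
  printf ("sin(1e22): %.3g ulp\n", sin_ulp_error (1.0e22));
  return 0;
}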


Re: weird optimization in sin+cos, x86 backend

2012-02-10 Thread Joseph S. Myers
On Fri, 10 Feb 2012, Richard Guenther wrote:

> I don't buy the argument that inlining math routines (apart from those
> we already handle) would improve performance.  What will improve
> performance is to have separate entry points to the routines
> to skip errno handling, NaN/Inf checking or rounding mode selection
> when certain compilation flags are set.  That as well as a more
> sane calling convention for, for example sincos, or in general
> on x86_64 (have at least _some_ callee-saved XMM registers).

glibc has some extra entry points for -ffinite-math-only in 2.15.

> The issue with libm in glibc here is that Drepper absolutely does
> not want new ABIs in libm - he believes that for example vectorized
> routines do not belong there (nor the SSE calling-convention variants
> for i686 I tried to push once).

And this fits in with the general principle that glibc's libm is for 
general-purpose use (balancing speed, size, accuracy etc.) and it makes 
sense to have the extra variants in a separate library (that won't 
necessarily be loaded into every program linked with libstdc++ or with a 
computation of a square root that's not at all performance critical, for 
example); being conservative about extra interfaces in a basic system 
library does make sense.

One key difference with the -ffinite-math-only entry points is that they 
generally are just exported names for functions that already existed - the 
code was already structured to do checks for errors / exceptional values 
and then call the main function for the no-error case as needed.
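
Schematically, that structure looks something like this (a sketch with
made-up names, not glibc's actual code):

#include <errno.h>
#include <math.h>

/* Core routine: assumes a finite argument, does no errno or exception
   bookkeeping.  This is what a finite-math entry point can expose
   directly (stand-in body, for illustration only).  */
static double
my_exp_core (double x)
{
  return __builtin_exp (x);
}

/* Public wrapper: the error handling is layered on top of the core,
   so exporting the core under a second name costs almost nothing.  */
double
my_exp (double x)
{
  double r = my_exp_core (x);
  if (!isfinite (r) && isfinite (x))
    errno = ERANGE;   /* overflow or underflow */
  return r;
}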

-- 
Joseph S. Myers
jos...@codesourcery.com


Documenting libm function errors

2012-02-10 Thread Joseph S. Myers
[Including libc-alpha on discussion starting on the gcc list.]

On Fri, 10 Feb 2012, James Courtier-Dutton wrote:

> I think a starting point would be at least documenting correctly the
> accuracy of the current libm, because what is currently in the
> documents is obviously wrong.

To the extent that there is documentation I think some of it is 
automatically generated from the ULPs files in the glibc testsuite 
(manual/libm-err-tab.pl).  Apart from the limited information that can be 
conveyed by a single number if accuracy varies over the domain of the 
function, it's also likely that functions may have a theoretical 
justification that gives error bounds that can be relied upon more than 
the ones in the ULPs files.  Also, libm-err-tab.pl hardcodes a list of 
platforms - it would be better for each platform's name for the manual to 
go somewhere in its sysdeps directory - and it also seems unfortunate for 
the output generated to depend on whether or not the ports add-on is 
present (although I don't see a good way to avoid that with this 
approach).

> It certainly does not document that sin() is currently more accurate
> than sinl() on x86_64 platforms, even with -O0.
> I think the documentation should be along the lines of:
> Function name: sin(x)
> -pi/2 < x < pi/2, accuracy is +-N
> pi/2 < x < large number, accuracy is +-N
> large number < x < max_float, accuracy is +-N
> If a performance figure could also be added, so much the better. We
> could then know that implementation 1 is quicker than implementation 2
> and the programmer can make the trade-offs.

That sort of thing would make sense - though it would be a lot of work to 
add for existing functions.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: weird optimization in sin+cos, x86 backend

2012-02-10 Thread Andrew Haley
On 02/10/2012 01:30 PM, James Courtier-Dutton wrote:
> On 10 February 2012 10:42, Andrew Haley  wrote:
> 
> I think a starting point would be at least documenting correctly the
> accuracy of the current libm, because what is currently in the
> documents is obviously wrong.
> It certainly does not document that sin() is currently more accurate
> than sinl() on x86_64 platforms, even with -O0.

It isn't.

  cout << scientific << setprecision(20) << sin(0.5) << "\n";
  cout << scientific << setprecision(20) << sinl(0.5) << "\n";
  cout << Sin(Real("0.5")).toString(20,10) << "\n";

gives:

4.79425538604203005377e-01
4.79425538604203000282e-01
4.79425538604203000273e-01

The last one is from MPFR.

In most cases you'll get more accurate results from sinl().

Andrew.


Re: weird optimization in sin+cos, x86 backend

2012-02-10 Thread James Courtier-Dutton
On 10 February 2012 14:05, Andrew Haley  wrote:
> On 02/10/2012 01:30 PM, James Courtier-Dutton wrote:
>> On 10 February 2012 10:42, Andrew Haley  wrote:
>>
>> I think a starting point would be at least documenting correctly the
>> accuracy of the current libm, because what is currently in the
>> documents is obviously wrong.
>> It certainly does not document that sin() is currently more accurate
>> than sinl() on x86_64 platforms, even with -O0.
>
> It isn't.
>
>  cout << scientific << setprecision(20) << sin(0.5) << "\n";
>  cout << scientific << setprecision(20) << sinl(0.5) << "\n";
>  cout << Sin(Real("0.5")).toString(20,10) << "\n";
>
> gives:
>
> 4.79425538604203005377e-01
> 4.79425538604203000282e-01
> 4.79425538604203000273e-01
>
> The last one is from MPFR.
>
> In most cases you'll get more accurate results from sinl().
>

but if you replace 0.5 with 1.0e22 ?


Re: weird optimization in sin+cos, x86 backend

2012-02-10 Thread Andrew Haley
On 02/10/2012 02:24 PM, James Courtier-Dutton wrote:
> On 10 February 2012 14:05, Andrew Haley  wrote:
>> On 02/10/2012 01:30 PM, James Courtier-Dutton wrote:
>>> On 10 February 2012 10:42, Andrew Haley  wrote:
>>>
>>> I think a starting point would be at least documenting correctly the
>>> accuracy of the current libm, because what is currently in the
>>> documents is obviously wrong.
>>> It certainly does not document that sin() is currently more accurate
>>> than sinl() on x86_64 platforms, even with -O0.
>>
>> It isn't.
>>
>>  cout << scientific << setprecision(20) << sin(0.5) << "\n";
>>  cout << scientific << setprecision(20) << sinl(0.5) << "\n";
>>  cout << Sin(Real("0.5")).toString(20,10) << "\n";
>>
>> gives:
>>
>> 4.79425538604203005377e-01
>> 4.79425538604203000282e-01
>> 4.79425538604203000273e-01
>>
>> The last one is from MPFR.
>>
>> In most cases you'll get more accurate results from sinl().
> 
> but if you replace 0.5 with 1.0e22 ?

Then it's less accurate.  In most cases, however, the long double
version wins.  Your statement is not true: in most reasonable cases
sinl() is currently more accurate than sin(), and it does readers of
this list a disservice to suggest otherwise.  In the case of absurdly
large arguments it's less accurate, but those cases are less likely to
be significant for real-world computation.

Andrew.


Re: weird optimization in sin+cos, x86 backend

2012-02-10 Thread James Courtier-Dutton
On 10 February 2012 14:36, Andrew Haley  wrote:
> On 02/10/2012 02:24 PM, James Courtier-Dutton wrote:
>> On 10 February 2012 14:05, Andrew Haley  wrote:
>>> On 02/10/2012 01:30 PM, James Courtier-Dutton wrote:
>>>> On 10 February 2012 10:42, Andrew Haley  wrote:
>>>>
>>>> I think a starting point would be at least documenting correctly the
>>>> accuracy of the current libm, because what is currently in the
>>>> documents is obviously wrong.
>>>> It certainly does not document that sin() is currently more accurate
>>>> than sinl() on x86_64 platforms, even with -O0.
>>>
>>> It isn't.
>>>
>>>  cout << scientific << setprecision(20) << sin(0.5) << "\n";
>>>  cout << scientific << setprecision(20) << sinl(0.5) << "\n";
>>>  cout << Sin(Real("0.5")).toString(20,10) << "\n";
>>>
>>> gives:
>>>
>>> 4.79425538604203005377e-01
>>> 4.79425538604203000282e-01
>>> 4.79425538604203000273e-01
>>>
>>> The last one is from MPFR.
>>>
>>> In most cases you'll get more accurate results from sinl().
>>
>> but if you replace 0.5 with 1.0e22 ?
>
> Then it's less accurate.  In most cases, however, the long double
> version wins.  Your statement is not true: in most reasonable cases
> sinl() is currently more accurate than sin(), and it does readers of
> this list a disservice to suggest otherwise.  In the case of absurdly
> large arguments it's less accurate, but those cases are less likely to
> be significant for real-world computation.
>

I was trying to make the point that the accuracy varies depending on
input values, and this fact should be documented.
I.e. "accurate to 1 ulp for x in the range of ...",
not just "sin(x) is accurate to 1 ulp".
You say "In most reasonable cases sinl() is currently more accurate than sin()".
I am suggesting that we specify the range of x considered to be reasonable.
The term "reasonable" seems to me to be inexact.


[pph] Merge from trunk

2012-02-10 Thread Diego Novillo


I've merged trunk rev 183972 into the pph branch.


Diego.


Re: GCC GCOV

2012-02-10 Thread David Malcolm
On Thu, 2012-02-09 at 15:52 -0800, Satya Prakash Prasad wrote:
> Hi All,
> 
> I am a new member of this group and have been a C/C++ developer for
> around 2 years. What interests me most is the gcc/gcov combination
> output. It lists the code execution details.
> 
> Is there a possibility that gcc-built binaries can print the file
> name:line number of the code they are executing at run time?
> Something like what gcov produces after code execution, but it has to
> be in order of execution to benefit projects built from more than one
> source code unit. So can the gcov part be built in by default and the
> results be available at runtime in sequence of execution?
> 
> Does this require modification of gcc code - any pointers would help
> me a lot?

If I'm understanding you correctly, this seems like a nice feature to
have for people learning to program in C/C++, but an unacceptably high
speed cost for most other uses.  For example, it's nice to be able to
watch the order in which constructors and destructors get invoked when
learning C++.

A GCC plugin is probably the most appropriate way to implement this: you
could write a new pass that walks the gimple statements, and injects new
print statements to add the diagnostics that you want.
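
A bare-bones skeleton of such a pass (a sketch against the 4.6-era
plugin API, untested; the pass and plugin names are made up, and it
only prints locations at compile time rather than injecting runtime
calls, which would be the real pass's job):

#include "gcc-plugin.h"
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tree.h"
#include "basic-block.h"
#include "gimple.h"
#include "tree-pass.h"

int plugin_is_GPL_compatible;

static unsigned int
trace_lines_execute (void)
{
  basic_block bb;
  gimple_stmt_iterator gsi;

  /* Visit every gimple statement in the current function; a real
     implementation would build a call to a runtime tracing function
     here and insert it with gsi_insert_before ().  */
  FOR_EACH_BB (bb)
    for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
      {
        gimple stmt = gsi_stmt (gsi);
        if (gimple_has_location (stmt))
          fprintf (stderr, "%s:%d\n",
                   gimple_filename (stmt), gimple_lineno (stmt));
      }
  return 0;
}

static struct gimple_opt_pass pass_trace_lines =
{
  {
    GIMPLE_PASS,
    "trace-lines",        /* name */
    NULL,                 /* gate: always run */
    trace_lines_execute,  /* execute */
    NULL, NULL,           /* sub, next */
    0,                    /* static_pass_number */
    TV_NONE,              /* tv_id */
    PROP_cfg,             /* properties_required */
    0, 0, 0, 0            /* provided, destroyed, todo_start, todo_finish */
  }
};

int
plugin_init (struct plugin_name_args *info, struct plugin_gcc_version *ver)
{
  struct register_pass_info pass_info;

  pass_info.pass = &pass_trace_lines.pass;
  pass_info.reference_pass_name = "cfg";
  pass_info.ref_pass_instance_number = 1;
  pass_info.pos_op = PASS_INSERT_AFTER;
  register_callback (info->base_name, PLUGIN_PASS_MANAGER_SETUP,
                     NULL, &pass_info);
  return 0;
}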

I would shamelessly promote my python plugin at this point [1] :) which
in theory would make doing this perhaps a 20-30 line python script, but
it doesn't yet support modifying gcc's internal representation, merely
reading it - but that could change.

Hope this is helpful
Dave

[1] https://fedorahosted.org/gcc-python-plugin/



Re: weird optimization in sin+cos, x86 backend

2012-02-10 Thread Geert Bosch

On Feb 9, 2012, at 15:33, Joseph S. Myers wrote:
> For a few, yes, inline support (such as already exists for some functions 
> on some targets) makes sense.  But for some more complicated cases it 
> seems plausible that LTO information in a library might be an appropriate 
> way of inlining while allowing the functions to be written directly in C.  
> (Or they could be inline in a header provided with GCC, but that would 
> only help for use in C and C++ code.)
But neither LTO nor header-based inlining would be compatible with LGPL, right?

>> In particular, it would not address the issue of performance. For 
>> that we need at least a permissive license to allow full inlining, 
>> but it seems unlikely to me that we'd be able to get glibc code 
>> under those conditions any time soon.
> 
> Indeed, if we can't then other sources would need using for the functions.  
> But it seems worth explaining the issues to RMS - that is, that there is a 
> need for permissively licensed versions of these functions in GNU, the 
> question is just whether new ones are written or taken from elsewhere or 
> whether glibc versions can be used for some functions.

I'm skeptical that talking to RMS will result in relicensing of these glibc
functions. If we're writing new ones (or incorporating libraries such as
fdlibm), we don't need permission.

>> I'd rather start collecting suitable code in GCC now. If/when a 
> 
> Whereas I'd suggest: start collecting in a separate project and have GCC 
> importing from that project from the start.
That seems mostly like a matter of terminology. We can declare that
the new libm project currently lives in gcc/libm, has a webpage at
gcc.gnu.org/wiki/LibM and uses the GCC mailing lists with [libm] in
the subject. It's fine with me either way.

>> libm project materializes, it can take our freely licensed code 
>> and integrate it. I don't see why we need to wait. 
> 
> I don't think of it as waiting - I think of it as collecting the code now, 
> but in an appropriate place, and written in an appropriate way (using 
> types such as binary32/binary64/binary128 rather than particular names 
> such as float/double/long double, recording assumptions such as whether 
> the code will work on x87 with excess precision and what it needs for 
> rounding modes / exceptions / subnormals support).  Aim initially for GCC 
> use, but try to put things in a more generally usable form.

Agreed, we probably should use such types regardless of where
the library resides. Note that for some functions we may want to
have dozens of implementations, or rather a few implementations
that can be customized in myriad ways. In all cases I would expect
we would provide interfaces at the source level, or as a precompiled
static library with LTO bytecode, in addition to libraries using
the existing libm interface.

>> I think nowadays it makes as little sense to have sqrt, sin, cos,
>> exp and ln and other common simple elementary functions in an
>> external library, as it would be for multiplication and division to
>> be there. Often we'll want to generate these functions inline taking
> 
> I've previously suggested GNU libm (rather than glibc) as a natural master 
> home for the soft-fp code that does provide multiplication and division 
> for processors without them in hardware.
> 
> (Different strategies would obviously be optimal for many functions on 
> soft-float targets, although I doubt there's much demand for implementing 
> them since if you care much for floating-point performance you'll be using 
> a processor with hardware floating point.)


The same reasoning applies here: it would be best if we have a flexible
interface with the compiler, so we can for example have entry points
that accept and return soft-floats in "unpacked" form. Whether performance
of soft-float is important or not is debatable.

  -Geert


Re: weird optimization in sin+cos, x86 backend

2012-02-10 Thread Geert Bosch

On Feb 10, 2012, at 05:07, Richard Guenther wrote:

> On Thu, Feb 9, 2012 at 8:16 PM, Geert Bosch  wrote:
>> I don't agree having such a libm is the ultimate goal. It could be
>> a first step along the way, addressing correctness issues. This
>> would be great progress, but does not remove the need for having
>> at least versions of common elementary functions directly integrated
>> with GCC.
>> 
>> In particular, it would not address the issue of performance. For
>> that we need at least a permissive license to allow full inlining,
>> but it seems unlikely to me that we'd be able to get glibc code
>> under those conditions any time soon.
> 
> I don't buy the argument that inlining math routines (apart from those
> we already handle) would improve performance.  What will improve
> performance is to have separate entry points to the routines
> to skip errno handling, NaN/Inf checking or rounding mode selection
> when certain compilation flags are set.  That as well as a more
> sane calling convention for, for example sincos, or in general
> on x86_64 (have at least _some_ callee-saved XMM registers).

I'm probably oversimplifying a bit, but I see extra entry points as
being similar to inlining. When you write sin (x), this is a function 
not only of x, but also of the implicit rounding mode and special 
checking options. With that view, specializing the sin function for 
round-to-even is essentially a form of partial inlining.

Also, evaluation of a single math function typically has high
latency as most instructions depend on results of previous instructions.
For optimal throughput, you'd really like to be able to schedule
these instructions. Anyway, those are implementation details.

The main point is that we don't want to be bound by a
frozen interface with the math routines. We'll want to change
the interface when we have new functionality or optimizations.

> The issue with libm in glibc here is that Drepper absolutely does
> not want new ABIs in libm - he believes that for example vectorized
> routines do not belong there (nor the SSE calling-convention variants
> for i686 I tried to push once).

Right. I even understand where he is coming from. Adding new interfaces
is indeed a big deal as they'll pretty much have to stay around forever.
We need something more flexible that is tied to the compiler and not
a fixed interface that stays constant over time. Even if we'd add things
to glibc now, it takes many years for that to trickle down. Of course,
a large number of GCC users don't use glibc at all.

>> I'd rather start collecting suitable code in GCC now. If/when a
>> libm project materializes, it can take our freely licensed code
>> and integrate it. I don't see why we need to wait.
>> Still, licensing is not the only issue for keeping this in GCC.
> 
> True.  I think we can ignore glibc and rather think of it as something
> newlib might use if we're going to put it in toplevel src.

Agreed.

>> So we need both an accurate libm, and routines with a permissive
>> license for integration with GCC.
> 
> And a license, that for example allows installing a static library variant
> with LTO bytecode embedded so we indeed _can_ inline and re-optimize it.
> Of course I expect the fastest paths to be architecture specific assembly
> anyway ...

Yes, that seems like a good plan. Now, how do we start this?
As a separate project or just as part of GCC?

  -Geert



Combine misses commutativity

2012-02-10 Thread Paulo J. Matos
Hi,

I just noticed something strange with my iorqi3 rule. 
I have the following:

(define_insn "iorqi3"
  [(set (match_operand:QI 0 "register_operand" "=c")
(ior:QI (match_operand:QI 1 "register_operand" "%0")
(match_operand:QI 2 "general_operand" "cwmi")))
   (clobber (reg:CC RCC))]
  ""
  "or\\t%0,%f2")


However, there's a failure to combine looking like:
(parallel [
    (set (reg:QI 1 AL)
        (ior:QI (mem/c/i:QI (reg/f:QI 4 AP) [2 y+0 S1 A16])
            (reg:QI 30 [ x+1 ])))
    (clobber (reg:CC 13 CC))
])

I am pretty sure that this combine should be successful. It's just a 
matter of inverting the operands. This is proved by the fact that if, 
instead of the above, I have:
(define_insn "iorqi3"
  [(set (match_operand:QI 0 "register_operand" "=c")
(ior:QI (match_operand:QI 1 "register_operand" "0")
(match_operand:QI 2 "general_operand" "cwmi")))
   (clobber (reg:CC RCC))]
  ""
  "or\\t%0,%f2")

(define_insn "*iorqi3_inv"
  [(set (match_operand:QI 0 "register_operand" "=c")
(ior:QI (match_operand:QI 1 "general_operand" "cwmi")
(match_operand:QI 2 "register_operand" "0")))
   (clobber (reg:CC RCC))]
  ""
  "or\\t%0,%f1")

combine reports success:
Successfully matched this instruction:
(parallel [
    (set (reg:QI 1 AL)
        (ior:QI (mem/c/i:QI (reg/f:QI 4 AP) [2 y+0 S1 A16])
            (reg:QI 30 [ x+1 ])))
    (clobber (reg:CC 13 CC))
])

However, duplicating the instructions and inverting operand order seems 
to defeat the purpose of '%'. So, what's the catch? Or is it a genuine 
bug?

Cheers,

-- 
PMatos



Re: weird optimization in sin+cos, x86 backend

2012-02-10 Thread Joseph S. Myers
On Fri, 10 Feb 2012, Geert Bosch wrote:

> On Feb 9, 2012, at 15:33, Joseph S. Myers wrote:
> > For a few, yes, inline support (such as already exists for some functions 
> > on some targets) makes sense.  But for some more complicated cases it 
> > seems plausible that LTO information in a library might be an appropriate 
> > way of inlining while allowing the functions to be written directly in C.  
> > (Or they could be inline in a header provided with GCC, but that would 
> > only help for use in C and C++ code.)
> But neither LTO nor header-based inlining would be compatible with LGPL, 
> right?

Indeed.  Either the LGPL+exception used in soft-fp, or a generic 
permissive license, would be more appropriate there.  (GPL+exception as 
for most of libgcc should also work, though I don't think there's any of 
that in glibc at present.)

> > Indeed, if we can't then other sources would need using for the functions.  
> > But it seems worth explaining the issues to RMS - that is, that there is a 
> > need for permissively licensed versions of these functions in GNU, the 
> > question is just whether new ones are written or taken from elsewhere or 
> > whether glibc versions can be used for some functions.
> 
> I'm skeptical that talking to RMS will result in relicensing of these glibc
> functions. If we're writing new ones (or incorporate libraries such as 
> fdlibm),
> we don't need permission.

Relicensing of soft-fp to LGPL+exception was approved by RMS.

There are certainly some justifications for a relicensing that might not 
be sufficient - that a change would cause some software to be more widely 
used, for example, may not be a justification by itself - but the 
justification here is more in terms of the GNU system working better 
together as a whole, that permissively licensed versions of many of the 
functions already exist, and especially if as you suggest you don't want 
to maintain a stable ABI to the functions called from GCC (so many would 
only be linked in statically) permissive licensing is needed to meet the 
expectations people have for use of code built with GCC.  You may of 
course get approval for specific concrete functions rather than more 
generally, if you identify particular functions where the glibc 
implementations are of interest - and there may be functions not owned by 
the FSF, or where the assignments to the FSF do not permit a more 
permissive relicensing.  There may also be functions assigned to the FSF 
by a few big contributors who could grant permissive licenses themselves 
if they wish (where there haven't been significant changes to those 
functions after the contribution).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Combine misses commutativity

2012-02-10 Thread Paulo J. Matos


On Fri, 10 Feb 2012 16:57:48 +, Paulo J. Matos wrote:
 
> However, duplicating the instructions and inverting operand order seems
> to defeat the purpose of '%'. So, what's the catch? Or is it a genuine
> bug?

I just understood my misunderstanding above. '%' is part of the 
constraints, which as far as I know are only taken into account by the 
ira/reload stuff, etc. Combine doesn't look at it and therefore misses 
this information.

Now, is there any way to hint at combine that ior is commutative without 
having more permissive predicates?

-- 
PMatos



Re: weird optimization in sin+cos, x86 backend

2012-02-10 Thread Paweł Sikora
On Friday 10 of February 2012 13:30:25 James Courtier-Dutton wrote:
> On 10 February 2012 10:42, Andrew Haley  wrote:
> > On 02/10/2012 10:07 AM, Richard Guenther wrote:
> >>
> >> The issue with libm in glibc here is that Drepper absolutely does
> >> not want new ABIs in libm - he believes that for example vectorized
> >> routines do not belong there (nor the SSE calling-convention variants
> >> for i686 I tried to push once).
> >
> > That's a good reason why we can't do this in glibc.  I accept
> > the point; it'll have to be GNU libm.
> >
> > Andrew.
> 
> I think a starting point would be at least documenting correctly the
> accuracy of the current libm, because what is currently in the
> documents is obviously wrong.
> It certainly does not document that sin() is currently more accurate
> than sinl() on x86_64 platforms, even with -O0.
> I think the documentation should be along the lines of:
> Function name: sin(x)
> -pi/2 < x < pi/2, accuracy is +-N
> pi/2 < x < large number, accuracy is +-N
> large number < x < max_float, accuracy is +-N
> If a performance figure could also be added, so much the better. We
> could then know that implementation 1 is quicker than implementation 2
> and the programmer can make the trade-offs.
> 
> And have the figures repeated for various options such as -ffast-math
> etc, and also repeated for different CPUs etc.

It would also be nice to see functions for reducing argument range in the 
public api.
Then the end-user can use e.g. sin(reduce(x)) to get the best precision
with some declared cpu overhead.

> Once we have the documentation, we can then ensure, using automated
> testing, that the libm actually performs as designed.



Re: weird optimization in sin+cos, x86 backend

2012-02-10 Thread Joseph S. Myers
On Fri, 10 Feb 2012, Geert Bosch wrote:

> Right. I even understand where he is coming from. Adding new interfaces
> is indeed a big deal as they'll pretty much have to stay around forever.

And: even if the interface is a known, public, standard, stable interface, 
glibc may still not be the right place.

Consider the ISO 24747 special mathematical functions.  This is an 
extension to ISO C (there's a similar extension to C++ with the functions 
that were in TR1).  No doubt implementations would be useful to some 
people, though I haven't heard of any interest in implementing them for 
the GNU system, and a good implementation (care about error bounds and 
performance etc.) would be a lot of work.  (There is an implementation of 
the TR1 versions in libstdc++.  It doesn't have anything much in the way 
of error or performance bounds - it's a generic implementation without 
precomputed tables, that can involve very long loops calling other 
functions with such long loops in some cases, and I expect various 
precomputed tables would be involved in getting particularly good error 
bounds and performance - but I think simply implementing the functions at 
all involved a lot of work.)

But if you implemented the functions in libm in a fast, accurate way, then 
suddenly all the code is loaded into everything that uses shared libm, and 
hardly any such programs actually need these functions - cf. the argument 
for avoiding decimal floating-point support in shared libc or libgcc when 
hardly any typical GNU/Linux programs are using that.  As a well-defined 
set of functions a separate shared library probably makes more sense.  
And given that nothing else in glibc depends on them, there's no need for 
such a library to be part of the glibc project, which may as well keep a 
narrower focus.  (Bits of libc *do* depend on a few bits of functionality 
from libm, which get built into libc as well; some things such as rounding 
mode support and separating numbers into mantissa and exponent are 
relevant for libc functionality such as printf and strtod.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: weird optimization in sin+cos, x86 backend

2012-02-10 Thread Andrew Haley
On 02/10/2012 05:31 PM, Paweł Sikora wrote:
> It would also be nice to see functions for reducing argument range in the
> public api.
> Then the end-user can use e.g. sin(reduce(x)) to get the best precision
> with some declared cpu overhead.

Hmm.  I'm not sure this is such a terrific idea: each function has its
own argument reduction (e.g. to [0, pi/4], [0, pi/8] or even [0, some
weird constant that looks arbitrary but just happens to be exactly
representable in binary]).  You really don't want double rounding,
which is what would happen with sin(reduce(x)).
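
To illustrate (a sketch; it reduces with 2*pi rounded to double, which
is about the best a generic reduce() returning double could do):

#include <math.h>
#include <stdio.h>

int
main (void)
{
  double x = 1.0e22;
  double twopi = 6.283185307179586;   /* 2*pi rounded to double */

  /* Generic reduction in double: the rounding error in twopi is
     scaled by x/twopi, so nothing of the true remainder survives.  */
  printf ("sin(fmod(x, 2pi)) = %.17g\n", sin (fmod (x, twopi)));
  /* libm instead reduces internally with a many-hundred-bit 2/pi.  */
  printf ("sin(x)            = %.17g\n", sin (x));
  return 0;
}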

Andrew.


Re: weird optimization in sin+cos, x86 backend

2012-02-10 Thread Paweł Sikora
On Friday 10 of February 2012 17:41:49 Andrew Haley wrote:
> On 02/10/2012 05:31 PM, Paweł Sikora wrote:
> > It would also be nice to see functions for reducing argument range in
> > the public api.
> > Then the end-user can use e.g. sin(reduce(x)) to get the best precision
> > with some declared cpu overhead.
> 
> Hmm.  I'm not sure this is such a terrific idea: each function has its
> own argument reduction (e.g. to [0, pi/4], [0, pi/8] or even [0, some
> weird constant that looks arbitrary but just happens to be exactly
> representable in binary]).  You really don't want double rounding,
> which is what would happen with sin(reduce(x)).

Hmm, maybe the math functions should contain an additional
'bool reduce' argument with some default (./configure-ed) value?
With such an approach the end-user can avoid double rounding
and choose the desired behaviour.



Re: Building gcc on Ubuntu 11.10

2012-02-10 Thread Russ Allbery
Andreas Schwab  writes:
> Russ Allbery  writes:

>> For example, suppose I'm doing development on an amd64 box targeting
>> armel and I want to use Kerberos libraries in my armel application.
>> I'd like to be able to install the armel Kerberos libraries on my
>> Debian system using regular package management commands, just like any
>> other package.

> Just add a --sysroot option to the packager (that also transparently
> translates symlinks), case closed.

While this addresses the cross development case, it doesn't address the
multiple native ABI case.  It's elegant to be able to use the same
solution to address multiple problems, particularly since there are other
limitations with --sysroot.  For example, I'd like apt-get upgrade, my
pinning, my repository preferences, and so forth to apply to *all* of my
installed packages rather than having to duplicate that setup work inside
various alternative package roots.  Which you can do with --sysroot, of
course, by adding more complexity to the packaging system and having it
track all the --sysroots that you've used, but with multiarch you get
those properties for free.

Anyway, I'll stop discussing this here, as it's not really on topic.  I
just wanted to provide some background, since I realize on the surface
it's a somewhat puzzling decision.

-- 
Russ Allbery (r...@stanford.edu) 


Re: Combine misses commutativity

2012-02-10 Thread Richard Henderson
On 02/10/2012 08:57 AM, Paulo J. Matos wrote:
> However, there's a failure to combine looking like:
> (parallel [
> (set (reg:QI 1 AL)
> (ior:QI (mem/c/i:QI (reg/f:QI 4 AP) [2 y+0 S1 A16])
> (reg:QI 30 [ x+1 ])))
> (clobber (reg:CC 13 CC))
> ])

Why do you think that combine should create this?

Really, I'd have expected swap_commutative_operands_p to have
put things around the other way.  That's the way that almost
all cisc machines represent their memory operands.

... Although I don't see any specific test for this in the
code for commutative_operand_precedence.  That's probably at
least a think-o.


r~


Re: Memory corruption due to word sharing

2012-02-10 Thread Richard Henderson
On 02/03/2012 12:00 PM, Linus Torvalds wrote:
> do {
>   load-link %r,%m
>   if (r == value)
>     return 0;
>   add
> } while (store-conditional %r,%m)
> return 1;
> 
> and it is used to implement two *very* common (and critical)
> reference-counting use cases:
> 
>  - decrement ref-count locklessly, and if it hits free, take a lock
> atomically (ie "it would be a bug for anybody to ever see it decrement
> to zero while holding the lock")
> 
>  - increment ref-count locklessly, but if we race with the final free,
> don't touch it, because it's gone (ie "if somebody decremented it to
> zero while holding the lock, we absolutely cannot increment it again")
> 
> They may sound odd, but those two operations are absolutely critical
> for most lock-less refcount management. And taking locks for reference
> counting is absolutely death to performance, and is often impossible
> anyway (the lock may be a local lock that is *inside* the structure
> that is being reference-counted, so if the refcount is zero, then you
> cannot even look at the lock!)
> 
> In the above example, the load-locked -> store-conditional would
> obviously be a cmpxchg loop instead on architectures that do cmpxchg
> instead of ll/sc.
> 
> Of course, it you expose some intrinsic for the whole "ll/sc" model
> (and you then turn it into cmpxchg on demand), we could literally
> open-code it.

We can't expose the ll/sc model directly, due to various target-specific
hardware constraints (e.g. no other memory references, i.e. no spills).
But the "weak" compare-exchange is sorta-kinda close.

A "weak" compare-exchange is a ll/sc pair, without the loop for restart
on sc failure.  Thus a weak compare-exchange can fail for arbitrary reasons.
You do, however, get back the current value in the memory, so you can loop
back yourself and try again.

So that loop above becomes the familiar expression as for CAS:

  r = *m;
  do {
    if (r == value)
      return;
    n = r + inc;
  } while (!atomic_compare_exchange (m, &r, n, /*weak=*/true,
                                     __ATOMIC_RELAXED, __ATOMIC_RELAXED));

which, unlike with the current __sync_val_compare_and_swap, does not
result in a nested loop for the ll/sc architectures.

Yes, the ll/sc architectures can technically do slightly better by writing
this as inline asm, but the gain is significantly less than before.  Given
that the line must be in L1 cache already, the redundant load from M in
the ll insn should be nearly negligible.

Of course for architectures that implement cas, there is no difference
between "weak" and "strong".


r~


Re: GCC GCOV

2012-02-10 Thread Satya Prakash Prasad
Thanks for the info Dave. I downloaded the tar ball but am facing
issues while building it:

prompt:>~/shared_scripts/bin/gcc-py-plugin/gcc-python-plugin-6f960cf 1020> make
python generate-config-h.py -o autogenerated-config.h --gcc=gcc
Traceback (most recent call last):
  File "generate-config-h.py", line 21, in <module>
    from configbuilder import ConfigBuilder
  File "/x/home/satprasad/shared_scripts/bin/gcc-py-plugin/gcc-python-plugin-6f960cf/configbuilder.py", line 20, in <module>
    from subprocess import Popen, PIPE, check_output
ImportError: cannot import name check_output
make: *** [autogenerated-config.h] Error 1
hyper66:~/shared_scripts/bin/gcc-py-plugin/gcc-python-plugin-6f960cf 1021>

I am not that familiar with Python - can you please help me out?

Also, please point me to some references on how to write a new PASS
within the GCC codebase.

Regards,
Prakash


On Fri, Feb 10, 2012 at 7:58 AM, David Malcolm  wrote:
> On Thu, 2012-02-09 at 15:52 -0800, Satya Prakash Prasad wrote:
>> Hi All,
>>
>> I am a new member of this group and have been a C/C++ developer for
>> around 2 years. What interests me most is the gcc/gcov combination
>> output. It lists the code execution details.
>>
>> Is there a possibility that gcc-built binaries can print the file
>> name:line number of the code they are executing at run time?
>> Something like what gcov produces after code execution, but it has to
>> be in order of execution to benefit projects built from more than one
>> source code unit. So can the gcov part be built in by default and the
>> results be available at runtime in sequence of execution?
>>
>> Does this require modification of gcc code - any pointers would help
>> me a lot?
>
> If I'm understanding you correctly, this seems like a nice feature to
> have for people learning to program in C/C++, but an unacceptably high
> speed cost for most other uses.  For example, it's nice to be able to
> watch the order in which constructors and destructors get invoked when
> learning C++.
>
> A GCC plugin is probably the most appropriate way to implement this: you
> could write a new pass that walks the gimple statements, and injects new
> print statements to add the diagnostics that you want.
>
> I would shamelessly promote my python plugin at this point [1] :) which
> in theory would make doing this perhaps a 20-30 line python script, but
> it doesn't yet support modifying gcc's internal representation, merely
> reading it - but that could change.
>
> Hope this is helpful
> Dave
>
> [1] https://fedorahosted.org/gcc-python-plugin/
>


Re: Building gcc on Ubuntu 11.10

2012-02-10 Thread Toon Moene

On 02/10/2012 07:02 PM, Russ Allbery wrote:

> Anyway, I'll stop discussing this here, as it's not really on topic.  I
> just wanted to provide some background, since I realize on the surface
> it's a somewhat puzzling decision.

Thanks for the explanation.  Is there a rationale document (and a design 
document that explains what we have to expect from this change) 
somewhere on the Debian web site?

I couldn't find it, but perhaps I didn't search it right.

If this is such an obvious solution to the problems you mention, one 
would assume that other distributors would clamor for it, too.


--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Re: GCC GCOV

2012-02-10 Thread David Malcolm
On Fri, 2012-02-10 at 12:14 -0800, Satya Prakash Prasad wrote:
> Thanks for the info Dave. I downloaded the tar ball but facing issues
> while building it:

This is probably more appropriate for the plugin's mailing list:
  https://fedorahosted.org/mailman/listinfo/gcc-python-plugin
rather than the general gcc list...

> ImportError: cannot import name check_output

...but to answer the question, it looks like I accidentally added a
python 2.7 requirement here [subprocess.check_output was added in 2.7]


> I am not that familiar with Python - can you please help me out?
If you're not particularly familiar with Python, then it may be easier
to write the pass directly in C.

> Also please let me know some references on how to write a new PASS
> within GCC codebase?
If you're new, it's probably easiest to do it within a plugin, see:
  http://gcc.gnu.org/wiki/plugins
  http://gcc.gnu.org/onlinedocs/gccint/Plugins.html

LWN had an excellent tutorial on creating a new gcc pass via a plugin
here: https://lwn.net/Articles/457543/   [1]


Hope this is helpful
Dave

[1] fwiw I ported the example from the tutorial to Python here:
http://gcc-python-plugin.readthedocs.org/en/latest/working-with-c.html#spell-checking-string-constants-within-source-code



Re: Building gcc on Ubuntu 11.10

2012-02-10 Thread Russ Allbery
Toon Moene  writes:

> Thanks for the explanation.  Is there a rationale document (and a design
> document that explains what we have to expect from this change)
> somewhere on the Debian web site ?

> I couldn't find it, but perhaps I didn't search it right.

The documentation that I'm aware of is at:

http://wiki.debian.org/Multiarch

The first link there is a more comprehensive statement on the discussion
that we just had.

> If this is such an obvious solution to the problems you mention, one
> would assume that other distributors would clamor for it, too.

Well, the approach that Debian (and Ubuntu) chose is a significant amount
of very disruptive work (as you're all noticing!).  It's considerably more
disruptive in some ways than the lib32/lib64 solution, particularly if one
takes into account the other things that weren't discussed in this thread,
such as overlap in package contents between packages for two different
architectures.

These sorts of designs are always tradeoffs between the level of
disruption and the level of long-term benefit, and I can certainly
understand other people making different decisions.  Debian and Ubuntu are
pursuing a direction that we think is more comprehensive and will provide
a lot of long-term benefit, but I think it's fair to say that the jury is
still out on whether that was the right tradeoff to take.

-- 
Russ Allbery (r...@stanford.edu) 


Re: weird optimization in sin+cos, x86 backend

2012-02-10 Thread Jakub Jelinek
On Thu, Feb 09, 2012 at 04:59:55PM +0100, Richard Guenther wrote:
> On Thu, Feb 9, 2012 at 4:57 PM, Andrew Haley  wrote:
> > On 02/09/2012 03:56 PM, Michael Matz wrote:
> >> On Thu, 9 Feb 2012, Andrew Haley wrote:
> >>
> >>> On 02/09/2012 03:28 PM, Richard Guenther wrote:
> >>>> So - do you have an idea what routines we can start off with to get
> >>>> a full C99 set of routines for float, double and long double?  The last
> >>>> time I was exploring the idea again I was looking at the BSD libm.
> >>>
> >>> I'd start with INRIA's crlibm.
> >>
> >> Not complete.  Too slow.
> >
> > Not complete, fair enough.  But it's the fastest accurate one,
> > isn't it?
> 
> Maybe.  Nothing would prevent us from composing from multiple sources
> of course.  crlibm also only provides double precision routines.

We don't want just a single libm, we need several different ones:
one accurate (crlibm/IBM ulp based for doubles; unclear what to base it
on for the rest, if we don't have something 0.5ulp-precise for
float and long double (__float80/__float128/IBM double-double formats),
at least something as precise as possible, perhaps glibc sources), one
or more faster/less precise variants, ideally ABI compatible with glibc,
and then vectorized variants.
If it can't live in GCC tree, it could be a separate project, which then
certainly could use glibc/crlibm/AMD vector library etc. sources.

Jakub


gcc-4.6-20120210 is now available

2012-02-10 Thread gccadmin
Snapshot gcc-4.6-20120210 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.6-20120210/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_6-branch 
revision 184114

You'll find:

 gcc-4.6-20120210.tar.bz2 Complete GCC

  MD5=a6d68d358be617f775a3f7067b4b68d0
  SHA1=1f7b9f0a7bc7d6bef174fd375f364b8777aadd58

Diffs from 4.6-20120203 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.6
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.