Re: Memory corruption due to word sharing

2012-02-03 Thread Richard Guenther
On Fri, 3 Feb 2012, DJ Delorie wrote:

> 
> Jan Kara  writes:
> >   we've spotted the following mismatch between what kernel folks expect
> > from a compiler and what GCC really does, resulting in memory corruption on
> > some architectures. Consider the following structure:
> > struct x {
> > long a;
> > unsigned int b1;
> > unsigned int b2:1;
> > };
> 
> If this structure were volatile, you could try
> -fstrict-volatile-bitfields, which forces GCC to use the C type to
> define the access width, instead of doing whatever it thinks is optimal.
> 
> Note: that flag is enabled by default for some targets already, most
> notably ARM.

Note that -fstrict-volatile-bitfields does not work for

volatile struct S {
int i : 1;
char c;
} s;
int main()
{
  s.i = 1;
  s.c = 2;
}

where it accesses s.i using SImode.  -fstrict-volatile-bitfields
falls foul of all the games bitfield layout plays and the
irrelevance of the declared bitfield type (but maybe the
ARM ABI specifies it exactly that way).

So no, I would not recommend -fstrict-volatile-bitfields.

Richard.
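
A quick way to see the overlap Richard describes is to print the
layout. This is a minimal sketch; the sizes and offsets in the
comments assume a typical x86 ABI:

#include <stdio.h>
#include <stddef.h>

struct S {
    int i : 1;
    char c;
};

int main(void)
{
    /* On a typical x86 ABI, c sits at offset 1, inside the same
       4-byte unit that an SImode read-modify-write of s.i touches,
       so the bitfield store rewrites the byte holding c. */
    printf("sizeof(struct S)      = %zu\n", sizeof(struct S));
    printf("offsetof(struct S, c) = %zu\n", offsetof(struct S, c));
    return 0;
}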


Re: Memory corruption due to word sharing

2012-02-03 Thread Matthew Gretton-Dann
On Fri, Feb 03, 2012 at 09:37:22AM +, Richard Guenther wrote:
> On Fri, 3 Feb 2012, DJ Delorie wrote:
> 
> > 
> > Jan Kara  writes:
> > >   we've spotted the following mismatch between what kernel folks expect
> > > from a compiler and what GCC really does, resulting in memory corruption 
> > > on
> > > some architectures. Consider the following structure:
> > > struct x {
> > > long a;
> > > unsigned int b1;
> > > unsigned int b2:1;
> > > };
> > 
> > If this structure were volatile, you could try
> > -fstrict-volatile-bitfields, which forces GCC to use the C type to
> > define the access width, instead of doing whatever it thinks is optimal.
> > 
> > Note: that flag is enabled by default for some targets already, most
> > notably ARM.
> 
> Note that -fstrict-volatile-bitfields does not work for
> 
> volatile struct S {
> int i : 1;
> char c;
> } s;
> int main()
> {
>   s.i = 1;
>   s.c = 2;
> }
> 
> where it accesses s.i using SImode.  -fstrict-volatile-bitfields
> falls foul of all the games bitfield layout plays and the
> irrelevance of the declared bitfield type (but maybe the
> ARM ABI specifies it exactly that way).

Indeed the ARM ABI does - see Section 7.1.7.5 of the PCS available at:

http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042-/

In fact the example above is pretty much the same as that given in the ABI
docs, and it says that accessing s.i will also cause an access
to s.c, but not vice-versa.

Thanks,

Matt

-- 
Matthew Gretton-Dann
Principal Engineer, PD Software, ARM Ltd.



Re: Assignment to volatile objects

2012-02-03 Thread Vincent Lefevre
On 2012-01-31 09:16:09 +0100, David Brown wrote:
> For normal variables, "a = b = 0" is just ugly - but that is a
> matter of opinion.

and it is handled badly by GCC:

  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52106

(this is just a missing warning... Still, inconsistency is bad.)
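
For context, the semantic wrinkle behind that thread, sketched as a
hypothetical example (this is not the PR's testcase):

volatile int b;
int a;

void f(void)
{
    /* Whether "a = b = 0" re-reads b to obtain the value of the
       assignment expression is a classic volatile ambiguity, so the
       two spellings below may differ in the number of volatile
       accesses the compiler performs. */
    a = b = 0;      /* store to b, then possibly a re-read of b */
    b = 0; a = 0;   /* exactly one volatile store, no re-read */
}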

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


weird optimization in sin+cos, x86 backend

2012-02-03 Thread Konstantin Vladimirov
Hi,

Consider minimal reproduction code:

#include "math.h"
#include "stdio.h"

double __attribute__ ((noinline))
slip(double a)
{
  return (cos(a) + sin(a));
}

int main(void)
{
  double a = 4.47460300787e+182;
  double slipped = slip(a);
  printf("slipped = %lf\n", slipped);
  return 0;
}

Compiling on
gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3

First on O0:

gcc -o sincos.64 sincos.c -O0 -lm
./sincos.64
slipped = -1.141385

That is correct.

Next on O1:

gcc -o sincos.64 sincos.c -O1 -lm
./sincos.64
slipped = -0.432436

That is obviously incorrect.

Let's dive inside: on O0:

slip:
.LFB0:
  .cfi_startproc
  pushq %rbp
  .cfi_def_cfa_offset 16
  movq  %rsp, %rbp
  .cfi_offset 6, -16
  .cfi_def_cfa_register 6
  subq  $16, %rsp
  movsd %xmm0, -8(%rbp)
  movsd -8(%rbp), %xmm0
  call  cos
  movsd %xmm0, -16(%rbp)
  movsd -8(%rbp), %xmm0
  call  sin
  addsd -16(%rbp), %xmm0
  leave
  ret

we have separate sin and cos calls, and everything works fine.

On O1:

slip:
.LFB25:
  .cfi_startproc
  subq  $24, %rsp
  .cfi_def_cfa_offset 32
  leaq  8(%rsp), %rdi
  movq  %rsp, %rsi
  call  sincos
  movsd 8(%rsp), %xmm0
  addsd (%rsp), %xmm0
  addq  $24, %rsp
  ret

Here we have one sincos call, and it works incorrectly.

Why does gcc perform such a buggy optimization, and can I switch it off
somehow? Or maybe I don't understand something and the problem is in my
code?

---
With best regards, Konstantin


Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Richard Guenther
On Fri, Feb 3, 2012 at 2:26 PM, Konstantin Vladimirov
 wrote:
> Hi,
>
> Consider minimal reproduction code:
>
> #include "math.h"
> #include "stdio.h"
>
> double __attribute__ ((noinline))
> slip(double a)
> {
>  return (cos(a) + sin(a));
> }
>
> int main(void)
> {
>  double a = 4.47460300787e+182;
>  double slipped = slip(a);
>  printf("slipped = %lf\n", slipped);
>  return 0;
> }
>
> Compiling on
> gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3
>
> First on O0:
>
> gcc -o sincos.64 sincos.c -O0 -lm
> ./sincos.64
> slipped = -1.141385
>
> That is correct.
>
> Next on O1:
>
> gcc -o sincos.64 sincos.c -O1 -lm
> ./sincos.64
> slipped = -0.432436
>
> That is obviously incorrect.
>
> Let's dive inside: on O0:
>
> slip:
> .LFB0:
>  .cfi_startproc
>  pushq %rbp
>  .cfi_def_cfa_offset 16
>  movq  %rsp, %rbp
>  .cfi_offset 6, -16
>  .cfi_def_cfa_register 6
>  subq  $16, %rsp
>  movsd %xmm0, -8(%rbp)
>  movsd -8(%rbp), %xmm0
>  call  cos
>  movsd %xmm0, -16(%rbp)
>  movsd -8(%rbp), %xmm0
>  call  sin
>  addsd -16(%rbp), %xmm0
>  leave
>  ret
>
> we have separate sin and cos calls, and everything works fine.
>
> On O1:
>
> slip:
> .LFB25:
>  .cfi_startproc
>  subq  $24, %rsp
>  .cfi_def_cfa_offset 32
>  leaq  8(%rsp), %rdi
>  movq  %rsp, %rsi
>  call  sincos
>  movsd 8(%rsp), %xmm0
>  addsd (%rsp), %xmm0
>  addq  $24, %rsp
>  ret
>
> Here we have one sincos call, and it works incorrectly.
>
> Why does gcc perform such a buggy optimization, and can I switch it off
> somehow? Or maybe I don't understand something and the problem is in my
> code?

Your math library is broken then (you can verify this yourself by
using sincos).  You can use -fno-builtin-sincos to disable this
optimization.

Richard.

> ---
> With best regards, Konstantin
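
A minimal sketch of the check Richard suggests: call the library's
sincos() directly and compare it with separate sin()/cos() calls.
sincos() is a GNU extension, hence _GNU_SOURCE; the file name and
build line are illustrative.

#define _GNU_SOURCE
#include <math.h>
#include <stdio.h>

int main(void)
{
  double a = 4.47460300787e+182;
  double s, c;

  /* If these two lines disagree wildly, the library's sincos is
     inconsistent with its own sin and cos. */
  sincos(a, &s, &c);
  printf("sincos:  %.17g\n", s + c);
  printf("sin+cos: %.17g\n", sin(a) + cos(a));
  return 0;
}

Built with something like "gcc check.c -O0 -lm", so that no builtin
sincos combining interferes, this isolates the library's behavior.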


Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Michael Matz
Hi,

On Fri, 3 Feb 2012, Richard Guenther wrote:

> > int main(void)
> > {
> >  double a = 4.47460300787e+182;

> > slipped = -1.141385
> > That is correct.
> >
> > slipped = -0.432436
> > That is obviously incorrect.

How did you determine that one is correct and the other obviously 
incorrect?  Note how the input number is much too large for the usual 
implementations of the trigonometric functions to return any sensible 
result.  Even arbitrary precision calculators return different numbers (I 
tried some).

> > Here we have one sincos call, and it works wrong.
> >
> > Why gcc performs such buggy optimization, and may I switch it off
> > somehow? Or may be I don't understand something and problem is in my
> > code?
> 
> Your math library is broken then (you can verify this yourself by
> using sincos).  You can use -fno-builtin-sincos to disable this
> optimization.

No normal math library supports such an extreme range, even basic 
identities (like cos^2+sin^2=1) aren't retained with such inputs.


Ciao,
Michael.

Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Robert Dewar

On 2/3/2012 10:01 AM, Michael Matz wrote:

> No normal math library supports such an extreme range, even basic
> identities (like cos^2+sin^2=1) aren't retained with such inputs.

I agree: the program is complete nonsense. It would be useful to know
what the intent was.

> Ciao,
> Michael.




Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Vincent Lefevre
On 2012-02-03 10:13:58 -0500, Robert Dewar wrote:
> On 2/3/2012 10:01 AM, Michael Matz wrote:
> >No normal math library supports such an extreme range, even basic
> >identities (like cos^2+sin^2=1) aren't retained with such inputs.
> 
> I agree: the program is complete nonsense.

I disagree: there may be cases where large inputs can be meaningful.
And it may be important that some identities (like cos^2+sin^2=1) be
preserved.

> It would be useful to know what the intent was.

If the user requested such a computation, there should at least be
some intent. Unless an option like -ffast-math is given, the result
should be accurate.

For the glibc, I've finally reported a bug here:

  http://sourceware.org/bugzilla/show_bug.cgi?id=13658

Note that there have been other (somewhat different) complaints in
the past, e.g.

  http://sourceware.org/bugzilla/show_bug.cgi?id=13381

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Robert Dewar

On 2/3/2012 10:28 AM, Vincent Lefevre wrote:

On 2012-02-03 10:13:58 -0500, Robert Dewar wrote:

On 2/3/2012 10:01 AM, Michael Matz wrote:

No normal math library supports such an extreme range, even basic
identities (like cos^2+sin^2=1) aren't retained with such inputs.


I agree: the program is complete nonsense.


I disagree: there may be cases where large inputs can be meaningful.
And it may be important that some identities (like cos^2+sin^2=1) be
preserved.


It would be useful to know what the intent was.


If the user requested such a computation, there should at least be
some intent. Unless an option like -ffast-math is given, the result
should be accurate.


What is the basis for that claim? to me it seems useless to expect
anything from such absurd arguments. Can you site a requirement to
the contrary (other than your (to me) unrealistic expectations).
In particular, such large numbers are of course represented
imprecisely. Given that, i do not see a possible useful intent,
obviously a 1 bit rounding error makes a gigantic difference to
the final result here.


For the glibc, I've finally reported a bug here:

   http://sourceware.org/bugzilla/show_bug.cgi?id=13658

Note that there have been other (somewhat different) complaints in
the past, e.g.

   http://sourceware.org/bugzilla/show_bug.cgi?id=13381





Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Vincent Lefevre
On 2012-02-03 10:33:58 -0500, Robert Dewar wrote:
> On 2/3/2012 10:28 AM, Vincent Lefevre wrote:
> >If the user requested such a computation, there should at least be
> >some intent. Unless an option like -ffast-math is given, the result
> >should be accurate.
> 
> What is the basis for that claim? to me it seems useless to expect
> anything from such absurd arguments. Can you site a requirement to
> the contrary (other than your (to me) unrealistic expectations).
> In particular, such large numbers are of course represented
> imprecisely.

Actually you don't know. Of course, the value probably comes from
somewhere, where it is imprecise. But there are codes that assume
that their input values should be regarded as exact or they will
no longer work. Reasons can be that algorithms are designed in such
a way and/or that consistency is important. A particular field is
computational geometry. For instance, you have a point and a line
given by their coordinates, which are in general imprecise.
Nevertheless, one generally wants to consider that the point is
always seen as being on one fixed side of the line (or exactly on
the line). If some parts of the program, because they do not compute
with high precision enough, behave as if the point were on some side
and other parts behave as if the point were on the other side, this
can yield important problems.

Another property that one may want is "correct rounding", mainly
for reproducibility. For instance, this was needed by the LHC@home
project of CERN (to check results performed on different machines,
IIRC), even though the results were complete chaos.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Michael Matz
Hi,

On Fri, 3 Feb 2012, Vincent Lefevre wrote:

> > >No normal math library supports such an extreme range, even basic
> > >identities (like cos^2+sin^2=1) aren't retained with such inputs.
> > 
> > I agree: the program is complete nonsense.
> 
> I disagree: there may be cases where large inputs can be meaningful.

Very doubtful when those values are themselves computed by some means,
as opposed to being meant as test integer input.  IMHO integer radians
are already doubtful, and a rounding error of just half an ULP for these
large numbers already means wildly different integers and hence
completely off reductions.

> And it may be important that some identities (like cos^2+sin^2=1) be
> preserved.

Well, you're not going to get this without much more work in sin/cos.

> > It would be useful to know what the intent was.
> 
> If the user requested such a computation, there should at least be some 
> intent. Unless an option like -ffast-math is given, the result should be 
> accurate.
> 
> For the glibc, I've finally reported a bug here:
> 
>   http://sourceware.org/bugzilla/show_bug.cgi?id=13658

That is about 1.0e22, not the obscene 4.47460300787e+182 of the original 
poster.  Btw, the correct results for sin are about .7021835074240, and 
cos about -.71199601256024, so the sum is about -.0098125051362.  His math 
library didn't get these results either way.  Nevertheless it doesn't 
make sense to require such precision for such inputs from a math library 
intended for normal use.

If you want to have precise numbers use an arbitrary precision math 
library.


Ciao,
Michael.
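
In that spirit, a sketch of how one could obtain reference values with
an arbitrary-precision library (MPFR is assumed to be installed; 1024
bits is an arbitrary but comfortable working precision for the argument
reduction):

#include <stdio.h>
#include <mpfr.h>

int main(void)
{
  mpfr_t x, s, c, sum;
  mpfr_inits2(1024, x, s, c, sum, (mpfr_ptr) 0);

  /* Start from the exact double nearest 4.47460300787e+182, which is
     what the original program actually passes to sin and cos. */
  mpfr_set_d(x, 4.47460300787e+182, MPFR_RNDN);
  mpfr_sin(s, x, MPFR_RNDN);
  mpfr_cos(c, x, MPFR_RNDN);
  mpfr_add(sum, s, c, MPFR_RNDN);

  mpfr_printf("sin = %.15Rg\ncos = %.15Rg\nsin+cos = %.15Rg\n", s, c, sum);
  mpfr_clears(x, s, c, sum, (mpfr_ptr) 0);
  return 0;
}

Built with something like "gcc ref.c -lmpfr -lgmp", this should
reproduce the reference values quoted above.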


Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Vincent Lefevre
On 2012-02-03 16:57:19 +0100, Michael Matz wrote:
> > And it may be important that some identities (like cos^2+sin^2=1) be
> > preserved.
> 
> Well, you're not going to get this without much more work in sin/cos.

If you use the glibc sin() and cos(), you already have this (possibly
up to a few ulp's).

> > For the glibc, I've finally reported a bug here:
> > 
> >   http://sourceware.org/bugzilla/show_bug.cgi?id=13658
> 
> That is about 1.0e22, not the obscene 4.47460300787e+182 of the original 
> poster.

But 1.0e22 cannot be handled correctly.

> Btw, the correct results for sin are about .7021835074240, and 
> cos about -.71199601256024, so the sum is about -.0098125051362.  His math 
> library didn't get these results in either way.  Nevertheless it doesn't 
> make sense to require such precision for these input for a math library 
> intended to be used for normal means.

That's just your opinion.

> If you want to have precise numbers use an arbitrary precision math 
> library.

This is double precision. An arbitrary precision math library (even
though I develop one) shouldn't be needed.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Robert Dewar

On 2/3/2012 10:55 AM, Vincent Lefevre wrote:

> On 2012-02-03 10:33:58 -0500, Robert Dewar wrote:
>> On 2/3/2012 10:28 AM, Vincent Lefevre wrote:
>>> If the user requested such a computation, there should at least be
>>> some intent. Unless an option like -ffast-math is given, the result
>>> should be accurate.
>>
>> What is the basis for that claim? To me it seems useless to expect
>> anything from such absurd arguments. Can you cite a requirement to
>> the contrary (other than your (to me) unrealistic expectations)?
>> In particular, such large numbers are of course represented
>> imprecisely.
>
> Actually you don't know.

Yes, I do! The floating-point representation of this number
does NOT represent the number you wrote, but a slightly
different number, whose cos/sin values will be wildly
different from the cos/sin values of the number you wrote,
so what's the point of trying to get that value exact, when
it is not the value you are looking for anyway?

> Of course, the value probably comes from
> somewhere, where it is imprecise. But there are codes that assume
> that their input values should be regarded as exact or they will
> no longer work. Reasons can be that algorithms are designed in such
> a way and/or that consistency is important. A particular field is
> computational geometry. For instance, you have a point and a line
> given by their coordinates, which are in general imprecise.
> Nevertheless, one generally wants to consider that the point is
> always seen as being on one fixed side of the line (or exactly on
> the line). If some parts of the program, because they do not compute
> with high precision enough, behave as if the point were on some side
> and other parts behave as if the point were on the other side, this
> can yield important problems.

But if you write arbitrary floating-point constants, then of
course they are not represented exactly in general.

> Another property that one may want is "correct rounding", mainly
> for reproducibility. For instance, this was needed by the LHC@home
> project of CERN (to check results performed on different machines,
> IIRC), even though the results were complete chaos.

Well of course you will get different results for this on different
machines, regardless of "correct rounding", whatever that means!






Re: Memory corruption due to word sharing

2012-02-03 Thread Andrew MacLeod



> And I assume that since the compiler does them, that would now make it
> impossible for us to gather a list of all the 'lock' prefixes so that
> we can undo them if it turns out that we are running on a UP machine.
>
> When we do SMP operations, we don't just add a "lock" prefix to it. We do this:
>
>    #define LOCK_PREFIX_HERE \
>  ".section .smp_locks,\"a\"\n"   \
>  ".balign 4\n"   \
>  ".long 671f - .\n" /* offset */ \
>  ".previous\n"   \
>  "671:"
>
>    #define LOCK_PREFIX LOCK_PREFIX_HERE "\n\tlock; "
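
For the gcc-side readers, a sketch of how such a macro ends up being
used in kernel-style inline asm (the function name is made up and the
details are simplified relative to the real kernel code):

/* The .smp_locks entry records where the lock prefix lives, so the
   kernel can patch it out when it finds itself on a uniprocessor. */
static inline void atomic_inc_sketch(int *v)
{
    asm volatile(LOCK_PREFIX "incl %0" : "+m" (*v));
}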



I don't see why we can't do something similar when the compiler issues 
a lock on an atomic operation.  I would guess we'd want to put it under 
some sort of flag control (something like -fatomic-lock-list) since 
most applications aren't going to want that section.  It certainly seems 
plausible to me anyway.

> and I'm sure you know that, but I'm not sure the gcc people realize
> the kinds of games we play to make things work better.


No, but someone just needs to tell us :-)


> We need both variants in the kernel. If the compiler generates one of
> them for us, that doesn't really much help.
>
> I must admit that the non-x86 per-CPU atomics are, ummm, "interesting".
>
> Most non-x86 cpu's would probably be better off treating them the same
> as smp-atomics (load-locked + store-conditional), but right now we
> have this insane generic infrastructure for having versions that are
> irq-safe by disabling interrupts etc. Ugh. Mainly because nobody
> really is willing to work on and fix up the 25 architectures that
> really don't matter.


The atomic intrinsics were created for C++11 memory model compliance,
but I am certainly open to enhancements that would make them more
useful.  I am planning some enhancements for 4.8 now, and it sounds
like you may have some suggestions...


Andrew


Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Dominique Dhumieres
While I fail to see how the "correct value" of 
cos(4.47460300787e+182)+sin(4.47460300787e+182)
can be defined in the 'double' world, cos^2(x)+sin^2(x)=1 and 
sin(2*x)=2*sin(x)*cos(x) seem to be verified (at least for this value)
even if the actual values of sin and cos depend on the optimisation level.
Note that sqrt(2.0)*sin(4.47460300787e+182+pi/4) gives a different value
for the sum.

Cheers,

Dominique


Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Michael Matz
Hi,

On Fri, 3 Feb 2012, Vincent Lefevre wrote:

> > > For the glibc, I've finally reported a bug here:
> > > 
> > >   http://sourceware.org/bugzilla/show_bug.cgi?id=13658
> > 
> > That is about 1.0e22, not the obscene 4.47460300787e+182 of the 
> > original poster.
> 
> But 1.0e22 cannot be handled correctly.

I'm not sure what you're getting at.  Yes, 1e22 isn't handled correctly, 
but this thread was about 4.47460300787e+182 until you changed topics.

> > If you want to have precise numbers use an arbitrary precision math 
> > library.
> 
> This is double precision. An arbitrary precision math library (even 
> though I develop one) shouldn't be needed.

But the reduction step needs much more precision than it currently uses to 
handle inputs as large as 4e182 correctly.


Ciao,
Michael.


Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Michael Matz
On Fri, 3 Feb 2012, Dominique Dhumieres wrote:

> Note that sqrt(2.0)*sin(4.47460300787e+182+pi/4) gives a diffeent value 
> for the sum.

In double: 4.47460300787e+182 + pi/4 == 4.47460300787e+182


Ciao,
Michael.
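
A one-line check of that claim (a trivial sketch):

#include <math.h>
#include <stdio.h>

int main(void)
{
  double a = 4.47460300787e+182;

  /* The gap between adjacent doubles at this magnitude is about
     6e166, vastly larger than pi/4, so the addition is a no-op
     and this prints 1. */
  printf("%d\n", a + M_PI / 4 == a);
  return 0;
}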


Re: Memory corruption due to word sharing

2012-02-03 Thread Linus Torvalds
On Fri, Feb 3, 2012 at 8:38 AM, Andrew MacLeod  wrote:
>
> The atomic intrinsics were created for c++11  memory model compliance, but I
> am certainly open to enhancements that would make them more useful.   I am
> planning some enhancements for 4.8 now, and it sounds like you may have some
> suggestions...

So we have several atomics we use in the kernel, with the more common being

 - add (and subtract) and cmpchg of both 'int' and 'long'

 - add_return (add and return new value)

 - special cases of the above:
  dec_and_test (decrement and test result for zero)
  inc_and_test (increment and test result for zero)
  add_negative (add and check if result is negative)

   The special cases are because older x86 cannot do the generic
"add_return" efficiently - it needs xadd - but can do atomic versions
that test the end result and give zero or sign information.

 - atomic_add_unless() - basically an optimized cmpxchg.

 - atomic bit array operations (bit set, clear, set-and-test,
clear-and-test). We do them on "unsigned long" exclusively, and in
fact we do them on arrays of unsigned long, ie we have the whole "bts
reg,mem" semantics. I'm not sure we really care about the atomic
versions for the arrays, so it's possible we only really care about a
single long.

   The only complication with the bit setting is that we have a
concept of "set/clear bit with memory barrier before or after the bit"
(for locking). We don't do the whole release/acquire thing, though.

 - compare_xchg_double

We also do byte/word atomic increments and decrements, but that's in
the x86 spinlock implementation, so it's not a generic need.
We also do the add version in particular as CPU-local optimizations
that do not need to be SMP-safe, but do need to be interrupt-safe. On
x86, this is just an r-m-w op, on most other architectures it ends up
being the usual load-locked/store-conditional.

I think that's pretty much it, but maybe I'm missing something.

Of course, locking itself tends to be special cases of the above with
extra memory barriers, but it's usually hidden in asm for other
reasons (the bit-op + barrier being a special case).

  Linus


Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Konstantin Vladimirov
Hi,

I agree that this case has no practical value. It was autogenerated
among thousands of other tests and showed really strange results, so
I decided to ask. I thought this value fits the double precision range
and, according to the C standard, all double-precision arithmetic must
be available for it.

Thanks everybody for the explanation. I will constrain trig function
arguments according to the "x is separable from x+pi/4" rule. It seems
everything works inside this range.

---
With best regards, Konstantin

On Fri, Feb 3, 2012 at 7:13 PM, Robert Dewar  wrote:
> On 2/3/2012 10:01 AM, Michael Matz wrote:
>
>> No normal math library supports such an extreme range, even basic
>> identities (like cos^2+sin^2=1) aren't retained with such inputs.
>
>
> I agree: the program is complete nonsense. It would be useful to know
> what the intent was.
>>
>>
>>
>> Ciao,
>> Michael.
>
>


Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Robert Dewar

On 2/3/2012 1:12 PM, Konstantin Vladimirov wrote:

> Hi,
>
> I agree that this case has no practical value. It was autogenerated
> among thousands of other tests and showed really strange results, so
> I decided to ask. I thought this value fits the double precision range
> and, according to the C standard, all double-precision arithmetic must
> be available for it.

Yes, indeed, and it was available. There was nothing "really strange"
about the results. The only thing strange was your expectations here :-)

> Thanks everybody for the explanation. I will constrain trig function
> arguments according to the "x is separable from x+pi/4" rule. It seems
> everything works inside this range.

Yes, it is a good idea to only generate useful tests when you
are autogenerating, otherwise you will get garbage in, garbage out :-)


> ---
> With best regards, Konstantin
>
> On Fri, Feb 3, 2012 at 7:13 PM, Robert Dewar wrote:
>
>> On 2/3/2012 10:01 AM, Michael Matz wrote:
>>
>>> No normal math library supports such an extreme range, even basic
>>> identities (like cos^2+sin^2=1) aren't retained with such inputs.
>>
>> I agree: the program is complete nonsense. It would be useful to know
>> what the intent was.
>>
>>> Ciao,
>>> Michael.







Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Toon Moene

On 02/03/2012 04:13 PM, Robert Dewar wrote:

> On 2/3/2012 10:01 AM, Michael Matz wrote:
>
>> No normal math library supports such an extreme range, even basic
>> identities (like cos^2+sin^2=1) aren't retained with such inputs.
>
> I agree: the program is complete nonsense. It would be useful to know
> what the intent was.

You think this is bad?

We got a complaint from a user in the pre-egcs days about g77.  It 
turned out he tried to use a Fortran program on his shiny new Pentium 
computer that was written on one of the descendants of the CDC 6600, 
with CDC 60-bit floating-point values encoded in octal in a DATA statement.


--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Re: Memory corruption due to word sharing

2012-02-03 Thread Andrew MacLeod

On 02/03/2012 12:16 PM, Linus Torvalds wrote:

> So we have several atomics we use in the kernel, with the more common being
>
>  - add (and subtract) and cmpchg of both 'int' and 'long'

This would be __atomic_fetch_add, __atomic_fetch_sub, and 
__atomic_compare_exchange.


For 4.8, __atomic_compare_exchange is planned to be better optimized than 
it is now...  ie, it currently uses the same form as C++ requires:


 atomic_compare_exchange (&var, &expected, value, weak/strong, 
memorymodel).


'expected' is updated in place with the current value if it doesn't 
match.  With the address of expected taken, we don't always do a good job 
generating code for it... I plan to remedy that in 4.8 so that it is 
efficient and doesn't impact optimization of 'expected' elsewhere.

>  - add_return (add and return new value)

__atomic_add_fetch returns the new value. (__atomic_fetch_add returns 
the old value.)  If it isn't as efficient as it needs to be, the RTL 
pattern can be fixed.  What sequence do you currently use for this?  
The compiler currently generates the equivalent of

lock; xadd
add

ie, it performs the atomic add, then re-adds the same value to the 
previous value to get the atomic post-add value.  If there is something 
more efficient, we ought to be able to do the same.

>  - special cases of the above:
>   dec_and_test (decrement and test result for zero)
>   inc_and_test (increment and test result for zero)
>   add_negative (add and check if result is negative)
>
>    The special cases are because older x86 cannot do the generic
> "add_return" efficiently - it needs xadd - but can do atomic versions
> that test the end result and give zero or sign information.

Since these are older x86 only, could you use add_return() always and 
then have the compiler use new peephole optimizations to detect those 
usage patterns and change the instruction sequence for x86 when 
required?  Would that be acceptable?  Or maybe you don't trust the 
compiler :-)  Or maybe I can innocently ask if the performance impact 
on older x86 matters enough any more? :-)

>  - atomic_add_unless() - basically an optimized cmpxchg.

Is this the reverse of a compare_exchange and add?  Ie, add if the value 
ISN'T expected?  Or some form of compare_exchange_and_add?  This 
might require a new atomic builtin.

What exactly does it do?


>  - atomic bit array operations (bit set, clear, set-and-test,
> clear-and-test). We do them on "unsigned long" exclusively, and in
> fact we do them on arrays of unsigned long, ie we have the whole "bts
> reg,mem" semantics. I'm not sure we really care about the atomic
> versions for the arrays, so it's possible we only really care about a
> single long.
>
>    The only complication with the bit setting is that we have a
> concept of "set/clear bit with memory barrier before or after the bit"
> (for locking). We don't do the whole release/acquire thing, though.

Are these functions wrappers around a tight load, mask, cmpxchg loop? Or 
something else?  These could also require new built-ins if they can't be 
constructed from the existing operations...


>  - compare_xchg_double
>
> We also do byte/word atomic increments and decrements, but that's in
> the x86 spinlock implementation, so it's not a generic need.

The existing __atomic builtins will work on 1, 2, 4, 8 or 16 byte values 
regardless of type, as long as the hardware supports those sizes.  So 
x86-64 can do a 16 byte cmpxchg.
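
A sketch of that 16-byte case through the generic builtins (this
assumes x86-64 with cmpxchg16b available, i.e. compiling with -mcx16):

#include <stdbool.h>

typedef unsigned __int128 u128;

/* Atomically replace *p with desired if *p equals *expected; on
   failure, *expected is updated with the value actually observed. */
static bool cas16(u128 *p, u128 *expected, u128 desired)
{
    return __atomic_compare_exchange_n(p, expected, desired, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}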


In theory, the add_fetch and sub_fetch are supposed to use INC/DEC if the 
operand is 1/-1 and the result isn't used.  If it isn't doing this right 
now, I will fix it.

> We also do the add version in particular as CPU-local optimizations
> that do not need to be SMP-safe, but do need to be interrupt-safe. On
> x86, this is just an r-m-w op, on most other architectures it ends up
> being the usual load-locked/store-conditional.

It may be possible to add modifier extensions to the memory model 
component for such a thing, ie

v = __atomic_add_fetch (&v, __ATOMIC_RELAXED | __ATOMIC_CPU_LOCAL)

which would allow fine tuning for something more specific like this. 
Targets which don't care can ignore it, but x86 could have atomic_add 
avoid the lock when the CPU_LOCAL modifier flag is present.



> I think that's pretty much it, but maybe I'm missing something.
>
> Of course, locking itself tends to be special cases of the above with
> extra memory barriers, but it's usually hidden in asm for other
> reasons (the bit-op + barrier being a special case).


All of the __atomic operations are currently optimization barriers in 
both directions; the optimizers tend to treat them like function calls.  
I hope to enable some sorts of optimizations eventually, especially 
based on the memory model... but for now we play it safe.

Synchronization barriers are inserted based on the memory model used.  
If it can be determined that something additional is required that the 
existing memory model doesn't cover, it could be possible to add 
extensions beyond the C++11 memory model.

Re: Memory corruption due to word sharing

2012-02-03 Thread Linus Torvalds
On Fri, Feb 3, 2012 at 11:16 AM, Andrew MacLeod  wrote:
>>    The special cases are because older x86 cannot do the generic
>> "add_return" efficiently - it needs xadd - but can do atomic versions
>> that test the end result and give zero or sign information.
>
> Since these are older x86 only, could you use add_return() always and then
> have the compiler use new peephole optimizations to detect those usage
> patterns and change the instruction sequence for x86 when required?  would
> that be acceptable?    Or maybe you don't trust the compiler :-)   Or maybe
> I can innocently ask if the performance impact on older x86 matters enough
> any more? :-)

Since xadd is often slow even when it does exist, the peepholes would
definitely be required. A "dec_and_test" does need to be

   decl %m (or "subl $1" for x86-64)
   je

rather than

   xadd %r,%m
   cmp $1,%r
   je

simply because the latter is not just larger, but often quite a bit
slower (newer CPU's seem to be better at xadd).

But with the peepholes, by the time we could depend on gcc doing the
intrinsics, we'd almost certainly be ready to let the old pre-xadd
machines just go.

We're already almost there, and by the time we can trust the compiler
to do it for us, we're almost certainly going to be at a point where
we really don't care any more about pre-xadd (but we will still care
about older machines where xadd is slow).

>>  - atomic_add_unless() - basically an optimized cmpxchg.
>
> is this the reverse of a compare_exchange and add?  Ie, add if the value
> ISN'T expected?   or some form of compare_exchange_and_add?    This might
> require a new atomic builltin.
>
> What exactly does it do?

It's basically

   do {
 load-link %r,%m
 if (r == value)
 return 0;
 add
   } while (store-conditional %r,%m)
   return 1;

and it is used to implement two *very* common (and critical)
reference-counting use cases:

 - decrement ref-count locklessly, and if it hits free, take a lock
atomically (ie "it would be a bug for anybody to ever see it decrement
to zero while holding the lock")

 - increment ref-count locklessly, but if we race with the final free,
don't touch it, because it's gone (ie "if somebody decremented it to
zero while holding the lock, we absolutely cannot increment it again")

They may sound odd, but those two operations are absolutely critical
for most lock-less refcount management. And taking locks for reference
counting is absolutely death to performance, and is often impossible
anyway (the lock may be a local lock that is *inside* the structure
that is being reference-counted, so if the refcount is zero, then you
cannot even look at the lock!)

In the above example, the load-locked -> store-conditional would
obviously be a cmpxchg loop instead on architectures that do cmpxchg
instead of ll/sc.
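
On such architectures, a minimal sketch of atomic_add_unless() in terms
of the GCC 4.7-era __atomic builtins might look like this (the real
kernel version differs in detail):

/* Add a to *v unless *v == u; returns 1 if the add happened, 0 if not.
   The weak compare-exchange updates old on failure, so the loop simply
   retries with the freshly observed value. */
static int atomic_add_unless_sketch(int *v, int a, int u)
{
    int old = __atomic_load_n(v, __ATOMIC_RELAXED);

    do {
        if (old == u)
            return 0;
    } while (!__atomic_compare_exchange_n(v, &old, old + a, 1,
                                          __ATOMIC_SEQ_CST,
                                          __ATOMIC_RELAXED));
    return 1;
}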

Of course, if you expose some intrinsic for the whole "ll/sc" model
(and you then turn it into cmpxchg on demand), we could literally
open-code it.

That is likely the most *flexible* approach for a compiler. I think
pretty much everything the kernel needs (except for cmpxchg_double)
can be very naturally written as a "ll/sc" sequence, and if the
compiler then just does the right thing with peephole optimizations,
that's fine.

IOW, we don't really need "atomic_add()" or anything like that. If we can do

  do {
 val = __load_linked(mem);
 val++;
  } while (__store_conditional(val, mem));

and the compiler just automagically turns that into "lock add" on x86,
that's perfectly fine.

It might not be too hard, because you really don't need to recognize
very many patterns, and any pattern you don't recognize could be
turned into a cmpxchg loop.

NOTE NOTE NOTE! The "turned into a cmpxchg loop" is not the true
correct translation of load-linked/store-conditional, since it allows
the memory to be modified as long as it's modified *back* before the
store-conditional, and that actually matters for a few algorithms. But
if you document the fact that it's not a "true" ll/sc (and maybe have
some compile-time way to detect when it isn't), it would be very
flexible.

Of course, the syntax could be something completely different. Maybe
you'd want to do it as

   __builtin_ll_sc(&var, update-expression, return-expression,
failure-expression)

rather than an explicit loop.

But it doesn't sound like the internal gcc model is based on some
generic ll/sc model.

>>  - atomic bit array operations (bit set, clear, set-and-test,
>> clear-and-test). We do them on "unsigned long" exclusively, and in
>> fact we do them on arrays of unsigned long, ie we have the whole "bts
>> reg,mem" semantics. I'm not sure we really care about the atomic
>> versions for the arrays, so it's possible we only really care about a
>> single long.
>>
>>    The only complication with the bit setting is that we have a
>> concept of "set/clear bit with memory barrier before or after the bit"
>> (for locking). We don't do the whole release/acquire thing, though.
>
> Are

Re: Memory corruption due to word sharing

2012-02-03 Thread Paul E. McKenney
On Fri, Feb 03, 2012 at 12:00:03PM -0800, Linus Torvalds wrote:
> On Fri, Feb 3, 2012 at 11:16 AM, Andrew MacLeod  wrote:

[ . . . ]

> Having access to __ATOMIC_ACQUIRE would actually be an improvement -
> it's just that the architectures that really care about things like
> that simply don't matter enough for us to really worry about them
> (ia64 being the prime example), and everybody else either doesn't need
> extra barriers at all (x86), or might as well use a full barrier (most
> everything else).

I believe that version 8 of the ARM architecture will provide load-acquire
and store-release instructions.

Thanx, Paul



Misleading error message with templated c++ code

2012-02-03 Thread Peter A. Felvegi

Hello,

compiling the following:
---8<---8<---8<---8<---
template<typename T>
struct Base
{
typename T::Type    var;
};
template<typename U>
struct Derived : Base<Derived<U> >
{
typedef U   Type;
};
void foo()
{
Derived<int> i;
}
---8<---8<---8<---8<---
gives the error

gcctempl.cpp: In instantiation of ‘struct Base<Derived<int> >’:
gcctempl.cpp:7:8:   required from ‘struct Derived<int>’
gcctempl.cpp:13:15:   required from here
gcctempl.cpp:4:19: error: no type named ‘Type’ in ‘struct Derived<int>’

on all tested gcc versions (4.4, 4.5, 4.6, 4.7). There is definitely a 
type called 'Type' in struct 'Derived'. I'm not sure; the above code 
might be ill-formed, but then I'd like to see a specific error message.


Regards, Peter




Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Vincent Lefevre
On 2012-02-03 11:35:39 -0500, Robert Dewar wrote:
> On 2/3/2012 10:55 AM, Vincent Lefevre wrote:
> >On 2012-02-03 10:33:58 -0500, Robert Dewar wrote:
> >>What is the basis for that claim? to me it seems useless to expect
> >>anything from such absurd arguments. Can you site a requirement to
> >>the contrary (other than your (to me) unrealistic expectations).
> >>In particular, such large numbers are of course represented
   ^^
> >>imprecisely.
> >
> >Actually you don't know.
> 
> Yes, I do! The floating-point representation of this number

Well, there is a difference between large numbers in general (which
may or may not be represented exactly, depending on the value) and
the example given by Konstantin.

> does NOT represent the number you wrote, but a slightly different
> number,

This fact is not even necessarily correct because you don't know the
intent of the programmer. In the program,

  double a = 4.47460300787e+182;

could mean two things:

1. A number which approximates the decimal number 4.47460300787e+182,
in which case I agree with you. Note that even though it is an
approximation, the approximated value a is the same on two different
IEEE 754 platforms, so that one can expect that sin(a) gives two
values that are close to each other on these two different platforms.

2. A number exactly equal to the rounding (to nearest) of the decimal
number 4.47460300787e+182 in double precision. Imagine that you have
a binary64 (double precision) number, convert it to decimal with
sufficient precision in order to be able to convert it back to the
original binary64 number. This decimal string could have been the
result of such a conversion. IEEE 754 has been designed to be able
to do that. This choice has also been followed by some XML spec on
schemas, i.e. if you write 4.47460300787e+182, this really means a
binary64 number, not the decimal number 4.47460300787e+182 (even
though a hex format would be less ambiguous without context, the
decimal format also allows the human to have an idea about the
number).
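
A sketch of the round trip described in point 2 (this assumes correctly
rounded decimal/binary conversions, as glibc provides):

#include <stdio.h>

int main(void)
{
  double a = 4.47460300787e+182, b;
  char buf[64];

  /* 17 significant digits are always enough to recover the exact
     binary64 value on read-back. */
  snprintf(buf, sizeof buf, "%.17g", a);
  sscanf(buf, "%lf", &b);
  printf("%s round-trips: %d\n", buf, a == b);  /* prints 1 */
  return 0;
}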

> whose cos/sin values will be wildly
> different from the cos/sin values of the number you wrote,
> so what's the point of trying to get that value exact, when
> it is not the value you are looking for anyway.
> 
> >Of course, the value probably comes from
> >somewhere, where it is imprecise. But there are codes that assume
> >that their input values should be regarded as exact or they will
> >no longer work. Reasons can be that algorithms are designed in such
> >a way and/or that consistency is important. A particular field is
> >computational geometry. For instance, you have a point and a line
> >given by their coordinates, which are in general imprecise.
> >Nevertheless, one generally wants to consider that the point is
> >always seen as being on one fixed side of the line (or exactly on
> >the line). If some parts of the program, because they do not compute
> >with high precision enough, behave as if the point were on some side
> >and other parts behave as if the point were on the other side, this
> >can yield important problems.
> 
> But if you write arbitrary floating-point constants, then of
> course they are not represented exactly in general.

In general, but not always. And even if the initial constants do not
represent exactly the real values, one may want them to be handled
in a consistent manner (as if they were exact).

> >Another property that one may want is "correct rounding", mainly
> >for reproducibility. For instance, this was needed by the LHC@home
> >project of CERN (to check results performed on different machines,
> >IIRC), even though the results were complete chaos.
> 
> Well of course you will get different results for this on different
> machines, regardless of "correct rounding", whatever that means!

No, thanks to correct rounding (provided by CRlibm), all machines with
the same inputs were giving the same results, even though the results
were meaningless.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Vincent Lefevre
On 2012-02-03 17:44:21 +0100, Michael Matz wrote:
> Hi,
> 
> On Fri, 3 Feb 2012, Vincent Lefevre wrote:
> 
> > > > For the glibc, I've finally reported a bug here:
> > > > 
> > > >   http://sourceware.org/bugzilla/show_bug.cgi?id=13658
> > > 
> > > That is about 1.0e22, not the obscene 4.47460300787e+182 of the 
> > > original poster.
> > 
> > But 1.0e22 cannot be handled correctly.
> 
> I'm not sure what you're getting at.  Yes, 1e22 isn't handled correctly, 
> but this thread was about 4.47460300787e+182 until you changed topics.

No, the topic is: "weird optimization in sin+cos, x86 backend".
But actually, as said by Richard Guenther, the bug is not the
optimization, but the behavior of the sincos function. What I'm
saying is that this wrong behavior can also be seen on 1.0e22
(at least with the glibc, and any implementation that uses the
fsincos x86 instruction).

> > > If you want to have precise numbers use an arbitrary precision math 
> > > library.
> > 
> > This is double precision. An arbitrary precision math library (even 
> > though I develop one) shouldn't be needed.
> 
> But the reduction step needs much more precision than it currently uses to 
> handle inputs as large as 4e182 correctly.

No, the range reduction can cleverly be done using the working
precision (such as with Payne and Hanek's algorithm).

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Robert Dewar

On 2/3/2012 4:32 PM, Vincent Lefevre wrote:

>> Yes, I do! The floating-point representation of this number
>
> This fact is not even necessarily correct because you don't know the
> intent of the programmer. In the program,
>
>    double a = 4.47460300787e+182;
>
> could mean two things:
>
> 1. A number which approximates the decimal number 4.47460300787e+182,
> in which case I agree with you. Note that even though it is an
> approximation, the approximated value a is the same on two different
> IEEE 754 platforms, so that one can expect that sin(a) gives two
> values that are close to each other on these two different platforms.
>
> 2. A number exactly equal to the rounding (to nearest) of the decimal
> number 4.47460300787e+182 in double precision. Imagine that you have
> a binary64 (double precision) number, convert it to decimal with
> sufficient precision in order to be able to convert it back to the
> original binary64 number. This decimal string could have been the
> result of such a conversion. IEEE 754 has been designed to be able
> to do that. This choice has also been followed by some XML spec on
> schemas, i.e. if you write 4.47460300787e+182, this really means a
> binary64 number, not the decimal number 4.47460300787e+182 (even
> though a hex format would be less ambiguous without context, the
> decimal format also allows the human to have an idea about the
> number).

Sure, in general that might be possible, but it seemed unlikely in
this case, and indeed what was really going on is essentially
random numbers chosen by a test program generating automatic
tests.

Really that's in neither category, though I suppose you could argue
that it is closer to 2, i.e. that the intent of the automatically
generated test program is to get (and test) this rounding.

But in any case, it seems better for him to apply his suggestion
of sticking within the range where pi/4 makes a difference :-)

> No, thanks to correct rounding (provided by CRlibm), all machines with
> the same inputs were giving the same results, even though the results
> were meaningless.

All machines that implement IEEE arithmetic :-) As we know only too well
from the universe of machines on which we implement GNAT, this is not
all machines :-)






Re: Misleading error message with templated c++ code

2012-02-03 Thread Jonathan Wakely
On 3 February 2012 21:28, Peter A. Felvegi wrote:
> on all tested gcc versions (4.4, 4.5, 4.6,4.7). There is definitely a type
> called 'Type' in struct 'Derived'. I'm not sure, the above code might be
> ill-formed, but then I'd like to see a specific error message.

I think the problem is that Derived is incomplete, but please report
it to bugzilla instead of this list so it can be dealt with properly,
thanks.


gcc-4.6-20120203 is now available

2012-02-03 Thread gccadmin
Snapshot gcc-4.6-20120203 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.6-20120203/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_6-branch 
revision 183881

You'll find:

 gcc-4.6-20120203.tar.bz2 Complete GCC

  MD5=ef7d77f1d8987a0dd8c8c7cb94aeffd6
  SHA1=92061c0becbffd007d774d7521d22a057d59049f

Diffs from 4.6-20120127 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.6
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread James Courtier-Dutton
On 3 February 2012 16:24, Vincent Lefevre  wrote:
> On 2012-02-03 16:57:19 +0100, Michael Matz wrote:
>> > And it may be important that some identities (like cos^2+sin^2=1) be
>> > preserved.
>>
>> Well, you're not going to get this without much more work in sin/cos.
>
> If you use the glibc sin() and cos(), you already have this (possibly
> up to a few ulp's).
>
>> > For the glibc, I've finally reported a bug here:
>> >
>> >   http://sourceware.org/bugzilla/show_bug.cgi?id=13658
>>
>> That is about 1.0e22, not the obscene 4.47460300787e+182 of the original
>> poster.
>
> But 1.0e22 cannot be handled correctly.
>

Of course it can't.
You only have 52 bits of precision in double floating point numbers.
So, although you can have 1.0e22, you cannot represent (1.0e22 + 1).
If you understood how the sin and cos functions actually get
calculated, you would understand why the ability to handle +1 is so
important.
sin and cos first have to translate the input value into a range of 0
to 2Pi, so if you cannot represent (1.0e22) and (1.0e22 + 2Pi), it is
not possible to translate the input value in any way that makes sense.

To represent 1.0e22 + 1, you need more than 52 bits of precision.
If you move down to the 1.0e14 range, things start making sense again.

The easiest way to ensure correct calculations in floating point maths
is to convert everything into binary and then make sure you have
enough bits of precision at every stage.

Summary: 1.0e22 is just as obscene as 4.47460300787e+182 from the
point of view of sin and cos.


Kind Regards

James


Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread James Courtier-Dutton
On 3 February 2012 18:12, Konstantin Vladimirov
 wrote:
> Hi,
>
> I agree that this case has no practical value. It was autogenerated
> among thousands of other tests and showed really strange results, so
> I decided to ask. I thought this value fits the double precision range
> and, according to the C standard, all double-precision arithmetic must
> be available for it.
>
> Thanks everybody for the explanation. I will constrain trig function
> arguments according to the "x is separable from x+pi/4" rule. It seems
> everything works inside this range.
>

The important bit is ensuring the input can be represented in the 52 bits
of precision of the double floating point number.
The higher the input number, the fewer discrete real values you can
represent in the 0 to pi/4 range.

I will try to explain in base 10, with 3 digits of precision.
Take the range change to be between 0 and 9 (instead of 0 to pi/4).
Input value 3.21, no range change (like an "n mod 10" but for reals)
needed; I can have input after the range change of values 1.00, 1.01,
1.02 ... 3.21 ... 9.99.  I.e. 1000 discrete values between 0 and 10.

Input value 32.1, range change to 2.1; I can have input values 1.0,
1.1, 1.2, ... 9.9. I.e. only 100 discrete values between 0 and 10.
Input value 321, range change to 1; I can have input values of 1, 2,
3, ... 9. I.e. only 10 discrete values between 0 and 10.
Input value 3210, range change to 0; I can have input values of 0.
I.e. only 1 discrete value between 0 and 10.
Input value 32100...  etc. (note the 3 digits of precision)
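
The same effect in binary, sketched with nextafter() to show the gap
between adjacent doubles (one ulp) at each magnitude:

#include <math.h>
#include <stdio.h>

int main(void)
{
  /* The gap grows with magnitude: roughly 0.016 at 1e14, about 2e6
     at 1e22, and about 6e166 at 4.47e182, far beyond 2*pi. */
  double xs[] = { 1.0e14, 1.0e22, 4.47460300787e+182 };
  int i;

  for (i = 0; i < 3; i++)
    printf("ulp(%g) = %g\n", xs[i], nextafter(xs[i], INFINITY) - xs[i]);
  return 0;
}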

Kind Regards

James


Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Vincent Lefevre
On 2012-02-03 17:40:05 +0100, Dominique Dhumieres wrote:
> While I fail to see how the "correct value" of
> cos(4.47460300787e+182)+sin(4.47460300787e+182)
> can be defined in the 'double' world, cos^2(x)+sin^2(x)=1 and
> sin(2*x)=2*sin(x)*cos(x) seem to be verified (at least for this value)
> even if the actual values of sin and cos depend on the optimisation level.

Actually this depends on the context. It is even worse: the value
of sin(some_value) depends on the context. Consider the following
program:

#include <stdio.h>
#include <math.h>

int main (void)
{
  double x, c, s;
  volatile double v;

  x = 1.0e22;
  s = sin (x);
  printf ("sin(%.17g) = %.17g\n", x, s);

  v = x;
  x = v;
  c = cos (x);
  s = sin (x);
  printf ("sin(%.17g) = %.17g\n", x, s);

  return c == 0;
}

With "gcc -O" on x86_64 with glibc 2.13, one gets:

sin(1e+22) = -0.85220084976718879
sin(1e+22) = 0.46261304076460175

In the program, you can replace 1.0e22 by 4.47460300787e+182 or
whatever large constant you want. You'll get the same problem.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Vincent Lefevre
On 2012-02-03 16:51:22 -0500, Robert Dewar wrote:
> All machines that implement IEEE arithmetic :-) As we know only too well
> from the universe of machines on which we implement GNAT, this is not
> all machines :-)

But I think that machines with no IEEE support will tend to disappear
(and are already disappearing). IEEE support has even been introduced
in GPU's. And users who need a good arithmetic will run their code on
machines capable of providing one.

Concerning the sincos problem here, this is just a bug in the library,
which doesn't follow Intel's spec. Note that the glibc manual says
that sincos is implemented on the x86_64 with a maximum error of 1 ulp
(Section 19.7 Known Maximum Errors in Math Functions), so that the code
clearly doesn't follow the documentation.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: weird optimization in sin+cos, x86 backend

2012-02-03 Thread Vincent Lefevre
On 2012-02-03 22:57:31 +, James Courtier-Dutton wrote:
> On 3 February 2012 16:24, Vincent Lefevre  wrote:
> > But 1.0e22 cannot be handled correctly.
> 
> Of course it can't.
> You only have 52 bits of precision in double floating point numbers.

Wrong. 53 bits of precision. And 10^22 is the last power of 10
exactly representable in double precision (FYI, this example has
been chosen because of this property).

In case you don't know, I've participated in the revision of
the IEEE 754 standard, giving the current IEEE 754-2008. And
our team works on computer arithmetic (as my signature says),
mainly floating-point arithmetic. So, I know what I'm talking
about (though I can also make mistakes, as everyone...).

> So, although you can have 1.0e22, you cannot represent (1.0e22 + 1)
[...]

The constant is 1.0e22. Whether you can represent 1.0e22 + 1 or
not doesn't matter.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)