Re: [PATCH] PR libgcc/59714 complex division is surprising on aarch64

vladimir . mezentsev Mon, 29 Jan 2018 12:38:45 -0800

On 01/25/2018 04:14 PM, Joseph Myers wrote:
> On Thu, 25 Jan 2018, vladimir.mezent...@oracle.com wrote:
>
>> From: Vladimir Mezentsev <vladimir.mezent...@oracle.com>
>>
>> Tested on aarch64-linux-gnu, x86_64-pc-linux-gnu and 
>> sparc64-unknown-linux-gnu.
>> No regression. New tests now passed.
>> There is a performance degradation for complex double type:
> This is, of course, something to consider for GCC 9; it's not suitable for 
> the current development stage.
Why?


>
> Could you provide a more extended writeup of the issue (starting with the 
> argument for it being a bug at all), the approach you took and the 
> rationale for the approach, when you resubmit the patch for GCC 9 stage 1?

The formula for complex division (a+bi)/(c+di) is
(a*c+b*d)/(c*c+d*d) + (b*c-a*d)/(c*c+d*d)i

The current implementation avoids overflow in (c*c + d*d) using
x=MAX(|c|, |d|);  c1=c/x; d1=d/x;
and then calculating the result as
(a*c1+b*d1)/(c1*c1+d1*d1)*(1.0/x) + (b*c1-a*d1)/(c1*c1+d1*d1)*(1.0/x)i

This approach has two issues:
1. Unexpected rounding when x is not a power of 2.0
For example, (1.0+3.0i) /(1.0+3.0i) should be 1.0
but it is (1.0 - 1.66533e-17i) on aarch64 (PR59714)

2. Unexpected underflow in c/x or d/x.
For example, (0x1.0p+1023 + 0x0.8p-1022 i)/(0x1.0p+677 + 0x1.0p-677 i)
should be (0x1.0p+346 - 0x1.0p-1008 i) but it is (0x1.0p+346)

My implementation avoids these issues.


>
>> * libgcc/config/sparc/sfp-machine.h: New
> Are you introducing a requirement for all architectures to provide 
> sfp-machine.h?  If so, did you determine that SPARC was the only one 
> lacking it?  If not the only one lacking it, you need to make sure that 
> you do not break any existing architecture in GCC.
>
> What about architectures using non-IEEE formats?  soft-fp is not suitable 
> for those, but you mustn't break them.  Remember that e.g. TFmode can be 
> IBM long double, and other ?Fmode similarly need not be IEEE; the name 
> only indicates how many bytes are in the format, nothing else about it.
Agreed.
An additional fix is needed.
The build can break without sfp-machine.h or on architectures using a
non-IEEE format.

>
> What about powerpc __divkc3?
>
> What was the rationale for using soft-fp rather than adding appropriate 
> built-in functions as suggested in a comment?

I had a version with built-in functions and I can restore it.

Let's decide which we want to use: soft-fp or built-in functions.
I'd prefer to use built-in functions, but performance is two times worse.

The test below demonstrates performance degradation:
% cat r1.c
#include <stdio.h>

int main(int argc, char** argv)
{
  int i;
  long long sum = 0;
  for(i = 1; i < 1000000000; i++)
    {
      int exp;
      double d = (double) i;
#ifdef BUILTIN
      /* Exponent via the built-in; the mantissa v is not used. */
      double v = __builtin_frexp(d, &exp);
#else
      /* Exponent read directly from the bits of an IEEE double
         (little-endian bit-field layout). */
      union _FP_UNION_D
      {
        double flt;
        struct _FP_STRUCT_LAYOUT
        {
          unsigned frac0 : 32;
          unsigned frac1 : 20;
          unsigned exp   : 11;
          unsigned sign  : 1;
        } bits __attribute__ ((packed));
        long long LL;
      } u;
      u.flt = d;
      /* Bias 1023, minus 1 to match frexp's [0.5, 1) mantissa. */
      exp = u.bits.exp - 1022;
#endif
      sum += exp * (i % 2 == 0 ? 1 : -1);
    }
  printf("sum = %lld\n", sum);
  return 0;
}

% gcc -O2 -DBUILTIN r1.c ; time ./a.out
real    0m4.247s
user    0m4.243s
sys    0m0.001s

% gcc -O2 r1.c ; time ./a.out
real    0m1.977s
user    0m1.973s
sys    0m0.001s


  -Vladimir
