On 01/25/2018 04:14 PM, Joseph Myers wrote: > On Thu, 25 Jan 2018, vladimir.mezent...@oracle.com wrote: > >> From: Vladimir Mezentsev <vladimir.mezent...@oracle.com> >> >> Tested on aarch64-linux-gnu, x86_64-pc-linux-gnu and >> sparc64-unknown-linux-gnu. >> No regression. New tests now passed. >> There is a performance degradation for complex double type: > This is, of course, something to consider for GCC 9; it's not suitable for > the current development stage. Why ?
> > Could you provide a more extended writeup of the issue (starting with the > argument for it being a bug at all), the approach you took and the > rationale for the approach, when you resubmit the patch for GCC 9 stage 1? The formula for complex division (a+bi)/(c+di) is (a*c+b*d)/(c*c+d*d) + (b*c-a*d)/(c*c+d*d)i The current implementation avoids overflow in (c*c + d*d) using x=MAX(|c|, |d|); c1=c/x; d1=d/x; and then calculating the result as (a*c1+b*d1)/(c1*c1+d1*d1)*(1.0/x) + (b*c1-a*d1)/(c1*c1+d1*d1)*(1.0/x)i This approach has two issues: 1. Unexpected rounding when x is not a power of 2.0 For example, (1.0+3.0i) /(1.0+3.0i) should be 1.0 but it is (1.0 - 1.66533e-17i) on aarch64 (PR59714) 2. Unexpected underflow in c/x or d/x. For example, (0x1.0p+1023 + 0x0.8p-1022 i)/(0x1.0p+677 + 0x1.0p-677 i) should be (0x1.0p+346 - 0x1.0p-1008 i) but it is (0x1.0p+346) My implementation avoids these issues. > >> * libgcc/config/sparc/sfp-machine.h: New > Are you introducing a requirement for all architectures to provide > sfp-machine.h? If so, did you determine that SPARC was the only one > lacking it? If not the only one lacking it, you need to make sure that > you do not break any existing architecture in GCC. > > What about architectures using non-IEEE formats? soft-fp is not suitable > for those, but you mustn't break them. Remember that e.g. TFmode can be > IBM long double, and other ?Fmode similarly need not be IEEE; the name > only indicates how many bytes are in the format, nothing else about it. Agree. An additional fix is needed. Build can be broken without sfp-mashine.h or on architectures using non-IEEE format. > > What about powerpc __divkc3? > > What was the rationale for using soft-fp rather than adding appropriate > built-in functions as suggested in a comment? I had a version with built-in functions and I can restore it. Let's design what we want to use soft-fp or built-in function. I'd prefer to use built-in functions but performance is in two times worse. The test below demonstrates performance degradation: % cat r1.c #include <stdio.h> int main(int argc, char** argv) { int i; long long sum = 0; for(i = 1; i < 1000000000; i++) { int exp; double d = (double) i; #ifdef BUITIN double v = __builtin_frexp(d, &exp); #else union _FP_UNION_D { double flt; struct _FP_STRUCT_LAYOUT { unsigned frac0 : 32; unsigned frac1 : 20; unsigned exp : 11; unsigned sign : 1; } bits __attribute__ ((packed)); long long LL; } u; u.flt = d; exp = u.bits.exp - 1022; #endif sum += exp * (i % 2 == 0 ? 1 : -1); } printf("sum = %d\n", sum); return 0; } %gcc -O2 -DBUITIN r1.c ; time ./a.out real 0m4.247s user 0m4.243s sys 0m0.001s %gcc -O2 r1.c ; time ./a.out real 0m1.977s user 0m1.973s sys 0m0.001s -Vladimir