https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68664
Bug ID: 68664
Summary: PowerPC: speculative sqrt in c-ray main loop causes large slow down
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: anton at samba dot org
Target Milestone: ---
c-ray (a tiny ray tracer) can be found at:
http://www.sgidepot.co.uk/depot/c-ray-1.1.tar.gz
When built and run with the following commands:
gcc -mcpu=power8 -O3 -ffast-math -o c-ray-mt c-ray-mt.c -lpthread -lm
./c-ray-mt -t 1 -s 768x432 -r 8 -i sphfract -o sphfract.ppm
a speculative sqrt is scheduled inside the hottest loop (the fsqrt is placed ahead of the blt that guards it):
2.46 : 10001a44: ld r9,72(r9)
0.00 : 10001a48: cmpdi cr7,r9,0
0.00 : 10001a4c: beq cr7,10001be0 <shade+0x320>
0.05 : 10001a50: lfd f8,0(r9)
0.00 : 10001a54: lfd f9,8(r9)
0.00 : 10001a58: lfd f0,16(r9)
0.00 : 10001a5c: lfd f7,24(r9)
2.74 : 10001a60: fmadd f2,f8,f8,f1
0.05 : 10001a64: fmul f12,f9,f5
0.00 : 10001a68: fsub f11,f5,f9
0.00 : 10001a6c: fsub f6,f4,f8
0.00 : 10001a70: fsub f10,f3,f0
5.34 : 10001a74: fmadd f9,f9,f9,f2
0.08 : 10001a78: fnmadd f8,f8,f4,f12
0.00 : 10001a7c: xsmuldp vs11,vs11,vs33
0.00 : 10001a80: fmadd f9,f0,f0,f9
9.99 : 10001a84: fnmsub f12,f0,f3,f8
0.10 : 10001a88: xsmaddadp vs11,vs32,vs6
0.00 : 10001a8c: fnmsub f0,f7,f7,f9
0.00 : 10001a90: xsmaddadp vs11,vs45,vs10
11.32 : 10001a94: xsmaddadp vs0,vs12,vs43
0.16 : 10001a98: fadd f12,f11,f11
0.00 : 10001a9c: xsmuldp vs0,vs0,vs44
0.01 : 10001aa0: fmsub f0,f12,f12,f0
64.34 : 10001aa4: fcmpu cr7,f0,f31
0.97 : 10001aa8: fsqrt f11,f0 <----- speculative sqrt
0.00 : 10001aac: blt cr7,10001a44 <shade+0x184>
Building with -fno-sched-spec improves performance by almost 2x:
gcc -mcpu=power8 -O3 -ffast-math -fno-sched-spec -o c-ray-mt c-ray-mt.c -lpthread -lm