https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69132
--- Comment #2 from Wang Xuancong <xuancong84 at gmail dot com> ---
I assume rcp(b) = 1/b, so a/b = a*(1/b) = a*rcp(b). In that case there is no need for the Newton-Raphson method, and computing [a * ((rcp(b) + rcp(b)) - (b * rcp(b) * rcp(b)))] is of course slower than computing [a * rcp(b)]. I understand that vdivps takes a very long time, but the straightforward method costs only vrcpps + vmulps, which is much faster than what the compiler currently emits, i.e. vrcpps + 3*vmulps + vaddps + vsubps.
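The two expressions compared above can be written out in scalar C to make the operation counts concrete. This is only a sketch: the function names (div_simple, div_refined, check_exact_case) are hypothetical, and the parameter r stands in for whatever rcp(b) returns, so no real vrcpps instruction is issued here. Under the comment's assumption that r is exactly 1/b, both forms give the same quotient; the second form costs three multiplies, one add and one subtract against a single multiply.

```c
#include <assert.h>

/* Straightforward form: a * rcp(b). One multiply.
   `r` is a stand-in for rcp(b) (hypothetical parameter, not a real intrinsic). */
static float div_simple(float a, float r)
{
    return a * r;
}

/* Form described in the comment as the current compiler output:
   a * ((r + r) - (b * r * r)). Three multiplies, one add, one subtract. */
static float div_refined(float a, float b, float r)
{
    return a * ((r + r) - (b * r * r));
}

/* With r exactly equal to 1/b (the assumption stated in the comment),
   both forms yield the same result. */
static void check_exact_case(void)
{
    float a = 6.0f, b = 2.0f;
    float r = 0.5f; /* exactly 1/b */
    assert(div_simple(a, r) == 3.0f);
    assert(div_refined(a, b, r) == 3.0f);
}
```

If r is instead only an approximation of 1/b (for example r = 0.4990234375f for b = 2, a made-up value for illustration), div_refined lands closer to a/b than div_simple, which is what the extra instructions pay for. Whether that applies depends on how accurate rcp(b) actually is.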