https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69132

--- Comment #2 from Wang Xuancong <xuancong84 at gmail dot com> ---
I assume rcp(b)=1/b, so a/b=a*(1/b)=a*rcp(b).
Under that assumption, the Newton-Raphson refinement step is no longer needed.
And of course, computing [a * ((rcp(b) + rcp(b)) - (b * rcp(b) * rcp(b)))] is
slower than computing [a * rcp(b)].
I understand that vdivps takes a very long time, but the straightforward
method only costs vrcpps + vmulps, which is much faster than what the
compiler currently emits, i.e. vrcpps + 3*vmulps + vaddps + vsubps.
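
For reference, the two expressions compared above can be written as scalar C.
This is only a sketch of the arithmetic, not the compiler's actual output: the
function names div_plain and div_refined are made up for illustration, and r is
a stand-in parameter for whatever rcp(b) returns, so no intrinsics are needed.

```c
/* r stands in for the result of rcp(b); it is passed in as a parameter
 * (an assumption of this sketch) so the code stays portable. */

/* Straightforward form: a * rcp(b). One multiply. */
static float div_plain(float a, float r)
{
    return a * r;
}

/* Form quoted in the comment: a * ((r + r) - (b * r * r)).
 * Three multiplies, one add, one subtract. */
static float div_refined(float a, float b, float r)
{
    return a * ((r + r) - (b * r * r));
}
```

With a = 3, b = 4 and r = 0.25 (that is, r exactly equal to 1/b, as assumed
above), both functions return 0.75. If r is only an approximation of 1/b, the
two results will generally differ.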
