https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69132
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org,
| |uros at gcc dot gnu.org
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Your snippet is not self-contained. The cout in there is presumably irrelevant
to the testcase, but it is unclear which headers you are using and thus whether
the sqrt ends up as __builtin_sqrtf or __builtin_sqrt.
Also, GCC 4.8 is no longer supported.
That said, the "weird" single precision vector division is because it is
computing the division using a Newton-Raphson approximation, as
a / b = a * ((rcp(b) + rcp(b)) - (b * rcp(b) * rcp(b)))
You can disable this e.g. with -mrecip='default,!vec-div'
Now, whether this should be the default even for AVX-capable CPUs depends on
the throughput/latencies of the vrcpps+3*vmulps+vaddps+vsubps sequence
vs. a single vdivps.
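
For illustration, the refinement formula above can be sketched in scalar C++.
This is not what GCC emits, only the same arithmetic written out by hand.
rough_rcp and nr_div are hypothetical names, and rough_rcp merely imitates a
low-precision hardware estimate such as vrcpps by truncating the mantissa of an
exact reciprocal; the real instruction's error behaviour differs.

```cpp
#include <cstdint>
#include <cstring>

// Stand-in for a hardware reciprocal estimate such as vrcpps.
// Assumption: keeping only 11 mantissa bits roughly mimics its precision.
float rough_rcp(float b) {
    float r = 1.0f / b;
    std::uint32_t bits;
    std::memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF000u;  // clear the low 12 mantissa bits
    std::memcpy(&r, &bits, sizeof r);
    return r;
}

// One Newton-Raphson step, written exactly as in the formula:
// a / b = a * ((rcp(b) + rcp(b)) - (b * rcp(b) * rcp(b)))
// That is one estimate, three multiplications, one addition, one subtraction.
float nr_div(float a, float b) {
    float x = rough_rcp(b);
    return a * ((x + x) - (b * x * x));
}
```

With this stand-in, nr_div(1.0f, 3.0f) is much closer to 1/3 than
rough_rcp(3.0f) alone, since one step roughly squares the relative error of the
estimate. The result is still not guaranteed to be correctly rounded the way a
real division is.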