https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713

--- Comment #25 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 22 Jan 2019, elrodc at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
> 
> --- Comment #24 from Chris Elrod <elrodc at gmail dot com> ---
> The dump looks like this:
> 
>   vect__67.78_217 = SQRT (vect__213.77_225);
>   vect_ui33_68.79_248 = { 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0,
> 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0
> } / vect__67.78_217;
>   vect__71.80_249 = vect__246.59_65 * vect_ui33_68.79_248;
>   vect_u13_73.81_250 = vect__187.71_14 * vect_ui33_68.79_248;
>   vect_u23_75.82_251 = vect__200.74_5 * vect_ui33_68.79_248;
> 
> so the vrsqrt optimization happens later. g++ shows the same problems with
> weird code generation. However this:
> 
>  /* sqrt(a)  = -0.5 * a * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0)
>     rsqrt(a) = -0.5     * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0) */
> 
> does not match this:
> 
>         vrsqrt14ps      %zmm1, %zmm2 # comparison and mask removed
>         vmulps  %zmm1, %zmm2, %zmm0
>         vmulps  %zmm2, %zmm0, %zmm1
>         vmulps  %zmm6, %zmm0, %zmm0
>         vaddps  %zmm7, %zmm1, %zmm1
>         vmulps  %zmm0, %zmm1, %zmm1
>         vrcp14ps        %zmm1, %zmm0
>         vmulps  %zmm1, %zmm0, %zmm1
>         vmulps  %zmm1, %zmm0, %zmm1
>         vaddps  %zmm0, %zmm0, %zmm0
>         vsubps  %zmm1, %zmm0, %zmm0
> 
> Recommendations on the next place to look for what's going on?

You can try enabling -mrecip to see whether RSQRT shows up in the
.optimized dump; there is probably a late 1/sqrt optimization on RTL.
