https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #25 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 22 Jan 2019, elrodc at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
>
> --- Comment #24 from Chris Elrod <elrodc at gmail dot com> ---
> The dump looks like this:
>
>   vect__67.78_217 = SQRT (vect__213.77_225);
>   vect_ui33_68.79_248 = { 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0,
> 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0
> } / vect__67.78_217;
>   vect__71.80_249 = vect__246.59_65 * vect_ui33_68.79_248;
>   vect_u13_73.81_250 = vect__187.71_14 * vect_ui33_68.79_248;
>   vect_u23_75.82_251 = vect__200.74_5 * vect_ui33_68.79_248;
>
> so the vrsqrt optimization happens later. g++ shows the same problems with
> weird code generation. However this:
>
>   /* sqrt(a)  = -0.5 * a * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0)
>      rsqrt(a) = -0.5     * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0) */
>
> does not match this:
>
>         vrsqrt14ps      %zmm1, %zmm2 # comparison and mask removed
>         vmulps  %zmm1, %zmm2, %zmm0
>         vmulps  %zmm2, %zmm0, %zmm1
>         vmulps  %zmm6, %zmm0, %zmm0
>         vaddps  %zmm7, %zmm1, %zmm1
>         vmulps  %zmm0, %zmm1, %zmm1
>         vrcp14ps        %zmm1, %zmm0
>         vmulps  %zmm1, %zmm0, %zmm1
>         vmulps  %zmm1, %zmm0, %zmm1
>         vaddps  %zmm0, %zmm0, %zmm0
>         vsubps  %zmm1, %zmm0, %zmm0
>
> Recommendations on the next place to look for what's going on?

You can try enabling -mrecip to see RSQRT in .optimized - there's
probably late 1/sqrt optimization on RTL.