https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89155

            Bug ID: 89155
           Summary: Suboptimal code generation for SSE intrinsics based
                    rsqrt
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nok.raven at gmail dot com
  Target Milestone: ---
            Target: x86_64

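#include <x86intrin.h>
/* x broadcast to all four lanes */
float rsqrtf_a(float x) {
  return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ps1(x)));
}
/* x in lane 0, upper three lanes zeroed */
float rsqrtf_b(float x) {
  return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}
/* same as rsqrtf_b, written with _mm_set_ps (its last argument is lane 0) */
float rsqrtf_c(float x) {
  return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ps(0, 0, 0, x)));
}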

https://godbolt.org/z/VrF-vM

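Each of these functions should compile to a single rsqrtss instruction (plus
ret). rsqrtss computes only lane 0, from lane 0 of its operand, and
_mm_cvtss_f32 discards the upper lanes, so whatever _mm_set_ps1, _mm_set_ss or
_mm_set_ps writes into lanes 1..3 is dead. Since the x86-64 SysV ABI passes x
in %xmm0 and returns the float in %xmm0, "rsqrtss %xmm0, %xmm0" is all that is
needed. GCC currently emits additional instructions around the rsqrtss, while
Clang 3.9 and later compiles all three functions to that single instruction.
Related to bug 55016.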
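The claim that the three variants are interchangeable can be checked at run
time. The sketch below copies rsqrtf_a/b/c from above and compares their
results bit for bit over a range of inputs; same_bits, rsqrt_variants_agree
and check_rsqrt_variants are hypothetical helper names introduced only for
this illustration, and it assumes an x86-64 target (SSE always available).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <x86intrin.h>

/* The three variants from the report; they differ only in lanes 1..3. */
static float rsqrtf_a(float x) { return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ps1(x))); }
static float rsqrtf_b(float x) { return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x))); }
static float rsqrtf_c(float x) { return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ps(0, 0, 0, x))); }

/* Hypothetical helper: bitwise float comparison, so NaN, inf and -0.0
   are compared exactly rather than with ==. */
static int same_bits(float a, float b) {
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    return ua == ub;
}

/* Hypothetical helper: rsqrtss reads only lane 0 and _mm_cvtss_f32 reads
   only lane 0 of the result, so all three variants must agree. */
static int rsqrt_variants_agree(float x) {
    float a = rsqrtf_a(x), b = rsqrtf_b(x), c = rsqrtf_c(x);
    return same_bits(a, b) && same_bits(a, c);
}

/* Hypothetical helper: sweep some ordinary and special inputs. */
static void check_rsqrt_variants(void) {
    static const float specials[] = { 0.0f, -0.0f, -1.0f, 1e-30f, 1e30f };
    for (size_t i = 0; i < sizeof specials / sizeof specials[0]; ++i)
        assert(rsqrt_variants_agree(specials[i]));
    for (int i = 1; i < 4096; ++i)
        assert(rsqrt_variants_agree((float)i * 0.25f));
}
```

Because all three agree on every input, the values placed in the upper lanes
cannot affect the returned float, which is why the vector construction can be
dropped entirely.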
