https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89155

            Bug ID: 89155
           Summary: Suboptimal code generation for SSE intrinsics based rsqrt
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nok.raven at gmail dot com
  Target Milestone: ---
            Target: x86_64

#include <x86intrin.h>

float rsqrtf_a(float x)
{
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ps1(x)));
}

float rsqrtf_b(float x)
{
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

float rsqrtf_c(float x)
{
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ps(0, 0, 0, x)));
}

https://godbolt.org/z/VrF-vM

All these functions should result in a single rsqrtss instruction, but
currently GCC produces suboptimal code (Clang 3.9+ optimizes it perfectly).

Related to bug 55016.
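
The claim that all three variants can collapse to one rsqrtss rests on them differing only in the upper lanes of the vector, which _mm_cvtss_f32 discards. A minimal sketch of a runtime check for that, assuming an x86_64 target with SSE enabled; the helper names rsqrt_variants_agree and rsqrt_is_close are hypothetical and not part of the original report:

```c
#include <x86intrin.h>

/* The three variants from the report, unchanged. */
float rsqrtf_a(float x) { return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ps1(x))); }
float rsqrtf_b(float x) { return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x))); }
float rsqrtf_c(float x) { return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ps(0, 0, 0, x))); }

/* Hypothetical helper: returns 1 when all three variants yield the same
 * value for x. Only lane 0 is read back, so the differing upper lanes
 * (broadcast in _a, zero in _b and _c) should not matter. */
int rsqrt_variants_agree(float x)
{
    float a = rsqrtf_a(x);
    float b = rsqrtf_b(x);
    float c = rsqrtf_c(x);
    return a == b && b == c;
}

/* Hypothetical helper: returns 1 when r = rsqrt(x) satisfies
 * |r*r*x - 1| < 1e-3. rsqrtss is only an approximation (documented
 * relative error at most 1.5 * 2^-12), so the tolerance is loose. */
int rsqrt_is_close(float x)
{
    float r = rsqrtf_b(x);
    float err = r * r * x - 1.0f;
    if (err < 0.0f)
        err = -err;
    return err < 1e-3f;
}
```

Compiling this with gcc -O2 -S and reading the generated assembly for rsqrtf_a, rsqrtf_b and rsqrtf_c is one way to compare against the Compiler Explorer output linked above.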