https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121099
Bug ID: 121099
Summary: GCC doesn't optimize `_mm_set_ps()` very well
Product: gcc
Version: 15.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: lh_mouse at 126 dot com
Target Milestone: ---

For this 4-way comparison function: (https://gcc.godbolt.org/z/sY4vdcjdq)

```
// Returns (angles are in degrees)
//  - 0b1110 for   0 -  45 where  x >  y >  0 > -x
//  - 0b1111 for  45 -  90 where  y >  x >  0 > -x
//  - 0b0111 for  90 - 135 where  y > -x >  0 >  x
//  - 0b0011 for 135 - 180 where -x >  y >  0 >  x
//  - 0b0001 for 180 - 225 where -x >  0 >  y >  x
//  - 0b0000 for 225 - 270 where -x >  0 >  x >  y
//  - 0b1000 for 270 - 315 where  x >  0 > -x >  y
//  - 0b1100 for 315 - 360 where  x >  0 >  y > -x
int
octant_of_angle(float y, float x)
  {
    __m128 ps = _mm_cmpgt_ps(_mm_set_ps(x, x, y, y),
                             _mm_set_ps(0, -y, 0, x));
    return _mm_movemask_ps(ps);
  }
```

GCC emits 8 instructions for the two `_mm_set_ps()` intrins:

```
vunpcklps xmm2, xmm0, xmm0
vxorps xmm0, xmm0, XMMWORD PTR .LC0[rip]
vxorps xmm4, xmm4, xmm4
vunpcklps xmm3, xmm1, xmm1
vunpcklps xmm1, xmm1, xmm4
vunpcklps xmm0, xmm0, xmm4
vmovlhps xmm2, xmm2, xmm3
vmovlhps xmm1, xmm1, xmm0
```

while Clang only emits 3:

```
vshufps xmm2, xmm0, xmm1, 0
vxorps xmm0, xmm0, xmmword ptr [rip + .LCPI0_0]
vinsertps xmm0, xmm1, xmm0, 42
```
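For anyone cross-checking the comment table against the intrinsics, here is a rough scalar sketch of what the four lanes compute. It is not part of the original report: the names `octant_of_angle_scalar` and `check_octant_table` are made up for illustration, and the sample points are simply one arbitrary point inside each octant. It relies only on the documented lane order of `_mm_set_ps(e3, e2, e1, e0)` (last argument goes to lane 0) and on `_mm_movemask_ps()` copying the sign bit of lane i to result bit i.

```c
#include <assert.h>

/* Hypothetical scalar equivalent of octant_of_angle(); illustration only.
 * Lanes of _mm_set_ps(x, x, y, y):  lane3 = x, lane2 = x,  lane1 = y, lane0 = y
 * Lanes of _mm_set_ps(0, -y, 0, x): lane3 = 0, lane2 = -y, lane1 = 0, lane0 = x
 * _mm_cmpgt_ps() compares lane by lane; _mm_movemask_ps() packs the results. */
int octant_of_angle_scalar(float y, float x)
{
    int bit0 = y > x;      /* lane 0: y >  x */
    int bit1 = y > 0.0f;   /* lane 1: y >  0 */
    int bit2 = x > -y;     /* lane 2: x > -y */
    int bit3 = x > 0.0f;   /* lane 3: x >  0 */
    return bit0 | (bit1 << 1) | (bit2 << 2) | (bit3 << 3);
}

/* One sample point per octant, checked against the table in the report.
 * Hex is used because 0b literals are an extension before C23. */
void check_octant_table(void)
{
    assert(octant_of_angle_scalar( 1.0f,  2.0f) == 0xE);  /* 0b1110,   0 -  45 */
    assert(octant_of_angle_scalar( 2.0f,  1.0f) == 0xF);  /* 0b1111,  45 -  90 */
    assert(octant_of_angle_scalar( 2.0f, -1.0f) == 0x7);  /* 0b0111,  90 - 135 */
    assert(octant_of_angle_scalar( 1.0f, -2.0f) == 0x3);  /* 0b0011, 135 - 180 */
    assert(octant_of_angle_scalar(-1.0f, -2.0f) == 0x1);  /* 0b0001, 180 - 225 */
    assert(octant_of_angle_scalar(-2.0f, -1.0f) == 0x0);  /* 0b0000, 225 - 270 */
    assert(octant_of_angle_scalar(-2.0f,  1.0f) == 0x8);  /* 0b1000, 270 - 315 */
    assert(octant_of_angle_scalar(-1.0f,  2.0f) == 0xC);  /* 0b1100, 315 - 360 */
}
```

Calling `check_octant_table()` from a test `main` should pass silently if the table in the comment is right. Note the argument order is `(y, x)`, matching the function in the report.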