https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121099

            Bug ID: 121099
           Summary: GCC doesn't optimize `_mm_set_ps()` very well
           Product: gcc
           Version: 15.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lh_mouse at 126 dot com
  Target Milestone: ---

For this 4-way comparison function:
(https://gcc.godbolt.org/z/sY4vdcjdq)
```
#include <immintrin.h>

// Returns a 4-bit mask identifying the octant (angles are in degrees)
// - 0b1110 for   0 -  45 where x > y > 0 > -x
// - 0b1111 for  45 -  90 where y > x > 0 > -x
// - 0b0111 for  90 - 135 where y > -x > 0 > x
// - 0b0011 for 135 - 180 where -x > y > 0 > x
// - 0b0001 for 180 - 225 where -x > 0 > y > x
// - 0b0000 for 225 - 270 where -x > 0 > x > y
// - 0b1000 for 270 - 315 where x > 0 > -x > y
// - 0b1100 for 315 - 360 where x > 0 > y > -x
int
octant_of_angle(float y, float x)
  {
    __m128 ps = _mm_cmpgt_ps(_mm_set_ps(x, x, y, y), _mm_set_ps(0, -y, 0, x));
    return _mm_movemask_ps(ps);
  }
```
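
To make the bit layout explicit, here is a scalar sketch of the same mask. The name `octant_mask_scalar` is hypothetical and exists only for illustration. It relies on `_mm_set_ps()` taking its arguments from the highest lane down, so the first argument of each call ends up in bit 3 of the `_mm_movemask_ps()` result.

```c
// Scalar model of octant_of_angle(); the name is hypothetical.
// _mm_set_ps(e3, e2, e1, e0) puts e0 in lane 0, and _mm_movemask_ps()
// copies the sign bit of lane i into bit i of the result.
int
octant_mask_scalar(float y, float x)
  {
    int mask = 0;
    mask |= (y > x)  << 0;   // lane 0: y > x
    mask |= (y > 0)  << 1;   // lane 1: y > 0
    mask |= (x > -y) << 2;   // lane 2: x > -y
    mask |= (x > 0)  << 3;   // lane 3: x > 0
    return mask;
  }
```

Lane 2 tests `x > -y`, which is the same condition as the `y > -x` ordering used in the comment table.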

GCC emits 8 instructions for the two `_mm_set_ps()` intrinsics:
```
        vunpcklps       xmm2, xmm0, xmm0
        vxorps  xmm0, xmm0, XMMWORD PTR .LC0[rip]
        vxorps  xmm4, xmm4, xmm4
        vunpcklps       xmm3, xmm1, xmm1
        vunpcklps       xmm1, xmm1, xmm4
        vunpcklps       xmm0, xmm0, xmm4
        vmovlhps        xmm2, xmm2, xmm3
        vmovlhps        xmm1, xmm1, xmm0
```

while Clang emits only 3:
```
        vshufps xmm2, xmm0, xmm1, 0
        vxorps  xmm0, xmm0, xmmword ptr [rip + .LCPI0_0]
        vinsertps       xmm0, xmm1, xmm0, 42
```
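
For reference, a lane-level sketch of what those three instructions compute, written as plain C so the immediates can be checked by hand. All names (`vec4`, `model_shufps`, `model_insertps`, `model_clang_operands`) are hypothetical. It assumes `.LCPI0_0` is a sign-bit mask, so the `vxorps` is modelled as negating lane 0.

```c
// Lane-level model of the three Clang instructions; all names are
// hypothetical. Index 0 is the lowest lane of an XMM register.
typedef struct { float lane[4]; } vec4;

// vshufps dst, a, b, imm8: lanes 0-1 are picked from a, lanes 2-3 from b.
vec4
model_shufps(vec4 a, vec4 b, unsigned imm)
  {
    vec4 r;
    r.lane[0] = a.lane[imm & 3];
    r.lane[1] = a.lane[(imm >> 2) & 3];
    r.lane[2] = b.lane[(imm >> 4) & 3];
    r.lane[3] = b.lane[(imm >> 6) & 3];
    return r;
  }

// vinsertps dst, a, b, imm8 (register source): copy a, write
// b.lane[imm[7:6]] into lane imm[5:4], then zero the lanes set in imm[3:0].
vec4
model_insertps(vec4 a, vec4 b, unsigned imm)
  {
    vec4 r = a;
    r.lane[(imm >> 4) & 3] = b.lane[(imm >> 6) & 3];
    for (int i = 0; i < 4; ++i)
      if (imm & (1u << i))
        r.lane[i] = 0.0f;
    return r;
  }

// Builds both compare operands the way the Clang output does.
void
model_clang_operands(float y, float x, vec4* lhs, vec4* rhs)
  {
    vec4 vy = { { y, 0, 0, 0 } };   // xmm0 on entry; upper lanes are never read
    vec4 vx = { { x, 0, 0, 0 } };   // xmm1 on entry; upper lanes are never read
    *lhs = model_shufps(vy, vx, 0);      // { y, y, x, x }
    vy.lane[0] = -vy.lane[0];            // vxorps, assumed to flip the sign bit
    *rhs = model_insertps(vx, vy, 42);   // { x, 0, -y, 0 }
  }
```

Immediate 42 is `0b00101010`: source lane 0, destination lane 2, and a zero mask covering lanes 1 and 3, which yields the `_mm_set_ps(0, -y, 0, x)` operand in a single instruction.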
