https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
_mm_storel_pi could be implemented using __builtin_shufflevector these days,
which shows exactly the same issue:

typedef float __attribute__((vector_size(8))) v2sf_t;
typedef float __attribute__((vector_size(16))) v4sf_t;

v2sf_t test(v4sf_t x, v4sf_t y) {
        v2sf_t x2, y2;

        x2 = __builtin_shufflevector (x, x, 0, 1);
        y2 = __builtin_shufflevector (y, x, 0, 1);

        return x2 + y2;
}

expands to

(insn 7 4 8 2 (set (reg:DI 88)
        (vec_select:DI (subreg:V2DI (reg/v:V4SF 85 [ x ]) 0)
            (parallel [
                    (const_int 0 [0])
                ]))) "t.c":7:5 -1
     (nil))
(insn 8 7 9 2 (set (reg:DI 89)
        (vec_select:DI (subreg:V2DI (reg/v:V4SF 86 [ y ]) 0)
            (parallel [
                    (const_int 0 [0])
                ]))) "t.c":8:5 -1
     (nil))
(insn 9 8 10 2 (set (reg:V2SF 87)
        (plus:V2SF (subreg:V2SF (reg:DI 88) 0)
            (subreg:V2SF (reg:DI 89) 0))) "t.c":12:12 -1
     (nil))

and is recognized by the same set_noop_p code.  On GIMPLE we have

  x2_2 = BIT_FIELD_REF <x_1(D), 64, 0>;
  y2_4 = BIT_FIELD_REF <y_3(D), 64, 0>;
  _5 = x2_2 + y2_4;
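
For readers less familiar with the dumps, here is a minimal sketch of what the
GIMPLE above computes, assuming a little-endian x86 target as in this bug.
The helper names (low_half_bits, low_half_lanes, add_low_halves,
check_low_half) are hypothetical and not part of GCC or the testcase.
memcpy stands in for BIT_FIELD_REF <x, 64, 0>, and element subscripting stands
in for the two-lane shuffle, so it does not depend on __builtin_shufflevector
being available:

```c
#include <assert.h>
#include <string.h>

typedef float __attribute__((vector_size(8))) v2sf_t;
typedef float __attribute__((vector_size(16))) v4sf_t;

/* Hypothetical helper: copy the first 64 bits of x, which is roughly what
   BIT_FIELD_REF <x, 64, 0> expresses on GIMPLE.  */
static v2sf_t low_half_bits(v4sf_t x) {
    v2sf_t lo;
    memcpy(&lo, &x, sizeof lo);
    return lo;
}

/* Hypothetical helper: pick lanes 0 and 1 of x by subscripting, the same
   selection the shuffle with indices 0, 1 asks for.  */
static v2sf_t low_half_lanes(v4sf_t x) {
    v2sf_t lo = { x[0], x[1] };
    return lo;
}

/* Mirrors the shape of test() above: add the low halves of x and y.  */
static v2sf_t add_low_halves(v4sf_t x, v4sf_t y) {
    return low_half_bits(x) + low_half_bits(y);
}

/* Assumption being checked: both ways of taking the low half agree.  */
static void check_low_half(v4sf_t x) {
    v2sf_t a = low_half_bits(x);
    v2sf_t b = low_half_lanes(x);
    assert(a[0] == b[0] && a[1] == b[1]);
}
```

Only lanes 0 and 1 of each input contribute to the result, which is why the
whole thing fits in a V2SF operation on the low 64 bits of the registers.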
