https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762
--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
_mm_storel_pi could be implemented using __builtin_shufflevector these days.
Which shows exactly the same issue:

typedef float __attribute__((vector_size(8))) v2sf_t;
typedef float __attribute__((vector_size(16))) v4sf_t;

v2sf_t test(v4sf_t x, v4sf_t y)
{
  v2sf_t x2, y2;
  x2 = __builtin_shufflevector (x, x, 0, 1);
  y2 = __builtin_shufflevector (y, x, 0, 1);
  return x2 + y2;
}

expands to

(insn 7 4 8 2 (set (reg:DI 88)
        (vec_select:DI (subreg:V2DI (reg/v:V4SF 85 [ x ]) 0)
            (parallel [
                    (const_int 0 [0])
                ]))) "t.c":7:5 -1
     (nil))
(insn 8 7 9 2 (set (reg:DI 89)
        (vec_select:DI (subreg:V2DI (reg/v:V4SF 86 [ y ]) 0)
            (parallel [
                    (const_int 0 [0])
                ]))) "t.c":8:5 -1
     (nil))
(insn 9 8 10 2 (set (reg:V2SF 87)
        (plus:V2SF (subreg:V2SF (reg:DI 88) 0)
            (subreg:V2SF (reg:DI 89) 0))) "t.c":12:12 -1
     (nil))

and is recognized by the same set_noop_p code.  On GIMPLE we have

  x2_2 = BIT_FIELD_REF <x_1(D), 64, 0>;
  y2_4 = BIT_FIELD_REF <y_3(D), 64, 0>;
  _5 = x2_2 + y2_4;