https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117562
--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #10)
> >
> > I do wonder about the usefulness of the memory alternative on the
> > sse_movhlps pattern though, there's the sse_storehps pattern which
> > also models the store part more precisely as V2SFmode.  Is
> > sse_movhlps_exp ever invoked with a memory destination?
> >
>
> Like this?
>
> typedef float v4sf __attribute__((vector_size(16)));
> void
> foo (v4sf a, v4sf* b)
> {
>     *b = __builtin_shufflevector (*b, a, 0, 1, 4, 5);
> }
>
>
> foo(float __vector(4), float __vector(4)*):
>         movlps  QWORD PTR [rdi+8], xmm0     # 11    [c=4 l=3]  sse_movlhps/4
>         ret             # 19    [c=0 l=1]  simple_return_internal

Indeed.  Btw, I checked and

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 72acd5bde5e..6cd0d932bd9 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -11045,7 +11045,7 @@
 })
 
 (define_insn "sse_movhlps"
-  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,v,x,v,m")
+  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,v,x,v,$m")
        (vec_select:V4SF
          (vec_concat:V8SF
            (match_operand:V4SF 1 "nonimmediate_operand" " 0,v,0,v,0")

does _not_ fix the regression (but maybe I did it wrong).