https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117562

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #10)
> > 
> > I do wonder about the usefulness of the memory alternative on the
> > sse_movhlps pattern though, there's the sse_storehps pattern which
> > also models the store part more precisely as V2SFmode.  Is
> > sse_movhlps_exp ever invoked with a memory destination?
> > 
> 
> Like this?
> 
> typedef float v4sf __attribute__((vector_size(16)));
> void
> foo (v4sf a, v4sf* b)
> {
>     *b = __builtin_shufflevector (*b, a, 0, 1, 4, 5);
> }
> 
> 
> foo(float __vector(4), float __vector(4)*):
>         movlps  QWORD PTR [rdi+8], xmm0     # 11      [c=4 l=3]  sse_movlhps/4
>         ret       # 19        [c=0 l=1]  simple_return_internal

Indeed.  Btw, I checked, and disparaging the memory alternative with '$' as in

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 72acd5bde5e..6cd0d932bd9 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -11045,7 +11045,7 @@
 })

 (define_insn "sse_movhlps"
-  [(set (match_operand:V4SF 0 "nonimmediate_operand"     "=x,v,x,v,m")
+  [(set (match_operand:V4SF 0 "nonimmediate_operand"     "=x,v,x,v,$m")
        (vec_select:V4SF
          (vec_concat:V8SF
            (match_operand:V4SF 1 "nonimmediate_operand" " 0,v,0,v,0")

does _not_ fix the regression (but maybe I did it wrong).
