Hello,

I definitely don't expect the attached patch to be accepted, but I would like some advice on which direction to take. A patch that passes the testsuite and performs the optimization I want on a couple of testcases seemed like a good way to start the conversation. This is the first time I have looked at .md files...

The goal is to optimize: v8sf x; v4sf y=*(v4sf*)&x; so the compiler doesn't copy x to memory (yes, I know there is an intrinsic to do that).
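
For concreteness, here is a minimal sketch of the access pattern in question, written with GCC's generic vector extensions. The helper names (low_half_cast, low_half_elements, low_halves_agree) are made up for illustration and are not part of the patch. Vectors are passed by pointer only to keep the sketch independent of the vector calling convention.

```c
#include <string.h>

typedef float v4sf __attribute__ ((vector_size (16)));
typedef float v8sf __attribute__ ((vector_size (32)));

/* The form from this message: reread the first 16 bytes of *x as a v4sf.  */
static v4sf
low_half_cast (const v8sf *x)
{
  return *(const v4sf *) x;
}

/* The same value built lane by lane (hypothetical helper, used only as a
   cross-check for the cast form).  */
static v4sf
low_half_elements (const v8sf *x)
{
  v4sf r = { (*x)[0], (*x)[1], (*x)[2], (*x)[3] };
  return r;
}

/* Returns 1 when both forms produce identical bytes.  */
static int
low_halves_agree (const v8sf *x)
{
  v4sf a = low_half_cast (x);
  v4sf b = low_half_elements (x);
  return memcmp (&a, &b, sizeof a) == 0;
}
```

Both helpers should yield the low four lanes of the input. The question in this message is only about getting the cast form compiled without a round trip through memory.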

If I understood Richard Guenther's comment in the PR correctly, this can be optimized in the back end. The only place I found for this kind of transformation is define_peephole2. However, I couldn't figure out how to test whether 2 memory operands refer to the same address when they have different types (which is why match_dup is unhappy). For some reason the XEXP (*, 0) comparison said yes on my test and no when I used an unrelated piece of memory, but it looks like a nonsense test that just happens to work on a couple of trivial examples.

Any help?


2012-05-02  Marc Glisse  <marc.gli...@inria.fr>
        PR target/53101

gcc/
        * config/i386/sse.md: New peephole2 for subvectors.

gcc/testsuite/
        * gcc.target/i386/pr53101.c: New test.


--
Marc Glisse
Index: gcc/testsuite/gcc.target/i386/pr53101.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr53101.c     (revision 0)
+++ gcc/testsuite/gcc.target/i386/pr53101.c     (revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx" } */
+
+typedef double v2df __attribute__ ((vector_size (16)));
+typedef double v4df __attribute__ ((vector_size (32)));
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef int v8si __attribute__ ((vector_size (32)));
+
+v4si
+avx_extract_v4si (v8si x)
+{
+  return *(v4si*)&x;
+}
+
+v2df
+avx_extract_v2df (v4df x __attribute((unused)), v4df y)
+{
+  return *(v2df*)&y;
+}
+
+/* { dg-final { scan-assembler-not "movdq" } } */
+/* { dg-final { scan-assembler-times "movapd" 1 } } */

Property changes on: gcc/testsuite/gcc.target/i386/pr53101.c
___________________________________________________________________
Added: svn:keywords
   + Author Date Id Revision URL
Added: svn:eol-style
   + native

Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md      (revision 187012)
+++ gcc/config/i386/sse.md      (working copy)
@@ -4104,10 +4104,34 @@
 
   emit_move_insn (operands[0], adjust_address (operands[1], SFmode, i*4));
   DONE;
 })
 
+;; This is how we receive accesses to the first half of a vector.
+(define_peephole2
+  [(set (match_operand:VI8F_256 3 "memory_operand")
+        (match_operand:VI8F_256 1 "register_operand"))
+   (set (match_operand:<ssehalfvecmode> 0 "register_operand")
+        (match_operand:<ssehalfvecmode> 2 "memory_operand"))]
+  "TARGET_AVX && rtx_equal_p (XEXP (operands[2], 0), XEXP (operands[3], 0))"
+  [(set (match_dup 0)
+        (vec_select:<ssehalfvecmode> (match_dup 1)
+                                     (parallel [(const_int 0) (const_int 1)])))]
+)
+
+(define_peephole2
+  [(set (match_operand:VI4F_256 3 "memory_operand")
+        (match_operand:VI4F_256 1 "register_operand"))
+   (set (match_operand:<ssehalfvecmode> 0 "register_operand")
+        (match_operand:<ssehalfvecmode> 2 "memory_operand"))]
+  "TARGET_AVX && rtx_equal_p (XEXP (operands[2], 0), XEXP (operands[3], 0))"
+  [(set (match_dup 0)
+        (vec_select:<ssehalfvecmode> (match_dup 1)
+                                     (parallel [(const_int 0) (const_int 1)
+                                               (const_int 2) (const_int 3)])))]
+)
+
 (define_expand "avx_vextractf128<mode>"
   [(match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
    (match_operand:V_256 1 "register_operand")
    (match_operand:SI 2 "const_0_to_1_operand")]
   "TARGET_AVX"
