Hello,

I definitely don't expect the attached patch to be accepted, but I would like some advice on which direction to take. A patch that passes the testsuite and performs the optimization I want on a couple of testcases seemed like a good way to start the conversation. This is the first time I have even looked at .md files...
The goal is to optimize

  v8sf x;
  v4sf y = *(v4sf*)&x;

so that the compiler doesn't copy x to memory (yes, I know there is an intrinsic to do that).
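For reference, here is a minimal sketch of the same access written with GCC's generic vector extension, which is the same `vector_size` attribute the testcase uses. It copies the first half with `memcpy`, which is well-defined whatever one thinks of the aliasing status of the pointer cast. The helper name `low_half_v8sf` is hypothetical, and the example only shows the intended semantics, not the code generation the patch is about. On AVX targets, the intrinsic alluded to above is presumably `_mm256_castps256_ps128` from `<immintrin.h>`.

```c
#include <string.h>

/* GCC/Clang generic vector types, spelled like the v8sf/v4sf above.  */
typedef float v8sf __attribute__ ((vector_size (32)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical helper: return elements 0..3 of *x.  Generic vectors are
   stored like arrays, so the first 16 bytes of a v8sf hold elements
   0..3.  memcpy avoids relying on the *(v4sf *)&x cast.  Taking a
   pointer also avoids passing a 32-byte vector by value when AVX is
   not enabled.  */
static v4sf
low_half_v8sf (const v8sf *x)
{
  v4sf y;
  memcpy (&y, x, sizeof y);
  return y;
}
```

Ideally the compiler keeps `x` in a register here. In that case the whole function reduces to using the low 128 bits of the ymm register, which is the result the peephole below aims for with the pointer-cast form.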
If I understood Richard Guenther's comment in the PR correctly, this can be optimized in the back end. The only way I found to express this kind of transformation is a define_peephole2. I couldn't figure out how to test whether two memory operands refer to the same address when they have different modes, which makes match_dup unhappy. For some reason, comparing XEXP (*, 0) with rtx_equal_p said yes on my test and no when I used an unrelated piece of memory. However, it looks like a nonsensical test that just happens to work on a couple of trivial examples.
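To make the matching condition concrete, here is a toy model in plain C, not GCC code. In this model, a memory reference is a base register number plus a constant offset, together with an access size. The struct and function names are hypothetical.

The structural comparison mimics the spirit of `rtx_equal_p` on the two address expressions. It accepts only identical expressions, so it is conservative: two different expressions that denote the same location (for example, one relative to the stack pointer and one relative to the frame pointer) compare unequal.

The toy model ignores several things a real check would have to care about:
- auto-increment addresses;
- volatile accesses;
- whether the stored slot is read again later. The replacement pattern drops the store.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a memory reference.  */
struct toy_mem
{
  int base_regno; /* register used as the base address (made-up numbering) */
  long offset;    /* constant displacement in bytes */
  size_t size;    /* access size in bytes, i.e. the mode size */
};

/* Structural equality of the address expressions, in the spirit of
   rtx_equal_p (XEXP (mem1, 0), XEXP (mem2, 0)).  */
static bool
same_address_expr (struct toy_mem a, struct toy_mem b)
{
  return a.base_regno == b.base_regno && a.offset == b.offset;
}

/* True if a narrow load right after a wide store reads exactly the
   first half (elements 0 .. n/2-1) of the stored vector.  */
static bool
load_reads_first_half (struct toy_mem store, struct toy_mem load)
{
  return same_address_expr (store, load) && load.size * 2 == store.size;
}
```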
Any help?

2012-05-02  Marc Glisse  <marc.gli...@inria.fr>

	PR target/53101
gcc/
	* config/i386/sse.md: New peephole2 for subvectors.
gcc/testsuite/
	* gcc.target/i386/pr53101.c: New test.

-- 
Marc Glisse
Index: gcc/testsuite/gcc.target/i386/pr53101.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr53101.c	(revision 0)
+++ gcc/testsuite/gcc.target/i386/pr53101.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx" } */
+
+typedef double v2df __attribute__ ((vector_size (16)));
+typedef double v4df __attribute__ ((vector_size (32)));
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef int v8si __attribute__ ((vector_size (32)));
+
+v4si
+avx_extract_v4si (v8si x)
+{
+  return *(v4si*)&x;
+}
+
+v2df
+avx_extract_v2df (v4df x __attribute((unused)), v4df y)
+{
+  return *(v2df*)&y;
+}
+
+/* { dg-final { scan-assembler-not "movdq" } } */
+/* { dg-final { scan-assembler-times "movapd" 1 } } */

Property changes on: gcc/testsuite/gcc.target/i386/pr53101.c
___________________________________________________________________
Added: svn:keywords
   + Author Date Id Revision URL

Added: svn:eol-style
   + native

Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 187012)
+++ gcc/config/i386/sse.md	(working copy)
@@ -4104,10 +4104,34 @@
   emit_move_insn (operands[0], adjust_address (operands[1], SFmode, i*4));
   DONE;
 })
 
+;; This is how we receive accesses to the first half of a vector.
+(define_peephole2
+  [(set (match_operand:VI8F_256 3 "memory_operand")
+        (match_operand:VI8F_256 1 "register_operand"))
+   (set (match_operand:<ssehalfvecmode> 0 "register_operand")
+        (match_operand:<ssehalfvecmode> 2 "memory_operand"))]
+  "TARGET_AVX && rtx_equal_p (XEXP (operands[2], 0), XEXP (operands[3], 0))"
+  [(set (match_dup 0)
+        (vec_select:<ssehalfvecmode> (match_dup 1)
+          (parallel [(const_int 0) (const_int 1)])))]
+)
+
+(define_peephole2
+  [(set (match_operand:VI4F_256 3 "memory_operand")
+        (match_operand:VI4F_256 1 "register_operand"))
+   (set (match_operand:<ssehalfvecmode> 0 "register_operand")
+        (match_operand:<ssehalfvecmode> 2 "memory_operand"))]
+  "TARGET_AVX && rtx_equal_p (XEXP (operands[2], 0), XEXP (operands[3], 0))"
+  [(set (match_dup 0)
+        (vec_select:<ssehalfvecmode> (match_dup 1)
+          (parallel [(const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))]
+)
+
 (define_expand "avx_vextractf128<mode>"
   [(match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
    (match_operand:V_256 1 "register_operand")
    (match_operand:SI 2 "const_0_to_1_operand")]
   "TARGET_AVX"
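The two peephole2 patterns differ only in the lane count, and therefore in the `parallel` selector. This assumes the usual sse.md iterator definitions:
- `VI8F_256` covers the 4-lane modes (V4DI, V4DF), so it selects lanes 0 and 1.
- `VI4F_256` covers the 8-lane modes (V8SI, V8SF), so it selects lanes 0 to 3.

As a plain C model of the RTL semantics, not GCC code, `(vec_select (parallel [...]))` copies the listed lanes in order. Both replacements select lanes 0 .. n/2-1. The helper names are hypothetical, and the element type is fixed to `double` only for illustration.

```c
#include <stddef.h>

/* Toy model of (vec_select SRC (parallel [sel[0] sel[1] ...])):
   lane k of the result is lane sel[k] of SRC.  */
static void
toy_vec_select (const double *src, const int *sel, size_t nsel,
                double *dst)
{
  for (size_t k = 0; k < nsel; k++)
    dst[k] = src[sel[k]];
}

/* Fill SEL with the low-half selector 0 .. nlanes/2-1 used by both
   patterns; returns the number of selected lanes.  */
static size_t
low_half_selector (size_t nlanes, int *sel)
{
  size_t half = nlanes / 2;
  for (size_t k = 0; k < half; k++)
    sel[k] = (int) k;
  return half;
}
```

A single helper like this suggests that one pattern over all 256-bit modes could, in principle, compute its selector from the mode. Writing the two `parallel` lists out by hand, as the patch does, is the simpler option in a .md file.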