Hi!

I've noticed that my gather vectorization patch unfortunately regressed
code quality of the
gcc.target/i386/avx2-i64gatherd256-2.c
gcc.target/i386/avx2-i64gatherd256-3.c
gcc.target/i386/avx2-i64gatherd256-4.c
gcc.target/i386/avx2-i64gatherps256-3.c
gcc.target/i386/avx2-i64gatherps256-4.c
tests.

The problem is that, after the unification of gather auto-vectorization
and the gather intrinsics, nothing optimizes the new vec_select of the
first half of the gather pattern's result well.  Although the vec_select
is a nop, register allocation often chooses to allocate the gather
pattern result in a different vector register from the result of the
following extraction of its first half.

This patch fixes the regression by adding two patterns for the combiner.
On some of the above tests it saves two instructions, on the others one.
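For illustration only, here is a minimal sketch of the kind of source
involved; it is not taken from the testsuite.  _mm256_i64gather_epi32
takes four 64-bit indices in a 256-bit register but returns only four
32-bit elements in a 128-bit register, which is the shape the vec_select
of the first half describes.  The helper names (gather4_avx2,
gather4_scalar, gather4_selfcheck) are hypothetical, and the sketch
assumes an x86 target with GCC or Clang (target attribute and
__builtin_cpu_supports).

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = table[idx[i]] for four 64-bit indices.
   The gather result is a 128-bit vector even though the index vector
   is 256 bits wide.  */
static __attribute__((target("avx2"))) void
gather4_avx2 (const int *table, const int64_t idx[4], int out[4])
{
  __m256i vindex = _mm256_loadu_si256 ((const __m256i *) idx);
  /* Scale 4 == sizeof (int): indices count elements, not bytes.  */
  __m128i r = _mm256_i64gather_epi32 (table, vindex, 4);
  _mm_storeu_si128 ((__m128i *) out, r);
}

/* Scalar reference for the same operation.  */
static void
gather4_scalar (const int *table, const int64_t idx[4], int out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = table[idx[i]];
}

/* Returns 1 if the AVX2 result matches the scalar reference, or if the
   CPU has no AVX2 and the comparison is skipped; 0 on mismatch.  */
int
gather4_selfcheck (void)
{
  static const int table[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };
  static const int64_t idx[4] = { 7, 0, 3, 5 };
  int expect[4], got[4];

  gather4_scalar (table, idx, expect);
  if (!__builtin_cpu_supports ("avx2"))
    return 1;
  gather4_avx2 (table, idx, got);
  for (int i = 0; i < 4; i++)
    if (got[i] != expect[i])
      return 0;
  return 1;
}
```

Compiling gather4_avx2 with optimization and looking at the register
used for the gather destination versus the one that is stored is one way
to see whether an extra move is emitted.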
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2012-01-02  Jakub Jelinek  <ja...@redhat.com>

	* config/i386/sse.md (*avx2_gatherdi<mode>_3,
	*avx2_gatherdi<mode>_4): New patterns.

--- gcc/config/i386/sse.md.jj	2011-11-07 20:32:09.000000000 +0100
+++ gcc/config/i386/sse.md	2011-11-07 20:52:54.000000000 +0100
@@ -12652,3 +12652,49 @@ (define_insn "*avx2_gatherdi<mode>_2"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "*avx2_gatherdi<mode>_3"
+  [(set (match_operand:<VEC_GATHER_SRCDI> 0 "register_operand" "=&x")
+        (vec_select:<VEC_GATHER_SRCDI>
+          (unspec:VI4F_256
+            [(match_operand:<VEC_GATHER_SRCDI> 2 "register_operand" "0")
+             (match_operator:<ssescalarmode> 7 "vsib_mem_operator"
+               [(unspec:P
+                  [(match_operand:P 3 "vsib_address_operand" "p")
+                   (match_operand:<VEC_GATHER_IDXDI> 4 "register_operand" "x")
+                   (match_operand:SI 6 "const1248_operand" "n")]
+                  UNSPEC_VSIBADDR)])
+             (mem:BLK (scratch))
+             (match_operand:<VEC_GATHER_SRCDI> 5 "register_operand" "1")]
+            UNSPEC_GATHER)
+          (parallel [(const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))
+   (clobber (match_scratch:VI4F_256 1 "=&x"))]
+  "TARGET_AVX2"
+  "v<sseintprefix>gatherq<ssemodesuffix>\t{%5, %7, %0|%0, %7, %5}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "*avx2_gatherdi<mode>_4"
+  [(set (match_operand:<VEC_GATHER_SRCDI> 0 "register_operand" "=&x")
+        (vec_select:<VEC_GATHER_SRCDI>
+          (unspec:VI4F_256
+            [(pc)
+             (match_operator:<ssescalarmode> 6 "vsib_mem_operator"
+               [(unspec:P
+                  [(match_operand:P 2 "vsib_address_operand" "p")
+                   (match_operand:<VEC_GATHER_IDXDI> 3 "register_operand" "x")
+                   (match_operand:SI 5 "const1248_operand" "n")]
+                  UNSPEC_VSIBADDR)])
+             (mem:BLK (scratch))
+             (match_operand:<VEC_GATHER_SRCDI> 4 "register_operand" "1")]
+            UNSPEC_GATHER)
+          (parallel [(const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))
+   (clobber (match_scratch:VI4F_256 1 "=&x"))]
+  "TARGET_AVX2"
+  "v<sseintprefix>gatherq<ssemodesuffix>\t{%4, %6, %0|%0, %6, %4}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "<sseinsnmode>")])

	Jakub