Hi,

currently the vec_construct cost is simply TYPE_VECTOR_SUBPARTS / 2 + 1, which is a reasonable estimate only if other target stmt costs are close to 1 (the idea being that you need that many vector stmts).  Thus the following patch, which should fix skewed costs for bdver2, for example, which has a vec_stmt_cost of 6.
Fixing this gets important for a fix for PR62283, which will consider building vectors up from parts during basic-block vectorization and relies on the cost model to reject too expensive ones.  For example gcc.dg/vect/bb-slp-14.c will now be vectorized (with the generic cost model and just SSE2) as

Cost model analysis:
  Vector inside of basic block cost: 2
  Vector prologue cost: 7
  Vector epilogue cost: 0
  Scalar cost of basic block: 10

.LFB7:
	.cfi_startproc
	subq	$24, %rsp
	.cfi_def_cfa_offset 32
	movl	in+12(%rip), %eax
	testl	%edi, %edi
	movd	in+4(%rip), %xmm0
	movd	in(%rip), %xmm1
	movl	%eax, 12(%rsp)
	movd	in+4(%rip), %xmm4
	movd	12(%rsp), %xmm3
	movl	%edi, 12(%rsp)
	punpckldq	%xmm4, %xmm1
	punpckldq	%xmm3, %xmm0
	punpcklqdq	%xmm0, %xmm1
	movd	12(%rsp), %xmm0
	movl	%esi, 12(%rsp)
	movd	12(%rsp), %xmm5
	paddd	.LC2(%rip), %xmm1
	movdqa	%xmm1, %xmm2
	psrlq	$32, %xmm1
	punpckldq	%xmm5, %xmm0
	punpcklqdq	%xmm0, %xmm0
	pmuludq	%xmm0, %xmm2
	psrlq	$32, %xmm0
	pmuludq	%xmm1, %xmm0
	pshufd	$8, %xmm2, %xmm1
	pshufd	$8, %xmm0, %xmm0
	punpckldq	%xmm0, %xmm1
	movaps	%xmm1, out(%rip)
	je	.L12

vs. the scalar variant

.LFB7:
	.cfi_startproc
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
	movl	in(%rip), %edx
	movl	in+4(%rip), %eax
	movl	in+12(%rip), %ecx
	addl	$23, %edx
	imull	%edi, %edx
	leal	31(%rcx), %r8d
	movl	%edx, out(%rip)
	leal	142(%rax), %edx
	addl	$2, %eax
	imull	%edi, %eax
	imull	%esi, %edx
	movl	%eax, out+8(%rip)
	movl	%r8d, %eax
	imull	%esi, %eax
	testl	%edi, %edi
	movl	%edx, out+4(%rip)
	movl	%eax, out+12(%rip)
	je	.L12

Some excessive PRE across the conditional asm() keeps part of the scalar computes live (yes, the cost model accounts for that).  Previously we didn't vectorize the basic block because the loads from in[] could not be vectorized.  Now we will build up a vector from the scalar loads.
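For reference, a scalar kernel of roughly the shape SLP sees here, reconstructed from the assembly above purely for illustration (the actual gcc.dg/vect/bb-slp-14.c differs in details such as the surrounding condition and asm):

```c
/* Illustrative reconstruction of the computation in the dumps above;
   not the actual bb-slp-14.c source.  */
unsigned int in[4];
unsigned int out[4];

void
f (unsigned int x, unsigned int y)
{
  /* SLP groups these four stores; the loads in[0], in[1], in[1], in[3]
     cannot form a contiguous vector load, so the vectorizer now builds
     the vector operand from scalars instead of giving up.  */
  out[0] = (in[0] + 23) * x;
  out[1] = (in[1] + 142) * y;
  out[2] = (in[1] + 2) * x;
  out[3] = (in[3] + 31) * y;
}
```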
The vectorized code is generated from

<bb 2>:
  vect_cst_.19_43 = {x_10(D), y_13(D), x_10(D), y_13(D)};
  _3 = in[0];
  _5 = in[1];
  _8 = in[3];
  vect_cst_.16_47 = {_3, _5, _5, _8};
  vect_a0_4.15_42 = vect_cst_.16_47 + { 23, 142, 2, 31 };
  vect__11.18_44 = vect_a0_4.15_42 * vect_cst_.19_43;
  MEM[(unsigned int *)&out] = vect__11.18_44;

thus the code we generate for

  _3 = in[0];
  _5 = in[1];
  _8 = in[3];
  vect_cst_.16_47 = {_3, _5, _5, _8};

is quite bad.  It gets better for -mavx, but I wonder where we should try to optimize code generation for constructors... (we can vectorize the loads by enhancing load permutation support, of course - another vectorizer improvement I have some partial patches for).

Well, anyway - below is the "obvious" cost model patch.  Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Ok for trunk?

Thanks,
Richard.

2015-04-21  Richard Biener  <rguent...@suse.de>

	* config/i386/i386.c (ix86_builtin_vectorization_cost): Scale
	vec_construct cost by vec_stmt_cost.

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 222230)
+++ gcc/config/i386/i386.c	(working copy)
@@ -46731,7 +46731,7 @@ ix86_builtin_vectorization_cost (enum ve

       case vec_construct:
	elements = TYPE_VECTOR_SUBPARTS (vectype);
-	return elements / 2 + 1;
+	return ix86_cost->vec_stmt_cost * (elements / 2 + 1);

       default:
	gcc_unreachable ();