https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56118
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Marc Glisse from comment #4)
> #include <x86intrin.h>
> __m128d f(){
> __m128d r;
> r[0]=1;
> r[1]=2;
> return r;
> }
>
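To try the testcase without <x86intrin.h>, here is a minimal sketch that uses GCC's generic vector extension instead. The typedef v2df and the functions f_lanes and f_const are hypothetical names chosen for illustration. f_lanes mirrors the testcase lane by lane, and f_const is the single vector constant that SLP would ideally reduce it to.

```c
/* Generic GCC/Clang vector of two doubles; stands in for __m128d. */
typedef double v2df __attribute__((vector_size(16)));

/* Mirrors the testcase: build the vector one lane at a time. */
v2df f_lanes(void) {
  v2df r;
  r[0] = 1;
  r[1] = 2;
  return r;
}

/* The ideal result of SLP: one vector constant, no per-lane stores. */
v2df f_const(void) {
  return (v2df){1, 2};
}
```

Both functions return {1.0, 2.0}. The question in this PR is whether the vectorizer can be made to turn the first form into the second.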
> Currently, SLP vectorizes it with -fvect-cost-model=unlimited, but not by
> default because:
>
> Vector inside of basic block cost: 1
> Vector prologue cost: 1
> Vector epilogue cost: 0
> Scalar cost of basic block: 2
> r.c:4:9: note: not vectorized: vectorization is not profitable.
>
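The dump numbers can be checked by hand: the vector side costs 1 (inside) + 1 (prologue) + 0 (epilogue) = 2, which is the same as the scalar cost of 2. The helper below is a hypothetical model, not GCC's actual code. Because the dump rejects this tie, the sketch assumes vectorization requires the vector cost to be strictly lower. That is also consistent with -fvect-cost-model=unlimited vectorizing the block, since that setting bypasses the check.

```c
#include <stdbool.h>

/* Hypothetical model of the basic-block profitability test implied by the
   dump: vectorize only if the total vector cost is strictly below the
   scalar cost, so equal costs are rejected. */
static bool bb_vect_profitable(int inside_cost, int prologue_cost,
                               int epilogue_cost, int scalar_cost) {
  int vector_cost = inside_cost + prologue_cost + epilogue_cost;
  return vector_cost < scalar_cost;
}
```

With the dumped values, bb_vect_profitable(1, 1, 0, 2) is false. Dropping the prologue cost to 0 would make it true.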
> And if r is initialized to {3,4} as in the initial testcase, we don't
> vectorize either:
>
> r.c:3:17: note: not vectorized: no vectype for stmt: # .MEM_2 = VDEF <.MEM_1(D)>
> rD.15637 = { 3.0e+0, 4.0e+0 };
> scalar_type: __m128dD.4386
> r.c:3:17: note: not vectorized: not enough data-refs in basic block.
If we fix that (which is trivial), we run into
t.c:3:15: note: === vect_slp_analyze_data_ref_dependences ===
t.c:3:15: note: can't determine dependence between r and BIT_FIELD_REF <r, 64, 0>
because we end up with a write-write dependence that we cannot analyze. Of course,
in the end we do not need all dependences, only those relevant to the code motion
we are going to perform.
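Here is a simplified model of that last point. The struct access type and the functions overlaps and can_sink_group are hypothetical and much cruder than GCC's data-reference dependence analysis. Each store to r is described by a bit offset and size, mirroring BIT_FIELD_REF <r, 64, 0>. The model assumes the grouped element stores are combined into one vector store at the position of the last group member. Under that assumption, only accesses between the first and last member are moved across, so only those pairs need a dependence test.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a store to r: bit offset, bit size, and
   whether it belongs to the SLP group being combined. */
struct access {
  unsigned offset;
  unsigned size;
  bool in_group;
};

/* Two accesses conflict if their bit ranges intersect. */
static bool overlaps(struct access a, struct access b) {
  return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

/* Sinking every group member to the position of the last member only
   moves it past the statements in between, so only those pairs are
   checked; accesses before the first member are never crossed. */
static bool can_sink_group(const struct access *s, size_t n) {
  size_t first = n, last = 0;
  for (size_t i = 0; i < n; i++) {
    if (s[i].in_group) {
      if (first == n)
        first = i;
      last = i;
    }
  }
  if (first == n)
    return true; /* no group, nothing to move */
  for (size_t i = first; i < last; i++) {
    if (!s[i].in_group)
      continue;
    for (size_t j = i + 1; j <= last; j++)
      if (!s[j].in_group && overlaps(s[i], s[j]))
        return false; /* would reorder two overlapping writes */
  }
  return true;
}
```

For r = {3, 4}; r[0] = 1; r[1] = 2; the access list is {0, 128, false}, {0, 64, true}, {64, 64, true}. The whole-vector store overlaps the store to r[0], but it precedes the group. can_sink_group therefore returns true without ever needing to resolve that write-write dependence.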