> 
> Hi,
> 
> Currently, with -ftree-vectorize, we generate for
> 
>   for (i=0; i<3; ++i)
>   # SFT.4346_507 = VDEF <SFT.4346_504(D)>
>   # SFT.4347_508 = VDEF <SFT.4347_505(D)>
>   # SFT.4348_509 = VDEF <SFT.4348_506(D)>
>     d[i] = 0.0;

Also, Tomas' patch is supposed to catch this special case and convert it
into a memset, which should subsequently be optimized into an assignment;
that should be good enough (which reminds me that I forgot to merge the
memset part of the stringop optimizations).

Perhaps this can be made a bit more generic by constructing INIT_EXPRs for
small arrays directly from Tomas' pass (going from memset to assignment
works only in special cases).  Tomas, what is the status of your patch?
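A minimal C sketch of the chain of forms described above, assuming a hypothetical three-element type `vec3` standing in for `d` (the names `vec3`, `zero_loop`, `zero_memset` and `zero_assign` are illustrative and come neither from GCC nor from Tomas' patch). It shows the zeroing loop, the memset it could be recognized as, and a single aggregate assignment, which is roughly what an INIT_EXPR for the whole object expresses at the source level.

```c
#include <string.h>

/* Hypothetical small vector type standing in for `d` in the example. */
typedef struct { double values[3]; } vec3;

/* Form 1: the original zeroing loop. */
static void zero_loop(vec3 *d)
{
    for (int i = 0; i < 3; ++i)
        d->values[i] = 0.0;
}

/* Form 2: the memset the loop could be converted to.
   Relies on all-bits-zero being +0.0, which holds for IEEE 754 doubles. */
static void zero_memset(vec3 *d)
{
    memset(d->values, 0, sizeof d->values);
}

/* Form 3: one aggregate assignment of a zeroed object. */
static void zero_assign(vec3 *d)
{
    *d = (vec3){ { 0.0, 0.0, 0.0 } };
}
```

The sketch only shows that the three forms are equivalent at the source level; whether the compiler actually gets from the memset to a single assignment depends on the stringop work mentioned above.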
> 
>   for (j=0; j<n; ++j)
>     x[j] = d;
> 
> (that is, zero a small vector and use that to initialize an array
> of vectors)
> 
> <L266>:;
>   vect_cst_.4501_723 = { 0.0, 0.0 };
>   vect_p.4506_724 = (vector double *) &D.76822;
>   vect_p.4502_725 = vect_p.4506_724;
> 
>   # ivtmp.4508_728 = PHI <0(6), ivtmp.4508_729(11)>
>   # ivtmp.4507_726 = PHI <vect_p.4502_725(6), ivtmp.4507_727(11)>
>   # ivtmp.4461_601 = PHI <3(6), ivtmp.4461_485(11)>
>   # SFT.4348_612 = PHI <SFT.4348_506(D)(6), SFT.4348_509(11)>
>   # SFT.4347_611 = PHI <SFT.4347_505(D)(6), SFT.4347_508(11)>
>   # SFT.4346_610 = PHI <SFT.4346_504(D)(6), SFT.4346_507(11)>
>   # i_582 = PHI <0(6), i_118(11)>
> <L131>:;
>   # SFT.4346_507 = VDEF <SFT.4346_610>
>   # SFT.4347_508 = VDEF <SFT.4347_611>
>   # SFT.4348_509 = VDEF <SFT.4348_612>
>   *ivtmp.4507_726 = vect_cst_.4501_723;
>   i_118 = i_582 + 1;
>   ivtmp.4461_485 = ivtmp.4461_601 - 1;
>   ivtmp.4507_727 = ivtmp.4507_726 + 16B;
>   ivtmp.4508_729 = ivtmp.4508_728 + 1;
>   if (ivtmp.4508_729 < 1) goto <L171>; else goto <L263>;
> 
>   # i_722 = PHI <i_118(7)>
>   # ivtmp.4461_717 = PHI <ivtmp.4461_485(7)>
> <L263>:;
> 
>   # ivtmp.4461_706 = PHI <ivtmp.4461_715(10), 1(8)>
>   # SFT.4348_707 = PHI <SFT.4348_713(10), SFT.4348_509(8)>
>   # SFT.4347_708 = PHI <SFT.4347_712(10), SFT.4347_508(8)>
>   # SFT.4346_709 = PHI <SFT.4346_711(10), SFT.4346_507(8)>
>   # i_710 = PHI <i_714(10), 2(8)>
> <L260>:;
>   # SFT.4346_711 = VDEF <SFT.4346_709>
>   # SFT.4347_712 = VDEF <SFT.4347_708>
>   # SFT.4348_713 = VDEF <SFT.4348_707>
>   D.76822.D.44378.values[i_710] = 0.0;
>   i_714 = i_710 + 1;
>   ivtmp.4461_715 = ivtmp.4461_706 - 1;
>   if (ivtmp.4461_715 != 0) goto <L259>; else goto <L264>;
> 
> ...
> 
> and later we are not able to do constant propagation into the
> second loop, which we can do if we first unroll such small loops.
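A rough C rendering of the dump above may help readers who do not read GIMPLE. It assumes a hypothetical type `vec3` (a guess at the shape of `D.76822`, whose `values` member appears in the dump) and models the 16-byte vector store with a `memcpy` of two doubles instead of a real vector type; the function names are illustrative. `zero_vectorized` mirrors the dump: the vector loop runs exactly once and stores `{ 0.0, 0.0 }` into elements 0 and 1, then a scalar epilogue starting at i = 2 zeroes the last element. `zero_unrolled` is what complete unrolling would produce: three plain constant stores, the form in which each element of `d` is a known constant that can be propagated into the `x[j] = d` copies in `fill`.

```c
#include <string.h>

/* Hypothetical stand-in for the object zeroed in the dump. */
typedef struct { double values[3]; } vec3;

/* Roughly what the vectorized code in the dump does. */
void zero_vectorized(vec3 *d)
{
    /* Corresponds to vect_cst_.4501_723 = { 0.0, 0.0 }. */
    const double vect_cst[2] = { 0.0, 0.0 };

    /* Vector part: the loop runs once, one 16-byte store covering
       values[0] and values[1]. */
    memcpy(&d->values[0], vect_cst, sizeof vect_cst);

    /* Scalar epilogue: starts at i = 2 and handles the remaining element. */
    for (int i = 2; i < 3; ++i)
        d->values[i] = 0.0;
}

/* After complete unrolling: three independent constant stores. */
void zero_unrolled(vec3 *d)
{
    d->values[0] = 0.0;
    d->values[1] = 0.0;
    d->values[2] = 0.0;
}

/* The consumer loop from the example: copy d into each x[j]. */
void fill(vec3 *x, int n, const vec3 *d)
{
    for (int j = 0; j < n; ++j)
        x[j] = *d;
}
```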
> 
> As we also only vectorize innermost loops, I believe doing a
> complete unrolling pass early will help in general (I pushed
> for this some time ago).
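The innermost-loop argument can be illustrated with a small C sketch, assuming a hypothetical `vec3` type and illustrative function names. Before unrolling, the innermost loop is the 3-iteration loop over `i`, so that is the loop the vectorizer looks at; after completely unrolling it, the `j` loop over `n` elements becomes innermost. Whether GCC then actually vectorizes the `j` loop depends on the target, the cost model and its support for the strided accesses; the sketch only shows the change in loop structure.

```c
#include <stddef.h>

/* Hypothetical small vector type, as in the example above. */
typedef struct { double values[3]; } vec3;

/* Before: the innermost loop is the short loop over i. */
void scale_nested(vec3 *x, const vec3 *y, double s, size_t n)
{
    for (size_t j = 0; j < n; ++j)
        for (int i = 0; i < 3; ++i)
            x[j].values[i] = s * y[j].values[i];
}

/* After complete unrolling of the i loop: the j loop is now innermost. */
void scale_unrolled(vec3 *x, const vec3 *y, double s, size_t n)
{
    for (size_t j = 0; j < n; ++j) {
        x[j].values[0] = s * y[j].values[0];
        x[j].values[1] = s * y[j].values[1];
        x[j].values[2] = s * y[j].values[2];
    }
}
```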

Did you run any benchmarks?

Honza
> 
> Thoughts?
> 
> Thanks,
> Richard.
> 
> -- 
> Richard Guenther <[EMAIL PROTECTED]>
> Novell / SUSE Labs
