https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101097

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #2)
> (In reply to Richard Biener from comment #1)
> > Hmm, so the difference is that we use loop vect for 'foo' but fail to do
> > that for 'bar' and BB vect succeeds.  Disabling loop vect but enabling BB
> > vect also produces optimal code for 'foo' (unrolling happens before):
> > 
> > foo:
> > .LFB0:
> >         .cfi_startproc
> >         vpmovzxwd       (%rsi), %ymm0
> >         vpmovzxwd       (%rdi), %ymm1
> >         vpaddd  %ymm1, %ymm0, %ymm0
> >         vmovdqu %ymm0, (%rdx)
> >         vzeroupper
> > 
> > the key difference in the vectorizer is that BB vect supports different
> > vector sizes in the same instance but the loop vectorizer can only use
> > a single vector size.
> Is there any plan for extending loop vectorizer to handle different vector
> sizes?

It's not an easy task - we commit to vector types per stmt and quite early
(vect_determine_vectorization_factor).  The same is in principle true for
BB vect, but there we know the vectorization factor beforehand (it's 1 - we
can't unroll a BB) and thus can tweak the vector size instead of failing.
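
To make the arithmetic concrete, here is a rough sketch of the two models.
It is not GCC code: the function names are made up for illustration, and
"VF is the largest lane count" is a simplification of what the loop
vectorizer actually computes.

```c
#include <assert.h>

/* Lanes in one vector of `vector_bits` bits holding `elem_bits`-bit
   elements.  Illustrative helper, not a GCC function. */
unsigned lanes(unsigned vector_bits, unsigned elem_bits)
{
    assert(elem_bits != 0 && vector_bits % elem_bits == 0);
    return vector_bits / elem_bits;
}

/* Simplified loop-vect model (assumption): one vector size for every
   stmt, so the vectorization factor is the largest lane count among
   the element widths involved. */
unsigned vf_single_size(unsigned vector_bits,
                        unsigned narrow_bits, unsigned wide_bits)
{
    unsigned n = lanes(vector_bits, narrow_bits);
    unsigned w = lanes(vector_bits, wide_bits);
    return n > w ? n : w;
}

/* Simplified BB-vect model (assumption): the number of scalar stmts in
   the group is fixed, so the vector size of each stmt follows from its
   element width. */
unsigned vector_bits_for_group(unsigned group_size, unsigned elem_bits)
{
    return group_size * elem_bits;
}
```

For a 16-bit to 32-bit widening, vf_single_size(256, 16, 32) is 16, i.e.
twice the eight elements handled by the quoted assembly.  With a fixed
group of eight, vector_bits_for_group gives a 128-bit input and a 256-bit
output, which is the shape of vpmovzxwd with a memory source and a %ymm
destination.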

What would need to be done is to determine the output vector type in
vectorizable_conversion based on the input vector types.  But that would
need another phase of vectorizable_* calls, since the final vectorization
factor would not be set yet at that point.  The whole thing is related
to vector size iteration, where the idea would be to somehow compute for
each stmt a set of input & output vector types that the target supports
and then select the sets that we want to send to costing.
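
As a rough illustration of "a set of input & output vector types" for a
single conversion stmt, the sketch below enumerates the size pairs with
matching lane counts.  Everything in it is hypothetical: the struct, the
function names and the plain list of supported sizes are stand-ins, not
vectorizer data structures, and it ignores conversions that use several
vectors on one side.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one way to vectorize a conversion. */
struct conv_candidate {
    unsigned lanes;     /* elements per vector */
    unsigned in_bits;   /* input vector size */
    unsigned out_bits;  /* output vector size */
};

/* Nonzero if `bits` appears in the list of supported vector sizes. */
int size_supported(const unsigned *sizes, size_t n_sizes, unsigned bits)
{
    for (size_t i = 0; i < n_sizes; i++)
        if (sizes[i] == bits)
            return 1;
    return 0;
}

/* Collect every (input, output) size pair with equal lane counts where
   both sizes are supported.  Returns the number of entries written. */
size_t conv_candidates(const unsigned *sizes, size_t n_sizes,
                       unsigned in_elem_bits, unsigned out_elem_bits,
                       struct conv_candidate *out, size_t max_out)
{
    size_t count = 0;

    assert(out != NULL);
    if (in_elem_bits == 0)
        return 0;

    for (size_t i = 0; i < n_sizes && count < max_out; i++) {
        if (sizes[i] % in_elem_bits != 0)
            continue;
        unsigned lanes = sizes[i] / in_elem_bits;
        unsigned out_bits = lanes * out_elem_bits;
        if (size_supported(sizes, n_sizes, out_bits)) {
            out[count].lanes = lanes;
            out[count].in_bits = sizes[i];
            out[count].out_bits = out_bits;
            count++;
        }
    }
    return count;
}
```

With supported sizes {128, 256} and a 16-bit to 32-bit conversion this
yields a single candidate: 8 lanes, 128-bit input, 256-bit output.  A
selection step would then pick, across all stmts, candidates whose lane
counts agree before handing them to costing.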

As said - a lot of work, something that might be easier once we get rid of
the SLP vs. non-SLP duality.
