[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref

amker at gcc dot gnu.org Thu, 12 May 2016 08:04:55 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848


--- Comment #8 from amker at gcc dot gnu.org ---
(In reply to amker from comment #7)
> (In reply to Jim Wilson from comment #6)
> > Testing the vcond_mask* patch with make check gave 6 regressions for both
> > armhf and aarch64.
> > 
> > FAIL: gcc.dg/vect/pr65947-10.c (internal compiler error)
> > FAIL: gcc.dg/vect/pr65947-10.c (test for excess errors)
> > FAIL: gcc.dg/vect/pr65947-10.c scan-tree-dump-times vect "LOOP VECTORIZED" 2
> > FAIL: gcc.dg/vect/pr65947-10.c -flto -ffat-lto-objects (internal compiler
> > error)
> > FAIL: gcc.dg/vect/pr65947-10.c -flto -ffat-lto-objects (test for excess
> > errors)
> > FAIL: gcc.dg/vect/pr65947-10.c -flto -ffat-lto-objects  scan-tree-dump-times
> > vect "LOOP VECTORIZED" 2
> > 
> > The problem here looks like a flaw in the vcond* patterns.  They support int
> > and fp compare operands, but only int selection operands.  E.g. for 
> >   (A op B ? X : Y)
> > A and B can be either int or fp, but X and Y can only be int.  Adding the
> > vcond_mask* patterns apparently causes gcc to call vcond* in ways it didn't
> > before, and that exposes the problem.
> > 
> > The x86 port is the only port with vcond and vcond_mask patterns, and it
> > supports all four combinations of int/fp compare/select operands, so it
> > appears that aarch64 should also.
> > 
> > I will need time to figure out how to fix the vcond* problems before I can
> > formally submit the vcond_mask* patch.
> 
> Hi Jim,
> We have a patch which supports all vcond/vcondu patterns (AArch64 yet)
> including missing ones.  The patch also introduces vec_cmp&vcond_mask
> because it re-implements vcond/vcondu using these two patterns.  It will be
> ready for review shortly, but this issue itself needs vectorizer fix I think.

Hmm, supporting vcond_mask can save one cmlt instruction, because that cmlt
is introduced in expand_vec_cond_expr when the input op0 is not a comparison.

Propagating _20 into both VEC_COND_EXPRs below can save the "not" and "cmlt"
instructions:

  _20 = vect__1.6_8 == { 0, 0, 0, 0 };
  vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
  _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;
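
For illustration only, here is a scalar C model of the three statements
above. It assumes each mask lane is either all-ones or zero, and the function
names (cmp_eq_zero, select_by_mask, both_selects) are hypothetical, not GCC
code. It only shows the semantics: one comparison result can feed both selects
directly, with no second compare or inversion of the mask.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4

/* Models "_20 = vect__1.6_8 == { 0, 0, 0, 0 }": each mask lane becomes
   all-ones (-1) where the input lane is zero, and 0 otherwise. */
static void cmp_eq_zero(const int32_t x[LANES], int32_t mask[LANES])
{
    for (int i = 0; i < LANES; i++)
        mask[i] = (x[i] == 0) ? -1 : 0;
}

/* Models VEC_COND_EXPR <mask, a, b> as a lane-wise bitwise select.
   Assumption: every mask lane is all-ones or zero. */
static void select_by_mask(const int32_t mask[LANES],
                           const int32_t a[LANES],
                           const int32_t b[LANES],
                           int32_t out[LANES])
{
    for (int i = 0; i < LANES; i++) {
        assert(mask[i] == 0 || mask[i] == -1);
        out[i] = (a[i] & mask[i]) | (b[i] & ~mask[i]);
    }
}

/* Hypothetical wrapper mirroring the dump: the mask is computed once
   and reused unchanged by both selects. */
static void both_selects(const int32_t x[LANES],
                         const int32_t c_prev[LANES],   /* vect_c_2.7_13 */
                         const int32_t iv[LANES],       /* ivtmp_17 */
                         const int32_t idx_prev[LANES], /* _19 */
                         int32_t c_out[LANES],          /* vect_c_2.8_16 */
                         int32_t idx_out[LANES])        /* _21 */
{
    const int32_t zero[LANES] = { 0, 0, 0, 0 };
    int32_t mask[LANES];                                /* _20 */

    cmp_eq_zero(x, mask);
    select_by_mask(mask, zero, c_prev, c_out);
    select_by_mask(mask, iv, idx_prev, idx_out);
}
```

Calling both_selects with x = { 0, 5, 0, -3 } picks the first operand in lanes
0 and 2 and the second operand in lanes 1 and 3 for both outputs.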
