https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69868

            Bug ID: 69868
           Summary: vec_perm built-in is not handled by swap optimization
                    on powerpc64le
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: wschmidt at gcc dot gnu.org
          Reporter: wschmidt at gcc dot gnu.org
                CC: dje.gcc at gmail dot com, uweigand at gcc dot gnu.org
  Target Milestone: ---
            Target: powerpc64le-unknown-linux-gnu

The following test case, compiled with --std=c++11 -maltivec -O3 on
powerpc64le-unknown-linux-gnu, produces assembly code from which the endian
swaps have not been removed.  The problem is that swap optimization is not
smart enough to recognize the patterns produced by the vec_perm built-in. 
Although we do recognize a vperm instruction whose permute control vector is
loaded from the constant pool, the vec_perm built-in produces a sequence in
which the PCV is loaded and then complemented, which requires more work to get
right.

This provides an opportunity for further performance improvement, since swap
optimization should be able to perform the complement at compile time, swap the
results, and create this as a new constant to be loaded in the generated code. 
This is something we've wanted to do anyway, and doing it in the context of
swap optimization will catch these opportunities immediately after expand.
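
As a rough illustration of the compile-time folding described above, the
sketch below models a permute control vector as 16 plain bytes on the host,
complements each byte, and then exchanges the two doublewords. It is not
GCC's implementation: the type Pcv and the helpers complement_pcv,
swap_doublewords, and fold_pcv are hypothetical names, and the exact
adjustment the pass would need (for example, any further index fix-ups) is an
assumption, not something taken from the GCC sources.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// A 16-byte permute control vector (PCV), modeled as plain bytes.
// Hypothetical type; the real pass works on RTL constants.
using Pcv = std::array<std::uint8_t, 16>;

// Step 1: complement every byte, mirroring the complement that the
// expanded vec_perm sequence currently performs at run time.
Pcv complement_pcv(const Pcv &pcv)
{
  Pcv out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = static_cast<std::uint8_t>(~pcv[i]);
  return out;
}

// Step 2: exchange the two 8-byte halves (a doubleword swap).
Pcv swap_doublewords(const Pcv &pcv)
{
  Pcv out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = pcv[(i + 8) % out.size()];
  return out;
}

// Assumed combination: the bytes that would be placed in the constant
// pool so that neither the complement nor the swap is needed at run time.
Pcv fold_pcv(const Pcv &pcv)
{
  return swap_doublewords(complement_pcv(pcv));
}
```

Under these assumptions, calling fold_pcv on the identity control vector used
in the test case below (0 through 15) yields the bytes 247 down to 240
followed by 255 down to 248.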

Opening this against myself as a reminder to fix this during next stage 1.

Test case:

#include <cstdlib>
#include <altivec.h>

using VecUC = vector unsigned char;
using VecUI = vector unsigned int;

void bar(VecUC *vpInput, VecUI *vpOut)
{
  VecUI v1 = {0,};
  VecUI vMask = { 0xffffff, 0xffffff,0xffffff,0xffffff};
  VecUI vShift = { 0xfffff, 0xffffff,0xffffff,0xffffff};
  VecUC vPermControl = { 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 };

#define FOO(a,b,c)  v1 = (VecUI)vec_perm(vpInput[a], vpInput[b], vPermControl); \
  v1 = vec_sr(v1, vShift);                     \
  v1 = vec_and(v1, vMask);                          \
  vpOut[c] = v1;

  FOO(0,0,0);
  FOO(0,0,1);
  FOO(0,1,2);
  FOO(0,1,3);
  FOO(1,0,4);
  FOO(1,0,5);
}
