https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69868
Bug ID: 69868
Summary: vec_perm built-in is not handled by swap optimization on powerpc64le
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: wschmidt at gcc dot gnu.org
Reporter: wschmidt at gcc dot gnu.org
CC: dje.gcc at gmail dot com, uweigand at gcc dot gnu.org
Target Milestone: ---
Target: powerpc64le-unknown-linux-gnu
The following test case, compiled with --std=c++11 -maltivec -O3 on
powerpc64le-unknown-linux-gnu, produces assembly code from which the endian
swaps have not been removed. The problem is that swap optimization is not
smart enough to recognize the patterns produced by the vec_perm built-in.
Although we do recognize a vperm instruction whose permute control vector (PCV)
is loaded from the constant pool, the vec_perm built-in produces a sequence in
which the PCV is loaded and then complemented, which requires more work to
handle correctly.
This provides an opportunity for further performance improvement, since swap
optimization should be able to perform the complement at compile time, swap the
results, and create this as a new constant to be loaded in the generated code.
This is something we've wanted to do anyway, and doing it in the context of
swap optimization will catch these opportunities immediately after expand.
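As a rough illustration of the constant folding described above, the sketch
below complements a PCV and swaps its doublewords on the host, yielding the
bytes that could be emitted as a new constant. It is not the GCC
implementation: the names Pcv, complement_pcv, swap_doublewords and fold_pcv
are hypothetical, and it assumes that only the low five bits of each selector
byte matter to vperm and that "swap" means exchanging the two 8-byte halves.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side model of a 16-byte permute control vector.
using Pcv = std::array<std::uint8_t, 16>;

// Complement each selector byte. Assumption: vperm reads only the low
// five bits of each byte, so the result is masked to that range,
// which makes (~x) & 31 equal to 31 - x.
Pcv complement_pcv(const Pcv &pcv)
{
  Pcv out{};
  for (std::size_t i = 0; i < pcv.size(); ++i)
    {
      out[i] = static_cast<std::uint8_t>(~pcv[i] & 0x1f);
      assert(out[i] < 32);
    }
  return out;
}

// Exchange the two 8-byte halves of the vector (a doubleword swap).
Pcv swap_doublewords(const Pcv &v)
{
  Pcv out{};
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = v[(i + 8) % v.size()];
  return out;
}

// Fold both steps into one constant: complement first, then swap.
Pcv fold_pcv(const Pcv &pcv)
{
  return swap_doublewords(complement_pcv(pcv));
}
```

For the identity control vector used in the test case (0 through 15), this
sketch yields 23 down to 16 followed by 31 down to 24, under the assumptions
stated above.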
Opening this against myself as a reminder to fix this during the next stage 1.
Test case:
#include <cstdlib>
#include <altivec.h>
using VecUC = vector unsigned char;
using VecUI = vector unsigned int;
void bar(VecUC *vpInput, VecUI *vpOut)
{
  VecUI v1 = {0,};
  VecUI vMask = { 0xffffff, 0xffffff,0xffffff,0xffffff};
  VecUI vShift = { 0xfffff, 0xffffff,0xffffff,0xffffff};
  VecUC vPermControl = { 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 };
#define FOO(a,b,c) v1 = (VecUI)vec_perm(vpInput[a], vpInput[b], vPermControl); \
  v1 = vec_sr(v1, vShift); \
  v1 = vec_and(v1, vMask); \
  vpOut[c] = v1;
  FOO(0,0,0);
  FOO(0,0,1);
  FOO(0,1,2);
  FOO(0,1,3);
  FOO(1,0,4);
  FOO(1,0,5);
}