https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69868
Bug ID: 69868
Summary: vec_perm built-in is not handled by swap optimization on powerpc64le
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: wschmidt at gcc dot gnu.org
Reporter: wschmidt at gcc dot gnu.org
CC: dje.gcc at gmail dot com, uweigand at gcc dot gnu.org
Target Milestone: ---
Target: powerpc64le-unknown-linux-gnu

The following test case, compiled with --std=c++11 -maltivec -O3 on
powerpc64le-unknown-linux-gnu, produces assembly code from which the endian
swaps have not been removed. The problem is that swap optimization is not smart
enough to recognize the patterns produced by the vec_perm built-in. Although we
do recognize a vperm instruction whose permute control vector (PCV) is loaded
from the constant pool, the vec_perm built-in produces a sequence in which the
PCV is loaded and then complemented, which requires more work to get right.

This provides an opportunity for further performance improvement, since swap
optimization should be able to perform the complement at compile time, swap the
results, and create this as a new constant to be loaded in the generated code.
This is something we've wanted to do anyway, and doing it in the context of
swap optimization will catch these opportunities immediately after expand.

Opening this against myself as a reminder to fix this during next stage 1.

Test case:

#include <cstdlib>
#include <altivec.h>

using VecUC = vector unsigned char;
using VecUI = vector unsigned int;

void bar(VecUC *vpInput, VecUI *vpOut)
{
  VecUI v1 = {0,};
  VecUI vMask = { 0xffffff, 0xffffff,0xffffff,0xffffff};
  VecUI vShift = { 0xfffff, 0xffffff,0xffffff,0xffffff};
  VecUC vPermControl = { 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 };

#define FOO(a,b,c) v1 = (VecUI)vec_perm(vpInput[a], vpInput[b], vPermControl); \
                   v1 = vec_sr(v1, vShift); \
                   v1 = vec_and(v1, vMask); \
                   vpOut[c] = v1;

  FOO(0,0,0);
  FOO(0,0,1);
  FOO(0,1,2);
  FOO(0,1,3);
  FOO(1,0,4);
  FOO(1,0,5);
}
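
For illustration only, the compile-time folding proposed in the description (complement the PCV, swap the result, emit it as a new constant) might look roughly like the host-side sketch below. This is not GCC code. The names Pcv, complement_pcv, swap_doublewords and fold_pcv are hypothetical, and the sketch assumes that "complement" means a bytewise NOT and that "swap" means exchanging the two 8-byte halves of the vector. Any further adjustment a real implementation would need (for example to the vperm operand order or to individual selector values) is deliberately left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a permute control vector (PCV): 16 byte selectors.
using Pcv = std::array<std::uint8_t, 16>;

// Bytewise complement, standing in for the run-time complement that
// follows the PCV load in the sequence produced for vec_perm.
Pcv complement_pcv(const Pcv &pcv) {
    Pcv out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<std::uint8_t>(~pcv[i]);
    return out;
}

// Exchange the two 8-byte halves (assumed meaning of "swap" here).
Pcv swap_doublewords(const Pcv &pcv) {
    Pcv out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = pcv[(i + 8) % out.size()];
    return out;
}

// Fold both steps into one constant that could be placed in the
// constant pool instead of being computed at run time.
Pcv fold_pcv(const Pcv &pcv) {
    return swap_doublewords(complement_pcv(pcv));
}

// Small self-check using the identity PCV from the test case.
void self_check() {
    Pcv identity{};
    for (std::size_t i = 0; i < identity.size(); ++i)
        identity[i] = static_cast<std::uint8_t>(i);

    const Pcv folded = fold_pcv(identity);

    // vperm reads only the low 5 bits of each selector byte.
    assert((folded[0] & 31) == 23);  // ~8 = 0xF7, low 5 bits = 23
    assert((folded[8] & 31) == 31);  // ~0 = 0xFF, low 5 bits = 31
}
```

Keeping the two steps as separate functions mirrors the wording of the description; a pass doing this for real would more likely compute the final constant in a single loop.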