https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78243
Bug ID: 78243
Summary: incorrect byte offset in vextractuh with -mcpu=power9
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: meissner at gcc dot gnu.org
Reporter: acsawdey at gcc dot gnu.org
CC: wschmidt at linux dot vnet.ibm.com
Target Milestone: ---
Target: powerpc64*-*-*
The test case is:
gcc.c-torture/execute/pr68532.c
#define SIZE 128
unsigned short _Alignas (16) in[SIZE];

__attribute__ ((noinline)) int
test (unsigned short sum, unsigned short *in, int x)
{
  for (int j = 0; j < SIZE; j += 8)
    sum += in[j] * x;
  return sum;
}
/home/sawdey/src/gcc/trunk2/build/gcc/xgcc
-B/home/sawdey/src/gcc/trunk2/build/gcc/
/home/sawdey/src/gcc/trunk2/trunk/gcc/testsuite/gcc.c-torture/execute/pr68532.c
-mcpu=power9 -Wl,-rpath=/tmp/lib64 -fno-diagnostics-show-caret
-fdiagnostics-color=never -O2 -O2 -ftree-vectorize -fno-vect-cost-model -lm -g
-S -o pr68532.s
asm output is:
vsplth 0,0,3
lxv 33,-16(4)
vperm 1,1,10,13
vperm 1,1,11,13
xxspltib 43,0
vperm 1,1,12,13
xxspltib 45,0
vmladduhm 0,0,1,13
vsum4shs 0,0,11
vsumsws 0,0,11
vextractuh 0,0,0
mfvsrd 9,32
add 3,3,9
The vsumsws instruction leaves its result in word 3 of v0, so the third operand
of vextractuh should be 14, selecting bytes 14 and 15, rather than 0.
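To make the offset arithmetic concrete, here is a minimal sketch in plain C that models only the byte numbering, not the instructions themselves. It assumes the ISA's big-endian numbering of vector bytes (byte 0 is the most significant byte of word 0). The names vreg_model, set_word and extract_uh are hypothetical helpers invented for this illustration, and the model ignores where the extracted halfword lands in the target register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register. Bytes are numbered
   0..15 in big-endian order, so word w occupies bytes 4*w .. 4*w+3. */
typedef struct { uint8_t b[16]; } vreg_model;

/* Store a 32-bit value into word w (0..3), most significant byte first. */
static void set_word(vreg_model *v, int w, uint32_t val)
{
    assert(w >= 0 && w <= 3);
    for (int i = 0; i < 4; i++)
        v->b[4 * w + i] = (uint8_t)(val >> (8 * (3 - i)));
}

/* Model of a halfword extraction at byte offset uim:
   reads bytes uim and uim+1 and returns them as one 16-bit value. */
static uint16_t extract_uh(const vreg_model *v, int uim)
{
    assert(uim >= 0 && uim <= 14);
    return (uint16_t)((v->b[uim] << 8) | v->b[uim + 1]);
}
```

In this model, if word 3 holds 0x12345678 and words 0 to 2 are zero, extract_uh(&v, 14) returns 0x5678 (the low halfword of the sum), while extract_uh(&v, 0) returns 0.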
Another question about this code is why we generate all this complex vector
code but use it for only the first half of the computation. The second half
uses this simple scalar loop:
.L2:
lhz 9,0(4)
addi 4,4,16
mullw 9,9,5
add 3,9,3
.LVL4:
rlwinm 3,3,0,0xffff
.LVL5:
.loc 1 10 0
bdnz .L2
It seems like we could just run the vector code for 2 vector lengths and not
need this cleanup loop at all.
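The "2 vector lengths" figure can be checked with a little arithmetic. The sketch below assumes a 128-bit vector holds 8 unsigned short lanes; the constant and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

enum {
    N_ELEMS = 128, /* SIZE in the test case */
    STRIDE  = 8,   /* j += 8 in the loop */
    LANES   = 8    /* assumed: 16-byte vector / 2-byte unsigned short */
};

/* Number of times the source loop body runs. */
static int loop_iterations(void)
{
    assert(N_ELEMS % STRIDE == 0);
    return N_ELEMS / STRIDE;
}

/* Full vectors needed to cover every loop iteration. */
static int full_vectors(void)
{
    return loop_iterations() / LANES;
}

/* Iterations left over for a scalar cleanup loop. */
static int scalar_remainder(void)
{
    return loop_iterations() % LANES;
}
```

Under these assumptions the loop runs 16 times, which is exactly 2 full vectors with no remainder.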