https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78243
Bug ID: 78243
Summary: incorrect byte offset in vextractuh with -mcpu=power9
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: meissner at gcc dot gnu.org
Reporter: acsawdey at gcc dot gnu.org
CC: wschmidt at linux dot vnet.ibm.com
Target Milestone: ---
Target: powerpc64*-*-*

The test case is: gcc.c-torture/execute/pr68532.c

    #define SIZE 128
    unsigned short _Alignas (16) in[SIZE];

    __attribute__ ((noinline)) int
    test (unsigned short sum, unsigned short *in, int x)
    {
      for (int j = 0; j < SIZE; j += 8)
        sum += in[j] * x;
      return sum;
    }

Command line:

    /home/sawdey/src/gcc/trunk2/build/gcc/xgcc -B/home/sawdey/src/gcc/trunk2/build/gcc/ /home/sawdey/src/gcc/trunk2/trunk/gcc/testsuite/gcc.c-torture/execute/pr68532.c -mcpu=power9 -Wl,-rpath=/tmp/lib64 -fno-diagnostics-show-caret -fdiagnostics-color=never -O2 -O2 -ftree-vectorize -fno-vect-cost-model -lm -g -S -o pr68532.s

asm output is:

        vsplth 0,0,3
        lxv 33,-16(4)
        vperm 1,1,10,13
        vperm 1,1,11,13
        xxspltib 43,0
        vperm 1,1,12,13
        xxspltib 45,0
        vmladduhm 0,0,1,13
        vsum4shs 0,0,11
        vsumsws 0,0,11
        vextractuh 0,0,0
        mfvsrd 9,32
        add 3,3,9

The vsumsws produces a result in word 3 of v0, so the vextractuh 3rd
argument should be 14 to extract bytes 14,15 in order to produce the
correct result.

Another question about this code would be why we produce all this
complex vector code but only use it for the first half of the
computation; the latter half uses this simple code:

    .L2:
        lhz 9,0(4)
        addi 4,4,16
        mullw 9,9,5
        add 3,9,3
    .LVL4:
        rlwinm 3,3,0,0xffff
    .LVL5:
        .loc 1 10 0
        bdnz .L2

It seems like we could just do this for 2 vector lengths and not need
this cleanup code.
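
To make the offset arithmetic in the vextractuh observation above concrete, here is a minimal C sketch of the byte numbering involved. It is only a simplified model, not the compiler's or the hardware's implementation. The names vreg, put_word3 and extract_uh are hypothetical helpers introduced for illustration. The model assumes the big-endian element numbering the instruction operands use, where byte 0 is the most significant byte of the register and word 3 occupies bytes 12..15.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register.  Bytes are numbered
   in big-endian order: b[0] is the most significant byte.  */
typedef struct { uint8_t b[16]; } vreg;

/* Model of where the final sum lands: the 32-bit value is stored in
   word element 3 (bytes 12..15); all other bytes are zero.  */
static vreg
put_word3 (uint32_t sum)
{
  vreg v = { { 0 } };
  v.b[12] = (uint8_t) (sum >> 24);
  v.b[13] = (uint8_t) (sum >> 16);
  v.b[14] = (uint8_t) (sum >> 8);
  v.b[15] = (uint8_t) sum;
  return v;
}

/* Model of a halfword extract at byte offset uim: read bytes uim and
   uim + 1 as one big-endian unsigned halfword.  uim must be <= 14.  */
static uint16_t
extract_uh (vreg v, unsigned uim)
{
  return (uint16_t) ((v.b[uim] << 8) | v.b[uim + 1]);
}
```

In this model, extract_uh (put_word3 (0x00012345), 0) reads bytes 0,1 and yields 0, while offset 14 reads bytes 14,15 and yields 0x2345, the low halfword of the sum. That matches the report's point that the offset should be 14 rather than 0.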