[Bug target/78243] New: incorrect byte offset in vextractuh with -mcpu=power9

acsawdey at gcc dot gnu.org Mon, 07 Nov 2016 14:49:35 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78243


            Bug ID: 78243
           Summary: incorrect byte offset in vextractuh with -mcpu=power9
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: target
          Assignee: meissner at gcc dot gnu.org
          Reporter: acsawdey at gcc dot gnu.org
                CC: wschmidt at linux dot vnet.ibm.com
  Target Milestone: ---
            Target: powerpc64*-*-*

The test case is:

gcc.c-torture/execute/pr68532.c

#define SIZE 128
unsigned short _Alignas (16) in[SIZE];

__attribute__ ((noinline)) int
test (unsigned short sum, unsigned short *in, int x)
{
  for (int j = 0; j < SIZE; j += 8)
    sum += in[j] * x;
  return sum;
}
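
The excerpt shows only the function under test. To check the result on any host, a minimal driver can be sketched as follows. It assumes the input array is filled with in[i] = i, which the report does not show, and the helper name run_check is purely illustrative:

```c
#include <assert.h>

#define SIZE 128
unsigned short _Alignas (16) in[SIZE];

__attribute__ ((noinline)) int
test (unsigned short sum, unsigned short *in, int x)
{
  for (int j = 0; j < SIZE; j += 8)
    sum += in[j] * x;
  return sum;
}

/* Hypothetical driver helper (not part of the report): fill the array
   with in[i] = i and check the sum of every eighth element.  */
static void
run_check (void)
{
  for (int i = 0; i < SIZE; i++)
    in[i] = i;

  /* 0 + 8 + ... + 120 = 8 * (0 + 1 + ... + 15) = 960.  */
  assert (test (0, in, 1) == 960);
}
```

With this initialization, a correctly compiled test (0, in, 1) returns 960, and a miscompiled one trips the assertion.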

/home/sawdey/src/gcc/trunk2/build/gcc/xgcc
-B/home/sawdey/src/gcc/trunk2/build/gcc/
/home/sawdey/src/gcc/trunk2/trunk/gcc/testsuite/gcc.c-torture/execute/pr68532.c
-mcpu=power9 -Wl,-rpath=/tmp/lib64 -fno-diagnostics-show-caret
-fdiagnostics-color=never -O2 -O2 -ftree-vectorize -fno-vect-cost-model -lm -g
-S -o pr68532.s

The asm output is:

        vsplth 0,0,3
        lxv 33,-16(4)
        vperm 1,1,10,13
        vperm 1,1,11,13
        xxspltib 43,0
        vperm 1,1,12,13
        xxspltib 45,0
        vmladduhm 0,0,1,13
        vsum4shs 0,0,11
        vsumsws 0,0,11
        vextractuh 0,0,0
        mfvsrd 9,32
        add 3,3,9

The vsumsws leaves its result in word 3 of v0 (bytes 12-15), so the third
operand of vextractuh should be 14, extracting bytes 14 and 15, to produce the
correct result. The generated code uses byte offset 0 instead.
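
The offset arithmetic can be checked with a small host-side model. This is only a sketch: the vreg type and the helpers set_word, extract_uh, and sum_in_word3 are hypothetical names, and the model assumes the ISA's big-endian byte numbering (byte 0 most significant, byte 15 least) and that vextractuh returns the zero-extended halfword at bytes UIM:UIM+1.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit vector register in big-endian byte
   numbering: b[0] is the most significant byte, b[15] the least.  */
typedef struct { uint8_t b[16]; } vreg;

/* Store a 32-bit value into word element W (0..3), most significant
   byte first.  Word 3 occupies bytes 12..15.  */
static vreg
set_word (vreg v, int w, uint32_t val)
{
  assert (w >= 0 && w <= 3);
  for (int i = 0; i < 4; i++)
    v.b[4 * w + i] = (uint8_t) (val >> (8 * (3 - i)));
  return v;
}

/* Assumed behaviour of vextractuh: return the halfword held in bytes
   UIM:UIM+1, zero-extended.  */
static uint64_t
extract_uh (vreg v, int uim)
{
  assert (uim >= 0 && uim <= 14);
  return ((uint64_t) v.b[uim] << 8) | v.b[uim + 1];
}

/* Mimic where vsumsws leaves its sum: word 3, all other words zero.  */
static vreg
sum_in_word3 (uint32_t sum)
{
  vreg v;
  memset (&v, 0, sizeof v);
  return set_word (v, 3, sum);
}
```

In this model, extract_uh (sum_in_word3 (s), 14) yields the low 16 bits of s, while offset 0 reads bytes 0 and 1, which are zero.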

Another question about this code is why we generate all this complex vector
code but use it for only the first half of the computation. The latter half
uses this simple code:

.L2:
        lhz 9,0(4)
        addi 4,4,16
        mullw 9,9,5
        add 3,9,3
.LVL4:
        rlwinm 3,3,0,0xffff
.LVL5:
        .loc 1 10 0
        bdnz .L2

It seems we could simply do this for two vector lengths and not need this
cleanup code at all.
