https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118062

            Bug ID: 118062
           Summary: [15 regression]
                    c-c++-common/torture/vector-compare-1.c fails on arm /
                    MVE after gcc-15-5317-gf40010c198f
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: clyon at gcc dot gnu.org
          Reporter: clyon at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
  Target Milestone: ---
            Target: arm

After commit gcc-15-5317-gf40010c198f we have noticed that vector-compare-1.c
fails at execution when using the MVE vector extension on arm:

FAIL: c-c++-common/torture/vector-compare-1.c -O0  execution test
FAIL: c-c++-common/torture/vector-compare-1.c -O1  execution test
FAIL: c-c++-common/torture/vector-compare-1.c -O2  execution test
FAIL: c-c++-common/torture/vector-compare-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: c-c++-common/torture/vector-compare-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: c-c++-common/torture/vector-compare-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: c-c++-common/torture/vector-compare-1.c -O3 -g  execution test
FAIL: c-c++-common/torture/vector-compare-1.c -Os  execution test

This is with GCC for target arm-none-eabi, configured with
--disable-multilib --with-mode=thumb --with-arch=armv8.1-m.main+mve.fp+fp.dp
--with-float=hard
and the testsuite run with
-mthumb/-march=armv8.1-m.main+mve.fp+fp.dp/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto

The problem occurs when comparing floats or doubles. For floats, for instance,
the generated code looks like:

(input vectors are f0=(argc, 1, 2, 10) and f1=(0, 3, 2, -23))

        vmov    s15, r0 @ int            # move argc (==1) into s15
        vcvt.f32.s32    s15, s15         # convert it into floating-point
        vcmpe.f32       s15, #0          # compare against 0
        movs    r1, #0
        vmrs    APSR_nzcv, FPSCR
        push    {r4, r5, lr}
        it      gt
        movgt   r2, #-1                  # r2 = -1 (0xffffffff) if argc > 0
        mov     lr, #4
        it      le
        movle   r2, r1
        mov     r4, #0  @ movhi
        lsl     r2, r2, lr
        asr     r2, r2, lr
        bfi     r4, r2, #0, #4           # r4 = 0x0000000f
        vldr.64 d2, .L8
        vldr.64 d3, .L8+8                # d2/d3 (=q1 register) = {1, 1, 2, 10}
        vmov.i32        q2, #0xffffffff  @ v4si   # q2 = { -1, -1, -1, -1}
        vmov.i32        q0, #0  @ v4si            # q0 = { 0, 0, 0, 0}
        vmov    r5, s15
        vmsr    p0, r4  @ movhi           # p0 (predicate register) = 0x000f (only 16 bits, 1 per byte)
        vpush.64        {d8, d9}
        vmov.32 q1[0], r5                 # insert argc as q1[0], so q1={argc, 1, 2, 10}
        vldr.64 d8, .L8+16
        vldr.64 d9, .L8+24                # d8/d9 (=q4 register) = {0, 3, 2, -23}
        vpsel   q2, q2, q0                # select q2 = p0 (q2, q0) = (-1, 0, 0, 0) = (argc > 0 ? -1 : 0, 0, 0, 0)
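
To make the effect of the predicate value concrete, here is a minimal C sketch of the byte-wise selection described in the comments above (one predicate bit per byte, 16 bits in total). The helper name model_vpsel is hypothetical and only models the behaviour; it is not taken from GCC or from the test.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the selection: bit i of the 16-bit predicate picks
   byte i of the result from a (bit set) or from b (bit clear). */
static void model_vpsel(int32_t out[4], const int32_t a[4],
                        const int32_t b[4], uint16_t p0)
{
    uint8_t ab[16], bb[16], ob[16];

    memcpy(ab, a, sizeof ab);
    memcpy(bb, b, sizeof bb);
    for (int i = 0; i < 16; i++)
        ob[i] = ((p0 >> i) & 1) ? ab[i] : bb[i];
    memcpy(out, ob, sizeof ob);
}
```

With a = {-1, -1, -1, -1}, b = {0, 0, 0, 0} and p0 = 0x000f, only the four bytes of element 0 are taken from a, which matches the (-1, 0, 0, 0) value noted for q2.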

This is followed by a loop which compares the remaining pairs one by one:
1 > 3 ?      -> 0
2 > 2 ?      -> 0
10 > -23 ?   -> -1

Each scalar result is compared with the corresponding element of q2. The check
fails on elem #3 because q2[3] = 0 but 10 > -23, so we expect -1.
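
The check described above can be sketched in a few lines of C using the GCC vector extensions. This is a reduced illustration, not the code of vector-compare-1.c; the typedefs and the helper name first_mismatch_gt are assumptions made for the example.

```c
typedef float v4sf __attribute__((vector_size(16)));
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: return the index of the first lane where the vector
   comparison disagrees with the scalar comparison, or -1 if all lanes agree. */
static int first_mismatch_gt(v4sf f0, v4sf f1)
{
    v4si res = f0 > f1;   /* each lane is -1 if true, 0 if false */

    for (int i = 0; i < 4; i++) {
        int expected = f0[i] > f1[i] ? -1 : 0;
        if (res[i] != expected)
            return i;
    }
    return -1;
}
```

With f0 = {argc, 1, 2, 10} and f1 = {0, 3, 2, -23}, a correct compiler should make this return -1; with the code shown above, the mismatch would presumably be reported for lane 3.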


In vector-compare-1.c.192t.loopdone we have:
  <bb 2> [local count: 215091964]:
  _1 = (float) argc_12(D);
  _2 = {_1, 1.0e+0, 2.0e+0, 1.0e+1};
  f0 = _2;
  f1 = { 0.0, 3.0e+0, 2.0e+0, -2.3e+1 };
  _3 = _2 > { 0.0, 3.0e+0, 2.0e+0, -2.3e+1 };
  _4 = VEC_COND_EXPR <_3, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
  ifres = _4;


and in vector-compare-1.c.196t.veclower21 we have:
  <bb 2> [local count: 215091964]:
  _1 = (float) argc_12(D);
  _2 = {_1, 1.0e+0, 2.0e+0, 1.0e+1};
  f0 = _2;
  f1 = { 0.0, 3.0e+0, 2.0e+0, -2.3e+1 };
  _28 = _1 > 0.0;
  _29 = (<unnamed-signed:4>) _28;
  _30 = -_29;
  _31 = (<signed-boolean:4>) _30;
  _3 = {_31};
  _4 = VEC_COND_EXPR <_3, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
  ifres = _4;

which seems to forget about comparing elements 1, 2 and 3 of f0/f1: only
element 0 (_1 > 0.0) is compared before _3 is built.
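
For comparison, here is a small C model of the mask one would expect if all four elements were compared. It assumes a layout of 4 mask bits per 32-bit element, as suggested by the <signed-boolean:4> type in the dump and the 0x000f value loaded into p0; the helper name model_cmp_gt_mask and its lanes parameter are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: pack a per-lane a[i] > b[i] comparison into a 16-bit
   mask, with lane i occupying bits 4*i .. 4*i+3.  Only the first `lanes`
   elements are compared, so lanes = 1 mimics the lowered code above. */
static uint16_t model_cmp_gt_mask(const float a[4], const float b[4], int lanes)
{
    uint16_t mask = 0;

    for (int i = 0; i < lanes && i < 4; i++)
        if (a[i] > b[i])
            mask |= (uint16_t)(0xfu << (4 * i));
    return mask;
}
```

Under these assumptions, f0 = {1, 1, 2, 10} and f1 = {0, 3, 2, -23} give 0xf00f when all four lanes are compared, but only 0x000f when the comparison stops after element 0, which is the value seen in p0.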
