https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118062

            Bug ID: 118062
           Summary: [15 regression] c-c++-common/torture/vector-compare-1.c
                    fails on arm / MVE after gcc-15-5317-gf40010c198f
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: clyon at gcc dot gnu.org
          Reporter: clyon at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
  Target Milestone: ---
            Target: arm

After commit gcc-15-5317-gf40010c198f we have noticed that
vector-compare-1.c fails at execution when using the MVE vector
extension on arm:

FAIL: c-c++-common/torture/vector-compare-1.c -O0 execution test
FAIL: c-c++-common/torture/vector-compare-1.c -O1 execution test
FAIL: c-c++-common/torture/vector-compare-1.c -O2 execution test
FAIL: c-c++-common/torture/vector-compare-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: c-c++-common/torture/vector-compare-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: c-c++-common/torture/vector-compare-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: c-c++-common/torture/vector-compare-1.c -O3 -g execution test
FAIL: c-c++-common/torture/vector-compare-1.c -Os execution test

on GCC target arm-none-eabi configured with
  --disable-multilib --with-mode=thumb --with-arch=armv8.1-m.main+mve.fp+fp.dp --with-float=hard

running the testsuite with
  -mthumb/-march=armv8.1-m.main+mve.fp+fp.dp/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto

The problem occurs when comparing floats or doubles.
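
For readers without the testsuite at hand, the kind of check that fails can be sketched as below. This is only a reduced illustration using the GNU C vector extensions, not the source of vector-compare-1.c; the helper name first_mismatch_gt and its array-based interface are hypothetical, and the sketch has not been checked against the affected compiler.

```c
#include <assert.h>
#include <stddef.h>

/* GNU C vector extension types: four 32-bit lanes each. */
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

/* Hypothetical helper (not taken from the testsuite): returns the index
   of the first lane where the vector comparison f0 > f1 disagrees with
   the scalar comparison of the same lane, or -1 if all lanes agree. */
int first_mismatch_gt(const float a[4], const float b[4])
{
    assert(a != NULL && b != NULL);

    v4sf f0 = { a[0], a[1], a[2], a[3] };
    v4sf f1 = { b[0], b[1], b[2], b[3] };

    /* Vector comparison: each lane is -1 if true, 0 if false. */
    v4si vres = f0 > f1;

    for (int i = 0; i < 4; i++) {
        int expected = (f0[i] > f1[i]) ? -1 : 0;
        if (vres[i] != expected)
            return i;
    }
    return -1;
}
```

With a correct compiler, calling it on (1, 1, 2, 10) and (0, 3, 2, -23) should return -1, since the vector and scalar results agree in every lane.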
For floats for instance, the generated code looks like:
(input vectors are f0=(argc, 1, 2, 10) and f1=(0, 3, 2, -23))

        vmov    s15, r0 @ int           # move argc (==1) into s15
        vcvt.f32.s32    s15, s15        # convert it into floating-point
        vcmpe.f32       s15, #0         # compare against 0
        movs    r1, #0
        vmrs    APSR_nzcv, FPSCR
        push    {r4, r5, lr}
        it      gt
        movgt   r2, #-1                 # r2 = -1 (0xffffffff) if argc -gt 0
        mov     lr, #4
        it      le
        movle   r2, r1
        mov     r4, #0  @ movhi
        lsl     r2, r2, lr
        asr     r2, r2, lr
        bfi     r4, r2, #0, #4          # r4 = 0x0000000f
        vldr.64 d2, .L8
        vldr.64 d3, .L8+8               # d2/d3 (=q1 register) = {1, 1, 2, 10}
        vmov.i32        q2, #0xffffffff  @ v4si   # q2 = { -1, -1, -1, -1}
        vmov.i32        q0, #0  @ v4si  # q0 = { 0, 0, 0, 0}
        vmov    r5, s15
        vmsr    p0, r4  @ movhi         # p0 (predicate register) = 0x000f (only 16 bits, 1 per byte)
        vpush.64        {d8, d9}
        vmov.32 q1[0], r5               # insert argc as q1[0], so q1={argc, 1, 2, 10}
        vldr.64 d8, .L8+16
        vldr.64 d9, .L8+24              # d8/d9 (=q4 register) = {0, 3, 2, -23}
        vpsel   q2, q2, q0              # select q2 = p0 (q2, q0) = (-1, 0, 0, 0) = ( argc > 0 ? -1 : 0, 0, 0, 0)

then a loop which compares pairs one by one:
  1 > 3 ?    -> 0
  2 > 2 ?    -> 0
  10 > -23 ? -> -1
and compares the result with the corresponding element of q2
and fails on elem #3 because q2[3] = 0 but 10 > -23, so we expect -1.

In vector-compare-1.c.192t.loopdone we have:
  <bb 2> [local count: 215091964]:
  _1 = (float) argc_12(D);
  _2 = {_1, 1.0e+0, 2.0e+0, 1.0e+1};
  f0 = _2;
  f1 = { 0.0, 3.0e+0, 2.0e+0, -2.3e+1 };
  _3 = _2 > { 0.0, 3.0e+0, 2.0e+0, -2.3e+1 };
  _4 = VEC_COND_EXPR <_3, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
  ifres = _4;

and in vector-compare-1.c.196t.veclower21 we have:
  <bb 2> [local count: 215091964]:
  _1 = (float) argc_12(D);
  _2 = {_1, 1.0e+0, 2.0e+0, 1.0e+1};
  f0 = _2;
  f1 = { 0.0, 3.0e+0, 2.0e+0, -2.3e+1 };
  _28 = _1 > 0.0;
  _29 = (<unnamed-signed:4>) _28;
  _30 = -_29;
  _31 = (<signed-boolean:4>) _30;
  _3 = {_31};
  _4 = VEC_COND_EXPR <_3, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
  ifres = _4;

which seems to forget about comparing elements 1, 2 and 3 of f0/f1?
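
To make the predicate arithmetic in the listing concrete, here is a small host-side C model. It is a sketch, not generated code or an official definition: it assumes the layout noted above (16 predicate bits, one per byte, so four bits per 32-bit lane) and a byte-wise select in the style of vpsel. The function names predicate_gt_f32, vpsel_model and select_lane are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of a predicate for 32-bit lanes: 16 bits, one per byte of the
   128-bit vector, so each lane owns 4 consecutive bits.
   Builds the predicate for f0 > f1 over all four lanes. */
uint16_t predicate_gt_f32(const float f0[4], const float f1[4])
{
    uint16_t p = 0;
    for (int lane = 0; lane < 4; lane++)
        if (f0[lane] > f1[lane])
            p |= (uint16_t)(0xFu << (4 * lane));
    return p;
}

/* Byte-wise select in the style of vpsel: byte i of the result comes
   from a when bit i of p is set, otherwise from b. */
void vpsel_model(uint16_t p, const int32_t a[4], const int32_t b[4],
                 int32_t out[4])
{
    uint8_t ab[16], bb[16], ob[16];
    memcpy(ab, a, sizeof ab);
    memcpy(bb, b, sizeof bb);
    for (int i = 0; i < 16; i++)
        ob[i] = ((p >> i) & 1u) ? ab[i] : bb[i];
    memcpy(out, ob, sizeof ob);
}

/* Returns one lane (0..3) of select(p, {-1,-1,-1,-1}, {0,0,0,0}),
   mirroring the q2/q0 operands in the listing. */
int32_t select_lane(uint16_t p, int lane)
{
    assert(lane >= 0 && lane < 4);
    const int32_t ones[4]  = { -1, -1, -1, -1 };
    const int32_t zeros[4] = { 0, 0, 0, 0 };
    int32_t out[4];
    vpsel_model(p, ones, zeros, out);
    return out[lane];
}
```

Under this model, p0 = 0x000f yields (-1, 0, 0, 0), matching the observed q2. With argc == 1, comparing all four lanes of f0 and f1 would instead give a predicate of 0xf00f and a result of (-1, 0, 0, -1), which is what the element-by-element loop expects.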