[Bug target/118976] [12/13/14/15 regression] Correctness Issue: SVE vectorization results in data corruption when cpu has 128bit vectors but compiled with -mcpu=neoverse-v1 (which is only for 256bit vectors)

2025-02-21 Thread lrbison at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976

--- Comment #7 from Luke Robison  ---
Andrew,

Perhaps you mean that setting -mcpu=neoverse-v1 overrides the
-msve-vector-bits=scalable argument.  So I tried with `-march=armv9-a+sve
-msve-vector-bits=scalable` instead.  I still observe the same erroneous
output, so I still think there is an error here.

[Bug target/118976] [12/13/14/15 regression] Correctness Issue: SVE vectorization results in data corruption when cpu has 128bit vectors but compiled with -mcpu=neoverse-v1 (which is only for 256bit vectors)

2025-02-21 Thread lrbison at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976

--- Comment #6 from Luke Robison  ---
Andrew,

Thanks for taking a look.  I actually had not realized that
-msve-vector-bits=scalable is the only option guaranteed to produce correct
execution on machines with other vector sizes.  I need to make sure I include
that in a few places, thank you.

However, you and the documentation suggest that -msve-vector-bits=scalable
should take precedence over the value in neoversev1.h, yet I'm still seeing the
problem:


gcc -Wall -Wextra -O3 -fno-strict-aliasing -mcpu=neoverse-v1
-msve-vector-bits=scalable final.c

gcc:9 gives PASS: got 0x00bb 0x00aa as expected
gcc:10 gives PASS: got 0x00bb 0x00aa as expected
gcc:11 gives PASS: got 0x00bb 0x00aa as expected
gcc:12.4 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
gcc:13.3 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
gcc:14.2 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00

[Bug tree-optimization/118976] New: Correctness Issue: SVE vectorization results in data corruption when cpu has 128bit vectors

2025-02-21 Thread lrbison at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976

Bug ID: 118976
   Summary: Correctness Issue: SVE vectorization results in data
corruption when cpu has 128bit vectors
   Product: gcc
   Version: 14.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: lrbison at amazon dot com
  Target Milestone: ---

Created attachment 60555
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60555&action=edit
Standalone Reproducer

Hello Team,

A customer came to me with a SHA-1 implementation that was producing corrupt
values on Graviton4 with -O3.

I isolated the problem to the generation of the trailing byte count in
big-endian form, which is then included in the checksum.  The original code
snippet is below; several variants of it can be found online with some googling:

for (i = 0; i < 8; i++) {
    finalcount[i] = (unsigned char)((context->count[(i >= 4 ? 0 : 1)]
                     >> ((3 - (i & 3)) * 8)) & 255);  /* Endian independent */
}
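
For reference, here is a minimal self-contained sketch of what this loop
computes (the types and test values are illustrative, not the exact ones from
the attachment): count[1] holds the high word and count[0] the low word, and
the loop serializes them big-endian into finalcount[0..7].

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t count[2] = { 0xaa, 0xbb };  /* illustrative values */
    unsigned char finalcount[8];

    /* Same loop as above: count[1] big-endian into bytes 0-3,
       count[0] big-endian into bytes 4-7. */
    for (int i = 0; i < 8; i++)
        finalcount[i] = (unsigned char)((count[i >= 4 ? 0 : 1]
                         >> ((3 - (i & 3)) * 8)) & 255);

    for (int i = 0; i < 8; i++)
        printf("%02x ", finalcount[i]);  /* 00 00 00 bb 00 00 00 aa */
    printf("\n");
    return 0;
}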


I've attached a stand-alone reproducer in which the problematic function is
called finalcount_av.  I have found that gcc 11 and earlier don't vectorize the
loop and don't have the issue, while gcc 12.4 through gcc 14.2 produce corrupt
results.  Although trunk doesn't exhibit the problem, I believe this is because
of changed optimization weights rather than because the error was fixed.

It is also worth noting that the corruption only occurs on hardware with
128-bit SVE vectors.  On Graviton3, with 256-bit vectors, the generated machine
code can exit early and never executes the problematic second half.

Here is a link to Compiler Explorer with the same function
https://godbolt.org/z/c99bMjene

Note that the value of NCOUNT can be set to either 2 or 4, with 4 preventing
the compiler from simply using the `rev` instruction on trunk.  Notably,
though, setting NCOUNT to 4 generates correct code in all versions I tested.
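
For clarity, here is a hypothetical sketch of the shape of finalcount_av as
parameterized by NCOUNT (the actual attachment and godbolt code may differ in
details):

#include <stdint.h>

#ifndef NCOUNT
#define NCOUNT 2  /* 2 reproduces the bug; 4 was correct in all versions tested */
#endif

/* Hypothetical shape: serialize NCOUNT 32-bit words big-endian,
   highest-indexed word first, matching the SHA-1 snippet when NCOUNT == 2. */
void finalcount_av(unsigned char finalcount[4 * NCOUNT],
                   const uint32_t count[NCOUNT])
{
    for (int i = 0; i < 4 * NCOUNT; i++)
        finalcount[i] = (unsigned char)
            ((count[NCOUNT - 1 - i / 4] >> ((3 - (i & 3)) * 8)) & 255);
}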

[Bug target/118976] [12/13/14/15 regression] Correctness Issue: SVE vectorization results in data corruption when cpu has 128bit vectors

2025-02-21 Thread lrbison at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976

--- Comment #2 from Luke Robison  ---
In particular, I believe the error occurs because of the following sequence of
instructions, looking at line numbers from the Compiler Explorer output of
14.2.

In the first block, at line 8:

index   z31.s, #-1, #-1

This generates a vector of {-1, -2, -3, -4, -5, -6, -7, -8} on 256-bit vector
machines, but only {-1, -2, -3, -4} on 128-bit machines.

Then, on 128-bit machines, the vector is generated again for the second half at
line 20 and manipulated into the negative values {-4, -5, -6, -7}:

index   z29.s, w3, #-1

But clearly the values should be {-5, -6, -7, -8}, and hence the resulting data
is shifted by one element.
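
As a model of what index computes (a sketch of the semantics, not the
compiler's code): index zN.s, base, step fills each 32-bit lane i with
base + i*step, so the number of lanes produced depends on the hardware
vector length.

#include <stdint.h>

/* Model of SVE "index z.s, base, step": lane i gets base + i*step.
   A 128-bit vector has 4 such 32-bit lanes, a 256-bit vector has 8. */
static void sve_index_s(int32_t *lanes, int nlanes,
                        int32_t base, int32_t step)
{
    for (int i = 0; i < nlanes; i++)
        lanes[i] = base + i * step;
}

/* index z31.s, #-1, #-1 over 8 lanes yields {-1, -2, ..., -8}.
   The second 128-bit half should therefore start at -5, but the
   generated index z29.s, w3, #-1 starts one element too early,
   at -4, which shifts the data by one. */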

[Bug target/118976] [12/13/14/15 regression] Correctness Issue: SVE vectorization results in data corruption when cpu has 128bit vectors

2025-02-21 Thread lrbison at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976

--- Comment #3 from Luke Robison  ---
Sam,

No, -fno-strict-aliasing still produces incorrect results.

[Bug target/118976] [12/13/14/15 regression] Correctness Issue: SVE vectorization results in data corruption when cpu has 128bit vectors

2025-02-21 Thread lrbison at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976

--- Comment #4 from Luke Robison  ---
Apologies, I forgot to include the compile line and output:

gcc -fno-inline -O3 -Wall -fno-strict-aliasing -mcpu=neoverse-v1 -o final final.c

gcc:9 gives PASS: got 0x00bb 0x00aa as expected
gcc:10 gives PASS: got 0x00bb 0x00aa as expected
gcc:11 gives PASS: got 0x00bb 0x00aa as expected
gcc:12.4 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
gcc:13.3 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
gcc:14.2 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00

[Bug tree-optimization/118976] [12/13/14/15 regression] Correctness Issue: SVE vectorization results in data corruption when cpu has 128bit vectors but compiled with -mcpu=neoverse-v1 (which is only for 256bit vectors)

2025-02-24 Thread lrbison at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976

--- Comment #12 from Luke Robison  ---
Tamar,

I'm happy to test as many flags as you can think of, just send them my way.

See below for detailed results, but I see that -fdisable-tree-cunroll does not
fix the problem, and I suspect that -march=armv8.4-a+sve takes a code path
similar to -fno-vect-cost-model, since without a specific CPU to target, no
cost model is available.

In fact, with those flags, this problem affects gcc down to 8 as well.

CFLAGS="-fno-inline -O3 -Wall -fno-strict-aliasing  -march=armv8.4-a+sve "

gcc:8 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
gcc:9 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
gcc:10 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb
0xaa00
gcc:11 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb
0xaa00
gcc:12.4 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb
0xaa00
gcc:13.3 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb
0xaa00
gcc:14.2 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb
0xaa00



CFLAGS="-fno-inline -O3 -Wall -fno-strict-aliasing  -march=armv8.4-a+sve
-fdisable-tree-cunroll"
cc1: note: disable pass tree-cunroll for functions in the range of [0,
4294967295]
gcc:8 gives PASS: got 0x00bb 0x00aa as expected
cc1: note: disable pass tree-cunroll for functions in the range of [0,
4294967295]
gcc:9 gives PASS: got 0x00bb 0x00aa as expected
cc1: note: disable pass tree-cunroll for functions in the range of [0,
4294967295]
gcc:10 gives PASS: got 0x00bb 0x00aa as expected
cc1: note: disable pass tree-cunroll for functions in the range of [0,
4294967295]
gcc:11 gives PASS: got 0x00bb 0x00aa as expected
cc1: note: disable pass tree-cunroll for functions in the range of [0,
4294967295]
gcc:12.4 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb
0xaa00
cc1: note: disable pass tree-cunroll for functions in the range of [0,
4294967295]
gcc:13.3 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb
0xaa00
cc1: note: disable pass tree-cunroll for functions in the range of [0,
4294967295]
gcc:14.2 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb
0xaa00

[Bug tree-optimization/118976] [12 Regression] Correctness Issue: SVE vectorization results in data corruption when cpu has 128bit vectors but compiled with -mcpu=neoverse-v1 (which is only for 256bit vectors)

2025-03-06 Thread lrbison at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976

--- Comment #20 from Luke Robison  ---
Richard,

Thank you for getting this merged and backported.  Although I initially didn't
observe this problem in gcc 11, I have since confirmed that, with the right
flags (-march=armv8.4-a+sve), it can be exposed as far back as gcc 8.  My
understanding is that versions 11 and 12 should still expect at least one more
release; should those branches receive a backport as well?