[Bug target/118976] [12/13/14/15 regression] Correctness Issue: SVE vectorization results in data corruption when cpu has 128bit vectors but compiled with -mcpu=neoverse-v1 (which is only for 256bit v
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976

--- Comment #7 from Luke Robison ---

Andrew,

Perhaps you mean that setting -mcpu=neoverse-v1 overrides the -msve-vector-bits=scalable argument, so I tried with `-march=armv9-a+sve -msve-vector-bits=scalable`. I still observe the same erroneous output, so I still think there is an error here.
[Bug target/118976] [12/13/14/15 regression] Correctness Issue: SVE vectorization results in data corruption when cpu has 128bit vectors but compiled with -mcpu=neoverse-v1 (which is only for 256bit v
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976

--- Comment #6 from Luke Robison ---

Andrew,

Thanks for taking a look. I actually had not realized that -msve-vector-bits=scalable is the only option guaranteed to produce correct execution on machines with other vector sizes. I need to make sure I include that in a few places, thank you.

However, you and the documentation suggest that -msve-vector-bits=scalable should take precedence over the value in neoversev1.h, yet I'm still seeing the problem:

    gcc -Wall -Wextra -O3 -fno-strict-aliasing -mcpu=neoverse-v1 -msve-vector-bits=scalable final.c

    gcc:9    gives PASS:  got 0x00bb 0x00aa as expected
    gcc:10   gives PASS:  got 0x00bb 0x00aa as expected
    gcc:11   gives PASS:  got 0x00bb 0x00aa as expected
    gcc:12.4 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
    gcc:13.3 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
    gcc:14.2 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
[Bug tree-optimization/118976] New: Correctness Issue: SVE vectorization results in data corruption when cpu has 128bit vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976

            Bug ID: 118976
           Summary: Correctness Issue: SVE vectorization results in data
                    corruption when cpu has 128bit vectors
           Product: gcc
           Version: 14.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lrbison at amazon dot com
  Target Milestone: ---

Created attachment 60555
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60555&action=edit
Standalone Reproducer

Hello Team,

A customer came to me with a sha1 implementation that was producing corrupt values on Graviton4 with -O3. I isolated the problem to the generation of the trailing byte count in big-endian order, which is then included in the checksum. The original code snippet is below, and several variants of it can be found online with some googling:

    for (i = 0; i < 8; i++) {
        finalcount[i] = (unsigned char)((context->count[(i >= 4 ? 0 : 1)]
             >> ((3-(i & 3)) * 8) ) & 255);  /* Endian independent */
    }

I've attached a stand-alone reproducer in which the problematic function is called finalcount_av. I have found that gcc 11 and earlier don't vectorize the loop and don't have the issue, while gcc 12.4 through gcc 14.2 produce corrupt results. Although trunk doesn't exhibit the problem, I believe this is because of changed optimization weights rather than because the error was fixed.

It is also worth noting that the corruption only occurs on hardware with 128-bit SVE vectors. On Graviton3, with 256-bit vectors, the generated machine code can exit early and never execute the problematic second half.

Here is a link to Compiler Explorer with the same function: https://godbolt.org/z/c99bMjene

Note that the value of NCOUNT can be set to either 2 or 4, with 4 preventing the compiler from simply using the `rev` instruction on trunk. Notably though, setting NCOUNT to 4 generates correct code in all versions I tested.
[Bug target/118976] [12/13/14/15 regression] Correctness Issue: SVE vectorization results in data corruption when cpu has 128bit vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976

--- Comment #2 from Luke Robison ---

In particular I believe the error occurs because of the following sequence of instructions, referring to line numbers from the Compiler Explorer output of 14.2.

In the first block, line 8:

    index   z31.s, #-1, #-1

This generates a vector of {-1, -2, -3, -4, -5, -6, -7, -8} on 256-bit vector machines, but only {-1, -2, -3, -4} on 128-bit machines. Then, for 128-bit machines, the vector is generated again for the second half on line 20, and manipulated into the negative values {-4, -5, -6, -7}:

    index   z29.s, w3, #-1

But clearly the values should be {-5, -6, -7, -8}, and hence the resulting data is shifted by one.
[Bug target/118976] [12/13/14/15 regression] Correctness Issue: SVE vectorization results in data corruption when cpu has 128bit vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976

--- Comment #3 from Luke Robison ---

Sam,

No, -fno-strict-aliasing still produces incorrect results.
[Bug target/118976] [12/13/14/15 regression] Correctness Issue: SVE vectorization results in data corruption when cpu has 128bit vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976

--- Comment #4 from Luke Robison ---

Apologies, I forgot to include the compile line and output:

    gcc -fno-inline -O3 -Wall -fno-strict-aliasing -mcpu=neoverse-v1 -o final final.c

    gcc:9    gives PASS:  got 0x00bb 0x00aa as expected
    gcc:10   gives PASS:  got 0x00bb 0x00aa as expected
    gcc:11   gives PASS:  got 0x00bb 0x00aa as expected
    gcc:12.4 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
    gcc:13.3 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
    gcc:14.2 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
[Bug tree-optimization/118976] [12/13/14/15 regression] Correctness Issue: SVE vectorization results in data corruption when cpu has 128bit vectors but compiled with -mcpu=neoverse-v1 (which is only f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976

--- Comment #12 from Luke Robison ---

Tamar,

I'm happy to test as many flags as you can think of, just send them my way. See below for detailed results, but I see that -fdisable-tree-cunroll does not fix the problem, and I suspect that -march=armv8.4-a+sve must cause a similar code path to -fno-vect-cost-model, since without a CPU to target, no cost model is available. In fact, with those flags, this problem affects gcc down to 8 as well.

    CFLAGS="-fno-inline -O3 -Wall -fno-strict-aliasing -march=armv8.4-a+sve"

    gcc:8    gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
    gcc:9    gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
    gcc:10   gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
    gcc:11   gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
    gcc:12.4 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
    gcc:13.3 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
    gcc:14.2 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00

    CFLAGS="-fno-inline -O3 -Wall -fno-strict-aliasing -march=armv8.4-a+sve -fdisable-tree-cunroll"

    cc1: note: disable pass tree-cunroll for functions in the range of [0, 4294967295]
      (the note above is printed once per invocation; repeats omitted below)
    gcc:8    gives PASS:  got 0x00bb 0x00aa as expected
    gcc:9    gives PASS:  got 0x00bb 0x00aa as expected
    gcc:10   gives PASS:  got 0x00bb 0x00aa as expected
    gcc:11   gives PASS:  got 0x00bb 0x00aa as expected
    gcc:12.4 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
    gcc:13.3 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
    gcc:14.2 gives ERROR: expected 0x00bb 0x00aa but got 0x00bb 0xaa00
[Bug tree-optimization/118976] [12 Regression] Correctness Issue: SVE vectorization results in data corruption when cpu has 128bit vectors but compiled with -mcpu=neoverse-v1 (which is only for 256bit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976

--- Comment #20 from Luke Robison ---

Richard,

Thank you for getting this merged and backported. Although I initially didn't observe this problem in gcc 11, I have since confirmed that with the right flags (-march=armv8.4-a+sve) it can be exposed as far back as gcc-8. My understanding is that versions 11 and 12 should still expect at least one more release; should those branches receive a backport as well?