https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976
Bug ID: 118976
Summary: Correctness Issue: SVE vectorization results in data corruption when cpu has 128bit vectors
Product: gcc
Version: 14.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: lrbison at amazon dot com
Target Milestone: ---

Created attachment 60555 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60555&action=edit
Standalone Reproducer

Hello Team,

A customer came to me with a SHA-1 implementation that was producing corrupt values on Graviton4 with -O3. I isolated the problem to the generation of the trailing big-endian byte count, which is then included in the checksum. The original code snippet is below; several variants of it can be found online with some googling:

    for (i = 0; i < 8; i++) {
        finalcount[i] = (unsigned char)((context->count[(i >= 4 ? 0 : 1)]
         >> ((3-(i & 3)) * 8) ) & 255);  /* Endian independent */
    }

I've attached a stand-alone reproducer in which the problematic function is called finalcount_av.

I have found that gcc 11 and earlier don't vectorize this loop and don't have the issue, while gcc 12.4 through gcc 14.2 produce corrupt results. Although trunk doesn't exhibit the problem, I believe this is because of changed optimization weights rather than because the error was fixed.

It is also worth noting that the corruption only occurs on hardware with 128-bit SVE vectors. On Graviton3, with 256-bit vectors, the generated machine code can exit early and skip the problematic second half.

Here is a link to Compiler Explorer with the same function: https://godbolt.org/z/c99bMjene

Note that NCOUNT can be set to either 2 or 4; with 4, the compiler on trunk can no longer simply use the `rev` instruction. Notably, though, setting NCOUNT to 4 generates correct code in all versions I tested.
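For reference, the loop in the snippet writes the two 32-bit halves of the count most-significant byte first: bytes 0..3 come from count[1] and bytes 4..7 from count[0], i.e. the 64-bit value (count[1] << 32) | count[0] in big-endian order. The sketch below is only an illustration under stated assumptions, not the attached reproducer: the ctx_t type and the functions finalcount_loop, finalcount_ref and finalcount_matches are hypothetical names, count is assumed to be uint32_t count[2], and it has not been verified that this harness triggers the same vectorization as finalcount_av. It compares the original loop against an independent shift-based reference, so a mismatch when built with -O3 and SVE enabled on an affected compiler would point at the miscompilation.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the SHA-1 context; only `count` is used.
 * Assumption: count[0] holds the low 32 bits, count[1] the high 32 bits. */
typedef struct {
    uint32_t count[2];
} ctx_t;

/* The loop from the report. noinline (a GCC/Clang attribute) keeps it
 * in its own out-of-line function, as in a real SHA-1 implementation. */
__attribute__((noinline))
void finalcount_loop(const ctx_t *context, unsigned char finalcount[8])
{
    for (int i = 0; i < 8; i++) {
        finalcount[i] = (unsigned char)((context->count[(i >= 4 ? 0 : 1)]
                        >> ((3 - (i & 3)) * 8)) & 255);
    }
}

/* Independent reference: assemble the 64-bit count and emit it
 * most-significant byte first (big-endian). */
void finalcount_ref(const ctx_t *context, unsigned char out[8])
{
    uint64_t v = ((uint64_t)context->count[1] << 32) | context->count[0];
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Returns 1 if the loop agrees with the reference for this context;
 * otherwise prints each byte pair to stderr and returns 0. */
int finalcount_matches(const ctx_t *context)
{
    unsigned char got[8], want[8];
    finalcount_loop(context, got);
    finalcount_ref(context, want);
    if (memcmp(got, want, sizeof got) != 0) {
        for (int i = 0; i < 8; i++)
            fprintf(stderr, "byte %d: got %02x, want %02x\n",
                    i, got[i], want[i]);
        return 0;
    }
    return 1;
}
```

For example, with count = { 0x89abcdef, 0x01234567 } both functions should produce the bytes 01 23 45 67 89 ab cd ef. Since each group of four output bytes is just one 32-bit word stored big-endian, a compiler can presumably lower the whole loop to byte reversals, which would be consistent with the `rev` code seen on trunk for NCOUNT = 2.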