https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976

            Bug ID: 118976
           Summary: Correctness Issue: SVE vectorization results in data
                    corruption when cpu has 128bit vectors
           Product: gcc
           Version: 14.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lrbison at amazon dot com
  Target Milestone: ---

Created attachment 60555
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60555&action=edit
Standalone Reproducer

Hello Team,

A customer came to me with a SHA-1 implementation that was producing corrupt
values on Graviton4 when compiled with -O3.

I isolated the problem to the generation of the trailing byte count in
big-endian order, which is then included in the checksum.  The original code
snippet is below; several variants of it can be found online:

    for (i = 0; i < 8; i++) {
        finalcount[i] = (unsigned char)((context->count[(i >= 4 ? 0 : 1)]
         >> ((3-(i & 3)) * 8) ) & 255);  /* Endian independent */
    }
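
For illustration only, here is a minimal sketch of how the loop can be checked
against a hand-unrolled reference.  This is not the attached reproducer: the
struct `sha1_ctx_sketch` and the helper names are hypothetical, and whether
this exact form triggers the miscompile may depend on compiler version and
flags.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the SHA-1 context; only the bit counter matters. */
typedef struct {
    uint32_t count[2];  /* count[0] = low 32 bits, count[1] = high 32 bits */
} sha1_ctx_sketch;

/* The loop from the report, wrapped in a function (the vectorization candidate). */
static void finalcount_loop(const sha1_ctx_sketch *context,
                            unsigned char finalcount[8])
{
    unsigned int i;
    for (i = 0; i < 8; i++) {
        finalcount[i] = (unsigned char)((context->count[(i >= 4 ? 0 : 1)]
            >> ((3 - (i & 3)) * 8)) & 255);
    }
}

/* Hand-unrolled reference: big-endian bytes of count[1], then of count[0]. */
static void finalcount_reference(const sha1_ctx_sketch *context,
                                 unsigned char out[8])
{
    uint32_t hi = context->count[1];
    uint32_t lo = context->count[0];
    out[0] = (unsigned char)(hi >> 24);
    out[1] = (unsigned char)(hi >> 16);
    out[2] = (unsigned char)(hi >> 8);
    out[3] = (unsigned char)hi;
    out[4] = (unsigned char)(lo >> 24);
    out[5] = (unsigned char)(lo >> 16);
    out[6] = (unsigned char)(lo >> 8);
    out[7] = (unsigned char)lo;
}

/* Returns 1 when the loop and the reference agree for the given counter. */
static int finalcount_matches(uint32_t lo, uint32_t hi)
{
    sha1_ctx_sketch ctx;
    unsigned char got[8], want[8];
    ctx.count[0] = lo;
    ctx.count[1] = hi;
    finalcount_loop(&ctx, got);
    finalcount_reference(&ctx, want);
    return memcmp(got, want, sizeof got) == 0;
}
```

A correctly compiled build should return 1 from
`finalcount_matches(0x01234567u, 0x89abcdefu)`; a return of 0 would indicate
the kind of corruption described here.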


I've attached a stand-alone reproducer in which the problematic function is
called finalcount_av.  I have found that GCC 11 and earlier don't vectorize
the loop and don't have the issue, while GCC 12.4 through GCC 14.2 produce
corrupt results.  Although trunk doesn't exhibit the problem, I believe this is
because of changed optimization weights rather than because the error was fixed.

It is also worth noting that the corruption only occurs on hardware with
128-bit SVE vectors.  On Graviton3, with 256-bit vectors, the generated machine
code can exit early and not execute the problematic second half.
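
Since the behaviour depends on the hardware vector length, it can help to
confirm what a given machine reports.  The sketch below is an assumption on my
part rather than something from the reproducer: it uses the Linux
`prctl(PR_SVE_GET_VL)` interface, and the helper name `sve_vector_bits` is
hypothetical.  It returns -1 where SVE or the interface is unavailable.

```c
#if defined(__linux__)
#include <sys/prctl.h>
#endif

/* Returns the calling thread's SVE vector length in bits, or -1 if the
 * kernel or CPU does not report one (for example on non-SVE hardware). */
static int sve_vector_bits(void)
{
#if defined(__linux__) && defined(PR_SVE_GET_VL) && defined(PR_SVE_VL_LEN_MASK)
    int ret = prctl(PR_SVE_GET_VL);
    if (ret < 0)
        return -1;
    /* The low bits hold the vector length in bytes; convert to bits. */
    return (ret & PR_SVE_VL_LEN_MASK) * 8;
#else
    return -1;
#endif
}
```

Given the description above, I would expect this to return 128 on Graviton4
and 256 on Graviton3.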

Here is a link to Compiler Explorer with the same function:
https://godbolt.org/z/c99bMjene

Note that the value of NCOUNT can be set to either 2 or 4, with 4 preventing
the compiler from simply using the `rev` instruction on trunk.  Notably, though,
setting NCOUNT to 4 generates correct code in all versions I tested.
