https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102404

            Bug ID: 102404
           Summary: Loop vectorized with 32 byte vectors actually uses 16
                    byte vectors
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: freddie at witherden dot org
  Target Milestone: ---

Created attachment 51480
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51480&action=edit
Test case

Consider the loop on L11 of the attached file.  Compiling as:

❯ gcc -march=tigerlake -Ofast -mprefer-vector-width=512 -S -fopenmp test.c -fopt-info
test.c:25:37: optimized: loop vectorized using 32 byte vectors
test.c:4:6: optimized: loop turned into non-loop; it never loops

which notes that (as requested) the loop has been vectorized using 32-byte
(zmm) vectors.  Inspecting the resulting assembly (also attached), we observe
that it has actually been unrolled by a factor of two and then vectorized using
16-byte (ymm) vectors.
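
For readers without the attachment, the two code shapes being contrasted can be
sketched with GCC's vector extensions.  This is an illustration only, not the
attached test case: the type names (v8sf, v4sf) and function names are
hypothetical, and the sketch says nothing about which instructions the compiler
selects for a given -march.

```c
#include <string.h>

/* Hypothetical type names, defined via GCC/Clang vector extensions. */
typedef float v8sf __attribute__((vector_size(32))); /* 8 floats, 32 bytes */
typedef float v4sf __attribute__((vector_size(16))); /* 4 floats, 16 bytes */

/* Adds 8 floats using a single 32-byte vector operation. */
static void add_wide(float *dst, const float *a, const float *b)
{
    v8sf va, vb;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    va = va + vb;
    memcpy(dst, &va, sizeof va);
}

/* Adds the same 8 floats as two 16-byte vector operations,
   i.e. the "unrolled by two with narrower vectors" shape. */
static void add_narrow_x2(float *dst, const float *a, const float *b)
{
    for (int half = 0; half < 2; half++) {
        v4sf va, vb;
        memcpy(&va, a + 4 * half, sizeof va);
        memcpy(&vb, b + 4 * half, sizeof vb);
        va = va + vb;
        memcpy(dst + 4 * half, &va, sizeof va);
    }
}

/* Returns 1 if both variants produce bit-identical results. */
static int widths_agree(void)
{
    float a[8], b[8], wide[8], narrow[8];
    for (int i = 0; i < 8; i++) {
        a[i] = (float)i;
        b[i] = (float)(10 * i);
    }
    add_wide(wide, a, b);
    add_narrow_x2(narrow, a, b);
    return memcmp(wide, narrow, sizeof wide) == 0;
}
```

Both variants compute the same values; the difference is only in the number
and width of the vector operations, which is what the code-size comparison in
this report is about.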

As a point of comparison, recent versions of Clang use 32-byte vectors for this
loop, resulting in code which is half the size.
