https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118141

            Bug ID: 118141
           Summary: GCC miscompiles __builtin_convertvector() narrowing
                    operation on amd64 above -O1
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: richard.yao at alumni dot stonybrook.edu
  Target Milestone: ---

Here is a minimal program that was written to see what code the compiler would
generate to convert an AVX2 ymm register containing single-precision floating
point numbers into an xmm register containing bfloat16 floating point numbers,
under the assumption that no subnormal numbers were passed:

https://godbolt.org/z/xhvc557xv

GCC trunk gives the following output:

bfloat16 value 0: 0x0000
bfloat16 value 1: 0x0000
bfloat16 value 2: 0x0000
bfloat16 value 3: 0x0000
bfloat16 value 4: 0x0000
bfloat16 value 5: 0x0000
bfloat16 value 6: 0x0000
bfloat16 value 7: 0x0000

GCC 14.2 gives the following output:

bfloat16 value 0: 0x0000
bfloat16 value 1: 0x0000
bfloat16 value 2: 0x178b
bfloat16 value 3: 0x0000
bfloat16 value 4: 0xc02f
bfloat16 value 5: 0x0000
bfloat16 value 6: 0x0000
bfloat16 value 7: 0x0000

Both are wrong. Clang gives the following output, which is correct:

bfloat16 value 0: 0x3f80
bfloat16 value 1: 0x4000
bfloat16 value 2: 0x4040
bfloat16 value 3: 0x4080
bfloat16 value 4: 0x40a0
bfloat16 value 5: 0x40c0
bfloat16 value 6: 0x40f9
bfloat16 value 7: 0x4100

https://godbolt.org/z/769W8Pzxx
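
For reference, the expected bit patterns can be checked with a scalar
conversion. This is only a sketch and not the code from the links above: the
helper name float_to_bf16 is hypothetical, round-to-nearest-even is an assumed
rounding mode, and the inputs 1.0 through 6.0 and 8.0 are inferred from the
Clang output rather than taken from the test program. NaN and subnormal
handling is left out, in line with the assumption stated earlier.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference: convert one float to bfloat16 bits.
 * Assumes round-to-nearest-even and ignores NaN and subnormal inputs. */
static uint16_t float_to_bf16(float f)
{
    uint32_t bits;

    /* memcpy avoids the aliasing problems of a pointer cast. */
    memcpy(&bits, &f, sizeof bits);

    /* Add the rounding bias; ties go to the even bfloat16 value. */
    bits += 0x7fffu + ((bits >> 16) & 1u);

    /* bfloat16 is the upper 16 bits of the IEEE 754 binary32 encoding. */
    return (uint16_t)(bits >> 16);
}
```

Under those assumptions, float_to_bf16(1.0f) gives 0x3f80 and
float_to_bf16(8.0f) gives 0x4100, which matches values 0 and 7 of the Clang
output.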

Interestingly, GCC does not miscompile the program at -O1. I assume this is a
middle-end optimization issue, since the miscompilation also appears to occur
on arm64.
