https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118141
Bug ID: 118141
Summary: GCC miscompiles __builtin_convertvector() narrowing operation on amd64 above -O1
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: richard.yao at alumni dot stonybrook.edu
Target Milestone: ---

Here is a minimal program that was written to see what code the compiler would generate to convert an AVX2 ymm register containing single-precision floating-point numbers into an xmm register containing bfloat16 floating-point numbers, under the assumption that no subnormal numbers were passed:

https://godbolt.org/z/xhvc557xv

GCC trunk gives the following output:

bfloat16 value 0: 0x0000
bfloat16 value 1: 0x0000
bfloat16 value 2: 0x0000
bfloat16 value 3: 0x0000
bfloat16 value 4: 0x0000
bfloat16 value 5: 0x0000
bfloat16 value 6: 0x0000
bfloat16 value 7: 0x0000

GCC 14.2 gives the following output:

bfloat16 value 0: 0x0000
bfloat16 value 1: 0x0000
bfloat16 value 2: 0x178b
bfloat16 value 3: 0x0000
bfloat16 value 4: 0xc02f
bfloat16 value 5: 0x0000
bfloat16 value 6: 0x0000
bfloat16 value 7: 0x0000

Both are wrong. Clang gives the following output, which is correct:

bfloat16 value 0: 0x3f80
bfloat16 value 1: 0x4000
bfloat16 value 2: 0x4040
bfloat16 value 3: 0x4080
bfloat16 value 4: 0x40a0
bfloat16 value 5: 0x40c0
bfloat16 value 6: 0x40f9
bfloat16 value 7: 0x4100

https://godbolt.org/z/769W8Pzxx

Interestingly, if -O1 is used, GCC does not miscompile it. I assume this is a middle-end optimization issue, since the miscompilation also appears to occur on arm64.
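
For cross-checking the values Clang prints, a scalar reference can be sketched as follows. This is not the test program from the Godbolt links, which is not reproduced here. It assumes that the conversion simply keeps the upper 16 bits of each IEEE-754 binary32 value (truncation, no rounding), which is consistent with the Clang output. The helper names are hypothetical, and 7.8f in the usage note is only an illustrative input that happens to yield 0x40f9.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: truncate one float to bfloat16 bits by keeping
 * the upper 16 bits of its binary32 representation. Assumes float is
 * IEEE-754 binary32; no rounding, no special handling of subnormals. */
static inline uint16_t f32_to_bf16_trunc(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* well-defined type pun */
    return (uint16_t)(bits >> 16);
}

/* Hypothetical helper: scalar stand-in for the 8-lane narrowing. */
static inline void f32_to_bf16_trunc_array(const float *in, uint16_t *out,
                                           size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = f32_to_bf16_trunc(in[i]);
}

/* Print lanes in the same format as the output quoted in the report. */
static inline void print_bf16(const uint16_t *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("bfloat16 value %zu: 0x%04x\n", i, (unsigned)v[i]);
}
```

Passing the array {1, 2, 3, 4, 5, 6, 7.8f, 8} through f32_to_bf16_trunc_array() and print_bf16() produces the same eight lines as the Clang output, so a correct vectorized narrowing of those inputs would be expected to match lane for lane at any optimization level.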