https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104344
Bug ID: 104344
Summary: Suboptimal -Os code for manually unrolled loop
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: charles.nicholson at gmail dot com
Target Milestone: ---

Manually copying the byte representation of a float to a uint32_t emits optimal
code with -Os when the copy is performed in a loop. When the copy is
hand-unrolled, -Os generates much larger code than -O3.

It's unclear to me why -Os doesn't generate the same code; to my eye they are
both equivalent and well-formed, and -O3 seems to agree. Apologies in advance if
I'm simply misunderstanding the C11 Standard and what transformations are legal
here.

This happens at least on x64 on gcc 12.0.1, and on ARMv7-M on armgcc 10.2.
Various architecture-specific flags like "-mcpu" and "-march" do not appear to
make a difference.

Code:
=====
#include <stdint.h>

_Static_assert(sizeof(uint32_t) == sizeof(float), "");
_Static_assert(sizeof(uint32_t) == 4, "");

uint32_t cast_through_char_unrolled(float f) {
    uint32_t u;
    char const *src = (char const *)&f;
    char *dst = (char *)&u;
    *dst++ = *src++;
    *dst++ = *src++;
    *dst++ = *src++;
    *dst++ = *src++;
    return u;
}

uint32_t cast_through_char_loop(float f) {
    uint32_t u;
    char const *src = (char const *)&f;
    char *dst = (char *)&u;
    for (int i = 0; i < 4; ++i) {
        *dst++ = *src++;
    }
    return u;
}

-Os output (flags: "--std=c11 -Wall -Wextra -Os")
=======================================
cast_through_char_unrolled:
        movd    eax, xmm0
        xor     edx, edx
        mov     dl, al
        mov     dh, ah
        xor     ax, ax
        movzx   edx, dx
        or      eax, edx
        ret
cast_through_char_loop:
        movd    eax, xmm0
        ret

-O3 output (flags: "--std=c11 -Wall -Wextra -O3")
=======================================
cast_through_char_unrolled:
        movd    eax, xmm0
        ret
cast_through_char_loop:
        movd    eax, xmm0
        ret

Godbolt example:
================
https://gcc.godbolt.org/z/bn9xq1b56

Gcc details follow, captured by adding "-v" to my command-line:
===============================================================
Using built-in specs.
COLLECT_GCC=/opt/compiler-explorer/gcc-snapshot/bin/gcc
Target: x86_64-linux-gnu
Configured with: ../gcc-trunk-20220202/configure --prefix=/opt/compiler-explorer/gcc-build/staging --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --disable-bootstrap --enable-multiarch --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --enable-clocale=gnu --enable-languages=c,c++,fortran,ada,d --enable-ld=yes --enable-gold=yes --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-linker-build-id --enable-lto --enable-plugins --enable-threads=posix --with-pkgversion=Compiler-Explorer-Build-gcc-756eabacfcd767e39eea63257a026f61a4c4e661-binutils-2.36.1
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.1 20220202 (experimental) (Compiler-Explorer-Build-gcc-756eabacfcd767e39eea63257a026f61a4c4e661-binutils-2.36.1)
COLLECT_GCC_OPTIONS='-fdiagnostics-color=always' '-g' '-o' '/app/output.s' '-masm=intel' '-S' '-std=c11' '-O3' '-Wall' '-Wextra' '-v' '-mtune=generic' '-march=x86-64' '-dumpdir' '/app/'
 /opt/compiler-explorer/gcc-trunk-20220202/bin/../libexec/gcc/x86_64-linux-gnu/12.0.1/cc1 -quiet -v -imultiarch x86_64-linux-gnu -iprefix /opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/ <source> -quiet -dumpdir /app/ -dumpbase output.c -dumpbase-ext .c -masm=intel -mtune=generic -march=x86-64 -g -O3 -Wall -Wextra -std=c11 -version -fdiagnostics-color=always -o /app/output.s
GNU C11 (Compiler-Explorer-Build-gcc-756eabacfcd767e39eea63257a026f61a4c4e661-binutils-2.36.1) version 12.0.1 20220202 (experimental) (x86_64-linux-gnu)
        compiled by GNU C version 7.5.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
ignoring nonexistent directory "/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/../../../../x86_64-linux-gnu/include"
ignoring duplicate directory "/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/../../lib/gcc/x86_64-linux-gnu/12.0.1/include"
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring duplicate directory "/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/../../lib/gcc/x86_64-linux-gnu/12.0.1/include-fixed"
ignoring nonexistent directory "/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/../../lib/gcc/x86_64-linux-gnu/12.0.1/../../../../x86_64-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/include
 /opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/include-fixed
 /usr/local/include
 /opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/../../include
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.
GNU C11 (Compiler-Explorer-Build-gcc-756eabacfcd767e39eea63257a026f61a4c4e661-binutils-2.36.1) version 12.0.1 20220202 (experimental) (x86_64-linux-gnu)
        compiled by GNU C version 7.5.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 85eab4743b9643508f1adb2d853127bf
COMPILER_PATH=/opt/compiler-explorer/gcc-trunk-20220202/bin/../libexec/gcc/x86_64-linux-gnu/12.0.1/:/opt/compiler-explorer/gcc-trunk-20220202/bin/../libexec/gcc/x86_64-linux-gnu/:/opt/compiler-explorer/gcc-trunk-20220202/bin/../libexec/gcc/:/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/../../../../x86_64-linux-gnu/bin/
LIBRARY_PATH=/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/:/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/:/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/:/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/../../../../lib64/:/lib/x86_64-linux-gnu/:/lib/../lib64/:/usr/lib/x86_64-linux-gnu/:/usr/lib/../lib64/:/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/../../../../x86_64-linux-gnu/lib/:/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-fdiagnostics-color=always' '-g' '-o' '/app/output.s' '-masm=intel' '-S' '-std=c11' '-O3' '-Wall' '-Wextra' '-v' '-mtune=generic' '-march=x86-64' '-dumpdir' '/app/output.'
Compiler returned: 0
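
Possible third variant for comparison (sketch only):
====================================================
The two functions in the report copy the float's bytes through char pointers. A third spelling of the same copy, using memcpy, may be a useful data point when comparing -Os against -O3. The names cast_through_memcpy and check_cast_through_memcpy are hypothetical and not part of the original test case, and the -Os output for this variant was not captured here. The expected values in the self-check assume float is IEEE 754 binary32, which the C11 Standard does not require.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(uint32_t) == sizeof(float), "");

/* Hypothetical variant (not in the original report): copy the object
 * representation of f into u with memcpy instead of char pointers. */
uint32_t cast_through_memcpy(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical self-check. The constants below assume IEEE 754 binary32
 * floats; on a target with a different float format they would differ. */
void check_cast_through_memcpy(void) {
    assert(cast_through_memcpy(0.0f) == UINT32_C(0x00000000));
    assert(cast_through_memcpy(1.0f) == UINT32_C(0x3F800000));
    assert(cast_through_memcpy(-2.0f) == UINT32_C(0xC0000000));
}
```

Compiling this alongside cast_through_char_unrolled and cast_through_char_loop with the same flags would show whether the size difference is specific to the hand-unrolled char copies.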