https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104344
Bug ID: 104344
Summary: Suboptimal -Os code for manually unrolled loop
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: charles.nicholson at gmail dot com
Target Milestone: ---
Manually copying the byte representation of a float to a uint32_t emits optimal
code with -Os when the copy is performed in a loop. When the copy is
hand-unrolled, -Os generates much larger code than -O3.
It's unclear to me why -Os doesn't generate the same code for both; to my eye
the two functions are equivalent and well-formed, and -O3 seems to agree.
Apologies in advance if I'm simply misunderstanding the C11 Standard and what
transformations are legal here.
This happens at least on x64 on gcc 12.0.1, and on ARMv7-M on armgcc 10.2.
Various architecture-specific flags like "-mcpu" and "-march" do not appear to
make a difference.
Code:
===
#include <stdint.h>
_Static_assert(sizeof(uint32_t) == sizeof(float), "");
_Static_assert(sizeof(uint32_t) == 4, "");
uint32_t cast_through_char_unrolled(float f) {
    uint32_t u;
    char const *src = (char const *)&f;
    char *dst = (char *)&u;
    *dst++ = *src++;
    *dst++ = *src++;
    *dst++ = *src++;
    *dst++ = *src++;
    return u;
}
uint32_t cast_through_char_loop(float f) {
    uint32_t u;
    char const *src = (char const *)&f;
    char *dst = (char *)&u;
    for (int i = 0; i < 4; ++i) {
        *dst++ = *src++;
    }
    return u;
}
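For comparison, here is a minimal sketch of the same byte copy written with
memcpy, which is another well-defined way to read a float's object
representation in C11. The names cast_through_memcpy and check_same_bits are
hypothetical and were not part of the test case above; the unrolled function is
repeated only so the block compiles on its own. I make no claim here about what
code either optimization level emits for the memcpy form.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(uint32_t) == sizeof(float), "");

/* Hypothetical reference version (not in the original test case):
   copy the 4 bytes of the float into a uint32_t with memcpy. */
uint32_t cast_through_memcpy(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Repeated from the test case so this block is self-contained. */
uint32_t cast_through_char_unrolled(float f) {
    uint32_t u;
    char const *src = (char const *)&f;
    char *dst = (char *)&u;
    *dst++ = *src++;
    *dst++ = *src++;
    *dst++ = *src++;
    *dst++ = *src++;
    return u;
}

/* Hypothetical helper: asserts that both versions yield the same bits. */
void check_same_bits(float f) {
    assert(cast_through_char_unrolled(f) == cast_through_memcpy(f));
}
```

Calling check_same_bits on a handful of values (for example 1.0f and -0.0f)
is a quick way to confirm the two forms are behaviorally identical, which is
the premise of this report.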
-Os output (flags: "--std=c11 -Wall -Wextra -Os")
===
cast_through_char_unrolled:
        movd eax, xmm0
        xor edx, edx
        mov dl, al
        mov dh, ah
        xor ax, ax
        movzx edx, dx
        or eax, edx
        ret
cast_through_char_loop:
        movd eax, xmm0
        ret
-O3 output (flags: "--std=c11 -Wall -Wextra -O3")
===
cast_through_char_unrolled:
        movd eax, xmm0
        ret
cast_through_char_loop:
        movd eax, xmm0
        ret
Godbolt example:
https://gcc.godbolt.org/z/bn9xq1b56
Gcc details follow, captured by adding "-v" to my command-line:
===
Using built-in specs.
COLLECT_GCC=/opt/compiler-explorer/gcc-snapshot/bin/gcc
Target: x86_64-linux-gnu
Configured with: ../gcc-trunk-20220202/configure
--prefix=/opt/compiler-explorer/gcc-build/staging --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu --disable-bootstrap
--enable-multiarch --with-abi=m64 --with-multilib-list=m32,m64,mx32
--enable-multilib --enable-clocale=gnu --enable-languages=c,c++,fortran,ada,d
--enable-ld=yes --enable-gold=yes --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-linker-build-id --enable-lto
--enable-plugins --enable-threads=posix
--with-pkgversion=Compiler-Explorer-Build-gcc-756eabacfcd767e39eea63257a026f61a4c4e661-binutils-2.36.1
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.1 20220202 (experimental)
(Compiler-Explorer-Build-gcc-756eabacfcd767e39eea63257a026f61a4c4e661-binutils-2.36.1)
COLLECT_GCC_OPTIONS='-fdiagnostics-color=always' '-g' '-o' '/app/output.s'
'-masm=intel' '-S' '-std=c11' '-O3' '-Wall' '-Wextra' '-v' '-mtune=generic'
'-march=x86-64' '-dumpdir' '/app/'
/opt/compiler-explorer/gcc-trunk-20220202/bin/../libexec/gcc/x86_64-linux-gnu/12.0.1/cc1
-quiet -v -imultiarch x86_64-linux-gnu -iprefix
/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/
-quiet -dumpdir /app/ -dumpbase output.c -dumpbase-ext .c -masm=intel
-mtune=generic -march=x86-64 -g -O3 -Wall -Wextra -std=c11 -version
-fdiagnostics-color=always -o /app/output.s
GNU C11
(Compiler-Explorer-Build-gcc-756eabacfcd767e39eea63257a026f61a4c4e661-binutils-2.36.1)
version 12.0.1 20220202 (experimental) (x86_64-linux-gnu)
compiled by GNU C version 7.5.0, GMP version 6.2.1, MPFR version 4.1.0,
MPC version 1.2.1, isl version isl-0.24-GMP
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
ignoring nonexistent directory
"/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/../../../../x86_64-linux-gnu/include"
ignoring duplicate directory
"/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/../../lib/gcc/x86_64-linux-gnu/12.0.1/include"
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring duplicate directory
"/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/../../lib/gcc/x86_64-linux-gnu/12.0.1/include-fixed"
ignoring nonexistent directory
"/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/../../lib/gcc/x86_64-linux-gnu/12.0.1/../../../../x86_64-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/include
/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-li