https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104344

            Bug ID: 104344
           Summary: Suboptimal -Os code for manually unrolled loop
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: charles.nicholson at gmail dot com
  Target Milestone: ---

Manually copying the byte representation of a float to a uint32_t emits optimal
code with -Os when the copy is performed in a loop. When the same copy is
unrolled by hand, -Os generates much larger code than -O3.

It's unclear to me why -Os doesn't generate the same code for both functions;
to my eye they are equivalent and well-formed, and -O3 seems to agree.
Apologies in advance if I'm simply misunderstanding the C11 Standard and which
transformations are legal here.

This happens at least on x86-64 with gcc 12.0.1, and on ARMv7-M with armgcc
10.2. Architecture-specific flags such as "-mcpu" and "-march" do not appear to
make a difference.

Code:
=====
#include <stdint.h>

_Static_assert(sizeof(uint32_t) == sizeof(float), "");
_Static_assert(sizeof(uint32_t) == 4, "");

uint32_t cast_through_char_unrolled(float f) {
  uint32_t u;
  char const *src = (char const *)&f;
  char *dst = (char *)&u;
  *dst++ = *src++;
  *dst++ = *src++;
  *dst++ = *src++;
  *dst++ = *src++;
  return u;
}

uint32_t cast_through_char_loop(float f) {
  uint32_t u;
  char const *src = (char const *)&f;
  char *dst = (char *)&u;
  for (int i = 0; i < 4; ++i) {
    *dst++ = *src++;
  }
  return u;
}
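
For comparison, a memcpy-based variant may be a useful third data point. It is
not part of the original test case: the function name cast_through_memcpy is
hypothetical, and no claim is made here about the code GCC emits for it at
either optimization level. It is only a sketch of the conventional way to copy
an object representation in C11, which could be compiled with the same flags as
the two functions above.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(uint32_t) == sizeof(float), "");

/* Hypothetical variant, not in the original report: copy the object
   representation of the float in one memcpy call instead of byte by byte. */
uint32_t cast_through_memcpy(float f) {
  uint32_t u;
  memcpy(&u, &f, sizeof u);  /* copies sizeof(uint32_t) == 4 bytes */
  return u;
}
```

All three functions should return the same value for any input, since each one
copies the same four bytes from f into u.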

-Os output (flags: "--std=c11 -Wall -Wextra -Os")
=======================================
cast_through_char_unrolled:
  movd eax, xmm0
  xor edx, edx
  mov dl, al
  mov dh, ah
  xor ax, ax
  movzx edx, dx
  or eax, edx
  ret

cast_through_char_loop:
  movd eax, xmm0
  ret


-O3 output (flags: "--std=c11 -Wall -Wextra -O3")
=======================================
cast_through_char_unrolled:
  movd eax, xmm0
  ret

cast_through_char_loop:
  movd eax, xmm0
  ret

Godbolt example:
================
https://gcc.godbolt.org/z/bn9xq1b56

GCC details follow, captured by adding "-v" to my command line:
===============================================================
Using built-in specs.
COLLECT_GCC=/opt/compiler-explorer/gcc-snapshot/bin/gcc
Target: x86_64-linux-gnu
Configured with: ../gcc-trunk-20220202/configure
--prefix=/opt/compiler-explorer/gcc-build/staging --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu --disable-bootstrap
--enable-multiarch --with-abi=m64 --with-multilib-list=m32,m64,mx32
--enable-multilib --enable-clocale=gnu --enable-languages=c,c++,fortran,ada,d
--enable-ld=yes --enable-gold=yes --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-linker-build-id --enable-lto
--enable-plugins --enable-threads=posix
--with-pkgversion=Compiler-Explorer-Build-gcc-756eabacfcd767e39eea63257a026f61a4c4e661-binutils-2.36.1
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.1 20220202 (experimental)
(Compiler-Explorer-Build-gcc-756eabacfcd767e39eea63257a026f61a4c4e661-binutils-2.36.1)
 
COLLECT_GCC_OPTIONS='-fdiagnostics-color=always' '-g' '-o' '/app/output.s'
'-masm=intel' '-S' '-std=c11' '-O3' '-Wall' '-Wextra' '-v' '-mtune=generic'
'-march=x86-64' '-dumpdir' '/app/'

/opt/compiler-explorer/gcc-trunk-20220202/bin/../libexec/gcc/x86_64-linux-gnu/12.0.1/cc1
-quiet -v -imultiarch x86_64-linux-gnu -iprefix
/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/
<source> -quiet -dumpdir /app/ -dumpbase output.c -dumpbase-ext .c -masm=intel
-mtune=generic -march=x86-64 -g -O3 -Wall -Wextra -std=c11 -version
-fdiagnostics-color=always -o /app/output.s
GNU C11
(Compiler-Explorer-Build-gcc-756eabacfcd767e39eea63257a026f61a4c4e661-binutils-2.36.1)
version 12.0.1 20220202 (experimental) (x86_64-linux-gnu)
        compiled by GNU C version 7.5.0, GMP version 6.2.1, MPFR version 4.1.0,
MPC version 1.2.1, isl version isl-0.24-GMP

GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
ignoring nonexistent directory
"/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/../../../../x86_64-linux-gnu/include"
ignoring duplicate directory
"/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/../../lib/gcc/x86_64-linux-gnu/12.0.1/include"
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring duplicate directory
"/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/../../lib/gcc/x86_64-linux-gnu/12.0.1/include-fixed"
ignoring nonexistent directory
"/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/../../lib/gcc/x86_64-linux-gnu/12.0.1/../../../../x86_64-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/include
 /opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/include-fixed
 /usr/local/include
 /opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/../../include
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.
GNU C11
(Compiler-Explorer-Build-gcc-756eabacfcd767e39eea63257a026f61a4c4e661-binutils-2.36.1)
version 12.0.1 20220202 (experimental) (x86_64-linux-gnu)
        compiled by GNU C version 7.5.0, GMP version 6.2.1, MPFR version 4.1.0,
MPC version 1.2.1, isl version isl-0.24-GMP

GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 85eab4743b9643508f1adb2d853127bf
COMPILER_PATH=/opt/compiler-explorer/gcc-trunk-20220202/bin/../libexec/gcc/x86_64-linux-gnu/12.0.1/:/opt/compiler-explorer/gcc-trunk-20220202/bin/../libexec/gcc/x86_64-linux-gnu/:/opt/compiler-explorer/gcc-trunk-20220202/bin/../libexec/gcc/:/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/../../../../x86_64-linux-gnu/bin/
LIBRARY_PATH=/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/:/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/:/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/:/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/../../../../lib64/:/lib/x86_64-linux-gnu/:/lib/../lib64/:/usr/lib/x86_64-linux-gnu/:/usr/lib/../lib64/:/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/../../../../x86_64-linux-gnu/lib/:/opt/compiler-explorer/gcc-trunk-20220202/bin/../lib/gcc/x86_64-linux-gnu/12.0.1/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-fdiagnostics-color=always' '-g' '-o' '/app/output.s'
'-masm=intel' '-S' '-std=c11' '-O3' '-Wall' '-Wextra' '-v' '-mtune=generic'
'-march=x86-64' '-dumpdir' '/app/output.'
Compiler returned: 0
