https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115432
Bug ID: 115432
Summary: Building a program with -flto generates wrong code (missing the call to a function) unless -fno-strict-aliasing
Product: gcc
Version: 14.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
Assignee: unassigned at gcc dot gnu.org
Reporter: Eric.Diaz.Fernandez at eurid dot eu
Target Milestone: ---

Created attachment 58403
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58403&action=edit
All the .i files and the output of the build.

Summary: the code generated with -flto is wrong: the resulting binary doesn't contain the call to an initialisation function, which makes the program crash.

Building with -fno-strict-aliasing makes the bug disappear.

Tried on both:

gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 / Ubuntu 24.04

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-linux-gnu/13/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 13.2.0-23ubuntu4' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-13 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-libstdcxx-backtrace --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.2.0 (Ubuntu 13.2.0-23ubuntu4)

gcc (GCC) 14.1.1 20240522 / Arch

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/14.1.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++,rust --enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://gitea.artixlinux.org/packages/gcc/issues --with-build-config=bootstrap-lto --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-werror
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.1.1 20240522 (GCC)

The bug is a call to an initialisation function missing from the generated assembly. It only occurs when the full program is built with LTO, and only for one of the programs (yadifad).
The only difference I can think of between that program and the other ones (where initialisation is concerned) is that yadifad calls the dnscore_init_ex function in two locations and in two different manners:

_ It can call it using a constant bitmask indicating that only a minimal set of subsystems is to be initialised.
_ It can call it using a variable bitmask indicating that either a minimal set or all of the subsystems are to be initialised.

I've tried to make a reduced sample case, but it appears that any modification makes the issue disappear, so all I have is the full build. I've read that you only want the .i files, and I'm attaching the ones directly used here. I can't attach them all, as the compressed archive is bigger than 2MB. The archive also contains the output of the build.

The compilation line for one of the files:

gcc -DHAVE_CONFIG_H -I. -I/home/guest/src/yadifa-2.6/sbin/yadifad -I../.. -D_THREAD_SAFE -D_REENTRANT -D_FILE_OFFSET_BITS=64 -I/tmp/ltonofat/sbin/yadifad -I/tmp/ltonofat/sbin/yadifad/include -I/home/guest/src/yadifa-2.6/sbin/yadifad/include -fno-ident -ansi -pedantic -Wall -Wno-unknown-pragmas -Werror=missing-field-initializers -std=gnu11 -mtune=native -DUSES_GCC -DPREFIX='"/usr/local"' -DSYSCONFDIR='"/usr/local/etc"' -DLOCALSTATEDIR='"/usr/local/var"' -DDATAROOTDIR='"/usr/local/share"' -DDATADIR='"/usr/local/share"' -DLOCALEDIR='"/usr/local/share/locale"' -DLOGDIR='"/usr/local/var/log/yadifa"' -DTCLDIR='""' -I/home/guest/src/yadifa-2.6/lib/dnscore/include -I../../lib/dnscore/include -I/home/guest/src/yadifa-2.6/lib/dnsdb/include -I../../lib/dnsdb/include -I/home/guest/src/yadifa-2.6/lib/dnstcl/include -I../../lib/dnstcl/include -O2 -g -save-temps -flto -DNDEBUG -m64 -m64 -MT main.o -MD -MP -MF $depbase.Tpo -c -o main.o /home/guest/src/yadifa-2.6/sbin/yadifad/main.c

The program that has the wrongly generated code is linked with (most .o files removed to avoid too much bloat):

gcc -D_THREAD_SAFE -D_REENTRANT -D_FILE_OFFSET_BITS=64 -I/tmp/ltonofat/sbin/yadifad -I/tmp/ltonofat/sbin/yadifad/include -I/home/guest/src/yadifa-2.6/sbin/yadifad/include -fno-ident -ansi -pedantic -Wall -Wno-unknown-pragmas -Werror=missing-field-initializers -std=gnu11 -mtune=native -DUSES_GCC -DPREFIX=\"/usr/local\" -DSYSCONFDIR=\"/usr/local/etc\" -DLOCALSTATEDIR=\"/usr/local/var\" -DDATAROOTDIR=\"/usr/local/share\" -DDATADIR=\"/usr/local/share\" -DLOCALEDIR=\"/usr/local/share/locale\" -DLOGDIR=\"/usr/local/var/log/yadifa\" -DTCLDIR=\"\" -I/home/guest/src/yadifa-2.6/lib/dnscore/include -I../../lib/dnscore/include -I/home/guest/src/yadifa-2.6/lib/dnsdb/include -I../../lib/dnsdb/include -I/home/guest/src/yadifa-2.6/lib/dnstcl/include -I../../lib/dnstcl/include -O2 -g -save-temps -flto -DNDEBUG -m64 -m64 -Wl,-z -Wl,stack-size=8388608 -Wl,-Bdynamic -o yadifad main.o (other objects) -Wl,-Bdynamic -Wl,-Bdynamic /tmp/ltonofat/lib/dnsdb/.libs/libdnsdb.a /tmp/ltonofat/lib/dnscore/.libs/libdnscore.a -lssl -lcrypto -lz -lc

Based on the source (that you won't download):
https://www.yadifa.eu/sites/default/files/releases/yadifa-2.6.6-11255.tar.gz

The broken code can be reproduced with:

./autogen.sh
./configure CFLAGS="-O2 -g -save-temps -flto -Wall -Wextra -DNDEBUG" > configure.txt 2>&1 && make > make.txt 2>&1 && ./sbin/yadifad/yadifad

Then disassembling ./sbin/yadifad/yadifad and looking for a call to mt_output_stream_init will show the broken code, e.g.:

objdump -D ./sbin/yadifad/yadifad|grep -B 14 -A 10 "call.*mt_output_stream_init"

Adding -fno-strict-aliasing fixes the LTO build, but it seems to disable a lot of optimisations: functions are neatly separated when this flag is set.
==================================================================================================================

Here is a detailed explanation of the problem:

The function path that leads to the problem is main -> dnscore_init_ex -> stdstream_init

The code that breaks is:

static void stdstream_init(
# 368 "/src/yadifa-2.6/lib/dnscore/src/dnscore.c" 3 4
                          _Bool
# 368 "/src/yadifa-2.6/lib/dnscore/src/dnscore.c"
                               bufferise)
{
    output_stream tmp;
    output_stream tmp2;

    if(bufferise)
    {
        fd_output_stream_attach(&tmp, 1);
        buffer_output_stream_init(&tmp2, &tmp, 4096);
        mt_output_stream_init(&__termout__, &tmp2);
    }
    else
    {
        fd_output_stream_attach(&__termout__, 1);
    }

    if(bufferise)
    {
        fd_output_stream_attach(&tmp, 2);
        buffer_output_stream_init(&tmp2, &tmp, 4096);
        mt_output_stream_init(&__termerr__, &tmp2);
    }
    else
    {
        fd_output_stream_attach(&__termerr__, 2);
    }
}

Where fd_output_stream_attach is:

ya_result fd_output_stream_attach(output_stream* stream_, int fd)
{
    ;
    file_output_stream* stream = (file_output_stream*)stream_;
    stream->data.fd = fd;
    stream->vtbl = &file_output_stream_vtbl;
    return 0;
}

The stdstream_init assembly should thus start by initialising a part of the stack with the values 1 and &file_output_stream_vtbl but it does not. Instead it simply seems to skip it.
In yadifad, the function is wrongly translated to this:

aaa62: 0f b6 05 0d 02 0a 00    movzbl 0xa020d(%rip),%eax        # 14ac76 <dnscore_tty_set>
aaa69: 4c 8d 25 e0 ec 09 00    lea    0x9ece0(%rip),%r12        # 149750 <__termout__>
aaa70: 84 c0                   test   %al,%al
aaa72: 0f 85 14 17 00 00       jne    ac18c <dnscore_init_ex+0x17bc>
aaa78: 4c 8d 2d c1 ec 09 00    lea    0x9ecc1(%rip),%r13        # 149740 <__termerr__>
aaa7f: 48 8d 5d b0             lea    -0x50(%rbp),%rbx
aaa83: 4c 8d 7d a0             lea    -0x60(%rbp),%r15
aaa87: ba 00 10 00 00          mov    $0x1000,%edx
aaa8c: 81 4d 8c 00 00 00 02    orl    $0x2000000,-0x74(%rbp)
aaa93: 4c 89 fe                mov    %r15,%rsi
aaa96: 48 89 df                mov    %rbx,%rdi
aaa99: e8 d2 e0 ff ff          call   a8b70 <buffer_output_stream_init>
aaa9e: 48 89 de                mov    %rbx,%rsi
aaaa1: 4c 89 e7                mov    %r12,%rdi
aaaa4: e8 27 54 04 00          call   efed0 <mt_output_stream_init.isra.0>
aaaa9: 4c 89 fe                mov    %r15,%rsi
aaaac: 48 89 df                mov    %rbx,%rdi
aaaaf: ba 00 10 00 00          mov    $0x1000,%edx
aaab4: e8 b7 e0 ff ff          call   a8b70 <buffer_output_stream_init>
aaab9: 48 89 de                mov    %rbx,%rsi
aaabc: 4c 89 ef                mov    %r13,%rdi
aaabf: e8 0c 54 04 00          call   efed0 <mt_output_stream_init.isra.0>

The same function used in another program of the same build looks like:

49c9: 48 8d 6c 24 70          lea    0x70(%rsp),%rbp
49ce: 4c 8d 64 24 60          lea    0x60(%rsp),%r12
49d3: ba 00 10 00 00          mov    $0x1000,%edx
49d8: 4c 8d 2d e1 a9 02 00    lea    0x2a9e1(%rip),%r13        # 2f3c0 <file_output_stream_vtbl.lto_priv.0>
49df: 4c 89 e6                mov    %r12,%rsi
49e2: 48 89 ef                mov    %rbp,%rdi
49e5: c7 44 24 60 01 00 00    movl   $0x1,0x60(%rsp)
49ec: 00
49ed: 4c 89 6c 24 68          mov    %r13,0x68(%rsp)
49f2: e8 d9 7f 01 00          call   1c9d0 <buffer_output_stream_init.isra.0>
49f7: 48 89 ee                mov    %rbp,%rsi
49fa: 48 8b 3c 24             mov    (%rsp),%rdi
49fe: e8 ad 5c 01 00          call   1a6b0 <mt_output_stream_init.isra.0>
4a03: ba 00 10 00 00          mov    $0x1000,%edx
4a08: 4c 89 e6                mov    %r12,%rsi
4a0b: 48 89 ef                mov    %rbp,%rdi
4a0e: c7 44 24 60 02 00 00    movl   $0x2,0x60(%rsp)
4a15: 00
4a16: 4c 89 6c 24 68          mov    %r13,0x68(%rsp)
4a1b: e8 b0 7f 01 00          call   1c9d0 <buffer_output_stream_init.isra.0>
4a20: 48 89 ee                mov    %rbp,%rsi
4a23: 48 89 df                mov    %rbx,%rdi
4a26: e8 85 5c 01 00          call   1a6b0 <mt_output_stream_init.isra.0>

==================================================================================================================
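
For reference, the two stores that the correct listing performs before each buffer_output_stream_init call (movl $0x1 or $0x2 into the first field, then the vtbl address into the second) can be sketched in standalone C. This is only an illustration under assumptions: the struct layouts below are hypothetical stand-ins (the real output_stream and file_output_stream definitions are not shown in this report), every name ending in _sketch is made up, and the store goes through a union member of a single struct type instead of the cast used by fd_output_stream_attach. It is not claimed to reproduce the miscompile, nor to establish whether the cast is related to the -fno-strict-aliasing sensitivity.

```c
#include <assert.h>  /* only needed by callers that assert on the self-check */

/* Hypothetical stand-in for the vtbl type; the real one is not in the report. */
typedef struct output_stream_vtbl_sketch {
    const char *name;
} output_stream_vtbl_sketch;

/* Hypothetical layout: a data slot followed by a vtbl pointer. The ordering is
   an assumption based on the stores at 0x60(%rsp) and 0x68(%rsp) in the
   correct listing. */
typedef struct output_stream_sketch {
    union { void *ptr; int fd; } data;
    const output_stream_vtbl_sketch *vtbl;
} output_stream_sketch;

static const output_stream_vtbl_sketch file_output_stream_vtbl_sketch = { "file" };

/* Performs the same two stores as fd_output_stream_attach, but through a union
   member of one struct type rather than a cast between two struct types. */
int fd_output_stream_attach_sketch(output_stream_sketch *stream, int fd)
{
    stream->data.fd = fd;
    stream->vtbl = &file_output_stream_vtbl_sketch;
    return 0;
}

/* Returns 0 when both fields hold the values a consumer would expect to read
   after the attach call, and -1 otherwise. */
int attach_self_check_sketch(int fd)
{
    output_stream_sketch tmp;

    fd_output_stream_attach_sketch(&tmp, fd);
    if (tmp.data.fd != fd)
        return -1;
    if (tmp.vtbl != &file_output_stream_vtbl_sketch)
        return -1;
    return 0;
}
```

Calling attach_self_check_sketch(1) and attach_self_check_sketch(2) mirrors the two attach calls made by stdstream_init for stdout and stderr; in the broken yadifad listing, the corresponding stores are the ones that are absent.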