https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71990
Bug ID: 71990
Summary: Function multiversioning prohibits inlining
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: sgunderson at bigfoot dot com
Target Milestone: ---

Hi,

I'm trying to write a library that uses F16C instructions in certain places, and since they're not really universally accessible (and ld.so hardware capabilities seem to have been long abandoned), I've tried to use function multiversioning for it. However, trying to combine it with inlining seems to draw a blank; a very simplified example:

klump:~> /usr/lib/gcc-snapshot/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/usr/lib/gcc-snapshot/bin/g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc-snapshot/libexec/gcc/x86_64-linux-gnu/7.0.0/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 20160707-1' --with-bugurl=file:///usr/share/doc/gcc-snapshot/README.Bugs --enable-languages=c,ada,c++,java,go,fortran,objc,obj-c++ --prefix=/usr/lib/gcc-snapshot --enable-shared --enable-linker-build-id --disable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-7-snap-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-7-snap-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-7-snap-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --disable-werror --enable-checking=yes --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.0.0 20160707 (experimental) [trunk revision 238117] (Debian 20160707-1)

klump:~> cat test.cc
#include <stdio.h>

__attribute__ ((target("default")))
inline int foo() { return 0; }

__attribute__ ((target("avx")))
inline int foo() { return 1; }

int bar()
{
  int sum = 0;
  for (int i = 0; i < 100; ++i) {
    sum += foo();
  }
  return sum;
}

int main(void)
{
  printf("%d\n", bar());
}

klump:~> /usr/lib/gcc-snapshot/bin/g++ -O2 -o test test.cc
klump:~> nm --demangle test | egrep 'foo|bar'
0000000000400c40 i _Z3foov.ifunc()
0000000000400bf0 T bar()
0000000000400c20 W foo()
0000000000400c30 W foo() [clone .avx]
0000000000400c40 W foo() [clone .resolver]

Of course, in reality, my foo() would do something more complicated, like call _cvtss_sh() or similar; this is a toy example. But it illustrates that the function multiversioning blocks inlining.

If I compile with -mavx, the entire multiversioning goes away (only the AVX version is emitted), so I hoped that I could use target cloning on bar():

__attribute__ ((target_clones("avx", "default")))
int bar()
{
  // same code...

but unfortunately, no. There's a bar() clone for AVX emitted, but it still calls the resolving function for foo(); no inlining. So I really can't find any usable way of using this feature if your architecture switch is in inlined functions (in my case, convert to/from fp16).
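
For comparison, a minimal sketch of one possible manual workaround, under stated assumptions: instead of multiversioning the small helper, the variant is selected once at the level of bar(), and the helper is an ordinary inline template that the compiler should be free to inline into either wrapper. The names foo_impl, bar_body, bar_avx and bar_default are hypothetical and do not come from the test case above; the dispatch uses GCC's x86 builtin __builtin_cpu_supports() rather than the ifunc resolver that multiversioning generates.

```cpp
// Hypothetical helper: an ordinary inline template with no target attribute,
// so no ifunc resolver is generated for it.
template <bool UseAvx>
static inline int foo_impl() { return UseAvx ? 1 : 0; }

// Shared loop body, instantiated once per variant.
template <bool UseAvx>
static inline int bar_body()
{
  int sum = 0;
  for (int i = 0; i < 100; ++i) {
    sum += foo_impl<UseAvx>();
  }
  return sum;
}

// Baseline variant, compiled with the default target options.
int bar_default() { return bar_body<false>(); }

#if defined(__x86_64__) || defined(__i386__)
// AVX variant: only this outer wrapper carries the target attribute.
__attribute__ ((target("avx")))
int bar_avx() { return bar_body<true>(); }

// Manual dispatch at the outer level, done once per call to bar().
int bar()
{
  return __builtin_cpu_supports("avx") ? bar_avx() : bar_default();
}
#else
// Assumption: on non-x86 targets only the baseline variant exists.
int bar() { return bar_default(); }
#endif
```

With this layout bar() returns 100 on a CPU that reports AVX support and 0 otherwise. The sketch relies on the helpers being plain code; helpers that call target-specific intrinsics such as _cvtss_sh() directly may still need their own target attribute, which this sketch does not address.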