Hi Guillem,

Thanks for your helpful pointers.

On Sat, Apr 06, 2019 at 10:55:35PM +0200, Guillem Jover wrote:
> If what you are interested in though is just a small subset of the
> archive, another option that would benefit everyone and is perhaps
> less cumbersome than having to juggle around with multiple archives
> and package rebuilds/variants, is to make use of libc's hwcaps [H]
> support, which means the dynamic linker will automatically load the
> best optimized shared object for the current hardware. This of course
> can complicate a bit the packaging, and bloat it, but if the performance
> improvement is substantial, it might be a very good trade-off.
>
> [H] man ld.so "NOTES" / "Hardware capabilities"

This sounds like a nice feature. Unfortunately, the "avx2" and "avx512"
features I wanted do not show up in the list of hardware capabilities.
IIRC, my original post presented a C++ example with Eigen (a header-only
library). Reverse dependencies such as TensorFlow would benefit from this
hwcaps feature if ld.so supported amd64's avx2 and avx512.

> Another option which requires upstream code changes (and ideally them
> being complicit) is to add run-time selection for the more suitable
> optimized functions, for example via the __target__ and __ifunc__ [I]
> function __attribute__ (and __builtin_cpu_supports or __builtin_cpu_is),
> or the __target_clone__ function __attribute__. Perhaps also of
> interest is the __simd__ function __attribute__.
>
> [I] info gcc "Function Attributes";
>     <https://sourceware.org/glibc/wiki/GNU_IFUNC>

This compiler feature (which has been considered in the past) is quite a
good solution for small projects. However, it is not easy to enforce for
projects like TensorFlow ...
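
For reference, here is a minimal sketch of the run-time selection
described in the quoted paragraph. It assumes GCC or Clang on x86/amd64.
The function names (dot, dot_generic, dot_avx2) are hypothetical and only
illustrate the pattern; they do not come from Eigen or TensorFlow.

```c
#include <stddef.h>

/* Baseline version, built with the default flags for the architecture. */
static float dot_generic(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Same loop, but the compiler may emit AVX2 instructions for this
 * function only; the rest of the file keeps the baseline ISA. */
__attribute__((target("avx2")))
static float dot_avx2(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hypothetical public entry point: choose an implementation at run time
 * according to what the CPU reports. */
float dot(const float *a, const float *b, size_t n)
{
    __builtin_cpu_init();   /* harmless if already initialised */
    if (__builtin_cpu_supports("avx2"))
        return dot_avx2(a, b, n);
    return dot_generic(a, b, n);
}
```

Every hot function needs this treatment by hand, which is why the
approach scales poorly to large code bases. GCC's target_clones
attribute can generate the variants and the dispatcher automatically,
but it still has to be applied function by function.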