Bottom line: the vectorisation provided by -O3 can give big speed-ups to
some scientific programs, but it is ineffective on Debian because, by
necessity, Debian tells gcc to compile code for a lowest-common-denominator
CPU, which doesn't have the necessary instructions.
That is true on i386, but amd64 always has at least SSE2.
You can turn on -O3 (or -ftree-vectorize if you just want the
vectorization) in a single package with DEB_CFLAGS_MAINT_APPEND and
DEB_CXXFLAGS_MAINT_APPEND:
https://wiki.debian.org/HardeningWalkthrough#My_package_builds_with_optimisation_flags_other_than_-O2.2C_e.g._-Os
However, given previous messages, please first check that your
package actually benefits from it.
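
For illustration, a minimal debian/rules sketch of the above. It assumes
the package uses the dh sequencer at a debhelper compat level that passes
dpkg-buildflags output to the build (9 or later, as far as I know), and
that the upstream build system honours CFLAGS/CXXFLAGS. The flags shown
are only an example; swap in -ftree-vectorize if you want just the
vectorisation.

```make
#!/usr/bin/make -f
# Appended after the default flags from dpkg-buildflags, so -O3 comes
# later on the command line than the default -O2 and takes effect.
export DEB_CFLAGS_MAINT_APPEND = -O3
export DEB_CXXFLAGS_MAINT_APPEND = -O3

%:
	dh $@
```

To see what the compiler would receive, running
"DEB_CFLAGS_MAINT_APPEND=-O3 dpkg-buildflags --get CFLAGS" should print
the default flags with -O3 at the end.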
There is or was also a "hwcaps" mechanism for having multiple versions
of a binary for different CPUs, but I've never tried to use it. For
pocl (ITP #676504) the speed difference between -march=corei7-avx and
plain amd64 is about 20%; I haven't measured it on i386, and other
packages may be very different.
--
Archive: https://lists.debian.org/53883c5d.6020...@bham.ac.uk