https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279
--- Comment #19 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, if stmxcsr/vstmxcsr is too slow, perhaps we should change x86 sfp-machine.h
  #define FP_INIT_ROUNDMODE \
    do { \
      __asm__ __volatile__ ("%vstmxcsr\t%0" : "=m" (_fcw)); \
    } while (0)
  #endif
to do that only if not round-to-nearest.
E.g. the fast-float library has used since last November:
  static volatile float fmin = __FLT_MIN__;
  float fmini = fmin; // we copy it so that it gets loaded at most once.
  return (fmini + 1.0f == 1.0f - fmini);
as a quick test of whether round-to-nearest is in effect or some other rounding mode.
Most likely if this is done with -frounding-math it wouldn't even need the volatile stuff.
Of course, if it isn't round-to-nearest, we would need to do the expensive {,v}stmxcsr.
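The idea above could be sketched roughly as follows. This is only an illustration, not the actual sfp-machine.h change: it assumes x86-64 with SSE float arithmetic and GCC or Clang, and the helper names (rounds_to_nearest, read_mxcsr, write_mxcsr, rounding_control) and the MXCSR_RC_* macros are made up for this example. The noinline attribute is an extra precaution added here so that the additions cannot be scheduled across a later MXCSR write when building without -frounding-math.

```c
#include <stdint.h>

/* MXCSR rounding-control field (bits 13-14); macro names are hypothetical. */
#define MXCSR_RC_MASK    0x6000u
#define MXCSR_RC_NEAREST 0x0000u
#define MXCSR_RC_DOWN    0x2000u
#define MXCSR_RC_UP      0x4000u
#define MXCSR_RC_ZERO    0x6000u

/* The expensive path: store MXCSR to memory and return it. */
static uint32_t read_mxcsr(void)
{
    uint32_t csr;
    __asm__ __volatile__ ("stmxcsr\t%0" : "=m" (csr));
    return csr;
}

/* Only needed here so the sketch can switch rounding modes for testing. */
static void write_mxcsr(uint32_t csr)
{
    __asm__ __volatile__ ("ldmxcsr\t%0" : : "m" (csr));
}

/* The quick test quoted above: returns 1 only under round-to-nearest.
   In any directed mode exactly one of the two sums moves away from 1.0f,
   so the comparison fails. */
__attribute__((noinline))
static int rounds_to_nearest(void)
{
    static volatile float fmin = __FLT_MIN__;
    float fmini = fmin; /* copy so the volatile is loaded at most once */
    return (fmini + 1.0f == 1.0f - fmini);
}

/* Returns the rounding-control bits, executing stmxcsr only when the
   quick test says the mode is not round-to-nearest. */
static uint32_t rounding_control(void)
{
    if (rounds_to_nearest())
        return MXCSR_RC_NEAREST;
    return read_mxcsr() & MXCSR_RC_MASK;
}
```

In the common round-to-nearest case rounding_control() costs one load, two additions and a compare; stmxcsr is executed only in the other three modes.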