https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279
--- Comment #19 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, if stmxcsr/vstmxcsr is too slow, perhaps we should change the x86 sfp-machine.h
#define FP_INIT_ROUNDMODE \
do { \
__asm__ __volatile__ ("%vstmxcsr\t%0" : "=m" (_fcw)); \
} while (0)
#endif
to do that only if the rounding mode is not round-to-nearest.
E.g. the fast-float library has been using the following since last November:
static volatile float fmin = __FLT_MIN__;
float fmini = fmin; // we copy it so that it gets loaded at most once.
return (fmini + 1.0f == 1.0f - fmini);
as a quick test whether round-to-nearest is in effect or some other rounding
mode.
Most likely if this is done with -frounding-math it wouldn't even need the
volatile stuff. Of course, if it isn't round-to-nearest, we would need to do
the expensive {,v}stmxcsr.
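
A minimal sketch of that idea in plain C, not a patch against sfp-machine.h: do the cheap arithmetic test first and read MXCSR only when it reports a mode other than round-to-nearest. The names tiny, is_round_to_nearest and rounding_control_bits are hypothetical, and the sketch uses the _mm_getcsr intrinsic from <xmmintrin.h> in place of the inline asm, and assumes that float arithmetic is governed by MXCSR (as on x86-64).

```c
#include <float.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* _mm_getcsr, _MM_ROUND_MASK */
#endif

/* volatile keeps the compiler from constant-folding the test below
   when the code is built without -frounding-math.  */
static volatile float tiny = FLT_MIN;

/* Cheap test: in round-to-nearest both 1.0f + tiny and 1.0f - tiny
   round to exactly 1.0f.  In each directed mode exactly one of the
   two results moves away from 1.0f, so the comparison fails.  */
static int is_round_to_nearest(void)
{
    float t = tiny;   /* copy so the volatile is loaded only once */
    return (t + 1.0f) == (1.0f - t);
}

/* Hypothetical helper: returns the MXCSR rounding-control field
   (bits 13-14, shifted down), where 0 means round-to-nearest.
   MXCSR is read only on the slow path.  */
static unsigned rounding_control_bits(void)
{
    if (is_round_to_nearest())
        return 0u;                        /* fast path, no stmxcsr */
#if defined(__SSE__)
    return (_mm_getcsr() & _MM_ROUND_MASK) >> 13;
#else
    return ~0u;                           /* unknown without SSE */
#endif
}
```

In the default floating-point environment rounding_control_bits() takes the fast path and returns 0. Whether the two floating-point operations and the compare are actually cheaper than a single stmxcsr would still need to be measured.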