http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-11 11:36:29 UTC ---
So to recap, for vector intrinsics we can use the GCC vector extensions, while for the scalar intrinsics we would have to use target builtin folding. Not sure if it's worth the asymmetry (the alternative being to do everything with target builtin folding).

Testcase that still needs to "work" (emit addsd) with -m32 -msse2 and without -mfpmath=sse:

#include <emmintrin.h>
double foo (double x)
{
  return _mm_cvtsd_f64 (_mm_add_sd (_mm_set_sd (x), _mm_set_sd (1.0)));
}

Exposing the intrinsics' internals to the compiler would also mean we no longer "literally" emit the code the user may have asked for (though RTL optimization might already do that - but RTL, for example, never re-associates FP, even with -ffast-math).
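
To make the asymmetry concrete, here is a minimal sketch of what "using the vector GCC extensions" could look like. All `sketch_*` names and the `v2df` typedef are hypothetical and are not the actual <emmintrin.h> definitions. The packed add maps directly onto the generic vector `+`, whereas the scalar add has to be spelled as a plain double addition on lane 0, which the compiler is then free to treat like any other scalar FP operation.

```c
/* Hypothetical illustration only; these are not GCC's real header definitions. */
typedef double v2df __attribute__ ((vector_size (16)));

/* Packed add: expressible directly with the generic vector '+' operator. */
static inline v2df sketch_add_pd (v2df a, v2df b)
{
  return a + b;
}

/* Scalar add: only lane 0 is added; lane 1 is passed through from a.
   Written this way it is an ordinary scalar double addition.  */
static inline v2df sketch_add_sd (v2df a, v2df b)
{
  a[0] = a[0] + b[0];
  return a;
}

/* Place x in lane 0 and zero lane 1, mirroring _mm_set_sd.  */
static inline v2df sketch_set_sd (double x)
{
  v2df v = { x, 0.0 };
  return v;
}

/* Extract lane 0, mirroring _mm_cvtsd_f64.  */
static inline double sketch_cvtsd_f64 (v2df a)
{
  return a[0];
}

/* Same shape as the testcase above, using the sketch helpers.  */
double foo_sketch (double x)
{
  return sketch_cvtsd_f64 (sketch_add_sd (sketch_set_sd (x),
                                          sketch_set_sd (1.0)));
}
```

With -m32 -msse2 and no -mfpmath=sse, scalar double arithmetic defaults to the x87 unit, so a definition like `sketch_add_sd` would likely not produce addsd. That appears to be why the scalar intrinsics would need target builtin folding instead. Inspecting the output of `gcc -m32 -msse2 -O2 -S` on this sketch is one way to check.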