http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-08 12:52:32 UTC ---
For _mm256_fmadd_pd and friends, the only possibility is to fold the target builtins to FMA_EXPR via targetm.fold_builtin. That is of course also possible for the simple adds/muls (but getting rid of the builtins is a good idea, since they are quite heavy on compiler startup time).