https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119046
--- Comment #2 from ktkachov at gcc dot gnu.org ---
(In reply to Tamar Christina from comment #1)
> The late-combine pass was supposed to handle these. probably worth a look
> into why it's not folding them in.

Yeah, you're right. It turns out that late-combine doesn't try combining the
(vec_duplicate (vec_select ...)) expressions into the FMLAs.
This is due to the can_move_insn_p check in late-combine:

bool
rtl_ssa::can_move_insn_p (insn_info *insn)
{
  return (!control_flow_insn_p (insn->rtl ())
          && !may_trap_p (PATTERN (insn->rtl ())));
}

may_trap_p must return true for the V4SF modes involved here, because compiling
with -Ofast "fixes" this and I see the propagations.
But in this case propagating the dups+selects into the FMA doesn't change
trapping behaviour, so I'd expect the combination to be done even without that
flag.
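
To make the gating above concrete, here is a small standalone toy model of the
check. It is only a sketch and not GCC code: ToyInsn, ToyFlags, toy_may_trap and
toy_can_move_insn are hypothetical names. It assumes the trap predicate treats
any floating-point-mode expression as potentially trapping while trapping math
is enabled, which is what the -Ofast observation suggests (-Ofast enables
-ffast-math, which in turn implies -fno-trapping-math).

```cpp
#include <cassert>

// Toy stand-ins only; these are NOT GCC's rtx or rtl_ssa::insn_info types.
struct ToyInsn {
  bool is_control_flow;  // models control_flow_insn_p (insn->rtl ())
  bool has_float_mode;   // pattern has a floating-point mode, e.g. V4SF
};

struct ToyFlags {
  bool trapping_math;    // models -ftrapping-math (the default setting)
};

// Assumption: a float-mode pattern counts as possibly trapping only
// while trapping math is enabled.
bool toy_may_trap(const ToyInsn &insn, const ToyFlags &flags) {
  return insn.has_float_mode && flags.trapping_math;
}

// Same shape as the can_move_insn_p check quoted above.
bool toy_can_move_insn(const ToyInsn &insn, const ToyFlags &flags) {
  return !insn.is_control_flow && !toy_may_trap(insn, flags);
}
```

Under this model a V4SF dup+select is rejected with default flags and accepted
once trapping math is switched off, which matches the behaviour described in
the comment.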