https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119046

--- Comment #2 from ktkachov at gcc dot gnu.org ---
(In reply to Tamar Christina from comment #1)
> The late-combine pass was supposed to handle these. probably worth a look
> into why it's not folding them in.

Yeah, you're right. It turns out that late-combine doesn't try to combine the
(vec_duplicate (vec_select ...)) expressions into the FMLAs.

This is due to the can_move_insn_p check that late-combine relies on:
bool
rtl_ssa::can_move_insn_p (insn_info *insn)
{
  return (!control_flow_insn_p (insn->rtl ())
          && !may_trap_p (PATTERN (insn->rtl ())));
}

may_trap_p presumably returns true for the V4SF modes involved here, because
compiling with -Ofast (which implies -fno-trapping-math) "fixes" this and I
see the propagations.
But in this case propagating the dups+selects into the FMA doesn't change
trapping behaviour, so I'd expect the combination to be done even without that
flag.
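
For illustration only, here is a rough scalar sketch in portable C of what a
by-lane multiply-accumulate computes. The function name fma_by_lane and its
shape are hypothetical and are not the testcase from this report, and I'm not
claiming GCC turns this exact code into a by-lane FMLA. The point is just that
the lane select plus broadcast only moves data around; the multiply-add is the
only floating-point arithmetic.

```c
#include <stddef.h>

/* Hypothetical helper (not from the bug report): multiply every element
   of A by one selected lane of B and accumulate the result into ACC.  */
void
fma_by_lane (float acc[4], const float a[4], const float b[4], size_t lane)
{
  /* Pick one lane and conceptually broadcast it to all elements.  This
     models (vec_duplicate (vec_select ...)): a pure data movement with
     no floating-point arithmetic of its own.  */
  float dup = b[lane];

  /* The multiply-accumulate is the only arithmetic here, whether the
     broadcast is a separate step or folded into it.  */
  for (size_t i = 0; i < 4; i++)
    acc[i] += a[i] * dup;
}
```

Calling it with lane 1 multiplies every element of a by b[1] before
accumulating, which is the same result a separate dup followed by a plain
multiply-add would give.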
