https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117173

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
I can see the difficulty here.  Note a strathegy for decomposition would be
to produce a monotonic permute of any two input vector permute by noting
elements needed in the final result.  If it's possible to aggregate all of them
then do that.  This transform isn't unique as elements not needed can be
select from either vector so it's likely the now present two permutes
will end up as four - two blends and two single input permutes where the
blends might not agree and are not CSEable unless we have a clever way
of encoding "don't care" and CSE of those.

But yeah, it should be still cheaper than two gathers and applicable to
a subset of all two input permutes.

Reply via email to