https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117173
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> --- I can see the difficulty here. Note a strathegy for decomposition would be to produce a monotonic permute of any two input vector permute by noting elements needed in the final result. If it's possible to aggregate all of them then do that. This transform isn't unique as elements not needed can be select from either vector so it's likely the now present two permutes will end up as four - two blends and two single input permutes where the blends might not agree and are not CSEable unless we have a clever way of encoding "don't care" and CSE of those. But yeah, it should be still cheaper than two gathers and applicable to a subset of all two input permutes.