https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80846
--- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to rguent...@suse.de from comment #15)
> Yeah, I have a patch that does this. The question is how to query the target
> if the vector sizes share the same register set. Like we wouldn't want to go
> to mmx register size.
>
> Doing this would also allow to execute the adds for 512 to 128 bit reduction
> in parallel.

I wonder if we just shouldn't have a target hook that does all that (emits the
best reduction sequence given original vector mode and operation), which could
return NULL/false or something to be expanded by the generic code.
The middle-end doesn't have information about costs of the various
permutations, preferences of vector types etc.
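
For readers following along, the shape of the reduction being discussed can be modelled in plain scalar C. This is only an illustrative sketch, not GCC code and not the proposed hook: the function name reduce_add_by_halving and the choice of 32-bit lanes are assumptions made for the example. Each outer iteration adds the upper half of the live lanes onto the lower half, so a 16-lane (512-bit) input passes through 8 lanes (256 bits) and 4 lanes (128 bits) before finishing as a scalar.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: sum N 32-bit lanes by
   repeatedly folding the upper half onto the lower half.  N must be a
   nonzero power of two.  The buffer is clobbered.  Unsigned lanes are
   used so that wraparound is well defined.  */
static uint32_t
reduce_add_by_halving (uint32_t *lanes, size_t n)
{
  assert (n != 0 && (n & (n - 1)) == 0);

  /* width is the number of live lanes after the current step:
     for n == 16 it takes the values 8, 4, 2, 1.  */
  for (size_t width = n / 2; width >= 1; width /= 2)
    for (size_t i = 0; i < width; i++)
      lanes[i] += lanes[i + width];

  return lanes[0];
}
```

Which of these steps should be done as a narrower vector add, and where to stop narrowing, is exactly the target-specific knowledge the comment above says the middle-end lacks; the model does not try to capture it.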