https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96918
--- Comment #17 from Cory Fields <lists at coryfields dot com> ---
(In reply to Jakub Jelinek from comment #16)
> Optimal shuffle is next to impossible on architectures like x86 where you
> have dozens of different permutation instructions and often you need not
> just one, but 2, 3, 4 or 5 of them depending on exact ISA and permutation.
> GCC has over 20k lines of source for choosing reasonable constant
> permutations just on this architecture.
> This PR is not about __builtin_shuffle emitting bad code, but about the
> vector lshift + rshift ored not even trying to emit it as permutation and
> comparing that to what one gets from those 3 operations if there is no
> native rotate.
> Though, sure, one could also derive from it that perhaps some constant
> permutations would be in some cases best emitted as 2 shifts + or, guess we
> don't try that among 3 insn cases yet.

Thanks for the help. I certainly didn't mean to trivialize the work involved, and I have a much better understanding of what's involved now. I'll have a look at the existing permutations and see if there's room for improvement on AVX/AVX2 in these specific cases.
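For reference, here is a minimal sketch of the two equivalent forms being compared, written with GCC's generic vector extensions. It assumes a little-endian target such as x86 and a rotate count that is a multiple of 8. The type names `v4u32`/`v16u8` and the functions `rotl8_shifts`, `rotl8_shuffle` and `rotl8_variants_agree` are hypothetical, chosen only for illustration. The first function is the "lshift + rshift ored" form this PR is about. The second expresses the same per-lane rotate as a constant byte permutation via `__builtin_shuffle`, which is the form that could, on ISAs with a suitable byte shuffle (e.g. SSSE3 `pshufb`), plausibly be a single instruction instead of three.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The byte mask below assumes little-endian lane layout (e.g. x86). */
#if __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "rotl8_shuffle assumes a little-endian target"
#endif

/* Hypothetical names: 4 x 32-bit lanes and 16 x 8-bit lanes, both 128 bits. */
typedef uint32_t v4u32 __attribute__((vector_size(16)));
typedef uint8_t v16u8 __attribute__((vector_size(16)));

/* Rotate each 32-bit lane left by 8 bits: two shifts and an OR. */
static v4u32 rotl8_shifts(v4u32 x)
{
    return (x << 8) | (x >> 24);
}

/* Same rotate as a constant byte permutation. In each lane the new
   byte order (LSB first) is old bytes 3, 0, 1, 2. */
static v4u32 rotl8_shuffle(v4u32 x)
{
    const v16u8 mask = {3, 0, 1, 2, 7, 4, 5, 6,
                        11, 8, 9, 10, 15, 12, 13, 14};
    return (v4u32)__builtin_shuffle((v16u8)x, mask);
}

/* Returns 1 if both formulations give identical results on sample data. */
static int rotl8_variants_agree(void)
{
    v4u32 x = {0x11223344u, 0xdeadbeefu, 0x00000001u, 0x80000000u};
    v4u32 a = rotl8_shifts(x);
    v4u32 b = rotl8_shuffle(x);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Compiling both functions with `-O2` and, for example, `-mavx2` and inspecting the assembly is one way to see whether the shift form gets matched to a permutation or stays as shift, shift, OR.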
