https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91201
--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I'm not aware of a vcompressb insn, only vcompressps and vcompresspd. Sure, one
could just emit whatever we emit for __builtin_shuffle with
(__v64qi) { 0, 8, 16, 24, 32, 40, 48, 56,
            0, 8, 16, 24, 32, 40, 48, 56,
            0, 8, 16, 24, 32, 40, 48, 56,
            0, 8, 16, 24, 32, 40, 48, 56,
            0, 8, 16, 24, 32, 40, 48, 56,
            0, 8, 16, 24, 32, 40, 48, 56,
            0, 8, 16, 24, 32, 40, 48, 56,
            0, 8, 16, 24, 32, 40, 48, 56 }
or a similar perm; the question is whether it will be faster that way or not.
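
For reference, a minimal sketch of the shuffle being discussed, written with
GCC's generic vector extensions. The names v64qi and gather_every_8th are
made up for illustration and are not from the PR. The mask is built in a loop
rather than spelled out, but it holds the same 64 indices as the perm quoted
above. Which instructions GCC actually emits for it depends on the GCC version
and the -m flags in use; without AVX-512 enabled the operation is lowered
generically, so this says nothing about the speed question.

```c
/* 64 byte-sized lanes, declared via GCC's vector_size attribute.
   (Hypothetical typedef; the comment above uses the __v64qi type.) */
typedef unsigned char v64qi __attribute__((vector_size(64)));

/* Pick bytes 0, 8, 16, ..., 56 of *src and repeat that group of 8
   across all 64 lanes of *dst.  Pointers are used so the sketch does
   not depend on the vector-argument ABI of the target. */
static void gather_every_8th(const v64qi *src, v64qi *dst)
{
    v64qi perm;

    /* perm = { 0, 8, ..., 56, 0, 8, ..., 56, ... } (8 repetitions). */
    for (int i = 0; i < 64; i++)
        perm[i] = (unsigned char)((i % 8) * 8);

    /* One-operand form: dst[i] = src[perm[i] % 64]. */
    *dst = __builtin_shuffle(*src, perm);
}
```

Comparing the output of gcc -O2 -S with and without AVX-512 options on this
function is one way to see what "whatever we emit for __builtin_shuffle"
currently looks like.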