https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
A patch is posted at
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561909.html

Recording Jakub's comments from another thread:

On Tue, Jan 12, 2021 at 11:47:48AM +0100, Jakub Jelinek via Gcc-patches wrote:
> > OTOH, perhaps some of the new testcases can be handled in x86
> > target_fold_builtin? In the long term, maybe target_fold_shuffle can
> > be introduced to map __builtin_shuffle to various target builtins, so
> > the builtin can be processed further in target_fold_builtin. As
> > pointed out below, vector insn patterns can be quite complex, and push
> > RTL combiners to their limits, so perhaps they can be more efficiently
> > handled by tree passes.
>
> My primary motivation was to generate good code from __builtin_shuffle here
> and trying to find the best permutation and map it back from insns to
> builtins would be a nightmare.
> I'll see how many targets I need to modify to try the no middle-end
> force_reg for CONST0_RTX case...

For the folding, I think best would be to change _mm_unpacklo_epi8 and all the
similar intrinsics for hardcoded specific permutations from using a builtin
to just using __builtin_shuffle (together with verification that we emit as
good or better code from it for each case of course), and keep
__builtin_shuffle -> VEC_PERM_EXPR as the canonical form (with which the
GIMPLE code can do any optimizations it wants).
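
To make the last suggestion concrete, here is a rough sketch of what an unpacklo-style interleave might look like when written with __builtin_shuffle rather than a target builtin. It is illustrative only: the type name v16qi_t and the helper unpacklo_epi8_shuffle are made up for this example and are not the actual <emmintrin.h> definitions, and it assumes GCC's generic vector extensions.

```c
/* GCC-only sketch: the low-half byte interleave that _mm_unpacklo_epi8
   performs, expressed with the generic __builtin_shuffle.  */

/* 16 lanes of 8 bits each; the name v16qi_t is hypothetical.  */
typedef signed char v16qi_t __attribute__ ((vector_size (16)));

/* Hypothetical helper, not the real intrinsic definition.  */
static inline v16qi_t
unpacklo_epi8_shuffle (v16qi_t a, v16qi_t b)
{
  /* In the two-input form, selector values 0..15 pick lanes of a
     and 16..31 pick lanes of b.  Result: a0 b0 a1 b1 ... a7 b7.  */
  const v16qi_t sel = { 0, 16, 1, 17, 2, 18, 3, 19,
                        4, 20, 5, 21, 6, 22, 7, 23 };
  return __builtin_shuffle (a, b, sel);
}
```

Since the selector is a compile-time constant, the middle end sees a VEC_PERM_EXPR with a constant permutation. Whether the resulting code is as good as what the dedicated builtin produces would still need checking per intrinsic, as noted above.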