[Bug target/98167] [x86] Failure to optimize operation on identically shuffled operands into a shuffle of the result of the operation

crazylht at gmail dot com via Gcc-bugs Thu, 14 Jan 2021 02:45:43 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167


--- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
A patch is posted at
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561909.html

Also recording Jakub's comments from another thread:

On Tue, Jan 12, 2021 at 11:47:48AM +0100, Jakub Jelinek via Gcc-patches wrote:
> > OTOH, perhaps some of the new testcases can be handled in x86
> > target_fold_builtin? In the long term, maybe target_fold_shuffle can
> > be introduced to map __builtin_shuffle to various target builtins, so
> > the builtin can be processed further in target_fold_builtin. As
> > pointed out below, vector insn patterns can be quite complex, and push
> > RTL combiners to their limits, so perhaps they can be more efficiently
> > handled by tree passes.
>
> My primary motivation was to generate good code from __builtin_shuffle here
> and trying to find the best permutation and map it back from insns to
> builtins would be a nightmare.
> I'll see how many targets I need to modify to try the no middle-end
> force_reg for CONST0_RTX case...

For the folding, I think the best approach would be to change
_mm_unpacklo_epi8 and all the similar intrinsics for hardcoded specific
permutations from using a builtin to just using __builtin_shuffle
(after verifying, of course, that we emit code that is as good or better
for each case), and keep __builtin_shuffle -> VEC_PERM_EXPR as the
canonical form (on which the GIMPLE passes can do any optimizations
they want).
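A minimal sketch of what that suggestion could look like, assuming GCC's
generic vector extensions. The names v16qu, unpacklo_bytes,
add_of_shuffles, shuffle_of_add and same_bytes are hypothetical and are
not taken from the patch or from GCC's headers. The first helper writes
the byte interleave of _mm_unpacklo_epi8 as a plain __builtin_shuffle;
the next two show the shape from the bug title, where an operation on
identically shuffled operands could become one shuffle of the result.

```c
#include <string.h>

/* 16 x 8-bit lanes, using GCC's generic vector extension. */
typedef unsigned char v16qu __attribute__((vector_size(16)));

/* Hypothetical stand-in for _mm_unpacklo_epi8: interleave the low 8 bytes
   of a and b.  Mask indices 0..15 select from a, 16..31 select from b. */
static inline v16qu unpacklo_bytes(v16qu a, v16qu b)
{
    const v16qu mask = { 0, 16, 1, 17, 2, 18, 3, 19,
                         4, 20, 5, 21, 6, 22, 7, 23 };
    return __builtin_shuffle(a, b, mask);
}

/* Shape from the bug title: both operands are permuted by the same mask
   before the element-wise operation. */
static inline v16qu add_of_shuffles(v16qu a, v16qu b, v16qu mask)
{
    return __builtin_shuffle(a, mask) + __builtin_shuffle(b, mask);
}

/* Equivalent form with a single permutation applied to the result. */
static inline v16qu shuffle_of_add(v16qu a, v16qu b, v16qu mask)
{
    return __builtin_shuffle(a + b, mask);
}

/* Byte-wise comparison helper for checking the two forms agree. */
static inline int same_bytes(v16qu x, v16qu y)
{
    return memcmp(&x, &y, sizeof x) == 0;
}
```

Since both forms reach GIMPLE as VEC_PERM_EXPR, the middle end at least
has the chance to see that add_of_shuffles and shuffle_of_add compute
the same value; whether the generated code for each intrinsic is as good
as the builtin version would still need checking case by case.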
