https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114107
--- Comment #8 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #7)
> perm_cost is very low in the x86 backend, and it may be OK for 128-bit vectors,
> since pshufb/shufps are available for most cases.
> But for 256/512-bit vectors, when the permutation is cross-lane, the cost
> could be higher. One solution is to increase perm_cost when the vector size is
> more than 128 bits, since vperm is most likely used instead of
> vblend/vpblend/vpshuf/vshuf.

Furthermore, if we can get the indices in the backend when calculating the vec_perm cost, we can check whether the permutation is cross-lane and set the cost more accurately for 256/512-bit vector permutations.
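
To make the cross-lane check concrete, here is a rough standalone sketch in C. It is not GCC code: perm_is_cross_lane and sketch_perm_cost are hypothetical names, the selector is assumed to use the usual two-input convention (indices 0..2*nelts-1, with the upper half selecting from the second operand), and the 2x penalty is only a placeholder, since real costs would depend on the tuning target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC.  Returns true if any result
   element is taken from a different 128-bit lane than the one it is
   written to.  SEL holds NELTS selector indices in 0..2*NELTS-1.  */
static bool
perm_is_cross_lane (const unsigned *sel, size_t nelts, size_t elt_bits)
{
  assert (elt_bits > 0 && 128 % elt_bits == 0);
  size_t lane_elts = 128 / elt_bits;   /* elements per 128-bit lane */

  for (size_t i = 0; i < nelts; i++)
    {
      /* Position of the source element within its own operand.  */
      size_t src = sel[i] % nelts;
      if (src / lane_elts != i / lane_elts)
        return true;
    }
  return false;
}

/* Hypothetical cost function: keep BASE_COST for vectors of at most
   128 bits and for in-lane permutations, and apply a placeholder
   penalty for cross-lane permutations of wider vectors.  */
static unsigned
sketch_perm_cost (const unsigned *sel, size_t nelts, size_t elt_bits,
                  unsigned base_cost)
{
  size_t vec_bits = nelts * elt_bits;
  if (vec_bits > 128 && perm_is_cross_lane (sel, nelts, elt_bits))
    return base_cost * 2;   /* assumed multiplier, for illustration only */
  return base_cost;
}
```

With 8 x 32-bit elements (a 256-bit vector), a full reversal {7,6,5,4,3,2,1,0} moves every element across the lane boundary and would get the higher cost, while a pairwise swap {1,0,3,2,5,4,7,6} stays within each lane and keeps the base cost.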