https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102056
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Actually VEC_PERM_EXPR of the same size is not optimized either:

vector char g1(vector char a)
{
  vector char t= __builtin_shuffle(a,(vector char){7,6,5,4,3,2,1,0,15,14,13,12,11,10,9,8,});
  vector long long t1 = (vector long long)t;
  return __builtin_shuffle(t, (vector char){8,9,10,11,12,13,14,15,0,1,2,3,4,5,6,7});
  return (vector char)t1;
}
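
For reference, the two same-size shuffles in g1 compose into a single permutation. The sketch below is a plain scalar model in portable C, not GCC code: compose_masks, apply_mask and g1_folds_to_byte_reverse are hypothetical helper names, and the combined mask {15,...,0} is worked out by hand from the two masks in the testcase. It assumes the documented single-operand behaviour of __builtin_shuffle (result[i] = input[mask[i] mod N]).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { N = 16 };

/* Hypothetical helper: fold two single-input shuffle masks into one.
   If t[i] = a[first[i]] and r[i] = t[second[i]], then
   r[i] = a[first[second[i]]]. */
void compose_masks(const unsigned char *first, const unsigned char *second,
                   unsigned char *out, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        out[i] = first[second[i] % n] % n;
}

/* Scalar model of a single-input shuffle: dst[i] = src[mask[i] mod n]. */
void apply_mask(const unsigned char *src, const unsigned char *mask,
                unsigned char *dst, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[mask[i] % n];
}

/* Returns 1 if the two masks from g1 fold into a full byte reversal and
   the folded mask gives the same result as applying both shuffles. */
int g1_folds_to_byte_reverse(void)
{
    /* Masks copied from the testcase in the comment. */
    const unsigned char m1[N] = {7,6,5,4,3,2,1,0,15,14,13,12,11,10,9,8};
    const unsigned char m2[N] = {8,9,10,11,12,13,14,15,0,1,2,3,4,5,6,7};
    /* Hand-derived expectation (an assumption of this sketch). */
    const unsigned char reversed[N] = {15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0};

    unsigned char folded[N], src[N], t[N], two_step[N], one_step[N];

    compose_masks(m1, m2, folded, N);
    if (memcmp(folded, reversed, N) != 0)
        return 0;

    /* Arbitrary sample data to compare both evaluation orders. */
    for (size_t i = 0; i < N; i++)
        src[i] = (unsigned char)(3 * i + 1);

    apply_mask(src, m1, t, N);          /* first shuffle in g1  */
    apply_mask(t, m2, two_step, N);     /* second shuffle in g1 */
    apply_mask(src, folded, one_step, N);

    return memcmp(two_step, one_step, N) == 0;
}
```

If this model is right, a single VEC_PERM_EXPR with the folded mask would be equivalent to the pair of shuffles in g1.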