https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119181
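--- Comment #11 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
A more common case is:

typedef int v8si __attribute__((vector_size(32)));

v8si foo1 (v8si a, v8si b)
{
  v8si c = __builtin_shufflevector (a, b, 0, 1, 2, 11, 4, 5, 6, 15);
  v8si d = __builtin_shufflevector (b, a, 0, 1, 2, 11, 4, 5, 6, 15);
  return c * d;
}

The redundant vector permutations are not optimized away. The two shuffles only exchange lanes 3 and 7 between a and b, so every lane of c * d is either a[i] * b[i] or b[i] * a[i]. Because multiplication is commutative, the whole function is equivalent to a * b.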