https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106081
--- Comment #13 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #12)
> Btw, I see we actually materialize a permute before the splat:
>
> t.c:14:24: note: node 0x5b311c0 (max_nunits=1, refcnt=2) vector(2) double
> t.c:14:24: note: op: VEC_PERM_EXPR
> t.c:14:24: note: stmt 0 _1 = *k_50;
> t.c:14:24: note: stmt 1 _1 = *k_50;
> t.c:14:24: note: stmt 2 _1 = *k_50;
> t.c:14:24: note: stmt 3 _1 = *k_50;
> t.c:14:24: note: lane permutation { 0[3] 0[2] 0[1] 0[0] }
> t.c:14:24: note: children 0x5b30fc0
> t.c:14:24: note: node 0x5b30fc0 (max_nunits=2, refcnt=1) vector(2) double
> t.c:14:24: note: op template: _1 = *k_50;
> t.c:14:24: note: stmt 0 _1 = *k_50;
> t.c:14:24: note: stmt 1 _1 = *k_50;
> t.c:14:24: note: stmt 2 _1 = *k_50;
> t.c:14:24: note: stmt 3 _1 = *k_50;
> t.c:14:24: note: load permutation { 0 0 0 0 }
>
> this is because vect_optimize_slp_pass::get_result_with_layout doesn't
> seem to consider load permutations?

Yeah.  That's because, if to_layout_i != from_layout_i, the caller
is asking for a different layout from the one that we picked for the
load.  If we wanted to change the load permutation in-place for the
given to_layout_i, we'd need to duplicate the load too, which didn't
seem like a good trade-off.
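
For illustration only, here is a small Python sketch of what folding the lane permutation from the dump into the load permutation would amount to. It is not GCC code: the helper name compose_with_load is made up, and the list-of-indices model is an assumed simplification of how the SLP graph represents permutations. It only shows that, for these particular values, the reversed splat is the same splat.

```python
def compose_with_load(lane_perm, load_perm):
    """Fold a single-input lane permutation into a load permutation.

    Hypothetical helper, not a GCC function.  Result lane i reads child
    lane lane_perm[i], which in turn loads element load_perm[lane_perm[i]].
    """
    return [load_perm[lane] for lane in lane_perm]


# Values taken from the dump: lane permutation { 0[3] 0[2] 0[1] 0[0] }
# (every lane comes from child 0) and load permutation { 0 0 0 0 }.
lane_perm = [3, 2, 1, 0]
load_perm = [0, 0, 0, 0]

folded = compose_with_load(lane_perm, load_perm)
print(folded)  # [0, 0, 0, 0]
```

In this model the folded permutation equals the original load permutation, so the VEC_PERM_EXPR contributes nothing for this input. Rewriting the load node itself for a different to_layout_i is a separate question, and is the case that would need the duplicated load mentioned above.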