https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115282
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Keywords|                              |missed-optimization
           Priority|P3                            |P1
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
             Target|powerpc64-linux-gnu           |powerpc64*-linux-gnu
             Status|NEW                           |ASSIGNED

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ah, this is probably a case where we need to split because CSE causes us to
associate operations differently so SLP build for the whole thing fails.  The
three-vector permute issue will go away when I manage to finish the load part
of the full SLP enablement.

It also fails on LE.  It's the

node 0x39913f0 (max_nunits=4, refcnt=2) vector(4) unsigned int
op template: _14 = in[_13];
        stmt 0 _14 = in[_13];
        load permutation { 6 }

note.  We split the 8-group into 6 and two times 1 element.  This needs
an intermediate (interleaving) permute and indeed the load part will fix it.

I suggest to leave this failing until then.  The loop is still vectorized
but using non-SLP full interleaving until then.
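
For readers unfamiliar with the dump above, the following is a rough scalar model of what a single-lane node with "load permutation { 6 }" and type vector(4) appears to select: element 6 of each of four consecutive 8-element groups. This is only an illustrative sketch, not code from GCC or from the test case in this bug; the names GROUP and load_permuted are hypothetical, and the layout of in[] is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: elements per interleaved load group (the "8-group").  */
#define GROUP 8

/* Scalar model of a single-lane load node with load permutation { perm }.
   Each output lane takes element `perm` from one of `nunits` consecutive
   groups, so the selected elements are GROUP apart in memory.  This stride
   is why an interleaving permute is needed on top of contiguous vector
   loads.  */
static void
load_permuted (const unsigned *in, size_t first_group, size_t nunits,
               unsigned perm, unsigned *out)
{
  assert (perm < GROUP);
  for (size_t lane = 0; lane < nunits; lane++)
    out[lane] = in[(first_group + lane) * GROUP + perm];
}
```

With in[i] == i for 32 elements, calling load_permuted (in, 0, 4, 6, out) fills out with elements 6, 14, 22 and 30, i.e. one element from each group rather than a contiguous run.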