https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93771
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> Confirmed. I'm not sure if we should try to "fix" SLP here or rather
> appropriately optimize
>
>   v2df tem1 = *(v2df *)&t[0];
>   v2df tem2 = *(v2df *)&t[2];
>   __builtin_shuffle (tem1, tem2, (v2di) { 0, 3 });
>
> which the user could write itself. forwprop does some related transforms
> splitting loads in "Rewrite loads used only in BIT_FIELD_REF extractions to
> component-wise loads."

I originally thought about filing the bug that way but decided against it. I don't remember my reasoning, other than that I saw SLP not doing it for unrelated loads.
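
For reference, here is a minimal sketch of the two forms under discussion: the two-vector-load-plus-shuffle pattern from comment #2, and the component-wise loads that the shuffle could be reduced to. The function names (pick_shuffle, pick_scalar, self_check) are hypothetical and do not come from the bug report. The sketch uses memcpy instead of the pointer cast from the quoted snippet so that it does not assume 16-byte alignment of t, and the __clang__ branch is only there so the file also builds with compilers that lack __builtin_shuffle.

```c
#include <assert.h>
#include <string.h>

typedef double v2df __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

/* Shuffle form: load t[0..1] and t[2..3] as vectors, then select
   element 0 of the first vector and element 1 of the second.
   In the two-input shuffle, indices 0-1 refer to tem1 and 2-3 to tem2,
   so the mask { 0, 3 } yields { t[0], t[3] }.  */
v2df
pick_shuffle (const double *t)
{
  v2df tem1, tem2;
  memcpy (&tem1, &t[0], sizeof tem1);   /* unaligned-safe vector load */
  memcpy (&tem2, &t[2], sizeof tem2);
#ifdef __clang__
  /* Assumption: portability fallback, not part of the bug report.  */
  return __builtin_shufflevector (tem1, tem2, 0, 3);
#else
  return __builtin_shuffle (tem1, tem2, (v2di) { 0, 3 });
#endif
}

/* Component-wise form: build the result directly from the two scalars
   that are actually used, with no full-width loads and no permute.  */
v2df
pick_scalar (const double *t)
{
  return (v2df) { t[0], t[3] };
}

/* Hypothetical self-check: both forms must produce the same lanes.  */
void
self_check (void)
{
  double t[4] = { 1.0, 2.0, 3.0, 4.0 };
  v2df a = pick_shuffle (t);
  v2df b = pick_scalar (t);
  assert (a[0] == b[0]);
  assert (a[1] == b[1]);
}
```

Comparing the output of `gcc -O2 -S` for pick_shuffle and pick_scalar is one way to see whether a given compiler version already folds the first form into the second.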