https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96481
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2020-08-05
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
             Blocks|                            |53947

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Yes, this is a known limitation in that for basic-block SLP we do not perform
if-conversion.  Instead the basic-block SLP code sees

  <bb 2> [local count: 1073741824]:
  _1 = *pd_17(D);
  _2 = *pc_19(D);
  _3 = *pb_20(D);
  _4 = *pa_21(D);
  if (_3 < _4)
    goto <bb 11>; [50.00%]
  else
    goto <bb 3>; [50.00%]

  <bb 11> [local count: 536870912]:
  goto <bb 4>; [100.00%]

  <bb 3> [local count: 536870913]:

  <bb 4> [local count: 1073741824]:
  # iftmp.20_23 = PHI <_1(3), _2(11)>
  *dst_22(D) = iftmp.20_23;
  _5 = MEM[(const unsigned int *)pd_17(D) + 4B];
  _6 = MEM[(const unsigned int *)pc_19(D) + 4B];
  _7 = MEM[(const unsigned int *)pb_20(D) + 4B];
  _8 = MEM[(const unsigned int *)pa_21(D) + 4B];
...

which also rips apart the memory groups (we're slowly relaxing another
limitation that the basic-block SLP code operates on a single basic-block
at a time but for data refs this restriction will prevail).


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
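
For readers following the comment above, here is a rough C sketch of the kind of source that could produce the quoted GIMPLE, next to a hand if-converted variant. It is reconstructed from the dump only and is not the reporter's testcase: the function names, the helper select_u32, and the lane count of four are all assumptions made for illustration. The second function shows what if-conversion computes (a per-lane select with no control flow); whether a particular GCC release then applies basic-block SLP to it is not something this sketch claims.

```c
#include <stdint.h>

/* Branchy form (hypothetical reconstruction): each lane stores pc[i] when
   pb[i] < pa[i] and pd[i] otherwise.  A conditional expression like this is
   what typically shows up as an "iftmp" PHI in the GIMPLE dump. */
void select_branchy(uint32_t *dst, const uint32_t *pa, const uint32_t *pb,
                    const uint32_t *pc, const uint32_t *pd)
{
    dst[0] = pb[0] < pa[0] ? pc[0] : pd[0];
    dst[1] = pb[1] < pa[1] ? pc[1] : pd[1];
    dst[2] = pb[2] < pa[2] ? pc[2] : pd[2];
    dst[3] = pb[3] < pa[3] ? pc[3] : pd[3];
}

/* Hypothetical helper: branch-free select.  cond must be 0 or 1;
   0u - 1u wraps to an all-ones mask, 0u - 0u stays zero. */
static inline uint32_t select_u32(uint32_t cond, uint32_t if_true,
                                  uint32_t if_false)
{
    uint32_t mask = 0u - cond;
    return (if_true & mask) | (if_false & ~mask);
}

/* Hand if-converted form: the same per-lane result expressed as
   straight-line code, so every lane stays in one basic block. */
void select_branchless(uint32_t *dst, const uint32_t *pa, const uint32_t *pb,
                       const uint32_t *pc, const uint32_t *pd)
{
    dst[0] = select_u32(pb[0] < pa[0], pc[0], pd[0]);
    dst[1] = select_u32(pb[1] < pa[1], pc[1], pd[1]);
    dst[2] = select_u32(pb[2] < pa[2], pc[2], pd[2]);
    dst[3] = select_u32(pb[3] < pa[3], pc[3], pd[3]);
}
```

Both functions load all four inputs of a lane unconditionally, which matches the dump, where the loads in <bb 2> happen before the branch.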