https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102054

            Bug ID: 102054
           Summary: slightly worse code as PRE on some code got disabled for
                    loop vectorization
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: linkw at gcc dot gnu.org
  Target Milestone: ---

This test case is reduced from FastBoard.cpp in the SPEC2017 benchmark
541.leela_r. I found it while investigating the O2 vectorization degradation
on a SPEC2017 run.

The issue is similar to PR100794, but that one only applies at O2 and was
fixed by re-running pcom at O2. This one applies to O3 vectorization as well.

TEST CASE:

class FastBoard {
public:
  static const int NBR_SHIFT = 4;
  static const int MAXBOARDSIZE = 19;
  static const int MAXSQ = ((MAXBOARDSIZE + 2) * (MAXBOARDSIZE + 2));
  enum square_t { BLACK = 0, WHITE = 1, EMPTY = 2, INVAL = 3 };

  bool self_atari(int color, int vertex);

protected:
  int m_dirs[4];
  square_t m_square[MAXSQ];
  int nbr_libs[20];
};

bool FastBoard::self_atari(int color, int vertex) {
  int nbr_libs_cnt = 0;
  nbr_libs[nbr_libs_cnt++] = vertex;

  for (int k = 0; k < 20; k++) {
    int ai = vertex + m_dirs[k];
    if (m_square[ai] == FastBoard::EMPTY) {
      bool found = false;
      for (int i = 0; i < nbr_libs_cnt; i++) {
        if (nbr_libs[i] == ai) {
          found = true;
          break;
        }
      }
      if (!found) {
        if (nbr_libs_cnt > 1)
          return false;
        nbr_libs[nbr_libs_cnt++] = ai;
      }
    }
  }
  return true;
}

Options: -mcpu=power9 -Ofast (or -O2 -ftree-vectorize) etc.

With -fno-tree-loop-vectorize, the value vertex_11 is passed down for
nbr_libs[0]:

  <bb 3> [local count: 1014686026]:
  # prephitmp_26 = PHI <pretmp_28(5), vertex_11(D)(10)>
  # ivtmp.17_27 = PHI <ivtmp.17_3(5), ivtmp.17_8(10)>
  if (ai_15 == prephitmp_26)
    goto <bb 8>; [5.50%]
  else
    goto <bb 4>; [94.50%]

  <bb 4> [local count: 958878295]:
  if (ivtmp.17_27 != _31)
    goto <bb 5>; [93.84%]
  else
    goto <bb 11>; [6.16%]

  <bb 5> [local count: 899822494]:
  ivtmp.17_3 = ivtmp.17_27 + 4;
  _21 = (void *) ivtmp.17_3;
  pretmp_28 = MEM[(int *)_21];
  goto <bb 3>; [100.00%]

Without -fno-tree-loop-vectorize, the IR below is generated instead. It always
performs the load before the comparison with ai:

  <bb 4> [local count: 1014686026]:
  # ivtmp.12_27 = PHI <ivtmp.12_28(5), ivtmp.12_26(3)>
  ivtmp.12_28 = ivtmp.12_27 + 4;
  _22 = (void *) ivtmp.12_28;
  _3 = MEM[(int *)_22];
  if (_3 == ai_15)
    goto <bb 8>; [5.50%]
  else
    goto <bb 5>; [94.50%]

  <bb 5> [local count: 958878295]:
  if (ivtmp.12_28 != _30)
    goto <bb 4>; [93.84%]
  else
    goto <bb 10>; [6.16%]
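
For readers less used to GIMPLE dumps, the difference between the two loop shapes can be approximated at source level. The following is only a hand-written sketch, not compiler output: the function names found_load_first and found_forwarded are hypothetical, and the sketch assumes the list holds at least one element whose first entry equals vertex, as in the test case. The first function loads an element before every comparison. The second forwards the known value of nbr_libs[0] into the first comparison and loads the next element only when another iteration is needed, which roughly mirrors the prephitmp/pretmp pattern above.

```cpp
#include <cassert>

// Load-first shape: every iteration loads libs[i] before comparing with ai.
bool found_load_first(const int* libs, int cnt, int ai) {
    for (int i = 0; i < cnt; i++) {
        int cur = libs[i];           // load always precedes the comparison
        if (cur == ai) return true;
    }
    return false;
}

// Forwarded shape: the caller knows libs[0] == vertex, so the first
// comparison needs no load. Later elements are loaded only after the
// comparison failed and the bounds check says another iteration follows.
// Assumption for this sketch: cnt >= 1 and libs[0] == vertex.
bool found_forwarded(const int* libs, int cnt, int vertex, int ai) {
    assert(cnt >= 1 && libs[0] == vertex);
    int cur = vertex;                // forwarded value, no memory access
    int i = 0;
    for (;;) {
        if (cur == ai) return true;
        if (++i == cnt) return false;
        cur = libs[i];               // load for the next comparison only
    }
}
```

Both functions return the same result for any valid input. The second shape saves the first load and keeps the loads off the path that exits on a match.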