On Wed, Nov 14, 2012 at 06:24:22PM +0400, Yuri Rumyantsev wrote:
> I looked through your patch that looks good enough although it likely
> must be improved to get better vectorization for AVX-2. One general
> issue is that you introduced a new pass to undo if-conversion leading
> to one restriction on if-conversion that prohibited to chain the
> conditions:
> /* Avoid creating mask loads/stores if we'd need to chain
> conditions, to make it easier to undo them. */
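
For readers following along, the kind of transformation being discussed can be sketched in plain C. This is only an illustration under assumptions, not code from the patch: `mask_load` and `mask_store` are hypothetical scalar stand-ins for masked memory operations, and the function names are made up. The first function is a loop with a single, unchained condition; the second is roughly what it looks like once if-conversion has replaced the branch with a mask.

```c
#include <stddef.h>

/* Original loop: b[i] is loaded and a[i] is stored only when c[i] is
   nonzero, so the loop body contains control flow. */
void cond_add_branchy(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (c[i])
            a[i] = b[i] + 1;
}

/* Hypothetical scalar models of masked memory operations: memory is
   touched only when the mask is set. */
static int mask_load(const int *p, int mask)
{
    return mask ? *p : 0;
}

static void mask_store(int *p, int mask, int value)
{
    if (mask)
        *p = value;
}

/* If-converted form: a straight-line body in which the condition has
   become a mask that guards the load and the store. */
void cond_add_ifconverted(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int mask = c[i] != 0;
        int t = mask_load(&b[i], mask);
        mask_store(&a[i], mask, t + 1);
    }
}
```

Undoing the conversion means getting from the second form back to something like the first, which is simpler when every masked operation is guarded by one plain condition rather than a chain of them.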

You can easily remove the condition from the patch for testing; the undo
pass will do something, but even for a single condition the current undo
pass is not very good. It handles each masked load or store separately,
while it would perhaps be better to just attempt to undo what
if-conversion did for the whole bb in that case, i.e. try to find the
basic block boundaries (basically, the old PHI nodes should all end up
being conditional moves at the start of the join bb, and masked
loads/stores for the same condition before that, plus all statements
that feed only one side of the cond_exprs, could be moved into a bb).
Have you found loops where the current patch (or even one with the
above-mentioned extra condition removed) results in vectorization that
actually helps runtime significantly (the testcases I've tried with AVX
on a SandyBridge CPU were usually a wash)?

> I assume that you can do it without undo but simply creating a copy of
> handled loop and restoring/deletion it in case of fail or success
> (such approach is used by many compilers for software-pipelining loops
> aka modulo scheduling). Is it difficult to implement in gcc framework
> or you simply missed it. Also the current implementation is base on
> if-conversion although predication is more preferable for it and can
> allow us to vectorize more loop patterns. What is your opinion?

The problem is that this saving/restoring would be across multiple
passes, and the saved copy doesn't need to be in the original set of
basic blocks: the vectorizer often copies the original loop into another
place for the unvectorized alternative.

Another possibility is to do radical changes to how the vectorizer works
wrt. if-conversion and pattern recognition.
Instead of doing if-conversion in the IL and pattern recognition by
adding stmt sequences to stmt info of individual original statements, we
could perhaps on any such change duplicate the whole loop somewhere on
the side (with new SSA_NAMEs in it as the current pattern recognizer
has, and with some mapping from new to old SSA_NAMEs that are used after
the loop) and perform vectorization analysis on the loop on the side if
it exists (but put the vectorized stmts later on into the original
loop).

	Jakub