Re: [RFC PATCH] Masked load/store vectorization

Jakub Jelinek Thu, 15 Nov 2012 02:26:03 -0800

On Wed, Nov 14, 2012 at 06:24:22PM +0400, Yuri Rumyantsev wrote:
> I looked through your patch that looks good enough although it likely
> must be improved to get better vectorization for AVX-2. One general
> issue is that you introduced a new pass to undo if-conversion leading
> to one restriction on if-conversion that prohibited to chain the
> conditions:
>  /* Avoid creating mask loads/stores if we'd need to chain
>     conditions, to make it easier to undo them.  */


You can easily remove that condition from the patch for testing; the undo
pass will still do something.  But even for a single condition the current
undo pass is not very good: it handles each masked load or store separately.
It would perhaps be better to just attempt to undo what if-conversion did for
the whole bb in that case and try to find the basic block boundaries
(basically, the old PHI nodes should all end up being conditional moves at
the start of the join bb, and the masked loads/stores for the same condition
before that, plus all statements that feed only one side of the cond_exprs,
could be moved into a bb).
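
To make the shape of the transformation concrete, here is a rough scalar
sketch in C.  It is not GCC's internal representation: mask_load and
mask_store are made-up helpers standing in for masked loads/stores, and the
function names are hypothetical.  cond_loop is the kind of loop in question,
ifcvt_loop is roughly what it looks like once if-conversion has flattened the
branch, and undoing if-conversion amounts to getting back from the second
form to the first.

```c
#include <stddef.h>

/* Original loop: both the load from b[i] and the store to a[i]
   happen only when the condition holds.  */
void cond_loop(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (c[i] > 0)
            a[i] = b[i] + 1;
}

/* Hypothetical scalar stand-in for a masked load: when the mask is
   false the memory is not touched and 0 is produced instead.  */
static int mask_load(const int *p, int mask)
{
    return mask ? *p : 0;
}

/* Hypothetical scalar stand-in for a masked store: when the mask is
   false the destination is left unchanged.  */
static void mask_store(int *p, int mask, int v)
{
    if (mask)
        *p = v;
}

/* Roughly the if-converted form: a single straight-line loop body
   in which every conditional memory access carries the same mask.  */
void ifcvt_loop(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int mask = c[i] > 0;
        int t = mask_load(&b[i], mask);
        mask_store(&a[i], mask, t + 1);
    }
}
```

In this sketch both accesses share one mask, which is the single-condition
case; everything that feeds only the masked store (here the t + 1) is what
could be moved back under the branch.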

Have you found loops where the current patch (or even one with the
above-mentioned extra condition removed) results in vectorization that
actually helps runtime significantly?  The testcases I've tried with AVX on
a SandyBridge CPU were usually a wash.

> I assume that you can do it without undo but simply creating a copy of
> handled loop and restoring/deletion it in case of fail or success
> (such approach is used by many compilers for software-pipelining loops
> aka modulo scheduling). Is it difficult to implement in gcc framework
> or you simply missed it. Also the current implementation is base on
> if-conversion although predication is more preferable for it and can
> allow us to vectorize more loop patterns. What is your opinion?

The problem is that this saving/restoring would be across multiple passes,
and the loop doesn't necessarily stay in the original set of basic blocks:
the vectorizer often copies the original loop into another place for the
unvectorized alternative.

Another possibility is to do radical changes to how the vectorizer works
wrt. if-conversion and pattern recognition.  Instead of doing if-conversion
in the IL and pattern recognition by adding stmt sequences to stmt info of
individual original statements, we could perhaps on any such change
duplicate the whole loop somewhere on the side (with new SSA_NAMEs in it
as the current pattern recognizer has, and with some mapping from new to old
SSA_NAMEs that are used after the loop) and perform vectorization analysis
on the loop on the side if it exists (but put the vectorized stmts later on
into the original loop).

        Jakub
