Hi all,
Up until now the AMD GCN port has been using exclusively 64-lane vectors
with masking for smaller sizes.
This works quite well, where it works, but there remain many test cases
(and no doubt some real code) that refuse to vectorize because the
number of iterations (or the SLP equivalent) is smaller than the
vectorization factor.
My question is: are there any plans to fill in these missing cases? Or,
is relying on masking alone just not feasible?
I've dabbled in the vectorizer code, of course, but I can't claim to
have much of a feel for it as a whole. I may be able to help with the
effort in future, but for now I'm struggling to judge what's even needed.
For GCN the vectorization is quite important as scalar code is slow, and
adding vectorization is usually cheap. The architecture can do any
vector size between 1 and 64 lanes (not just powers of two), so being
smaller than the vectorization factor really ought not to be a problem.
To fix this, I've been considering adding extra vector sizes (probably
2, 4, 8, 16, 32) where the backend would take care of the masking.
Aside from reductions and permutations the changes would be somewhat
trivial, but the explosion in the number of generated patterns would be
enormous, and it still wouldn't allow arbitrary-size vectors.
Thank you for your time; I'm trying to decide where my efforts should lie.
Andrew