http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-24 08:39:16 UTC ---
Not exactly, -fif-convert-loop-stores is apparently a language changing option: only a subset of valid C/C++ programs is valid with it.

With V*GATHER* insns, and, as I found out during the weekend, with VMASKMOVP[SD] and VPMASKMOV[DQ] instructions too, we can handle both conditional loads and conditional stores. So testcases like:

float a[N], b[N], c[N], d[N], e[N], g[N];

void f6 (void)
{
  int i;
  for (i = 0; i < N; i++)
    e[i] = a[i] < b[i] ? c[i] : d[i];
}

void f7 (float *p, float *q)
{
  int i;
  for (i = 0; i < N; i++)
    e[i] = a[i] < b[i] ? p[i] : q[i];
}

void f8 (void)
{
  int i;
  for (i = 0; i < N; i++)
    {
      float f = c[i] + d[i];
      if (a[i] < b[i])
        e[i] = f;
    }
}

void f9 (void)
{
  int i;
  for (i = 0; i < N; i++)
    {
      float f = c[i] * d[i];
      if (a[i] < b[i])
        e[i] = f;
      else
        g[i] = f;
    }
}

should be vectorizable (and even with -mavx). I haven't checked whether any other CPU (PPC, ARM, ...) has anything similar.

In fact, f6 ought to be vectorizable always: we could easily find out that for any i that can appear in the loop (0 through 999) neither c[i] nor d[i] will trap or fault. The question is whether the same is true for extern float c[N]; instead (e.g. if the actual definition were then float c[N / 2];, though I'd hope that is invalid C). But for f7 you already can't know whether p and q are valid pointers at all, whether they are correctly aligned, or whether e.g. p[i] or q[i] points beyond the end of an mmapped region. So f7/f8/f9 are only vectorizable using these v*maskmov* instructions (or f7 using v*gather*, but that would be unnecessary additional overhead). I've verified that SNB CPUs don't require any alignment and don't fault on completely invalid addresses with a zero mask.

The question is how to represent this in the IL. IMHO it should be something that is either present solely during the vectorization (i.e. a pattern recognizer like thing), or something that we convert the IL into right before the vectorizer (e.g. during ifcvt) and then convert back to the original multiple BBs IL, either at the end of the vectorizer or in a pass right after the vectorizer. For the conditional loads we could perhaps represent them by COND_EXPRs with some flag on the gimple which would allow memory instead of SSA_NAMEs in one or both of the then/else operands, or by a new tree code; for conditional stores we'd need a new tree code.
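
To make the masked load/store idea concrete, here is a minimal portable C sketch of how f7 and f8 could look after such a transformation. It is only an illustration, not what GCC emits and not a proposal for the IL. The names masked_load, masked_store, f7_masked, f8_masked and VF are hypothetical, the arrays are passed as parameters instead of being globals, and the inner lane loops merely emulate what a single VMASKMOVPS would do (the real instruction takes its mask from the sign bit of each element).

```c
#include <stddef.h>

#define VF 8 /* assumed lanes per vector: 8 floats in a 256-bit register */

/* Hypothetical helper emulating a masked vector load: lanes with a zero
   mask read as 0.0f and their addresses are never dereferenced.  */
static void masked_load(float dst[VF], const float *src, const int mask[VF])
{
    for (int l = 0; l < VF; l++)
        dst[l] = mask[l] ? src[l] : 0.0f;
}

/* Hypothetical helper emulating a masked vector store: lanes with a zero
   mask are left untouched in memory.  */
static void masked_store(float *dst, const int mask[VF], const float val[VF])
{
    for (int l = 0; l < VF; l++)
        if (mask[l])
            dst[l] = val[l];
}

/* f7: conditional loads.  p[i] is read only where a[i] < b[i],
   q[i] only where it is not.  */
void f7_masked(const float *a, const float *b, const float *p,
               const float *q, float *e, size_t n)
{
    size_t i = 0;
    for (; i + VF <= n; i += VF) {
        int m[VF], nm[VF];
        float vp[VF], vq[VF];
        for (int l = 0; l < VF; l++) {
            m[l] = a[i + l] < b[i + l];
            nm[l] = !m[l];
        }
        masked_load(vp, p + i, m);
        masked_load(vq, q + i, nm);
        for (int l = 0; l < VF; l++)   /* blend the two partial vectors */
            e[i + l] = m[l] ? vp[l] : vq[l];
    }
    for (; i < n; i++)                 /* scalar tail */
        e[i] = a[i] < b[i] ? p[i] : q[i];
}

/* f8: conditional store.  e[i] is written only where a[i] < b[i].  */
void f8_masked(const float *a, const float *b, const float *c,
               const float *d, float *e, size_t n)
{
    size_t i = 0;
    for (; i + VF <= n; i += VF) {
        int m[VF];
        float f[VF];
        for (int l = 0; l < VF; l++) {
            m[l] = a[i + l] < b[i + l];
            f[l] = c[i + l] + d[i + l];
        }
        masked_store(e + i, m, f);
    }
    for (; i < n; i++) {               /* scalar tail */
        float f = c[i] + d[i];
        if (a[i] < b[i])
            e[i] = f;
    }
}
```

The point of the sketch is that no lane ever touches an address the scalar loop would not have touched, which is why this form stays valid for arbitrary p and q, whereas an unconditional load of both followed by a blend would not.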