On 06/09/2014 03:13 AM, Evgeny Stupachenko wrote:
> +  /* First we apply one operand permutation to the part where
> +     elements stay not in their respective lanes.  */
> +  dcopy = *d;
> +  if (which == 2)
> +    dcopy.op0 = dcopy.op1 = d->op1;
> +  else
> +    dcopy.op0 = dcopy.op1 = d->op0;
> +  dcopy.one_operand_p = true;
> +
> +  for (i = 0; i < nelt; ++i)
> +    {
> +      unsigned e = d->perm[i];
> +      if (which == 2)
> +        dcopy.perm[i] = ((e >= nelt) ? (e - nelt) : e);

This is wrong for which == 1.  For both cases this simplifies to

  dcopy.perm[i] = e & (nelt - 1);

> +
> +  for (i = 0; i < nelt; ++i)
> +    {
> +      unsigned e = d->perm[i];
> +      if (which == 2)
> +        dcopy1.perm[i] = ((e >= nelt) ? (nelt + i) : e);
> +      else
> +        dcopy1.perm[i] = ((e < nelt) ? i : e);
> +    }

This is known to be a blend, so you know the value of E.  Simplifies to

  dcopy1.perm[i] = (e >= nelt ? nelt + i : i);


r~
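
For anyone who wants to check the two simplifications above mechanically, here is a small standalone sketch. It is not part of the patch. It assumes nelt is a power of two, that every selector e lies in [0, 2*nelt), and that "blend" means position i selects either element i of op0 (e == i) or element i of op1 (e == nelt + i). All function names are hypothetical. Only the which == 2 arm of the first hunk is quoted, so only that arm is compared against the mask form.

```c
#include <assert.h>

/* First hunk, which == 2 arm exactly as quoted in the patch. */
static unsigned first_quoted(unsigned e, unsigned nelt)
{
  return (e >= nelt) ? (e - nelt) : e;
}

/* Suggested replacement; relies on nelt being a power of two. */
static unsigned first_masked(unsigned e, unsigned nelt)
{
  return e & (nelt - 1);
}

/* Second hunk, both arms as quoted in the patch. */
static unsigned second_quoted(int which, unsigned e, unsigned i, unsigned nelt)
{
  if (which == 2)
    return (e >= nelt) ? (nelt + i) : e;
  else
    return (e < nelt) ? i : e;
}

/* Suggested replacement; relies on e being either i or nelt + i. */
static unsigned second_simplified(unsigned e, unsigned i, unsigned nelt)
{
  return e >= nelt ? nelt + i : i;
}

/* Returns 1 if the quoted and simplified forms agree on every input
   allowed by the assumptions stated above, 0 otherwise. */
static int simplifications_agree(void)
{
  unsigned nelt, e, i;
  int which, k;

  for (nelt = 2; nelt <= 64; nelt *= 2)
    {
      /* Mask form: every selector in [0, 2*nelt). */
      for (e = 0; e < 2 * nelt; ++e)
        if (first_quoted(e, nelt) != first_masked(e, nelt))
          return 0;

      /* Blend form: at position i the selector is i or nelt + i. */
      for (i = 0; i < nelt; ++i)
        for (which = 1; which <= 2; ++which)
          for (k = 0; k < 2; ++k)
            {
              e = k ? nelt + i : i;
              if (second_quoted(which, e, i, nelt)
                  != second_simplified(e, i, nelt))
                return 0;
            }
    }
  return 1;
}

/* Hypothetical convenience wrapper: aborts if the forms disagree. */
static void check_simplifications(void)
{
  assert(simplifications_agree());
}
```

Calling check_simplifications () from a scratch main is enough to run the comparison. Note that the blend equivalence only holds under the stated restriction on e; for an arbitrary selector the two forms of the second hunk differ.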