On 06/09/2014 03:13 AM, Evgeny Stupachenko wrote:
> + /* First we apply one operand permutation to the part where
> + elements stay not in their respective lanes. */
> + dcopy = *d;
> + if (which == 2)
> + dcopy.op0 = dcopy.op1 = d->op1;
> + else
> + dcopy.op0 = dcopy.op1 = d->op0;
> + dcopy.one_operand_p = true;
> +
> + for (i = 0; i < nelt; ++i)
> + {
> + unsigned e = d->perm[i];
> + if (which == 2)
> + dcopy.perm[i] = ((e >= nelt) ? (e - nelt) : e);
This is wrong for which == 1. For both cases this simplifies to
dcopy.perm[i] = e & (nelt - 1);
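A quick standalone check of the mask identity, as a sketch only. It assumes nelt is a power of two and that every selector e lies in [0, 2*nelt). The helper names (fold_explicit, fold_masked, fold_mismatches) are made up for illustration and are not part of the patch.

```c
#include <assert.h>

/* Hypothetical helper: the explicit form quoted above for which == 2.  */
static unsigned
fold_explicit (unsigned e, unsigned nelt)
{
  return e >= nelt ? e - nelt : e;
}

/* Hypothetical helper: the masked form suggested in the review.  */
static unsigned
fold_masked (unsigned e, unsigned nelt)
{
  return e & (nelt - 1);
}

/* Count the selectors in [0, 2*nelt) on which the two forms disagree.
   Assumes NELT is a nonzero power of two.  */
static unsigned
fold_mismatches (unsigned nelt)
{
  unsigned e, bad = 0;

  assert (nelt != 0 && (nelt & (nelt - 1)) == 0);
  for (e = 0; e < 2 * nelt; ++e)
    if (fold_explicit (e, nelt) != fold_masked (e, nelt))
      ++bad;
  return bad;
}
```

The mask also leaves selectors below nelt unchanged, so the same expression covers elements taken from either operand without a branch on which.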
> +
> + for (i = 0; i < nelt; ++i)
> + {
> + unsigned e = d->perm[i];
> + if (which == 2)
> + dcopy1.perm[i] = ((e >= nelt) ? (nelt + i) : e);
> + else
> + dcopy1.perm[i] = ((e < nelt) ? i : e);
> + }
This is known to be a blend, so you know the value of E: it is either i or nelt + i.
This simplifies to
dcopy1.perm[i] = (e >= nelt ? nelt + i : i);
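The blend simplification can be checked the same way. This is a sketch under the assumption that, for a blend, the selector at position i is either i or nelt + i. The function names are hypothetical, and blend_original only mirrors the two quoted branches.

```c
#include <assert.h>

/* Hypothetical helper: the two branches from the quoted patch.  */
static unsigned
blend_original (unsigned e, unsigned i, unsigned nelt, int which)
{
  if (which == 2)
    return e >= nelt ? nelt + i : e;
  else
    return e < nelt ? i : e;
}

/* Hypothetical helper: the single expression suggested in the review.  */
static unsigned
blend_simplified (unsigned e, unsigned i, unsigned nelt)
{
  return e >= nelt ? nelt + i : i;
}

/* Count disagreements over every blend selector (e == i or e == nelt + i)
   and both values of WHICH.  */
static unsigned
blend_mismatches (unsigned nelt)
{
  unsigned i, hi, bad = 0;
  int which;

  for (i = 0; i < nelt; ++i)
    for (hi = 0; hi < 2; ++hi)
      {
        unsigned e = hi ? nelt + i : i;

        for (which = 1; which <= 2; ++which)
          if (blend_original (e, i, nelt, which)
              != blend_simplified (e, i, nelt))
            ++bad;
      }
  return bad;
}
```

Under that assumption both branches agree with the single expression, so the test on which is not needed in the loop.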
r~