https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> So for set with T == int and N == 32 we could generate
>
> vmovd %edi, %xmm1
> vpbroadcastd %xmm1, %ymm1
> vpcmpeqd .LC0(%rip), %ymm1, %ymm2
> vpblendvb %ymm2, %ymm1, %ymm0, %ymm0
> ret
>
> .LC0:
> .long 0
> .long 1
> .long 2
> .long 3
> .long 4
> .long 5
> .long 6
> .long 7
>
> aka, with GCC generic vectors
>
> V setg (V v, int idx, T val)
> {
> V valv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
> V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == valv);
> v = (v & ~mask) | (valv & mask);
> return v;
> }
Botched this up; the corrected version is
V setg (V v, int idx, T val)
{
  V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V valv = (V){val, val, val, val, val, val, val, val};
  V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
  v = (v & ~mask) | (valv & mask);
  return v;
}
which produces
        vmovd   %edi, %xmm1
        vmovd   %esi, %xmm2
        vpbroadcastd    %xmm1, %ymm1
        vpbroadcastd    %xmm2, %ymm2
        vpcmpeqd        .LC0(%rip), %ymm1, %ymm1
        vpblendvb       %ymm1, %ymm2, %ymm0, %ymm0
with AVX2, so one more vmovd/vpbroadcastd (as expected). With -mavx512vl
this even becomes
        vpbroadcastd    %edi, %ymm1
        vpcmpd  $0, .LC0(%rip), %ymm1, %k1
        vpbroadcastd    %esi, %ymm0{%k1}
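
To reproduce the set case, a self-contained sketch of the corrected function
follows. The typedefs for T and V are an assumption (T == int, eight lanes in
a 32-byte vector, which is what the ymm code above implies); the function body
is the one from this comment.

```c
/* Assumed typedefs: the comment does not spell out T and V. */
typedef int T;
typedef T V __attribute__ ((vector_size (32)));   /* 8 x int */

/* Set lane IDX of V to VAL without going through memory:
   broadcast the index and the value, compare the index against
   the constant {0, 1, ..., 7} and blend under the resulting mask.  */
V setg (V v, int idx, T val)
{
  V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V valv = (V){val, val, val, val, val, val, val, val};
  /* Lanes compare to all-ones (-1) where equal, 0 elsewhere.  */
  V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
  v = (v & ~mask) | (valv & mask);
  return v;
}
```

Compiling this with something like gcc -O2 -mavx2 should allow comparing
against the assembly quoted above.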
For the extract case we really need to compute a variable permute mask,
which looks harder and possibly more expensive than the spill/load,
so the set case looks more important to tackle. (Tackling it will still
eventually improve initial RTL generation by avoiding stack assignments
for locals.)
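
As a rough illustration of what "variable permute mask" means for the extract
case, the sketch below uses GCC's __builtin_shuffle with a broadcast index so
that the requested lane ends up in lane 0. The name extractg and the typedefs
are hypothetical, chosen to mirror setg; this is not a claim about what the
expander should emit, only one way to express the operation with generic
vectors (GCC-specific builtin).

```c
/* Assumed typedefs, mirroring the set example (T == int, 8 lanes).  */
typedef int T;
typedef T V __attribute__ ((vector_size (32)));

/* Hypothetical counterpart to setg: return lane IDX of V.
   The permute selector is the broadcast index, so every result
   lane holds v[idx]; lane 0 is then read with a constant index.
   __builtin_shuffle takes selector elements modulo the lane count.  */
T extractg (V v, int idx)
{
  V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V perm = __builtin_shuffle (v, idxv);
  return perm[0];
}
```

Whether this beats the spill/load depends on how cheaply the target can do the
variable permute, which is the cost concern raised above.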
> There's ongoing patch iteration on the ml adding variable index vec_set
> expanders for powerpc (and the related middle-end changes). The question
> is whether optabs can try many things or the target should have the choice
> (probably better).
>
> Eventually there's a more efficient way to generate {0, 1, 2, 3...}.