https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> So for set with T == int and N == 32 we could generate
>
>         vmovd   %edi, %xmm1
>         vpbroadcastd    %xmm1, %ymm1
>         vpcmpeqd        .LC0(%rip), %ymm1, %ymm2
>         vpblendvb       %ymm2, %ymm1, %ymm0, %ymm0
>         ret
>
> .LC0:
>         .long   0
>         .long   1
>         .long   2
>         .long   3
>         .long   4
>         .long   5
>         .long   6
>         .long   7
>
> aka, with GCC generic vectors
>
> V setg (V v, int idx, T val)
> {
>   V valv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
>   V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == valv);
>   v = (v & ~mask) | (valv & mask);
>   return v;
> }

Botched this up, corrected is

typedef int T;                                   /* T == int */
typedef T V __attribute__ ((vector_size (32)));  /* N == 32 bytes, 8 lanes */

V setg (V v, int idx, T val)
{
  V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V valv = (V){val, val, val, val, val, val, val, val};
  V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
  v = (v & ~mask) | (valv & mask);
  return v;
}

which produces

        vmovd   %edi, %xmm1
        vmovd   %esi, %xmm2
        vpbroadcastd    %xmm1, %ymm1
        vpbroadcastd    %xmm2, %ymm2
        vpcmpeqd        .LC0(%rip), %ymm1, %ymm1
        vpblendvb       %ymm1, %ymm2, %ymm0, %ymm0

with AVX2, so one more vmovd/vpbroadcastd (as expected). With -mavx512vl
this even becomes

        vpbroadcastd    %edi, %ymm1
        vpcmpd  $0, .LC0(%rip), %ymm1, %k1
        vpbroadcastd    %esi, %ymm0{%k1}

For the extract case we really need to compute a variable permute mask, which
looks harder and possibly more expensive than the spill/load, so the set case
looks more important to tackle (tackling it will still eventually improve
initial RTL generation by avoiding stack assignments for locals).

> There's ongoing patch iteration on the ml adding variable index vec_set
> expanders for powerpc (and the related middle-end changes). The question
> is whether optabs can try many things or the target should have the choice
> (probably better).
>
> Eventually there's a more efficient way to generate {0, 1, 2, 3...}.
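For the extract case, a minimal sketch of the variable-permute idea with GCC generic vectors, assuming the same int / 32-byte `T` and `V` typedefs as above. `getg` is a hypothetical name chosen to mirror `setg`; the permute uses GCC's `__builtin_shuffle`, which treats mask elements modulo the lane count in the one-operand form. This only illustrates computing the permute mask (a broadcast of `idx`), not what GCC should emit; whether it beats the spill/load is the open question above.

```c
typedef int T;
typedef T V __attribute__ ((vector_size (32)));   /* 8 x int, as for N == 32 */

/* Hypothetical extract: broadcasting idx gives a permute mask whose every
   lane selects element idx, so lane 0 of the shuffled vector holds it.
   __builtin_shuffle takes mask elements modulo 8 here, so out-of-range
   idx values wrap instead of reading outside the vector.  */
T getg (V v, int idx)
{
  V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V perm = __builtin_shuffle (v, idxv);
  return perm[0];
}
```

On AVX2 a single-operand 8 x int shuffle with a variable mask may map to one `vpermd`, but that is an expectation, not something the comment measured.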
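A small self-check, assuming the same typedefs, that compares the corrected mask-and-blend `setg` with the plain variable-index subscript store `v[idx] = val`, the form this bug is about. `setg_ref` and `setg_matches_ref` are hypothetical helper names for illustration only.

```c
typedef int T;
typedef T V __attribute__ ((vector_size (32)));   /* 8 x int */

/* Corrected mask-and-blend insert as given in the comment.  */
V setg (V v, int idx, T val)
{
  V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V valv = (V){val, val, val, val, val, val, val, val};
  /* Vector == yields -1 (all bits set) in lane idx and 0 elsewhere.  */
  V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
  v = (v & ~mask) | (valv & mask);
  return v;
}

/* Reference: variable-index subscript store (GCC vector extension).  */
V setg_ref (V v, int idx, T val)
{
  v[idx] = val;
  return v;
}

/* Returns 1 if setg and setg_ref agree for every in-range index.  */
int setg_matches_ref (V v, T val)
{
  for (int idx = 0; idx < 8; idx++)
    {
      V a = setg (v, idx, val);
      V b = setg_ref (v, idx, val);
      for (int i = 0; i < 8; i++)
        if (a[i] != b[i])
          return 0;
    }
  return 1;
}
```

One behavioral difference worth noting: for `idx` outside 0..7 the compare mask is all zero, so `setg` returns `v` unchanged, whereas an out-of-range `v[idx]` store has no such guarantee.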