On Thu, Sep 24, 2020 at 3:27 PM Richard Biener
<[email protected]> wrote:
>
> On Thu, Sep 24, 2020 at 10:21 AM xionghu luo <[email protected]> wrote:
> >
> > Hi Segher,
> >
> > The attached two patches are updated and split from
> > "[PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple
> > [PR79251]"
> > per your comments.
> >
> >
> > [PATCH v3 2/3] rs6000: Fix lvsl&lvsr mode and change
> > rs6000_expand_vector_set param
> >
> > This one is preparation work: it fixes the lvsl&lvsr arg mode and makes
> > the rs6000_expand_vector_set parameter support both constant and
> > variable index input.
> >
> >
> > [PATCH v3 2/3] rs6000: Support variable insert and Expand vec_insert in
> > expander [PR79251]
> >
> > This one builds VIEW_CONVERT_EXPR and expands the IFN VEC_SET to fast.
>
> I'll just comment that
>
> xxperm 34,34,33
> xxinsertw 34,0,12
> xxperm 34,34,32

Btw, on x86_64 the following produces sth reasonable:

#define N 32
typedef int T;
typedef T V __attribute__((vector_size(N)));
V setg (V v, int idx, T val)
{
  V valv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == valv);
  v = (v & ~mask) | (valv & mask);
  return v;
}

        vmovd   %edi, %xmm1
        vpbroadcastd    %xmm1, %ymm1
        vpcmpeqd        .LC0(%rip), %ymm1, %ymm2
        vpblendvb       %ymm2, %ymm1, %ymm0, %ymm0
        ret

I'm quite sure you could do sth similar on power?

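Note that setg as written blends the broadcast idx into v and never uses
val, which is also why the assembly shows only one broadcast. A minimal
sketch that inserts val instead, assuming the same GNU C vector extensions;
the names setg_val and idxv are made up for illustration, and the generated
code will need one more broadcast than the listing above:

```c
#define N 32
typedef int T;
typedef T V __attribute__((vector_size(N)));

/* Return a copy of v with element idx replaced by val.
   An idx outside 0..7 matches no lane and leaves v unchanged.  */
V setg_val (V v, int idx, T val)
{
  V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V valv = (V){val, val, val, val, val, val, val, val};
  /* Each lane of mask is -1 (all ones) where the lane number equals idx,
     and 0 elsewhere.  */
  V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
  /* Keep v where mask is 0, take val where mask is all ones.  */
  return (v & ~mask) | (valv & mask);
}
```

The compare-and-blend shape is unchanged; only the value being blended
differs.
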
> doesn't look like a variable-position insert instruction but
> this is a variable whole-vector rotate plus an insert at index zero
> followed by a variable whole-vector rotate. I'm not fluent in
> ppc assembly but
>
> rlwinm 6,6,2,28,29
> mtvsrwz 0,5
> lvsr 1,0,6
> lvsl 0,0,6
>
> possibly computes the shift masks for r33/r32? though
> I do not see those registers mentioned...
>
> This might be a viable generic expansion strategy btw,
> which is why I asked before whether the CPU supports
> inserts at a variable position ... the building blocks are
> already there with vec_set at constant zero position
> plus vec_perm_const for the rotates.
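The rotate / insert-at-zero / rotate-back sequence described above can be
modelled in scalar C. This is only a sketch of the data movement, assuming
a 4-lane vector held in a plain array; LANES, rotate_left and
insert_via_rotate are hypothetical names, and nothing here claims to match
what xxperm or xxinsertw actually do at the byte level:

```c
#define LANES 4

/* Rotate the lanes of v left by n positions, so old lane n ends up
   in lane 0 (hypothetical helper standing in for a variable permute).  */
void rotate_left (int v[LANES], unsigned n)
{
  int tmp[LANES];
  for (unsigned i = 0; i < LANES; i++)
    tmp[i] = v[(i + n) % LANES];
  for (unsigned i = 0; i < LANES; i++)
    v[i] = tmp[i];
}

/* Variable-index insert built from a constant-position insert:
   rotate lane idx to lane 0, overwrite lane 0, rotate back.  */
void insert_via_rotate (int v[LANES], unsigned idx, int val)
{
  idx %= LANES;
  rotate_left (v, idx);          /* lane idx is now lane 0 */
  v[0] = val;                    /* insert at constant position zero */
  rotate_left (v, LANES - idx);  /* undo the first rotate */
}
```

Only the two rotates depend on idx; the insert itself always targets the
same position.
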
>
> But well, I did ask this question. Multiple times.
>
> ppc does _not_ have a VSX instruction
> like xxinsertw r34, r8, r12 where r8 denotes
> the vector element (or byte position or whatever).
>
> So I don't think vec_set with a variable index is the
> best approach.
> Xionghu - you said even without the patch the stack
> storage is eventually elided but
>
> addi 9,1,-16
> rldic 6,6,2,60
> stxv 34,-16(1)
> stwx 5,9,6
> lxv 34,-16(1)
>
> still shows stack(?) store/load with a bad STLF penalty.
>
> Richard.
>
> >
> > Thanks,
> > Xionghu