On 7/7/25 06:20, Kyrylo Tkachov wrote:
> Hi all,
>
> The intent of the patch is similar to the previous patches in the series:
> make more use of BSL2N when we have DImode operands in SIMD regs,
> but still use the GP instructions when that's where the operands are.
> Compared to the previous patches there are a couple of complications:
> * The operands are a bit more complex and get rejected by RTX costs during
> combine. This is fixed by adding some costing logic to aarch64_rtx_costs.
>
> * The GP split sequence requires two temporaries instead of just one.
> I've marked operand 1 to be an input/output earlyclobber operand to give
> the second temporary together with the earlyclobber operand 0. This means
> that operand is marked with "+" even for the "w" alternatives as the modifier
> is global, but I don't see another way out here. Suggestions welcome.
>
> With these fixed, for the testcase we generate:
> bsl2n_gp: // unchanged scalar output
> orr x1, x2, x1
> and x0, x0, x2
> orn x0, x0, x1
> ret
>
> bsl2n_d:
> bsl2n z0.d, z0.d, z1.d, z2.d
> ret
>
> compared to the previous:
> bsl2n_gp:
> orr x1, x2, x1
> and x0, x0, x2
> orn x0, x0, x1
> ret
>
> bsl2n_d:
> orr v1.8b, v2.8b, v1.8b
> and v0.8b, v2.8b, v0.8b
> orn v0.8b, v0.8b, v1.8b
> ret
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for trunk?
> Thanks,
> Kyrill
Hi Kyrill

I think in the GPR variant some overlap in operands is possible, like
"[ &r       , &r        , r1        , r01 ; *              ] #".

In aarch64_bsl2n_rtx_form_p(), shouldn't there be a check that the same 
operand (the select) appears on both sides of the expression?
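
To make the concern concrete, here is a minimal C sketch of the identity involved. It assumes the usual BSL2N definition on one 64-bit lane and labels x0, x1, x2 from the bsl2n_gp output above as a, b, c. The function names are hypothetical and do not come from the patch.

```c
#include <stdint.h>

/* Reference form of BSL2N on one 64-bit lane: take bits of a where the
   select c is 1, and bits of ~b where c is 0.  */
static inline uint64_t bsl2n_ref (uint64_t a, uint64_t b, uint64_t c)
{
  return (a & c) | (~b & ~c);
}

/* The three-instruction GP sequence from the bsl2n_gp output,
   with x0 = a, x1 = b, x2 = c (labels assumed for illustration).  */
static inline uint64_t bsl2n_gp_seq (uint64_t a, uint64_t b, uint64_t c)
{
  uint64_t t = c | b;   /* orr x1, x2, x1 */
  uint64_t r = a & c;   /* and x0, x0, x2 */
  return r | ~t;        /* orn x0, x0, x1 */
}

/* Same shape, but with different selects on the two sides (c vs. d).
   This is no longer a bit select, so it must not be treated as BSL2N.  */
static inline uint64_t not_bsl2n (uint64_t a, uint64_t b,
                                  uint64_t c, uint64_t d)
{
  return (a & c) | (~b & ~d);
}
```

The first two functions agree for all inputs by De Morgan, since ~(b | c) == (~b & ~c). The third only matches when c == d, which is the property I would expect the form check to enforce.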

Otherwise looks good to me (but I cannot approve as I am neither a 
reviewer nor an approver).

Remi
>
> Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>
>
> gcc/
>
>          * config/aarch64/aarch64-sve2.md (*aarch64_sve2_bsl2n_unpreddi): New
>          define_insn_and_split.
>          * config/aarch64/aarch64.cc (aarch64_bsl2n_rtx_form_p): Define.
>          (aarch64_rtx_costs): Use the above. Cost BSL2N ops.
>
> gcc/testsuite/
>
>          * gcc.target/aarch64/sve2/bsl2n_d.c: New test.
