On 7/7/25 06:20, Kyrylo Tkachov wrote:
> Hi all,
>
> The intent of the patch is similar to previous in the series.
> Make more use of BSL2N when we have DImode operands in SIMD regs,
> but still use the GP instructions when that's where the operands are.
> Compared to the previous patches there are a couple of complications:
> * The operands are a bit more complex and get rejected by RTX costs during
> combine. This is fixed by adding some costing logic to aarch64_rtx_costs.
>
> * The GP split sequence requires two temporaries instead of just one.
> I've marked operand 1 to be an input/output earlyclobber operand to give
> the second temporary together with the earlyclobber operand 0. This means
> that operand is marked with "+" even for the "w" alternatives as the modifier
> is global, but I don't see another way out here. Suggestions welcome.
>
> With these fixed for the testcase we generate:
> bsl2n_gp: // unchanged scalar output
>         orr x1, x2, x1
>         and x0, x0, x2
>         orn x0, x0, x1
>         ret
>
> bsl2n_d:
>         bsl2n z0.d, z0.d, z1.d, z2.d
>         ret
>
> compared to the previous:
> bsl2n_gp:
>         orr x1, x2, x1
>         and x0, x0, x2
>         orn x0, x0, x1
>         ret
>
> bsl2n_d:
>         orr v1.8b, v2.8b, v1.8b
>         and v0.8b, v2.8b, v0.8b
>         orn v0.8b, v0.8b, v1.8b
>         ret
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for trunk?
> Thanks,
> Kyrill

Hi Kyrill,

I think in the GPR variant some overlap in operands is possible, like
"[ &r , &r , r1 , r01 ; * ] #".

In aarch64_bsl2n_rtx_form_p(), shouldn't there be a check that one
parameter (the select) is the same on both sides?

Otherwise looks good to me (but I cannot approve as I am neither a reviewer
nor an approver).

Remi

>
> Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>
>
> gcc/
>
>         * config/aarch64/aarch64-sve2.md (*aarch64_sve2_bsl2n_unpreddi): New
>         define_insn_and_split.
>         * config/aarch64/aarch64.cc (aarch64_bsl2n_rtx_form_p): Define.
>         (aarch64_rtx_costs): Use the above. Cost BSL2N ops.
>
> gcc/testsuite/
>
>         * gcc.target/aarch64/sve2/bsl2n_d.c: New test.
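
For reference, a minimal C sketch of what the three-instruction GP sequence quoted above computes, next to the bitwise-select form that BSL2N implements as I understand it: (a & sel) | (~b & ~sel). The function names here are hypothetical and are not taken from the patch or from bsl2n_d.c; this only illustrates why the select operand has to be the same on both sides of the expression.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed BSL2N semantics: take bits of a where sel is 1,
   and bits of ~b where sel is 0.  */
uint64_t bsl2n_ref(uint64_t a, uint64_t b, uint64_t sel)
{
    return (a & sel) | (~b & ~sel);
}

/* Mirrors the quoted GP sequence, one statement per instruction.
   x0 = a, x1 = b, x2 = sel.  Both x0 and x1 are overwritten, which is
   why the split needs two scratch-like operands.  */
uint64_t bsl2n_gp_seq(uint64_t x0, uint64_t x1, uint64_t x2)
{
    x1 = x2 | x1;   /* orr x1, x2, x1 */
    x0 = x0 & x2;   /* and x0, x0, x2 */
    x0 = x0 | ~x1;  /* orn x0, x0, x1 */
    return x0;
}

/* Small self-check over a handful of bit patterns; returns the number
   of combinations compared.  */
int bsl2n_selfcheck(void)
{
    static const uint64_t v[] = {
        0, ~(uint64_t)0, 0xF0F0F0F0F0F0F0F0u, 0x0123456789ABCDEFu
    };
    int n = 0;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            for (int k = 0; k < 4; k++) {
                assert(bsl2n_gp_seq(v[i], v[j], v[k])
                       == bsl2n_ref(v[i], v[j], v[k]));
                n++;
            }
    return n;
}
```

The equivalence follows from De Morgan: ~(sel | b) is ~sel & ~b, so the ORN folds the inverted OR into the second half of the select. If the two halves used different select values, the expression would no longer reduce to a single BSL2N.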