Hi all,

The intent of the patch is similar to the previous ones in the series:
make more use of BSL2N when we have DImode operands in SIMD regs, but still
use the GP instructions when that's where the operands are.

Compared to the previous patches there are a couple of complications:
* The operands are a bit more complex and get rejected by RTX costs during
  combine. This is fixed by adding some costing logic to aarch64_rtx_costs.
* The GP split sequence requires two temporaries instead of just one.
  I've marked operand 1 to be an input/output earlyclobber operand to give
  the second temporary together with the earlyclobber operand 0. This means
  that the operand is marked with "+" even for the "w" alternatives, as the
  modifier is global, but I don't see another way out here.
  Suggestions welcome.

With these fixed, for the testcase we generate:

bsl2n_gp: // unchanged scalar output
        orr     x1, x2, x1
        and     x0, x0, x2
        orn     x0, x0, x1
        ret

bsl2n_d:
        bsl2n   z0.d, z0.d, z1.d, z2.d
        ret

compared to the previous:

bsl2n_gp:
        orr     x1, x2, x1
        and     x0, x0, x2
        orn     x0, x0, x1
        ret

bsl2n_d:
        orr     v1.8b, v2.8b, v1.8b
        and     v0.8b, v2.8b, v0.8b
        orn     v0.8b, v0.8b, v1.8b
        ret

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?
Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>

gcc/

	* config/aarch64/aarch64-sve2.md (*aarch64_sve2_bsl2n_unpreddi): New
	define_insn_and_split.
	* config/aarch64/aarch64.cc (aarch64_bsl2n_rtx_form_p): Define.
	(aarch64_rtx_costs): Use the above. Cost BSL2N ops.

gcc/testsuite/

	* gcc.target/aarch64/sve2/bsl2n_d.c: New test.
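
For readers following along, the operation being matched can be read off the scalar sequence quoted above. The C below is only a minimal sketch of that reading, not the testcase added by the patch: the function names are made up for illustration, and the first form, (a & c) | (~b & ~c), is my assumption of how the select is usually written. The second function mirrors the orr/and/orn sequence one instruction per line, and the check confirms the two agree on a few values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference form: take bits of A where C is set,
   and bits of ~B where C is clear.  */
static uint64_t
bsl2n_ref (uint64_t a, uint64_t b, uint64_t c)
{
  return (a & c) | (~b & ~c);
}

/* Mirrors the three-instruction GP sequence from the email,
   with a in x0, b in x1 and c in x2.  */
static uint64_t
bsl2n_orr_and_orn (uint64_t a, uint64_t b, uint64_t c)
{
  uint64_t t = c | b;	/* orr x1, x2, x1 */
  uint64_t r = a & c;	/* and x0, x0, x2 */
  return r | ~t;	/* orn x0, x0, x1 */
}

/* Spot-check that both forms agree on a handful of inputs.  */
static void
check_bsl2n_forms (void)
{
  static const uint64_t v[] = {
    0, ~(uint64_t) 0, 0xFF00, 0x0F0F, 0xF0F0, 0x0123456789ABCDEFULL
  };
  const unsigned n = sizeof (v) / sizeof (v[0]);

  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      for (unsigned k = 0; k < n; k++)
	assert (bsl2n_ref (v[i], v[j], v[k])
		== bsl2n_orr_and_orn (v[i], v[j], v[k]));
}
```

The two forms are equal by De Morgan (~b & ~c is ~(b | c)), which is why the GP sequence needs a second temporary to hold b | c while a & c is computed.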
0007-aarch64-Use-BSL2N-for-DImode-operands.patch