Re: [PATCH] aarch64: Improve popcountti2 with SVE

2025-07-07 Thread Kyrylo Tkachov
haw
>>> ; Alex Coplan ; Andrew
>>> Pinski
>>> Subject: [PATCH] aarch64: Improve popcountti2 with SVE
>>>
>>> Hi all,
>>>
>>> The TImode popcount sequence can be slightly improved with SVE.
>>> If we generate:
>>> ld

Re: [PATCH] aarch64: Improve popcountti2 with SVE

2025-07-07 Thread Richard Sandiford
Tamar Christina writes:
>> -----Original Message-----
>> From: Kyrylo Tkachov
>> Sent: Monday, July 7, 2025 10:38 AM
>> To: GCC Patches
>> Cc: Richard Sandiford; Richard Earnshaw; Alex Coplan; Andrew Pinski
>> Subject: [PATCH] aar

RE: [PATCH] aarch64: Improve popcountti2 with SVE

2025-07-07 Thread Tamar Christina
> -----Original Message-----
> From: Kyrylo Tkachov
> Sent: Monday, July 7, 2025 10:38 AM
> To: GCC Patches
> Cc: Richard Sandiford; Richard Earnshaw; Alex Coplan; Andrew Pinski
> Subject: [PATCH] aarch64: Improve popcountti2 with SVE
>
> Hi all,
>

[PATCH] aarch64: Improve popcountti2 with SVE

2025-07-07 Thread Kyrylo Tkachov
Hi all,

The TImode popcount sequence can be slightly improved with SVE.
If we generate:
        ldr     q31, [x0]
        ptrue   p7.b, vl16
        cnt     z31.d, p7/m, z31.d
        addp    d31, v31.2d
        fmov    x0, d31
        ret

instead of:
h128:
        ldr     q31, [x0]
        cnt     v31.16b, v31.16b
        addv    b31, v31.16b
        fmov    w0, s31
        ret

we use the ADDP instruction for
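
For readers following along, here is a rough C sketch of what the SVE sequence above computes: one population count per 64-bit half of the 128-bit value, then a single addition of the two counts (the role ADDP plays in the assembly). The function name popcount128_by_halves is hypothetical and is not taken from the patch or its testcase. The sketch assumes a GCC-compatible compiler providing unsigned __int128 and __builtin_popcountll.

```c
#include <stdint.h>

/* Hypothetical helper, not from the patch: models the SVE sequence at the
   C level.  Each 64-bit half is counted separately (like CNT on the two
   .d lanes), then the two counts are added (like ADDP on v31.2d).  */
int popcount128_by_halves(const unsigned __int128 *p)
{
    uint64_t lo = (uint64_t)(*p);        /* low 64-bit lane */
    uint64_t hi = (uint64_t)(*p >> 64);  /* high 64-bit lane */

    /* Each per-lane count is at most 64, so the sum is at most 128.  */
    return __builtin_popcountll(lo) + __builtin_popcountll(hi);
}
```

By contrast, the Advanced SIMD sequence counts bits per byte (cnt v31.16b) and then reduces sixteen byte counts with ADDV, which is the reduction the SVE form replaces with a two-lane add.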