On 2/15/24 08:46, Alexander Monakov wrote:
> Right, so we can pick the cheapest reduction method, and if I'm reading
> the Neoverse-N1 SOG right, SHRN is marginally cheaper than ADDV (latency 2
> instead of 3), and it should be generally preferable on other cores, no?

Fair.  For that matter, cannot UQXTN be used the same way?
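
To make the comparison concrete, here is a rough sketch of both
reductions (the vtstq_u8 mask and the helper names are illustration
on my part, not code from the patch):

#include <arm_neon.h>
#include <stdbool.h>
#include <stdint.h>

/* Both helpers answer "does this 16-byte chunk contain a non-zero
   byte?".  vtstq_u8(x, x) yields 0xFF for each non-zero byte. */

static inline bool chunk_nonzero_addv(uint8x16_t x)
{
    /* ADDV: horizontal byte sum of the mask.  With 0x00/0xFF lanes
       the sum mod 256 is (-k) mod 256 for k set bytes, so it is
       zero iff no byte of the mask is set. */
    return vaddvq_u8(vtstq_u8(x, x)) != 0;
}

static inline bool chunk_nonzero_shrn(uint8x16_t x)
{
    /* SHRN #4: keeps one nibble of every mask byte, compressing the
       128-bit mask into 64 bits testable with one GPR compare. */
    uint8x8_t nib = vshrn_n_u16(vreinterpretq_u16_u8(vtstq_u8(x, x)), 4);
    return vget_lane_u64(vreinterpret_u64_u8(nib), 0) != 0;
}
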
On Wed, 14 Feb 2024, Richard Henderson wrote:
> Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely
> double-check with the compiler flags for __ARM_NEON and don't bother with
> a runtime check. Otherwise, model the loop after the x86 SSE2 function,
> and use VADDV to reduce the four vector comparisons.
>
> Signed-off-by: Richard Henderson
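
For concreteness, a hedged sketch of the loop shape described above,
modeled on the x86 SSE2 variant: the function name, the OR-combining
of the four loads, and the overlapping tail are my reconstruction,
not the patch itself, and short buffers (len < 64) are assumed to be
handled elsewhere.

#include <arm_neon.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __ARM_NEON   /* compile-time check only, as the patch intends */
static bool buffer_is_zero_neon(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    const uint8_t *last = p + len - 64;  /* final, possibly overlapping block */

    do {
        uint8x16_t v0 = vld1q_u8(p);
        uint8x16_t v1 = vld1q_u8(p + 16);
        uint8x16_t v2 = vld1q_u8(p + 32);
        uint8x16_t v3 = vld1q_u8(p + 48);
        /* Fold the four loads so each iteration pays for one reduction;
           the vaddvq_u8 below is the spot where SHRN or UMAXV could be
           swapped in. */
        uint8x16_t t = vorrq_u8(vorrq_u8(v0, v1), vorrq_u8(v2, v3));
        if (vaddvq_u8(vtstq_u8(t, t)) != 0) {
            return false;
        }
        p += 64;
    } while (p < last);

    /* Re-check the last 64 bytes, overlapping the loop's coverage. */
    uint8x16_t v0 = vld1q_u8(last);
    uint8x16_t v1 = vld1q_u8(last + 16);
    uint8x16_t v2 = vld1q_u8(last + 32);
    uint8x16_t v3 = vld1q_u8(last + 48);
    uint8x16_t t = vorrq_u8(vorrq_u8(v0, v1), vorrq_u8(v2, v3));
    return vaddvq_u8(vtstq_u8(t, t)) == 0;
}
#endif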