On Wed, Oct 2, 2019 at 4:17 PM Ard Biesheuvel <[email protected]> wrote:
> Expose the accelerated NEON ChaCha routine directly as a symbol
> export so that users of the ChaCha library can use it directly.
Eric had some nice scalar ChaCha code for certain ARM cores that lived in
Zinc as chacha20-unrolled-arm.S. It was used on cores where the NEON
implementation performs poorly and on cores with no NEON at all. The
condition for it was:
    switch (read_cpuid_part()) {
    case ARM_CPU_PART_CORTEX_A7:
    case ARM_CPU_PART_CORTEX_A5:
        /* The Cortex-A7 and Cortex-A5 do not perform well with the NEON
         * implementation but do incredibly with the scalar one and use
         * less power.
         */
        break;
    default:
        chacha20_use_neon = elf_hwcap & HWCAP_NEON;
    }
    ...
    for (;;) {
        if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && chacha20_use_neon &&
            len >= CHACHA20_BLOCK_SIZE * 3 && simd_use(simd_context)) {
            const size_t bytes = min_t(size_t, len, PAGE_SIZE);
            chacha20_neon(dst, src, bytes, ctx->key, ctx->counter);
            ctx->counter[0] += (bytes + 63) / 64;
            len -= bytes;
            if (!len)
                break;
            dst += bytes;
            src += bytes;
            simd_relax(simd_context);
        } else {
            chacha20_arm(dst, src, len, ctx->key, ctx->counter);
            ctx->counter[0] += (len + 63) / 64;
            break;
        }
    }
It's another instance in which the generic code was totally optimized
out of Zinc builds.
Did these changes make it into the existing tree?
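For anyone who wants to check the chunking and counter arithmetic in that loop, here is a rough userspace model. It is only a sketch: the sketch_* names are made up for illustration, the 4 KiB page size is an assumption, the cipher calls are replaced by counters, and simd_use()/simd_relax() are left out entirely, so it shows nothing beyond which path is taken and how far the block counter advances.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLOCK_SIZE 64u   /* ChaCha block size in bytes */
#define SKETCH_PAGE_SIZE  4096u /* assumption: 4 KiB pages */

/* Hypothetical tally standing in for the real cipher routines. */
struct sketch_stats {
	unsigned int neon_calls;   /* times the NEON path would run */
	unsigned int scalar_calls; /* times the scalar path would run */
	uint32_t counter;          /* models ctx->counter[0] */
};

/*
 * Mirrors the dispatch loop quoted above: the NEON path handles at most
 * one page per iteration and only while at least three blocks remain;
 * anything else goes to the scalar path in a single call.
 */
static void sketch_dispatch(struct sketch_stats *st, size_t len, int use_neon)
{
	for (;;) {
		if (use_neon && len >= SKETCH_BLOCK_SIZE * 3) {
			size_t bytes = len < SKETCH_PAGE_SIZE ? len : SKETCH_PAGE_SIZE;

			st->neon_calls++;
			/* Advance by whole blocks, rounding a partial block up. */
			st->counter += (uint32_t)((bytes + 63) / 64);
			len -= bytes;
			if (!len)
				break;
		} else {
			st->scalar_calls++;
			st->counter += (uint32_t)((len + 63) / 64);
			break;
		}
	}
}
```

With a 4196-byte input and NEON enabled, the first page goes through the NEON path (64 blocks) and the 100-byte tail falls below the three-block threshold, so it is finished by the scalar path (2 more blocks). The generic C implementation is never reached on either branch.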