Hi Ard,

On Tue, Nov 03, 2020 at 05:28:09PM +0100, Ard Biesheuvel wrote:
> @@ -42,24 +42,24 @@ static void chacha_doneon(u32 *state, u8 *dst, const u8 *src,
>  {
>       u8 buf[CHACHA_BLOCK_SIZE];
>  
> -     while (bytes >= CHACHA_BLOCK_SIZE * 4) {
> -             chacha_4block_xor_neon(state, dst, src, nrounds);
> -             bytes -= CHACHA_BLOCK_SIZE * 4;
> -             src += CHACHA_BLOCK_SIZE * 4;
> -             dst += CHACHA_BLOCK_SIZE * 4;
> -             state[12] += 4;
> -     }
> -     while (bytes >= CHACHA_BLOCK_SIZE) {
> -             chacha_block_xor_neon(state, dst, src, nrounds);
> -             bytes -= CHACHA_BLOCK_SIZE;
> -             src += CHACHA_BLOCK_SIZE;
> -             dst += CHACHA_BLOCK_SIZE;
> -             state[12]++;
> +     while (bytes > CHACHA_BLOCK_SIZE) {
> +             unsigned int l = min(bytes, CHACHA_BLOCK_SIZE * 4U);
> +
> +             chacha_4block_xor_neon(state, dst, src, nrounds, l);
> +             bytes -= l;
> +             src += l;
> +             dst += l;
> +             state[12] += DIV_ROUND_UP(l, CHACHA_BLOCK_SIZE);
>       }
>       if (bytes) {
> -             memcpy(buf, src, bytes);
> -             chacha_block_xor_neon(state, buf, buf, nrounds);
> -             memcpy(dst, buf, bytes);
> +             const u8 *s = src;
> +             u8 *d = dst;
> +
> +             if (bytes != CHACHA_BLOCK_SIZE)
> +                     s = d = memcpy(buf, src, bytes);
> +             chacha_block_xor_neon(state, d, s, nrounds);
> +             if (d != dst)
> +                     memcpy(dst, buf, bytes);
>       }
>  }
>  

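Shouldn't the final 'if (bytes)' path increment the block counter (state[12])
after chacha_block_xor_neon()?  With this patch that path can process a full
block, and the updated counter might be needed by the library API.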
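
A minimal userspace sketch of the tail path with the increment added, in case
it helps.  stub_block_xor() and tail_xor() are hypothetical stand-ins, not the
kernel functions, and the fake keystream is purely illustrative.  It assumes
the block routine itself leaves state[12] untouched, as the explicit increments
in the glue code suggest:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHACHA_BLOCK_SIZE 64

/*
 * Hypothetical stand-in for chacha_block_xor_neon(): XORs one block with a
 * fake keystream derived from the counter. Assumed not to modify the state.
 */
static void stub_block_xor(const uint32_t *state, uint8_t *dst,
			   const uint8_t *src)
{
	for (unsigned int i = 0; i < CHACHA_BLOCK_SIZE; i++)
		dst[i] = src[i] ^ (uint8_t)(state[12] * 31u + i);
}

/* Tail handling modelled on the patch, plus the counter increment. */
static void tail_xor(uint32_t *state, uint8_t *dst, const uint8_t *src,
		     unsigned int bytes)
{
	uint8_t buf[CHACHA_BLOCK_SIZE] = { 0 };
	const uint8_t *s = src;
	uint8_t *d = dst;

	assert(bytes > 0 && bytes <= CHACHA_BLOCK_SIZE);

	/* Bounce through buf only for a partial block. */
	if (bytes != CHACHA_BLOCK_SIZE) {
		memcpy(buf, src, bytes);
		s = d = buf;
	}
	stub_block_xor(state, d, s);
	if (d != dst)
		memcpy(dst, buf, bytes);

	/* One block of keystream was consumed, so advance the counter. */
	state[12]++;
}
```

Without the final increment, two back-to-back calls on the same state would
reuse the same keystream block.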

Also, even with that fixed, this patch is causing the self-tests (both the
chacha20poly1305_selftest(), and the crypto API tests for chacha20-neon,
xchacha20-neon, and xchacha12-neon) to fail when I boot a kernel in QEMU.  This
doesn't happen on real hardware (Raspberry Pi 2), and I don't see any other bugs
in this patch, so I'm not sure what the problem is.  Did you run the self-tests
on every platform you tested this on?

- Eric
