To revive this...

On Fri, Aug 10, 2018 at 08:27:58AM +0200, Stephan Mueller wrote:
> Am Donnerstag, 9. August 2018, 21:40:12 CEST schrieb Eric Biggers:
> 
> Hi Eric,
> 
> >     while (bytes >= CHACHA20_BLOCK_SIZE) {
> >             chacha20_block(state, stream);
> > -           crypto_xor(dst, (const u8 *)stream, CHACHA20_BLOCK_SIZE);
> > +           crypto_xor(dst, stream, CHACHA20_BLOCK_SIZE);
> 
> If we are at it, I am wondering whether we should use crypto_xor. At this 
> point we exactly know that the data is CHACHA20_BLOCK_SIZE bytes in length 
> which is divisible by u32. Hence, shouldn't we disregard crypto_xor in favor 
> of a loop iterating in 32 bits words? crypto_xor contains some checks for 
> trailing bytes which we could spare.

crypto_xor() here is fine.  It already meets the conditions for the inlined
version that XOR's a long at a time:

        if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
            __builtin_constant_p(size) &&
            (size % sizeof(unsigned long)) == 0) {
                unsigned long *d = (unsigned long *)dst;
                unsigned long *s = (unsigned long *)src;

                while (size > 0) {
                        *d++ ^= *s++;
                        size -= sizeof(unsigned long);
                }
        }

And regardless, it's better to optimize crypto_xor() once, than all the callers.

> 
> >             bytes -= CHACHA20_BLOCK_SIZE;
> >             dst += CHACHA20_BLOCK_SIZE;
> >     }
> >     if (bytes) {
> >             chacha20_block(state, stream);
> > -           crypto_xor(dst, (const u8 *)stream, bytes);
> > +           crypto_xor(dst, stream, bytes);
> 
> Same here.

'bytes' need not be a multiple of sizeof(u32) or sizeof(long), and 'dst' can
have any alignment...  So we should just call crypto_xor() which does the right
thing, and is intended to do so as efficiently as possible.

> 
> > @@ -1006,14 +1006,14 @@ static void _crng_backtrack_protect(struct
> > crng_state *crng, used = 0;
> >     }
> >     spin_lock_irqsave(&crng->lock, flags);
> > -   s = &tmp[used / sizeof(__u32)];
> > +   s = (__u32 *) &tmp[used];
> 
> As Yann said, wouldn't you have the alignment problem here again?
> 
> Somehow, somebody must check the provided input buffer at one time.
> 

I guess we should just explicitly align the 'tmp' buffers in _get_random_bytes()
and extract_crng_user().

- Eric

Reply via email to