Re: [PATCH net-next v6 19/23] zinc: Curve25519 ARM implementation

2018-10-05 Thread D. J. Bernstein
For the in-order ARM Cortex-A8 (the target for this code), adjacent multiply-add instructions forward summands quickly. A simple in-order dot-product computation has no latency problems, while interleaving computations, as suggested in this thread, creates problems. Also, on this microarchitecture,
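
The dependency structure this excerpt describes can be sketched in C. This is only an illustration under stated assumptions: the code under discussion is hand-scheduled ARM assembly, a C compiler is free to reorder these operations, and the function names `dot_in_order` and `dot_interleaved` are hypothetical. In the first function every multiply-add feeds the same accumulator as the one before it. In the second, consecutive multiply-adds alternate between two accumulators, so updates to the same sum are no longer adjacent.

```c
#include <stddef.h>
#include <stdint.h>

/* In-order dot product: each multiply-add accumulates into the same
 * sum as the previous one, forming a single back-to-back chain. */
uint64_t dot_in_order(const uint32_t *a, const uint32_t *b, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (uint64_t)a[i] * b[i];
    return acc;
}

/* Two dot products interleaved: successive multiply-adds alternate
 * between acc_ab and acc_cd, so two updates of the same accumulator
 * are always separated by an update of the other one. */
void dot_interleaved(const uint32_t *a, const uint32_t *b,
                     const uint32_t *c, const uint32_t *d, size_t n,
                     uint64_t *out_ab, uint64_t *out_cd)
{
    uint64_t acc_ab = 0;
    uint64_t acc_cd = 0;
    for (size_t i = 0; i < n; i++) {
        acc_ab += (uint64_t)a[i] * b[i];
        acc_cd += (uint64_t)c[i] * d[i];
    }
    *out_ab = acc_ab;
    *out_cd = acc_cd;
}
```

Both forms compute the same sums; they differ only in the order in which the accumulations are issued, which is the property the excerpt is concerned with.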

Re: [PATCH v1 2/3] zinc: Introduce minimal cryptography library

2018-08-17 Thread D. J. Bernstein
Eric Biggers writes:
> If (more likely) you're talking about things like "use this NEON implementation on Cortex-A7 but this other NEON implementation on Cortex-A53", it's up to the developers and community to test different CPUs and make appropriate decisions, and yes it can be very usefu

Re: [PATCH v1 2/3] zinc: Introduce minimal cryptography library

2018-08-15 Thread D. J. Bernstein
Eric Biggers writes:
> You'd probably attract more contributors if you followed established open source conventions.

SUPERCOP already has thousands of implementations from hundreds of contributors. New speed records are more likely to appear in SUPERCOP than in any other cryptographic software c

Re: [PATCH v1 2/3] zinc: Introduce minimal cryptography library

2018-08-15 Thread D. J. Bernstein
Eric Biggers writes:
> I've also written a scalar ChaCha20 implementation (no NEON instructions!) that is 12.2 cpb on one block at a time on Cortex-A7, taking advantage of the free rotates; that would be useful for the single permutation used to compute XChaCha's subkey, and also for the e
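
For context on the rotates mentioned in the quoted text, here is a minimal C sketch of the standard ChaCha quarter round, which performs four 32-bit rotations (by 16, 12, 8, and 7 bits). It is not the implementation referred to in the message; the names `rotl32` and `chacha_quarter_round` are hypothetical. On 32-bit ARM, a data-processing instruction can apply a rotation to its second register operand, which is presumably what "free rotates" alludes to.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits; n must be in the range 1..31. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* One ChaCha quarter round on four state words: four add/xor/rotate
 * steps with rotation amounts 16, 12, 8, and 7. */
void chacha_quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```

A full ChaCha20 block applies this quarter round to the columns and diagonals of a 4x4 word state for 20 rounds, so the cost of each rotation matters in a scalar implementation.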