On Sat, Aug 08, 2020 at 10:07:51AM -0700, Andy Lutomirski wrote:
> 
> > On Aug 8, 2020, at 8:29 AM, George Spelvin <l...@sdf.org> wrote:
> > 
> > And apparently switching to the fastest secure PRNG currently
> > in the kernel (get_random_u32() using ChaCha + per-CPU buffers)
> > would cause too much performance penalty.
> 
> Can someone explain *why* the slow path latency is particularly relevant
> here? What workload has the net code generating random numbers in a place
> where even a whole microsecond is a problem as long as the amortized cost is
> low? (I'm not saying I won't believe this matters, but it's not obvious to
> me that it matters.)
It really depends on what is being done and how we want that code to continue
to be used. Often it will be a matter of queue sizes, or of concurrency in
certain contexts under a lock.

For illustration, let's say you want to use random numbers to choose a
sequence number for a SYN cookie (it's not what is done today): a 40G NIC can
deliver 60 Mpps, or one packet every ~16.7 ns. If you periodically add 1 us,
you end up queuing 60 extra packets each time this happens. This might or
might not be acceptable. Regarding concurrency, you could imagine some code
picking a random number in a small function running under a spinlock;
frequently adding 1 us there can be expensive.

To be clear, I'm not *that* much worried about *some* extra latency. However,
I'm extremely worried about the risk that this cost increases further once
some security researcher considers the PRNG not secure enough again
(especially once it starts to be used for security purposes after having been
claimed secure) and we replace it with another one with a higher cost and a
longer tail over which to amortize that cost. And given that we're speaking
about replacing a non-secure PRNG with *something*, it doesn't seem really
wise to introduce new constraints on the data path when there are probably a
large enough number of possibilities which do not require these constraints.

Last point, we must not rule out the possibility that some clever researcher
will later come up with new time-based attacks based on observing
packet-processing latencies, explaining how they can count the number of
connections reaching a given host and reveal certain useful information, and
that we're then forced to slow all of them down to proceed in constant time.

Just my two cents,
Willy