Willy Tarreau <w...@1wt.eu> wrote:
> On Sun, Aug 09, 2020 at 06:30:17PM +0000, George Spelvin wrote:
> > Even something simple like buffering 8 TSC samples, and adding them
> > at 32-bit offsets across the state every 8th call, would make a huge
> > difference.
> 
> Doing testing on real hardware showed that retrieving the TSC on every
> call had a non negligible cost, causing a loss of 2.5% on the accept()
> rate and 4% on packet rate when using iptables -m statistics. However
> I reused your idea of accumulating old TSCs to increase the uncertainty
> about their exact value, except that I retrieve it only on 1/8 calls
> and use the previous noise in this case. With this I observe the same
> performance as plain 5.8. Below are the connection rates accepted on
> a single core :
> 
>         5.8           5.8+patch     5.8+patch+tsc
>    192900-197900   188800-192200    194500-197500  (conn/s)
> 
> This was on a core i7-8700K. I looked at the asm code for the function
> and it remains reasonably light, in the same order of complexity as the
> original one, so I think we could go with that.
> 
> My proposed change is below, in case you have any improvements to suggest.

As this relates to networking, you could also hook perturbation into rx/tx
softirq processing.  E.g. once for each new napi poll round or only once
for each softnet invocation, depending on cost.

IIRC the proposed draft left an unused prandom_seed() stub around; you could
re-use that to store extra data in percpu data for inclusion in the hash.
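
To make the two suggestions above concrete, here is a rough userspace sketch,
not kernel code and not the algorithm from the patch.  All names (struct
cpu_rnd, rnd_perturb(), rnd_next()) are made up for illustration, the struct
merely stands in for percpu data, and the generator step is a toy
xorshift-style mixer.  The idea is only that the rx/tx path drops a sample
into a noise slot once per poll round, and the generator folds that slot into
its state on every call:

```c
#include <stdint.h>

/* Hypothetical stand-in for percpu data: generator state plus a noise
 * slot that the receive path can write into. */
struct cpu_rnd {
	uint32_t s[4];   /* generator state */
	uint32_t noise;  /* extra data folded into every output */
	uint32_t rounds; /* poll rounds seen so far */
};

/* Rotate left; callers must pass 0 < r < 32. */
static inline uint32_t rol32(uint32_t v, unsigned int r)
{
	return (v << r) | (v >> (32 - r));
}

/* Arbitrary seeding, chosen only so the state is not all zero. */
static void rnd_init(struct cpu_rnd *c, uint32_t seed)
{
	c->s[0] = seed;
	c->s[1] = seed ^ 0x9e3779b9u;
	c->s[2] = ~seed;
	c->s[3] = seed + 1;
	c->noise = 0;
	c->rounds = 0;
}

/* Would be called once per poll round (assumption: the caller supplies
 * a cheap timestamp sample such as a cycle counter value). */
static void rnd_perturb(struct cpu_rnd *c, uint64_t sample)
{
	c->noise = rol32(c->noise, 7) ^ (uint32_t)sample ^
		   (uint32_t)(sample >> 32);
	c->rounds++;
}

/* Toy generator step: mixes the state, then folds in the noise slot. */
static uint32_t rnd_next(struct cpu_rnd *c)
{
	uint32_t t = c->s[3];

	t ^= t << 11;
	t ^= t >> 8;
	c->s[3] = c->s[2];
	c->s[2] = c->s[1];
	c->s[1] = c->s[0];
	t ^= c->s[0] ^ (c->s[0] >> 19);
	c->s[0] = t + c->noise;
	return c->s[0];
}
```

With this split the hot path (rnd_next()) never reads a timestamp itself, so
the cost of sampling is paid once per poll round or once per softnet
invocation, whichever turns out to be cheap enough.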
