>> This case should be quite similar with pkgten, if you got improvement with
>> pktgen, usually it was also the same for UDP, could you please try to disable
>> tso, gso, gro, ufo on all host tap devices and guest virtio-net devices?
>> Currently the most significant tests would be like this AFAICT:
>>
>> Host->VM     4.12    4.13
>>  TCP:
>>  UDP:
>> pktgen:
>>
>> Don't want to bother you too much, so maybe 4.12 & 4.13 without Jason's
>> patch should work since we have seen positive number for that, you can
>> also temporarily skip net-next as well.
>
> Here are the requested numbers, averaged over numerous runs -- guest is
> 4GB+1vcpu, host uperf/pktgen bound to 1 host CPU + qemu and vhost thread
> pinned to other unique host CPUs. tso, gso, gro, ufo disabled on host
> taps / guest virtio-net devs as requested:
>
> Host->VM    4.12          4.13
> TCP:        9.92Gb/s      6.44Gb/s
> UDP:        5.77Gb/s      6.63Gb/s
> pktgen:     1572403pps    1904265pps
>
> UDP/pktgen both show improvement from 4.12->4.13. More interesting,
> however, is that I am seeing the TCP regression for the first time from
> host->VM. I wonder if the combination of CPU binding + disabling of one
> or more of tso/gso/gro/ufo is related.
>
>>
>> If you see UDP and pktgen are aligned, then it might be helpful to continue
>> the other two cases, otherwise we fail in the first place.
>
I continued running many iterations of these tests between 4.12 and 4.13.
My throughput findings can be summarized as:

VM->VM case:
  UDP: roughly equivalent
  TCP: consistent regression (5-10%)

VM->Host:
  Both UDP and TCP traffic are roughly equivalent.

Host->VM:
  UDP+pktgen: improvement (5-10%), but inconsistent
  TCP: consistent regression (25-30%)

Host->VM UDP and pktgen showed improvement in some runs, and in others
mirrored 4.12-level performance.

The TCP regression for VM->VM is no surprise, since that is what we
started with. It's still consistent, but smaller in this specific
environment.

The TCP regression in Host->VM is interesting because I wasn't seeing it
consistently before binding CPUs + disabling tso/gso/gro/ufo. It is also
interesting because of how large it is. By any chance can you see this
regression on x86 with the same configuration?
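
For reference, the relative changes in the Host->VM table quoted earlier in this mail can be worked out with a few lines of Python. This is only a sketch: the helper name pct_change is made up for illustration, and the percentages it produces apply to that one averaged table only, not to the later iterations summarized above.

```python
# Relative change from 4.12 to 4.13 for the quoted Host->VM table.
# pct_change is a hypothetical helper, not part of any tool used in the tests.

def pct_change(old, new):
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0

# (4.12 value, 4.13 value) taken from the quoted table
results = {
    "TCP (Gb/s)":   (9.92, 6.44),
    "UDP (Gb/s)":   (5.77, 6.63),
    "pktgen (pps)": (1572403, 1904265),
}

for name, (v412, v413) in results.items():
    print(f"{name:14s} {pct_change(v412, v413):+.1f}%")
```

A negative value is a regression in 4.13 relative to 4.12, a positive value an improvement.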