On Sat, Feb 17, 2024 at 09:21:50AM -1000, Richard Henderson wrote:
> On 2/16/24 23:49, Alexander Monakov wrote:
> >
> > On Fri, 16 Feb 2024, Richard Henderson wrote:
> >
> > > Benchmark each acceleration function vs an aligned buffer of zeros.
> > >
> > > Signed-off-by: Richard Henderson <richard.hender...@linaro.org>
> > > ---
> > > +
> > > +static void test(const void *opaque)
> > > +{
> > > +    size_t len = 64 * KiB;
> >
> > This exceeds L1 cache capacity, so the performance ceiling of L2 cache
> > throughput is easier to hit with a suboptimal implementation. It also
> > seems to vastly exceed typical buffer sizes in Qemu.
> >
> > When preparing the patch we mostly tested at 8 KiB. The size decides
> > whether the branch exiting the loop becomes perfectly predictable in
> > the microbenchmark, e.g. at 128 bytes per iteration it exits on the
> > 63'rd iteration, which Intel predictors cannot track, so we get
> > one mispredict per call.
> >
> > (so perhaps smaller sizes like 2 or 4 KiB are better)
>
> Fair. I've adjusted to loop over 1, 4, 16, 64 KiB.
>
> # Start of bufferiszero tests
> # buffer_is_zero #0: 1KB 49227.29 MB/sec
> # buffer_is_zero #0: 4KB 137461.28 MB/sec
> # buffer_is_zero #0: 16KB 224220.41 MB/sec
> # buffer_is_zero #0: 64KB 142461.00 MB/sec
> # buffer_is_zero #1: 1KB 45423.59 MB/sec
> # buffer_is_zero #1: 4KB 91409.69 MB/sec
> # buffer_is_zero #1: 16KB 123819.94 MB/sec
> # buffer_is_zero #1: 64KB 71173.75 MB/sec
> # buffer_is_zero #2: 1KB 35465.03 MB/sec
> # buffer_is_zero #2: 4KB 56110.46 MB/sec
> # buffer_is_zero #2: 16KB 68852.28 MB/sec
> # buffer_is_zero #2: 64KB 39043.80 MB/sec

Totally nit-picking, but it would be easier to read with a little
alignment and blank lines:

  # buffer_is_zero #0:  1KB  49227.29 MB/sec
  # buffer_is_zero #0:  4KB 137461.28 MB/sec
  # buffer_is_zero #0: 16KB 224220.41 MB/sec
  # buffer_is_zero #0: 64KB 142461.00 MB/sec

  # buffer_is_zero #1:  1KB  45423.59 MB/sec
  # buffer_is_zero #1:  4KB  91409.69 MB/sec
  # buffer_is_zero #1: 16KB 123819.94 MB/sec
  # buffer_is_zero #1: 64KB  71173.75 MB/sec

  # buffer_is_zero #2:  1KB  35465.03 MB/sec
  # buffer_is_zero #2:  4KB  56110.46 MB/sec
  # buffer_is_zero #2: 16KB  68852.28 MB/sec
  # buffer_is_zero #2: 64KB  39043.80 MB/sec

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
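
For illustration only, here is a minimal sketch of how the aligned layout above could be produced with printf-style field widths. It assumes the benchmark formats each result with a printf-like call; format_result() and print_results() are hypothetical helpers, not code from the patch, and the widths (2 columns for the size, 9 for the throughput) are simply chosen to fit the numbers quoted in this thread.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): format one result line.
 * "%2zu" right-aligns the size in 2 columns, "%9.2f" right-aligns the
 * throughput in 9 columns with 2 decimals.
 */
static int format_result(char *buf, size_t buflen, int index,
                         size_t kib, double mb_per_sec)
{
    return snprintf(buf, buflen, "buffer_is_zero #%d: %2zuKB %9.2f MB/sec",
                    index, kib, mb_per_sec);
}

/*
 * Hypothetical helper: print n_sizes results for each of n_funcs
 * implementations, with a blank line between the groups.
 * rates is laid out as rates[func * n_sizes + size].
 */
static void print_results(FILE *out, int n_funcs, size_t n_sizes,
                          const size_t *kib, const double *rates)
{
    char line[80];

    for (int i = 0; i < n_funcs; i++) {
        if (i > 0) {
            fputc('\n', out);   /* blank line between implementations */
        }
        for (size_t j = 0; j < n_sizes; j++) {
            format_result(line, sizeof(line), i, kib[j],
                          rates[(size_t)i * n_sizes + j]);
            fprintf(out, "# %s\n", line);
        }
    }
}
```

Calling print_results(stdout, 3, 4, kib, rates) with kib = { 1, 4, 16, 64 } and the twelve figures quoted above would give the three groups of four lines. A fixed width only lines up while every value fits in it, so a wider field would be needed if a size or throughput ever grew another digit.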