It was pointed out to me that my compare.py script printed the times the
wrong way around, so I fixed that and also changed it to print output in
MiB/s -- always easier to reason about a benchmark when "bigger is
better"!
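
For reference, here is a minimal sketch of the arithmetic behind the new
output. It is not the compare.py from the attachment: the function names
and the assumption that timings are in nanoseconds are hypothetical, for
illustration only.

```python
MIB = 1024 * 1024  # bytes per MiB


def throughput_mib_s(bytes_copied, elapsed_ns):
    """Convert a byte count and an elapsed time to MiB/s.

    Assumes elapsed_ns is in nanoseconds (an assumption; the real
    benchmark's timing unit is not stated in this report).
    """
    if elapsed_ns <= 0:
        raise ValueError("elapsed_ns must be positive")
    return (bytes_copied / MIB) / (elapsed_ns / 1e9)


def delta_percent(before, after):
    """Relative change from 'before' to 'after', in percent.

    With throughput as input, a positive result means 'after' is faster.
    """
    return (after - before) / before * 100.0
```

For example, delta_percent(233.74, 248.03) comes out at roughly 6.11,
which matches the first row of the new table in the description.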

** Description changed:

  [impact]
  glibc 2.32 contained a number of improvements to the memcpy routines for
  server-grade AArch64 implementations (in particular, graviton2 &
  graviton3). They should be backported to focal, as the LTS releases are
  by far the most used on servers.
  
  [test case]
- Download the "bench.tar.gz" attachment from this report. It has a README 
+ Download the "bench.tar.gz" attachment from this report. It has a README
  that explains what to do, but here it is for reference:
  
  benchmark for testing arm64 memcpy improvements in SRU
  
  This is a benchmark that was derived from the memcpy benchmarks in glibc
  but altered to benchmark the public 'memcpy' symbol and be linked to the
  installed libc.
  
  To use this there are 5 steps:
  
  1. build -- just run "make test"
  2. run before upgrade -- "make bench-before"
  3. upgrade libc6 package -- depends on what is being tested!
  4. run again -- "make bench-after"
  5. compare -- "make compare"
  
  It produces output like this:
  
-    length |   before |    after |    delta
- ----------|----------|----------|----------
-     32768 |   125995 |   133696 |   -6.11%
-     65536 |   133349 |   140856 |   -5.63%
-    131072 |   139653 |   146419 |   -4.84%
-    262144 |   145441 |   152353 |   -4.75%
-    524288 |   191951 |   199856 |   -4.12%
-   1048576 |   240515 |   256623 |   -6.70%
+    length | before (MiB/s) |  after (MiB/s) |    delta
+ ----------|----------------|----------------|----------
+     32768 |         233.74 |         248.03 |    6.11%
+     65536 |         443.72 |         468.69 |    5.63%
+    131072 |         853.71 |         895.08 |    4.84%
+    262144 |        1640.93 |        1718.91 |    4.75%
+    524288 |        2501.80 |        2604.83 |    4.12%
+   1048576 |        3896.77 |        4157.74 |    6.70%
  
  On graviton2 systems, this should show an improvement of at least
  several percent. On other arm64 systems (Raspberry Pis of various
  vintage, thunderx2, xgene, etc.) no significant regression should be
  seen.
  
  [regression potential]
  Rebuilding glibc is always a little risky (toolchain bugs and
  incompatibilities between the old and new versions can be surprising).
  But the autopkgtests and some manual general testing can help here.
  
  For this specific change, there is a potential risk that the new memcpy
  implementation could be used on a system where it is not in fact the
  fastest. We should run the test case not only on the systems where it is
  expected to help, but other systems such as the RPi4 and the launchpad
  build farm to ensure performance is not regressed there.

** Attachment removed: "bench.tar.gz"
   https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1951032/+attachment/5566380/+files/bench.tar.gz

** Attachment added: "benchmark with fixed comparison script"
   https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1951032/+attachment/5566659/+files/bench.tar.gz

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1951032

Title:
  AArch64: Backport memcpy improvements

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1951032/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
