On Sun, 6 Sep 2015 10:08:06 -0700 Matt Turner <[email protected]> wrote:
> On Sun, Sep 6, 2015 at 8:27 AM, Oded Gabbay <[email protected]> wrote:
> > reference memcpy speed = 24847.3MB/s (6211.8MP/s for 32bpp fills)
> >
> >         Before     After      Change
> > --------------------------------------------
> > L1      182.05     210.22     +15.47%
> > L2      180.6      208.92     +15.68%
> > M       180.52     208.22     +15.34%
>
> There's no variation between L1, L2, and M -- as a follow on, it might
> be interesting to experiment with unrolling the loop a bit. Looked
> like the other patches in this series show the same behavior.

In fact it might be a defect in lowlevel-blt-bench. The benchmark
picks arbitrary sizes for the working sets and only labels them as
'L1', 'L2' and 'M'. The 'M' benchmark works with two 1920x1080 32bpp
buffers, which means that only ~16MB of memory is touched at most.
Modern processors already have a last level cache of comparable size,
and https://en.wikipedia.org/wiki/POWER8 mentions 16MB of L4 cache, so
the 'M' case may not really be going out to main memory.

On the other hand, ~200 MPix/s is a relatively low bandwidth by high
performance desktop/server standards and should not be a problem for
the automatic hardware prefetch.

-- 
Best regards,
Siarhei Siamashka
_______________________________________________
Pixman mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pixman
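
For reference, the two figures discussed above can be checked with a
bit of arithmetic. This is only a minimal sketch: the function names
are made up for illustration and are not part of pixman or
lowlevel-blt-bench, and the "two streams" figure (one source read plus
one destination write per pixel) is an assumption, since the actual
operation may touch memory differently.

```python
def working_set_bytes(width, height, bits_per_pixel, buffers):
    """Bytes touched by `buffers` images of width x height pixels."""
    return width * height * (bits_per_pixel // 8) * buffers


def traffic_mb_per_s(mpix_per_s, bits_per_pixel, streams):
    """Approximate memory traffic in MB/s for a given pixel rate.

    `streams` is the assumed number of memory streams per pixel
    (e.g. 2 = one read of the source plus one write of the destination).
    """
    return mpix_per_s * (bits_per_pixel // 8) * streams


if __name__ == "__main__":
    # The 'M' test: two 1920x1080 32bpp buffers.
    ws = working_set_bytes(1920, 1080, 32, 2)
    print(f"'M' working set: {ws} bytes ({ws / 2**20:.1f} MiB)")

    # 'After' result for 'M' from the quoted table, assuming 2 streams.
    traffic = traffic_mb_per_s(208.22, 32, 2)
    memcpy_mb_per_s = 24847.3  # reference memcpy speed from the quoted report
    print(f"approx. traffic: {traffic:.0f} MB/s "
          f"({100 * traffic / memcpy_mb_per_s:.1f}% of reference memcpy)")
```

Under these assumptions the working set comes out at 16588800 bytes,
which is in the same range as the 16MB L4 cache mentioned above, and
the estimated traffic is a small fraction of the reference memcpy
speed.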
