Re: [Pixman] [PATCH 1/4] vmx: optimize scaled_nearest_scanline_vmx_8888_8888_OVER

Siarhei Siamashka Thu, 17 Sep 2015 09:06:44 -0700

On Sun, 6 Sep 2015 10:08:06 -0700
Matt Turner <[email protected]> wrote:

> On Sun, Sep 6, 2015 at 8:27 AM, Oded Gabbay <[email protected]> wrote:
> > reference memcpy speed = 24847.3MB/s (6211.8MP/s for 32bpp fills)
> >
> >                 Before          After           Change
> >               --------------------------------------------
> > L1              182.05          210.22         +15.47%
> > L2              180.6           208.92         +15.68%
> > M               180.52          208.22         +15.34%
> 
> There's no variation between L1, L2, and M -- as a follow on, it might
> be interesting to experiment with unrolling the loop a bit. Looked
> like the other patches in this series show the same behavior.

In fact it might be a defect in lowlevel-blt-bench. The benchmark
picks arbitrary sizes for the working sets and only labels them as 'L1',
'L2' and 'M'. The 'M' benchmark works with two 1920x1080 32bpp buffers,
which means that only ~16MB of memory is touched at most. Modern
processors already have a last level cache of a comparable size and
https://en.wikipedia.org/wiki/POWER8 mentions 16MB of L4 cache.
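
For reference, the ~16MB figure is just the size of the two buffers added
together. A quick sketch of the arithmetic (the function name is made up
for illustration, and it assumes tightly packed rows with no stride
padding, which may not match what the benchmark actually allocates):

```python
# Rough working set of the 'M' test: two 1920x1080 buffers at 32bpp.
# Assumption: rows are tightly packed (no stride padding).
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 4   # 32bpp
NUM_BUFFERS = 2       # source and destination


def working_set_bytes(width=WIDTH, height=HEIGHT,
                      bytes_per_pixel=BYTES_PER_PIXEL,
                      buffers=NUM_BUFFERS):
    """Total bytes touched if every pixel of every buffer is accessed."""
    return width * height * bytes_per_pixel * buffers


total = working_set_bytes()
print(total, "bytes, or about", round(total / 2**20, 2), "MiB")
```

That is 16588800 bytes, so slightly under 16 MiB, which is why a 16MB
last level cache could plausibly hold nearly all of it.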

On the other hand, ~200 MPix/s is relatively low bandwidth by
high performance desktop/server standards and should not be a problem
for the automatic hardware prefetch.
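
To put a number on "relatively low", here is a small sketch that compares
the 'M' result after the patch with the reference fill rate quoted at the
top of the benchmark output. The per-stream byte rate simply assumes 4
bytes per pixel for a single buffer; the real memory traffic of an OVER
operation with nearest scaling depends on the scale factor and on how the
destination is read and written, so treat it as a ballpark only.

```python
# Numbers copied from the quoted benchmark output.
REFERENCE_FILL_MPIX = 6211.8   # "6211.8MP/s for 32bpp fills"
M_AFTER_MPIX = 208.22          # 'M' row, "After" column
BYTES_PER_PIXEL = 4            # 32bpp

# Fraction of the reference fill rate reached by the 'M' test.
ratio = M_AFTER_MPIX / REFERENCE_FILL_MPIX

# Assumption: one 32bpp stream touched once per output pixel.
per_stream_mb_s = M_AFTER_MPIX * BYTES_PER_PIXEL

print("fraction of reference fill rate: %.1f%%" % (ratio * 100))
print("approx. bytes per stream: %.1f MB/s" % per_stream_mb_s)
```

This works out to only a few percent of the reference fill rate, which is
consistent with the cache level making no visible difference.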

-- 
Best regards,
Siarhei Siamashka
_______________________________________________
Pixman mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pixman
