On Sun, 6 Sep 2015 18:27:08 +0300 Oded Gabbay <[email protected]> wrote:
> This patch optimizes scaled_nearest_scanline_vmx_8888_8888_OVER and all
> the functions it calls (combine1, combine4 and
> core_combine_over_u_pixel_vmx).
>
> The optimization is done by removing use of expand_alpha_1x128 and
> expand_alpha_2x128 in favor of splat_alpha and MUL/ADD macros from
> pixman_combine32.h.
>
> Running "lowlevel-blt-bench -n over_8888_8888" on POWER8, 8 cores,
> 3.4GHz, RHEL 7.2 ppc64le gave the following results:
>
> reference memcpy speed = 24847.3MB/s (6211.8MP/s for 32bpp fills)
>
>             Before      After       Change
>           --------------------------------------------
> L1          182.05      210.22      +15.47%
> L2          180.6       208.92      +15.68%
> M           180.52      208.22      +15.34%
> HT          130.17      178.97      +37.49%
> VT          145.82      184.22      +26.33%
> R           104.51      129.38      +23.80%
> RT          48.3        61.54       +27.41%
> Kops/s      430         504         +17.21%
>
> Signed-off-by: Oded Gabbay <[email protected]>
> ---
>  pixman/pixman-vmx.c | 80
> ++++++++++++-----------------------------------------
>  1 file changed, 18 insertions(+), 62 deletions(-)
>
> diff --git a/pixman/pixman-vmx.c b/pixman/pixman-vmx.c
> index a9bd024..d9fc5d6 100644
> --- a/pixman/pixman-vmx.c
> +++ b/pixman/pixman-vmx.c
> @@ -646,19 +643,10 @@ static force_inline uint32_t
>  combine1 (const uint32_t *ps, const uint32_t *pm)
>  {
>      uint32_t s = *ps;
> +    uint32_t a = ALPHA_8(*pm);

pm is dereferenced before checked for NULL.

>
>      if (pm)
> -    {
> -        vector unsigned int ms, mm;
> -
> -        mm = unpack_32_1x128 (*pm);
> -        mm = expand_alpha_1x128 (mm);
> -
> -        ms = unpack_32_1x128 (s);
> -        ms = pix_multiply (ms, mm);
> -
> -        s = pack_1x128_32 (ms);
> -    }
> +        UN8x4_MUL_UN8(s, a);
>
>      return s;
>  }

Thanks,
pq
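
To illustrate the NULL-check comment above, here is a minimal standalone sketch of how combine1 could read the mask only after testing pm. It is not the actual follow-up patch. The helpers alpha_8, mul_un8 and un8x4_mul_un8 are hypothetical plain-C stand-ins for pixman's ALPHA_8 and UN8x4_MUL_UN8 macros (assumed here to be "top byte of the pixel" and "rounded per-channel multiply by an 8-bit value"), and the assert on ps is an addition of this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ALPHA_8: top byte of an a8r8g8b8 pixel. */
static uint32_t
alpha_8 (uint32_t p)
{
    return p >> 24;
}

/* Hypothetical stand-in for an 8-bit multiply: rounded (a * b) / 255. */
static uint32_t
mul_un8 (uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 0x80;

    return ((t >> 8) + t) >> 8;
}

/* Hypothetical stand-in for UN8x4_MUL_UN8: scale all four channels of x
 * by a. Unlike the macro, it returns the result instead of modifying x. */
static uint32_t
un8x4_mul_un8 (uint32_t x, uint32_t a)
{
    uint32_t r = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8)
        r |= mul_un8 ((x >> shift) & 0xff, a) << shift;

    return r;
}

static uint32_t
combine1 (const uint32_t *ps, const uint32_t *pm)
{
    uint32_t s;

    assert (ps != NULL);    /* sketch-only precondition: ps is always read */
    s = *ps;

    if (pm)
    {
        /* The mask is dereferenced only after the NULL check. */
        uint32_t a = alpha_8 (*pm);

        s = un8x4_mul_un8 (s, a);
    }

    return s;
}
```

With a NULL mask the source pixel is returned unchanged. With a non-NULL mask each channel is scaled by the mask's alpha, so an opaque mask leaves the pixel as it is and a fully transparent mask yields 0.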
_______________________________________________
Pixman mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pixman
