On Tue, 22 Sep 2015 16:03:22 +0300 Oded Gabbay <[email protected]> wrote:
> On Tue, Sep 22, 2015 at 3:59 PM, Pekka Paalanen <[email protected]> wrote: > > On Mon, 21 Sep 2015 14:22:53 +0300 > > Oded Gabbay <[email protected]> wrote: > > > >> On Thu, Sep 10, 2015 at 7:16 PM, Siarhei Siamashka > >> <[email protected]> wrote: > >> > > >> > On Thu, 10 Sep 2015 12:27:18 +0300 > >> > Oded Gabbay <[email protected]> wrote: > >> > > >> > > On Sat, Sep 5, 2015 at 10:03 PM, Oded Gabbay <[email protected]> > >> > > wrote: > >> > > > > >> > > > On Fri, Sep 4, 2015 at 3:39 PM, Siarhei Siamashka > >> > > > <[email protected]> wrote: > >> > > > > Running "lowlevel-blt-bench over_n_8888" on Playstation3 3.2GHz, > >> > > > > Gentoo ppc (32-bit userland) gave the following results: > >> > > > > > >> > > > > before: over_n_8888 = L1: 147.47 L2: 205.86 M:121.07 > >> > > > > after: over_n_8888 = L1: 287.27 L2: 261.09 M:133.48 > >> > > > > > >> > > > > Signed-off-by: Siarhei Siamashka <[email protected]> > >> > > > > --- > >> > > > > pixman/pixman-vmx.c | 54 > >> > > > > +++++++++++++++++++++++++++++++++++++++++++++++++++ > >> > > > > 1 files changed, 54 insertions(+), 0 deletions(-) > >> > > > > > >> > > > > diff --git a/pixman/pixman-vmx.c b/pixman/pixman-vmx.c > >> > > > > index a9bd024..9e551b3 100644 > >> > > > > --- a/pixman/pixman-vmx.c > >> > > > > +++ b/pixman/pixman-vmx.c > >> > > > > @@ -2745,6 +2745,58 @@ vmx_composite_src_x888_8888 > >> > > > > (pixman_implementation_t *imp, > >> > > > > } > >> > > > > > >> > > > > static void > >> > > > > +vmx_composite_over_n_8888 (pixman_implementation_t *imp, > >> > > > > + pixman_composite_info_t *info) > >> > > > > +{ > >> > > > > + PIXMAN_COMPOSITE_ARGS (info); > >> > > > > + uint32_t *dst_line, *dst; > >> > > > > + uint32_t src, ia; > >> > > > > + int i, w, dst_stride; > >> > > > > + vector unsigned int vdst, vsrc, via; > >> > > > > + > >> > > > > + src = _pixman_image_get_solid (imp, src_image, > >> > > > > dest_image->bits.format); > >> > > > > + > >> > > > > + if (src == 0) > >> > > > > + return; > >> > > > > + > >> > > > > + PIXMAN_IMAGE_GET_LINE ( > >> > > > > + dest_image, dest_x, dest_y, uint32_t, dst_stride, > >> > > > > dst_line, 1); > >> > > > > + > >> > > > > + vsrc = (vector unsigned int){src, src, src, src}; > >> > > > > + via = negate (splat_alpha (vsrc)); > >> > > > If we will use the over function (see my next comment), we need to > >> > > > remove the negate() from the above statement, as it is done in the > >> > > > over function. > >> > > > > >> > > > > + ia = ALPHA_8 (~src); > >> > > > > + > >> > > > > + while (height--) > >> > > > > + { > >> > > > > + dst = dst_line; > >> > > > > + dst_line += dst_stride; > >> > > > > + w = width; > >> > > > > + > >> > > > > + while (w && ((uintptr_t)dst & 15)) > >> > > > > + { > >> > > > > + uint32_t d = *dst; > >> > > > > + UN8x4_MUL_UN8_ADD_UN8x4 (d, ia, src); > >> > > > > + *dst++ = d; > >> > > > > + w--; > >> > > > > + } > >> > > > > + > >> > > > > + for (i = w / 4; i > 0; i--) > >> > > > > + { > >> > > > > + vdst = pix_multiply (load_128_aligned (dst), via); > >> > > > > + save_128_aligned (dst, pix_add (vsrc, vdst)); > >> > > > > >> > > > Instead of the above two lines, I would simply use the over function > >> > > > in vmx, which does exactly that. So: > >> > > > vdst = over(vsrc, via, load_128_aligned(dst)) > >> > > > save_128_aligned (dst, vdst); > >> > > > > >> > > > I prefer this as it reuses an existing function which helps > >> > > > maintainability, and using it has no impact on performance. The only difference between these variants is that my variant of the patch explicitly moved the calculation of the negated alpha out of the loop and is a more accurate representation of the assembly code, which is expected to be generated. Your variant relies on the compiler to do this job properly. As far as the readability/maintainability is concerned, there is no significant difference. There is the same number of lines of code. Also my variant of the patch is handling the main loop and the leading/trailing pixels in a more uniform way. Just compare via = negate (splat_alpha (vsrc)); with ia = ALPHA_8 (~src); and vdst = pix_multiply (load_128_aligned (dst), via); save_128_aligned (dst, pix_add (vsrc, vdst)); with uint32_t d = *dst; UN8x4_MUL_UN8_ADD_UN8x4 (d, ia, src); *dst++ = d; > >> > > > > + dst += 4; > >> > > > > + } > >> > > > > + > >> > > > > + for (i = w % 4; --i >= 0;) > >> > > > > + { > >> > > > > + uint32_t d = dst[i]; > >> > > > > + UN8x4_MUL_UN8_ADD_UN8x4 (d, ia, src); > >> > > > > + dst[i] = d; > >> > > > > + } > >> > > > > + } > >> > > > > +} > >> > > > > + > >> > > > > +static void > >> > > > > vmx_composite_over_8888_8888 (pixman_implementation_t *imp, > >> > > > > pixman_composite_info_t *info) > >> > > > > { > >> > > > > @@ -3079,6 +3131,8 @@ FAST_NEAREST_MAINLOOP > >> > > > > (vmx_8888_8888_normal_OVER, > >> > > > > > >> > > > > static const pixman_fast_path_t vmx_fast_paths[] = > >> > > > > { > >> > > > > + PIXMAN_STD_FAST_PATH (OVER, solid, null, a8r8g8b8, > >> > > > > vmx_composite_over_n_8888), > >> > > > > + PIXMAN_STD_FAST_PATH (OVER, solid, null, x8r8g8b8, > >> > > > > vmx_composite_over_n_8888), > >> > > > > PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, a8r8g8b8, > >> > > > > vmx_composite_over_8888_8888), > >> > > > > PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, x8r8g8b8, > >> > > > > vmx_composite_over_8888_8888), > >> > > > > PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, a8b8g8r8, > >> > > > > vmx_composite_over_8888_8888), > >> > > > > -- > >> > > > > 1.7.8.6 > >> > > > > > > > >> > >> I took your advice and run benchmarks with the *non* trimmed traces. > >> It takes 144 minutes to run a benchmark on POWER8, 3.4GHz 8 Cores (a > >> raw machine, not VM). > >> > >> I compared with and without this patch and I got: > >> > >> image ocitysmap 659.69 -> 611.71 : 1.08x > >> speedup > >> image xfce4-terminal-a1 2725.22 -> 2547.47 : 1.07x speedup > >> > >> So I guess we can merge this patch, because I prefer the non-trimmed > >> results over the trimmed ones. > >> The low-level-blt giving significant improvement is also a good sign. > >> > >> Therefore: > >> Reviewed-by: Oded Gabbay <[email protected]> > > > > Hi, > > > > this is only touching pixman-vmx.c and I don't see anything to complain > > about here, so as long as you're happy, > > Acked-by Pekka Paalanen <[email protected]> > > > > Did you overrule your own comments? :-) > > Oh, right. > Now I see I had made some comments above about using existing helper function. > > Siarhei, > > Do you object that I will replace the lines above with call to the > existing helper function, before pushing this patch ? I have no objections. Both variants are perfectly fine for me and you are the one, who is taking care of the powerpc code. -- Best regards, Siarhei Siamashka _______________________________________________ Pixman mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/pixman
