On Tue, 22 Jan 2013 13:10:54 -0000, Siarhei Siamashka <[email protected]> wrote:
Just one thing looks a bit odd.

src_8888_8888

        Before          After
        Mean   StdDev   Mean   StdDev   Confidence   Change
M       57.0   0.2      89.2   0.5      100.0%       +56.4%

89.2 MPix/s * 32bpp = ~357 MB/s

src_0565_0565

        Before          After
        Mean   StdDev   Mean   StdDev   Confidence   Change
M       90.7   0.4      133.5  0.7      100.0%       +47.1%

133.5 MPix/s * 16bpp = ~267 MB/s

Seems to be a much less efficient use of memory bandwidth here compared to src_8888_8888?
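For reference, the MB/s figures in the quoted benchmark are just the pixel rate multiplied by the bytes per pixel. A small Python sketch of that arithmetic follows (the function name is hypothetical). Like the quoted figures, it counts each pixel's bytes once; a src copy both reads the source and writes the destination, so total bus traffic is roughly double for both operations, which does not change the comparison.

```python
def mpix_to_mbytes(mpix_per_s: float, bits_per_pixel: int) -> float:
    """Convert a blit rate in MPix/s to MB/s, counting each pixel's bytes once."""
    return mpix_per_s * bits_per_pixel / 8


if __name__ == "__main__":
    for name, rate, bpp in [("src_8888_8888", 89.2, 32),
                            ("src_0565_0565", 133.5, 16)]:
        print(f"{name}: {rate} MPix/s * {bpp}bpp = ~{mpix_to_mbytes(rate, bpp):.0f} MB/s")
    # src_8888_8888: 89.2 MPix/s * 32bpp = ~357 MB/s
    # src_0565_0565: 133.5 MPix/s * 16bpp = ~267 MB/s
```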
I think what you're seeing here is the speed difference between the word-aligned and misaligned code paths. The M test cycles over many starting X positions for the source buffer, but always uses X=1 for the destination buffer. For 32bpp, every pixel position is word-aligned, but for 16bpp this means half the runs are misaligned (the source and destination start at different offsets within a 32-bit word).

I tried tweaking the M test to force alignment or misalignment, to get some comparative timings for src_0565_0565:

Aligned:   169.8 MPix/s (rather closer to twice the 32bpp result)
Unaligned: 108.6 MPix/s

I'm open to suggestions as to how to improve the misaligned case. Early in development, I compared the speed of doing LDM followed by in-register shuffling with either ORR or PKH instructions against using lots of unaligned LDRs, and the LDRs came out fastest by a small margin, which is why that's what's used in my patch.

Ben
_______________________________________________
Pixman mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pixman
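To make the alignment argument concrete, here is a rough Python model. All names are hypothetical, and it assumes 32-bit words and rows that start on a 4-byte boundary. It models only the data movement, not the timing of the real ARMv6 assembly. The first part counts how many source X positions share a word offset with a destination at X=1. The second part illustrates the shift-and-OR idea behind an LDM-plus-ORR approach: it builds a misaligned 32-bit little-endian value from two aligned word loads and checks the result against reading the bytes directly, as an unaligned LDR would.

```python
WORD_BYTES = 4  # assumption: 32-bit words, each row starts word-aligned


def word_offset(x: int, bpp: int) -> int:
    """Byte offset of pixel x within its 32-bit word."""
    return (x * (bpp // 8)) % WORD_BYTES


def misaligned_fraction(bpp: int, src_xs, dst_x: int = 1) -> float:
    """Fraction of runs whose source and destination start at different
    offsets within a word, so the source cannot be read with aligned
    word loads once the destination has been brought to alignment."""
    xs = list(src_xs)
    bad = sum(word_offset(x, bpp) != word_offset(dst_x, bpp) for x in xs)
    return bad / len(xs)


def load_u32_unaligned(buf: bytes, addr: int) -> int:
    """Reference: read 4 little-endian bytes at any address."""
    return int.from_bytes(buf[addr:addr + 4], "little")


def load_u32_shift_or(buf: bytes, addr: int) -> int:
    """Same value built from two aligned word loads plus shifts and an OR."""
    base = addr & ~(WORD_BYTES - 1)   # round down to a word boundary
    k = addr - base                   # misalignment in bytes (0..3)
    lo = int.from_bytes(buf[base:base + 4], "little")
    if k == 0:
        return lo
    hi = int.from_bytes(buf[base + 4:base + 8], "little")
    return ((lo >> (8 * k)) | (hi << (32 - 8 * k))) & 0xFFFFFFFF


if __name__ == "__main__":
    print("32bpp misaligned fraction:", misaligned_fraction(32, range(64)))  # 0.0
    print("16bpp misaligned fraction:", misaligned_fraction(16, range(64)))  # 0.5
    data = bytes(range(32))
    assert all(load_u32_shift_or(data, a) == load_u32_unaligned(data, a)
               for a in range(len(data) - 8))
```

This model says nothing about speed. Which variant wins depends on the core's load/store behaviour, and the timings above favoured plain unaligned LDRs.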
