On Thu, 23 Apr 2015 20:28:58 +0100 "Ben Avison" <[email protected]> wrote:
> On Thu, 23 Apr 2015 13:10:10 +0100, Pekka Paalanen <[email protected]>
> wrote:
>
> > Affine-bench differs from lowlevel-blt-bench in the following:
> > - does not test different sized operations fitting to specific caches,
> >   destination is always 1920x1080
> > - allows defining the affine transformation parameters
> > - carefully computes operation extents to hit the COVER_CLIP fast paths
> [...]
> > did I capture all the special features affine-bench has over
> > lowlevel-blt-bench? I see llbb could use a transform too, and
> > was looking at why extending that would be unwanted.
>
> Yes, there's some support in lowlevel-blt-bench for scaled plots, but
> it's limited to a single scale factor - it's the smallest expressible
> increment larger than unity, corresponding to an oh-so-slight size
> reduction, and is applied only in the X axis. The fact that lowlevel-blt-bench
> doesn't attempt anything more than that means it can make some
> simplifications:
>
> * the source and destination buffers can be the same size as for the
>   unscaled case
> * automatically satisfies COVER_CLIP_NEAREST (although when I was
>   analysing the flags yesterday, I was reminded that before I removed the
>   8*pixman_fixed_e fudge factor, lowlevel-blt-bench's bilinear operations
>   were incorrectly calculated to *not* satisfy COVER_CLIP_BILINEAR)
> * no need to add translation offsets into the transform matrix
> * one pixel row in = one pixel row out greatly simplifies the
>   calculations about what can fit in L1 and L2 caches - this is why I
>   deliberately only tested the memory-constrained case in affine-bench.
>   For example, in a 90-degree rotation, each new output pixel in a row
>   will require reading from a different source cacheline.
>
> When I was writing the ARMv6 scaled fetchers, I became aware that they
> were going to take quite different code paths in the enlargement vs
> reduction cases, as well as when a vertical scaling factor was involved.
> I needed to be able to benchmark all these combinations in order to
> select the best prefetch distances, if nothing else. I also realised
> there was currently no way to benchmark other common affine transforms
> that I might want to address in future such as reflections (including
> those used by the reflect repeat type) or even simple rotations, so with
> the difficulties of ensuring COVER_CLIP_BILINEAR too I just decided it
> would be easier to write a new benchmarker.

Yes, very good.

I also forgot to say that I didn't bother changing the CLI for
affine-bench to use llbb's pattern parser. I figured it wouldn't be
that important. We already use the same operator and format name
lookup tables anyway.

> I've reviewed your version, looks fine to me. A very minor point: I'm not
> sure it's worth making a copy of the transform struct at the start of
> bench() because we mostly only use a pointer to the struct thereafter, so
> you might as well have kept using bi->transform.

Thank you, I'll take that as R-b.

I make the copy for the sake of the API. I wanted to use const
arguments as much as possible to avoid confusion later on about what
values were actually changed and what were not. I assumed it wouldn't
make any difference to the benchmark results.

Thanks,
pq
_______________________________________________
Pixman mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pixman
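To put "oh-so-slight" into numbers: pixman transform entries are 16.16 fixed point, so the smallest value above unity is presumably pixman_fixed_1 + pixman_fixed_e, i.e. 1 + 2^-16. The sketch below (local, hypothetical names FIXED_1, FIXED_E and extra_source_width, not pixman's macros or affine-bench code) computes how much extra source width such an X scale reads over a 1:1 copy of a 1920-pixel row.

```c
#include <assert.h>
#include <stdint.h>

/* Local 16.16 fixed-point definitions mirroring the layout of pixman's
 * pixman_fixed_t; these names are illustrative, not pixman's macros. */
typedef int32_t fixed_16_16;
#define FIXED_1 ((fixed_16_16) 65536) /* 1.0 */
#define FIXED_E ((fixed_16_16) 1)     /* 2^-16, the smallest positive step */

/* Extra source width, in 16.16 units, that a horizontal scale factor of
 * scale_x reads beyond a 1:1 copy of dst_width destination pixels. */
static int64_t extra_source_width(int dst_width, fixed_16_16 scale_x)
{
    assert(dst_width >= 0);
    /* 64-bit arithmetic so wide rows cannot overflow. */
    return (int64_t) dst_width * scale_x - (int64_t) dst_width * FIXED_1;
}

/* Convert a 16.16 value to double, for display only. */
static double fixed_to_double(int64_t f)
{
    return (double) f / FIXED_1;
}
```

For a 1920-pixel row the excess is 1920 units of 2^-16, about 0.029 source pixels, which is why that single scale factor can reuse same-sized buffers and stays well inside the unscaled case.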
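Rotations and reflections are where the translation offsets and cover-clip extents stop being trivial. The following is a minimal sketch, assuming a hypothetical double-precision affine_t (destination to source mapping) instead of pixman's 16.16 pixman_transform_t, and a simplified bilinear cover rule: every transformed destination pixel centre must lie between the first and last source pixel centres, so no tap with non-zero weight falls outside the source. Pixman's real COVER_CLIP_BILINEAR computation works in fixed point and may be stricter; rotate_90, reflect_x and covers_bilinear are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical affine map from destination to source space:
 * sx = m[0][0]*x + m[0][1]*y + m[0][2], sy = m[1][0]*x + m[1][1]*y + m[1][2]. */
typedef struct { double m[2][3]; } affine_t;

/* 90-degree rotation of a dst_w-wide destination. Without the dst_w
 * translation term, sy = -x would be negative for every pixel. */
static affine_t rotate_90(int dst_w)
{
    affine_t t = {{{0.0, 1.0, 0.0}, {-1.0, 0.0, (double) dst_w}}};
    return t;
}

/* Horizontal reflection: sx = dst_w - x, again needing a translation. */
static affine_t reflect_x(int dst_w)
{
    affine_t t = {{{-1.0, 0.0, (double) dst_w}, {0.0, 1.0, 0.0}}};
    return t;
}

/* Simplified bilinear cover test. For an affine map, the extreme sample
 * positions are the images of the four corner pixel centres, so only
 * those are transformed. */
static bool covers_bilinear(const affine_t *t, int dst_w, int dst_h,
                            int src_w, int src_h)
{
    double x1 = 0.0, y1 = 0.0, x2 = 0.0, y2 = 0.0;

    assert(dst_w > 0 && dst_h > 0);
    for (int k = 0; k < 4; k++) {
        /* Corner pixel centres of the destination. */
        double x = (k & 1) ? dst_w - 0.5 : 0.5;
        double y = (k & 2) ? dst_h - 0.5 : 0.5;
        double sx = t->m[0][0] * x + t->m[0][1] * y + t->m[0][2];
        double sy = t->m[1][0] * x + t->m[1][1] * y + t->m[1][2];

        if (k == 0 || sx < x1) x1 = sx;
        if (k == 0 || sx > x2) x2 = sx;
        if (k == 0 || sy < y1) y1 = sy;
        if (k == 0 || sy > y2) y2 = sy;
    }
    /* Samples between the first and last source pixel centres only give
     * non-zero weight to in-bounds pixels. */
    return x1 >= 0.5 && y1 >= 0.5 && x2 <= src_w - 0.5 && y2 <= src_h - 0.5;
}
```

Under this model a 1920x1080 destination rotated by 90 degrees is covered by a 1080x1920 source only because of the dst_w offset; the same matrix without it maps every sample to negative sy and fails the test.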
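Ben's 90-degree rotation example can be made concrete with a rough model that counts how many source cachelines one output row touches. This sketch assumes 32bpp pixels, nearest-neighbour sampling, a cacheline-aligned source buffer and a straight walk through the source; the line size is a parameter (64 bytes in the usage below is just an example value). lines_touched is a hypothetical helper, not affine-bench code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count cacheline changes while producing one destination row of `width`
 * pixels, where destination pixel i reads 32bpp source pixel
 * (x0 + i*dx, y0 + i*dy). For the straight, monotonic walks modelled here,
 * the number of changes equals the number of distinct lines touched. */
static size_t lines_touched(int width, int x0, int y0, int dx, int dy,
                            size_t stride_bytes, size_t line_bytes)
{
    size_t count = 0;
    size_t prev = SIZE_MAX;

    assert(line_bytes > 0);
    for (int i = 0; i < width; i++) {
        int x = x0 + i * dx;
        int y = y0 + i * dy;

        assert(x >= 0 && y >= 0);
        /* Byte offset of the source pixel, divided down to a line index. */
        size_t line = ((size_t) y * stride_bytes + (size_t) x * 4) / line_bytes;
        if (line != prev) {
            count++;
            prev = line;
        }
    }
    return count;
}
```

With 64-byte lines, an identity row of 1920 pixels touches 1920 * 4 / 64 = 120 lines, while the rotated row (walking down a 1080-pixel-wide source with a 4320-byte stride) lands on a new line for every pixel, 1920 in all. That is the effect that makes cache-fit reasoning awkward once rows no longer map to rows.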
