Siarhei Siamashka wrote:

Going forward, we need to also add support for separable bilinear
scaling (first horizontal interpolation for single scanlines to
temporary buffers in L1 cache, then vertical interpolation of these
buffers to get the final result). Unless I misunderstood something,
Soeren thinks that it's going to be universally better. I think that
both direct and separable scaling methods are going to be useful for
the platforms with wide SIMD. Working with two source scanlines and
providing results directly is good for extreme downscaling. Separable
processing is good for extreme upscaling. There must be a backend
dependent crossover point at a certain scaling factor.

If by "downscaling" you mean making the picture smaller, that is the harder case, and the one that requires more than two source scanlines. This should be apparent if you imagine a downscale to less than 1/2 size: the result has fewer than half as many scanlines as the original, so if each output scanline depends on only two source scanlines, some scanlines of the original never contribute to the resulting image.

Attempting to downscale from only two source scanlines is why current cairo downscaling produces very noisy images.
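
To make the coverage argument concrete, here is a minimal Python sketch that lists which source scanlines a 2-tap vertical bilinear filter can read. The function name and the pixel-centre sampling convention are assumptions made for illustration; this is not pixman or cairo code.

```python
import math

def rows_read_by_bilinear(src_h, dst_h):
    """Source rows a 2-tap vertical bilinear filter can touch."""
    scale = src_h / dst_h
    used = set()
    for y in range(dst_h):
        # Map the centre of destination row y into source coordinates.
        sy = (y + 0.5) * scale - 0.5
        top = math.floor(sy)
        for row in (top, top + 1):
            used.add(min(max(row, 0), src_h - 1))  # clamp at the edges
    return used

down = rows_read_by_bilinear(16, 4)        # 1/4 downscale
skipped = sorted(set(range(16)) - down)    # rows that never contribute
up = rows_read_by_bilinear(4, 16)          # 4x upscale
```

In the 16-to-4 case, eight of the sixteen source rows are never read, while the 4-to-16 upscale reads every row.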

Also, both upscaling and downscaling can be sped up by using a 2-pass method. It is far more important for downscaling, but it helps both. A monkey wrench in this, however, is that hardware supports 4-input bilinear interpolation, so you often get the fastest results by using it for upscaling even though it does some redundant work. That is no help for downscaling, however, unless you use mipmaps.
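
A rough sketch of the 2-pass idea in Python, under stated assumptions: the filter is a tent (linear) kernel that I widen by the downscale factor so every source pixel contributes, images are lists of rows of floats, and the names `resample_line` and `scale_two_pass` are made up for this example. It is not how pixman or cairo implement scaling.

```python
import math

def resample_line(line, out_len):
    """1-D resample with a tent filter that widens when downscaling."""
    n = len(line)
    scale = n / out_len
    radius = max(scale, 1.0)          # filter half-width in source pixels
    out = []
    for i in range(out_len):
        centre = (i + 0.5) * scale - 0.5
        total = weight_sum = 0.0
        first = math.floor(centre - radius)
        last = math.ceil(centre + radius)
        for j in range(first, last + 1):
            w = 1.0 - abs(j - centre) / radius
            if w <= 0.0:
                continue
            total += w * line[min(max(j, 0), n - 1)]  # clamp at the edges
            weight_sum += w
        out.append(total / weight_sum)
    return out

def scale_two_pass(image, out_w, out_h):
    """Horizontal pass on every row, then vertical pass on every column."""
    rows = [resample_line(row, out_w) for row in image]
    cols = [resample_line(list(col), out_h) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

ramp = resample_line([0.0, 10.0], 4)                      # 2x upscale
small = scale_two_pass([[5.0] * 8 for _ in range(6)], 3, 2)
```

When upscaling, the kernel reduces to ordinary 2-tap linear interpolation; when downscaling, each output sample averages roughly 2 x scale source samples per axis instead of the square of that number in a direct 2-D filter.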

I don't think rectangle sources help affine transforms if you plan to do 2-pass. An affine transform can be split into 3 parts (this can be figured out so the resulting matrices multiply back to the original):

1. Either the identity or a swap of the x and y axes, chosen to make the determinant of the matrix in step 2 as large as possible

2. A transform that only moves pixels vertically (a is 1 and c is 0)

3. A transform that only moves pixels horizontally (b is 0 and d is 1)

Using step 1 to decide between two versions of step 2 (one of which samples vertically from the source rather than horizontally) gives you a two-pass algorithm. Each pass only needs a 1xn or nx1 strip of input pixels to produce a 1xn or nx1 output section.
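
The three-part split can be sketched in Python as follows. Several things here are my assumptions rather than anything stated in the post: the cairo/PostScript-style convention x' = a*x + c*y, y' = b*x + d*y (which matches the "a is 1 and c is 0" hints), column vectors so that the original matrix equals H*V*S, "as large as possible" meaning largest in magnitude, and the helper names. Translation is left out.

```python
import math

def matmul(p, q):
    """Product of two 2x2 matrices stored as ((r00, r01), (r10, r11))."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def decompose(a, b, c, d):
    """Split x' = a*x + c*y, y' = b*x + d*y into S, V, H with M = H*V*S."""
    swap = abs(b) > abs(d)        # det(V) is the lower-right entry; keep it large
    if swap:
        a, c = c, a               # multiplying by the swap exchanges M's columns
        b, d = d, b
    if d == 0.0:
        raise ValueError("matrix is singular")
    S = ((0.0, 1.0), (1.0, 0.0)) if swap else ((1.0, 0.0), (0.0, 1.0))
    V = ((1.0, 0.0), (b, d))            # vertical only: a is 1 and c is 0
    hc = c / d
    H = ((a - hc * b, hc), (0.0, 1.0))  # horizontal only: b is 0 and d is 1
    return S, V, H

t = math.radians(30)
M = ((math.cos(t), -math.sin(t)), (math.sin(t), math.cos(t)))
S, V, H = decompose(M[0][0], M[1][0], M[0][1], M[1][1])
rebuilt = matmul(H, matmul(V, S))       # should reproduce M
```

For the 30 degree rotation no swap is chosen, the vertical pass scales rows by cos 30 and the horizontal pass stretches by 1/cos 30 to compensate.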

There is also a three-pass version (the three-shear rotation usually credited to Paeth) that produces less blurring for a 45 degree rotation because the intermediate images are larger. This is done by a horizontal, vertical, and then another horizontal pass. However, I have found the 2-pass version works fine; it is what Nuke uses, and nobody has complained.
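
For a pure rotation, one well-known horizontal/vertical/horizontal split is three shears. The sketch below assumes that this is the kind of three-pass scheme meant here, uses column vectors with the rotation matrix ((cos, -sin), (sin, cos)), and uses made-up helper names; it only checks the matrix identity and does no resampling.

```python
import math

def matmul(p, q):
    """Product of two 2x2 matrices stored as ((r00, r01), (r10, r11))."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def three_shears(theta):
    """Horizontal, vertical, horizontal shears whose product rotates by theta."""
    alpha = -math.tan(theta / 2.0)   # undefined at theta = pi
    beta = math.sin(theta)
    h = ((1.0, alpha), (0.0, 1.0))   # moves pixels horizontally only
    v = ((1.0, 0.0), (beta, 1.0))    # moves pixels vertically only
    return h, v, h

theta = math.radians(45)
h1, v, h2 = three_shears(theta)
product = matmul(h2, matmul(v, h1))  # applied right to left: h1, then v, then h2
```

Each shear has determinant 1, so no pass shrinks the image. A two-pass split of the same 45 degree rotation has a vertical pass that scales by cos 45 (about 0.707) before the horizontal pass stretches the result back, which is where the extra blur comes from.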

Note that horizontal and vertical can be swapped throughout this discussion, which is where knowledge of cache lines and the like becomes more important.
_______________________________________________
Pixman mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pixman
