Søren Sandmann wrote:

Here is another proposal, but I'm not sure it's really better:

- The combiners are made to return a buffer. The returned buffer is
  expected to contain the combined result and may be any of the passed
  src/mask/dest buffers. Almost all combiners will continue to combine
  into the dest buffer, which they will also return. But the SRC
  combiner will simply return the source buffer without any memcpy()ing.

This sounds exactly like how Nuke works (compositing software I wrote that is used in special effects). It composites often hundreds of steps on the CPU quite fast.

Work is done per scanline. The final destination first allocates or locates a buffer to write the scanline to. It then calls the last step of the compositing, passing it the buffer. This last step then returns a pointer to the resulting scanline. This may be in the passed buffer, or other memory (typically a pointer to a source buffer). As you point out, if the final step wants to put the result in the buffer it must do a memcpy if the returned pointer is not to the buffer:

  float* buffer = framebuf + y*w;
  float* result = final_step(buffer, w);
  if (result != buffer) memcpy(buffer, result, w * sizeof(*buffer));

The compositing step recursively calls input steps. In most cases the output buffer it got is passed to the input, so the final step does not need to allocate any temporary buffer. The input may write its result to the output buffer, and then the final step replaces it with its own calculation. In most cases this requires no actual thought and works perfectly. Here is a pseudocode version of a step that adds 1 to every pixel:

    float* add1(float* buffer, int w) {
      float* source = input_step(buffer, w);
      for (int i = 0; i < w; i++)
        buffer[i] = source[i] + 1;
      return buffer;
    }

Note that a no-op in the middle does not need to do a memcpy. Instead it just returns the buffer returned from the input:

    float* noop(float* buffer, int w) {
      return input_step(buffer, w);
    }
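
To make the pieces above concrete, here is a minimal sketch of a whole chain that compiles on its own. The names source_step, noop_step, add1_step and render_scanline, and the WIDTH constant, are made up for this illustration and are not from Nuke or pixman. The return type is const-qualified here only so that a source can hand back read-only storage.

```c
#include <assert.h>
#include <string.h>

#define WIDTH 4

/* Hypothetical source image: one scanline held in static storage. */
static const float source_pixels[WIDTH] = { 0.0f, 0.25f, 0.5f, 1.0f };

/* A source step ignores the scratch buffer and returns its own storage. */
const float *source_step(float *buffer, int w)
{
    (void)buffer;
    assert(w <= WIDTH);
    return source_pixels;
}

/* A no-op step forwards whatever pointer its input returned. */
const float *noop_step(float *buffer, int w)
{
    return source_step(buffer, w);
}

/* add1 reads from the returned pointer and writes into the caller's buffer. */
const float *add1_step(float *buffer, int w)
{
    const float *source = noop_step(buffer, w);
    for (int i = 0; i < w; i++)
        buffer[i] = source[i] + 1.0f;
    return buffer;
}

/* The final destination copies only when the result landed elsewhere.
 * Returns 1 if a copy was needed, 0 otherwise. */
int render_scanline(const float *(*final_step)(float *, int),
                    float *dest, int w)
{
    const float *result = final_step(dest, w);
    if (result == dest)
        return 0;
    memcpy(dest, result, (size_t)w * sizeof(*dest));
    return 1;
}
```

Calling render_scanline(add1_step, row, WIDTH) needs no copy because add1_step wrote straight into row. Calling it with noop_step does exactly one copy, at the very end, no matter how many no-ops sit in the chain.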

Sometimes a step needs to use a second buffer. These are allocated on the stack using alloca (except on Windows, sigh...). The most common reason is that a step merges the output of two or more input steps:

    float* add_two_inputs(float* buffer, int w) {
      float* a = input_a(buffer, w);
      float temp[w];
      float* b = input_b(temp, w);
      for (int i = 0; i < w; i++)
        buffer[i] = a[i] + b[i];
      return buffer;
    }
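
A compilable sketch of the same merge, assuming two made-up inputs (ramp_step and constant_step) that both write into whatever buffer they are handed. It shows why the second buffer is needed: if both inputs were given the same buffer, the second would overwrite the first one's result. It uses a C99 variable-length array in place of alloca.

```c
#include <assert.h>

/* Hypothetical input: fills the buffer it is given with a horizontal ramp. */
const float *ramp_step(float *buffer, int w)
{
    for (int i = 0; i < w; i++)
        buffer[i] = (float)i;
    return buffer;
}

/* Hypothetical input: fills the buffer it is given with a constant. */
const float *constant_step(float *buffer, int w)
{
    for (int i = 0; i < w; i++)
        buffer[i] = 0.5f;
    return buffer;
}

const float *add_two_inputs(float *buffer, int w)
{
    assert(w > 0);              /* a zero-length VLA is not allowed */
    const float *a = ramp_step(buffer, w);
    float temp[w];              /* C99 VLA, gone when this function returns */
    const float *b = constant_step(temp, w);
    for (int i = 0; i < w; i++)
        buffer[i] = a[i] + b[i];
    return buffer;              /* must not return a pointer into temp */
}
```

One thing to watch: the stack buffer stops existing when the step returns, so a step that owns a temporary must write its result into the buffer it was passed (or return some longer-lived pointer), never the temporary itself.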

I think some steps that you might expect to need different buffers can in fact reuse the same one:

    int* convert_565_to_888(int* buffer, int w) {
      short* a = input((short*)buffer, w);
      for (int i = w; i--; ) // note it must go backwards!
        buffer[i] = pixel_565_to_888(a[i]);
      return buffer;
    }
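
Here is a sketch of that conversion that can be compiled and checked. The input_565 step and its four test pixels are invented for the example, and pixel_565_to_888 is one common way to widen the channels (replicating the high bits), which I am assuming rather than quoting from anywhere. It reads the 16-bit pixels with memcpy instead of casting the pointer, to stay clear of aliasing trouble.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen one r5g6b5 pixel to x8r8g8b8 by replicating the high bits. */
uint32_t pixel_565_to_888(uint16_t p)
{
    uint32_t r5 = (p >> 11) & 0x1f, g6 = (p >> 5) & 0x3f, b5 = p & 0x1f;
    uint32_t r8 = (r5 << 3) | (r5 >> 2);
    uint32_t g8 = (g6 << 2) | (g6 >> 4);
    uint32_t b8 = (b5 << 3) | (b5 >> 2);
    return (r8 << 16) | (g8 << 8) | b8;
}

/* Hypothetical input: packs w 16-bit pixels into the start of the buffer. */
const void *input_565(void *buffer, int w)
{
    static const uint16_t pixels[4] = { 0xF800, 0x07E0, 0x001F, 0xFFFF };
    assert(w <= 4);
    memcpy(buffer, pixels, (size_t)w * sizeof(uint16_t));
    return buffer;
}

const uint32_t *convert_565_to_888(uint32_t *buffer, int w)
{
    const unsigned char *a = input_565(buffer, w);
    /* Backwards: writing 32-bit pixel i touches bytes 4i..4i+3, which only
     * overlap 16-bit pixels 2i and 2i+1. Those have already been consumed
     * (or, for i == 0, pixel 0 was read just before the write). */
    for (int i = w; i-- > 0; ) {
        uint16_t p;
        memcpy(&p, a + (size_t)i * sizeof p, sizeof p);
        buffer[i] = pixel_565_to_888(p);
    }
    return buffer;
}
```

Going forwards would break at the first pixel: storing buffer[0] overwrites bytes 2 and 3, which is where the second 16-bit pixel lives.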

In any case this scheme greatly reduces the amount of copying, and (probably more important) the amount of memory allocation/free being done. I would recommend it for cairo.
_______________________________________________
Pixman mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pixman