The lookup table causes ispc to emit gather instructions, because the
index of the lookup is varying. There's not really anything you can do
about that as long as the table stays, and gathers only have hardware
support on AVX2 and above; SSEx and AVX have none. Also, having the alpha
channel interleaved with RGB makes the lanes depend on each other, which
is not particularly SIMD-friendly.

If you can live with a small approximation, you may consider replacing
sRGB_to_linear with a square and linear_to_sRGB with a square root. That
is a power of 2.0 standing in for the 2.2 power function, which itself
approximates the actual sRGB function.


On Fri, 20 Dec 2019 at 15:24, Ben Harper <[email protected]> wrote:

> I'm trying to write a function using ISPC, which takes as input:
>
>    - A scanline of a glyph, as rendered by Freetype. This is a uint8
>    alpha mask.
>    - A target RGBA (uint8 x 4) surface.
>    - A 3-element color (RGB) to draw onto the surface, masked by the
>    glyph's alpha channel
>
> and does the following:
>
>    - Composite the glyph scanline onto the target surface, using the OVER
>    operator, and with gamma-correct blending (ie transform source and
>    destination from sRGB to linear float, then perform the blending in linear
>    space, then transform back to sRGB, and write out to target surface).
>
> The goal here is straightforward: it's just the final step needed to
> consume Freetype output and show it on the screen.
>
> Here is a gist of my ISPC function:
> https://gist.github.com/bmharper/c5d194dd04b79f8db55de60edff53ae0
>
> I'm compiling with: ispc --target=avx2-i32x4 --opt=fast-math
>
> It feels like I'm doing this wrong. I get a bunch of gather/scatter
> warnings, and the generated code of the inner loop feels a little too long,
> and I get the feeling I could build it quite a bit better if I hand-crafted
> it.
>
> These are the ispc compiler warnings:
>
> blend.ispc:21:17: Performance Warning: Conversion from unsigned int
> to float is slow. Use "int" if possible
>
>     float alpha = glyph[alphaChan] / 255.0f;
>                   ^^^^^^^^^^^^^^^^
>
> blend.ispc:22:41: Performance Warning: Gather required to load value.
>
>     dst[i] = float_to_srgb8((1 - alpha) * sRGBToLinear[dst[i]] + alpha * color[i & 3]);
>                                           ^^^^^^^^^^^^^^^^^^^^
>
> blend.ispc:22:72: Performance Warning: Gather required to load value.
>
>     dst[i] = float_to_srgb8((1 - alpha) * sRGBToLinear[dst[i]] + alpha * color[i & 3]);
>                                                                          ^^^^^^^^^^^^
>
> Can anybody suggest a better paradigm?
>
> Thanks,
> Ben
>
> --
> You received this message because you are subscribed to the Google Groups
> "Intel SPMD Program Compiler Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/ispc-users/9018ddcd-16a2-4f41-94df-5777319d0b23%40googlegroups.com
> <https://groups.google.com/d/msgid/ispc-users/9018ddcd-16a2-4f41-94df-5777319d0b23%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
