So, I was able to take a deep look at the patches here. They're pretty neat.
I'll just recommend documenting the subtle computations in
ft_smooth_slow_spans() a little better, and avoiding branches altogether by
using bit twiddling to perform saturated addition instead (removing
branches from inner loops generally helps performance). I.e. something like
the following:
/* This function averages inflated spans in direct rendering mode.
 * It assumes that coverage spans are rendered in a SCALE*SCALE
 * inflated pixel space, and computes the contribution of each
 * span 'sub-pixel' to the target bitmap's pixel. I.e.:
 *
 * If (x, y) are pixel coordinates in inflated space, then
 * (xt := x/SCALE, yt := y/SCALE) are the pixel coordinates in the
 * target bitmap, where '/' denotes integer division.
 *
 * Let's define GRIDSIZE := SCALE * SCALE; then if `c` is the 8-bit
 * coverage for (x, y) in inflated space, its contribution to
 * (xt, yt) would be ct := c // GRIDSIZE, where '//' denotes division
 * of real numbers (i.e. without truncation to a lower fixed or
 * floating point precision).
 *
 * Since these can only be stored in 8-bit target bitmap pixels,
 * there are at least two ways to approximate the sum:
 *
 * 1) Compute `ct := FLOOR(c // GRIDSIZE)`, which means that if all
 *    pixels in inflated space have full coverage (i.e. value 255),
 *    then their contribution sums will be
 *    GRIDSIZE * FLOOR(255 / GRIDSIZE), which is 252 (for
 *    SCALE == 2) or 240 (for SCALE == 4).
 *
 *    A later pass will be needed to scale the values to the
 *    0..255 range.
 *
 * 2) Compute `ct := ROUND(c // GRIDSIZE)`, in which case the total
 *    contribution sum may reach 256 for both `SCALE == 2` and
 *    `SCALE == 4`, which cannot be stored in an 8-bit pixel byte
 *    of the target bitmap. To deal with this, perform saturated
 *    arithmetic to ensure that the value never goes over 255.
 *    This avoids an additional rescaling pass, and is implemented
 *    below.
 */
static void
ft_smooth_slow_spans( int            y,
                      int            count,
                      const FT_Span* spans,
                      TOrigin*       target )
{
  unsigned char*  dst = target->origin - ( y / SCALE ) * target->pitch;
  unsigned int    x;

  for ( ; count--; spans++ )
  {
    /* ROUND(coverage / GRIDSIZE), computed by adding half before dividing */
    unsigned int  coverage = ( spans->coverage + GRIDSIZE / 2 ) / GRIDSIZE;

    for ( x = 0; x < spans->len; x++ )
    {
      /* saturated addition of `d[0] + coverage`: if the sum overflows */
      /* 8 bits, `sum >> 8` is 1, so `-(sum >> 8)` is all-ones and the */
      /* OR clamps the result to 255                                   */
      unsigned char*  d   = &dst[( spans->x + x ) / SCALE];
      unsigned int    sum = d[0] + coverage;

      d[0] = (FT_Byte)( sum | -( sum >> 8 ) );
    }
  }
}
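For what it's worth, the bit trick itself is easy to verify exhaustively.
Here's a minimal standalone sketch (my own, not part of the patches) that
compares the branchless expression against a straightforward clamped
reference over all 256*256 input pairs:

#include <stdio.h>

typedef unsigned char  FT_Byte;

/* the branchless saturated addition from the loop body above */
static FT_Byte
saturated_add( unsigned int  dst,
               unsigned int  coverage )
{
  unsigned int  sum = dst + coverage;

  return (FT_Byte)( sum | -( sum >> 8 ) );
}

int
main( void )
{
  unsigned int  d, c;

  for ( d = 0; d < 256; d++ )
    for ( c = 0; c < 256; c++ )
    {
      unsigned int  ref = d + c > 255 ? 255 : d + c;

      if ( saturated_add( d, c ) != ref )
      {
        printf( "mismatch at %u + %u\n", d, c );
        return 1;
      }
    }

  printf( "all %u cases match\n", 256u * 256u );
  return 0;
}

Of course this only checks the clamping trick, not the averaging logic
itself.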
Here's a Compiler Explorer link <https://godbolt.org/z/TiyjEi> that
compares the two implementations.
Can you tell me how to actually test that the code works as expected though?
Thanks
- David
On Tue, Jun 23, 2020 at 8:16 PM David Turner <[email protected]> wrote:
>
>
> On Tue, Jun 23, 2020 at 5:42 AM Alexei Podtelezhnikov
> <[email protected]> wrote:
>
>> Hi again,
>>
>> The oversampling is implemented through inflating the outline and then
>> averaging the increased number of cells using FT_RASTER_FLAG_DIRECT
>> mechanism. The first two patches set the stage by splitting the code
>> paths for LCD rendering out of the way and trying
>> FT_RASTER_FLAG_DIRECT for FT_RENDER_MODE_LCD. The third one implements
>> oversampling by replacing the normal rendering with oversampling if
>> SCALE is 2 or 4 (as opposed to 1). Again the proposal is to have it as
>> FT_RENDER_MODE_SLOW eventually. The slightly complicated averaging of
>> cells is due to 255/4+255/4+255/4+255/4 = 252 instead of 255, so we
>> have to do rounding, yet avoid overflowing.
>>
> Thanks, I'll take a look at your patches.
>
> However, please don't call it FT_RENDER_MODE_SLOW; the fact that it is
> slow is an implementation detail, and we could very well replace this with
> a different algorithm in the future (maybe slow, maybe not). Something
> like FT_RENDER_MODE_OVERLAPPED_OUTLINES seems more appropriate, since it
> describes why you would want to use this mode instead of what its
> performance profile is :-)
>
> Comments?
>>
>> Alexei
>>
>