> On May 24, 2020, at 5:41 PM, Eli Zaretskii wrote:
>
>>> I almost understand (and agree), sans one part: the "arbitrary parts"
>>> of what you wrote. If we want to produce a ligature out of "ffi", the
>>> shaper will get "ffi" and nothing more. Which part here is arbitrary?
>>
>> Sending "ffi" alone is an arbitrary decision. The font might have kerning
>> between "ffi" and what comes before and after it, but you won't get it. The
>> font might not have a ligature for "ffi" at all, but using kerning instead,
>> so you will get kerning between "ffi" glyphs and not other glyphs which is
>> arbitrary. It might be a cursive font that changes glyph shapes based on
>> surrounding glyphs, and you will get that for "ffi" and not elsewhere which
>> is arbitrary.
>>
>> That is just plain wrong, there is no way around it.
>
> So, to make sure I understand the correct solution: you are saying
> that all the text to be displayed should go through the shaper, is
> that right?
>
> If so, how large should be the chunks of text to be passed to the
> shaper in any one call, in order to have a correct result? Would it
> be enough to pass whitespace-separated words one by one? or do we need
> to send entire physical lines (up to the terminating newline
> character)? or maybe an entire paragraph? What is the recommendation
> here?

In general, the safest approach is to pass the whole paragraph of text along
with the start and length of each item (an item being a run with the same font,
direction, script, and language).

This ensures, for example, that HarfBuzz can do basic Arabic-like shaping
across item boundaries: if you break items in the middle of an Arabic word
(due to a font change, say), you still get the appropriate initial/medial/final
forms across the boundary. It also lets HarfBuzz put a combining mark at the
start of a paragraph on a dotted circle, as it otherwise has no base.
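
A minimal sketch of the itemization bookkeeping, in Python and without calling
HarfBuzz itself. It assumes a hypothetical per-character list `attrs` of
(font, direction, script, language) tuples and returns the (start, length) of
each item within the paragraph. The name `itemize` and the sample font names
are made up for illustration.

```python
from itertools import groupby


def itemize(paragraph, attrs):
    """Return (start, length) for each run of identical attributes.

    `attrs` is a hypothetical list with one (font, direction, script,
    language) tuple per code point of `paragraph`.
    """
    if len(attrs) != len(paragraph):
        raise ValueError("need one attribute tuple per character")
    items = []
    start = 0
    for _, group in groupby(attrs):
        length = sum(1 for _ in group)  # number of characters in this run
        items.append((start, length))
        start += length
    return items


# An Arabic word split in the middle by a (made-up) font change.
paragraph = "\u0633\u0644\u0627\u0645"
attrs = [("FontA", "rtl", "Arab", "ar")] * 2 + [("FontB", "rtl", "Arab", "ar")] * 2
items = itemize(paragraph, attrs)
```

Each item would then be shaped by adding the whole paragraph to the buffer and
passing the pair as `item_offset` and `item_length`, for instance to
`hb_buffer_add_codepoints()`, so that the rest of the paragraph serves as
context. The offsets here count code points; with `hb_buffer_add_utf8()` they
would have to be byte offsets instead.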

If this is not possible, you can try to pass enough context, for example by
reaching back and forward to the first character that is not a combining mark.
This may or may not be enough.
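
One way to read "reach back and forward" is sketched below in Python, using
only the standard library. The helper names `is_mark` and `context_window` are
hypothetical, and treating every character with a general category starting
with "M" as a combining mark is an assumption of this sketch.

```python
import unicodedata


def is_mark(ch):
    # General categories Mn, Mc and Me are the Unicode mark categories.
    return unicodedata.category(ch).startswith("M")


def context_window(text, start, length):
    """Widen [start, start + length) to the nearest non-mark on each side.

    Returns (ctx_start, ctx_end) as a half-open range of code point indices.
    """
    ctx_start = start
    while ctx_start > 0:
        ctx_start -= 1
        if not is_mark(text[ctx_start]):
            break  # include the first non-mark before the item
    ctx_end = start + length
    while ctx_end < len(text):
        ctx_end += 1
        if not is_mark(text[ctx_end - 1]):
            break  # include the first non-mark after the item
    return ctx_start, ctx_end


# "a", "e" + combining acute, "b" + combining grave, "c"
text = "ae\u0301b\u0300c"
window = context_window(text, 3, 2)  # the item is "b" plus its mark
```

The slice `text[ctx_start:ctx_end]` would then be handed to the shaper with the
item's offset expressed relative to `ctx_start`, so that only the item is
shaped and the extra characters act as context.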

Shaping space-delimited words is orthogonal to that; context should always be
provided.

Some fonts do have OpenType lookups that interact with space (e.g. kerning
pairs involving space, or even substitutions involving space), so shaping words
independently will give suboptimal results. You can use the HarfBuzz API to
find out whether the font has OpenType layout rules involving space, or decide
to live with this limitation. Firefox does this check as it wants to cache
individually shaped words when possible, and Chrome used to do that too, but I
think they now make sure to retain enough information to avoid unnecessary
reshaping, so such a word cache is not needed.
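
The word-cache trade-off could look roughly like the Python sketch below. This
is not how Firefox or Chrome implement it. `shape` is a stand-in for a real
shaping call, and `font_has_space_rules` is assumed to have been computed
elsewhere (for example by inspecting the font's lookups); both names are
hypothetical.

```python
def shape_text(text, shape, font_has_space_rules, cache):
    """Shape `text`, reusing cached words only when that looks safe.

    `shape` is a hypothetical callable standing in for the real shaper;
    `cache` is a plain dict mapping a word to its shaped result.
    """
    if font_has_space_rules:
        # Space takes part in layout rules, so shape the run as a whole.
        return [shape(text)]
    results = []
    for word in text.split(" "):
        if word not in cache:
            cache[word] = shape(word)  # shape each distinct word only once
        results.append(cache[word])
    return results


# A fake shaper that records what it was asked to shape.
calls = []


def fake_shape(s):
    calls.append(s)
    return s.upper()


cache = {}
per_word = shape_text("to be or not to be", fake_shape, False, cache)
whole_run = shape_text("to be", fake_shape, True, cache)
```

Note that the per-word branch still shapes each word without surrounding
context, so it carries the limitation described above even when the font has
no rules involving space.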

Regards,
Khaled