> On May 22, 2020, at 9:32 PM, Eli Zaretskii <[email protected]> wrote:
> 
> Hi,
> 
> This is a bit off-topic, but I thought it could be appropriate to ask
> here, since we have here some of the best experts on this subject.
> 
> We are discussing support for ligatures in Emacs, specifically when
> using HarfBuzz as the shaping engine.  See the discussion from
> 
>  https://lists.gnu.org/archive/html/emacs-devel/2020-05/msg02493.html
> 
> The current support for producing ligatures works in the same way as
> complex text shaping for scripts that require that, like Arabic and
> Khmer: the sequences of characters that can be displayed as ligatures
> are identified in advance with suitable regular expressions, and the
> display engine then passes these sequences to hb_shape to produce the
> ligatures.
> 
> This works well for scripts that require complex shaping, because such
> scripts generally have well-defined rules for the sequences of
> codepoints that need shaping.  My original thoughts were that
> ligatures could be supported in the same way, based on the assumption
> that the list of possible ligatures is finite and can be stored in a
> suitable data structure in advance.

I might be stating the obvious, but what Emacs is doing reflects a very outdated
view of text layout. The schism between so-called complex text and simple text
does not actually exist. There are script-specific shaping rules that layout
engines know and apply, and there are additional/complementary rules provided by
the font that layout engines also apply.

As far as applications are concerned, they have text with certain properties and
fonts; they hand these to the layout engine and get back positioned glyphs.
Any attempt to second-guess the layout engine and classify the text into parts
that need or do not need shaping is futile.

Fonts can, and do, provide any number of arbitrary glyph interactions (not just
ligatures), and the only reliable way to know what a font does with a given run
of text is to shape it and check the output.
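
To illustrate "shape and check the output", here is a minimal Python sketch.
It is a toy model, not HarfBuzz: FONT_LIGATURES, toy_shape and
shaping_changed_output are hypothetical names invented for this example, and
a real font can define far more than simple ligature substitutions. The only
point is that the caller learns what happened by looking at the shaped result,
not by matching the input against a list prepared in advance.

```python
# Toy model only: FONT_LIGATURES and toy_shape are hypothetical stand-ins
# for a real font and a real shaping engine such as HarfBuzz.
FONT_LIGATURES = {"ffi": "f_f_i", "fi": "f_i", "->": "arrow"}


def toy_shape(text, ligatures):
    """Greedy longest-match substitution; returns a list of glyph names."""
    glyphs = []
    i = 0
    # Try longer sequences first so "ffi" wins over "fi".
    by_length = sorted(ligatures, key=len, reverse=True)
    while i < len(text):
        for seq in by_length:
            if text.startswith(seq, i):
                glyphs.append(ligatures[seq])
                i += len(seq)
                break
        else:
            # No ligature starts here: one glyph for this character.
            glyphs.append(text[i])
            i += 1
    return glyphs


def shaping_changed_output(text, ligatures):
    """Shape the text, then compare with the naive one-glyph-per-character result."""
    return toy_shape(text, ligatures) != list(text)
```

Note that the answer depends on the font's table as much as on the text, so
the same string can come back changed with one font and unchanged with
another.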

I think I have said this before, but Emacs should indiscriminately give all
the text to HarfBuzz (or any other text layout engine it additionally supports)
and give up on trying to pre-classify text, which is what pretty much any other
sensible application is doing already. There are many ways to solve potential
performance issues that do not involve compromising on the text layout.
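
As one possible example of such a performance measure (an assumption on my
part for illustration, not a description of what any particular application
does), shaping results can be cached per run of text, so repeated words are
shaped only once. In this Python sketch make_cached_shaper and the
shape(font_id, text) callable are hypothetical; a real cache key would likely
also need everything else that affects shaping, such as font size, features,
script, language and direction.

```python
from functools import lru_cache


def make_cached_shaper(shape, maxsize=4096):
    """Wrap a shape(font_id, text) callable with an LRU cache.

    `shape` is a hypothetical stand-in for a call into the layout engine.
    Results are stored as tuples so cached values cannot be mutated.
    """

    @lru_cache(maxsize=maxsize)
    def cached(font_id, text):
        return tuple(shape(font_id, text))

    return cached


# Usage: a fake shaper that records how often it is actually invoked.
calls = []


def fake_shape(font_id, text):
    calls.append((font_id, text))
    return list(text)


shaper = make_cached_shaper(fake_shape)
shaper("some-font", "office")
shaper("some-font", "office")  # served from the cache, fake_shape not called
```

All text still goes through the shaper, so nothing is pre-classified; the
cache only avoids repeating identical work.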

> However, I'm being told that this assumption is false, and that each
> font defines ligatures from any number of arbitrary combinations of
> characters, and therefore the exhaustive list of the ligatures is in
> practice infinite and cannot be provided in advance.

That is true.

>                                                        The only way of
> doing this right, I'm told, is to either (a) query the font to get the
> list of all the ligatures it supports, or (b) assume any combination
> of characters can produce a ligature, and therefore we need to pass
> all the characters intended for display through hb_shape.  The latter
> in particular is in stark contrast to how the current Emacs display
> code is designed and implemented.

(a) is not realistically possible, as doing it properly costs pretty much the
same as shaping the text. So your only reliable option is (b).

Regards,
Khaled
_______________________________________________
HarfBuzz mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/harfbuzz
