More on Tibetan, or rather: ligatures

Oliver Corff via Sun, 21 Jan 2024 13:26:59 -0800

Hi,

Deri has already followed up on the conversation that was prompted by
Tom's questions regarding Tibetan.


I'll attempt to steer the conversation away from Tibetan towards a more
generic technical issue: processing ligatures (that's what Tom's
problems boil down to).

If we take the Tibetan syllable རྒྱ, romanized as rgya, with the
components superscript r, base letter ga, and subscript y, then what
*looks* like a single glyph is in reality a sequence of three (!) elements:

1. U+0F62 "RA" (but with the ability to change shape when combined; in
   contrast to U+0F6A which looks absolutely the same in the character
   table but does *not* enter into ligatures),
2. U+0F92 "-GA", i.e. subjoined form of base letter U+0F42, and finally
3. U+0FB1 "-YA", subjoined form of U+0F61 YA.

All stacked vertically in one place. The same "TTT" (tiny Tibetan tower)
can have an additional layer on top (for the vowels e, i, o) or below
(for vowel u). Likewise, there is a base vowel sign for each of these
four (absent any of them, the vowel a is assumed), but the correct height
of the vowel glyph is taken care of by the font. It is also possible to
have one canonical vowel in the character table but a whole series of
vowel glyphs of different height in a private area of the font, not
necessarily user-accessible.

I haven't inspected the internal structure of the Tibetan fonts I use on
my machine, but the syllable rgya is displayed properly when copied into
a shell prompt, and e.g. in vim the key sequence g a reveals the
composition and the code points. So I assume the font does all the
shaping work, via its lookup tables.
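
For readers without vim at hand, the same decomposition can be reproduced
with a few lines of Python. This is only a minimal sketch: the helper name
describe is made up for illustration, and the \[uXXXX] strings it prints
are simply the groff escape notation for each code point, not a claim
about how groff will render them.

```python
import unicodedata

def describe(text):
    """Return a (groff escape, Unicode name) pair for each code point."""
    return [
        (f"\\[u{ord(ch):04X}]", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# The syllable rgya, written as its three code points.
syllable = "\u0F62\u0F92\u0FB1"

for escape, name in describe(syllable):
    print(escape, name)
```

The point is merely that the "single glyph" on screen is three separate
code points in the input, each of which needs its own escape.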

Now the question, which is not language-specific: To what extent can groff
access these font-internal lookup tables? It appears that the "naive"
approach does not trigger the ligature mechanism in the font, as
demonstrated by Tom's and Deri's examples.

Is it possible that every \[u0Fxx] is (perhaps invisibly) isolated, akin
to writing {f}{f}{l} in TeX when you want to make sure that no ligature
springs into action?

I tried to test this hypothesis by making a minimal document, ff.roff

.P
ff \" generates ligature in PDF file
\[u0066]\[u0066] \" I hoped to see something like ff, but get an error message

Yet instead of producing the letter "f", \[u0066] generates an error
message:  "warning: special character '\f' not defined"

Where is my mistake?

I then tried the basic Latin range with other letters, like \[u0041],
but get the message: "warning: special character '\A' not defined"

This looks as if the character code is translated correctly but the
backslash of the "special character" component is newly introduced.

Or is there a lower bound for the \[uxxxx] notation which I am not aware of?

So, when typesetting "ff" or "ffi" in groff, will groff build or not
build the ligature and request the glyph [ff] or [ffi] from the font, or
could the font do that based on its own knowledge of ligatures via the
appropriate lookup table?

In other words, for a working implementation of Tibetan in groff, should
I write a series of conditional character substitutions, or is there a way
to send the characters to the device in such a way that the device and
font know that a ligature is coming?

I am fine either way: a) accessing the font lookup table, or b)
implementing a comprehensive set of ligatures in groff.
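
To make option b) concrete, here is a rough sketch in Python of what I
mean by a substitution table: a greedy longest-match pass that replaces
known sequences with a ligature glyph name. The names LIGATURES and
substitute are hypothetical, the table covers only the familiar Latin
f-ligatures under their groff glyph names (ff, fi, fl, Fi, Fl), and I am
not claiming this is how groff works internally. A Tibetan table would
map sequences of code points to stacked glyphs in the same manner.

```python
# Hypothetical table: input sequence -> groff ligature glyph name.
LIGATURES = {"ffi": "Fi", "ffl": "Fl", "ff": "ff", "fi": "fi", "fl": "fl"}

def substitute(text, table=LIGATURES):
    """Replace the longest matching sequence at each position."""
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, down to two characters.
        for n in range(min(longest, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in table:
                out.append(f"\\[{table[chunk]}]")
                i += n
                break
        else:
            # No ligature starts here; pass the character through.
            out.append(text[i])
            i += 1
    return "".join(out)

print(substitute("office"))
```

Trying the longest sequence first matters: otherwise "ffi" would be
consumed as "ff" plus a stray "i", just as a Tibetan stack of three
elements could be mistaken for a stack of two.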

Best regards,

Oliver.



--
Dr. Oliver Corff
mailto:oliver.co...@email.de
