Thanks Richard for the pointer. I wish I had seen Jonathan's post. However, it never appeared in the digest I received from the list, nor was it sent to me directly, so I never saw it.

To be fair, the following is from the HarfBuzz tutorial on the "Why do I need a shaping engine?" page: "For example, in Tamil, when the letter "TTA" (ட) letter is followed by "U" (உ), the pair must be replaced by the single glyph "டு". The sequence of Unicode characters "டஉ" needs to be substituted with a single "டு" glyph from the font." So maybe that needs an edit.
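For anyone else who trips over the same thing, here is a quick check I could have run first. It is only a sketch using Python's standard library, not anything from the HarfBuzz docs, and the helper name `describe` is my own invention. It lists the codepoints behind the two strings so the difference between the independent letter U (U+0B89) and the dependent vowel sign U (U+0BC1) is visible.

```python
import unicodedata

def describe(text):
    """Return (codepoint, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# TTA followed by the independent vowel letter U: two separate letters.
two_letters = "\u0B9F\u0B89"
# TTA followed by the dependent vowel sign U: the sequence a shaper renders as one cluster.
letter_plus_sign = "\u0B9F\u0BC1"

for label, text in [("letter + letter", two_letters),
                    ("letter + vowel sign", letter_plus_sign)]:
    utf8_bytes = [f"0x{b:02X}" for b in text.encode("utf-8")]
    print(label, describe(text), utf8_bytes)
```

Only the second string is the sequence that should be handed to the shaper as UTF-8.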
I changed my UTF-8 string to [0xE0, 0xAE, 0x88, 0xE0, 0xAE, 0x9F, 0xE0, 0xAF, 0x81] and I finally got back the correct glyph identifiers. So thank you all for your responses. I'm sure I'll have more questions as this project evolves.

-----Original Message-----
From: Richard Wordingham <[email protected]>
Sent: April 11, 2019 12:16 PM
To: [email protected]
Cc: Paul Daughetee <[email protected]>
Subject: Re: [HarfBuzz] Question on converting UTF-8 codepoints to complex glyphs

On Thu, 11 Apr 2019 18:03:10 +0000 Paul Daughetee <[email protected]> wrote:

> டு [...]
> is the ligature formed by the codepoints corresponding to the glyphs ட
> and உ.

No! You already have been told by Jonathan Kew. டு is the codepoint sequence <U+0B9F TAMIL LETTER TTA, U+0BC1 TAMIL VOWEL SIGN U>; it is **not** the ligature of ட <U+0B9F TAMIL LETTER TTA> and உ <U+0B89 TAMIL LETTER U>. If you don't believe me, paste them into Word and use alt/X to convert the characters to their codepoints.

Richard.
