Thanks, Richard, for the pointer. I wish I had seen Jonathan's post, but it 
never appeared in the digest I received from the list, nor was it sent to me 
directly. To be fair, the following is from the HarfBuzz tutorial on the 
"Why do I need a shaping engine?" page:  "For example, in Tamil, when the 
letter "TTA" (ட) letter is followed by "U" (உ), the pair must be replaced by 
the single glyph "டு". The sequence of Unicode characters "டஉ" needs to be 
substituted with a single "டு" glyph from the font." So maybe that needs an 
edit.

I changed my UTF-8 string to [0xE0, 0xAE, 0x88, 0xE0, 0xAE, 0x9F, 0xE0, 
0xAF, 0x81] (that is, U+0B88, U+0B9F, U+0BC1) and I finally got back the 
correct glyph identifiers. So thank you 
all for your responses. I'm sure I'll have more questions as this project 
evolves.
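
For anyone checking a byte sequence like the one above, here is a minimal 
sketch in Python using only the standard library. The helper name `describe` 
is made up for illustration; it is not part of HarfBuzz or of my project. It 
decodes the bytes and prints each codepoint with its Unicode name, which 
shows that the string uses the vowel sign U+0BC1 rather than the independent 
letter U+0B89.

```python
import unicodedata

# Byte values copied from the message above.
utf8_bytes = bytes([0xE0, 0xAE, 0x88, 0xE0, 0xAE, 0x9F, 0xE0, 0xAF, 0x81])

# Decode the UTF-8 bytes into a Python string of Unicode codepoints.
text = utf8_bytes.decode("utf-8")


def describe(s):
    """Return a (codepoint, Unicode name) pair for each character in s.

    Hypothetical helper, named here only for this example.
    """
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in s]


for codepoint, name in describe(text):
    print(codepoint, name)
# U+0B88 TAMIL LETTER II
# U+0B9F TAMIL LETTER TTA
# U+0BC1 TAMIL VOWEL SIGN U

# The two sequences discussed in this thread are different strings:
with_vowel_sign = "\u0b9f\u0bc1"  # TTA + VOWEL SIGN U
with_letter_u = "\u0b9f\u0b89"    # TTA + LETTER U
print(with_vowel_sign == with_letter_u)  # False
```

This only inspects the input text; shaping the result into glyph identifiers 
is still HarfBuzz's job.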

-----Original Message-----
From: Richard Wordingham <[email protected]> 
Sent: April 11, 2019 12:16 PM
To: [email protected]
Cc: Paul Daughetee <[email protected]>
Subject: Re: [HarfBuzz] Question on converting UTF-8 codepoints to complex 
glyphs

On Thu, 11 Apr 2019 18:03:10 +0000
Paul Daughetee <[email protected]> wrote:

>  டு  [...]
> is the ligature formed by the codepoints corresponding to the glyphs ட 
> and உ.

No!  You already have been told by Jonathan Kew.

டு is the codepoint sequence <U+0B9F TAMIL LETTER TTA, U+0BC1 TAMIL VOWEL SIGN 
U>; it is **not** the ligature of ட <U+0B9F TAMIL LETTER TTA> and உ <U+0B89 
TAMIL LETTER U>.  If you don't believe me, paste them into Word and use Alt+X 
to convert the characters to their codepoints.

Richard.
_______________________________________________
HarfBuzz mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/harfbuzz
