On Wed, Oct 31, 2018 at 11:28:11AM +0000, Laurent CRUAU wrote:
> Hello there,
>
> I am pretty new to harfbuzz but anyway I had not been into trouble for long
> using arabic shaping until recently.
> And now I am submitted something weird with very few Arabic strings (the vast
> majority of them do not cause any problem).
>
> I use HB v1.0.1 on Ubuntu 16, using the regular ArialTTF mscorefont. I also
> tried HB v2.0.2. on an embedded target and got the same issue.
>
> Consider the following utf16 string:
> "\x8D\xFE" "\xDF\xFE" "\xB4\xFE" "\xE0\xFE" "\x8E\xFE" "\xE1\xFE" "\x20\x00"
> "\xCB\xFE" "\xE0\xFE" "\xF4\xFE" "\xDC\xFE" "\xE2\xE"
> Or the following UTF8:
> "\xEF\xBA\x8D\xEF\xBB\x9F\xEF\xBA\xB4\xEF\xBB\xA0\xEF\xBA\x8E\xEF\xBB\xA1\x20\xEF\xBB\x8B\xEF\xBB\xA0\xEF\xBB\xB4\xEF\xBB\x9C\xEF\xBB\xA2\x00";

How did you get the string? It uses Arabic Presentation Forms, and though it
is technically valid Unicode text, that is not usually the kind of input
HarfBuzz should be taking.

> After shaping has been performed, the following string is counted for 11
> glyphs (i.e. w/ hb_buffer_len).

The number of output glyphs does not have to be the same as the number of
input characters. If there are ligatures, the number of glyphs can be
smaller, and if there are any decompositions, it can be larger. In general,
your code should not make any assumptions about the number of glyphs based
on the number of input characters. To match output glyphs with input
characters, you should use the cluster field of the glyph info
(hb_glyph_info_t).

Regards,
Khaled
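
To illustrate the point about presentation forms: one possible way to turn
the string from the question into ordinary Arabic letters before handing it
to a shaper is Unicode NFKC normalization. The sketch below uses only
Python's standard unicodedata module. The helper name to_nominal_arabic is
made up for illustration, and NFKC also rewrites other compatibility
characters, so whether it fits a given text pipeline is an assumption to
check.

```python
import unicodedata

# UTF-8 bytes quoted in the question, without the trailing NUL terminator.
RAW_UTF8 = (
    b"\xEF\xBA\x8D\xEF\xBB\x9F\xEF\xBA\xB4\xEF\xBB\xA0\xEF\xBA\x8E\xEF\xBB\xA1"
    b"\x20"
    b"\xEF\xBB\x8B\xEF\xBB\xA0\xEF\xBB\xB4\xEF\xBB\x9C\xEF\xBB\xA2"
)


def to_nominal_arabic(text: str) -> str:
    """Hypothetical helper: replace Arabic presentation forms
    (U+FB50..U+FDFF, U+FE70..U+FEFF) with their nominal letters by
    applying NFKC. Note that NFKC affects every compatibility character
    in the string, not only Arabic ones."""
    return unicodedata.normalize("NFKC", text)


presentation = RAW_UTF8.decode("utf-8")
nominal = to_nominal_arabic(presentation)

# For this particular string every presentation form maps to exactly one
# nominal letter, so the two strings can be compared position by position.
for before, after in zip(presentation, nominal):
    print(f"U+{ord(before):04X} -> U+{ord(after):04X} "
          f"{unicodedata.name(after, '?')}")
```

With nominal letters as input, the shaper picks the contextual forms itself.
The glyph count may still differ from the character count, so the cluster
field remains the way to map glyphs back to characters.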
