Hello there,
I am pretty new to HarfBuzz, but until recently I had not run into any trouble
using Arabic shaping.
Now I am facing something weird with a very few Arabic strings (the vast
majority of them do not cause any problem).
I use HB v1.0.1 on Ubuntu 16, with the regular Arial.ttf from msttcorefonts. I
also tried HB v2.0.2 on an embedded target and got the same issue.
Consider the following UTF-16 (little-endian) string:
"\x8D\xFE" "\xDF\xFE" "\xB4\xFE" "\xE0\xFE" "\x8E\xFE" "\xE1\xFE" "\x20\x00"
"\xCB\xFE" "\xE0\xFE" "\xF4\xFE" "\xDC\xFE" "\xE2\xFE"
Or the same text in UTF-8:
"\xEF\xBA\x8D\xEF\xBB\x9F\xEF\xBA\xB4\xEF\xBB\xA0\xEF\xBA\x8E\xEF\xBB\xA1\x20\xEF\xBB\x8B\xEF\xBB\xA0\xEF\xBB\xB4\xEF\xBB\x9C\xEF\xBB\xA2\x00";
After shaping has been performed, the buffer for this string reports 11 glyphs
(i.e. with hb_buffer_get_length).
The strange thing is that some Arabic speakers have told me that VISUALLY we
still have 12 glyphs. I can confirm this myself if I paste the string into an
online UTF-8/UTF-16 decoder: I can move through 12 characters...
Is there some implicit fusion at play here, or some information I should grab
somewhere to match the visuals?
I should mention that I have played with a lot of HB options to configure
shaping (hb_buffer_set_flags, hb_buffer_set_unicode_funcs(...get_default()),
etc.), and I hope I have simply missed something important.
Cheers,
Laurent
Here is my test snippet:
/*----------------------------------------------------------------------------
*
* HarfBuzz arabic shaping text
*
*----------------------------------------------------------------------------*/
#include <stdio.h>
#include <stdlib.h> /* exit() */
#include <string.h>
#include <wchar.h>
#include <harfbuzz/hb.h>
#include <harfbuzz/hb-ft.h>
#define ARIAL_TTF ("/usr/share/fonts/truetype/msttcorefonts/Arial.ttf")
#define UTF16_TEST
static const char utf8_content[] =
"\xEF\xBA\x8D\xEF\xBB\x9F\xEF\xBA\xB4\xEF\xBB\xA0\xEF\xBA\x8E\xEF\xBB\xA1\x20\xEF\xBB\x8B\xEF\xBB\xA0\xEF\xBB\xB4\xEF\xBB\x9C\xEF\xBB\xA2\x00";
/* 12 UTF-16LE code units; the last one is U+FEE2, matching the UTF-8 string */
static const char utf16le_content[] = "\x8D\xFE" "\xDF\xFE" "\xB4\xFE"
"\xE0\xFE" "\x8E\xFE" "\xE1\xFE" "\x20\x00" "\xCB\xFE" "\xE0\xFE" "\xF4\xFE"
"\xDC\xFE" "\xE2\xFE" "\x0\x0";
int main( int argc, char** argv )
{
    /*data*/
    hb_font_t* font;
    hb_buffer_t* buffer;
    hb_script_t script;
    FT_Library flib;
    FT_Face face;
    int found;
    int ret;

    /*code*/
    ret = -1;
    font = NULL;
    buffer = NULL;
    found = 0;
    script = HB_SCRIPT_INVALID;

    if( FT_Init_FreeType(&flib) )
    {   printf("unable to initialize freetype library\n");
        goto main_exit;
    }
    if( FT_New_Face(flib, ARIAL_TTF, 0, &face) )
    {   printf("cannot create face\n");
        goto main_exit;
    }
    font = hb_ft_font_create(face, NULL);
    if( !font )
    {   printf("unable to create font\n");
        goto main_exit;
    }
    buffer = hb_buffer_create();
    if( !buffer )
    {   printf("unable to create buffer\n");
        goto main_exit;
    }

    // Assign text segment to buffer and examine its properties
#ifdef UTF16_TEST
    // 12 UTF-16 code units in total, all 12 added starting at offset 0
    hb_buffer_add_utf16(buffer, (const uint16_t*)utf16le_content, 12, 0, 12);
#else
    hb_buffer_add_utf8(buffer, utf8_content, -1, 0, -1);
#endif
    hb_buffer_guess_segment_properties(buffer);

    // Get script type of text
    script = hb_buffer_get_script(buffer); // Not checked here, but Arabic script IS detected
    hb_buffer_set_direction(buffer, HB_DIRECTION_RTL);
    hb_buffer_set_language(buffer, hb_language_from_string("ar", -1));
    hb_shape(font, buffer, NULL, 0);
    printf("SHAPED !\n");
    // hb_buffer_get_length() returns an unsigned int
    printf("got %u characters as a result\n", hb_buffer_get_length(buffer) );
    ret = 0;

main_exit:
    //test only, free another day
    exit(ret);
}