Hello there,

I am fairly new to HarfBuzz, but I had been using it for Arabic shaping without 
any trouble until recently.
Now I am seeing something odd with a very small number of Arabic strings (the 
vast majority of them cause no problem).

I use HB 1.0.1 on Ubuntu 16 with the regular Arial TrueType font from the 
msttcorefonts package. I also tried HB 2.0.2 on an embedded target and got the 
same result.

Consider the following UTF-16 (little-endian) string:
"\x8D\xFE" "\xDF\xFE" "\xB4\xFE" "\xE0\xFE" "\x8E\xFE" "\xE1\xFE" "\x20\x00" 
"\xCB\xFE" "\xE0\xFE" "\xF4\xFE" "\xDC\xFE" "\xE2\xFE"
or, equivalently, the following UTF-8:
"\xEF\xBA\x8D\xEF\xBB\x9F\xEF\xBA\xB4\xEF\xBB\xA0\xEF\xBA\x8E\xEF\xBB\xA1\x20\xEF\xBB\x8B\xEF\xBB\xA0\xEF\xBB\xB4\xEF\xBB\x9C\xEF\xBB\xA2\x00";
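Because the UTF-16 version is written as raw byte pairs, the test program below casts the char array to const uint16_t * before calling hb_buffer_add_utf16(). That only yields the intended code units on a little-endian host, and it assumes the array is suitably aligned for 16-bit access, which is not guaranteed on every embedded target. Here is a minimal sketch of a portable alternative, assuming the bytes are UTF-16LE. The helper name utf16le_bytes_to_units is hypothetical and is not a HarfBuzz function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: assemble UTF-16LE byte pairs into 16-bit code units.
 * Reading byte by byte gives the same result on little- and big-endian
 * hosts and places no alignment requirement on the input buffer. */
static size_t utf16le_bytes_to_units(const unsigned char *bytes, size_t n_bytes,
                                     uint16_t *units, size_t max_units)
{
    size_t n = 0;
    /* Each code unit is two bytes, low byte first; a trailing odd byte is ignored. */
    for (size_t i = 0; i + 1 < n_bytes && n < max_units; i += 2)
        units[n++] = (uint16_t)(bytes[i] | (bytes[i + 1] << 8));
    return n;
}
```

The resulting array can then be passed as hb_buffer_add_utf16(buffer, units, n, 0, n). Writing the text directly as a uint16_t array (0xFE8D, 0xFEDF, ...) avoids the conversion altogether.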

After shaping, hb_buffer_get_length() reports 11 glyphs for this string.
The strange thing is that some Arabic speakers have told me that, VISUALLY, 
there are still 12 glyphs. I can confirm this myself: if I paste the string into 
an online UTF-8/UTF-16 decoder, I can step through 12 characters.
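The 12-character count can also be checked locally instead of with an online decoder. The following minimal sketch decodes the UTF-8 string above into code points and flags those in the Arabic Presentation Forms blocks (U+FB50 to U+FDFF and U+FE70 to U+FEFF). The helper names are hypothetical, and the decoder is simplified: it assumes well-formed UTF-8 and does not validate it. For this string it yields 12 code points: 11 presentation-form letters and one space (U+0020).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode UTF-8 into code points.
 * Simplified sketch: assumes well-formed input (no check for overlong forms,
 * surrogates or stray continuation bytes); stops at a truncated sequence. */
static size_t utf8_to_codepoints(const unsigned char *s, size_t len,
                                 uint32_t *out, size_t max_out)
{
    size_t i = 0, n = 0;
    while (i < len && n < max_out) {
        unsigned char c = s[i];
        uint32_t cp;
        size_t extra;                 /* number of continuation bytes */
        if (c < 0x80)      { cp = c;        extra = 0; }
        else if (c < 0xE0) { cp = c & 0x1F; extra = 1; }
        else if (c < 0xF0) { cp = c & 0x0F; extra = 2; }
        else               { cp = c & 0x07; extra = 3; }
        if (i + extra >= len)
            break;                    /* truncated sequence */
        for (size_t k = 1; k <= extra; k++)
            cp = (cp << 6) | (uint32_t)(s[i + k] & 0x3F);
        out[n++] = cp;
        i += extra + 1;
    }
    return n;
}

/* Arabic Presentation Forms-A (U+FB50..U+FDFF) and -B (U+FE70..U+FEFF). */
static int is_arabic_presentation_form(uint32_t cp)
{
    return (cp >= 0xFB50 && cp <= 0xFDFF) || (cp >= 0xFE70 && cp <= 0xFEFF);
}
```

This counts input characters. By contrast, hb_buffer_get_length() called after hb_shape() counts output glyphs, so the two numbers do not have to match.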

Is there some implicit merging going on here, or is there some information I 
should retrieve somewhere to match what is displayed?
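hb_buffer_get_length() only says how many glyphs came out. It does not say which input characters each glyph covers. That information is in the cluster field of the hb_glyph_info_t array returned by hb_buffer_get_glyph_infos(). When the buffer is filled with hb_buffer_add_utf16() or hb_buffer_add_utf8(), each cluster value is an offset into the input in code units (16-bit units or bytes, respectively), and characters merged into a single glyph, such as a ligature, share one cluster value. The following minimal sketch flags input characters that do not start a cluster of their own. The helper name find_merged_chars is hypothetical, and the code does not call HarfBuzz itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper. clusters: cluster value of every output glyph, in any
 * order (an RTL run comes out in reversed, visual order). char_offsets: the
 * code-unit offset at which each input character starts. Sets merged[c] to 1
 * for every character whose offset is not the cluster value of any glyph,
 * i.e. a character folded into another character's cluster.
 * Returns how many characters were flagged. */
static size_t find_merged_chars(const uint32_t *clusters, size_t n_glyphs,
                                const uint32_t *char_offsets, size_t n_chars,
                                int *merged)
{
    size_t count = 0;
    for (size_t c = 0; c < n_chars; c++) {
        merged[c] = 1;
        for (size_t g = 0; g < n_glyphs; g++) {
            if (clusters[g] == char_offsets[c]) {
                merged[c] = 0;        /* this character starts a cluster */
                break;
            }
        }
        count += (size_t)merged[c];
    }
    return count;
}
```

To use it, copy info[i].cluster for every i below hb_buffer_get_length(buffer) into clusters after hb_shape(). In the UTF-16 test every character here is a single code unit, so char_offsets is simply 0, 1, ..., 11.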

I should also mention that I have played with a lot of HB options to configure 
shaping (hb_buffer_set_flags, hb_buffer_set_unicode_funcs(...get_default()), 
etc.), and I hope I have simply overlooked something important.

Cheers,
Laurent


Here is my test snippet:

/*----------------------------------------------------------------------------
*
* HarfBuzz Arabic shaping test
*
*----------------------------------------------------------------------------*/

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <string.h>

#include <harfbuzz/hb.h>
#include <harfbuzz/hb-ft.h>

#define ARIAL_TTF ("/usr/share/fonts/truetype/msttcorefonts/Arial.ttf")

#define UTF16_TEST


static const char utf8_content[] = 
"\xEF\xBA\x8D\xEF\xBB\x9F\xEF\xBA\xB4\xEF\xBB\xA0\xEF\xBA\x8E\xEF\xBB\xA1\x20\xEF\xBB\x8B\xEF\xBB\xA0\xEF\xBB\xB4\xEF\xBB\x9C\xEF\xBB\xA2\x00";

/* UTF-16LE byte pairs (low byte first), NUL-terminated */
static const char utf16le_content[] = "\x8D\xFE" "\xDF\xFE" "\xB4\xFE" 
"\xE0\xFE" "\x8E\xFE" "\xE1\xFE" "\x20\x00" "\xCB\xFE" "\xE0\xFE" "\xF4\xFE" 
"\xDC\xFE" "\xE2\xFE" "\x00\x00";

int main( int argc, char** argv )
{
/*data*/
    hb_font_t*      font;
    hb_buffer_t*    buffer;
    hb_script_t     script;
    FT_Library      flib;
    FT_Face         face;
    int             found;
    int             ret;


/*code*/
    ret     = -1;
    font    = NULL;
    buffer  = NULL;
    found   = 0;
    script  = HB_SCRIPT_INVALID;

    if( FT_Init_FreeType(&flib) )
    {   printf("unable to initialize freetype library\n");
        goto main_exit;
    }

    if( FT_New_Face(flib, ARIAL_TTF, 0, &face) )
    {   printf("cannot create face\n");
        goto main_exit;
    }

    font = hb_ft_font_create(face, NULL);
    if( !font )
    {   printf("unable to create font\n");
        goto main_exit;
    }

    buffer = hb_buffer_create();
    if( !buffer )
    {   printf("unable to create buffer\n");
        goto main_exit;
    }

    // Assign text segment to buffer and examine its properties
#ifdef UTF16_TEST
    // The cast assumes a little-endian host and a 2-byte aligned array
    hb_buffer_add_utf16(buffer, (const uint16_t*)utf16le_content, 12, 0, 12);
#else
    hb_buffer_add_utf8(buffer, utf8_content, -1, 0, -1);
#endif
    hb_buffer_guess_segment_properties(buffer);

    // Get script type of text
    script = hb_buffer_get_script(buffer);   // not checked here, but Arabic script IS detected

    hb_buffer_set_direction(buffer, HB_DIRECTION_RTL);
    hb_buffer_set_language(buffer, hb_language_from_string("ar", -1));

    hb_shape(font, buffer, NULL, 0);
    printf("SHAPED !\n");


    printf("got %u glyphs as a result\n", hb_buffer_get_length(buffer) );

    ret = 0;

main_exit:
  //test only, free another day
    exit(ret);
}
_______________________________________________
HarfBuzz mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/harfbuzz