Re: [HarfBuzz] Question on converting UTF-8 codepoints to complex glyphs

Paul Daughetee Thu, 11 Apr 2019 11:03:46 -0700

I agree, the font question does seem to be irrelevant. I was just responding to 
Cody’s comments. However, I’m still lost on how to get the correct ligature back 
from the HarfBuzz shaping engine when I give it a simple Tamil word composed 
of the Tamil characters ii, tta and u. According to Google, this word, “ஈடு”, 
translates to the verb “compensate.” “ஈடு” consists of the two glyphs ஈ and டு, 
the latter of which is the ligature formed by the codepoints corresponding to 
the glyphs ட and உ.


So is it a question of enabling the correct font features? Is there something 
beyond the basic examples that’s required to get the shaper to return the 
ligature for the tta and u consonant-vowel combination? Do I have a basic 
misunderstanding of what HarfBuzz does?

Here’s a bit of the code I’m using. It’s derived from the example found in git 
here: tangrams/harfbuzz-example 
(https://github.com/tangrams/harfbuzz-example/tree/a267a0032aa429b2f86959a9f083c607c506bed7).

In that last loop in FDHBShaper, I understand that the glyph IDs are NOT 
Unicode code points but are the IDs assigned in the font. What I’m getting 
back (the output) are the same IDs that correspond to my input. Should I be 
getting two glyph IDs back (ஈ and டு) for the three I input (ஈ, ட and உ) to 
the shaper?

#pragma once

#include "hb.h"
#include "hb-ft.h"
#include <cassert>
#include <limits>
#include <string>
#include <vector>

using namespace std;

typedef struct {
       unsigned char* buffer;
       unsigned int width;
       unsigned int height;
       float bearing_x;
       float bearing_y;
} Glyph;

typedef struct {
       std::string data;
       std::string language;
       hb_script_t script;
       hb_direction_t direction;
       const char* c_data() { return data.c_str(); };
} HBText;

namespace HBFeature {
       const hb_tag_t KernTag = HB_TAG('k', 'e', 'r', 'n'); // kerning
       const hb_tag_t LigaTag = HB_TAG('l', 'i', 'g', 'a'); // standard ligatures
       const hb_tag_t CligTag = HB_TAG('c', 'l', 'i', 'g'); // contextual ligatures
       const hb_tag_t PstsTag = HB_TAG('p', 's', 't', 's'); // post-base substitutions

       static hb_feature_t LigatureOff = { LigaTag, 0, 0,
              std::numeric_limits<unsigned int>::max() };
       static hb_feature_t LigatureOn = { LigaTag, 1, 0,
              std::numeric_limits<unsigned int>::max() };
       static hb_feature_t KerningOff = { KernTag, 0, 0,
              std::numeric_limits<unsigned int>::max() };
       static hb_feature_t KerningOn = { KernTag, 1, 0,
              std::numeric_limits<unsigned int>::max() };
       static hb_feature_t CligOff = { CligTag, 0, 0,
              std::numeric_limits<unsigned int>::max() };
       static hb_feature_t CligOn = { CligTag, 1, 0,
              std::numeric_limits<unsigned int>::max() };
       static hb_feature_t PstsOff = { PstsTag, 0, 0,
              std::numeric_limits<unsigned int>::max() };
       static hb_feature_t PstsOn = { PstsTag, 1, 0,
              std::numeric_limits<unsigned int>::max() };
}

class FDHBShaper
{
public:
       FDHBShaper(const string& fontFile);
       virtual ~FDHBShaper();

       void init();
       void initText(HBText& text);
       void addFeature(hb_feature_t feature);

private:

       FT_Library lib;
       FT_Face* face;

       hb_font_t* font;
       hb_buffer_t* buffer;
       vector<hb_feature_t> features;
};

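FDHBShaper::FDHBShaper(const string& fontFile)
       : font(nullptr), buffer(nullptr) // safe to destroy even if init() is never called
{
       FT_Error error = FT_Init_FreeType(&lib);
       assert(!error);

       face = new FT_Face;

       error = FT_New_Face(lib, fontFile.c_str(), 0, face);
       assert(!error);

       // Set the character size (26.6 fixed point) at 72 dpi so that
       // hb_ft_font_create() picks up a usable scale from the face.
       const float size = 50;
       error = FT_Set_Char_Size(*face, 0, static_cast<FT_F26Dot6>(size * 64), 72, 72);
       assert(!error);
}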

FDHBShaper::~FDHBShaper()
{
       hb_buffer_destroy(buffer);
       hb_font_destroy(font);

       FT_Done_Face(*face);
       delete face;
       FT_Done_FreeType(lib); // release the library handle created in the constructor
}

void FDHBShaper::addFeature(hb_feature_t feature)
{
       features.push_back(feature);
}

void FDHBShaper::init()
{
       font = hb_ft_font_create(*face, NULL);
       buffer = hb_buffer_create();

       // Check the result instead of discarding it.
       assert(hb_buffer_allocation_successful(buffer));
}

void FDHBShaper::initText(HBText& text)
{
       hb_buffer_reset(buffer);

       hb_buffer_set_direction(buffer, text.direction);
       hb_buffer_set_script(buffer, text.script);
       hb_buffer_set_language(buffer,
              hb_language_from_string(text.language.c_str(),
                     static_cast<int>(text.language.size())));
       const int length = static_cast<int>(text.data.size());

       hb_buffer_add_utf8(buffer, text.c_data(), length, 0, length);

       hb_shape(font, buffer, features.empty() ? NULL : &features[0],
              static_cast<unsigned int>(features.size()));

       unsigned int glyphCount;
       hb_glyph_info_t *glyphInfo = hb_buffer_get_glyph_infos(buffer,
              &glyphCount);
       hb_glyph_position_t *glyphPos = hb_buffer_get_glyph_positions(buffer,
              &glyphCount);

       for (unsigned int i = 0; i < glyphCount; ++i)
       {
              // After hb_shape(), codepoint holds the font's glyph index,
              // not a Unicode code point.
              hb_codepoint_t glyphid = glyphInfo[i].codepoint;
       }
}
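
For anyone reading the feature definitions above: an OpenType feature tag is 
four ASCII characters packed into a 32-bit value, with the first character in 
the most significant byte. The sketch below reproduces that packing in plain 
C++ so the values can be inspected without HarfBuzz. make_tag is a 
hypothetical stand-in written for illustration; it is not part of the 
HarfBuzz API.

```cpp
#include <cstdint>

// Hypothetical helper (not a HarfBuzz function): pack four ASCII
// characters into a 32-bit tag, first character in the high byte.
constexpr std::uint32_t make_tag(char a, char b, char c, char d) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(d));
}
```

Under this packing, make_tag('p', 's', 't', 's') is 0x70737473 and 
make_tag('l', 'i', 'g', 'a') is 0x6C696761.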

From: Behdad Esfahbod <[email protected]>
Sent: April 11, 2019 8:58 AM
To: Bobby de Vos <[email protected]>
Cc: Paul Daughetee <[email protected]>; Cody Planteen 
<[email protected]>; [email protected]
Subject: Re: [HarfBuzz] Question on converting UTF-8 codepoints to complex 
glyphs

What you say seems irrelevant to me. Jonathan is correct.

On Thu, Apr 11, 2019, 11:34 AM Bobby de Vos <[email protected]> wrote:

Paul,

You don't need to convert the Google Tamil font to OpenType; Google has already 
done that at

https://github.com/googlei18n/noto-fonts/tree/master/phaseIII_only/unhinted/otf/NotoSansTamil

However, I don't think those fonts will solve your issue. The shapers in the 
list you mention are different technologies for specifying complex shaping 
(such as ligatures, positioning, subforms, half forms, etc.). Indeed, OpenType 
is one such technology. SIL Graphite and Apple Advanced Typography (AAT) are 
other technologies to do this.

TrueType fonts can contain OpenType shaping instructions. You do not have to 
have an OpenType font format to use OpenType shaping.

TrueType fonts have quadratic Bézier curves for their glyphs. Fonts in the 
OpenType font format can use the same quadratic Bézier curves, or cubic Bézier 
curves. The OTF files I mentioned above have cubic Bézier curves.

https://en.wikipedia.org/wiki/B%C3%A9zier_curve#Fonts
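
To make the distinction above concrete: the outline flavor of a font file can 
be read from the four-byte version tag at the start of the file, independent 
of whether the file carries OpenType layout tables. The sketch below 
classifies that tag using standard C++ only. SfntFlavor and sniff_sfnt_flavor 
are hypothetical names chosen for illustration, and the check says nothing 
about which shaping rules the font contains.

```cpp
#include <cstdint>

// Outline flavor as indicated by the first four bytes of a font file.
enum class SfntFlavor { TrueTypeOutlines, CffOutlines, Collection, Unknown };

// Hypothetical helper: classify the big-endian 32-bit version tag
// found at offset 0 of an sfnt-based font file.
SfntFlavor sniff_sfnt_flavor(const unsigned char head[4]) {
    const std::uint32_t tag = (static_cast<std::uint32_t>(head[0]) << 24) |
                              (static_cast<std::uint32_t>(head[1]) << 16) |
                              (static_cast<std::uint32_t>(head[2]) << 8) |
                               static_cast<std::uint32_t>(head[3]);
    switch (tag) {
        case 0x00010000u:  // version 1.0: TrueType (quadratic) outlines
        case 0x74727565u:  // 'true': Apple TrueType
            return SfntFlavor::TrueTypeOutlines;
        case 0x4F54544Fu:  // 'OTTO': CFF (cubic) outlines
            return SfntFlavor::CffOutlines;
        case 0x74746366u:  // 'ttcf': font collection
            return SfntFlavor::Collection;
        default:
            return SfntFlavor::Unknown;
    }
}
```

Either flavor can carry the layout tables that the "ot" shaper reads, so 
converting between them should not be needed just to get shaping.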

If I have misunderstood your situation, and/or made any errors in what I 
wrote, I apologize.

Bobby
On 2019-04-10 3:25 p.m., Paul Daughetee wrote:
Thanks for the quick response. I’m a licensed user of FontCreator Professional 
Edition from High-Logic and have the most recent update to version 11.5 
installed.  The correct ligature is displayed when I type the tta and u Tamil 
characters into the test string edit box in the OpenType Designer dialog. In 
the box just below the test string the two characters are displayed unless I 
check either the _shaper or psts feature check box. If one of those is checked, 
then the correct ligature is displayed. So I guess Google did get the Tamil 
font right but I cannot seem to get HarfBuzz to return a single glyph id when 
presented with a buffer containing the tta and u Tamil characters. I’ve tried 
adding various features when calling hb_shape but that doesn’t seem to change 
anything.

I noticed that when I list shapers using a call to hb_shape_list_shapers, the 
only shaper listed is “ot”. So I guess my next try will be to convert the 
Google Tamil TrueType font to an OpenType font and see if that makes any 
difference. If it does, I guess I’ll be having a “duh” moment.

From: Cody Planteen <[email protected]>
Sent: April 10, 2019 12:38 PM
To: Paul Daughetee <[email protected]>
Cc: [email protected]
Subject: Re: [HarfBuzz] Question on converting UTF-8 codepoints to complex 
glyphs

It's possible your font isn't doing what you think it should be. You can test 
this theory with the tool High-Logic FontCreator for Windows. I believe there 
is a free evaluation. You can open up your font, then go to Font -> OpenType 
Designer. In this dialog, you can enter your test string and see what glyphs 
come out.

https://www.high-logic.com/font-editor/fontcreator


On Wed, Apr 10, 2019 at 1:19 PM Paul Daughetee <[email protected]> wrote:
Let me give you a little more info. I just recently built and installed vcpkg 
and used it to install HarfBuzz on Windows 10. It installed version 2.3.1-3 of 
the static libraries for Windows x86. I linked my app to the HarfBuzz library 
and its dependencies. I added code to my app to capture single words that I 
could send to be processed by HarfBuzz as they were typed by the user. I 
installed Google’s NotoSansTamil TrueType font after verifying that it 
properly defined substitutions for the ligature that is formed by the Tamil 
consonant “tta” when paired with a vowel such as “u” or “i”. After processing a 
UTF-8 string containing the consonant “tta” and the vowel “u” [0xE0, 0xAE, 
0x9F, 0xE0, 0xAE, 0x89], the hb_glyph_info_t object I get back has two glyph 
indices, the same indices as the “tta” and “u” (17, 10), rather than the index 
for the “ttauvowelsign” (116) ligature I expected. My code is virtually 
identical to the examples found in the HarfBuzz wiki and to several examples 
found in git. Any help here would be greatly appreciated.
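
One thing that may be worth checking is which code points those six bytes 
actually encode. The sketch below is a minimal UTF-8 decoder in standard C++ 
(decode_utf8 and the two constants are hypothetical names for illustration, 
and the decoder does not validate overlong or otherwise malformed input). 
Applied to the bytes quoted above it yields U+0B9F (TAMIL LETTER TTA) followed 
by U+0B89, which Unicode names TAMIL LETTER U, the independent vowel. The 
dependent TAMIL VOWEL SIGN U is a different code point, U+0BC1, encoded as 
0xE0, 0xAF, 0x81. Whether the font's “ttauvowelsign” substitution is keyed to 
the vowel sign rather than the independent letter is an assumption that would 
need to be confirmed against the font.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Code points named in the Unicode Tamil block.
constexpr std::uint32_t kTamilLetterU    = 0x0B89;  // independent vowel
constexpr std::uint32_t kTamilVowelSignU = 0x0BC1;  // dependent vowel sign

// Hypothetical helper: decode UTF-8 bytes into Unicode code points.
// Minimal sketch: stops at the first invalid lead byte or truncated
// sequence and does not reject overlong encodings.
std::vector<std::uint32_t> decode_utf8(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)              { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }
        else break;                          // invalid lead byte
        if (i + extra >= bytes.size()) break;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            cp = (cp << 6) |
                 (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Decoding the input this way before handing it to the shaper makes it easy to 
see whether the second code point equals kTamilLetterU or kTamilVowelSignU.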

From: Behdad Esfahbod <[email protected]>
Sent: April 8, 2019 1:47 PM
To: Paul Daughetee <[email protected]>
Cc: [email protected]
Subject: Re: [HarfBuzz] Question on converting UTF-8 codepoints to complex 
glyphs

On Mon, Apr 8, 2019 at 4:12 PM Paul Daughetee <[email protected]> wrote:
I’m new to HarfBuzz and attempting to use it for converting a UTF-8 string that 
contains one or more sets of codepoints that should combine to form single 
complex glyphs to the correct string of glyphs. I’ve followed numerous examples 
and they all lead me to the point where I use hb_buffer_get_glyph_infos to get 
what I thought would be a hb_glyph_info object that contains the codepoints for 
the glyphs I seek. So my first question is as follows. Is that what I should be 
getting? I ask because I’m not getting what I would expect to get.

Yes.


I can’t even successfully get a complex glyph to represent the combination of 
the letter A and the grave accent. So if I’m just confused as to how or what 
HarfBuzz does, please help me find a better path. Thanks!

What do you get?  A + grave-accent only forms one glyph if the font was 
designed so.  It may very well be represented by two glyphs.

_______________________________________________
HarfBuzz mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/harfbuzz


--
behdad
http://behdad.org/
--
Bobby de Vos
[email protected]