[HarfBuzz] how to detect missing glyphs e.g. for font substitution

Louis Semprini Mon, 11 May 2015 00:56:49 -0700

What is the most reliable, font-independent way to detect whether a string 
shaped by hb_shape() has produced any missing glyphs, and to identify where 
those glyphs occur?


By "missing glyph" I mean a glyph that is not what the user intended for that 
code point in that context, whether that be a little tofu box, a magical hex 
box, a space glyph (with or without zero advance), a diamond, or anything else 
that has been substituted for the glyph that the user really wanted.

In particular, after calling hb_shape(),

- can we be guaranteed that an hb_glyph_info_t.codepoint (which is actually a 
glyph index despite the name) of 0 always means "missing glyph"?

- can we be guaranteed that hb_glyph_info_t.codepoint==0 is the only possible 
value that means "missing glyph" and that no glyph index values OTHER THAN 0 
also mean "missing glyph"?

If not, is there a better way to detect missing glyphs using the output of 
hb_shape(), or some other HarfBuzz call?

If the answer is "yes, except the following cases used with the 
following shapers," that might still be useful, so please elaborate.

Or, must HarfBuzz callers first do a complete, separate pass where they run all 
code points of the input through some kind of mapping routine that uses the 
fonts' 'cmap' and other tables?  That would be a shame because it would require 
callers to duplicate in their own code a vast amount of the complexity that 
HarfBuzz nicely hides.  It would also be inefficient in the average case, 
because most of the time no font substitution is needed.
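
For concreteness, the cheap post-shaping check asked about above could be as 
small as the sketch below.  It assumes, without confirmation, that glyph index 
0 is the one and only missing-glyph marker, which is exactly the guarantee in 
question.  The helper name find_missing_clusters is hypothetical and not part 
of HarfBuzz; in a real caller the two input arrays would be filled from the 
.codepoint and .cluster fields of the hb_glyph_info_t array returned by 
hb_buffer_get_glyph_infos() after hb_shape().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a HarfBuzz API).
 *
 * glyph_ids[i] : glyph index of output glyph i (hb_glyph_info_t.codepoint)
 * clusters[i]  : cluster value of output glyph i (hb_glyph_info_t.cluster)
 *
 * ASSUMPTION: glyph index 0 is the only value that means "missing glyph".
 *
 * Writes the cluster of each missing glyph into out[], up to out_cap
 * entries, and returns the total number of missing glyphs found (which may
 * exceed out_cap).  Pass out == NULL to only count.
 */
size_t find_missing_clusters(const uint32_t *glyph_ids,
                             const uint32_t *clusters,
                             size_t count,
                             uint32_t *out,
                             size_t out_cap)
{
    assert(count == 0 || (glyph_ids != NULL && clusters != NULL));

    size_t found = 0;
    for (size_t i = 0; i < count; i++) {
        if (glyph_ids[i] != 0)
            continue;                 /* glyph was mapped; nothing to report */
        if (out != NULL && found < out_cap)
            out[found] = clusters[i]; /* where in the input the gap occurred */
        found++;
    }
    return found;
}
```

A return value of 0 would be the fast, common path; anything else would tell 
the caller which clusters of the input need attention.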

What a HarfBuzz caller would or could do once it knows that a missing glyph 
exists depends entirely on the application.  If the set of possible input code 
points and the set of possible fonts were both unconstrained, the caller would 
need a full general-purpose font substitution scheme like the ones built into 
major OSes.  That is a massive project that may well depend on deep knowledge 
of OpenType tables, such as the Unicode range flags in the 'OS/2' table.  But 
there are plenty of other useful cases where a caller could use the 'missing 
glyph' information to apply a quick and effective fix.  For example, an app 
that displays data whose total set of code points is known (either static 
content, or dynamic content where the code points that need to be supported 
are limited by the market where the app is sold) can reliably choose, at code 
authoring time, a particular fallback font to use if the user's choice of font 
leads to missing glyphs.

For such situations it would be nice to hang the font substitution decision 
off a "there were missing glyphs" result from hb_shape().  Missing glyphs 
would be by far the rarer case, so the common case of OK glyphs would stay 
fast.  That is why I am asking whether there is any such way.

Thanks again all for your helpful answers in this forum.

_______________________________________________
HarfBuzz mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/harfbuzz
