> > Additionally, we need support for handling Unicode ranges:
> >
> >   CJKpunct u3000 - u303F;
>
> My intention was that any valid glyph name was to be valid as a
> class character, but name_to_glyph apparently doesn't handle
> Unicode characters. Should I be using a different function, or
> should I extend name_to_glyph?

The range mechanism must rely not on glyph names (or indices), but
on Unicode values. That is, all (predefined) groff entities must be
mapped to the equivalent Unicode value(s) as given in file
glyphuni.cpp. Similarly, `uXXXX' entities have to be converted too.
For example,

  foo A - u1000 ;

is the range U+0041 - U+1000. For simplicity I think it's best to
disallow composite entities in ranges (but not as single values).

For single values, I suggest mapping everything to decomposed
Unicode values (using the data in uniglyph.cpp and uniuni.cpp). For
example,

  bar 'A *S ;

should denote the entities u0041_0301 and u03A3.

To implement this, we probably need two structures. The first one
contains Unicode ranges, something like

  struct range {
    int first;
    int last;
  }

and the second one is an array of arrays of composite Unicode values:

  struct composite {
    int base;
    int components[];
  }

This finally gives

  struct font_class {
    range ranges[];
    composite composites[];
  }

For a given class, the lookup process first checks the `ranges'
array, then it walks over the `composites' array, and after
something has been found it is converted to a glyph index.

[The above code is just for demonstration purposes; it is not meant
as a real implementation.]

The reason for using Unicode values everywhere instead of glyph
indices directly is that the class mechanism can then be reused on
the input side, where it should not depend on fonts. Example:

  .class ClassName A B C D E
  .class EquivalentClass A - E
  .class UppercaseAlphabet @EquivalentClass \
                           F - Z
  .class MostEfficient A - Z
  .class Identifier - A - Z a - z
  .class EquivIdentifier A - Z - a - z

Note that e.g. kinsoku shori is an input character property, not a
glyph property! As such, it shouldn't be implemented in the font
definition files but in a start-up file of groff.

> Even though character classes are stored in font files, they are
> properties of the glyphs, not of the fonts. In other words, all
> instances of the glyph 'A' will have the same attributes.
> You probably want to put the same classes and attributes in every
> font file; otherwise, you will get different results based on the
> order in which fonts are loaded.

This is a bad limitation, I think. Consider this:

  font 1:

    classes
    Alike A :A 'A `A ;

    properties
    kern @Alike V -3 ;

  font 2:

    classes
    Alike A 'A `A ;

    properties
    kern @Alike V -5 ;

In other words, classes within a font description file should be
local to this font's glyphs.

In contrast, classes on the input side should indeed be valid for
all characters. Example:

  .classflags 2 @CJKprepunct
  .classflags 4 @CJKpostpunct
  .classflags 128 @CJK

Please comment.


    Werner
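

PS: The lookup described above (check the `ranges' array first, then
walk the `composites' array) could be sketched roughly as follows.
This is only an illustration under assumptions, not groff code: the
names UnicodeRange, Composite, FontClass, add_range, add_single,
contains, make_foo, and make_bar are all hypothetical, std::vector
stands in for the open-ended arrays, and the final conversion to a
glyph index is left out.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; none of these names exist in groff.
struct UnicodeRange {
  int first;  // inclusive lower bound, e.g. 0x3000
  int last;   // inclusive upper bound, e.g. 0x303F
};

struct Composite {
  int base;                     // base character, e.g. 0x0041 for `A'
  std::vector<int> components;  // combining characters; empty for a plain value
};

struct FontClass {
  std::vector<UnicodeRange> ranges;
  std::vector<Composite> composites;

  // Ranges hold plain code points only (no composites), as proposed.
  void add_range(int first, int last) {
    assert(first <= last);
    ranges.push_back({first, last});
  }

  // A single value is stored in decomposed form: base plus components.
  void add_single(int base, std::vector<int> components = {}) {
    composites.push_back({base, std::move(components)});
  }

  // Membership test: ranges first, then the list of single values.
  // A composite (non-empty `components') can never match a range.
  bool contains(int base, const std::vector<int> &components = {}) const {
    if (components.empty()) {
      for (const UnicodeRange &r : ranges)
        if (r.first <= base && base <= r.last)
          return true;
    }
    for (const Composite &c : composites)
      if (c.base == base && c.components == components)
        return true;
    return false;
  }
};

// foo A - u1000 ;
FontClass make_foo() {
  FontClass c;
  c.add_range(0x0041, 0x1000);
  return c;
}

// bar 'A *S ;   ('A taken as A plus combining acute accent, U+0301)
FontClass make_bar() {
  FontClass c;
  c.add_single(0x0041, {0x0301});
  c.add_single(0x03A3);
  return c;
}
```

With these definitions, make_foo().contains(0x0041) is true, while
make_bar().contains(0x0041) is false, because class `bar' only holds
the accented A and the Sigma. A real implementation would presumably
keep the ranges sorted so that the first step can use binary search.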