Re: [Groff] Character class support patch

Werner LEMBERG Sun, 06 Jan 2008 14:00:26 -0800

> > Additionally, we need support for handling Unicode ranges:
> >
> >    CJKpunct           u3000 - u303F;
>
> My intention was that any valid glyph name was to be valid as a
> class character, but name_to_glyph apparently doesn't handle
> Unicode characters.  Should I be using a different function, or
> should I extend name_to_glyph?


The range mechanism must rely not on glyph names (or indices), but on
Unicode values.  That is, all (predefined) groff entities must be
mapped to the equivalent Unicode value(s) as given in file
glyphuni.cpp.  Similarly, `uXXXX' entities have to be converted too.
For example,

  foo  A - u1000 ;

is the range U+0041 - U+1000.  For simplicity I think it's best to
disallow composite entities in ranges (but not as single values).
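
A minimal sketch of the endpoint conversion described above, under
stated assumptions: the two-entry table is only a hypothetical
stand-in for the data in glyphuni.cpp, and the names `unicode_range',
`lookup_glyph_name', `range_endpoint_to_unicode', `make_range', and
`in_range' are invented for illustration.  Composite entities such as
`u0041_0300' are rejected as range endpoints.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

struct unicode_range {
  int first;
  int last;
};

// Hypothetical stand-in for the table in glyphuni.cpp (two entries only).
static int lookup_glyph_name(const std::string &name) {
  static const std::map<std::string, int> table = {
    {"*S", 0x03A3},  // Greek capital letter sigma
    {"em", 0x2014},  // em dash
  };
  std::map<std::string, int>::const_iterator it = table.find(name);
  return it == table.end() ? -1 : it->second;
}

// Map a range endpoint to a Unicode value.  Returns -1 for unknown
// names and for composites, which are not allowed in ranges.
int range_endpoint_to_unicode(const std::string &name) {
  // A single ASCII character such as `A' stands for itself.
  if (name.size() == 1 && static_cast<unsigned char>(name[0]) < 0x80)
    return static_cast<unsigned char>(name[0]);
  // `uXXXX' entity: `u' followed by 4 to 6 uppercase hex digits.
  if (name.size() >= 5 && name.size() <= 7 && name[0] == 'u') {
    bool all_hex = true;
    for (std::size_t i = 1; i < name.size(); ++i) {
      char c = name[i];
      if (!((c >= '0' && c <= '9') || (c >= 'A' && c <= 'F')))
        all_hex = false;
    }
    if (all_hex)
      return static_cast<int>(std::strtol(name.c_str() + 1, 0, 16));
  }
  // Everything else must be a predefined glyph name.
  return lookup_glyph_name(name);
}

// Build a range from two endpoint names; false if either endpoint is
// invalid or the range is empty.
bool make_range(const std::string &first, const std::string &last,
                unicode_range &out) {
  int lo = range_endpoint_to_unicode(first);
  int hi = range_endpoint_to_unicode(last);
  if (lo < 0 || hi < 0 || lo > hi)
    return false;
  out.first = lo;
  out.last = hi;
  return true;
}

bool in_range(const unicode_range &r, int code) {
  assert(r.first <= r.last);  // invariant established by make_range
  return r.first <= code && code <= r.last;
}
```

With this sketch, make_range("A", "u1000", r) yields the range
U+0041 - U+1000, matching the `foo' example.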

For single values, I suggest mapping everything to decomposed Unicode
values (using the data in uniglyph.cpp and uniuni.cpp).  For example,

  bar 'A *S ;

should denote the entities u0041_0301 and u03A3.

To implement this, we probably need two structures: The first one
contains Unicode ranges, something like

   struct range {
     int first;
     int last;
   };

and the second one is an array of arrays of composite Unicode values:

  struct composite {
    int base;
    int components[];
  };

This finally gives

  struct font_class {
    range ranges[];
    composite composites[];
  };

For a given class, the lookup process first checks the `ranges' array,
then it walks over the `composites' array, and after something has
been found it is converted to a glyph index.  [The above code is just
for demonstration purposes; it is not meant for a real
implementation.]
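
The lookup order just described could be sketched as follows.  This is
only an illustration under assumptions: std::vector replaces the
open-ended arrays, a decomposed entity is represented as a sequence of
code points (so single values like u03A3 are one-element sequences),
and the names `char_class', `code_sequence', and `class_contains' are
hypothetical.  The final conversion to a glyph index depends on the
font and is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct unicode_range {
  int first;
  int last;
};

// A decomposed entity, e.g. {0x41, 0x301} for u0041_0301.
typedef std::vector<int> code_sequence;

struct char_class {
  std::vector<unicode_range> ranges;      // checked first
  std::vector<code_sequence> composites;  // walked second
};

// Return true if `entity' belongs to class `c'.
bool class_contains(const char_class &c, const code_sequence &entity) {
  assert(!entity.empty());
  // Ranges can only match non-composite entities.
  if (entity.size() == 1) {
    for (const unicode_range &r : c.ranges)
      if (r.first <= entity[0] && entity[0] <= r.last)
        return true;
  }
  // Otherwise look for an exact match among the listed single values.
  return std::find(c.composites.begin(), c.composites.end(), entity)
         != c.composites.end();
}
```

A linear walk is enough for a demonstration; a real implementation
would probably want sorted ranges or a hash for large classes.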

The reason for using Unicode values everywhere, rather than glyph
indices directly, is that the class mechanism can then be reused on
the input side, where it must not depend on fonts.  Example:

  .class ClassName          A B C D E
  .class EquivalentClass    A - E
  .class UppercaseAlphabet  @EquivalentClass \
                            F - Z
  .class MostEfficient      A - Z
  .class Identifier         - A - Z a - z
  .class EquivIdentifier    A - Z - a - z
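
One possible reading of the argument syntax in these examples, as a
hedged sketch: a `-' between two single characters forms a range, any
other `-' is a literal hyphen, and `@Name' pulls in a previously
defined class.  The function `expand_class' and the type `code_set'
are hypothetical; only single ASCII characters are handled, and
continuation lines are assumed to have been joined already.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

typedef std::set<int> code_set;  // expanded members; fine for a demo

// Expand the whitespace-separated arguments of a `.class' request into
// `out'.  `known' holds earlier classes for `@Name' references.
// Returns false on an unknown class or an unsupported token.
bool expand_class(const std::string &args,
                  const std::map<std::string, code_set> &known,
                  code_set &out) {
  std::istringstream in(args);
  std::vector<std::string> tok;
  for (std::string t; in >> t; )
    tok.push_back(t);

  for (std::size_t i = 0; i < tok.size(); ) {
    if (tok[i].size() > 1 && tok[i][0] == '@') {
      // Nested class reference.
      std::map<std::string, code_set>::const_iterator it =
        known.find(tok[i].substr(1));
      if (it == known.end())
        return false;
      out.insert(it->second.begin(), it->second.end());
      i += 1;
    } else if (tok[i].size() != 1) {
      return false;  // multi-character entities are not handled here
    } else if (i + 2 < tok.size() && tok[i + 1] == "-"
               && tok[i + 2].size() == 1) {
      // Range `X - Y'.
      int lo = static_cast<unsigned char>(tok[i][0]);
      int hi = static_cast<unsigned char>(tok[i + 2][0]);
      if (lo > hi)
        return false;
      for (int c = lo; c <= hi; ++c)
        out.insert(c);
      i += 3;
    } else {
      // Single character, including a literal `-'.
      out.insert(static_cast<unsigned char>(tok[i][0]));
      i += 1;
    }
    assert(i <= tok.size());  // each branch consumes only existing tokens
  }
  return true;
}
```

Under this reading, `Identifier' and `EquivIdentifier' expand to the
same set (the letters plus the hyphen), and `UppercaseAlphabet' equals
`MostEfficient'.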

Note that e.g. kinsoku shori is an input character property, not a
glyph property!  As such, it shouldn't be implemented in the font
definition files but in a start-up file of groff.

> Even though character classes are stored in font files, they are
> properties of the glyphs, not of the fonts.  In other words, all
> instances of the glyph 'A' will have the same attributes.  You
> probably want to put the same classes and attributes in every font
> file; otherwise, you will get different results based on the order
> in which fonts are loaded.

This is a bad limitation, I think.  Consider this:

  font 1:

    classes
      Alike  A :A 'A `A ;
    properties
      kern @Alike V -3 ;

  font 2:

    classes
      Alike  A 'A `A ;
    properties
      kern @Alike V -5 ;

In other words, classes within a font description file should be
local to this font's glyphs.  In contrast, classes on the input
side should indeed be valid for all characters.  Example:

  .classflags 2   @CJKprepunct
  .classflags 4   @CJKpostpunct
  .classflags 128 @CJK

Please comment.


    Werner
