[Groff] unicode support - questions

Bruno Haible Mon, 23 Jan 2006 12:09:28 -0800

Hi,

So far, I have a first draft of a patch that makes groff work with Unicode
fonts without having to first register thousands of characters. Before
submitting the patch slice after slice, may I have your opinion about four
questions?


  1) In nametoindex.cpp and troff/charinfo.h, the terms "ascii_char" and
     "ascii_code" are used for unibyte characters in the input encoding.
     As far as I understand,
       - values >= 128 are possible and valid,
       - when the "latin1" device or "cp1047" device or "latin2" device
         (found in some Linux distributions) is used, values >= 128
         denote characters of this encoding.
     So I would like to rename these to "single_char" and "single_char_code"
     respectively. Is that OK? Do you find "unibyte" a better term?

  2) When CP1047 is used, and commands like .trin \[char72]\[,c] are active,
     does the font::name_to_index API see the character name before or
     after the translation? I.e. does it see "char72" or ",c"?

  3) My current patch creates two subclasses 'enumerated_font' and
     'unicode_font' of 'class font'.

     An enumerated font has all its characters enumerated in the font file.
     A unicode font covers all combined Unicode characters (consisting of a
     base character and zero or more combining characters).

     The subclasses in the HTML and TTY backends inherit from 'unicode_font',
     whereas the others inherit from 'enumerated_font'.

     Is it imaginable that a driver/backend might want to use both kinds
     of font? In that case I would merge both classes back into 'class font'
     and use a boolean is_unicode flag to distinguish the cases. The code
     would become less pretty this way, but it would avoid a possible
     problem in some future drivers/backends.

  4) Currently the API of nametoindex.cpp has a different implementation
     at the end of troff/input.cpp. My current patch needs to go back from
     the index to the character name, and so an additional inverse table
     mapping index -> character name needs to be introduced. This takes up
     memory and causes extra memory references. I would be inclined to
     replace this "int index" with a pointer to an abstract class, say
     abstract_char, of which the 'class charinfo' (on the troff side) and
     'class backend_char' (for the backends) would be subclasses. This
     would not only consume less memory but also make the code more robust
     (as it is easier to misuse an 'int' accidentally). What do you think
     about this?
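
To make the two alternatives in question 3 concrete, here is a rough
sketch. It is not taken from the patch or from the groff sources: the
member names contains() and add(), the class name flagged_font, and the
helper is_unicode_scalar() are hypothetical, and a "Unicode character" is
simplified to a single code point (the combining-character case is left
out).

```cpp
#include <cassert>
#include <set>

// Hypothetical helper: true for valid Unicode scalar values.
static bool is_unicode_scalar(unsigned int code) {
  return code <= 0x10FFFF && !(code >= 0xD800 && code <= 0xDFFF);
}

// Alternative A: two subclasses of a common base class.
class font {
public:
  virtual ~font() {}
  // Does this font provide a glyph for the given code point?
  virtual bool contains(unsigned int code) const = 0;
};

class enumerated_font : public font {
  std::set<unsigned int> listed;  // characters listed in the font file
public:
  void add(unsigned int code) { listed.insert(code); }
  bool contains(unsigned int code) const override {
    return listed.count(code) != 0;
  }
};

class unicode_font : public font {
public:
  // Nothing is enumerated; every valid code point is covered.
  bool contains(unsigned int code) const override {
    return is_unicode_scalar(code);
  }
};

// Alternative B: a single class with a boolean is_unicode flag, so that
// one driver can hold both kinds of font behind the same type.
class flagged_font {
  bool is_unicode;
  std::set<unsigned int> listed;  // only used when !is_unicode
public:
  explicit flagged_font(bool unicode) : is_unicode(unicode) {}
  void add(unsigned int code) { listed.insert(code); }
  bool contains(unsigned int code) const {
    if (is_unicode)
      return is_unicode_scalar(code);
    return listed.count(code) != 0;
  }
};
```

In alternative A the kind of font is fixed by the type the backend
derives from; in alternative B every method that differs between the two
kinds needs a branch on the flag, which is the "less pretty" part.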
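
The idea in question 4 could look roughly like the sketch below. Only the
class names abstract_char, charinfo and backend_char come from the text
above; the name() method, the constructors, the use of std::string (groff
has its own string and symbol types) and the function describe() are
assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical common base class that would replace the "int index".
class abstract_char {
public:
  virtual ~abstract_char() {}
  // The character name is reachable directly from the object, so no
  // inverse table mapping index -> character name is needed.
  virtual const std::string &name() const = 0;
};

// troff side (the real 'class charinfo' carries much more state).
class charinfo : public abstract_char {
  std::string nm;
public:
  explicit charinfo(const std::string &n) : nm(n) {}
  const std::string &name() const override { return nm; }
};

// Backend side.
class backend_char : public abstract_char {
  std::string nm;
public:
  explicit backend_char(const std::string &n) : nm(n) {}
  const std::string &name() const override { return nm; }
};

// A consumer that previously would have taken an 'int index'. Passing an
// unrelated integer by accident is now a compile-time error.
std::string describe(const abstract_char *c) {
  return "glyph " + c->name();
}
```

With this shape, code that needs the name calls c->name() on the pointer
it already holds instead of looking the index up in a second table.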

Bruno


