Hi,

So far I have a first draft of a patch that makes groff work with Unicode fonts without first having to register thousands of characters. Before submitting the patch slice by slice, may I have your opinion on four questions?
1) In nametoindex.cpp and troff/charinfo.h, the terms "ascii_char" and "ascii_code" are used for unibyte characters in the input encoding. As far as I understand,
   - values >= 128 are possible and valid,
   - when the "latin1", "cp1047" or "latin2" device (the latter found in some Linux distributions) is used, values >= 128 denote characters of that encoding.
   So I would like to rename these to "single_char" and "single_char_code" respectively. Is that OK? Or would you find "unibyte" a better term?

2) When CP1047 is used and commands like .trin \[char72]\[,c] are active, does the font::name_to_index API see the character name before or after the translation? I.e., does it see "char72" or ",c"?

3) My current patch creates two subclasses of 'class font': 'enumerated_font' and 'unicode_font'. An enumerated font has all its characters enumerated in the font file. A unicode font covers all combined Unicode characters (each consisting of a base character and zero or more combining characters). The subclasses in the HTML and TTY backends inherit from 'unicode_font', whereas the others inherit from 'enumerated_font'. Is it imaginable that a driver/backend might want to use both kinds of font? In that case I would merge both classes back into 'class font' and use a boolean is_unicode flag to distinguish the two cases. The code would become less pretty this way, but it would avoid a possible problem with future drivers/backends.

4) Currently the API of nametoindex.cpp has a separate, different implementation at the end of troff/input.cpp. My current patch needs to go back from the index to the character name, so an additional inverse table mapping index -> character name has to be introduced. This takes up memory and causes extra memory references. I would be inclined to replace this "int index" with a pointer to an abstract class, say abstract_char, of which 'class charinfo' (on the troff side) and 'class backend_char' (for the backends) would be subclasses.
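To illustrate the idea in question 4, here is a minimal sketch, not the existing groff API. Only the class names abstract_char and backend_char come from the proposal; the name() method, the char_table class and its name_to_char() method are hypothetical, and modern C++ is used for brevity. Because each character object carries its own name, going from the "index" back to the name is a single virtual call rather than a lookup in a separate inverse table:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical common base for troff's charinfo and the backends'
// backend_char. Only the class name comes from the proposal; the
// interface is an assumption made for this sketch.
class abstract_char {
public:
  virtual ~abstract_char() = default;
  // The name is reachable from the object itself, so no inverse
  // table (index -> name) is needed.
  virtual const std::string &name() const = 0;
};

// Hypothetical backend-side subclass.
class backend_char : public abstract_char {
public:
  explicit backend_char(std::string n) : name_(std::move(n)) {}
  const std::string &name() const override { return name_; }
private:
  std::string name_;
};

// Hypothetical stand-in for font::name_to_index: instead of an int,
// it returns a pointer that stays valid for the table's lifetime.
class char_table {
public:
  abstract_char *name_to_char(const std::string &name) {
    auto it = table_.find(name);
    if (it == table_.end())
      // First lookup of this name: create its object on demand, so a
      // Unicode font need not register all characters in advance.
      it = table_.emplace(name, std::make_unique<backend_char>(name)).first;
    return it->second.get();
  }
private:
  std::unordered_map<std::string, std::unique_ptr<backend_char>> table_;
};
```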
This would not only consume less memory but also make the code more robust (as it is easier to misuse an 'int' accidentally). What do you think about this?

Bruno

_______________________________________________
Groff mailing list
Groff@gnu.org
http://lists.gnu.org/mailman/listinfo/groff