Hi Andries,

Thanks for the details.

> (2) You say: `The goal is that "groff -T... -mandoc" on any man page works,
> without need to specify the encoding as an argument to groff'.
>
> (2A) This will work in simple cases, where input encoding and output
> encoding and system character set are equal.
> ...
> /usr/bin/groff -Tnippon -mandocj

The input encoding and the output encoding are often different, for
example when a user in a ja_JP.UTF-8 locale views a man page in EUC-JP
encoding. In this case the output device is -Tutf8. The problem with
"-Tnippon" is that coping with EUC-JP input then requires one particular
output device.

> (3A) man.conf contains the default invocation, like
> /usr/bin/nroff -Tlatin1 -mandoc

This is bad: the encoding of the output should be determined by the
user's current locale, not hardcoded in a configuration file. Get rid
of this line in man.conf!

> (2B) Maybe this does not have to work - the requirement is that "man ls"
> works, not that "groff [options] ls.1" works.

No, the goal really is that "groff [options] ls.1" works. When a
translator or man page author wants to view a man page, s/he should be
able to do so without installing the file in particular directories.

> (3C) The iconv hack mentioned earlier today used a charset file
> in the directory to indicate the character set of all man pages in that
> directory.

That's bad, because the meaning of the man page then depends on which
directory it sits in. "groff [options] ls.1" needs to work without
referring to other files in the same directory.

> (4) Yes, character set information in a man page would be desirable.
> But it is bad to require it.

Why? HTML requires it. XML requires it. We require it in PO files, and
there it's a life saver. Emacs requires it in many files in order to
display them correctly.

> Putting the info on the first line of the file is a bad idea.
> Many things want to be on the first line.
> (The .so directive, the 't and 'e directives, etc.)

When there's a .so directive, you don't need to specify the encoding.
When there are 't and 'e directives, the -*- coding -*- comment can
come after them, without disturbing groff's determination of the
preprocessors to be run.

> (-) In short: the system-wide convention (you would choose UTF-8
> but I know people who would choose KOI-8) we have already, it is (3A).

Sorry, this needs to go away. Hardcoding output encodings in a
configuration file is a no-no.

> The man program (and/or groff) can react to the user's locale settings.

Yes, that's the way to go.

> Since almost all translations are produced by national translation teams
> working via the Montreal translation robot, the rules are rather uniform,
> and it will not be very difficult to introduce new rules.

Thanks, then let's go for the proposed

  .\" t -*- coding: EUC-JP -*-

syntax.

Bruno

_______________________________________________
Groff mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/groff
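For illustration, here is a minimal Python sketch of the division of labour argued for above: the input encoding is read from an Emacs-style "-*- coding: ... -*-" comment on the man page's first line, and the output encoding comes from the user's locale rather than from man.conf. This is not how groff or any man implementation is known to do it. The function names declared_coding and recode_for_locale, the first-line-only search, and the Latin-1 fallback for pages without a declaration are all hypothetical choices made for the example.

```python
import locale
import re

# Emacs-style file-variable comment, e.g. '.\" t -*- coding: EUC-JP -*-'.
# Looking only for the "coding:" variable is a simplifying assumption.
CODING_RE = re.compile(rb"-\*-.*?coding:\s*([A-Za-z0-9_.-]+).*?-\*-")


def declared_coding(data: bytes):
    """Return the encoding named on the first line of a man page, or None."""
    first_line = data.split(b"\n", 1)[0]
    match = CODING_RE.search(first_line)
    return match.group(1).decode("ascii") if match else None


def recode_for_locale(data: bytes, fallback="latin-1", target=None) -> bytes:
    """Convert man page source from its declared encoding to the output encoding.

    The output encoding is taken from the user's locale (not from a
    configuration file) unless `target` is given explicitly.
    """
    # Hypothetical policy: undeclared pages are assumed to be Latin-1.
    source = declared_coding(data) or fallback
    if target is None:
        target = locale.getpreferredencoding(False)
    text = data.decode(source)
    # Characters the locale's charset cannot represent become "?".
    return text.encode(target, errors="replace")
```

With this split, the EUC-JP page from the ja_JP.UTF-8 example is decoded as EUC-JP because the page says so, and re-encoded as UTF-8 because the user's locale says so; neither fact has to be hardcoded. A real implementation would more likely hand the conversion to iconv(3); Python's codecs are used here only to keep the sketch self-contained.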
