> > 1. If the input encoding has been explicitly specified, use it.
> >
> > 2. Otherwise, check whether the input starts with a Byte Order
> >    Mark.  If found, use it.
> > 3. Finally, check whether there is a known coding tag in either
> >    the first or second input line.  If found, use it.
> > 4. If everything fails, use a default encoding as given by the
> >    current locale, or `latin1' if the locale is set to `C',
> >    `POSIX', or empty.
>
> I'm willing to try to implement this protocol for doclifter, but it
> doesn't settle what the portability rule ought to be, which is our
> concern right at the moment.  What encoding(s) are we willing to
> count on third-party viewers to support?

Today, the input encoding of choice is UTF-8, of course.  Besides
this, the preconv program supports latin1 (because this is the
`native' encoding for groff, more or less).  Have a look at the
`emacs_to_mime' structure in src/preproc/preconv/preconv.cpp: the
comment explains which input encodings are worth supporting today
with tags.  Note that everything is piped through iconv to produce
the form ASCII + `uXXXX' glyph entities.

> > Instead of using the groff's `uXXXX' glyphs, doclifter would
> > directly map to HTML entities.
>
> There may be a misunderstanding here -- doclifter never generates
> HTML entities.  Instead it generates ISO XML entities.  These sets
> do overlap, but they are neither formally nor actually
> identical.  The HTML set is much smaller.

My mistake.  Anyway, I think XML also knows Unicode character
entities, right?  This is what I meant.

> In fact, *all* defined groff-1.19 glyphs except the old Bell Labs
> bracket-pile graphics get mapped to ISO entities -- even the exotica
> like yogh and o-with-ogonek.

o-with-ogonek isn't an exotic letter at all!  All Poles will object
to your assertion :-)

> > Well, I won't change groffer.man -- this is his contribution.
>
> Uh oh.  You just invoked my hacker-anthropologist mode...I've seen
> this kind of talk before and the results tend to be *bad*.  [...]

Whatever decision we reach, I won't force anything right now.  Maybe
later.  So I'm not evading a decision, just postponing it.
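
Coming back to the four-step detection protocol quoted at the top:
here is a minimal Python sketch of how it might look.  This is not
how preconv does it.  The names `detect_encoding', `explicit', and
`locale_name' are made up for illustration, Python's codec registry
stands in for preconv's `emacs_to_mime' table, and falling back to
latin1 for a locale without a codeset is my own assumption.

```python
import codecs
import os
import re

# Byte order marks; UTF-32 comes first because the UTF-32-LE mark
# starts with the same two bytes as the UTF-16-LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Emacs-style coding tag, e.g.  .\" -*- coding: latin-1 -*-
CODING_TAG = re.compile(rb"coding:\s*([-\w.]+)")


def locale_default(locale_name):
    """Step 4: codeset of the locale, or latin1 for C/POSIX/empty."""
    if locale_name in ("", "C", "POSIX"):
        return "latin1"
    # "de_DE.UTF-8@euro" -> "UTF-8"
    _, dot, codeset = locale_name.partition(".")
    codeset = codeset.partition("@")[0]
    # Assumption: a locale without a codeset also falls back to latin1.
    return codeset if dot and codeset else "latin1"


def detect_encoding(data, explicit=None, locale_name=None):
    """Guess the encoding of `data` (bytes) following the four steps."""
    # Step 1: an explicitly specified encoding wins.
    if explicit:
        return explicit
    # Step 2: look for a Byte Order Mark at the start of the input.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # Step 3: look for a known coding tag in the first two lines.
    for line in data.split(b"\n", 2)[:2]:
        match = CODING_TAG.search(line)
        if match:
            try:
                return codecs.lookup(match.group(1).decode("ascii")).name
            except LookupError:
                pass  # unknown tag: keep looking, then fall through
    # Step 4: fall back to the locale (usual LC_ALL/LC_CTYPE/LANG order).
    if locale_name is None:
        locale_name = (os.environ.get("LC_ALL")
                       or os.environ.get("LC_CTYPE")
                       or os.environ.get("LANG")
                       or "")
    return locale_default(locale_name)
```

For example, `detect_encoding(data, locale_name="C")' on untagged
input without a BOM ends up at latin1, as in step 4.  A real
implementation would need a proper mapping from Emacs coding names
to iconv names, which the Python registry only approximates.
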

> It's possible that "no change" is the right answer, but because it's
> "his contribution" is not a sufficient reason.  As the project lead,
> you have the responsibility to make a decision on factual and
> technical grounds.  [...]

Hmm.  To exaggerate, the only `technical ground' currently is that
doclifter can't handle it.  Up to now nobody has ever reported
problems with groffer.1 -- while I understand your arguments, I don't
see an urgent need to react immediately.

> > It seems that grohtml does a quite decent job for this man page:
> > What about putting it into an exception list (even if it is the
> > only member) so that it is converted with `groff -Thtml' instead
> > of doclifter?
>
> Werner, in situations like this, exception lists frighten the shit
> out of me.

:-) Nice phrase.

> The problem is that once it is known that you have one, people
> invent all sorts of clever, plausible reasons they should be on it
> rather than doing the bit of extra work needed for a clean solution.
> [... omitting shameless exaggerations ...]

According to your analysis, groffer.1 is basically the only candidate
which is not going to be fixed easily -- for whatever reasons.  Not
bad to have just one single exception out of 10000...  It's not
necessary to tell anyone that an exception list exists :-)

> And even for pages that can't be strictly viewer-portable, simplifying
> them to the point where doclifter can lift them will have benefits.

Uh, oh, I'm not comfortable with `simplifying until doclifter can
handle it'.  It's still us who are setting up the rules, not a
program.  How many MS Word users do the craziest things just to make
this wacky program handle their documents...  Let's not argue like
that.

> It's interesting that you picked groff_char.man as an example,
> because I can tell you this: there is no reason in the universe we
> should be unable to generate good XML-DocBook from that page.

Indeed, there's nothing special in it except a large bunch of glyphs
which can't be displayed on all output devices.  However, to access
them properly, I need groff extensions not available in AT&T troff.

> > Ideally, they should use groff for formatting (opening a TTY
> > window showing `man' output would be sufficient IMHO) if the
> > number of problems exceeds a certain threshold.
>
> And that's an excellent idea for a general fallback.

groffer.1 comes to my mind :-)


    Werner

_______________________________________________
Groff mailing list
Groff@gnu.org
http://lists.gnu.org/mailman/listinfo/groff