Werner LEMBERG <[EMAIL PROTECTED]>:
> > Werner Lemberg wanted to know the status of \~. I found 17 uses
> > within the groff documentation and 4 outside it. Of those 4, two
> > were errors. So it's not much needed for manual pages, which is a
> > good thing as it is not portable. In particular, I was unable to
> > discover any corresponding ISO entity or Unicode character.
>
> Both `\<SP>' and `\~' (and \0) are equivalent to

Good. That's the behavior you're already getting from doclifter
conversion, so I guess we can close this issue.

> > I think we can declare Latin-1 and the intersection of groff glyphs
> > with HTML entities portable as well, [...]
>
> I think this is something beyond us. Restricting man pages to latin-1
> encoding is bad.

Right. Gunnar had already mostly persuaded me I was mistaken about
this; I was waiting on (and expecting) your confirmation that I had
misstepped. You guys understand i18n much better than me, so I will
try to do as you and he direct.

> Instead, I suggest the route which is outlined in
> preconv.man (part of the CVS).
>
>   1. If the input encoding has been explicitly specified, use it.
>
>   2. Otherwise, check whether the input starts with a Byte Order Mark.
>      If found, use it.
>
>   3. Finally, check whether there is a known coding tag in either the
>      first or second input line. If found, use it.
>
>   4. If everything fails, use a default encoding as given by the
>      current locale, or `latin1' if the locale is set to `C', `POSIX',
>      or empty.

I'm willing to try to implement this protocol for doclifter, but it
doesn't settle what the portability rule ought to be, which is our
concern right at the moment. What encoding(s) are we willing to count
on third-party viewers to support?

Gunnar seems to think UTF-8 is the right direction. I could go with
that; doclifter happens to be written in Python, which has good UTF-8
support so implementing the right things shouldn't be too hard.

> Instead of using the groff's `uXXXX' glyphs, doclifter would directly
> map to HTML entities.

There may be a misunderstanding here -- doclifter never generates HTML
entities. Instead it generates ISO XML entities. These sets do
overlap, but they are neither formally nor actually identical. The
HTML set is much smaller.
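
To make the difference between the two sets concrete, here is a minimal
sketch in Python. It is an illustration only: the table is a tiny
hand-picked excerpt, not doclifter's actual translation table, and the
names GLYPH_TO_ISO_ENTITY, lift_glyphs and entities_missing_from_html4
are hypothetical. It assumes only two-character `\(xx' and bracketed
`\[name]' glyph escapes.

```python
import re
from html.entities import name2codepoint  # HTML 4 entity names only

# Hypothetical excerpt: groff glyph name -> ISO entity name.
# (Illustrative; NOT doclifter's real table.)
GLYPH_TO_ISO_ENTITY = {
    "co": "copy", "rg": "reg", "em": "mdash", "en": "ndash",
    "bu": "bull", "dg": "dagger", "ss": "szlig",
    "ff": "fflig", "fi": "filig", "fl": "fllig",  # ISOpub ligatures
}

# Matches \(xx (two-character form) or \[name] (bracketed form).
_GLYPH = re.compile(r"\\\((..)|\\\[([^\]]+)\]")


def lift_glyphs(text):
    """Replace known groff glyph escapes with &entity; references.

    Unknown glyphs are left untouched so they remain visible.
    """
    def repl(match):
        name = match.group(1) or match.group(2)
        entity = GLYPH_TO_ISO_ENTITY.get(name)
        return "&%s;" % entity if entity else match.group(0)

    return _GLYPH.sub(repl, text)


def entities_missing_from_html4(table=GLYPH_TO_ISO_ENTITY):
    """Return the entity names in the table that HTML 4 does not define."""
    return sorted(e for e in table.values() if e not in name2codepoint)


if __name__ == "__main__":
    print(lift_glyphs(r"\(co 2007 \[em] \(zz"))
    print(entities_missing_from_html4())
```

Run as a script, this turns `\(co' and `\[em]' into &copy; and &mdash;,
leaves the unknown `\(zz' alone, and reports the three ligature
entities as absent from the HTML 4 set.
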
In fact, *all* defined groff-1.19 glyphs except the old Bell Labs
bracket-pile graphics get mapped to ISO entities -- even the exotica
like yogh and o-with-ogonek. It took a lot of work building
translation tables, but I have nailed this part of the problem down
solid.

> > 1) Trim the groff manual pages so they use only the portable subset,
> > plus the .SY and .OP macros that Werner and I have characterized.
>
> While I fully support .SY and .OP I wonder whether we need another
> macro to better separate content from formatting issues. Gunnar, any
> suggestions here?

I would also welcome any such suggestions. Especially from Gunnar, but
from anyone else as well.

> > Yes, I know, Bernd Warken is in love with the hyperextended macros
> > on groffer.1 and elsewhere, and will go ballistic. Too bad for him;
> > we've established that they break too much software to live.
>
> Well, I won't change groffer.man -- this is his contribution.

Uh oh. You just invoked my hacker-anthropologist mode...I've seen this
kind of talk before and the results tend to be *bad*.

It's possible that "no change" is the right answer, but because it's
"his contribution" is not a sufficient reason. As the project lead,
you have the responsibility to make a decision on factual and
technical grounds. If you then fail to carry through that decision
merely to avoid upsetting someone, you will be failing your
responsibility, your other developers, and eventually your users.

And note that I am not saying you should only carry through your
decision if it goes the way I want. If you conclude that simplifying
the groff-page macros is the wrong thing to do on technical and
factual grounds, you should act consistently in accord with that
decision and tell *me* to get stuffed. It is not required that either
Bernd or I *like* your decision, only that we live with it -- unless
we're willing to fork the project and lead the forks ourselves.

On any project (with rare exceptions that don't work very well) there
is someone who has to make these decisions even when they are
uncomfortable and someone is likely to throw a fit. On this project
it's you. Sorry to have to rough you up a bit about this, but you're
talking about shirking that duty. *Don't.* Evading it never works
out well.

> It seems that grohtml does a quite decent job for this man page: What
> about putting it into an exception list (even if it is the only
> member) so that it is converted with `groff -Thtml' instead of
> doclifter?

Werner, in situations like this, exception lists frighten the shit out
of me. The problem is that once it is known that you have one, people
invent all sorts of clever, plausible reasons they should be on it
rather than doing the bit of extra work needed for a clean solution.
The complexity overhead of managing the exceptions goes up at least as
the square of the number of exceptions. In an amazingly short time,
you end up head-down in a swamp as nasty and fetid as the one you
originally set out to drain.

Does it sound like I'm speaking from bitter experience? Yes. Yes, in
fact, I am. *shudder* Let's not go there...there may have to be an
exception list someday, but we should fight to avoid starting one as
long as possible. Nothing in the present corpus makes one necessary.

> BTW, some man pages documenting groff itself will never be conformant.
> It would be completely ridiculous to modify, say, groff_char.man so
> that groff specific extensions would be avoided. We need an Orwellian
> approach here: All man pages are equal, but some are more equal than
> others. :-)

I agree with your point here, but let's be careful not to muddle
separate issues together -- the undoubted fact that groff_char.man
cannot be portable is no reason to refrain from cleaning up pages that
*can* be portable, like groffer.1.

And even for pages that can't be strictly viewer-portable, simplifying
them to the point where doclifter can lift them will have benefits.
It's interesting that you picked groff_char.man as an example, because
I can tell you this: there is no reason in the universe we should be
unable to generate good XML-DocBook from that page. I've already done
the hard part by embedding all the right glyph-to-entity mappings in
doclifter.

> > 2) Patches for .SY/.OP/.EX/.EE/.DS/.DE support should be developed
> > for the KDE help browser and shipped as soon as possible.
>
> What I consider even more important is that all man pagers (which
> don't use groff internally) emit a warning if they can't display the
> man page correctly.

Fair point. I'll add this to the work plan as a long-term item I'm
not ready to schedule yet.

> Ideally, they should use groff for formatting
> (opening a TTY window showing `man' output would be sufficient IMHO)
> if the number of problems exceeds a certain threshold.

And that's an excellent idea for a general fallback.

> > 2) When, in the portable-subset description, can we say that
> > .EX/.EE, .SY/.OP, and .DS/.DE should be considered portable and no
> > longer need local definitions?
>
> I really don't know. Just remember that Debian (and thus probably
> Ubuntu as well) still uses the groff 1.18 series, for example.

Yes. Actually, I suspected before you brought it up that Debian
stable is probably the longest release cycle we'll have to cope with.
-- 
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

_______________________________________________
Groff mailing list
Groff@gnu.org
http://lists.gnu.org/mailman/listinfo/groff