Werner LEMBERG <[EMAIL PROTECTED]>:
> My mistake.  Anyway, I think XML also knows Unicode character
> entities, right?  This is what I have meant.

Yes, you can embed Unicode entities in XML.  Right now doclifter does
this for a handful of cases in which the right ISO entities don't
exist.  The set of these is currently:

    ("lh", "☞"),      # Hand pointing left
    ("rh", "☜"),      # Hand pointing right
    ("CR", "␍"),      # Carriage return symbol
    ("fo", "‹"),      # Single left-pointing quotation mark
    ("fc", "›"),      # Single right-pointing quotation mark
    # These are from groff
    ("yogh", "ȝ"),    # Small letter yogh
    ("ohook", "ơ"),   # Small letter o with hook or ogonek
    ("udot", "Ń"),    # Combining underdot

The names on the left are aliases that doclifter generates internally
in order to avoid having hard-to-read raw Unicode hex literals in the
generated XML.  Instead, it generates in the XML preamble one named
entity definition for each hex literal, and then uses the named entity.

> > In fact, *all* defined groff-1.19 glyphs except the old Bell Labs
> > bracket-pile graphics get mapped to ISO entities -- even the exotica
> > like yogh and o-with-ogonek.
>
> o-with-ogonek isn't an exotic letter at all!  All Poles will object to
> your assertion :-)

Not to mention the Lithuanians, and nobody wants to offend a country so
full of good-looking women. :-)

Looking at my code, I see there is one more exception; troff \(an,
horizontal arrow extension, can't be mapped either.  You'd think there'd
be an ISO entity for this somewhere in the AMSA arrow set, but there
isn't.  Nor have I found a Unicode equivalent.

> Whatever decision we will find, I won't force anything right now.
> Maybe later.  Thus I don't evade a decision but postpone it.

Well, OK, but you'll only get to temporize until I turn in my patches
for the 1.19 development tree.

> Hmm.  To exaggerate, the only `technical ground' currently is that
> doclifter can't handle it.  Up to now nobody has ever claimed problems
> with groffer.1 -- while I understand your arguments, I don't see an
> urgent need to react immediately.

I see you've forgotten Gunnar's post on this topic.
He actually showed how badly groffer.1 gets mangled in some viewers,
with a screen dump.  If that doesn't constitute "urgent need", I'm not
sure what would.

I'm really not trying to use the viewer-portability argument to solve
problems exclusive to doclifter.  I don't have to do that, because
doclifter plus XML stylesheets already generates better HTML from a
wider range of manual pages than any of the viewers can.  And it's still
improving; I just added code to parse ad-hoc tables made with .ta and
tabs rather than TBL markup, and I think I'm going to be able to bite a
large corner off of the .ti problem next.

The constraint is actually the other way around.  Gunnar demonstrated
that my initial cut at a portable request set was far too large, because
doclifter is better at emulating troff than the viewers are.  (The cost
we pay for this is that doclifter's running time would be too slow and
variable for it to be used on the fly even if XSLT to render the DocBook
to HTML didn't take much longer.  The toolchain is just too slow; you
have to batch-translate your man pages in advance and cache the HTML
somewhere.)

So I am trying to solve the viewer-portability problem now rather than
grinding an axe for doclifter.  Thank Gunnar for this, because he
convinced me it was both worthwhile and possible.

He caused me to discover that the difference between what we would have
to do to solve doclifter's problems alone and the larger set of things
we will have to do to solve viewer portability is small enough that
tackling both at once makes sense.  So I'm going after the bigger one
now, and treating the solution to doclifter's problems as a happy and
motivating side effect.

If I were still only trying to solve doclifter's problems, groffer.1
could be allowed to live as it is -- I could do what was needed to
doclifter to translate it, though that would be painful and I would
still prefer not to.

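To make the entity-aliasing scheme from the top of this message concrete,
here is a minimal Python sketch.  It is not doclifter's actual code: the
function names, the `refentry` root element in the usage note, and the
three-entry table (a subset of the list above, keyed by code point) are
assumptions made for illustration.  It shows one named entity declared
per hex literal in the document preamble, and the name then used in
place of the raw character.

```python
# Hypothetical alias table: alias name -> Unicode code point.
# Only three of the aliases from the list above are included.
ALIASES = {
    "lh": 0x261E,
    "rh": 0x261C,
    "CR": 0x240D,
}


def entity_declarations(aliases):
    """Return one <!ENTITY ...> declaration per alias, sorted by name."""
    return ['<!ENTITY %s "&#x%04X;">' % (name, cp)
            for name, cp in sorted(aliases.items())]


def doctype(root, aliases):
    """Build a DOCTYPE whose internal subset declares every alias."""
    body = "\n".join("  " + decl for decl in entity_declarations(aliases))
    return "<!DOCTYPE %s [\n%s\n]>" % (root, body)


def use_aliases(text, aliases):
    """Replace each aliased character in text with its named entity."""
    for name, cp in aliases.items():
        text = text.replace(chr(cp), "&%s;" % name)
    return text
```

Calling `doctype("refentry", ALIASES)` yields a preamble with three
entity declarations, and `use_aliases()` rewrites body text so that the
generated XML contains `&lh;` rather than a hex literal.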
(The connection between solving the viewer-portability problem and
solving the structure-lifting problem is not an accident.  To solve the
viewer portability problem, you have to define a sublanguage of
troff+man that does not require knowing the fine physical capabilities
of the output medium.  This turns out to be almost the same subset as
the one that can be structurally translated.)

> > The problem is that once it is known that you have one, people
> > invent all sorts of clever, plausible reasons they should be on it
> > rather than doing the bit of extra work needed for a clean solution.
>
> [... omitting shameless exaggerations ...]
>
> According to your analysis, groffer.1 is basically the only candidate
> which is not going to be fixed easily -- for whatever reasons.  Not
> bad to have just one single exception out of 10000...

That would be one out of 13,000, and no, groffer.1 isn't the only one.
There are about 54 others currently on the too-broken-to-live list.
These are mostly pages that will break viewers much worse than they
break doclifter.  Here is a rough breakdown:

* 21 pages associated with netpbm.

* 8 pages associated with groff.

* 6 empty pages generated by broken Perl build machinery.

* 5 seriously mangled pages generated by the Canna project.  (They run
  several pages together as one, complete with multiple .TH headers.)

* 2 pages shipped by a defunct project called wordtrans (viewers handle
  these OK).

* 4 pages generated from Doxygen sources by a very broken reporting
  tool.

* 6 other pages with markup so gnarled that doclifter barfs on it --
  mostly these are weird edge cases that trip over bugs in my mandoc
  interpreter.  Maybe three of these could be patched around if I
  didn't have higher-priority things to do.

This is really not good company for the groff documentation to be in.
And it's going to be worse company in a couple of weeks when Bryan
Henderson and I have fixed the netpbm problems (scheduled, and I know
exactly how to do it, but it's not done yet).

> It's not necessary to tell anyone that an exception list exists :-)

Trust me.  They find out :-(  <--- Bitter experience speaking again.

> > And even for pages that can't be strictly viewer-portable, simplifying
> > them to the point where doclifter can lift them will have benefits.
>
> Uh, oh, I'm not comfortable with `simplifying until doclifter can
> handle it'.

No, no.  You misunderstand.  Simplifying until doclifter can handle it
is *easier* than the real problem -- cross-viewer portability -- not
harder.  Simplifying until a page doesn't break non-groff viewers
normally solves all of doclifter's problems handily.

The exception cases where a page intrinsically can't be viewer-portable
are *extremely* rare.  Offhand I can only think of four such:
groff_char.7 and three from the Canna project that are broken for other
reasons.

> > > Ideally, they should use groff for formatting (opening a TTY
> > > window showing `man' output would be sufficient IMHO) if the
> > > number of problems exceeds a certain threshold.
> >
> > And that's an excellent idea for a general fallback.
>
> groffer.1 comes to my mind :-)

Well, yes.  But for this to work, we'd have to push patches out for
every single viewer first.  That's a rather high price to pay to avoid
offending one sulky groff contributor when we can fix the problem in one
spot upstream.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

_______________________________________________
Groff mailing list
Groff@gnu.org
http://lists.gnu.org/mailman/listinfo/groff