At 2022-12-24T14:43:44-0800, Russ Allbery wrote:
> I probably should have assumed. One of the things that I've noticed
> over and over about free software is that nothing new ever truly
> replaces something old in a comprehensive sense. I can think of very
> few programs that truly no one is using any more, because once the
> source code is available to keep them alive, someone will keep them
> alive. It makes for a rather interesting diversity of software (and
> other things; for instance, I still use Usenet).
I'd happily get back on USENET if someone has solved the spam problem.
I'm old enough to remember those green-card hawking lawyers who were
the harbingers of death.

> Oh, so I was going to mention: currently, Pod::Man rolls its own
> macros for verbatim text:
>
>     .de Vb \" Begin verbatim text
>     .ft CW
>     .nf
>     .ne \\$1
>     ..
>     .de Ve \" End verbatim text
>     .ft R
>     .fi
>     ..
>
> This looks basically equivalent to .EX/.EE,

Yup. Except for the detail of the name of the constant-width font,
which is not consistently defined across implementations or even
output devices within an implementation (as already discussed).
groff's tmac/an-ext.tmac says these days:

    .\" Define this to your implementation's constant-width typeface.
    .ds mC CW
    .if n .ds mC R

> so I thought about using those macros (and defining my own if they're
> not available, at least until no one is using older implementations
> that don't have them). But the main thing that .EX doesn't support
> that the long-standing Pod::Man behavior does is the .ne invocation,
> which is used like this:
>
>     # Get a count of the number of lines before the first blank line, which
>     # we'll pass to .Vb as its parameter. This tells *roff to keep that many
>     # lines together. We don't want to tell *roff to keep huge blocks
>     # together.
>     my @lines = split (m{ \n }xms, $text);
>     my $unbroken = 0;
>     for my $line (@lines) {
>         last if $line =~ m{ \A \s* \z }xms;
>         $unbroken++;
>     }
>     if ($unbroken > 12) {
>         $unbroken = 10;
>     }
>
> This logic is very long-standing and was designed for troff printing
> of a manual page (and older nroff setups that still did pagination)
> to avoid unnecessary page breaks in the middle of a verbatim block.
> I'm not sure how much this matters given how people use man pages
> these days, but I hate to break it for no reason.

You've managed to wangle a display, and once people get that religion
they're loath to give it up.
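For readers who don't speak Perl, the quoted keep-count logic
translates to roughly the following (Python used purely for
illustration; the function name is mine, not Pod::Man's):

```python
def keep_count(text):
    """Count the lines before the first blank (or whitespace-only)
    line, as Pod::Man passes to .Vb; cap runaway counts so huge
    verbatim blocks aren't kept together on one page."""
    unbroken = 0
    for line in text.split("\n"):
        if line.strip() == "":
            break
        unbroken += 1
    if unbroken > 12:
        unbroken = 10
    return unbroken
```

The cap matters: asking *roff to keep, say, 40 lines together would
force a page break before almost every large block, which is worse
than breaking inside it.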
Despite my commitment to a limited man(7) dialect, I have proven
unable to stop myself from adding `ne` requests to groff's own man
pages to keep our PDF compilation from looking ugly.

> So I think I'd need to add an .ne line after (before?) the .EE macro
> if I switched to it?

Well, you can throw away that line-counting logic in Perl altogether
and simply use `ne` _before_ `EX` (not `EE`).

Another point of detail is that you should break with `br` _before_
the `ne` request. `ne` won't always do what you want if there is a
pending output line.

I have plans to add keep macros `KS`/`KE` to groff man(7) in the near
future; they are probably the least controversial extensions I could
possibly add, because it will always be okay for an implementation to
ignore them entirely. No text will be lost or misformatted; page
breaks will just happen in dumb places, and for the overwhelming
majority of terminal users, who experience the continuous-rendering
default, even that won't apply.

> Okay, fair. :) Although historically people sometimes did, and of
> course once upon a time people would sometimes typeset the full
> manual for something with troff.

They still do. Alex Colomar, the new linux-man maintainer, is shy of
learning ms(7) or any other macro package. If a "full manual" doesn't
need features that man(7) doesn't provide, I see no real harm in using
it for non-man-page documents. Colin Watson's "-l" extension to man(1)
has made this extremely straightforward.

> That output probably isn't as nice as it used to be, since I have
> subsequently dropped a lot of the attempted magic that only applied
> to troff output (replacing paired " quotes with `` '', adding small
> caps to long strings of all capital letters, and things like that)
> because they were all using scary regexes and occasionally broke
> things and mangled things in weird ways, causing lots of maintenance
> issues.

Yes, and there are concerns I would raise with both of those helpful
bits of automagic anyway.

> > Yes.
> > But there are two problems to solve: (1) acceptance of Unicode
> > (probably just UTF-8) input
>
> I was pleasantly surprised at how well this just worked with the
> man-db setup on a Debian system, although I think that may involve a
> fair amount of preprocessing.

Mainly just running preconv(1), I think, which groff has supplied
since 1.20, so for about 14 years, I guess.

> Just to provide additional detail for the record (and this is almost
> certainly the sort of thing you mean by "acceptance of Unicode
> input"), here's the simple document I was using for some testing.
>
> https://raw.githubusercontent.com/rra/podlators/main/t/data/man/encoding.utf8
>
>     % groff -man -Tpdf -k encoding.utf8 > encoding.pdf
>     troff: encoding.utf8:72: warning: can't find special character 'u0308'
>     troff: encoding.utf8:74: warning: can't find special character 'u1F600'
>
> u1F600 is presumably a problem with the output font,

Yes. Try sending that to the terminal (-Tutf8) and it should work.

> but u0308 is a combining accent mark that groff does definitely
> support, just not as a separate character.

Right. It's \[ad].

> (Without preconv, one instead gets mojibake, as I expected.)

I got warnings, too (using -ww):

    troff:EXPERIMENTS/encoding.utf8:72: warning: invalid input character code 136
    troff:EXPERIMENTS/encoding.utf8:74: warning: invalid input character code 159
    troff:EXPERIMENTS/encoding.utf8:74: warning: invalid input character code 152
    troff:EXPERIMENTS/encoding.utf8:74: warning: invalid input character code 128

There is a whole universe of validity problems to cope with even if we
had support for direct input of valid UTF-8. :(

> My theory was that combining accent marks pose a bit of an
> interesting issue for groff because groff probably shouldn't think
> of them as a separate output character that can be mapped in an
> output font, but instead needs to essentially transform them into
> something like \[u0069_0308] during the input processing.
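To make the two behaviors concrete, here is a rough Python sketch of
the mappings being discussed; this is not preconv's actual code, just
an illustration under the assumption that it maps code points one at a
time to \[uXXXX] escapes (which is consistent with the standalone
u0308 warning above), versus the NFD-based composite names that groff
can actually resolve:

```python
import unicodedata

def per_codepoint_escapes(s):
    """Per-code-point mapping (sketch of the failure mode): ASCII
    passes through; every other code point becomes its own \\[uXXXX]
    escape, so a combining U+0308 comes out standalone."""
    return "".join(
        ch if ord(ch) < 0x80 else "\\[u%04X]" % ord(ch) for ch in s
    )

def composite_escape(cluster):
    """Composite name built from the NFD decomposition, e.g. 'i' plus
    combining diaeresis -> \\[u0069_0308], which is the form the
    thread says groff wants for such sequences."""
    nfd = unicodedata.normalize("NFD", cluster)
    return "\\[u" + "_".join("%04X" % ord(c) for c in nfd) + "]"
```

For example, `per_codepoint_escapes` turns "e" + U+0308 into
`e\[u0308]` (the unresolvable standalone form), while
`composite_escape` applied to the same cluster yields
`\[u0065_0308]`.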
> (This may therefore essentially be a preconv bug as opposed to a
> troff bug, and maybe nroff gets away with it because it can just
> copy combining accent marks to the output device and let xterm take
> care of rendering.)

I don't actually know whether xterm performs combinations like this or
expects precomposed characters.

The groff_char(7) man page from groff Git covers some of this stuff in
increased detail, such as the `composite` request and the
Normalization Form D requirement. But the discussion still may not be
complete, as I haven't tried to solve the Unicode input problem
myself. Fortunately, we have a patch pending for CJK/UTF-16 font
support, which promises to give me an excuse to widen groff's internal
character type. Here's hoping I haven't worn out the submitter's
patience while I tried to get 1.23.0 ready...

> It all makes sense when viewed through the lens of the *roff
> language, but of course in the Unicode world one expects to be able
> to just produce a stream of code points and have everything cope.

Yes..."just coping" is achieved with a massive pile of standards
documents that augment the ISO 10646 character encoding. :D

> I am sad that currently Pod::Man is one of the impediments to good
> rendering of manual pages in other formats, since I make use of more
> of the *roff language (mostly to work around bugs) than those tools
> often understand. So I have an incentive to want to simplify the
> output as much as I can, consistent with remaining portable.

Consider me a resource for this effort.

Regards,
Branden