Hi Anthony!

Good to hear from you.

At 2020-10-30T16:44:22-0600, Anthony J. Bentley wrote:
> Hi Branden,
>
> On Thu, Oct 29, 2020 at 5:06 AM G. Branden Robinson
> <g.branden.robin...@gmail.com> wrote:
> > The escaped versions of these characters are actually needed less
> > often than one might think. That is a quantitative observation of
> > significant qualitative impact.
>
> I agree. If it were merely a matter of paying attention while
> authoring new manuals, it wouldn't be a serious problem. But...
>
> > The apostrophe is admittedly more frequent. But not ultra-common.
>
> Literal ASCII apostrophe is incredibly common in existing manuals,
> whether shell examples or configuration files or source code snippets.
> As is ~. If ` and ' get transformed to quotes in terminal output,
> transforming ^ and ~ to U+02C6 and U+02DC can't be far behind.

Good idea! 3;-)

> Changing the rendering without fixing manuals is not free. There is a
> cost in user frustration.

That's why I think distributors should absorb the remappings into their
man.local files, which reside in /etc (or an equivalent place) and
announce their availability for configuration to the user, instead of
living in /usr/<whatever>/groff which usually means "if you touch this,
stuff might break".

> You argue that manpage authors should be aware of troff's
> idiosyncracies, but surely you don't think *readers* have the same
> obligation. (Well, reading further into the mail, perhaps you do!)

I think it's possible for naïve readers to notice the difference between
' ` ´ ‘ and ’ with anything but a minimal level of attention. These
glyphs _are_ confusable, which is why I propose to ensure that we get
the right ones into our man pages _at the source_. And to do that, we
need mechanisms for detecting when they're wrong.

That's why I made this commit; I needed it. And some of those who share
my concerns will, too. It is _far_ easier for me to run my little script
that shows me diffs in rendered pages between commit A and commit B than
to catch such problems by diffing roff source.

> I fear a loss of mindshare. These days documentation is often an
> afterthought. Quality man pages even more so. I've spent considerable
> time and effort convincing authors to consider a typesetter they
> consider to be archaic. I think it likely that triggering unexpected,
> frustrating rendering changes like this will drive software developers
> even further in the direction of HTML and Markdown.

I sympathize, and I don't want that outcome either. What to do? I have
some ideas below.

> > As you've pointed out in the frequency studies I did a few years
> > ago, raw counts get thrown off by the high volume of man pages that
> > are actually composed in something else altogether, like DocBook or
> > POD.
> > Fix those tools, and many pages correct these defects as soon
> > as they are generated again.
>
> In OpenBSD alone, uses of ', `, ~ and ^ that will need escaping number
> easily in the thousands—and I'm only including uses within
> human-authored -mdoc pages in that number.

I would not curse anyone with performing such changes one by one in a
text editor even with global replacement operations. I imagine, based on
my experiences with groff's mere 60 pages, that it can be done in fits
with sed scripts that recognize certain tropes.

You need not rip out the remappings until the work is done, or so close
to done that the remaining stragglers in perverse cases are thought to
be so obscure that they won't contribute measurably to the
frustrated-reader problem.

> I don't share your optimism that roff-generating tools will be fixed.
> DocBook's generated manpages have been truly awful for many years;
> when has that ever improved?

I noted on this list some years ago that docbook-to-man seems to poison
everyone who touches it. By that reasoning, all someone has to do to get
rid of me and my crazy schemes is encourage me to fix it. ;-)

> Does POD even provide a capability to semantically separate prose from
> code literals and escape characters accordingly?

I know Russ Allbery from Debian and he's a reasonable guy. I do so
little Perl programming that I'm not even sure what pod2man[1] actually
gets _wrong_ in this department. But I'm confident Russ can be
approached with a well-motivated change request if the problem is
articulated clearly.

> Similarly, I don't share your optimism that manuals themselves will be
> fixed, slowly or quickly. Especially since you suggest distributions
> turn this off in man.local or render manuals in ASCII!

It is a little difficult to argue against a position which holds
simultaneously that my commit was both far too disruptive and will have
negligible impact.
Granted, I haven't modified groff's sample man.local (it has nothing in
it but a comment header) to shift the remappings over there, but I'm
happy to do so if people think the _major_ redistributors of groff are
so inattentive that they wouldn't do so themselves. I was hoping to
stimulate consideration, on the part of groff packagers, as to whether
they'd like to help move this ball forward with their respective
communities.

I also acknowledge that this change is worthy of an item in the NEWS
file.

> > It may thus perhaps be a mortifying realization on your part that I
> > have plans to fix all our hyphens, too, and remove _that_ part of
> > our an-old.tmac, too.
>
> I'm not familiar with this. Are proposing translating - to U+2010 as
> well?

Yes. But only after I fix all of groff's own pages to do the right
thing. That might take some time. I haven't measured the problem yet.

> Will you argue that literal ASCII hyphens are "not ultra-common"
> in manpages too? Be serious.

Oh, no, they _are_ ultra-common. There's one right there.

The good news is that man page writers have much higher awareness of the
hyphen-minus problem already, and correct practice is already
widespread. So hyphens that should be pastable dashes are already '\-'
in many cases.

> They trusted the user, in an environment where a higher proportion of
> readers were familiar with the formatting language in question, where
> there were few alternative means of producing documentation, and
> perhaps most importantly, where copy & paste did not exist.

The copy and paste point is the best of these. I want to get the world's
man pages--or, at the very least, groff's--to the point where they copy
and paste code specimens correctly _without_ the crutch of this
remapping.

> This change visibly and obviously affects tens of thousands of troff
> documents in the output format in which they are most often read.
> Whatever groff does in the end, I just feel like something with such
> an impact deserves some discussion first.

Certainly. No release has been made and several courses of action are
possible.

1. Advise distributors and direct consumers of groff releases to apply
   the remappings in their site man.local (and mdoc.local[2]) files if
   they don't want to see the buggy man pages and (presumably)
   participate in an effort to get them fixed.
2. Restore the remappings, but in our tmac/man.local. Distributors and
   direct consumers will have to perform a merge with their existing
   files.
3. Restore the remappings to man.local, but make them conditional on a
   register that defaults off.
4. Restore the remappings to man.local, but make them conditional on a
   register that defaults on.
5. Revert the change[3] entirely.
6. Revert the change and un-fix the misuses of ` and ' in code specimens
   that I've been repairing for the past few years.

I posit (6) not because I think anyone is willing to admit to holding
the position, but to establish an endpoint for the conservative
continuum.

By symmetry I suppose there is a Molotov-hurling radical position (0),
which is to make a parallel change in tmac/doc.tmac-u and say nothing
about it in any form of release notes. This is not my position but I
think Ingo feared that it was. He's accustomed to being alarmed by me.
:D

I'd be happy with any of (1) through (4), with a mild grumbling
crankiness increasing with the integers. My biggest problem with (3) and
(4) is thinking of a good register name (this is groff, so we need not
limit ourselves to two characters).

I think any of the first four avenues merits some sort of mention in
NEWS.

What do you think?

Regards,
Branden

[1] https://metacpan.org/pod/distribution/podlators/scripts/pod2man.PL
[2] But I haven't removed the remappings from mdoc yet, as Ingo noted.
[3] 697e6db7fcacd403f5dde682002d02caa52e48df
    https://lists.gnu.org/archive/html/groff-commit/2020-10/msg00087.html
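
For illustration only, options (3) and (4) above might look roughly like
this in a site man.local. This is a sketch under assumptions, not what
groff ships: the register name an_ascii_remap is made up for the
example, and the char lines are an assumed form of the remappings under
discussion that should be checked against the commit in [3].

```roff
.\" Sketch only; `an_ascii_remap` is a hypothetical register name.
.\" Option (4): default the register to 1 (on) unless it is already
.\" defined.  For option (3), default it to 0 (off) instead.
.if !r an_ascii_remap .nr an_ascii_remap 1
.
.\" Apply the remappings only when the register is positive and the
.\" output device is utf8.  The char lines below are an assumption
.\" about the form of the remappings, not a copy of groff's source.
.if \n[an_ascii_remap] \{\
.  if '\*[.T]'utf8' \{\
.    char \- \N'45'
.    char  - \N'45'
.    char  ' \N'39'
.    char  ` \N'96'
.  \}
.\}
```

With something like this in place, a reader could flip the behavior per
invocation through groff's -r option (for example -ran_ascii_remap=0)
without editing any macro file, which is the main attraction of a
register over an unconditional block.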