Bjarni,

At 2024-10-07T23:00:59+0000, Bjarni Ingi Gislason wrote:
> Package: groff
> Version: upstream, GIT HEAD
> Severity: minor
> Tags: patch

If this were a Savannah ticket I would close it as "Invalid"
immediately. "Unreproducible" also applies.

> * What led up to the situation?
>
> Checking for defects with
>
> [test-][g|n]roff -mandoc -t -K utf8 -rF0 -rHY=0 -ww -b -z < "man page"

For those on the groff@ list who don't also subscribe to bug-groff,
allow me to introduce "bjarnigroff", Bjarni's personal fork of groff.

https://savannah.gnu.org/bugs/index.php?go_report=Apply&group=groff&func=browse&set=custom&msort=0&report_id=225&advsrch=0&bug_id=&submitted_by=0&category_id=0&severity=0&bug_group_id=0&resolution_id=0&assigned_to=0&status_id=0&plan_release_id=0&summary=bjarnigroff&history_search=0&history_field=0&history_event=modified&history_date_dayfd=8&history_date_monthfd=10&history_date_yearfd=2024&chunksz=50&spamscore=5&boxoptionwanted=1#options

> [test-groff is a script in the repository for "groff"] (local copy and
> "troff" slightly changed by me).

It would be more illuminating to note that "test-nroff" is a script of
your creation that does not, and has not ever, existed in GNU groff.

> * What was the outcome of this action?
>
> troff: backtrace: file '<stdin>':4162
> troff:<stdin>:4162: warning: trailing space in the line

GNU troff does not emit this diagnostic. Trailing spaces have (fairly)
well-defined semantics in *roff.

roff(7):
    Trailing spaces on text lines ... ... are discarded. The formatter
    flushes any pending output line upon encountering the end of input.
    After the formatter performs an automatic break, it may then adjust
    the line, widening inter‐word spaces until the text reaches the
    right margin. Extra spaces between words are preserved. Leading and
    trailing spaces are handled as noted above.
    ...

On AT&T troff/nroff, trailing spaces also cancel end-of-sentence
detection.

It might be worth having this as a style warning if/when GNU troff gets
a "style" warning category.
https://savannah.gnu.org/bugs/?62776

> troff: backtrace: file '<stdin>':11
> troff:<stdin>:11: warning: macro 'I' not defined

This is alarming. The `I` macro has been in the macro language since
1979[1] and there is no prospect of it going away. This is a very bad
bug in your fork. I urge you to fix it.

> troff: backtrace: file '<stdin>':18
> troff:<stdin>:18: warning: macro 'MR' not defined

`MR` is a Plan 9 from User Space troff and groff man(7) macro.

groff_man(7):
  Hyperlink macros
    Man page cross references are best presented with .MR. Text may be
    hyperlinked to email addresses with .MT/.ME or other sorts of URI
    with .UR/.UE.
    ...
    .MT, .ME, .UR, and .UE are GNU extensions supported by Heirloom
    Doctools and mandoc (.UR/.UE since 1.12.3; .MT/.ME since 1.14.2)
    but not by Documenter’s Workbench, Plan 9, or Solaris troffs.
    Plan 9 from User Space’s troff implements .MR. See subsection
    “Use of extensions” in groff_man_style(7).

    [I expect mandoc(1) to support `MR` in its next release.[2]]

    [...]

    Prepare arguments to .MR, .MT, and .UR for typesetting; they can
    appear in the output. Use special character escape sequences to
    encode Unicode basic Latin characters where necessary, particularly
    the hyphen‐minus.

    .MR topic [manual‐section [trailing‐text]]
        (since groff 1.23) Set a man page cross reference as
        “topic(manual‐section)”. If manual‐section is absent, the
        package omits the surrounding parentheses. If trailing‐text
        (typically punctuation) is specified, it follows the closing
        parenthesis without intervening space. Hyphenation is disabled
        while the cross reference is set. topic is set in the font
        specified by the MF string. If manual‐section is present, the
        cross reference hyperlinks to a URI of the form
        “man:topic(manual‐section)”.

    [...]

    Except for .EX/.EE, James Clark implemented the foregoing features
    in early versions of groff. Later, groff 1.20 (2009) resurrected
    .EX/.EE and originated .SY/.YS, .TQ, .MT/.ME, and .UR/.UE.
    Plan 9 from User Space’s troff introduced .MR in 2020.

> Output from "test-nroff -mandoc -t -K utf8 -rF0 -rHY=0 -ww -b -z ":

It is not wise to run the formatter on groff_man.7.man.in, because it
is not a *roff document. It is input to m4.

If I run groff (from Git HEAD) on groff_man.7 with the same flags,
here's what I get:

$ groff -mandoc -t -K utf8 -rF0 -rHY=0 -ww -b -z ./build/tmac/groff_man.7 && echo DONE
DONE

Consequently, all of your tool's output is spurious. I very much hope
you are not filing bug reports against other projects with a tool of
such poor quality.

> * What outcome did you expect instead?
>
> No output (no warnings).

In that case you need to examine the properties of your own fork.

> General remarks and further material, if a diff-file exist, are in the
> attachments.

Hmm, I see. Much advice, some good and much bad, follows.

> Any program (person), that produces man pages, should check the output
> for defects by using both groff and nroff.
>
> [test][g|n]roff -mandoc -t -ww -b -z -K utf8 <man page>

First of all, "test-groff" exists only in the build tree of a groff
built from source. Practically no one is going to have such a tool
installed. And as noted above, you may be the only person in the world
who has a script named "test-nroff".

Your notation is pretty confusing if you mean to suggest input to a
shell prompt. Moreover, you've forgotten about the `-` after `test`.

The above advice therefore will mystify your readers, or draw their
derision if they are already clued in about man page composition and
maintenance. Possibly, if they've received similar reports from you
before, then they will have learned to ignore you through operant
conditioning.

> Common defects:
>
> Input text line longer than 80 bytes.

You should learn to distinguish style matters from correctness matters.
GNU troff has no problem with input lines far in excess of 80 bytes.
As far as I've seen, mandoc(1) has no such problem either.
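
The distinction between style and correctness drawn above can be
sketched in shell. This is only an illustration, not part of groff or
mandoc: `long_lines` is a hypothetical helper name, and the 80-byte
default merely mirrors the figure quoted above. It reports long input
lines as style notes and makes no claim that the formatter will
mishandle them.

```shell
#!/bin/sh
# long_lines FILE [LIMIT]
# Hypothetical helper (not part of groff or mandoc): list input lines
# longer than LIMIT bytes (default 80) as style notes.
long_lines() {
    # LC_ALL=C makes awk's length() count bytes rather than characters.
    LC_ALL=C awk -v limit="${2:-80}" '
        length($0) > limit {
            printf "%s:%d: style: line is %d bytes long\n",
                FILENAME, FNR, length($0)
        }' "$1"
}
```

Something like `long_lines ./build/tmac/groff_man.7` would then list
candidates for rewrapping, which an author is free to ignore.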

> Not removing trailing spaces (in in- and output).

You overreach here in two respects; first, the semantics of trailing
spaces are well defined as noted above, and secondly, it's none of your
concern or business whether the _input_ to a generator of man(7)
applies semantic value to trailing spaces. None of man(1), groff(1), or
mandoc(1) care, either.

> The reason for these trailing spaces should be found and eliminated.

I sometimes feel the same way about your bug reports.

> Not beginning each input sentence on a new line.
> Lines should thus be shorter.
>
> See man-pages(7), item 'semantic newline'.

That's a sound style guide (saith I, who contributed to it). I hope
that people's impressions of it are not tainted by the frequently
misleading content with which you couple your references to it.

[...]

> and for groff, using
>
> "printf '%s\n%s\n' '.kern 0' '.ss 12 0' | groff -mandoc -Z - "

This is going to throw "grout" in the reader's face, which few people
have expertise reading. It is emphatically _not_ a skill that man page
authors or maintainers need to acquire. If they experience, with their
man pages, trouble so resistant to comprehension that study of GNU
troff (as opposed to output driver) output is necessary, then they
should locate an expert and ask for help. This list has several.

(That said, I'd like to make "grout" more readable. But I have to
negotiate with Deri first. :) )

The foregoing is also revealing of a low level of sophistication with
printf(1). That utility applies the given format string to _each_ of
its arguments. On the bright side, that usage may advertise to the
reader that your prescription originates with someone who is advising
beyond their level of expertise, so by all means retain it.
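
The point about printf(1) is easy to check. A minimal sketch, assuming
a POSIX shell: the format operand is reused until all arguments are
consumed, so a single `%s\n` yields the same two request lines as the
longer format in the quoted command. The variable names `short` and
`long` are only for illustration.

```shell
#!/bin/sh
# printf(1) reuses its format operand until every argument has been
# consumed, so one '%s\n' suffices for any number of request lines.
short=$(printf '%s\n' '.kern 0' '.ss 12 0')
long=$(printf '%s\n%s\n' '.kern 0' '.ss 12 0')

if [ "$short" = "$long" ]; then
    echo "identical output"
fi
```

The shorter form also keeps working unchanged if a third request is
added to the argument list.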

> Output from "mandoc -T lint groff_man.7.man.in": (possibly shortened list)
>
> mandoc: groff_man.7.man.in:1232:2: WARNING: skipping paragraph macro: PP empty
> mandoc: groff_man.7.man.in:1242:2: WARNING: skipping paragraph macro: PP empty

I don't get these warnings with mandoc 1.14.6--not on the *.in source
document (which one should NOT be giving to mandoc(1) as input in the
first place as noted above), nor the generated man(7) document. Have
you forked mandoc(1) too?

Here's the output I _do_ get:

$ mandoc -T lint ./build/tmac/groff_man.7
mandoc: ./build/tmac/groff_man.7:32:2: UNSUPP: unsupported roff request: do
mandoc: ./build/tmac/groff_man.7:2260:2: UNSUPP: unsupported roff request: do

These are fine. mandoc(1) is complaining about our AT&T compatibility
mode management requests. Go back to this list's archives for 2017, I
think, for the most recent discussion of them.

mandoc: ./build/tmac/groff_man.7:3:5: STYLE: lower case character in document title: TH groff_man

We expect a future release of mandoc(1) to withdraw this complaint.
Again, search the list archives for a statement from Ingo to this
effect.

mandoc: ./build/tmac/groff_man.7:1209:2: WARNING: skipping paragraph macro: br at the end of SS

This warning is spurious. Here's the context.

.\" ====================================================================
.SS Registers
.\" ====================================================================
.
Registers are described in section \(lqOptions\(rq below.
.
They can be set not only on the command line but in the site
.I man.local
file as well;
see section \(lqFiles\(rq below.
.
.
.br
.ne 7v
.\" ====================================================================
.SS Strings
.\" ====================================================================
.
The following strings are defined for use in man pages.

What I'm doing in the foregoing is avoiding "widows"/"orphans"/stranded
paragraph lines.
And resorting to *roff requests to do it, since the macro package
offers no support for this. (It's a hard problem without using
diversions or, as Doug once noted, "self-renewing input traps", a
gauntlet I'd like to pick up some day.)

Anyway, there's no validity problem here. A formatter can completely
ignore the `br` and `ne` requests with no harm done to the correctness
of the output. I'm not sure I'd even ask Ingo to make mandoc(1) detect
cases like this and suppress the warning. Occasionally, man(7) authors
resort to "expert mode". It's fine if some tool reminds them that, and
where, they have done so.

> Lines containing '\c' (' \c' does not make sense):

Is that a generalization?

> 25:After processing by m4, both child pages in the above case will carry \c

This is why you shouldn't run man(7) validation tools on things that
aren't man(7) documents. The foregoing line goes to m4(1)'s "black hole
diversion".

> 610:.RB "].\|.\|.\& \e\- "\c

This is a style grievance, not a correctness problem. Here's the
context.

For example, a section called \(lqName\(rq or \(lqNAME\(rq must exist, must be the first section after the
.B .TH
call, and must contain only text of the form
.RS \" Invisibly move left margin to current .IP indentation.
.RS \" Now indent further, visibly.
.IR topic [\c
.BI , " another-topic"\c
.RB "].\|.\|.\& \e\- "\c
.I summary-description
.RE \" Move left margin back to .IP indentation.
for tools like
.MR makewhatis 8
or
.MR mandb 8
to index them.
.RE \" Move left margin back to standard position.

That's some pretty thick business. Here's how it formats.

    material within sections. For example, a section called “Name” or
    “NAME” must exist, must be the first section after the .TH call,
    and must contain only text of the form
        topic[, another‐topic]... \- summary‐description
    for tools like makewhatis(8) or mandb(8) to index them.

I'm accustomed to writing sequences of macro calls like that as
paragraph tags (see the `TP` macro), for example in a man page's
"Options" section. You're right that it isn't necessary here.

So, thanks. That means your message was not completely without value.

> Separate an ellipsis from the preceding string with a space
> character, if it does not mean a continuation of it.
>
> See a manual of style about the difference between "abc..." and
> "abc ...".
>
> 4162:To get a \(lqliteral\(rq.\|.\|. .\|.\|.should be input.
> 4212:Instead of.\|.\|. .\|.\|.should be considered.
> 4401:Instead of.\|.\|. .\|.\|.should be considered.
> 4460:Instead of.\|.\|. .\|.\|.do this.

No, wrong--the existing usage is idiomatic English. I turn your advice
around and suggest that _you_ consult a style manual.

> Change a HYPHEN-MINUS (code 0x2D) to a minus(-dash) (\-),
> if it
> is in front of a name for an option,
> is a symbol for standard input,
> is a single character used to indicate an option,
> or is in the NAME section (man-pages(7)).
> N.B. - (0x2D), processed as a UTF-8 file, is changed to a hyphen
> (0x2010, groff \[u2010] or \[hy]) in the output.
>
> 3619:.TP 9.25n \" "-rHY=0" + 2n + hand-tuned for PDF

That's in a comment, Bjarni. It depicts the width _as formatted_.

> Three full stops (periods) are used for an ellipsis
>
> 3376:.\" ..and which Clark included in groff man(7) from 1.01 or earlier...

All right, then. I'll fix this comment.

> Add a zero (0) in front of a decimal fraction that begins with a period
> (.)
>
> 3540:.\" .5v after, as well as...

No. Not only is this standard/accepted usage, the numeric expression
`.5v` is valid *roff, too.

> Split a punctuation from a single argument, if a two-font macro is meant
>
> 228:.I roff;
> 252:.I break.
> 301:.I arguments,
> 490:.I section,
> 500:.I header-middle;
> 569:.I heading-text.
> 635:.I subheading-text.
> 750:.I inset-amount,
> 876:.I indentation,
> 1577:.I trailing-text.
> 1632:.I trailing-text.
> 3205:.I system.
> 3421:.I version.
> 3566:.I groff.
> 3622:.I adjustment-mode,
> 3715:.I footer-distance;
> 3849:.I subsection-indentation.
> 4350:.I level.

Nope. I said what I meant. The output looks better this way when
typeset. However, I tend to set trailing punctuation in roman after an
italicized word if I expect what is italicized to be copy-and-pasted;
that makes it easier to aim one's pointing device for selection.

> Output from "test-groff -mandoc -t -K utf8 -rF0 -rHY=0 -ww -b -z ":
>
> troff: backtrace: file '<stdin>':4162
> troff:<stdin>:4162: warning: trailing space in the line

Not present in groff Git. Maybe you damaged the document in your fork.

> Additionally (general):
>
> Abbreviations get a '\&' added after their final full stop (.) to mark them as
> such and not as an end of sentence.
>
> There is no need to add a '\&' before a full stop (.) if it has a character
> before it!

Why are you prescribing this with respect to groff_man(7)? You show
exhibits of other things--why not this one?

> -After processing by m4, both child pages in the above case will carry \c
> +After processing by m4, both child pages in the above case will carry
> escape sequences followed by text lines starting with punctuation one
> normally does not find in that position (and in the case of the period,
> which has to be protected from interpretation as a control line).

Once again, this part of the file is not a man(7) document. See above
regarding m4 and the black hole diversion. If these terms are
unfamiliar to you, then learn about m4. There's a nice introduction in
Volume 2 of the Seventh Edition Unix Programmer's Manual.

Furthermore, if you would read the sentence for comprehension, you
would recognize that it is discussing the `\c` escape sequence
_specifically_.

> -.I roff;
> +.IR roff ;

No; see above.

> -.I break.
> +.IR break .

No; see above.

> Some macros interpret
> -.I arguments,
> +.IR arguments ,

No; see above.

> -.I section,
> +.IR section ,

No; see above.

> -.I header-middle;
> +.IR header-middle ;

No; see above.

> -.I heading-text.
> +.IR heading-text .

No; see above.

> -.RB "].\|.\|.\& \e\- "\c
> +.RB "].\|.\|.\&" " \e\- "\c

Acknowledged. A macro-recast will be in my next push.

> -.I subheading-text.
> +.IR subheading-text .

No; see above.

> -.\" Also see subsection "History" below...
> +.\" Also see subsection "History" below ...

No; see above.

> -.I inset-amount,
> +.IR inset-amount ,

No; see above.

> -.I indentation,
> +.IR indentation ,

No; see above.

> @@ -1229,7 +1229,6 @@ produces the following output.
> .YS
> .
> .
> -.P
> .SY groff
> .B \-h
> .YS

You didn't prepare your reader for this proposed change in any way.
Moreover, it is dead wrong for the forthcoming groff 1.24. I guess you
overlooked or have forgotten the discussions on this list in about
April of this year.

NEWS:
* The behavior of the an (man) package's `SY` and `YS` macros has been
  expanded to enable greater user control over vertical spacing and to
  make them convenient for synopsizing C language functions, not just
  commands. `SY` no longer puts vertical space on the output, and
  initially breaks the output line _only_ if it is encountered
  repeatedly without a preceding `YS` call. The computed indentation of
  synopsis lines after the first now also includes the width of
  anything already on the output line, so that you can precede the `SY`
  call with, for instance, the C language data type used for the return
  value in a function prototype.

  The `SY` macro now accepts an optional second argument. This second
  argument is typeset in bold, replaces the fixed-width space that is
  appended to the synopsis keyword in `SY`'s single-argument form, and
  is used in computation of the indentation of non-initial synopsis
  lines. However, this computed indentation can now also be overridden
  with that of the previous synopsis item. To do this, give any
  argument to the `YS` macro call "closing" the synopsis whose
  indentation you want to reuse.
  When you're done with such a grouped synopsis, leave the argument off
  the final `YS` call.

  In a "Synopsis" section of a man page, existing synopses consisting
  of a single item require no migration. This is the most common case.
  For others, where before you would write...

  .SY mv
  .I source
  .I destination
  .YS
  .
  .SY mv
  .I source
  \&.\|.\|.
  .I destination-directory
  .YS

  ...you would now write the following.

  .SY mv
  .I source
  .I destination
  .YS
  .
  .
  .P
  .SY mv
  .I source
  \&.\|.\|.
  .I destination-directory
  .YS

  (That is, simply add a paragraphing macro.)

  And where before you would write...

  .SY mv
  .B \-h
  .
  .SY mv
  .B \-\-help
  .YS

  ...you would now write the following.

  .SY mv
  .B \-h
  .YS
  .
  .SY mv
  .B \-\-help
  .YS

  (That is, simply add `YS` after the first synopsis item.)

  Likely the biggest benefit of these changes is that it is now much
  easier to format C function prototypes with these macros. Here's how
  we would synopsize a somewhat complex standard C library function.

  .B "#include <stdio.h>"
  .P
  .B void *\c
  .SY bsearch (
  .BI const\~void\~*\~ key ,
  .BI const\~void\~*\~ base ,
  .BI size_t\~ nmemb ,
  .BI int\~(* compar )\c
  .B (const\~void\~*, const\~void\~*));
  .YS

> @@ -1239,7 +1238,6 @@ produces the following output.
> .YS
> .
> .
> -.P
> .SY groff
> .B \-v
> .RI [ option\~ .\|.\|.\&]

No; see above.

> -.\" ...because it is followed by characters that are transparent to
> -.\" end-of-sentence detection, and a newline...
> +.\" ... because it is followed by characters that are transparent to
> +.\" end-of-sentence detection, and a newline ...

No; see above.

> -.I trailing-text.
> +.IR trailing-text .

No; see above.

> -.I trailing-text.
> +.IR trailing-text .

No; see above.

> @@ -2318,7 +2316,6 @@ file as well;
> see section \(lqFiles\(rq below.
> .
> .
> -.br
> .ne 7v
> .\" ====================================================================
> .SS Strings
> @@ -2780,7 +2777,7 @@ End a text line without inserting space

No; see above.
In fact it is _wrong_ to delete only the `br` here, because then the
`ne`eded space will be calculated from the baseline of a _pending_
output line if one exists, which throws off the computation. I had to
learn this the hard way a few years ago.

> -.\" end-of-sentence detection is performed, and...
> +.\" end-of-sentence detection is performed, and ...

No; see above.

> -.I system.
> +.IR system .

No; see above.

> -.\" ..and which Clark included in groff man(7) from 1.01 or earlier...
> +.\" ... and which Clark included in groff man(7) from 1.01 or earlier ...
> is deprecated.

Acknowledged. The dot count will be corrected in my next push. The
extra space will not appear, as it is incorrect.

> -.I version.
> +.IR version .

No; see above.

> -.\" ...and de-documented .LP...
> +.\" ... and de-documented .LP ...

No; see above.

> -.\" ...as well as \n[PD], which we implement but don't expose.
> +.\" ... as well as \n[PD], which we implement but don't expose.

No; see above.

> -.\" rules (\[br]) as margin characters, as well as...
> +.\" rules (\[br]) as margin characters, as well as ...

No; see above.

> -.\" .5v after, as well as...
> +.\" 0.5v after, as well as ...

No; see above.

> -.\" <https://lists.gnu.org/archive/html/groff/2019-07/msg00038.html>...
> +.\" <https://lists.gnu.org/archive/html/groff/2019-07/msg00038.html> ...

No; see above.

> -.I groff.
> +.IR groff .

No; see above.

> -.\" ...along with implementations of OP, EX, and EE.
> +.\" ... along with implementations of OP, EX, and EE.

No; see above.

> -.TP 9.25n \" "-rHY=0" + 2n + hand-tuned for PDF
> +.TP 9.25n \" "\-rHY=0" + 2n + hand-tuned for PDF

No; see above.

> -.I adjustment-mode,
> +.IR adjustment-mode ,

No; see above.

> -.I footer-distance;
> +.IR footer-distance ;

No; see above.

> -.I subsection-indentation.
> +.IR subsection-indentation .

No; see above.

> @@ -4159,7 +4156,7 @@ this translation is sometimes not desira
> .TS
> Lb Lb
> RfCR LfCR.
> -To get a \(lqliteral\(rq .\|.\|.\& .\|.\|.\& should be input.
> +To get a \(lqliteral\(rq .\|.\|. .\|.\|.should be input.
> _
> \(aq \(rs(aq
> \- \(rs\-

This case merits comment. While the dummy character escape sequences
are in fact unnecessary here, because they are embedded in ordinary
(non-text-block) table entries...

tbl(1):
    Ordinarily, a table entry is typeset rigidly. It is not filled,
    broken, hyphenated, adjusted, or populated with additional
    inter‐sentence space.

...I chose to retain them here for pedagogical reasons. I would rather
that inexpert man(7) authors _always_ follow their ellipses with `\&`
_unless_ they are deliberately ending a sentence with one.
groff_man_style(7) covers this subject.

For example, if they decide to convert the table to ordinary prose, the
"unnecessary" dummy characters may stop being unnecessary.

> -Instead of.\|.\|. .\|.\|.should be considered.
> +Instead of .\|.\|. .\|.\|. should be considered.

No; see above. (But maybe I should _add_ dummy characters here!)

> @@ -4226,6 +4223,7 @@ _
> \fR.\|.\|. .RE
> _
> \&.B one two \(dq\(dq three .B one two three
> +_
> .TE

You didn't prepare your reader for this proposed change in any way.

I think the table looks okay without a rule at the bottom. This is a
highly discretionary matter and you should not be mixing such things in
with purported correctness advice without distinguishing it.

> -.I level.
> +.IR level .

No; see above.

> -Instead of.\|.\|. .\|.\|.should be considered.
> +Instead of .\|.\|. .\|.\|. should be considered.

No; see above. (But maybe I should _add_ dummy characters here!)

> @@ -4455,9 +4453,9 @@ when not ending a sentence.
> .if t .ne 5v
> .if n .ne 7v \" account for horizontal rules
> .TS
> -Cb Cb
> +Lb Lb

You didn't prepare your reader for this proposed change in any way.

The other "Instead of..."/"...do this." table centers its headings.
Your change would introduce an inconsistency. Your level of attention
to detail is erratic.

> LfCR LfCR.
> -Instead of.\|.\|. .\|.\|.do this.
> +Instead of .\|.\|. .\|.\|. do this.

No; see above. (But maybe I should _add_ dummy characters here!)

Bjarni, I counsel you to try harder to improve the quality of your
recommendations. There was far, far more chaff than wheat here.

And don't forget: you're not helping the man page maintenance community
by making unsound style recommendations predicated on the output of a
tool they cannot obtain. Your attempted elimination of the `I` macro
from the man(7) language is a radioactively bad idea and should not be
considered even for a second.

If you continue in this vein, I expect I'll be making many references
to this email in the future when advising man page maintainers.

Irritably,
Branden

[1] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/man/man7/man.7
[2] https://cvsweb.bsd.lv/mandoc/roff.c?rev=1.400&content-type=text/x-cvsweb-markup&sortby=date