Hi, Alex!

At 2022-03-19T17:07:09+0100, Alejandro Colomar (man-pages) wrote:
> While fixing style issues in the man-pages project,
> I'm finding a few recurrent issues that I think you could warn about:
>
> Unnecessary quotations:
>
> [
> .I "foo bar"
> .IR foo "bar"
> ]

That is going to be hard to detect from within a macro package.  As
noted in our recent discussion of quotation marks in macro calls, by the
time these arguments get to the `I` and `IR` macros, those macros have
no way of knowing if they were excessively quoted in the calling
context.

I don't have a solution for this problem.  To solve it would require
modifying GNU troff's input parser to track some kind of "extraneous
quote" state.  Since, as we saw in our earlier discussion, a sequence of
up to four double quotes can be perfectly valid, my intuition is that
this problem is worse than regex-hard, and the cost might rapidly
outweigh the benefit.

If you need this, it's probably better to just write a regex-based tool
that scans the man page source.  You can then enforce a stricter
discipline, permitting false positives on valid but unusual constructs
that would be better recast.

> Unnecessary escape \f:
>
> [
> foo \fIbar\fP baz
> ]
>
> The last one is more difficult to decide when it's unnecessary, but
> you could maybe start with non-formatted lines.

This is also a big challenge, and on my first reflection, even worse, as
you suspect.  The problem is that what you quote is an ordinary text
line, and *roffs don't generally look very far ahead when parsing.
There aren't many ways in the language to peek ahead in the input
stream.

The only way I can think of would be to set up the macro package such
that all text lines get captured into a macro or diversion.  You might
then be able to iterate through the stored content somehow--though I
don't know off the top of my head a way to do this line by line.  I also
don't know how to save a pending input line into a string for processing
with the few simple requests we have for that.  There's also the problem
of interpreting that input well enough to recognize undesirable
constructs--do you want to write a troff in troff?
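
To make the regex-based suggestion for the quoting case concrete, here
is a rough sketch of what such a scanner might look like.  The function
name find_unneeded_quotes and both rules are assumptions for
illustration, not an existing tool: it flags any argument that begins
with a double quote on a `.B` or `.I` line, and any quoted argument
containing no whitespace on an alternating-font macro line.  It
deliberately accepts false positives (for instance, quoting that
preserves multiple adjacent spaces).

```shell
#!/bin/sh
# Hypothetical sketch: report font macro calls in man page sources whose
# quoting looks unnecessary.  Crude by design; expect false positives.
find_unneeded_quotes() {
    awk '
        # Rule 1: .B or .I with an argument that starts with a quote.
        # Rule 2: an alternating-font macro with a quoted argument that
        #         contains no whitespace; an empty "" is left alone.
        /^[.][BI][ \t]+(.*[ \t])?"/ ||
        (/^[.](BI|BR|IB|IR|RB|RI)[ \t]/ && /[ \t]"[^" \t]+"([ \t]|$)/) {
            printf "%s:%d:%s\n", FILENAME, FNR, $0
        }
    ' "$@"
}
```

Calling `find_unneeded_quotes man1/*.1` would print one
"file:line:text" record per suspicious line.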

Again I would attack this with a less perfect but much more tractable
regex-based input scanner.  I would filter out tbl(1) regions and then
flag _any_ font selection escape sequence that isn't on a control line,
meaning a line starting with '.' (that's an over-crudification[1], but I
predict that it will work well for most pages).

I'm attaching a shell script I've come up with to do this.  For groff's
own pages, it mostly turns up use of non-man(7)-standard fonts (not
roman, bold, or italic) and some pages I haven't yet done a thorough
revision on.

Regards,
Branden

[1] no-break control character, line continuation, yadda yadda yadda
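
For readers without the attachment, the approach described above might
be sketched roughly as follows.  This is not the attached script; the
function name find_font_escapes is made up for illustration, and the
sketch assumes tbl(1) regions are delimited by lines beginning with
`.TS` and `.TE` and that a control line is simply one starting with '.'.

```shell
#!/bin/sh
# Hypothetical sketch: report \f font selection escapes that appear on
# text lines of man page sources, skipping tbl(1) regions.
find_font_escapes() {
    awk '
        FNR == 1 { in_tbl = 0 }          # reset state for each file
        /^[.]TS/ { in_tbl = 1 }          # entering a tbl(1) region
        /^[.]TE/ { in_tbl = 0; next }    # leaving it
        in_tbl   { next }                # ignore table contents
        /^[.]/   { next }                # crude: skip control lines
        /\\f/    { printf "%s:%d:%s\n", FILENAME, FNR, $0 }
    ' "$@"
}
```

Something like `find_font_escapes man7/*.7` would then list each
offending text line as "file:line:text".  The sketch ignores the
no-break control character, line continuation, and escaped backslashes.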
find-font-escapes.sh
Description: Bourne shell script
signature.asc
Description: PGP signature