Hi Oliver,

At 2025-05-10T14:02:36+0200, Oliver Corff via GNU roff typesetting system discussion wrote:
> I should definitely write more macros in order to better understand
> the subtleties of macro definition and execution.

This process may be a little easier with the forthcoming groff 1.24
release (or groff Git's trunk), which lets you dump macro contents,
forcing the GNU troff program to disclose what they contain. The new
feature enables "white-box" testing in addition to the traditional
"black-box" methods we've used.

Here's an example using your input file. (I converted your use of NBSP
characters, at least as they showed up in your email, and also removed
spaces from before a comment in your `mso` invocation due to another
change.[1])

First, however, I note that turning on warnings would have furnished a
clue to the site of the problem.

$ cat ATTIC/oliver-set-register-in-macro.groff
.mso s.tmac\" Load ms
.
.de pageno
\\$1:\c \" displayed as intended
.nr xx \\$1 \" register is not set?
\n[xx] \" register reads zero
..
.\"
.PP
My page number:
.pageno 123
$ groff -ww -a ATTIC/oliver-set-register-in-macro.groff
troff:ATTIC/oliver-set-register-in-macro.groff:6: warning: register 'xx' not defined
<beginning of page>
My page number: 123:0

But even knowing the trouble was on line 6 of the file, one might be
baffled as to the cause. No matter, let's add something to the end of
the file.

$ echo '.pm pageno' >> ATTIC/oliver-set-register-in-macro.groff

What we're about to do will produce JSON-formatted output to the
standard error, so let's (1) suppress other output and (2) pipe that
JSON stuff to jq(1) to pretty-print it.

$ tg -Wreg -z ATTIC/oliver-set-register-in-macro.groff 2>&1 | jq
{
  "name": "pageno",
  "file name": "ATTIC/oliver-set-register-in-macro.groff",
  "starting line number": 4,
  "length": 43,
  "contents": "\\$1:\u001c \n.nr xx \\$1 \n0 \n",
  "node list": []
}

We can observe a few things.

1. Most importantly, on the last line of the macro we see a literal
   '0', which confirms that the register was interpolated at macro
   definition time, freezing the `xx` register's value at that time
   inside the macro body.

2. Comments are stripped from the macro definition, as documented in
   our Texinfo manual.

We can preserve the comments, which may help us to navigate the macro
definition, by bracketing the macro definition in `eo` and `ec`.
Illustration:

$ diff -u ATTIC/oliver-set-register-in-macro.groff ATTIC/oliver-set-register-in-macro-escape-off.groff
--- ATTIC/oliver-set-register-in-macro.groff 2025-05-10 07:45:07.148742779 -0500
+++ ATTIC/oliver-set-register-in-macro-escape-off.groff 2025-05-10 07:50:33.207570377 -0500
@@ -1,10 +1,12 @@
 .mso s.tmac\" Load ms
 .
+.eo
 .de pageno
 \\$1:\c \" displayed as intended
 .nr xx \\$1 \" register is not set?
 \n[xx] \" register reads zero
 ..
+.ec
 .\"
 .PP
 My page number:

With warnings on, that provides another clue that we haven't done what
we intended.

$ groff -ww -a ATTIC/oliver-set-register-in-macro-escape-off.groff
<beginning of page>
troff:ATTIC/oliver-set-register-in-macro-escape-off.groff:13: warning: expected numeric expression, got character '\'
troff:ATTIC/oliver-set-register-in-macro-escape-off.groff:13: warning: register 'xx' not defined

However, warning output causes jq(1) to choke.

parse error: Invalid literal at line 1, column 6

(Hmm. jq(1) doesn't identify itself in diagnostics. That's bad.)

With the escape character off, we no longer have to remember to double
backslashes in the macro definition. I suspect this is why use of `eo`
and `ec` was Werner Lemberg's recommendation and practice. And it works
well, though there are tradeoffs.[2]

$ diff -u ATTIC/oliver-set-register-in-macro.groff ATTIC/oliver-set-register-in-macro-escape-off.groff
--- ATTIC/oliver-set-register-in-macro.groff 2025-05-10 07:45:07.148742779 -0500
+++ ATTIC/oliver-set-register-in-macro-escape-off.groff 2025-05-10 07:56:06.366354237 -0500
@@ -1,10 +1,12 @@
 .mso s.tmac\" Load ms
 .
+.eo
 .de pageno
-\\$1:\c \" displayed as intended
-.nr xx \\$1 \" register is not set?
-\n[xx] \" register reads zero
+\$1:\c \" displayed as intended
+.nr xx \$1 \" register is not set?
+\n[xx] \" register reads zero
 ..
+.ec
 .\"
 .PP
 My page number:

Now the macro works, _and_ we can dump its contents in all their
commented glory.

$ groff -a ATTIC/oliver-set-register-in-macro-escape-off.groff 2>/dev/null
<beginning of page>
My page number: 123:123
$ groff -z ATTIC/oliver-set-register-in-macro-escape-off.groff 2>&1 | jq
{
  "name": "pageno",
  "file name": "ATTIC/oliver-set-register-in-macro-escape-off.groff",
  "starting line number": 5,
  "length": 117,
  "contents": "\\$1:\\c \\\" displayed as intended\n.nr xx \\$1 \\\" register is not set?\n\\n[xx] \\\" register reads zero\n",
  "node list": []
}

But didn't I say we don't have to escape the backslashes? Why do the
backslashes show up as doubled? Because that's how the JSON
specification requires them to be represented inside strings.[3] (It
also requires escaping double quotes, so a comment shows up as
'\\\"'.)

$ printf '.ds backslash \\\\\n.pm backslash\n' | groff 2>&1 | jq
{
  "name": "backslash",
  "file name": "<standard input>",
  "starting line number": 1,
  "length": 1,
  "contents": "\\",
  "node list": []
}

See the '"length": 1'? That reassures us that we're not "really"
looking at two backslashes inside this string definition, but one. The
ubiquity of the backslash as an escape character in Unix tools also
shows up in our printf(1) format string, of course.

In developing and experimenting with the new dumping features I've
added to GNU troff, I've found myself mostly pleased but a little
nonplussed by the intense gravity of the backslash-escaping problem. As
often happens in engineering problems, we face a tradeoff; we can force
the formatter to yield up its secrets, but we must exercise care when
interpreting them.

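The point about '"length": 1' can be cross-checked outside groff. The following is only a minimal sketch using Python's standard `json` module; the variable names are illustrative and nothing here is part of groff. It decodes the "contents" value from the backslash dump above, plus the first line of the commented `pageno` dump, to show what the JSON escaping hides.

```python
import json

# The "contents" value as it appears in the JSON text of the dump:
# a quoted string holding one *escaped* backslash.
dumped = r'"\\"'

decoded = json.loads(dumped)
print(len(dumped) - 2)  # 2: characters between the quotes in the JSON text
print(len(decoded))     # 1: characters in the string that JSON encodes

# The same decoding applied to the first line of the commented macro dump.
# Each doubled backslash collapses to one, and \" becomes a plain quote.
dumped_line = r'"\\$1:\\c \\\" displayed as intended\n"'
print(json.loads(dumped_line), end="")
```

The decoded first line reads `\$1:\c \" displayed as intended`, matching what was typed between `eo` and `ec`.
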
Regards,
Branden

[1] https://git.savannah.gnu.org/cgit/groff.git/tree/NEWS?id=0cd44362696c9d65ab59f6014f15221ac53b57f3#n13

[2] Bracketing macro definitions with `eo` and `ec` _prevents_ you from
    interpolating macro and string values into the macro definition if
    you _want_ to do so at definition time; this is admittedly a rare
    requirement. Also, because comment escape sequences are thus not
    interpreted, the formatter doesn't discard them from the
    definition, which makes the macro definition larger, marginally
    increasing storage and processing time requirements. I welcome
    empirical exploration of the magnitude of this difference. I
    predict that with today's RAM allotments and CPU frequencies, the
    difference is negligible for practical purposes. Maybe it would
    start to matter for someone who formatted millions of pages of
    *roff input per day, every day. But hard data would be better.

[3] https://www.json.org/json-en.html