Hi Oliver,

At 2025-05-10T14:02:36+0200, Oliver Corff via GNU roff typesetting
system discussion wrote:
> I should definitely write more macros in order to better understand
> the subtleties of macro definition and execution.

This process may be a little easier with the forthcoming groff 1.24
release (or groff Git's trunk), which lets you dump macro contents,
forcing the GNU troff program to disclose what they contain.

The new feature enables "white-box" testing in addition to the
traditional "black-box" methods we've used.

Here's an example using your input file.  (I converted your use of NBSP
characters, at least as they showed up in your email, and also removed
spaces from before a comment in your `mso` invocation due to another
change.[1])

First, however, I note that turning on warnings would have furnished a
clue to the site of the problem.

$ cat ATTIC/oliver-set-register-in-macro.groff
.mso s.tmac\" Load ms
.
.de pageno
\\$1:\c         \" displayed as intended
.nr xx \\$1     \" register is not set?
\n[xx]          \" register reads zero
..
.\"
.PP
My page number:
.pageno 123
$ groff -ww -a ATTIC/oliver-set-register-in-macro.groff
troff:ATTIC/oliver-set-register-in-macro.groff:6: warning: register 'xx' not defined
<beginning of page>
 My page number: 123:0

But even knowing the trouble was on line 6 of the file, one might be
baffled as to the cause.  No matter, let's add something to the end of
the file.

$ echo '.pm pageno' >> ATTIC/oliver-set-register-in-macro.groff

What we're about to do will produce JSON-formatted output to the
standard error so let's (1) suppress other output and (2) pipe that JSON
stuff to jq(1) to pretty-print it.

$ tg -Wreg -z ATTIC/oliver-set-register-in-macro.groff 2>&1 | jq
{
  "name": "pageno",
  "file name": "ATTIC/oliver-set-register-in-macro.groff",
  "starting line number": 4,
  "length": 43,
  "contents": "\\$1:\u001c         \n.nr xx \\$1     \n0          \n",
  "node list": []
}

We can observe a few things.

1.  Most importantly, on the last line of the macro we see a literal
    '0', which confirms that the `xx` register was interpolated at macro
    definition time, freezing its value at that moment inside the macro
    body.

2.  Comments are stripped from the macro definition, as documented in our
    Texinfo manual.  We can preserve the comments, which may help us to
    navigate the macro definition, by bracketing the macro definition in
    `eo` and `ec`.

    Illustration:

$ diff -u ATTIC/oliver-set-register-in-macro.groff ATTIC/oliver-set-register-in-macro-escape-off.groff
--- ATTIC/oliver-set-register-in-macro.groff    2025-05-10 07:45:07.148742779 -0500
+++ ATTIC/oliver-set-register-in-macro-escape-off.groff 2025-05-10 07:50:33.207570377 -0500
@@ -1,10 +1,12 @@
 .mso s.tmac\" Load ms
 .
+.eo
 .de pageno
 \\$1:\c         \" displayed as intended
 .nr xx \\$1     \" register is not set?
 \n[xx]          \" register reads zero
 ..
+.ec
 .\"
 .PP
 My page number:

With warnings on, that provides another clue that we haven't done what
we intended.

$ groff -ww -a ATTIC/oliver-set-register-in-macro-escape-off.groff
<beginning of page>
troff:ATTIC/oliver-set-register-in-macro-escape-off.groff:13: warning: expected numeric expression, got character '\'
troff:ATTIC/oliver-set-register-in-macro-escape-off.groff:13: warning: register 'xx' not defined

However, warning output causes jq(1) to choke.

parse error: Invalid literal at line 1, column 6

(Hmm.  jq(1) doesn't identify itself in diagnostics.  That's bad.)
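
If you want to keep warnings enabled and still get at the dump, one
option is to pick the JSON object out of the mixed diagnostic stream
before parsing it.  What follows is only a minimal sketch in Python,
not anything groff ships; the function name `extract_dump` and the
sample text are hypothetical, and it assumes the dump is a single JSON
object like the ones shown in this message and that no warning line
itself contains a parseable JSON object.

```python
import json


def extract_dump(stderr_text):
    """Return the first JSON object found in mixed diagnostic output.

    Returns None if no JSON object can be decoded.
    """
    decoder = json.JSONDecoder()
    pos = stderr_text.find("{")
    while pos != -1:
        try:
            # raw_decode() parses one JSON value starting at `pos` and
            # ignores whatever text follows it.
            obj, _end = decoder.raw_decode(stderr_text, pos)
            return obj
        except json.JSONDecodeError:
            # Not the start of valid JSON; try the next brace.
            pos = stderr_text.find("{", pos + 1)
    return None


# Hypothetical sample: a warning line followed by a small dump.
sample = (
    "troff:example.groff:13: warning: register 'xx' not defined\n"
    '{"name": "pageno", "length": 3, "contents": "a\\nb", "node list": []}\n'
)

dump = extract_dump(sample)
print(dump["name"], dump["length"])
```

In practice you would feed it the captured standard error of the
formatter instead of `sample`.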

With the escape character off, we no longer have to remember to double
backslashes in the macro definition.  I suspect this is why use of `eo`
and `ec` was Werner Lemberg's recommendation and practice.  And it works
well, though there are tradeoffs.[2]

$ diff -u ATTIC/oliver-set-register-in-macro.groff ATTIC/oliver-set-register-in-macro-escape-off.groff
--- ATTIC/oliver-set-register-in-macro.groff    2025-05-10 07:45:07.148742779 -0500
+++ ATTIC/oliver-set-register-in-macro-escape-off.groff 2025-05-10 07:56:06.366354237 -0500
@@ -1,10 +1,12 @@
 .mso s.tmac\" Load ms
 .
+.eo
 .de pageno
-\\$1:\c         \" displayed as intended
-.nr xx \\$1     \" register is not set?
-\n[xx]          \" register reads zero
+\$1:\c         \" displayed as intended
+.nr xx \$1     \" register is not set?
+\n[xx]         \" register reads zero
 ..
+.ec
 .\"
 .PP
 My page number:

Now the macro works, _and_ we can dump its contents in all their
commented glory.

$ groff -a ATTIC/oliver-set-register-in-macro-escape-off.groff 2>/dev/null
<beginning of page>
 My page number: 123:123

$ groff -z ATTIC/oliver-set-register-in-macro-escape-off.groff 2>&1 | jq
{
  "name": "pageno",
  "file name": "ATTIC/oliver-set-register-in-macro-escape-off.groff",
  "starting line number": 5,
  "length": 117,
  "contents": "\\$1:\\c         \\\" displayed as intended\n.nr xx \\$1     \\\" register is not set?\n\\n[xx]         \\\" register reads zero\n",
  "node list": []
}

But didn't I say we don't have to escape the backslashes?  Why do the
backslashes show up as doubled?  Because that's how the JSON
specification requires a backslash to be represented inside a
string.[3]  (It also requires double quotes to be escaped, so a comment
shows up as '\\\"'.)

$ printf '.ds backslash \\\\\n.pm backslash\n' | groff 2>&1 | jq
{
  "name": "backslash",
  "file name": "<standard input>",
  "starting line number": 1,
  "length": 1,
  "contents": "\\",
  "node list": []
}

See the '"length": 1'?  That reassures us that we're not "really"
looking at two backslashes inside this string definition, but one.
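
The same reassurance is available from any JSON parser.  Here is a
small sketch in Python (my choice for illustration only; nothing here
is part of groff) that decodes the "contents" value from the dump
above and compares what the JSON text shows with what the decoded
string holds.  The variable names are made up for the example.

```python
import json

# The "contents" value exactly as it appears in the JSON text:
# a quote, two backslashes, a quote.
json_text = r'"\\"'
decoded = json.loads(json_text)

print(len(json_text) - 2)   # characters between the quotes in the JSON text
print(len(decoded))         # characters in the decoded string
print(decoded == "\\")      # True: the decoded string is one backslash

# A roff comment escape sequence (backslash, double quote) is written
# in JSON as three backslashes and a quote, but decodes to two characters.
comment_text = r'"\\\""'
comment = json.loads(comment_text)
print(len(comment))
```

The decoded length of the first string is 1, matching the "length"
member that the dump reports.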

The ubiquity of the backslash as an escape character in Unix tools also
shows up in our printf(1) format string, of course.

In developing and experimenting with the new dumping features I've added
to GNU troff, I've found myself mostly pleased but a little nonplussed
by the intense gravity of the backslash-escaping problem.  As often
happens in engineering problems, we face a tradeoff; we can force the
formatter to yield up its secrets, but we must exercise care when
interpreting them.

Regards,
Branden

[1] https://git.savannah.gnu.org/cgit/groff.git/tree/NEWS?id=0cd44362696c9d65ab59f6014f15221ac53b57f3#n13

[2] Bracketing macro definitions with `eo` and `ec` _prevents_ you
    from being able to interpolate macro and string values into the
    macro definition if you _want_ to do so at definition time; this is
    admittedly a rare requirement. Also, because comment escape
    sequences are thus not interpreted, the formatter doesn't discard
    them from the definition, which makes the macro definition larger,
    marginally increasing storage and processing time requirements.  I
    welcome empirical exploration of the magnitude of this difference.
    I predict that with today's RAM allotments and CPU frequencies, the
    difference is negligible for practical purposes.  Maybe it would
    start to matter for someone who formatted millions of pages of *roff
    input per day, every day.  But hard data would be better.

[3] https://www.json.org/json-en.html
