[following up on my email of 9 March, but sending only to groff@] I have some happy announcements to make and questions to ask of this list's subscribers. In my previous status email I enumerated several problems with getting to a groff 1.24.0 release candidate.
All of them, more or less, are now resolved.
* I noted Savannah #66675 as a trouble spot. With today's push, it's
fixed. The ultimate resolution was simple. Dave Kemper has been
extremely helpful in identifying problems and regressions, catching me
out in misconceptions, and compelling me to make sense after I fail to
explain things cogently.
* All of the debugging features I mused about, except details of the
character resolution process, are now implemented. I'll share their
man page descriptions and illustrate with example shell sessions.
1. You can demand information about any ordinary, special, or indexed
character.
groff(7):
.pchar c ...
Report, to the standard error stream, information about
each ordinary or special character c. A character
defined by a request (char, fchar, fschar, or schar)
reports its contents as a JSON‐encoded string, but the
output is not otherwise in JSON format.
$ groff
.pchar a
character 'a'
is not translated
does not have a macro
special translation: 0
hyphenation code: 97
flags: 0
ASCII code: 97
asciify code: 0
is found
is transparently translatable
is not translatable as input
mode: normal
.pchar \['a]
special character "'a"
is not translated
does not have a macro
special translation: 0
hyphenation code: 97
flags: 0
ASCII code: 0
asciify code: 225
is found
is transparently translatable
is translatable as input
mode: normal
.pchar \N'65'
character indexed 65 in current font
is not translated
does not have a macro
special translation: 0
hyphenation code: 0
flags: 0
ASCII code: 0
asciify code: 0
is found
is transparently translatable
is not translatable as input
mode: normal
.char \[happy] :-)
.pchar \[happy]
special character "happy"
is not translated
has a macro: "contents": ":-)"
special translation: 0
hyphenation code: 0
flags: 0
ASCII code: 0
asciify code: 0
is found
is transparently translatable
is not translatable as input
mode: normal
2. The new `pline` request is now much, much more powerful. Because a
node list is really a tree structure, accurately reporting the node
list corresponding to the pending output line required recursive node
dumping operations. Now we have them.
groff(7):
.pline Report, in JSON syntax to the standard error stream, the
list of output nodes corresponding to the pending output
line. In JSON, a pair of empty brackets “[ ]”
represents an empty list.
$ printf 'Check out this \\%%Bu\[~n]uel flick.\n.pline\n' \
| ./build/test-groff -z
[{"type": "line_start_node", "diversion level": 0, "is_special_node": false},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "C"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "h"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "e"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "c"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "k"},
{"type": "word_space_node", "diversion level": 0, "is_special_node": false,
"hunits": 2500, "undiscardable": false, "is hyphenless breakpoint": false,
"terminal_color": "default", "width_list": [{ "width": 2500, "sentence_width":
2500 }], "unformat": false},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "o"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "u"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "t"},
{"type": "word_space_node", "diversion level": 0, "is_special_node": false,
"hunits": 2500, "undiscardable": false, "is hyphenless breakpoint": false,
"terminal_color": "default", "width_list": [{ "width": 2500, "sentence_width":
2500 }], "unformat": false},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "t"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "h"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "i"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "s"},
{"type": "word_space_node", "diversion level": 0, "is_special_node": false,
"hunits": 2500, "undiscardable": false, "is hyphenless breakpoint": false,
"terminal_color": "default", "width_list": [{ "width": 2500, "sentence_width":
2500 }], "unformat": false},
{"type": "hyphen_inhibitor_node", "diversion level": 0, "is_special_node":
false},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "B"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "u"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"special character": "~n"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "u"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "e"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "l"},
{"type": "word_space_node", "diversion level": 0, "is_special_node": false,
"hunits": 2500, "undiscardable": false, "is hyphenless breakpoint": false,
"terminal_color": "default", "width_list": [{ "width": 2500, "sentence_width":
2500 }], "unformat": false},
{"type": "ligature_node", "diversion level": 0, "is_special_node": false, "n1":
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "f"}, "n2": {"type": "glyph_node", "diversion level": 0,
"is_special_node": false, "character": "l"}},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "i"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "c"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "k"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "."},
{"type": "word_space_node", "diversion level": 0, "is_special_node": false,
"hunits": 5000, "undiscardable": false, "is hyphenless breakpoint": false,
"terminal_color": "default", "width_list": [{ "width": 2500, "sentence_width":
2500 }], "unformat": false}]
That's a lot. Send the standard error stream to jq(1) to make the
tree structure more obvious.
$ printf 'Check out this \\%%Bu\[~n]uel flick.\n.pline\n' \
| ./build/test-groff -z 2>&1 | jq
[
{
"type": "line_start_node",
"diversion level": 0,
"is_special_node": false
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "C"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "h"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "e"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "c"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "k"
},
{
"type": "word_space_node",
"diversion level": 0,
"is_special_node": false,
"hunits": 2500,
"undiscardable": false,
"is hyphenless breakpoint": false,
"terminal_color": "default",
"width_list": [
{
"width": 2500,
"sentence_width": 2500
}
],
"unformat": false
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "o"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "u"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "t"
},
{
"type": "word_space_node",
"diversion level": 0,
"is_special_node": false,
"hunits": 2500,
"undiscardable": false,
"is hyphenless breakpoint": false,
"terminal_color": "default",
"width_list": [
{
"width": 2500,
"sentence_width": 2500
}
],
"unformat": false
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "t"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "h"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "i"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "s"
},
{
"type": "word_space_node",
"diversion level": 0,
"is_special_node": false,
"hunits": 2500,
"undiscardable": false,
"is hyphenless breakpoint": false,
"terminal_color": "default",
"width_list": [
{
"width": 2500,
"sentence_width": 2500
}
],
"unformat": false
},
{
"type": "hyphen_inhibitor_node",
"diversion level": 0,
"is_special_node": false
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "B"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "u"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"special character": "~n"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "u"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "e"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "l"
},
{
"type": "word_space_node",
"diversion level": 0,
"is_special_node": false,
"hunits": 2500,
"undiscardable": false,
"is hyphenless breakpoint": false,
"terminal_color": "default",
"width_list": [
{
"width": 2500,
"sentence_width": 2500
}
],
"unformat": false
},
{
"type": "ligature_node",
"diversion level": 0,
"is_special_node": false,
"n1": {
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "f"
},
"n2": {
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "l"
}
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "i"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "c"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "k"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "."
},
{
"type": "word_space_node",
"diversion level": 0,
"is_special_node": false,
"hunits": 5000,
"undiscardable": false,
"is hyphenless breakpoint": false,
"terminal_color": "default",
"width_list": [
{
"width": 2500,
"sentence_width": 2500
}
],
"unformat": false
}
]
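The `ligature_node` above is why the dump has to recurse: its `n1` and `n2` members are themselves complete nodes. Here is a minimal C++ sketch of that idea. The `Node` struct and the functions `make_glyph`, `make_ligature`, `dump_node`, and `line_to_json` are hypothetical simplifications, not GNU troff's classes. Real nodes carry many more fields (diversion level, widths, colors), and this sketch does not JSON-escape the strings it emits.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for GNU troff's node classes.
struct Node {
  std::string type;          // e.g. "glyph_node", "ligature_node"
  std::string character;     // empty if the node carries no glyph
  std::unique_ptr<Node> n1;  // first component of a ligature, if any
  std::unique_ptr<Node> n2;  // second component of a ligature, if any
};

Node make_glyph(const std::string &c) {
  Node n;
  n.type = "glyph_node";
  n.character = c;
  return n;
}

// A ligature owns the two nodes it replaces, which makes the list a tree.
Node make_ligature(Node first, Node second) {
  Node n;
  n.type = "ligature_node";
  n.n1 = std::make_unique<Node>(std::move(first));
  n.n2 = std::make_unique<Node>(std::move(second));
  return n;
}

// Write one node as a JSON object, recursing into any component nodes.
void dump_node(const Node &n, std::ostream &out) {
  // In this model a node has either both components or neither.
  assert((n.n1 == nullptr) == (n.n2 == nullptr));
  out << "{\"type\": \"" << n.type << "\"";
  if (!n.character.empty())
    out << ", \"character\": \"" << n.character << "\"";
  if (n.n1) {
    out << ", \"n1\": ";
    dump_node(*n.n1, out);
    out << ", \"n2\": ";
    dump_node(*n.n2, out);
  }
  out << "}";
}

// Render a whole pending line (a list of node trees) as a JSON array.
std::string line_to_json(const std::vector<Node> &line) {
  std::ostringstream out;
  out << "[";
  for (std::size_t i = 0; i < line.size(); ++i) {
    if (i > 0)
      out << ", ";
    dump_node(line[i], out);
  }
  out << "]";
  return out.str();
}
```

Building a line that holds an "fl" ligature followed by "i" and passing it to `line_to_json` yields an array whose first element nests `n1` and `n2` objects, mirroring the shape of the `ligature_node` in the `pline` output above.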
3. The `pm` request now (optionally) accepts a list of names to dump.
(Its behavior when given no arguments is unchanged.)
groff(7):
.pm Report, to the standard error stream, the names of all
defined macros, strings, and diversions and their sizes
in bytes.
.pm name ...
Report, to the standard error stream, the name and JSON‐
encoded contents of each macro, string, or diversion
name.
$ printf '.ds mystring " hello, \\[dq]world\\[dq]\n.pm mystring\n' \
| ./build/test-groff -ms
{"name": "mystring", "contents": " hello, \\[dq]world\\[dq]"}
Caution: a single backslash has to be escaped both for printf(1) on the
way in and for correct JSON representation on the way out, so there's
really only one backslash before each `[dq]` in this example. With that
in mind, we can see that string definitions are read in copy mode, just
as the documentation has always claimed. Also observe the leading space
in the string contents.
$ echo '.pm LP' | ./build/test-groff -ms
{"name": "LP", "contents": ".if !'\\n[.z]''
\u0016\u0011.\tbr\n.di\n.\u0017\n.br\n.cov*ab-init\n.cov*print\n.nop
\\*[\\$0]\\\n"}
The disclosure of GNU troff's encoding technique for certain tokens
is a mixed blessing. On the one hand, no one can be expected to
know what these JSON-encoded C0 control characters mean off the top
of their head; they'll have to consult "src/roff/troff/input.h"
in the groff source tree to decode them. On the other hand,
exposure of this information, formerly inaccessible outside of a GDB
session, should be a boon to developers and ambitious macro
programmers.
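The doubled backslashes and the `\u0016`-style sequences in these dumps are ordinary JSON string escaping. A minimal C++ sketch of that escaping, following RFC 8259, is below. The function name `json_encode` is hypothetical, and this is not GNU troff's own code; it only shows how a byte string containing backslashes and C0 control characters ends up looking the way it does in `pm` output.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not GNU troff's code): escape a byte string for use
// inside a JSON string literal, following RFC 8259. Backslashes and double
// quotes are escaped, common control characters get short escapes, and any
// other C0 control character becomes a six-character \u00XX sequence.
std::string json_encode(const std::string &s) {
  std::string out;
  for (unsigned char c : s) {
    switch (c) {
    case '"':  out += "\\\""; break;
    case '\\': out += "\\\\"; break;
    case '\b': out += "\\b";  break;
    case '\f': out += "\\f";  break;
    case '\n': out += "\\n";  break;
    case '\r': out += "\\r";  break;
    case '\t': out += "\\t";  break;
    default:
      if (c < 0x20) {
        char buf[7];  // "\u" + 4 hex digits + terminating NUL
        std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
        out += buf;
      } else {
        out += static_cast<char>(c);  // bytes >= 0x20 pass through unchanged
      }
    }
  }
  return out;
}
```

Applied to the body of `mystring`, ` hello, \[dq]world\[dq]`, it produces ` hello, \\[dq]world\\[dq]`, the backslash doubling discussed above; applied to a byte with value 0x16 it produces `\u0016`, the form seen in the `LP` dump.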
4. Did you notice the word "diversions" in the previous item?
Implementing this feature cleared up some confusion I had about the
nature of the `macro_header` class inside GNU troff. In my earlier
message I wondered why it contained objects of both `char_list` and
`node_list` types. Now I know. These could have been wrapped in a
C/C++ `union`. (In Ada, we'd use a "discriminated record".) Macros
and strings use only the `char_list`. Diversions use only the
`node_list`. This made implementation of the dumping feature
straightforward. It also means that diversion dumping can be even
more chatty than dumping the pending output line node list. Here's
the example I put in the commit message.
$ printf '.di foo\nABC.\n.sp\nDEF\n.br\n.di\n.pm foo\n' \
| build/test-groff -z 2>&1
{"name": "foo", "contents": [{"type": "line_start_node", "diversion level": 0,
"is_special_node": false}, {"type": "glyph_node", "diversion level": 0,
"is_special_node": false, "character": "A"}, {"type": "glyph_node",
"diversion level": 0, "is_special_node": false, "character": "B"},
{"type": "glyph_node",
"diversion level": 0, "is_special_node": false, "character": "C"}, {"type":
"glyph_node", "diversion level": 0, "is_special_node": false, "character":
"."}, {"type": "vertical_size_node", "diversion level": 0, "is_special_node":
false, "vunits": -12000}, {"type": "vertical_size_node", "diversion level": 0,
"is_special_node": false, "vunits": 0}, {"type": "diverted_space_node",
"diversion level": 0, "is_special_node": false, "vunits": 12000}, {"type":
"line_start_node", "diversion level": 0, "is_special_node": false}, {"type":
"glyph_node", "diversion level": 0, "is_special_node": false, "character":
"D"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"character": "E"}, {"type": "glyph_node", "diversion level": 0,
"is_special_node": false, "character": "F"}, {"type": "vertical_size_node",
"diversion level": 0, "is_special_node": false, "vunits": -12000}, {"type":
"vertical_size_node", "diversion level": 0, "is_special_node": false, "vunits":
0}]}
$ printf '.di foo\nABC.\n.sp\nDEF\n.br\n.di\n.pm foo\n' \
| build/test-groff -z 2>&1 | jq
{
"name": "foo",
"contents": [
{
"type": "line_start_node",
"diversion level": 0,
"is_special_node": false
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "A"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "B"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "C"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "."
},
{
"type": "vertical_size_node",
"diversion level": 0,
"is_special_node": false,
"vunits": -12000
},
{
"type": "vertical_size_node",
"diversion level": 0,
"is_special_node": false,
"vunits": 0
},
{
"type": "diverted_space_node",
"diversion level": 0,
"is_special_node": false,
"vunits": 12000
},
{
"type": "line_start_node",
"diversion level": 0,
"is_special_node": false
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "D"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "E"
},
{
"type": "glyph_node",
"diversion level": 0,
"is_special_node": false,
"character": "F"
},
{
"type": "vertical_size_node",
"diversion level": 0,
"is_special_node": false,
"vunits": -12000
},
{
"type": "vertical_size_node",
"diversion level": 0,
"is_special_node": false,
"vunits": 0
}
]
}
In practice, a diversion may contain up to an entire page of
formatted text, so I expect diversion dumps to be potentially huge.
But the user can now inspect them in minute detail.
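For readers unfamiliar with the Ada term: a discriminated record is a record whose variant part is selected by a tag field, so only one alternative is ever live. C++17's `std::variant` offers the same guarantee that the `macro_header` arrangement relies on informally. The sketch below assumes heavily simplified stand-ins: `char_list` and `node_list` here are hypothetical type aliases that merely borrow the names of GNU troff's classes, and `macro_body` and `describe` are invented for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical stand-ins that only borrow GNU troff's class names: a macro
// or string body is a sequence of input characters, while a diversion body
// is a list of formatted nodes (reduced here to their type names).
using char_list = std::string;
using node_list = std::vector<std::string>;

// A discriminated union: exactly one alternative is active at a time, and
// std::variant tracks which one, much like an Ada discriminated record.
struct macro_body {
  std::variant<char_list, node_list> contents;
};

// Branch on the active alternative, as a dump routine would.
std::string describe(const macro_body &b) {
  return std::visit(
      [](const auto &c) -> std::string {
        using T = std::decay_t<decltype(c)>;
        if constexpr (std::is_same_v<T, char_list>)
          return "macro or string: " + std::to_string(c.size()) + " characters";
        else
          return "diversion: " + std::to_string(c.size()) + " nodes";
      },
      b.contents);
}
```

Compared with a raw `union`, `std::variant` records which member is live, so a dump routine can dispatch on the kind of body instead of inferring it from which list happens to be empty.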
Next steps:
* I need to know from this community what, if anything, should now gate
RC1. I don't plan on a code freeze until RC2, but I don't want to
mess with the formatter anymore, except to possibly do one thing I've
already worked up and tested.
* Review the Savannah 1.24.0 release goals ticket.
https://savannah.gnu.org/bugs/?65099
Deri's patiently been awaiting my feedback on his contribution of PDF
superpowers to the ms package, which could easily be added to the
goals. As illustrated on this list, it seems to work fine with the
reasonably complex ms.ms document. Getting first-class PDF support
into all our full-service macro packages is, I think, a prerequisite
to making the default output device PDF. Maybe more than a
"prerequisite": once we have that support, I'm finding it hard to
imagine reasons _not_ to change the default output device thus. I
don't think we'll get groff_mm(7) or groff_me(7) in time for 1.24.0.
(Nobody's working on these tasks, and I don't want to wait/gate on
myself to take care of them.)
* The one respect in which I'm contemplating still changing the
formatter itself is this:
diff --git a/src/roff/troff/env.cpp b/src/roff/troff/env.cpp
index 37dd7954c..3fcc6c098 100644
--- a/src/roff/troff/env.cpp
+++ b/src/roff/troff/env.cpp
@@ -2543,6 +2543,8 @@ void environment::do_break(bool want_adjustment)
break;
}
}
+ if (getenv("GROFF_DUMP") != 0 /* nullptr */)
+ curenv->dump_pending_nodes();
node *tem = line;
line = 0 /* nullptr */;
output_line(tem, width_total, was_centered);
That's all. What does this do? It tells GNU troff to do the equivalent
of `pline` every time it's about to perform a break.
What does that mean? You get a complete node graph of your document.
Because like Osiris, a *roff's node-generation procedure dies and is
born again with every new output line,[1] this graph is, more precisely,
a linear forest: a list of trees. (A hedgerow? Bustling since 1971?)
This is something mandoc(1) has had for years. Now we can have it too.
The reason I haven't already committed this is that it requires an
interface decision. Use an environment variable? If so, named what?
Use a command-line option? If so, which letter do we want to
permanently eat for it?
Not many are available:
groff(1):
groff [-abcCeEgGijklNpRsStUVXzZ] [-d ctext] [-d string=text]
[-D fallback‐encoding] [-f font‐family] [-F font‐directory]
[-I inclusion‐directory] [-K input‐encoding] [-L spooler‐
argument] [-m macro‐package] [-M macro‐directory] [-n page‐
number] [-o page‐list] [-P postprocessor‐argument]
[-r cnumeric‐expression] [-r register=numeric‐expression]
[-T output‐device] [-w warning‐category] [-W warning‐
category] [file ...]
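To make the "linear forest" concrete, here is a minimal model of the proposed hook, assuming the environment-variable interface (one of the options still open). On each break the pending line is optionally dumped as one JSON array and then discarded, so a document yields a sequence of arrays, one per output line. The class `line_formatter`, its members, and `dump_requested` are hypothetical; node trees are flattened to type names for brevity, and `GROFF_DUMP` is only the provisional name from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: consult the environment, as the diff does with getenv(3).
// "GROFF_DUMP" is the provisional name, not a settled interface.
bool dump_requested() { return std::getenv("GROFF_DUMP") != nullptr; }

// Hypothetical model of a formatter whose pending output line is reduced
// to a list of node type names.
class line_formatter {
public:
  // Pass nullptr to disable dumping.
  explicit line_formatter(std::ostream *dump) : dump_(dump) {}

  void append(const std::string &node_type) { line_.push_back(node_type); }

  void do_break() {
    if (dump_ != nullptr)
      dump_line();   // the equivalent of `pline` just before output
    ++lines_output_; // stands in for actually outputting the line
    line_.clear();   // the next line's node list starts from scratch
  }

  int lines_output() const { return lines_output_; }

private:
  // Emit the pending line as one JSON array on its own line.
  void dump_line() {
    *dump_ << '[';
    for (std::size_t i = 0; i < line_.size(); ++i)
      *dump_ << (i > 0 ? ", " : "") << '"' << line_[i] << '"';
    *dump_ << "]\n";
  }

  std::ostream *dump_;
  std::vector<std::string> line_;
  int lines_output_ = 0;
};
```

In a program one might construct it as `line_formatter f(dump_requested() ? &std::cerr : nullptr);`. Because jq(1) accepts a stream of concatenated JSON texts, output of this shape could still be piped to it for pretty-printing.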
I want to hear your feedback on all of the questions above.
Regards,
Branden
[1] I'll bet the main reason for this was to reduce the memory footprint
of the implementation back in core-starved PDP-11 days.
