Hi Deri,

[somewhat rearranged]
At 2026-01-28T20:49:25+0000, Deri wrote:
> I agree with Bruno, the fix can wait. I'm not sure about this one
> though,
> 
> [derij@pip build (master)]$ echo "\X'pdf: xrev'"|groff -Tpdf -ms -Z
> x T pdf
> x res 72000 1 1
> x init
> p1
> troff: src/roff/troff/input.cpp:3107: const char*
> token::description(): Assertion `0 == "unhandled case of `type`
> (token)"' failed.
> groff: error: troff: Aborted (core dumped)

Good catch--I wasn't aware of this.

> it seems to be only in current groff:-

That much is not a surprise.  Here's the commit (ec856178ff) that added
the assertion.

diff --git a/ChangeLog b/ChangeLog
index 383f3263b..e86c69426 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2025-11-28  G. Branden Robinson <[email protected]>
+
+       * src/roff/troff/input.cpp (token::description): Add assertion;
+       every token type should have a human-readable description.  In
+       the event that's not the case and `NDEBUG` is defined, describe
+       the anomalous token as "an undescribed token" rather than "a
+       magic token", to make it clearer that the problem results from
+       developer oversight.
+
 2025-11-28  G. Branden Robinson <[email protected]>

        * src/roff/troff/token.h: Add new inline member function
diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 0ff52efd1..35224c502 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -3031,9 +3031,9 @@ const char *token::description()
   case TOKEN_EOF:
     return "end of input";
   default:
-    break;
+    assert(0 == "unhandled case of `type` (token)");
+    return "an undescribed token";
   }
-  return "a magic token";
 }

 void skip_line()
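
(An aside on the idiom, in case it's unfamiliar: `0 == "some string"`
can never be true, because a string literal's address is never a null
pointer, so the assertion is guaranteed to fire when reached, and the
literal text lands in the failure message.  A self-contained
demonstration:

  #include <cassert>

  int main()
  {
    // A string literal's address is never null, so this comparison is
    // always false; assert() prints the literal verbatim in its
    // diagnostic before aborting.
    assert(0 == "this always fails, and says why");
  }

You'll also see the trick spelled `assert(!"message")` or
`assert(0 && "message")`.)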

This assertion is tripping when `token::description()`, a member
function called only by diagnostic routines to tell the user (on the
standard error stream) that something has gone wrong, hits a "this
should never happen" situation.  Civilized languages like Haskell and
(in this respect) Rust force the programmer to consider every
possibility in switch/case-style control flow.
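
You can get a decent approximation out of C++ if you ask nicely: GCC
and Clang's `-Wswitch` (enabled by `-Wall`) flags any enumerator that a
`switch` lacking a `default` label fails to handle.  A sketch, with
token names of my own invention:

  enum class token_type { empty, backspace, eof, begin_trap };

  const char *describe(token_type t)
  {
    // No `default' label: because `begin_trap' has no case, GCC and
    // Clang (with -Wall) warn "enumeration value 'begin_trap' not
    // handled in switch".
    switch (t) {
    case token_type::empty:     return "an indeterminate token";
    case token_type::backspace: return "a backspace character";
    case token_type::eof:       return "end of input";
    }
    return "an undescribed token"; // reached for the unhandled case
  }

The `default: break;` in the old code above is precisely what buys that
warning off.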

Another approach, long seen in Pascal and Ada, is to have honest-to-God
real enumerated types that cannot take on undefined values.[1]  This being
C/C++, an `enum` is mostly superfluous window dressing around a machine
word, which is the only data type Real Programmers care about.
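
To see just how thin the dressing is, consider that C++ happily lets an
enum object hold a value named by no enumerator at all (the type and
the magic number below are mine, purely for illustration):

  enum token_type : int { TOKEN_EMPTY, TOKEN_BACKSPACE, TOKEN_EOF };

  int main()
  {
    // Well-formed C++: with a fixed underlying type, any int can be
    // cast in, and neither compiler nor runtime objects that 42
    // corresponds to no enumerator.
    token_type t = static_cast<token_type>(42);
    return TOKEN_EOF == t; // such comparisons silently "work", too
  }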

Anyway, my dissatisfaction with C/C++'s proud tradition of slovenly data
typing aside, let us continue by tracing the provenance of "a magic
token".

$ git blame ec856178ff^ -- src/roff/troff/input.cpp | grep -C3 '"a magic token"'
^351da0dcd troff/input.c            (James Clark         1991-06-02 04:20:34 -0500  3033)   default:
^351da0dcd troff/input.c            (James Clark         1991-06-02 04:20:34 -0500  3034)     break;
^351da0dcd troff/input.c            (James Clark         1991-06-02 04:20:34 -0500  3035)   }
^351da0dcd troff/input.c            (James Clark         1991-06-02 04:20:34 -0500  3036)   return "a magic token";
^351da0dcd troff/input.c            (James Clark         1991-06-02 04:20:34 -0500  3037) }
^351da0dcd troff/input.c            (James Clark         1991-06-02 04:20:34 -0500  3038) 
^351da0dcd troff/input.c            (James Clark         1991-06-02 04:20:34 -0500  3039) void skip_line()

Well, that didn't take long.

The assertion tripping is my doing, but it's also the sort of thing I
_wanted_ to catch.  Or thought I did.

What does groff 1.23.0 do?

$ echo "\X'pdf: xrev'"|~/groff-1.23.0/bin/groff -Tpdf -ms -Z
x T pdf
x res 72000 1 1
x init
p1
V84000
H72000
x font 5 TR
f5
s10000
V84000
H72000
md
DFd
x X pdf: xrev
n12000 0
V768000
H540000
n12000 0
x trailer
V792000
x stop

The foregoing seems okay.

There is therefore a mystery here, and I will dig into it.

Thanks for the report.

> It only dumps if the -ms is included. It does not matter what text
> appears in the \X command.

Those two facts make this behavior _extra_ mysterious to me.  There's no
mechanism for redefining an escape sequence, so WTF?

I must love a challenge.

...one brief GDB session later:

(gdb) list token::description
...
3000      static char buf[bufsz];
3001      (void) memset(buf, 0, bufsz);
3002      switch (type) {
3003      case TOKEN_EMPTY:
3004        return "an indeterminate token (at start of input?)";
3005      case TOKEN_BACKSPACE:
3006        return "a backspace character";
3007      case TOKEN_CHAR:
3008        if (INPUT_DELETE == c)
3009          return "a delete character";
(gdb) p type
$1 = token::TOKEN_BEGIN_TRAP

Hmmmmmm!

That would explain why loading the (full-service) macro package provoked
the problem; it set up (proper) traps (cf. the "implicit page trap").

Hypothesis: the input stream pointer is beyond where I thought it was.
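
To make that concrete with a toy (none of the names or machinery below
are groff's): if the reader has advanced one token too far by the time
a diagnostic fires, say, past a newline that sprang a trap, then the
description routine gets asked about the trap token instead of the
culprit.

  #include <cstdio>

  // Toy model only; names and mechanism are hypothetical, not groff's.
  enum token_type { TOK_CHAR, TOK_NEWLINE, TOK_BEGIN_TRAP };

  static token_type current = TOK_CHAR;

  static void advance()
  {
    // In this toy, reading past a newline springs the page trap.
    current = (TOK_NEWLINE == current) ? TOK_BEGIN_TRAP : TOK_NEWLINE;
  }

  static const char *describe(token_type t)
  {
    switch (t) {
    case TOK_CHAR:    return "a character";
    case TOK_NEWLINE: return "a newline";
    default:          return "an undescribed token"; // the abort path
    }
  }

  int main()
  {
    advance();            // consume the offending input...
    advance();            // ...and one token too many: the trap springs
    // The diagnostic now describes the current token, not the culprit.
    std::printf("error near %s\n", describe(current));
  }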

I beat the ever-living heck out of `\X` escape sequence handling, at the
lexical level, for this release cycle, as recorded in the epic bug
#63074.  So it's highly plausible that I goofed here.

Will investigate and advise.

Regards,
Branden

Please find below my irregularly scheduled sarcastic jeremiad against
brogrammers, past and present.  (And, implicitly, their
"velocity"-obsessed managers.)

[1] Pascal somewhat notoriously had its compilers inject bounds checks
    upon, reputedly, _every_ assignment to a subrange type,[2] which
    folks like Kernighan seized upon as potentially wasteful.
    Kernighan's admirers eagerly latched onto his criticism, repeating
    it by rote and typically without ever bothering to perform any
    empirical measurement of the impact themselves.  (If they did,
    somehow they never remember to cite any.)  Ada had seen this
    tradeoff coming at least as far back as the late 1970s, and
    mandated that the compiler undertake static analysis and inject
    bounds checks only where it could not prove that out-of-range
    values were impossible in the first place.  The aforementioned
    Kernighan admirers responded by (a) proclaiming that it was too
    hard to write an Ada compiler (anything approaching formal methods
    being too difficult for Unix nerds); and (b) studiously ignoring
    Ada except whenever an opportunity arose to denigrate it as bloated
    DoD-ware for people who wore crew cuts.  Nowadays, both GCC and
    LLVM have multiple large, sophisticated systems for doing semantic,
    control flow, data flow, and memory-safety analysis, and are even
    considered sexy for this reason (except LLVM is sexier because it's
    not copylefted).  But Ada is still wrong and stupid and irrelevant
    for having had these things too soon, when they weren't cool.

[2] In Pascal, enumerated and subrange types looked respectively as
    follows.

      type Day = (Sunday, Monday, Tuesday, Wednesday, Thursday, Friday,
                  Saturday);
      type Weekday = Monday .. Friday;

    As I understand it, even terrible old Pascal never needed bounds
    checks on assignments to a variable of an enumerated type, because
    the validity of an assignment could be checked statically.
    Subrange types _were_ idiomatically used for array bounds.

      type MinesweeperField = array [1 .. 20, 1 .. 20] of Boolean;

    The foregoing could be handled statically too, because the valid
    array indices came from a static range.  But the following was not
    handleable statically, and I think it's what aggrieved Kernighan,
    who badly wanted variable-length strings; Pascal's single biggest
    blunder was not having a good story for them.  Wirth's examples,
    even in the "Report" version of Pascal that formed the basis of
    ISO 7185 Standard Pascal, were pretty cringe, clearly stuck in the
    fixed-form tradition of punched card-based records (as was FORTRAN
    77).

      Read(Inputfile, N);
      type UserName = array [1 .. N] of Char;

    (I'm not sure the foregoing is a conforming [piece of a] Standard
    Pascal program; in its official form, Pascal was even stricter than
    traditional and ANSI C about the lexical organization of blocks.
    You had to define your constants first, then types, then variables;
    then _declare_ any procedures and functions referenced within the
    block; and only then could you write statements.)

    ISO C struggles to this day with variable-length arrays and flexible
    array members, which should suggest to C partisans that they don't
    completely have their story straight in this department.  But it
    doesn't.  The advantage of Pascal's run-time bounds checks was that
    they prevented entire classes of undefined behavior.  In the 1980s,
    C hackers circulated copies of Kernighan's "Why Pascal Is Not My
    Favorite Programming Language" like samizdat, and referred to
    it--without necessarily having read it--as an authoritative case
    _against performing run-time bounds checks at all in any context_.
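
    For what it's worth, C++ will sell you the same protection today
    if you ask for it: `std::array::at()` performs the bounds check
    that `operator[]` omits, turning would-be undefined behavior into
    a catchable exception.  A minimal sketch:

      #include <array>
      #include <cstdio>
      #include <stdexcept>

      int main()
      {
        std::array<bool, 20> row{}; // one Minesweeper row, all false
        try {
          row.at(20) = true;      // checked: throws std::out_of_range
        } catch (const std::out_of_range &e) {
          std::puts(e.what());    // a diagnosable error, not a CVE
        }
        // row[20] = true;        // unchecked: undefined behavior
      }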

    I don't think Kernighan would have approved of this reckless and
    gigantic generalization of his point, but I also don't think it
    would have mattered if he had objected vociferously.  A Real
    Programmer cites authorities when they support whatever it is one
    wanted to do in the first place, and ignores them otherwise.  Thus
    did an entire sector of the software industry, centered on Unix and
    C, gleefully introduce countless vectors for security exploits.
    Their code ran faster!  I suppose NSA loves C and Unix because they
    can easily penetrate any system employing them.  No wonder Bob
    Morris was hired straight out of Bell Labs in 1986 to become its
    chief scientist.  He had seen what the burgeoning field was doing
    for intelligence and counterintelligence work.  Remember, now,
    Real Programmers aren't _reflexively_ opposed to the U.S. federal
    government: NSA good--DoD bad.  NSA won't make you cut your hair or
    your beard.  One just has to pass an FBI background check, sign
    away one's freedom of speech for the rest of one's life, and,
    equipped with classified knowledge that one encourages people to
    infer is immensely valuable, walk around acting superior to
    everyone.  But one was already well-practiced at that, no?

    In case it need be said, when you apply run-time bounds checks
    intelligently, you are, as when Jules Winnfield handed over the
    contents of his wallet to the gun-toting "Ringo", _buying_
    something with your money, and what you're buying is a degree of
    protection from undefined behavior and security vulnerabilities.
    Is the benefit worth the cost?  To answer that, one sometimes
    needs to undertake empirical analysis.  Ain't no Real Programmer
    got time for that.
