Re: Let's Play: Use the Source, Luke! (was: .ie as target of .if)

John Gardner Sun, 27 Sep 2020 01:16:43 -0700

I'm assuming the logic you've described applies to `.while` requests too?
(IIRC, this is the only other conditional that shares the semantics of
`.ie` and `.if`).


> The next conditional handles input in...non-copy mode, a thing that no
> *roff documentation I have ever seen has a name for.  (This irritates me.
> Inside me there is an Aristotle or a Linnaeus struggling to get out.)


CSTR #54 § 7.2 defines "copy mode" as *"[input] copied without
interpretation"*, so a more accurate name for *"non-copy mode"* might
be *"interpreted mode"*. Corrections welcome.

On Sun, 27 Sep 2020 at 17:44, G. Branden Robinson <
g.branden.robin...@gmail.com> wrote:

> Hi, Dave!
>
> At 2020-09-17T12:03:31-0500, Dave Kemper wrote:
> > Consider the much simpler example:
> >
> > .if 0 .if 1 \{\
> > .tm foo
> > .\}
> > .tm bar
> >
> > Following your explanation, the interpreter would evaluate ".if 0",
> > decide it was false, and ignore the rest of the line, thus missing
> > that the line ends in a \{.  Therefore it would go to the next line,
> > and -- unaware that it's inside an opening brace, since it never "saw"
> > it -- execute the ".tm foo" request.  Proceeding to the next line, it
> > encounters an unbalanced closing brace, which it silently ignores (you
> > can verify that it doesn't care about mismatched closing braces by
> > duplicating that line as many times as you please in the input file).
> > Finally, it hits the last line and emits "bar" on stderr.
> >
> > But that's not what happens.  Groff does not print "foo" to stderr,
> > which can only happen if it does in fact process the opening brace --
> > which is associated with a request (the second .if) that it never
> > looks at.  This implies that, at least in some circumstances, the
> > interpreter recognizes opening braces as flow-control structures, and
> > scans for them even in code it would otherwise never examine.
> >
> > The .ie request is just as much a language flow-control element as an
> > opening brace, yet (per my original question) the interpreter does not
> > treat them the same, ignoring the .ie request in a position (after a
> > false conditional) where it does not ignore an opening brace.  And the
> > opening brace is associated with the ".if 1", not the ".if 0", so it's
> > not as simple as a special case of looking for such a brace
> > immediately following a false conditional.  It is, in fact, looking
> > BEYOND where it would have needed to look just to find the .ie request
> > of my first example.
> >
> > Again, if this is considered "working as designed," it should be
> > documented as such, but it's not clear to me just how to document it.
> > Tadziu's suggestion does not account for the opening-brace exception.
> >
> > And are there other exceptions?  And why are there exceptions at all?
>
> I'm far from an expert on the groff parser, but I have studied it a bit
> and made _small_ changes.
>
> I can think of two reasons there are exceptions to your model:
>
> (1) Ease of maintenance of a hand-written recursive-descent parser; and
> (2) No lookahead.  troff has to operate as a Unix filter.  It can store
> all the state it wants but it must act on the most recent character it
> has read.
>
> > It seems like a more consistent (and, not incidentally, easier to
> > document) language design to handle all flow-control constructs the
> > same way: it either unilaterally ignores them after an .if that
> > evaluates to false, or unilaterally scans ahead to see whether any
> > occur later on the line.  Instead, the behavior seems arbitrary and
> > capricious -- which *can* be documented, but still isn't a good
> > language design.
>
> Well, let's go to the source.  What we need is a few functions from
> src/roff/troff/input.cpp:
>
> do_if_request()   (by far the longest)
> if_else_request()
> if_request()
> else_request()
>
> The reason we have two handlers for "if" is that the actual if-handling
> logic has two call sites; one, if_request(), is dispatched when an ".if"
> request is seen on the input.  The other is called by if_else_request().
>
> A key difference between these two functions is that if_request has no
> return value (returns void, in C parlance)--just like all *roff request
> handlers in GNU troff.  do_if_request() returns an integer.
>
> Another key design feature is a data structure called "int_stack", which
> as you may have guessed is simply a stack for integers.  The one of
> interest here is called "if_else_stack".
>
> static int_stack if_else_stack;
>
> Let us consider the short, easy functions first.
>
> void if_request()
> {
>   do_if_request();
> }
>
> ...as simple as you can get.
>
> void if_else_request()
> {
>   if_else_stack.push(do_if_request());
> }
>
> This is more revealing.  If we have an .ie request, call do_if_request()
> _but push its return value onto the integer stack we set up_.
>
> What about the "else" part of our "if-then-else"?
>
> void else_request()
> {
>   if (if_else_stack.is_empty()) {
>     warning(WARN_EL, "unbalanced .el request");
>     skip_alternative();
>   }
>
> The above is pretty obvious.  If we hit an .el, we'd better have seen an
> .ie first.
>
>   else {
>     if (if_else_stack.pop())
>       skip_alternative();
>     else
>       begin_alternative();
>   }
> }
>
> I think we're getting closer to the heart of the discussion here.
>
> In a well-formed groff document, an .el is only encountered after an
> .ie, which as seen above pushed the result of the if-conditional onto
> the stack.  So when we see .el, we pop that integer value and test its
> truthiness.
>
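
The pairing described above can be modelled in a few lines.  This is only
a sketch under stated assumptions, not groff code: the class IfElseModel
and its methods ie(), el(), and plain_if() are hypothetical names, and a
std::stack<bool> stands in for the int_stack called if_else_stack.  Each
method returns whether the corresponding body would be interpreted.

```cpp
#include <stack>

// Hypothetical miniature of the .ie/.el bookkeeping quoted above.
// Not groff code: std::stack<bool> stands in for if_else_stack.
class IfElseModel {
  std::stack<bool> results_;

public:
  // Models if_else_request(): remember the outcome of the condition,
  // then report whether the ".ie" body would be interpreted.
  bool ie(bool condition)
  {
    results_.push(condition);
    return condition;
  }

  // Models else_request(): the ".el" body is interpreted only when the
  // matching ".ie" was false.  An unbalanced ".el" is skipped (the real
  // function also emits a warning).
  bool el()
  {
    if (results_.empty())
      return false;
    bool ie_was_true = results_.top();
    results_.pop();
    return !ie_was_true;
  }

  // Models if_request(): nothing is pushed, so a later ".el" cannot
  // pair with a plain ".if".
  bool plain_if(bool condition) { return condition; }
};
```

If this model is faithful, an .ie that is skipped rather than dispatched
pushes nothing, so the .el that follows it finds no matching entry of its
own on the stack.
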
> If the condition was FALSE, we call begin_alternative:
>
> static void begin_alternative()
> {
>   while (tok.space() || tok.left_brace())
>     tok.next();
> }
>
> This just throws away space and left brace tokens until it can return.
> But that makes sense: if the condition was FALSE, we want to execute the
> "body" of the .el.
>
> skip_alternative() has the harder job.  It has to consume the body of
> the ELSE in a semi-interpreted way; enough to syntactically find the
> end of it, but not actually change the state of the engine with respect
> to anything it sees.
>
> Recall that we entered this function from an .el whose body is being
> skipped either because the .el was invalid (.el without .ie) or because
> the "if" part of an if-else (.ie) was true.  There's one[1] other call
> site as we'll get to in a moment.
>
> This is the second-longest function we'll examine in today's excursion.
> And it's only 40 lines!
>
> static void skip_alternative()
> {
>   int level = 0;
>
> We're going to keep track of how many \{ \} escapes are nested.
>
>   // ensure that ".if 0\{" works as expected
>   if (tok.left_brace())
>     level++;
>
> The above is a special case, as noted.
>
>   int c;
>   for (;;) {
>     c = input_stack::get(0);
>     if (c == EOF)
>       break;
>
> That's more malformed-input handling.
>
>     if (c == ESCAPE_LEFT_BRACE)
>       ++level;
>     else if (c == ESCAPE_RIGHT_BRACE)
>       --level;
>
> I _think_ the above refer to the quasi-interned form in which, for
> instance, macro definitions are stored.  In other words, if we see
> these, we're reading something that was stored in "copy mode".  We're
> seeing it because someone called a macro, and its body has been
> interpolated into the input stream for us.
>
> The next conditional handles input in...non-copy mode, a thing that no
> *roff documentation I have ever seen has a name for.  (This irritates
> me.  Inside me there is an Aristotle or a Linnaeus struggling to get
> out.)
>
>     else if (c == escape_char && escape_char > 0)
>       switch(input_stack::get(0)) {
>       case '{':
>         ++level;
>         break;
>       case '}':
>         --level;
>         break;
>
> At any rate, the last four cases we've seen do obvious things: increase
> the nesting level if we've seen some form of open-brace, and decrease it
> if we've seen some form of close-brace.
>
>       case '"':
>         while ((c = input_stack::get(0)) != '\n' && c != EOF)
>         ;
>
> We're still inside that "else if (c == escape_char)", so this is
> handling a traditional-style roff comment: \" foo.  It runs until the
> next newline.
>
> I don't know why \# isn't handled here.  Someone want to try to break
> the parser with a test case before I get around to it?
>
>       }
>     /*
>       Note that the level can properly be < 0, e.g.
>
>         .if 1 \{\
>         .if 0 \{\
>         .\}\}
>
>       So don't give an error message in this case.
>     */
>     if (level <= 0 && c == '\n')
>       break;
>
> The DevTeam thinks of everything!
>
> More importantly, this break takes us out of the for loop at the first
> newline seen while the nesting level is zero or negative: that is, once
> we have left at least as many scopes as we entered, or at the end of
> the current braceless scope.
>
>   }
>   tok.next();
>
> And there's the magic.  The "for (;;)" loop above eats raw input
> characters until it is forced to break out; this tok.next() then reads
> the next token so that normal interpretation can resume.
>
> }
>
> End of function.
>
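
To see how this accounts for Dave's example, here is a rough standalone
model of the loop.  It is a sketch, not the real function: the name
skip_alternative_model is hypothetical, it reads from a std::string
rather than input_stack, it assumes the escape character is a backslash,
and it leaves out both the ESCAPE_LEFT_BRACE/ESCAPE_RIGHT_BRACE codes and
the tok.left_brace() special case.  It returns the index at which normal
interpretation would resume.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of the skipping loop; not groff code.
// Assumes '\\' is the escape character and scans a plain string.
std::size_t skip_alternative_model(const std::string &in, std::size_t pos)
{
  int level = 0;  // nesting depth of \{ ... \}
  while (pos < in.size()) {
    char c = in[pos++];
    if (c == '\\' && pos < in.size()) {
      // The character after the escape is consumed but never stored in
      // c, so an escaped newline cannot end the skipped region.
      switch (in[pos++]) {
      case '{':
        ++level;
        break;
      case '}':
        --level;
        break;
      case '"':
        // Comment: discard through the end of the line.
        while (pos < in.size() && (c = in[pos++]) != '\n')
          ;
        break;
      default:
        break;
      }
    }
    // Stop at the first newline once no brace scope remains open.
    if (level <= 0 && c == '\n')
      break;
  }
  return pos;
}
```

Fed the text that follows ".if 0 " in the example, the model counts the
\{ that belongs to ".if 1" without ever evaluating that request, keeps
going past ".tm foo", and stops after the line holding \}, so the next
thing interpreted is ".tm bar".  No request is dispatched while skipping,
which is why a skipped .ie never registers.
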
> At this point I'm finding myself wanting dinner, so I'll be a bit of a
> dick and leave the ~140 line do_if_request() as an exercise for the
> reader.  But actually I think the above answered the question on point.
>
> Also, a lot of the following function is tied up with implementing the
> *roff conditionals, ".if d", ".if r", and so on, so it's not interesting
> from the perspective of resolving when GNU troff fully interprets
> conditional input versus when it doesn't.  Skip to the end for the good
> bits.
>
> int do_if_request()
> {
>   int invert = 0;
>   while (tok.space())
>     tok.next();
>   while (tok.ch() == '!') {
>     tok.next();
>     invert = !invert;
>   }
>   int result;
>   unsigned char c = tok.ch();
>   if (c == 't') {
>     tok.next();
>     result = !nroff_mode;
>   }
>   else if (c == 'n') {
>     tok.next();
>     result = nroff_mode;
>   }
>   else if (c == 'v') {
>     tok.next();
>     result = 0;
>   }
>   else if (c == 'o') {
>     result = (topdiv->get_page_number() & 1);
>     tok.next();
>   }
>   else if (c == 'e') {
>     result = !(topdiv->get_page_number() & 1);
>     tok.next();
>   }
>   else if (c == 'd' || c == 'r') {
>     tok.next();
>     symbol nm = get_name(1);
>     if (nm.is_null()) {
>       skip_alternative();
>       return 0;
>     }
>     result = (c == 'd'
>               ? request_dictionary.lookup(nm) != 0
>               : number_reg_dictionary.lookup(nm) != 0);
>   }
>   else if (c == 'm') {
>     tok.next();
>     symbol nm = get_long_name(1);
>     if (nm.is_null()) {
>       skip_alternative();
>       return 0;
>     }
>     result = (nm == default_symbol
>               || color_dictionary.lookup(nm) != 0);
>   }
>   else if (c == 'c') {
>     tok.next();
>     tok.skip();
>     charinfo *ci = tok.get_char(1);
>     if (ci == 0) {
>       skip_alternative();
>       return 0;
>     }
>     result = character_exists(ci, curenv);
>     tok.next();
>   }
>   else if (c == 'F') {
>     tok.next();
>     symbol nm = get_long_name(1);
>     if (nm.is_null()) {
>       skip_alternative();
>       return 0;
>     }
>     result = check_font(curenv->get_family()->nm, nm);
>   }
>   else if (c == 'S') {
>     tok.next();
>     symbol nm = get_long_name(1);
>     if (nm.is_null()) {
>       skip_alternative();
>       return 0;
>     }
>     result = check_style(nm);
>   }
>   else if (tok.space())
>     result = 0;
>   else if (tok.delimiter()) {
>     token delim = tok;
>     int delim_level = input_stack::get_level();
>     environment env1(curenv);
>     environment env2(curenv);
>     environment *oldenv = curenv;
>     curenv = &env1;
>     suppress_push = 1;
>     for (int i = 0; i < 2; i++) {
>       for (;;) {
>         tok.next();
>         if (tok.newline() || tok.eof()) {
>           warning(WARN_DELIM, "missing closing delimiter");
>           tok.next();
>           curenv = oldenv;
>           return 0;
>         }
>         if (tok == delim
>             && (compatible_flag
>             || input_stack::get_level() == delim_level))
>           break;
>         tok.process();
>       }
>       curenv = &env2;
>     }
>     node *n1 = env1.extract_output_line();
>     node *n2 = env2.extract_output_line();
>     result = same_node_list(n1, n2);
>     delete_node_list(n1);
>     delete_node_list(n2);
>     curenv = oldenv;
>     have_input = 0;
>     suppress_push = 0;
>     tok.next();
>   }
>   else {
>     units n;
>     if (!get_number(&n, 'u')) {
>       skip_alternative();
>       return 0;
>     }
>     else
>       result = n > 0;
>   }
>   if (invert)
>     result = !result;
>   if (result)
>     begin_alternative();
>   else
>     skip_alternative();
>   return result;
> }
>
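
For readers who skipped to the end, the overall shape of the function is:
skip spaces, toggle an inversion flag for each '!', evaluate one test,
then apply the inversion.  The sketch below is a heavily reduced model of
that shape only.  The name eval_condition_model is hypothetical, it
handles just the 't' and 'n' tests and a bare integer (parsed with
std::strtol rather than get_number()), and it treats anything else as
malformed, which the real function does not.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical, heavily reduced model of do_if_request(); not groff code.
bool eval_condition_model(const std::string &s, bool nroff_mode)
{
  std::size_t i = 0;
  while (i < s.size() && s[i] == ' ')
    ++i;
  bool invert = false;
  while (i < s.size() && s[i] == '!') {  // each '!' toggles the sense
    invert = !invert;
    ++i;
  }
  bool result;
  if (i == s.size() || s[i] == ' ')
    result = false;                      // like the tok.space() branch
  else if (s[i] == 't')
    result = !nroff_mode;
  else if (s[i] == 'n')
    result = nroff_mode;
  else {
    const char *start = s.c_str() + i;
    char *end = 0;
    long n = std::strtol(start, &end, 10);
    if (end == start)
      return false;  // like the get_number() failure path: the early
                     // return happens before inversion is applied
    result = n > 0;
  }
  return invert ? !result : result;
}
```

One detail this makes visible: in the quoted source, every failure path
returns 0 before the "if (invert)" line is reached, so a malformed
condition is false even when it is prefixed with '!'.
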
> Regards,
> Branden
>
