I'm assuming the logic you've described applies to `.while` requests too? (IIRC, `.while` is the only other conditional request that shares the semantics of `.ie` and `.if`.)

> The next conditional handles input in...non-copy mode, a thing that no
> *roff documentation I have ever seen has a name for. (This irritates me.
> Inside me there is an Aristotle or a Linnaeus struggling to get out.))

CSTR #54 § 7.2 defines "copy mode" as *"[input] copied without interpretation"*, so a more accurate name for *"non-copy mode"* might be *"interpreted mode"*. Corrections welcome.

On Sun, 27 Sep 2020 at 17:44, G. Branden Robinson <g.branden.robin...@gmail.com> wrote:

> Hi, Dave!
>
> At 2020-09-17T12:03:31-0500, Dave Kemper wrote:
> > Consider the much simpler example:
> >
> > .if 0 .if 1 \{\
> > .tm foo
> > .\}
> > .tm bar
> >
> > Following your explanation, the interpreter would evaluate ".if 0",
> > decide it was false, and ignore the rest of the line, thus missing
> > that the line ends in a \{. Therefore it would go to the next line,
> > and -- unaware that it's inside an opening brace, since it never "saw"
> > it -- execute the ".tm foo" request. Proceeding to the next line, it
> > encounters an unbalanced closing brace, which it silently ignores (you
> > can verify that it doesn't care about mismatched closing braces by
> > duplicating that line as many times as you please in the input file).
> > Finally, it hits the last line and emits "bar" on stderr.
> >
> > But that's not what happens. Groff does not print "foo" to stderr,
> > which can only happen if it does in fact process the opening brace --
> > which is associated with a request (the second .if) that it never
> > looks at. This implies that, at least in some circumstances, the
> > interpreter recognizes opening braces as flow-control structures, and
> > scans for them even in code it would otherwise never examine.
> >
> > The .ie request is just as much a language flow-control element as an
> > opening brace, yet (per my original question) the interpreter does not
> > treat them the same, ignoring the .ie request in a position (after a
> > false conditional) where it does not ignore an opening brace. And the
> > opening brace is associated with the ".if 1", not the ".if 0", so it's
> > not as simple as a special case of looking for such a brace
> > immediately following a false conditional. It is, in fact, looking
> > BEYOND where it would have needed to look just to find the .ie request
> > of my first example.
> >
> > Again, if this is considered "working as designed," it should be
> > documented as such, but it's not clear to me just how to document it.
> > Tadziu's suggestion does not account for the opening-brace exception.
> >
> > And are there other exceptions? And why are there exceptions at all?
>
> I'm far from an expert on the groff parser, but I have studied it a bit
> and made _small_ changes.
>
> I can think of two reasons there are exceptions to your model:
>
> (1) Ease of maintenance of a hand-written recursive-descent parser; and
> (2) No lookahead. troff has to operate as a Unix filter. It can store
> all the state it wants but it must act on the most recent character it
> has read.
>
> > It seems like a more consistent (and, not incidentally, easier to
> > document) language design to handle all flow-control constructs the
> > same way: it either unilaterally ignores them after an .if that
> > evaluates to false, or unilaterally scans ahead to see whether any
> > occur later on the line. Instead, the behavior seems arbitrary and
> > capricious -- which *can* be documented, but still isn't a good
> > language design.
>
> Well, let's go to the source.
> What we need are a few functions from
> src/roff/troff/input.cpp:
>
>   do_if_request() (by far the longest)
>   if_else_request()
>   if_request()
>   else_request()
>
> The reason we have two handlers for "if" is that the actual if-handling
> logic, do_if_request(), has two call sites: if_request(), which is
> dispatched when an ".if" request is seen on the input, and
> if_else_request(), which handles ".ie".
>
> A key difference between if_request() and do_if_request() is that
> if_request() has no return value (it returns void, in C parlance), just
> like all *roff request handlers in GNU troff. do_if_request() returns an
> integer.
>
> Another key design feature is a data structure called "int_stack", which
> as you may have guessed is simply a stack for integers. The one of
> interest here is called "if_else_stack".
>
>   static int_stack if_else_stack;
>
> Let us consider the short, easy functions first.
>
>   void if_request()
>   {
>     do_if_request();
>   }
>
> ...as simple as you can get.
>
>   void if_else_request()
>   {
>     if_else_stack.push(do_if_request());
>   }
>
> This is more revealing. If we have an .ie request, call do_if_request()
> _but push its return value onto the integer stack we set up_.
>
> What about the "else" part of our "if-then-else"?
>
>   void else_request()
>   {
>     if (if_else_stack.is_empty()) {
>       warning(WARN_EL, "unbalanced .el request");
>       skip_alternative();
>     }
>
> The above is pretty obvious. If we hit an .el, we'd better have seen an
> .ie first.
>
>     else {
>       if (if_else_stack.pop())
>         skip_alternative();
>       else
>         begin_alternative();
>     }
>   }
>
> I think we're getting closer to the heart of the discussion here.
>
> In a well-formed groff document, an .el is only encountered after an
> .ie, which as seen above pushed the result of the if-conditional onto
> the stack. So when we see .el, we pop that integer value and test its
> truthiness.
>
> If the condition was FALSE, we call begin_alternative():
>
>   static void begin_alternative()
>   {
>     while (tok.space() || tok.left_brace())
>       tok.next();
>   }
>
> This just discards any space and left-brace tokens and returns, leaving
> the body to be read and interpreted normally. That makes sense: if the
> condition was FALSE, we want to execute the "body" of the .el.
>
> skip_alternative() has the harder job. It has to consume the body of
> the ELSE in a semi-interpreted way: enough to find its end
> syntactically, but without changing the state of the engine in response
> to anything it sees.
>
> Recall that we entered this function from an .el whose body is being
> skipped either because the .el was invalid (.el without .ie) or because
> the "if" part of an if-else (.ie) was true. There's one[1] other call
> site, as we'll see in a moment.
>
> This is the second-longest function we'll examine in today's excursion.
> And it's only 40 lines!
>
>   static void skip_alternative()
>   {
>     int level = 0;
>
> We're going to keep track of how many \{ \} escapes are nested.
>
>     // ensure that ".if 0\{" works as expected
>     if (tok.left_brace())
>       level++;
>
> The above is a special case, as noted.
>
>     int c;
>     for (;;) {
>       c = input_stack::get(0);
>       if (c == EOF)
>         break;
>
> That's more malformed-input handling.
>
>       if (c == ESCAPE_LEFT_BRACE)
>         ++level;
>       else if (c == ESCAPE_RIGHT_BRACE)
>         --level;
>
> I _think_ the above refer to the quasi-interned form in which, for
> instance, macro definitions are stored. In other words, if we see
> these, we're reading something that was stored in "copy mode". We're
> seeing it because someone called a macro, and its body has been
> interpolated into the input stream for us.
>
> The next conditional handles input in...non-copy mode, a thing that no
> *roff documentation I have ever seen has a name for. (This irritates
> me.
> Inside me there is an Aristotle or a Linnaeus struggling to get
> out.))
>
>       else if (c == escape_char && escape_char > 0)
>         switch(input_stack::get(0)) {
>         case '{':
>           ++level;
>           break;
>         case '}':
>           --level;
>           break;
>
> At any rate, the last four cases we've seen do obvious things: increase
> the nesting level if we've seen some form of open-brace, and decrease it
> if we've seen some form of close-brace.
>
>         case '"':
>           while ((c = input_stack::get(0)) != '\n' && c != EOF)
>             ;
>
> We're still inside that "else if (c == escape_char)", so this is
> handling a traditional-style roff comment: \" foo. It runs until the
> next newline.
>
> I don't know why \# isn't handled here. Someone want to try to break
> the parser with a test case before I get around to it?
>
>         }
>       /*
>         Note that the level can properly be < 0, e.g.
>
>           .if 1 \{\
>           .if 0 \{\
>           .\}\}
>
>         So don't give an error message in this case.
>       */
>       if (level <= 0 && c == '\n')
>         break;
>
> The DevTeam thinks of everything!
>
> More importantly, this break takes us out of the for loop at the first
> newline seen while the nesting level is zero or below: either at the
> end of a braceless body, or once we have closed at least as many brace
> scopes as we opened.
>
>     }
>     tok.next();
>
> And there's the magic. The for (;;) loop above consumes raw input
> characters, not tokens, until it is forced to break out; only then does
> tok.next() read the first token after the skipped body.
>
>   }
>
> End of function.
>
> At this point I'm finding myself wanting dinner, so I'll be a bit of a
> dick and leave the ~140-line do_if_request() as an exercise for the
> reader. But actually I think the above answered the question on point.
>
> Also, a lot of the following function is tied up with implementing the
> *roff conditionals, ".if d", ".if r", and so on, so it's not interesting
> from the perspective of resolving when GNU troff fully interprets
> conditional input versus when it doesn't. Skip to the end for the good
> bits.
>
>   int do_if_request()
>   {
>     int invert = 0;
>     while (tok.space())
>       tok.next();
>     while (tok.ch() == '!') {
>       tok.next();
>       invert = !invert;
>     }
>     int result;
>     unsigned char c = tok.ch();
>     if (c == 't') {
>       tok.next();
>       result = !nroff_mode;
>     }
>     else if (c == 'n') {
>       tok.next();
>       result = nroff_mode;
>     }
>     else if (c == 'v') {
>       tok.next();
>       result = 0;
>     }
>     else if (c == 'o') {
>       result = (topdiv->get_page_number() & 1);
>       tok.next();
>     }
>     else if (c == 'e') {
>       result = !(topdiv->get_page_number() & 1);
>       tok.next();
>     }
>     else if (c == 'd' || c == 'r') {
>       tok.next();
>       symbol nm = get_name(1);
>       if (nm.is_null()) {
>         skip_alternative();
>         return 0;
>       }
>       result = (c == 'd'
>                 ? request_dictionary.lookup(nm) != 0
>                 : number_reg_dictionary.lookup(nm) != 0);
>     }
>     else if (c == 'm') {
>       tok.next();
>       symbol nm = get_long_name(1);
>       if (nm.is_null()) {
>         skip_alternative();
>         return 0;
>       }
>       result = (nm == default_symbol
>                 || color_dictionary.lookup(nm) != 0);
>     }
>     else if (c == 'c') {
>       tok.next();
>       tok.skip();
>       charinfo *ci = tok.get_char(1);
>       if (ci == 0) {
>         skip_alternative();
>         return 0;
>       }
>       result = character_exists(ci, curenv);
>       tok.next();
>     }
>     else if (c == 'F') {
>       tok.next();
>       symbol nm = get_long_name(1);
>       if (nm.is_null()) {
>         skip_alternative();
>         return 0;
>       }
>       result = check_font(curenv->get_family()->nm, nm);
>     }
>     else if (c == 'S') {
>       tok.next();
>       symbol nm = get_long_name(1);
>       if (nm.is_null()) {
>         skip_alternative();
>         return 0;
>       }
>       result = check_style(nm);
>     }
>     else if (tok.space())
>       result = 0;
>     else if (tok.delimiter()) {
>       token delim = tok;
>       int delim_level = input_stack::get_level();
>       environment env1(curenv);
>       environment env2(curenv);
>       environment *oldenv = curenv;
>       curenv = &env1;
>       suppress_push = 1;
>       for (int i = 0; i < 2; i++) {
>         for (;;) {
>           tok.next();
>           if (tok.newline() || tok.eof()) {
>             warning(WARN_DELIM, "missing closing delimiter");
>             tok.next();
>             curenv = oldenv;
>             return 0;
>           }
>           if (tok == delim
>               && (compatible_flag
>                   || input_stack::get_level() == delim_level))
>             break;
>           tok.process();
>         }
>         curenv = &env2;
>       }
>       node *n1 = env1.extract_output_line();
>       node *n2 = env2.extract_output_line();
>       result = same_node_list(n1, n2);
>       delete_node_list(n1);
>       delete_node_list(n2);
>       curenv = oldenv;
>       have_input = 0;
>       suppress_push = 0;
>       tok.next();
>     }
>     else {
>       units n;
>       if (!get_number(&n, 'u')) {
>         skip_alternative();
>         return 0;
>       }
>       else
>         result = n > 0;
>     }
>     if (invert)
>       result = !result;
>     if (result)
>       begin_alternative();
>     else
>       skip_alternative();
>     return result;
>   }
>
> Regards,
> Branden
>