On Sat, 4 Jun 2022 15:53:25 -0500 "G. Branden Robinson" <g.branden.robin...@gmail.com> wrote:
> Hi James,

Hi Branden,

We seem to come at this question from different perspectives. I'm
going to try to bring you over to mine.

> This is like saying that all states in a finite state machine (FSM)
> are equivalent. It's just false (for nontrivial FSMs, and a state
> machine fit to parse the roff language is far from trivial).

I think we disagree on what "how the input is parsed" means. The .ec
request changes how the input is parsed. From that point forward, the
input is interpreted differently. The \& character (ahem) has no such
effect. The very fact that you had to use gdb to make your point is
proof that any effect it does have is not visible externally.

FTR, I probably would say all states in an FSM are equivalent. To say
it's "just false" is only to disagree on "equivalent". yacc is an
example, isn't it? yacc is LALR(1): one token of look-ahead, parsing
left to right. No one token is privileged over the others, whether
two legs or four.

> \& in no way tells the output device to find a "zero-width space
> glyph", like Unicode U+200B, and stick it on the output. And that is
> what an increasing number of people who have grown up in the Unicode
> era will expect.

Nothing says troff produces Unicode. Is space a character? That's an
angels-on-the-head-of-a-pin question. There's lots of space in troff:
horizontal space and vertical space, measured in different units for
different needs. If you want to describe \& in terms of \h, that's OK
with me. Just don't handcuff yourself to resolving the question in
terms of glyphs or Unicode.

I started programming in 1985. Depending on whether or not I've grown
up (opinions differ) and on what the Unicode Era is, I think I'd
qualify. But -- even though I've spent many hours mastering HTML and
troff (and TeX and DocBook and a few others) -- I don't spend my days
studying Unicode code points. The fact that Unicode has similar
terminology for its domain shouldn't confuse anyone. Overlapping
terminology is a fact of life in our line of work.

> > To me, the term "non-printing input break" verges on nonsense
> > because it suggests there might be such a thing as "printing input".
>
> "Non-printing" modifies the phrase "input break".
>
> I'll grant that, in English, an ambiguous parse is possible.

It wasn't the ambiguity I objected to. It's the point of view. What
I'm trying to tell you is that this particular term -- and your highly
technical defense of it -- starts from a misplaced sense of what
matters to the user.

I do not think the term "input break" is meaningful to the user. Until
today (and still now) I've been fairly successful using groff in total
ignorance of whether or not there are "input breaks" of any kind,
printing or nonprinting. I just scanned groff(7) for 1.22.4. Every use
of "break" refers to output. Unless I'm mistaken, it means exactly one
thing: a line break.

> That's not all it is used for; see also kerning adjustment prevention,
> suppression of end-of-sentence detection, and (I think) other
> applications. This is what makes it a bit of a magical thing in
> troff.

Exactly. None of those are examples of altering how the input is
parsed. It's not in the least "magical". The formatter reads it as it
does all other input. When \& is inserted into the input stream, the
interpreter simply doesn't act on what would otherwise be consecutive
tokens. The user could use \h'0' instead, or any other sequence that
wouldn't affect the output. We offer \& both for convenience and
because "zero-width space" is easy to remember.

Regards,

--jkl
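
P.S. To make the contrast concrete, here is my own minimal sketch (the
sample text and the choice of "@" are mine, illustrative only). After

    .ec @
    @fBbold@fP text; a bare \ is now just another input character

the "@" introduces escape sequences and "\" no longer does, so every
subsequent line really is lexed differently. Compare

    See the source, the manual, etc.\&
    f\&i
    A\h'0'B

which the formatter reads like any other input: the first \& only
suppresses the extra inter-sentence space that "etc." at the end of an
input line would otherwise trigger, the second prevents the "fi"
ligature (and any kerning) between the two glyphs, and the third uses
\h'0' to get much the same effect I described above -- nothing about
the parsing of later input changes in any of them.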