On Sat, 4 Jun 2022 15:53:25 -0500 "G. Branden Robinson" <g.branden.robin...@gmail.com> wrote:
> Hi James,

Hi Branden,

We seem to come at this question from different perspectives. I'm
going to try to bring you over to mine.

> This is like saying that all states in a finite state machine (FSM)
> are equivalent. It's just false (for nontrivial FSMs, and a state
> machine fit to parse the roff language is far from trivial).

I think we disagree on what "how the input is parsed" means. The .ec
request changes how the input is parsed. From that point forward, the
input is interpreted differently. The \& character (ahem) has no such
effect. The very fact that you had to use gdb to make your point is
proof that any effect it does have is not visible externally.

FTR, I probably would say all states in an FSM are equivalent. To say
it's "just false" is only to disagree on "equivalent". yacc is an
example, isn't it? yacc is LALR(1): one token of look-ahead, parsing
left to right. No one token is privileged over the others, whether
two legs or four.

> \& in no way tells the output device to find a "zero-width space
> glyph", like Unicode U+200B, and stick it on the output. And that is
> what an increasing number of people who have grown up in the Unicode
> era will expect.

Nothing says troff produces Unicode. Is space a character? That's an
angels-on-the-head-of-a-pin question. There's lots of space in troff:
horizontal space and vertical space, measured in different units for
different needs. If you want to describe \& in terms of \h, that's OK
with me. Just don't handcuff yourself to resolving the question in
terms of glyphs or Unicode.

I started programming in 1985. Depending on whether or not I've grown
up (opinions differ) and on what the Unicode Era is, I think I'd
qualify. But -- even though I've spent many hours mastering HTML and
troff (and TeX and DocBook and a few others) -- I don't spend my days
studying Unicode code points. The fact that Unicode has similar
terminology for its domain shouldn't confuse anyone. Overlapping
terminology is a fact of life in our line of work.

> > To me, the term "non-printing input break" verges on nonsense
> > because it suggests there might be such a thing as "printing input".
>
> "Non-printing" modifies the phrase "input break".
>
> I'll grant that, in English, an ambiguous parse is possible.

It wasn't the ambiguity I objected to. It's the point of view. What
I'm trying to tell you is that this particular term -- and your highly
technical defense of it -- starts from a misplaced sense of what
matters to the user.

I do not think the term "input break" is meaningful to the user. Until
today (and still now) I've been fairly successful using groff in total
ignorance of whether or not there are "input breaks" of any kind,
printing or nonprinting. I just scanned groff(7) for 1.22.4. Every use
of "break" refers to output. Unless I'm mistaken, it means exactly one
thing: a line break.

> That's not all it is used for; see also kerning adjustment prevention,
> suppression of end-of-sentence detection, and (I think) other
> applications. This is what makes it a bit of a magical thing in
> troff.

Exactly. None of those are examples of altering how the input is
parsed. It's not in the least "magical". The formatter reads it as it
does all other input. When \& is inserted into the input stream, the
interpreter simply doesn't act on what would otherwise be consecutive
tokens. The user could use \h'0' instead, or any other sequence that
wouldn't affect the output. We offer \& both for convenience and
because "zero-width space" is easy to remember.

Regards,

--jkl
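
P.S. To make the contrast concrete, here is my own minimal sketch (the
sample text and the choice of "@" are mine, illustrative only). After

    .ec @
    @fBbold@fP text; a bare \ is now just another input character

the "@" introduces escape sequences and "\" no longer does, so every
subsequent line really is lexed differently. Compare

    See the source, the manual, etc.\&
    f\&i
    A\h'0'B

which the formatter reads like any other input: the first \& only
suppresses the extra inter-sentence space that "etc." at the end of an
input line would otherwise trigger, the second prevents the "fi"
ligature (and any kerning) between the two glyphs, and the third uses
\h'0' to get much the same effect I described above -- nothing about
the parsing of later input changes in any of them.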