Re: weird \s

G. Branden Robinson Fri, 03 Apr 2020 07:09:10 -0700

At 2020-04-02T23:58:01+0000, Bjarni Ingi Gislason wrote:
>   This has been too much (long) chatter without simple solutions.


I'm sorry to break with the consensus that's building around your post,
but I don't find it part of it reasonably implementable.

> "Research is seeing the obvious." [1]
> 
>   1) The missing part is information.
> 
>   Solution:
> 
>   a) Provide a message (warning, error), if "\snn" is in the input.

Let me reiterate.  The problem here is not a syntactical one, but a
semantic one.  It arises from a historical decision to break an
established syntactical pattern to special-case a parse to do what the
user "must have meant".

There is nothing syntactically invalid about the token stream

  \s39

in a roff document.

The problem is whether it means:

1. Set point size to three and render a digit "9", or;
2. Set point size to "39".

Just throwing a diagnostic if we see two digits after '\s' is wrong for
the same reason throwing a diagnostic on the input sequence '\fBI' in
the following would be:

\fBInvisible things are the only realities.\fP

Does this mean:

1. Set font style to BI (bold italic) and render the text from
"nvisible..." forward in it; or
2. Set font style to B (bold) and render the text from "Invisible..."
forward in it?

The answer is obviously (2) to experienced roff users.

>   b) Augment the documentation to tell the readers,
>  that "\snn" is deprecated, obsolete, out of date, etc.
> 
>   and what they should use instead.

When you say "\snn" do you mean "any occurrence of two decimal digits
after \s in the input stream", or the specific parse of that sequence as
setting the point size to a two-digit value if and only if that digit
sequence is in the range 10-39?

If the former, I disagree.  If the latter, I agree strongly.

>   The problem is the people who (still, will) write "\snn"
>  instead of "\s(nn" (portability) or "\s[nn]" (for "groff" and
>  compatibles).

This statement is insufficiently qualified.  The problem is people who
write "\snn" _and expect it to have the same meaning as_ "\s(nn" and
"\s[nn]".

Please excuse me if I'm picking tiny nits here, but the problem domain
is a hand-written parser.  This is the level of detail one has to sweat.

>   3) Other things.
> 
>   a) A missing part of messages is the name of the culprit,
> in this case the s-escape (\s).

Good idea!  This is easily fixable.

However I am reminded of the problem of what to do in the "arbitrary
delimiter" case.  I directed a question to the list in my exchange with
Ingo.  I would ask that people read our exchange[1].  It's not _Bleak
House_.  Not quite.

>   Solution: Provide the name ("\\s escape" is already used once in the
> subroutine).

I'd prefer to refer to the construct by what it _is_ rather than how
it's rendered.

Putting "\s" in the diagnostic seems brilliantly clear until someone
uses the .ec request.  Then it misleads them about what to look for.
By contrast, we can expect people who deliberately use point-size
escapes to know what they look like.  (People who don't are roff novices
who should be reading introductory documentation and/or asking for
help.)

I acknowledge that there is already an exhibit of what I'm objecting to:

      warning(WARN_RANGE,
              "\\s escape results in non-positive point size; set to 1");

I would change that.  Probably:

      warning(WARN_RANGE,
              "point-size escape results in non-positive argument %1; set to 1",
              x);

I realize the above possibly runs afoul of your maxim

"b) Adding details of the argument of the escape in messages is not
necessary."

But not only do I disagree with that in general, but I would question
its applicability here because by this point, "x" is an _interpreted_
value, not necessarily appearing in the input stream.  Concrete example:

$ ./test-groff >/dev/null
.ps 10
foo\s(-12bar
troff: <standard input>:2: warning: \s escape results in non-positive point 
size; set to 1

In my opinion, this would be better:

troff: <standard input>:2: warning: point-size escape results in non-positive 
argument -2; set to 1

(Yes, I know that's a bit long.  Look on the bright side; most people
will never see it because range warnings are off by default. <facepalm>)

In situations where the point size or its adjustment is a computed
expression, which can happen in serious work (e.g., macro packages),
seeing the value that actually got calculated could be helpful for
troubleshooting.

>   b) Adding details of the argument of the escape in messages is not
> necessary.

I disagree.  Input lines are unbounded in length and we don't track the
column number.  Reporting the offending token in a bad parse helps the
user find the exact spot where the trouble is.

Have you seen the output of modern C compilers?  LLVM and GCC have been
in an arms race for years to provide diagnostics that are actually
meaningful and helpful to the user, as opposed to the ridiculous

"invalid lvalue in assignment"

I grew up with. (The generation before me had to cope with the even more
terse "Lvalue required".  This was even lint's idea of user assistance.)

>   c) Adding specific code to report specific syntax errors is not
> necessary.

This maxim is either useless or destructive.

1. Destructive, if it means that it is acceptable to emit only "syntax
error" for _any_ validity problem in lexical analysis (exception: a
language with fewer than three terminal symbols); or

2. Useless, because it moves the problem to what, precisely, you mean by
"specific".

Under the simple dialect of C++ in which groff is written, _all_
diagnostics are reported with "specific code".  In fact, you usually get
at least two frames pushed onto the stack for each one.

Charitably, I will assume that you mean that adding branches to the code
just to elaborate the content of the diagnostic message--say, to
distinguish an invalid zero input from an invalid negative input--is
superfluous.  This is _often_ true but I would not elevate the principle
to the level you have.  Such things are worth doing if users keep
stubbing their toes on a lack of distinction in the message.

>   A look in the documentation should reveal, what the correct, best
> syntax is.

The documentation is large.  Good diagnostic messages--meaning, ones
with specifics--aid the user to find the correct material in the
documentation to read.

> Examples:
> 
> A) Report bad practice:
> 
>       else {
>       val = val*10 + (c - '0');
>       error("Use \\s(%1... instead of \\s%1... for more \
> robustness.\n\tSee the documentation.", val);

This prohibits completely valid input of the form previously described,
such as:

'Set point size to three and render a digit "9"'

> B) Add informations to messages

I know from your whitespace-stripping bug reports that you have no fear
of litigating minute details, so I will remind you that in English,
"information" is a mass (or abstract) noun and does not take the plural.
I mention this mainly because the above solecism increases my cortisol
level.

> a)
>   const char *s_escape_msg = "Previous error message is caused by the \
> \\s escape";

The GNU Coding Standards mandate against capitalizing diagnostic
messages, FYI.

Also, one of the reasons Ingo disliked my (now reverted) commits is that
they increased the ratio of diagnostics-per-parse-error above one.

>   const char *escape = "\\s escape";

See above with respect to reporting _meaning_ to the user and the
hazards of the .ec request making the above inaccurate.

> b) (a = add, c = changed)

For better communication with your peers, please use a standardized
patch format; "diff -u" output is almost universally understood.

>   else if (!tok.delimiter(1)) {
> a    error("%1", s_escape_msg);
>     return 0;
> 
>     if (!get_number(&val, 'z')) {
> a      error("%1", s_escape_msg);
>       return 0;

These are not necessary if we have already thrown a diagnostic (and we
certainly should have).

>       if (start.ch() == '[')
> c     error("%1: missing ']'", escape);
>       else
> c     error("%1: missing closing delimiter", escape);
>       return 0;
>   else {
> c    error("%1: bad digit in point size", escape);
>     return 0;

These are good ideas.  I find the lifting of part of the fixed
diagnostic text into its own variable to be a micro-optimization; it's
not long, and it makes it _harder_, not easier, for a source-diver to
locate the diagnostic of interest with grep.  In my view, components of
diagnostic messages generally should not be parameterized except on
runtime variables.

There is a set of one-liner changes to diagnostics in
input.cpp:read_size() that can be made here; I'll go ahead and do those.

For the slightly more contentious \s39/\s40 parse issue, I'll prepare a
revised version of the patch I posted a few days ago[2].

Thank you for your feedback.

Regards,
Branden

[1] https://lists.gnu.org/archive/html/groff/2020-04/msg00000.html
[2] https://lists.gnu.org/archive/html/groff/2020-03/msg00074.html

signature.asc
Description: PGP signature

Re: weird \s

Reply via email to