At 2020-04-02T23:58:01+0000, Bjarni Ingi Gislason wrote: > This has been too much (long) chatter without simple solutions.
I'm sorry to break with the consensus that's building around your post, but I don't find it part of it reasonably implementable. > "Research is seeing the obvious." [1] > > 1) The missing part is information. > > Solution: > > a) Provide a message (warning, error), if "\snn" is in the input. Let me reiterate. The problem here is not a syntactical one, but a semantic one. It arises from a historical decision to break an established syntactical pattern to special-case a parse to do what the user "must have meant". There is nothing syntactically invalid about the token stream \s39 in a roff document. The problem is whether it means: 1. Set point size to three and render a digit "9", or; 2. Set point size to "39". Just throwing a diagnostic if we see two digits after '\s' is wrong for the same reason throwing a diagnostic on the input sequence '\fBI' in the following would be: \fBInvisible things are the only realities.\fP Does this mean: 1. Set font style to BI (bold italic) and render the text from "nvisible..." forward in it; or 2. Set font style to B (bold) and render the text from "Invisible..." forward in it? The answer is obviously (2) to experienced roff users. > b) Augment the documentation to tell the readers, > that "\snn" is deprecated, obsolete, out of date, etc. > > and what they should use instead. When you say "\snn" do you mean "any occurrence of two decimal digits after \s in the input stream", or the specific parse of that sequence as setting the point size to a two-digit value if and only if that digit sequence is in the range 10-39? If the former, I disagree. If the latter, I agree strongly. > The problem is the people who (still, will) write "\snn" > instead of "\s(nn" (portability) or "\s[nn]" (for "groff" and > compatibles). This statement is insufficiently qualified. The problem is people who write "\snn" _and expect it to have the same meaning as_ "\s(nn" and "\s[nn]". Please excuse me if I'm picking tiny nits here, but the problem domain is a hand-written parser. This is the level of detail one has to sweat. > 3) Other things. > > a) A missing part of messages is the name of the culprit, > in this case the s-escape (\s). Good idea! This is easily fixable. However I am reminded of the problem of what to do in the "arbitrary delimiter" case. I directed a question to the list in my exchange with Ingo. I would ask that people read our exchange[1]. It's not _Bleak House_. Not quite. > Solution: Provide the name ("\\s escape" is already used once in the > subroutine). I'd prefer to refer to the construct by what it _is_ rather than how it's rendered. Putting "\s" in the diagnostic seems brilliantly clear until someone uses the .ec request. Then it misleads them about what to look for. By contrast, we can expect people who deliberately use point-size escapes to know what they look like. (People who don't are roff novices who should be reading introductory documentation and/or asking for help.) I acknowledge that there is already an exhibit of what I'm objecting to: warning(WARN_RANGE, "\\s escape results in non-positive point size; set to 1"); I would change that. Probably: warning(WARN_RANGE, "point-size escape results in non-positive argument %1; set to 1", x); I realize the above possibly runs afoul of your maxim "b) Adding details of the argument of the escape in messages is not necessary." But not only do I disagree with that in general, but I would question its applicability here because by this point, "x" is an _interpreted_ value, not necessarily appearing in the input stream. Concrete example: $ ./test-groff >/dev/null .ps 10 foo\s(-12bar troff: <standard input>:2: warning: \s escape results in non-positive point size; set to 1 In my opinion, this would be better: troff: <standard input>:2: warning: point-size escape results in non-positive argument -2; set to 1 (Yes, I know that's a bit long. Look on the bright side; most people will never see it because range warnings are off by default. <facepalm>) In situations where the point size or its adjustment is a computed expression, which can happen in serious work (e.g., macro packages), seeing the value that actually got calculated could be helpful for troubleshooting. > b) Adding details of the argument of the escape in messages is not > necessary. I disagree. Input lines are unbounded in length and we don't track the column number. Reporting the offending token in a bad parse helps the user find the exact spot where the trouble is. Have you seen the output of modern C compilers? LLVM and GCC have been in an arms race for years to provide diagnostics that are actually meaningful and helpful to the user, as opposed to the ridiculous "invalid lvalue in assignment" I grew up with. (The generation before me had to cope with the even more terse "Lvalue required". This was even lint's idea of user assistance.) > c) Adding specific code to report specific syntax errors is not > necessary. This maxim is either useless or destructive. 1. Destructive, if it means that it is acceptable to emit only "syntax error" for _any_ validity problem in lexical analysis (exception: a language with fewer than three terminal symbols); or 2. Useless, because it moves the problem to what, precisely, you mean by "specific". Under the simple dialect of C++ in which groff is written, _all_ diagnostics are reported with "specific code". In fact, you usually get at least two frames pushed onto the stack for each one. Charitably, I will assume that you mean that adding branches to the code just to elaborate the content of the diagnostic message--say, to distinguish an invalid zero input from an invalid negative input--is superfluous. This is _often_ true but I would not elevate the principle to the level you have. Such things are worth doing if users keep stubbing their toes on a lack of distinction in the message. > A look in the documentation should reveal, what the correct, best > syntax is. The documentation is large. Good diagnostic messages--meaning, ones with specifics--aid the user to find the correct material in the documentation to read. > Examples: > > A) Report bad practice: > > else { > val = val*10 + (c - '0'); > error("Use \\s(%1... instead of \\s%1... for more \ > robustness.\n\tSee the documentation.", val); This prohibits completely valid input of the form previously described, such as: 'Set point size to three and render a digit "9"' > B) Add informations to messages I know from your whitespace-stripping bug reports that you have no fear of litigating minute details, so I will remind you that in English, "information" is a mass (or abstract) noun and does not take the plural. I mention this mainly because the above solecism increases my cortisol level. > a) > const char *s_escape_msg = "Previous error message is caused by the \ > \\s escape"; The GNU Coding Standards mandate against capitalizing diagnostic messages, FYI. Also, one of the reasons Ingo disliked my (now reverted) commits is that they increased the ratio of diagnostics-per-parse-error above one. > const char *escape = "\\s escape"; See above with respect to reporting _meaning_ to the user and the hazards of the .ec request making the above inaccurate. > b) (a = add, c = changed) For better communication with your peers, please use a standardized patch format; "diff -u" output is almost universally understood. > else if (!tok.delimiter(1)) { > a error("%1", s_escape_msg); > return 0; > > if (!get_number(&val, 'z')) { > a error("%1", s_escape_msg); > return 0; These are not necessary if we have already thrown a diagnostic (and we certainly should have). > if (start.ch() == '[') > c error("%1: missing ']'", escape); > else > c error("%1: missing closing delimiter", escape); > return 0; > else { > c error("%1: bad digit in point size", escape); > return 0; These are good ideas. I find the lifting of part of the fixed diagnostic text into its own variable to be a micro-optimization; it's not long, and it makes it _harder_, not easier, for a source-diver to locate the diagnostic of interest with grep. In my view, components of diagnostic messages generally should not be parameterized except on runtime variables. There is a set of one-liner changes to diagnostics in input.cpp:read_size() that can be made here; I'll go ahead and do those. For the slightly more contentious \s39/\s40 parse issue, I'll prepare a revised version of the patch I posted a few days ago[2]. Thank you for your feedback. Regards, Branden [1] https://lists.gnu.org/archive/html/groff/2020-04/msg00000.html [2] https://lists.gnu.org/archive/html/groff/2020-03/msg00074.html
signature.asc
Description: PGP signature