Re: [PATCH] COBOL 7/15 492K par: parser

James K. Lowden Mon, 17 Feb 2025 10:42:51 -0800

On Sat, 15 Feb 2025 23:35:16 -0500
David Malcolm <dmalc...@redhat.com> wrote:


On better messages ...

> +                  if( ($$ & $2) == $2 ) {
> +                    error_msg(@2, "%s clause repeated", clause);
> +                    YYERROR;
> +                  }
> 
> Obviously not needed for initial release, but it would be neat to have
> a fix-it hint here that deletes the repeated token (fixit hints are
> done via class rich_location, FWIW)

Noted, thanks, 

> +                  if( $data_clause == redefines_clause_e ) {
> +                    error_msg(@2, "REDEFINES must appear "
> +                             "immediately after LEVEL and NAME");
> +                    YYERROR;
> +                  }
> 
> A strict reading of our diagnostic guidelines suggests that all of
> these keywords in these messages should be in quotes, via %{ and %}, or
> via %qs.  But given that cobol has UPPERCASE KEYWORDS THAT ALREADY
> REALLY STAND OUT, maybe that?s overkill.

I endeavored to report every keyword in uppercase.  The user isn't required to 
use uppercase; COBOL is (despite the official name, heh) case-insensitive.  But 
the fact of the keyword is all we have.  The lexer doesn't capture how the user 
typed it in; it reports only the presence of the token.  

In the case of user-defined names, the actual name supplied is captured and 
reported literally.  So, the user could have e.g. 

        001-Initialization Section.

where "Section" is a token whose input string is discarded, and 
"001-Initialization" is a user-defined name whose supplied form is preserved, 
and reported verbatim.  

With that in mind, I propose a policy that builds on your observation (for 
gcc-16, not today):  Report token names in uppercase, unquoted, and 
user-defined names vebatim, quoted.  

I have that filed under "tasks for an eager volunteer, but probably me".  More 
tedious than difficult.  

> +                      error_msg(@2, "%s is binary NUMERIC type, "
> +                               "incompatible with SIGN IS", field->name);
> 
> Again, this isn?t needed for the initial release, but GCCs diagnostics
> can have ?rules? associated with them, which can have URLs (see
> diagnostic-metadata.h)  Is there a useful public standard for Cobol
> with such rules that the output can link to?

There is no freely available COBOL standard.  IBM and Microfocus (and others) 
do publish their documentation on the  web, but the official standard is 
copyrighted and comes at a price.  Please write to your congressman.  

That said, it's been my ambition to tie every relevant message to the ISO 
standard in force at time of compilation.  To that end, I want to move all 
messages to a table keyed by ISO version and section number (or other, for 
-dialect option).  The caller would refer to the table by the key, and 
error_msg() et al. would report that information along with the message text.  

I don't know of another compiler that does that.  I don't mind showing them how 
it's done!  

> +                    auto name = nice_name_of($inspected->field);
> +                    if( !name[0] ) name = "its argument";
> +                    error_msg(@inspected, "INSPECT cannot write to %s", 
> name);
> 
> Building up messages in fragments is a problem for i18n.  Better to
> have an if/else guarding two separate calls to error_msg.  Use %qs in
> the one that uses name, so the name appears in quotes.

Understood, thanks. 

> +                  if( $a.on_error && $a.not_error ) {
> +                    error_msg(@b, "too many ON EXCEPTION clauses");
> 
> Another thing not needed for the initial release - in general, if we?re
> complaining about something in the code being incompatible with other
> code we already saw, it?s good to issue a ?note? immediately after the
> ?error?, 

Agreed, thanks.  

> Not needed for the initial release, but I see a lot of naked ?new?,
> assigned to $$.  Could this be a std::unique_ptr, and use make_unique
> rather than new? Similarly for the return type of functions like
> new_reference and new_literal.  I suspect this would be a lot of work,
> though, and may run into snags (does yylval have to a POD?), so no
> obligation.

You may have already seen my lengthy defense of not worrying about every little 
string.  I would much rather waste 100 KB on unrecovered parser bits and bobs 
than spend one afternoon tracking down a double-free, or subjecting one user to 
a gcc_internal_error.  I would argue it's the only responsible choice, from a 
pareto optimal point of view.  

I hope that's persuasive.  If we're still in doubt, then we'll have to discuss 
what to do.  For sure, it would be a lot of work.  

yylval does have to be a POD.  If there's a contructor, IIRC I get an error 
message to the effect that the constructor was deleted.  But yylval can be a 
pointer to anything, because pointers can be copied.  

Whether or not std::unique_ptr fits under that aegis or not, I don't know.  It 
would take some doing to convince me to adopt that model.  I've avoided it for 
what, 15 years so far?  Since we have a clear hierarchical recursive structure, 
we could "free as we go" along on the path to $accept.  I just see no reason to 
bother.  

I am glad to have brought the diagnostics up to snuiff (barely).  I know users 
will appreciate that work, even if they have *no* idea what a hassle it is to 
track the column.  ;-)  I look forward to implementing your recommendations.  

--jkl

Re: [PATCH] COBOL 7/15 492K par: parser

Reply via email to