On Tue, Jul 23, 2024 at 05:12:14PM -0400, Jason Merrill wrote:
> On 7/23/24 4:18 PM, Marek Polacek wrote:
> > On Tue, Jul 23, 2024 at 12:53:07AM -0400, Jason Merrill wrote:
> > > On 7/20/24 2:31 PM, Marek Polacek wrote:
> > > > [ Entering the contest to fix the oldest PR in this cycle. ]
> > > > 
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > 
> > > > -- >8 --
> > > > This 18-year-old PR reports that we parse certain comma expressions
> > > > as a declaration rather than statement when the statement begins with
> > > > a functional-style cast expression.  Consider
> > > > 
> > > >     int(x), 0;
> > > > 
> > > > which does not declare x--it only casts x to int--, whereas
> > > > 
> > > >     int(x), (y);
> > > > 
> > > > declares x and y.  We need some kind of look-ahead to decide how we
> > > > should disambiguate the construct, because cp_parser_init_declarator
> > > > commits eagerly once it has seen "int(x)", and then it's too late to
> > > > recover.
> > > > 
> > > > This patch makes us try to parse the code as a sequence of declarators;
> > > > if that fails, we are likely looking at a statement.  That's a simple
> > > > idea, but it's complicated by code like
> > > > 
> > > >     void (*p)(void *)(fun);
> > > > 
> > > > which initializes a pointer-to-function, or
> > > > 
> > > >     int(x), (x) + 1;
> > > > 
> > > > which is an expression statement, but the second (x) is parsed as
> > > > a valid declarator, only the + after reveals that the whole thing
> > > > is an expression.  You can have things like
> > > > 
> > > >     int(**p)
> > > > 
> > > > which by itself doesn't tell you much.  You can have
> > > > 
> > > >     int(*q)(void*)
> > > > 
> > > > which looks like it starts with a functional-style cast, but it is not
> > > > a cast.  The simple
> > > > 
> > > >     int(x) = 42;
> > > > 
> > > > has an initializer so it declares x; it is not an assignment.  But then,
> > > > 
> > > >       int(d) __attribute__(());
> > > > 
> > > > does not have an initializer, but the attribute makes it a declaration.
> > > > 
> > > >         PR c++/29834
> > > >         PR c++/54905
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > >         * parser.cc (cp_parser_lambda_introducer): Use
> > > >         cp_parser_next_token_starts_initializer_p.
> > > >         (cp_parser_simple_declaration): Add look-ahead to decide if 
> > > > we're
> > > >         looking at a declaration or statement.
> > > >         (cp_parser_next_token_starts_initializer_p): New.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > >         * g++.dg/parse/ambig15.C: New test.
> > > >         * g++.dg/parse/ambig16.C: New test.
> > > > ---
> > > >    gcc/cp/parser.cc                     | 73 ++++++++++++++++++++++--
> > > >    gcc/testsuite/g++.dg/parse/ambig15.C | 83 
> > > > ++++++++++++++++++++++++++++
> > > >    gcc/testsuite/g++.dg/parse/ambig16.C | 18 ++++++
> > > >    3 files changed, 168 insertions(+), 6 deletions(-)
> > > >    create mode 100644 gcc/testsuite/g++.dg/parse/ambig15.C
> > > >    create mode 100644 gcc/testsuite/g++.dg/parse/ambig16.C
> > > > 
> > > > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> > > > index 1fa0780944b..797cfc3204e 100644
> > > > --- a/gcc/cp/parser.cc
> > > > +++ b/gcc/cp/parser.cc
> > > > @@ -2947,6 +2947,8 @@ static bool 
> > > > cp_parser_next_token_ends_template_argument_p
> > > >      (cp_parser *);
> > > >    static bool cp_parser_nth_token_starts_template_argument_list_p
> > > >      (cp_parser *, size_t);
> > > > +static bool cp_parser_next_token_starts_initializer_p
> > > > +  (cp_parser *);
> > > >    static enum tag_types cp_parser_token_is_class_key
> > > >      (cp_token *);
> > > >    static enum tag_types cp_parser_token_is_type_parameter_key
> > > > @@ -11663,9 +11665,7 @@ cp_parser_lambda_introducer (cp_parser* parser, 
> > > > tree lambda_expr)
> > > >         }
> > > >          /* Find the initializer for this capture.  */
> > > > -      if (cp_lexer_next_token_is (parser->lexer, CPP_EQ)
> > > > -         || cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN)
> > > > -         || cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
> > > > +      if (cp_parser_next_token_starts_initializer_p (parser))
> > > >         {
> > > >           /* An explicit initializer exists.  */
> > > >           if (cxx_dialect < cxx14)
> > > > @@ -11747,9 +11747,7 @@ cp_parser_lambda_introducer (cp_parser* parser, 
> > > > tree lambda_expr)
> > > >                   /* If what follows is an initializer, the second 
> > > > '...' is
> > > >                      invalid.  But for cases like [...xs...], the first 
> > > > one
> > > >                      is invalid.  */
> > > > -                 if (cp_lexer_next_token_is (parser->lexer, CPP_EQ)
> > > > -                     || cp_lexer_next_token_is (parser->lexer, 
> > > > CPP_OPEN_PAREN)
> > > > -                     || cp_lexer_next_token_is (parser->lexer, 
> > > > CPP_OPEN_BRACE))
> > > > +                 if (cp_parser_next_token_starts_initializer_p 
> > > > (parser))
> > > >                     ellipsis_loc = loc;
> > > >                   error_at (ellipsis_loc, "too many %<...%> in lambda 
> > > > capture");
> > > >                   continue;
> > > > @@ -16047,6 +16045,58 @@ cp_parser_simple_declaration (cp_parser* 
> > > > parser,
> > > >        else
> > > >          break;
> > > > +  /* If we are still uncommitted, we're probably looking at something 
> > > > like
> > > > +     T(x), which can be a declaration but does not have to be, 
> > > > depending
> > > > +     on what comes after.  Consider
> > > > +       int(x), 0;
> > > > +     which is _not_ a declaration of x, it's a functional cast, and
> > > > +       int(x), (y);
> > > > +     which declares x and y.  We need some kind of look-ahead to 
> > > > decide,
> > > > +     cp_parser_init_declarator below will commit eagerly once it has 
> > > > seen
> > > > +     "int(x)".  So we try to parse this as a sequence of declarators; 
> > > > if
> > > > +     that fails, we are likely looking at a statement.  (We could avoid
> > > > +     all of this if there is no non-nested comma.)  */
> > > 
> > > Unfortunately, this seems to break
> > > 
> > > int main()
> > > {
> > >    int(x), y[sizeof(x)];
> > > }
> > 
> > Oy.  That's a problem, thanks for catching that.  So:
> > 
> >    void f(int v, int *w)
> >    {
> >      int(x), y[sizeof(x)]; // decl
> >      int(v), w[sizeof(v)], true; // expr
> >    }
> > 
> > The problem is that the cp_parser_declarator call I added was meant to
> > be purely syntactic checking -- does it _look_ like it could be a decl? --
> > but here it emits an error.
> > 
> > Is it worth experimenting with a new CP_PARSER_SYNTAX_ONLY flag to
> > prevent emitting any errors?
> 
> I've wished for a while that we would store diagnostics during tentative
> parsing and then emit them all once we commit to the tentative parse.

Yea, it's come up a few times, hasn't it.  It would fix

https://gcc.gnu.org/PR61259
https://gcc.gnu.org/PR109775

and I wonder what else.

> With that functionality, we could parse the decls normally (but tentatively)
> and then go back and remove them from the current scope if they turn out not
> to have actually been declarations?
> 
> Extremely vexing.

It's an interesting project, but these PRs are pretty obscure and have fairly
simple workarounds.  I think I better address other projects first...

Well, it was worth a shot.  Maybe I can at least add a testcase with your
testcase (we don't test it currently), and extract whatever already works
from my new test.

Marek

Reply via email to