Hello,
On Wed, 22 Jan 2025, Martin Uecker wrote:
> > > > If y is not a member it must be an expression, true. But if it's
> > > > a member you don't know, it may be a designation or an expression.
> > >
> > > In an initializer I know all the members.
> >
> > My sentence was ambiguous :-) Trying again: When it's a member, and
> > you know it's a member, then you still don't know if it's going to be
> > a designation or an expression. It can be both.
>
> I guess this depends on what you mean by "it can be". The rule would
> simply be that it is not an expression.
So that introduces exactly the notion of
expression-but-not-quite-expression that Joseph mentioned, à la
'". identifier" is a primary expression, but only within counted_by'.
That's a major modification of the C grammar, affecting both name lookup
rules and top-level non-terminals.
> The rationale is the following:
Sure, I see all that. In a recursive descent parser it can even be
trivially hacked in (not so easily with a parser written in e.g. bison).
But it's IMHO bad language design. Joseph's initial idea of __self__, or
something along those lines, would instead be a composable extension of
existing constructs: a simple new conditionally defined identifier that is
in no way special for the grammar. It fits right in and everything
falls into place automatically.
> If it is inside the initializer of a structure and references a member
> of the same structure, then it can not simultaneously be inside the
> argument to a counted_by attribute used in the declaration of this
> structure (which at this time has already been parsed completely). So
> there is no reason to allow it to be interpreted as an expression and the
> rule I proposed would therefore simply state that it then is not an
> expression.
Yes, at _that_ place, but what about other places that accept expressions?
You basically introduce ".x as expression, except when (list of
exceptions)". Conditional syntax (as opposed to conditional semantics)
is always a bad thing. Look at the C++ stmt/decl ambiguity, which requires
exactly the infinite look-ahead delayed parsing I'm worried about.
> struct {
> int n;
> int *buf [[counted_by(.n)]]; // this n is in a counted_by
> } x = { .n }; // this n can not be in counted_by for the same struct
>
>
> If we allowed this to be interpreted as an expression, then you could use
> it to reference a member during initialization, e.g.
>
> struct { int y; int x; } a = { .y = 1, .x = .y };
>
> but this would be another extension unrelated to counted_by, which I did
> not intend to suggest.
Yes, that's what I was also getting at. I'm aware that you didn't want to
suggest that. But it is what you get when you allow ". ident" as primary
expression generally. After doing that, one then needs all kinds of
exceptions to that acceptance to not actually allow such self-references,
or whatever other issues may come up with dot-ident being a
primary-expression in random places.
> There are other possibilities for disambiguation, we could also simply
> state that in initializers at the syntactic position where a designator
> is allowed, it is always a designator and not expression, and if it then
> does not reference a member of the structure being initialized, it is an
> error. Maybe this is even preferable.
Perhaps. It does solve the designator-or-expression ambiguity. But you
still would have dot-ident as expression in other contexts, and you still
need to worry about what to do when they are not within counted_by. A
conditionally defined identifier like __self__ effectively solves this at
the name lookup level.
One might say that there's no difference between
conditionally activating a grammar production like ".ident ->
primary-expr" and conditionally defining an identifier (and just letting
the existing "ident -> primary-expr" production do its thing). But that
would be wrong: it's a very big difference, because name lookup is already
context dependent and the grammar is not.
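
To make the name-lookup side of this concrete, here is a minimal sketch in
C. It is toy code, not how GCC or any real front end is structured; the
names lookup, in_counted_by and ordinary_scope are made up for
illustration, and __self__ is only the spelling floated in this thread.
The point is that the only thing depending on context is the lookup
result, while the "ident -> primary-expr" production stays untouched:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy scope (assumption): the ordinary identifiers visible at this point. */
static const char *const ordinary_scope[] = { "n", "buf" };

/* Hypothetical flag a parser would set while inside counted_by(...). */
static bool in_counted_by = false;

/* Name lookup: returns true if `name` is a declared identifier here.
   __self__ is declared only while in_counted_by is set.  No grammar
   production is activated or deactivated anywhere. */
static bool lookup(const char *name)
{
    if (strcmp(name, "__self__") == 0)
        return in_counted_by;

    for (size_t i = 0; i < sizeof ordinary_scope / sizeof ordinary_scope[0]; i++)
        if (strcmp(name, ordinary_scope[i]) == 0)
            return true;

    return false;
}
```

Outside of counted_by a use of __self__ would then just be the usual
"undeclared identifier" diagnostic, with no parser involvement at all.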
> I would like to mention that what clang currently has in a prototype
> uses a mechanism called "delayed parsing" which is essentially infinite
> lookahead (in addition to being confusing and incoherent with C language
> rules). So IMHO we need something better.
Definitely. The infinite look-ahead trial parsing necessary for C++ in
various places is terrible. If something requires more than two tokens of
look-ahead in C, then it's a mis-designed proposal.
> My proposal for the moment would be to only allow very restricted
> syntactic forms, and not generic expressions, side stepping all these
> issues.
Restricting syntax will have its own problems. If you really want only
restricted syntactical forms then you need grammar rules for all the
cases that you do want to allow. That will be trivial initially, but
inevitably someone will come along and want to extend the acceptable
cases on the grounds of "you know, this is really an expression here,
please let me write '.x + 0'". Eventually you again end up with a set of
grammar productions that look awfully like assignment-expression, but not
quite.
What you ideally want is to introduce all these concepts _without_ major
changes to syntax (at least without introducing ambiguities), and to do the
context dependent things one has to do either at the name-lookup or
semantic level, which _are_ already context dependent. Something along
the lines of counted_by accepting general expressions, but then in the
constraints saying that it must be a self-referential expression (for
lack of a better term, and to be precisely defined :) ).
(Of course I agree that at least initially the acceptable types of
expressions should be limited, for the reasons you already stated in
other mails).
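
As a rough sketch of "general expression in the grammar, restriction in the
constraints", in C: the toy parser below accepts a tiny context-free
expression grammar and only afterwards checks a constraint. Everything here
is an assumption for illustration: the three-rule grammar, the names parse
and valid_counted_by, the __self__ spelling, and in particular the
stand-in definition of "self-referential" (at least one __self__ primary
and no other identifier primaries), which is not a proposal for the real
wording.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Result of parsing one expression with an ordinary, context-free grammar. */
struct parse_result {
    bool syntax_ok;     /* did the text match the grammar?          */
    int  self_refs;     /* primaries spelled __self__               */
    int  other_idents;  /* primaries that are any other identifier  */
};

static const char *p;            /* cursor into the text being parsed */
static struct parse_result res;  /* filled in while parsing           */

static void skip_ws(void) { while (*p == ' ') p++; }

/* identifier: [A-Za-z_][A-Za-z0-9_]*; copied (truncated) into buf */
static bool ident(char *buf, size_t cap)
{
    size_t n = 0;
    skip_ws();
    if (!(isalpha((unsigned char)*p) || *p == '_'))
        return false;
    while (isalnum((unsigned char)*p) || *p == '_') {
        if (n + 1 < cap)
            buf[n++] = *p;
        p++;
    }
    buf[n] = '\0';
    return true;
}

/* primary := identifier | number   (note: no ". identifier" production) */
static bool primary(void)
{
    char name[64];
    skip_ws();
    if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        return true;
    }
    if (!ident(name, sizeof name))
        return false;
    if (strcmp(name, "__self__") == 0)
        res.self_refs++;
    else
        res.other_idents++;
    return true;
}

/* postfix := primary ( '.' identifier )* */
static bool postfix(void)
{
    char member[64];
    if (!primary())
        return false;
    for (skip_ws(); *p == '.'; skip_ws()) {
        p++;
        if (!ident(member, sizeof member))
            return false;
    }
    return true;
}

/* additive := postfix ( '+' postfix )* */
static bool additive(void)
{
    if (!postfix())
        return false;
    for (skip_ws(); *p == '+'; skip_ws()) {
        p++;
        if (!postfix())
            return false;
    }
    return true;
}

/* Pure syntax: knows nothing about counted_by. */
static struct parse_result parse(const char *text)
{
    p = text;
    res = (struct parse_result){ false, 0, 0 };
    res.syntax_ok = additive() && (skip_ws(), *p == '\0');
    return res;
}

/* Semantic constraint (stand-in definition), checked after parsing. */
static bool valid_counted_by(const char *text)
{
    struct parse_result r = parse(text);
    return r.syntax_ok && r.self_refs > 0 && r.other_idents == 0;
}
```

With this split, "n + 1" is a perfectly fine expression syntactically and
is rejected only by the constraint, whereas ".n" never parses as an
expression in the first place. Loosening or tightening what counted_by
accepts then means editing valid_counted_by, not the grammar.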
Ciao,
Michael.