Hello,

On Wed, 22 Jan 2025, Martin Uecker wrote:

> > > > If y is not a member it must be an expression, true.  But if it's 
> > > > a member you don't know, it may be a designation or an expression.
> > > 
> > > In an initializer I know all the members.
> > 
> > My sentence was ambiguous :-)  Trying again: When it's a member, and 
> > you know it's a member, then you still don't know if it's going to be 
> > a designation or an expression.  It can be both.
> 
> I guess this depends on what you mean by "it can be".  The rule would 
> simply be that it is not an expression.

So, that then introduces exactly the notion of 
expression-but-not-quite-expression that Joseph mentioned.  A la 
'". identifier" is a primary expression, but only within counted_by'.  
That's a major modification of the C grammar, regarding name lookup rules 
and top-level non-terminals.

> The rationale is the following:

Sure, I see all that.  In a recursive descent parser it can even be 
trivially hacked in (not so easily with a parser written in e.g. bison).  
But it's IMHO bad language design.  Joseph's initial idea of __self__, or 
something along those lines, would instead be a composable extension of 
existing constructs: a simple new conditionally defined identifier that is 
in no way special for the grammar; it fits right in and everything 
falls into place automatically.

> If it is inside the initializer of a structure and references a member 
> of the same structure, then it can not simultaneously be inside the 
> argument to a counted_by attribute used in the declaration of this 
> structure (which at this time has already been parsed completely).  So 
> there is no reason to allow it be interpreted as an expression and the 
> rule I proposed would therefore simply state that it then is not an 
> expression.

Yes, at _that_ place, but what about other places that accept expressions?  
You basically introduce ".x as expression, except when (list of 
exceptions)".  Conditional syntax (as opposed to conditional semantics) 
is always a bad thing.  Look at the C++ stmt/decl ambiguity, which requires 
exactly the infinite look-ahead delayed parsing I'm worried about.

> struct { 
>   int n;
>   int *buf [[counted_by(.n)]]; // this n is in a counted_by
> } x = { .n }; // this n can not be in counted_by for the same struct
> 
> 
> If we allowed this to be interpreted as an expression, then you could use
> it to reference a member during initialization, e.g.
> 
> struct { int y; int x; } a = { .y = 1, .x = .y };
> 
> but this would be another extension unrelated to counted_by, which I did 
> not intend to suggest.

Yes, that's what I was also getting at.  I'm aware that you didn't want to 
suggest that.  But it is what you get when you allow ". ident" as primary 
expression generally.  After doing that, one then needs all kinds of 
exceptions to that acceptance to not actually allow such self-references, 
or whatever other issues may come up with dot-ident being a 
primary-expression in random places.
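
For reference, a minimal sketch of what standard C accepts today, so the 
contrast with the quoted '.x = .y' is concrete.  The names `struct pair` 
and `make_pair` are made up for illustration only:

```c
/* Hypothetical type, mirroring the anonymous struct in the quoted example. */
struct pair { int y; int x; };

struct pair make_pair(int v)
{
    /* ".y" and ".x" left of '=' are designators.  What follows '=' must
       be an ordinary expression; standard C has no ". ident" primary
       expression, so "{ .y = 1, .x = .y }" is rejected today. */
    struct pair a = { .y = v, .x = v };
    return a;
}
```

Allowing ". ident" as a primary expression in general would make the 
rejected form parse, which is exactly the unintended extension.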

> There are other possibilities for disambiguation, we could also simply 
> state that in initializers at the syntactic position where a designator 
> is allowed, it is always a designator and not an expression, and if it then 
> does not reference a member of the structure being initialized, it is an 
> error.  Maybe this is even preferable.

Perhaps.  It does solve the designator-or-expression ambiguity.  But you 
still would have dot-ident as expression in other contexts, and you still 
need to worry about what to do when they are not within counted_by.  A 
conditionally defined identifier like __self__ effectively solves this at 
the name lookup level.

One might say that there's no difference between conditionally activating 
a grammar production like ".ident -> primary-expr" and conditionally 
defining an identifier (and just letting the existing "ident -> 
primary-expr" production do its thing).  But that would be wrong; it's a 
very big difference: the former changes the syntax itself, the latter only 
changes what name lookup finds.
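
A toy sketch of the second variant, to show where the context dependence 
ends up.  Everything here (`struct parse_ctx`, `lookup_identifier`, 
`parse_primary_identifier`, the `visible` table) is hypothetical and 
heavily simplified; it is not how GCC or any real front end is structured:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser state. */
struct parse_ctx {
    bool in_counted_by;   /* true only while parsing a counted_by argument */
};

/* Ordinary identifiers visible in this toy scope. */
const char *const visible[] = { "len", "buf" };

/* Name lookup: the only place that knows about the context. */
bool lookup_identifier(const struct parse_ctx *ctx, const char *name)
{
    assert(ctx != NULL && name != NULL);
    if (strcmp(name, "__self__") == 0)
        return ctx->in_counted_by;   /* conditionally defined identifier */
    for (size_t i = 0; i < sizeof visible / sizeof visible[0]; i++)
        if (strcmp(name, visible[i]) == 0)
            return true;
    return false;
}

/* The "ident -> primary-expr" production: the same code in every context.
   A false result stands for an "undeclared identifier" diagnostic. */
bool parse_primary_identifier(const struct parse_ctx *ctx, const char *tok)
{
    return lookup_identifier(ctx, tok);
}
```

The production itself never looks at `in_counted_by`; outside counted_by, 
`__self__` is simply an undeclared identifier like any other.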

> I would like to mention that what clang currently has in a prototype 
> uses a mechanism called "delayed parsing" which is essentially infinite 
> lookahead (in addition to being confusing and incoherent with C language 
> rules). So IMHO we need something better.

Definitely.  The infinite look-ahead trial parsing necessary for C++ in 
various places is terrible.  If something requires more than two tokens of 
look-ahead in C, then it's a mis-designed proposal.

> My proposal for the moment would be to only allow very restricted 
> syntactic forms, and not generic expressions, side stepping all these 
> issues.

Restricting syntax will have its own problems.  If you really want only 
restricted syntactical forms then you need grammar rules for all the 
cases that you do want to allow.  That will be trivial initially, but 
inevitably someone will come along and want to extend the acceptable 
cases on the grounds of "you know, this is really an expression here, 
please let me write '.x + 0'".  Eventually you again end up with a set of 
grammar productions that look awfully like assignment-expression, but not 
quite.
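
To make the "trivial initially" part concrete, a sketch of a recognizer 
for one possible restricted form, namely exactly '.' followed by a single 
identifier.  The function name and the choice of form are assumptions for 
illustration, not anything proposed in this thread:

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical restricted form: '.' immediately followed by one
   identifier, and nothing else. */
bool accepts_restricted_form(const char *s)
{
    if (*s++ != '.')
        return false;
    if (!(isalpha((unsigned char)*s) || *s == '_'))
        return false;
    while (isalnum((unsigned char)*s) || *s == '_')
        s++;
    return *s == '\0';   /* any trailing text, e.g. " + 0", is rejected */
}
```

Accepting '.x + 0' as well means adding operators, operands and 
precedence, i.e. the start of a second expression grammar.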

What you ideally want is to introduce all these concepts _without_ major 
changes to syntax (at least without introducing ambiguities), and to do the 
context-dependent things one has to do at either the name-lookup or the 
semantic level, which _are_ already context dependent.  Something along 
the lines of counted_by accepting general expressions, but then in the 
constraints saying that it must be a self-referential expression (for 
lack of a better term, and to be precisely defined :) ).
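
As an illustration of name lookup already being context dependent in C 
today, consider typedef names: the same tokens "T * p" form a declaration 
or a multiplication depending on what T currently names.  The function 
names below are made up for the example:

```c
typedef int T;

int as_declaration(void)
{
    int x = 7;
    T *p = &x;          /* T is a typedef name: declares p as pointer to int */
    return *p;
}

int as_expression(void)
{
    int T = 6, p = 7;   /* T now names an object and hides the typedef */
    return T * p;       /* same tokens "T * p", now a multiplication */
}
```

The grammar is the same in both functions; only the result of looking up 
T differs.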

(Of course I agree that at least initially the acceptable types of 
expressions should be limited, for the reasons you already stated in 
other mails.)


Ciao,
Michael.
