> On Apr 1, 2025, at 15:25, Martin Uecker <[email protected]> wrote:
>
> On Tuesday, 01.04.2025 at 18:58 +0000, Qing Zhao wrote:
>>
>>> On Apr 1, 2025, at 11:28, Martin Uecker <[email protected]> wrote:
>>>
>>> On Tuesday, 01.04.2025 at 15:01 +0000, Qing Zhao wrote:
>>>>
>>>>> On Apr 1, 2025, at 10:04, Martin Uecker <[email protected]> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Monday, 31.03.2025 at 13:59 -0700, Bill Wendling wrote:
>>>>>>> I'd like to offer up this to solve the issues we're facing. This is a
>>>>>>> combination of everything that's been discussed here (or at least that
>>>>>>> I've been able to read in the centi-thread :-).
>>>>>
>>>>> Thanks! I think this proposal is much better as it avoids undue burden
>>>>> on parsers, but it does not address all my concerns.
>>>>>
>>>>>
>>>>> From my side, the issue about compromising the scoping rules of C
>>>>> is also about unintended non-local effects of code changes. In
>>>>> my opinion, a change in a library header elsewhere should not cause
>>>>> code in a local scope (which itself might also come from a macro) to
>>>>> emit a warning or require a programmer to add a workaround. So I am
>>>>> not convinced that adding warnings or a workaround such as
>>>>> __builtin_global_ref is a good solution.
>>>>>
>>>>>
>>>>> I could see the following as a possible way forward: We only
>>>>> allow the following two syntaxes:
>>>>>
>>>>> 1. Single argument referring to a member.
>>>>>
>>>>> __counted_by(len)
>>>>>
>>>>> with an argument that must be a single identifier and where
>>>>> the identifier then must refer to a struct member.
>>>>>
>>>>> (I still think this is not ideal and potentially
>>>>> confusing, but in contrast to new scoping rules it is
>>>>> at least relatively easy to explain as a special rule.)
>>>>>
>>
>> So, in allowed syntax 1, the identifier inside the counted_by attribute
>> will be looked up inside the structure.
>>
>> This is our current implementation of counted_by for FAMs, and of my
>> previously submitted patch for counted_by for pointers inside structures.
>>
>> Keeping this syntax is good.
>>
>>>>>
>>>>> 2. Forward declarations.
>>>>>
>>>>> __counted_by(size_t len; len + PADDING)
>>>>
>>>> In the above, the PADDING is some constant?
>>>
>>> In principle - when considering only the name lookup rules -
>>> it could be a constant, a global variable, or an automatic
>>> variable, i.e. any ordinary identifier which is visible at
>>> this point.
>>
>> I am a little confused here:
>> Is syntax 2 a new syntax, with name lookup rules different from those of
>> syntax 1?
>
> Yes. Unlike syntax 1, it uses the regular C name lookup rules.
>
>>
>> How should the identifiers inside the counted_by attribute be looked up
>> with this syntax? Inside the structure first, and then, if not found, in
>> the outer scope for identifiers in the PADDING part?
>
> The identifier in the forward declaration ("len") will be looked
> up in the structure and will be made available when parsing
> the expression. Any other identifiers (such as "PADDING")
> will not be looked up in the structure. So it is always
> clear where each identifier is going to be looked up.

Yeah, this sounds like a good idea to me, and a nice compromise solution. :-)

Then, if more than one member needs to be in the expression, for example:

int number;
struct A {
  size_t count_1;
  size_t count_2;
  char *array __counted_by (size_t count_1; size_t count_2;
                            count_1 + count_2 + number * 4);
};

i.e., all the members that are used in the counted_by expression should be
declared first inside the counted_by, and all other variables in the
expression are then looked up per the default scoping rules.

Is my understanding correct?

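
To make sure I read the rule correctly, here is a minimal sketch in today's C
of how I would expect each identifier to resolve. The forward-declaration
syntax is not implemented in any compiler as far as I know, so
COUNTED_BY_PROPOSED is a hypothetical macro that expands to nothing, and
a_array_count is a hypothetical helper that spells out the lookups by hand:
count_1 and count_2 are read from the structure, number from file scope.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholder: the proposed syntax is not implemented, so this
   macro swallows its argument and expands to nothing. */
#define COUNTED_BY_PROPOSED(...)

int number = 2;   /* ordinary identifier at file scope */

struct A {
    size_t count_1;
    size_t count_2;
    char *array COUNTED_BY_PROPOSED(size_t count_1; size_t count_2;
                                    count_1 + count_2 + number * 4);
};

/* Hand-written equivalent of the count expression (an assumption about the
   intended semantics): forward-declared names come from the structure,
   every other name uses the ordinary C lookup rules. */
static inline size_t a_array_count(const struct A *a)
{
    assert(a != NULL);
    assert(number >= 0);   /* the count must not wrap when converted */
    return a->count_1 + a->count_2 + (size_t)number * 4;
}
```

With count_1 = 3, count_2 = 5 and number = 2, the helper returns 16. A member
of struct A that happened to be named "number" would not change the result,
because "number" was not forward declared inside the attribute.
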
Qing
>
>> Then, has a new scope been introduced now?
>> Or some other special lookup rules for the counted_by attribute?
>>
>>>
>>>>
>>>> More complicated expressions involving globals will not be supported?
>>>
>>> I think one could allow such expressions, but I think the
>>> expressions should be restricted to expressions which have
>>> no side effects.
>>
>> See my question above: does this new syntax 2 introduce a new “structure
>> scope” so that identifiers are looked up inside the structure first, as in
>> syntax 1? Or does this new syntax have the same lookup rule as current C
>> and NOT look inside the structure first?
>
> It will NOT look into the structure, except for the forward
> declared identifier.
>
>
> Martin
>
>>
>>>
>>>>
>>>>> where then the second part can also be a more complicated
>>>>> expression, but with the explicit requirement that all
>>>>> identifiers in this expression are then looked up according to
>>>>> regular C language rules. So except for the forward declared
>>>>> member(s) they are *never* looked up in the member namespace of
>>>>> the struct, i.e. no new name lookup rules are introduced.
>>>>
>>>> One question here:
>>>>
>>>> What’s the major issue if we’d like to add one new scoping rule, for
>>>> example “structure scope” (the same as C++’s instance scope), to C?
>>>>
>>>> (In addition to the "VLA in structure" issue I mentioned in my previous
>>>> writeup, is there any other issue that prevents this new scoping rule
>>>> from being added to C?)
>>>
>>> Note that the "VLA in structure" is a bit of a red herring. The exact same
>>> issues apply to lookup of any other ordinary identifiers in this context.
>>>
>>> enum { MY_BUF_SIZE = 100 };
>>> struct foo {
>>>   char buf[MY_BUF_SIZE];
>>> };
>>>
>> Yes, this is because there is NO “structure scope” available in C. Once a
>> “structure scope” is added to C, identifiers could be looked up inside the
>> “structure scope” first, before the outer scopes are searched.
>>
>>>
>>> C++ has instance scope for member functions. The rules for C++ are also
>>> complex and not very consistent (see the examples I posted earlier,
>>> demonstrating UB and compiler divergence).
>>
>> Yes, I studied those C++ examples when I wrote the proposal. My
>> observation was: in C++, the instance scope always has higher priority
>> than the local and global scopes, i.e., when there is a conflict between
>> the instance scope and a local/global scope for an identifier, the
>> identifier within the instance scope shadows the one with the same name in
>> the outer scope.
>>
>> But in C, there is no concept of “structure scope” at all. Identifiers
>> will NOT be looked up inside a structure at all.
>>
>>> For C such a concept would
>>> be new and much less useful, so the trade-off seems unfavorable (in
>>> contrast to C++ where it is needed).
>>
>> This concept is needed whenever we have to refer to a member variable
>> inside the structure, such as in the counted_by attribute, or later when we
>> extend the C language to include the bound info in the TYPE.
>>
>> But I agree with you that introducing a new instance scope into C might be
>> too risky.
>>
>>
>>> I also see other issues: Fully
>>> supporting instance scope would require changes to how C is parsed,
>>> placing a burden on all C compilers and tooling. Depending on how you
>>> specify it, it would also cause a change in semantics
>>> for existing code, something C tries very hard to avoid.
>>
>> Yes, agreed.
>> Introducing a new instance scope in C might be too risky, and therefore
>> not worth doing.
>>
>>
>>> If you add
>>> warnings as mitigation, it has the problem that it causes non-local
>>> effects where introducing a name in an enclosing scope somewhere else
>>> now necessitates a change to unrelated code, exactly what scoping rules
>>> are meant to prevent.
>>
>> Yes, that’s right.
>>>
>>> In any case, it seems a major change with many ramifications, including
>>> possibly unintended ones. This should certainly not be done without
>>> having a clear specification and support from WG14 (and probably not
>>> done at all.)
>>
>> Yes, I agree.
>>
>> Qing
>>>
>>> Martin
>>>
>>>>
>>>> Qing
>>>>
>>>>
>>>>>
>>>>>
>>>>> I think this could address my concerns about breaking
>>>>> scoping in C. Still, I personally would prefer designator syntax
>>>>> for both C and C++ as a nicer solution, and one that already
>>>>> has some support from WG14.
>>>>>
>>>>> Martin
>>>>>
>>>>>
>>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> 1. The use of '__self' isn't feasible, so we won't use it. Instead,
>>>>>>> we'll rely upon the current behavior—resolving any identifiers to the
>>>>>>> "instance scope". This new scope is used __only__ in attributes, and
>>>>>>> resolves identifiers to those in the least enclosing, non-anonymous
>>>>>>> struct. For example:
>>>>>>>
>>>>>>> struct foo {
>>>>>>>   char count;
>>>>>>>   struct bar {
>>>>>>>     struct {
>>>>>>>       int len;
>>>>>>>     };
>>>>>>>     struct {
>>>>>>>       struct {
>>>>>>>         int *valid_use __counted_by(len); // Valid.
>>>>>>>       };
>>>>>>>     };
>>>>>>>     int *invalid_use __counted_by(count); // Invalid.
>>>>>>>   } b;
>>>>>>> };
>>>>>>>
>>>>>>> Rationale: This is how '__guarded_by' currently resolves identifiers,
>>>>>>> so there's precedent. And if we can't force its usage in all
>>>>>>> situations, it's less a feature and more a "nicety" which will lead to
>>>>>>> a massive discrepancy between compiler implementations. Despite the
>>>>>>> fact that this introduces a new scoping mechanism to C, its use is not
>>>>>>> as extensive as C++'s instance scoping and will apply only to
>>>>>>> attributes. In the case where we have two different resolution
>>>>>>> techniques happening within the same structure (e.g. VLAs), we can
>>>>>>> issue warnings as outlined in Yeoul's RFC[1].
>>>>>>>
>>>>>>> 2. A method of forward declaring variables will be added for variables
>>>>>>> that occur in the struct after the attribute. For example:
>>>>>>>
>>>>>>> A: Necessary usage:
>>>>>>>
>>>>>>> struct foo {
>>>>>>>   int *buf __counted_by(char count; count);
>>>>>>>   char count;
>>>>>>> };
>>>>>>>
>>>>>>> B: Unnecessary, but still valid, usage:
>>>>>>>
>>>>>>> struct foo {
>>>>>>>   char count;
>>>>>>>   int *buf __counted_by(char count; count);
>>>>>>> };
>>>>>>>
>>>>>>> * The forward declaration is required in (A) but not in (B).
>>>>>>> * The type of 'count' as declared in '__counted_by' *must* match the
>>>>>>> real type.
>>>>>>>
>>>>>>> Rationale: This alleviates the issues of "double parsing" for
>>>>>>> compilers that aren't able to handle it. (We can also remove the
>>>>>>> '-fexperimental-late-parse-attributes' flag in Clang.)
>>>>>>>
>>>>>>> 3. A new builtin '__builtin_global_ref()' (or similarly named) is
>>>>>>> added to refer to variables outside of the most-enclosing structure.
>>>>>>> Example:
>>>>>>>
>>>>>>> int count_that_will_never_change_we_promise;
>>>>>>>
>>>>>>> struct foo {
>>>>>>>   int *bar
>>>>>>>     __counted_by(__builtin_global_ref(count_that_will_never_change_we_promise));
>>>>>>>   unsigned flags;
>>>>>>> };
>>>>>>>
>>>>>>> As Yeoul pointed out, there isn't a way to refer to variables that
>>>>>>> have been shadowed, so the 'global' in '__builtin_global_ref' is a bit
>>>>>>> of a misnomer as it could refer to a local variable.
>>>>>>>
>>>>>>> Rationale: For those who need the flexibility to use variables outside
>>>>>>> of the struct, this is an acceptable escape route. It does make bounds
>>>>>>> checking less strict, though, as we can't track any modifications to
>>>>>>> the global, so caution must be used.
>>>>>>>
>>>>>>> Bonus suggestion (by yours truly):
>>>>>>>
>>>>>>> I'd like the option to allow functions to calculate expressions (it
>>>>>>> can be used for a single identifier too, but that's too heavy-handed).
>>>>>>> It won't be required for an expression, but is a good way to avoid any
>>>>>>> issues regarding '__builtin_global_ref', like variables shadowing the
>>>>>>> global variable. Example:
>>>>>>>
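>>>>>>> int global;
>>>>>>>
>>>>>>> struct foo;
>>>>>>> static int counted_by_calc(struct foo *) __attribute__((pure));
>>>>>>>
>>>>>>> struct foo {
>>>>>>>   char count;
>>>>>>>   int fnord;
>>>>>>>   int *buf __counted_by(counted_by_calc);
>>>>>>> };
>>>>>>>
>>>>>>> /* 'pure' is on the declaration above; GCC rejects attributes placed
>>>>>>>    after the declarator of a function definition. The shift count is
>>>>>>>    kept below the width of int to avoid undefined behavior. */
>>>>>>> static int counted_by_calc(struct foo *ptr) {
>>>>>>>   return ptr->count * (global << 2) - ptr->fnord;
>>>>>>> }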
>>>>>>>
>>>>>>> A pointer to the current least enclosing, non-anonymous struct is
>>>>>>> passed into 'counted_by_calc' by the compiler.
>>>>>>>
>>>>>>> Rationale: This gets rid of all ambiguities when calculating an
>>>>>>> expression. It's marked 'pure' so there should be no side-effects.
>>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> I believe these suggestions cover everything we've discussed. Please
>>>>>>> comment with anything I missed and your opinions on each.
>>>>>>>
>>>>>>> [1]
>>>>>>> https://discourse.llvm.org/t/rfc-forward-referencing-a-struct-member-within-bounds-annotations/85510
>>>>>>>
>>>>>>> Share and enjoy!
>>>>>>> -bw
>>>>>
>>>>>
>>>>
>>>
>>> --
>>> Univ.-Prof. Dr. rer. nat. Martin Uecker
>>> Graz University of Technology
>>> Institute of Biomedical Imaging