Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

John McCall Tue, 11 Mar 2025 15:00:00 -0700

On 7 Mar 2025, at 19:12, Yeoul Na wrote:

> Hi Kees,
>
>> On Mar 7, 2025, at 1:38 PM, Kees Cook <k...@kernel.org> wrote:
>>
>> On Thu, Mar 06, 2025 at 04:27:49PM -0800, Yeoul Na wrote:
>>> Thanks for writing up the RFC and keeping us in the loop. Are
>>> you planning to add “__self.” to GCC's C++ compiler as well in
>>
>> Isn't this strictly a C feature? [1]
>
> No, while -fbounds-safety itself is currently implemented for C only, the
> bounds annotations will be used in C++, in order to secure boundaries between
> C and C++. This means C++ should be able to parse the bounds annotations and
> understand them. We already use them internally for that purpose, and Devin
> recently presented the vision as part of his presentation to the LLVM Memory
> Safety Working Group. Therefore, compatibility with C++ is a hard requirement
> for us, and having to always write “__self.” would be a problem. To be clear,
> I am not opposed to introducing “__self” and using it as a suppression
> mechanism for warnings on name conflicts, but I am opposed to always requiring
> “__self”.

Right. Idiomatic C++ APIs are less likely to use separate pointer/length
parameters, especially now that we have std::span, but C++ code still
often has to call C APIs that are written this way, and it’s not like
that C++ code is any more trustworthy than corresponding C code would
be. Using C++ should not be a way to bypass safety checks.

Additionally, std::spans do come from somewhere, and we do still want to
be able to check that they’re constructed safely in C++. For example,
std::span has both a pointer/length constructor and a start/end constructor.
Now, those specifically could maybe just be special-cased by the compiler,
but since std::span is a relatively recent feature (C++20), many projects
have their own equivalents (such as LLVM’s ArrayRef) that would certainly
need an annotation.
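
As a rough C analogue of the two constructor shapes mentioned above, here
is a minimal sketch. The type and function names (int_span, span_from_count,
span_from_range) are hypothetical and only illustrate where a bounds
annotation would have to tie a pointer to its length; they are not taken
from any real library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical span-like type. In a bounds-annotated dialect, ptr would
   carry an annotation saying it points to len elements. */
struct int_span {
    int *ptr;
    size_t len;
};

/* Pointer/length form: the caller claims that ptr has len elements. */
struct int_span span_from_count(int *ptr, size_t len)
{
    struct int_span s = { ptr, len };
    return s;
}

/* Start/end form: the length is derived from two pointers that must
   point into (or one past the end of) the same array. */
struct int_span span_from_range(int *first, int *last)
{
    assert(first <= last);
    struct int_span s = { first, (size_t)(last - first) };
    return s;
}
```

Both forms rest on a claim the compiler cannot see without an annotation:
that the pointer really has that many elements behind it.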

>>> the future? The problem we have with “__self” being the default
>>> way of annotating bounds is C++ compatibility: bounds
>>> annotations are supposed to work in headers shared between C and C++,
>>> and C++ should be able to parse them to secure the boundary between the
>>
>> But this is just a header macro definition issue, not a language issue.
>> There's plenty of C-only stuff in headers, but it's trivial to avoid
>> parsing it in C++:
>>
>> #ifndef __cplusplus
>> # define __counted_by(x)     __attribute__((__counted_by__(x)))
>> #else
>> # define __counted_by(x)
>> #endif
>>
>> Linux is actually already doing a form of this so we can use counted_by
>> in UAPI header files.
>>
>>> two languages. Another problem is the usability. The user will have to
>>> write more code “__self.” all the time in the most common use cases,
>>> which would be a huge regression for the usability of the language.
>>
>> I think it's an overstatement to consider it a "huge regression" when
>> this is a new feature. Also, again, this is trivially solved with
>> macros. I'm fully expecting to make the "__self" transition in Linux by
>> just adding a new macro for the expression-capable counted_by and
>> adjusting the existing macro:
>>
>> #define __counted_by(x)      __attribute__((__counted_by__(__self.x)))
>> #define __counted_by_expr(x) __attribute__((__counted_by__(x)))
>>
>> This will immediately work for all users of the existing feature in
>> Linux.
>
> IMHO, once we require “__self”, hiding it under a macro like this will create
> extra confusion for users, because it will still teach them the version of
> the programming model that did not have “__self”. It may be even more
> confusing if some macros are defined to hide it and some are not. This sounds
> like we would introduce “__self” only to hide it anyway. I think we need
> diagnostics and suppression mechanisms (e.g., __self., __builtin_global_ref()),
> but the suppression mechanisms should be used only when necessary.

Right. I personally am not at all worried that users will be confused
about what an unqualified field name means in this position, but to the
extent that we take that concern seriously, it’s about the readability
of the raw tokens in the source file, not about what the parser sees.

>>>> On Mar 6, 2025, at 2:03 PM, Yeoul Na <yeoul...@apple.com> wrote:
>>>>
>>>> + John & Félix & Patryk & Henrik
>>>>
>>>>> On Mar 6, 2025, at 1:44 PM, Qing Zhao <qing.z...@oracle.com> wrote:
>>>>> [...]
>>>>> This code is bad. The size expression "n+10" of the VLA "a" follows the
>>>>> default scoping rule of C; as a result, "n" refers to the local variable
>>>>> "n" that is defined outside of the structure "foo". However, the argument
>>>>> "n" of the counted_by attribute of the flexible array member b[] follows
>>>>> the new scoping rule: it refers to the member variable "n" inside this
>>>>> structure.
>>>>>
>>>>> It's clear that the current design of the counted_by argument introduced
>>>>> a new scoping rule into C, resulting in inconsistent scope resolution in
>>>>> the C language.
>>>>>
>>>>> This is a design mistake, and should be fixed.
>>>
>>> We will have a different proposal based on reporting diagnostics on the
>>> name conflicts. We need to diagnose name conflicts like the one above anyway,
>>> because in code like that the struct almost always contains a buffer
>>> and its size as fields. The program’s intention is therefore more
>>> likely to be to pick up the member `n` than some random global that
>>> happens to have the same name in the same translation unit. So
>>> we should diagnose such cases, to catch mistakes and to keep the program
>>> from silently working in an unintended way.
>>
>> I'm all for better diagnostics, but since C doesn't have a way to specify
>> scope for a named variable, I don't see how such a diagnostic would
>> be actionable.
>
> Right, when we provide diagnostics, there should be suppression mechanisms 
> (like __self, __builtin_global_ref), but that doesn’t have to be the default. 
> Another action could be to change the names. If a global variable has a name 
> like `n` conflicting with a struct field name `n`, then it’s not a good 
> variable naming practice anyway.

If we decide to diagnose conflicts as ambiguous, then yeah, that should be
totally actionable as long as we have some way to explicitly write a member
access. C programmers today don’t have a way to bypass shadowing; if they
want to use a global declaration, they have to either avoid shadowing it or
wrap it in their own declaration. It’s not a real problem, though, because
they can just rename local variables or add new functions and constants to
make it go away. The only hard restriction is that you can’t easily rename
fields, so if the programmer has a conflict and wants to name the field,
they’re stuck unless they have that member access syntax.
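
The "wrap it in their own declaration" workaround can be sketched as
follows. This is only an illustration in standard C; the names limit,
global_limit, and headroom are made up for the example.

```c
#include <assert.h>

int limit = 8;                 /* file-scope declaration */

/* Wrapper declared where nothing shadows limit, so it always reads the
   file-scope variable. */
int global_limit(void)
{
    return limit;
}

int headroom(void)
{
    int limit = 4;             /* shadows the file-scope limit in this block */

    /* A plain "limit" here names the local; the global is reachable only
       through the wrapper, or by renaming the local. */
    assert(limit <= global_limit());
    return global_limit() - limit;
}
```

A local variable can be renamed or wrapped like this at will. A struct
field that is part of an existing interface usually cannot, which is why
the field case needs its own syntax if conflicts are diagnosed.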

That said, my preference is still to just have the field name take precedence,
which sidesteps any need for disambiguation syntax and avoids this whole
problem where structs can be broken by just adding a global variable that
happens to collide with a field.
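
A minimal sketch of the kind of collision in question, assuming a compiler
that supports counted_by on flexible array members. EXAMPLE_COUNTED_BY is
a hypothetical wrapper macro that expands to nothing elsewhere, and struct
buf and buf_alloc are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper: use counted_by where available, else nothing. */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define EXAMPLE_COUNTED_BY(x) __attribute__((counted_by(x)))
# endif
#endif
#ifndef EXAMPLE_COUNTED_BY
# define EXAMPLE_COUNTED_BY(x)
#endif

/* A global that happens to share its name with the field below. */
int n = 1000;

struct buf {
    int n;                              /* number of elements in data[] */
    char data[] EXAMPLE_COUNTED_BY(n);  /* intended: the member n */
};

struct buf *buf_alloc(int count)
{
    assert(count >= 0);
    struct buf *p = malloc(sizeof(*p) + (size_t)count);
    if (p)
        p->n = count;                   /* set the count before touching data[] */
    return p;
}
```

If the field takes precedence, adding or removing the global n leaves the
meaning of the annotation unchanged. If conflicts are instead diagnosed as
ambiguous, adding the global is what would start to break struct buf.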

John.