Yeoul,

Thanks for the writeup.
So, basically, this writeup proposes introducing a new "structure scope" (similar to the instance scope in C++) into the C language ONLY for the counted_by attribute:

1. Inside the counted_by attribute, name lookup proceeds as follows:
   A. inside the current structure first (the NEW structure scope added to C);
   B. then outside the structure (the existing C scopes, i.e. local scope or global scope).

2. To reference a variable outside the structure scope whose name conflicts with a structure member, a new builtin function, "__builtin_global_ref", is introduced for that purpose. (I think the name __builtin_global_ref might not be accurate, because the outer scope might be either the global scope or a local scope.)

3. When there is a conflict between counted_by and a VLA, such as:

   constexpr int len = 10;
   struct s {
     int len;
     int *__counted_by(len) buf;  // refers to struct member `len`.
     int arr[len];                // refers to global constexpr `len`
   };

   the compiler issues a warning asking the user to use __builtin_global_ref to disambiguate.

Is the above a correct understanding of your writeup?

From my understanding:

1. This design starts from C++'s point of view by adding a new "structure scope" to C;
2. This design conflicts with the current default scope rule for VLAs in C (which is based on the existing C scopes).

In the above example that mixes counted_by and a VLA, it is very strange that two different name lookup rules apply inside the same structure. That is clearly a design bug; either VLAs or counted_by needs to be changed to make them consistent.

I do not completely object to introducing a new "structure scope" into C, but it is hard for me to accept two different name lookup rules inside the same structure: one rule for VLAs and another for the counted_by attribute. (If a new "structure scope" is introduced into C, I think it would be better to change VLAs to use the structure scope too; I am not sure whether that is feasible.)

I still think that introducing a new keyword "__self" for referring to a member variable inside a structure, without adding a new "structure scope", would be the best approach to resolve this issue in C.

However, I am really hoping that the discussion converges soon, so I am okay with adding a new "structure scope" if most people agree on that approach.

Qing

> On Mar 26, 2025, at 12:59, Yeoul Na <yeoul...@apple.com> wrote:
>
> Hi all,
>
> Thanks for all the discussions.
>
> I posted the design rationale for our current approach in https://discourse.llvm.org/t/rfc-forward-referencing-a-struct-member-within-bounds-annotations/85510. This clarifies some of the questions that were asked in this thread. The document also proposes diagnostics to mitigate potential ambiguity, and proposes new builtins that can be used as a suppression and disambiguation mechanism.
>
> Best regards,
> Yeoul
>
>> On Mar 26, 2025, at 9:11 AM, Yeoul Na <yeoul...@apple.com> wrote:
>>
>> Sorry for the delay.
>>
>> I'm planning on sending out our design rationale for the current approach without the new syntax today.
>>
>> - Yeoul
>>
>>> On Mar 14, 2025, at 9:22 PM, John McCall <rjmcc...@apple.com> wrote:
>>>
>>> On 14 Mar 2025, at 15:18, Martin Uecker wrote:
>>> On Friday, 14.03.2025 at 14:42 -0400, John McCall wrote:
>>> On 14 Mar 2025, at 14:13, Martin Uecker wrote:
>>> On Friday, 14.03.2025 at 10:11 -0700, David Tarditi wrote:
>>>
>>> Hi Martin,
>>>
>>> The C design of VLAs misunderstood dependent typing.
>>>
>>> They probably did not care about theory, but the design is not inconsistent with theory.
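
[To make the two lookup rules discussed in Qing's message above concrete, here is a minimal sketch expanding that example. It assumes the spellings proposed in this thread (__counted_by, the proposed __builtin_global_ref builtin, and the alternative __self keyword); none of the disambiguation forms below is accepted syntax in a released compiler today, so this is illustration only.]

    constexpr int len = 10;              // file-scope constant (C23 constexpr)

    struct s {
      int len;                           // member that shadows the outer `len`
                                         // under the proposed "structure scope"

      // Proposed rule: lookup inside counted_by tries the structure scope
      // first, so this names the member `len`.
      int *__counted_by(len) buf;

      // Proposed escape hatch: explicitly name the outer (file-scope) `len`.
      int *__counted_by(__builtin_global_ref(len)) buf2;

      // Alternative proposal (no new scope): keep ordinary C lookup and use a
      // new keyword to name the member explicitly.
      int *__counted_by(__self.len) buf3;

      // Existing rule for array bounds in a struct: ordinary C lookup, so this
      // names the file-scope constexpr `len`, which is the inconsistency
      // Qing points out.
      int arr[len];
    };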
>>> This is almost true, but for bad reasons. The theory of dependent types is heavily concerned with deciding whether two types are the same, and C simply sidesteps this question because type identity is largely meaningless in C. Every value of variably-modified type is (or decays to) a pointer, and all pointers in C freely convert to one another (within the object/function categories). _Generic is based on type compatibility, not equality. So in that sense, the standard doesn't say anything inconsistent with theory because it doesn't even try to say anything.
>>>
>>> The reason it is not quite true is that C does have rules for compatible and composite types, and alas, those rules for variably-modified types are not consistent with theory. Two VLA types of compatible element type are always statically considered compatible, and it's simply UB if the sizes aren't the same. The composite type of a VLA and a fixed-size array type is always the fixed-size array type. The standard is literally incomplete about the composite type of two VLAs; if you use a ternary operator where both operands are casts to VLA types, the standard just says it's straight-up undefined behavior (because one of the types has a bound that's unevaluated) and doesn't even bother telling us what the static type is supposed to be.
>>>
>>> Yes, I guess this is all true.
>>>
>>> But let's rephrase my point a bit more precisely: one could take a strict subset of C that includes variably modified types but obviously has to forbid a lot of other things (e.g. arbitrary pointer conversions or unsafe down-casts and much more) and make this a memory-safe language with dependent types. This would also require adding run-time checks at certain places where there is now UB, in particular where two VLA types need to be compatible.
>>>
>>> Mmm. You can certainly subset C to the point that it's memory-safe, but it wouldn't really be anything like C anymore. As long as C has a heap, I don't see any path to achieving temporal safety without significant extensions to the language. But if we're just talking about spatial safety, then sure, that could be a lot closer to C today.
>>>
>>> Is that your vision, then, that you'd like to see the same sort of checks that -fbounds-safety does, but you want them based firmly in the language as a dynamic check triggered by pointer type conversion, with bounds specified using variably-modified types? It's a pretty elegant vision, and I can see the attraction. It has some real merits, which I'll get to below. I do see at least two significant challenges, though.
>>>
>>> The first and biggest problem is that, in general, array bounds can only be expressed on a pointer value if it's got pointer to array type. Most C array code today works primarily with pointers to elements; programmers just use array types to create concrete arrays, and they very rarely use pointers to array type at all. There are a bunch of reasons for that:
>>>
>>> • Pointers to arrays have to be dereferenced twice: (*ptr)[idx] instead of ptr[idx].
>>> • That makes them more error-prone, because it is easy to do pointer arithmetic at the wrong level, e.g. by writing ptr[idx], which will stride by multiples of the entire array size. That may even pass the compiler without complaint because of C's laxness about conversions.
>>> • Keeping the bound around in the pointer type is more work and doesn't do anything useful right now.
>>> • A lot of C programmers dislike nested declarator syntax and can't remember how it works. Those of us who can write it off the top of our heads are quite atypical.
>>>
>>> Now, there is an exception: you can write a parameter using an array type, and it actually declares a pointer parameter. You could imagine using this as a syntax for an enforceable array bound for arguments, although the committee did already decide that these bounds were meaningless without static. Unfortunately, you can't do this in any other position and still end up with just a pointer, so it's not helpful as a general syntax for associating bounds with pointers.
>>>
>>> The upshot is that this isn't really something people can just adopt by adding annotations. It's not just a significant rewrite, it's a rewrite that programmers will have very legitimate objections to. I think that makes this at best a complement to the "sidecar" approach taken by -fbounds-safety, where we can track top-level bounds to a specific pointer value.
>>>
>>> The second problem is that there are some extralingual problems that -fbounds-safety has to solve around bounds that aren't just local evaluations of bounds expressions, and a type-conversion-driven approach doesn't help with any of them.
>>>
>>> As you mentioned, the design of variably-modified types is based on evaluating the bounds expression at some specific point in the program execution. Since these types can only be written locally, the evaluation point is obvious. If we wanted to dynamically enforce bounds during initialization, it would simply be another use of the same computed bound:
>>>
>>>   int count = ...;
>>>   int (*ptr)[count * 10] = source_ptr;
>>>
>>> Here we would evaluate count * 10 exactly once and use it both as (1) part of the destination type when initializing ptr with source_ptr and (2) part of the type of ptr for all uses of it. For example, if source_ptr were of type int (*)[100], we would dynamically check that count * 10 <= 100. This all works perfectly with an arbitrary bounds expression; it could even contain an opaque function call.
>>>
>>> Note that we don't need any special behavior specifically for initialization. If we later assign a new value into ptr, we will still be converting the new value to the type int (*)[< count * 10 >], using the value computed at the time of declaration of the variable. This model would simply require that conversion to validate the bounds during assignment just as it would during initialization.
>>>
>>> Now, with nested arrays, variance does become a problem. Let's reduce bounds expressions to their evaluated bounds to make this easier to write.
>>>
>>> • int (*)[11] can be converted to int (*)[10] because we're simply allowing fewer elements to be used.
>>> • By the same token, int (*(*)[11])[5] can be converted to int (*(*)[10])[5]. This is the same logic as the above, just with an element type that happens to be a pointer to array type.
>>> • But int (*(*)[11])[5] cannot be safely converted to int (*(*)[11])[4], because while it's safe to read an int (*)[4] from this array, it's not safe to assign one into it.
>>> • int (* const (*)[11])[5] can be safely converted to int (* const (*)[11])[4], but only if this dialect also enforces const-correctness, at least on array pointers.
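
[To make the conversion check described above concrete: the snippet below is a hand-written analogue of the dynamic check such a dialect might insert for "int (*ptr)[count * 10] = source_ptr;". No compiler emits this today; the function name and the use of abort() as the trap are purely illustrative.]

    #include <stdlib.h>

    /* Hand-written analogue of the proposed check when converting
       int (*)[100] to int (*)[count * 10]. */
    void example(int (*source_ptr)[100], int count) {
      size_t dst_bound = (size_t)count * 10;  /* bound evaluated exactly once, here */
      size_t src_bound = 100;                 /* known statically from the source type */

      if (dst_bound > src_bound)
        abort();                              /* stand-in for the runtime trap */

      int (*ptr)[count * 10] = source_ptr;    /* the checked conversion itself */

      /* All later uses of ptr are bounded by the value of count * 10 that was
         computed at the declaration, per the usual VLA evaluation rules. */
      (void)ptr;
    }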
>>> Anyway, a lot of this changes if we want to use the same concept for non-local pointers to arrays, because we no longer have an obvious point of execution at which to evaluate the bounds expression. Instead, we are forced into re-evaluating it every time we access the variable holding the array. Consider:
>>>
>>>   struct X {
>>>     int count;
>>>     int (*ptr)[count * 10]; // using my preferred syntax
>>>   };
>>>
>>>   void test(struct X *xp) {
>>>     // For the purposes of the conversion check here, the
>>>     // source type is int (*)[< xp->count * 10 >], freshly
>>>     // evaluated as part of the member access.
>>>     int (*local)[100] = xp->ptr;
>>>   }
>>>
>>> This has several immediate consequences.
>>>
>>> Firstly, we need to already be able to compute the correct bound when we do the dynamic checks for assignments into this field. For local variably-modified types, everything in the expression was already in scope and presumably initialized, so this wasn't a problem. Here, we're not helped by scope, and we are dependent on the count field already having been initialized.
>>>
>>> Secondly, we must be very concerned about anything that could change the result of this evaluation. So we cannot allow an arbitrary expression; it must be something that we can fully analyze for what could change it. And if it refers to variables or fields (which it presumably always will), we must prevent assignments to those, or at least validate that any assignments aren't causing unsound changes to the bound expression.
>>>
>>> Thirdly, that concern must apply non-locally: if we allow the address of the pointer field to be taken (which is totally fine in the local case!), we can no longer directly reason about mutations through that pointer, so we have to prevent changes to the bounds variables/fields while the pointer is outstanding.
>>>
>>> And finally, we must be able to recognize combinations of assignments, because when we're initializing (or completely rewriting) this structure, we will need to be able to assign to both count and ptr and not have the same restrictions in place that we would for separate assignments.
>>>
>>> None of this falls out naturally from separate, local language rules; it all has to be invented for the purpose of serving this dynamic check. And in fact, -fbounds-safety has to do all of this already just to make basic checks involving pointers in structs work.
>>>
>>> If that can all be established, though, I think the type-conversion-based approach using variably-modified types has some very nice properties as a complement to what we're doing in -fbounds-safety.
>>>
>>> For one, it interacts with the -fbounds-safety analysis very cleanly. If bounds in types are dynamically enforced (which is not true in normal C, but could be in this dialect), then the type becomes a source of reliable information for the bounds-safety analysis. Conversely, if a pointer is converted to a variably-modified type, the analysis done by -fbounds-safety could be used as an input to the conversion check.
>>>
>>> For another, I think it may lead towards a cleaner story for arrays of pointers to arrays than -fbounds-safety can achieve today, as long as the inner arrays are of uniform length.
>>>
>>> But ultimately, I think it's still at best a complement to the attributes we need for -fbounds-safety.
>>>
>>> John.
>>
>
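
[To illustrate the non-local case discussed in the quoted message above, here is a hand-written analogue in plain C of the member-access check and of the paired initialization that the quoted text says a checker must reason about. The struct layout, the function names, and the use of abort() as the trap are assumptions for illustration, not anything -fbounds-safety actually generates.]

    #include <stdlib.h>

    /* The dialect's int (*ptr)[count * 10] member is modeled here as a plain
       pointer plus the count it depends on. */
    struct X {
      int count;
      int *ptr;          /* conceptually: int (*ptr)[count * 10] */
    };

    void test(struct X *xp) {
      /* The bound xp->count * 10 is re-evaluated at each member access... */
      size_t src_bound = (size_t)xp->count * 10;
      size_t dst_bound = 100;                 /* bound of the destination type */

      if (dst_bound > src_bound)
        abort();                              /* stand-in for the runtime trap */

      int *local = xp->ptr;                   /* conceptually int (*local)[100] */
      (void)local;
    }

    /* ...which is why stores to count and ptr must be treated as a pair:
       neither assignment alone preserves the invariant that ptr points to at
       least count * 10 ints. */
    void init(struct X *xp, int *buf, int n) {
      xp->count = n;
      xp->ptr = buf;     /* caller must supply at least n * 10 ints */
    }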