Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

Yeoul Na Wed, 26 Mar 2025 10:16:57 -0700

Hi all,

Thanks for all the discussions.


I posted the design rationale for our current approach in 
https://discourse.llvm.org/t/rfc-forward-referencing-a-struct-member-within-bounds-annotations/85510.
 This clarifies some of the questions that are asked in this thread. The 
document also proposes diagnostics to mitigate potential ambiguity, and propose 
new builtins that can be used as a suppression and disambiguation mechanism.

Best regards,
Yeoul

> On Mar 26, 2025, at 9:11 AM, Yeoul Na <[email protected]> wrote:
> 
> Sorry for the delay.
> 
> I’m planning on sending out our design rationale of the current approach 
> without the new syntax today.
> 
> - Yeoul
> 
>> On Mar 14, 2025, at 9:22 PM, John McCall <[email protected]> wrote:
>> 
>> On 14 Mar 2025, at 15:18, Martin Uecker wrote:
>> 
>> Am Freitag, dem 14.03.2025 um 14:42 -0400 schrieb John McCall:
>> 
>> On 14 Mar 2025, at 14:13, Martin Uecker wrote:
>> 
>> Am Freitag, dem 14.03.2025 um 10:11 -0700 schrieb David Tarditi:
>> 
>> Hi Martin,
>> 
>> The C design of VLAs misunderstood dependent typing.
>> 
>> They probably did not care about theory, but the design is 
>> not inconsistent with theory.
>> 
>> This is almost true, but for bad reasons. The theory of dependent types is 
>> heavily concerned with deciding whether two types are the same, and C simply 
>> sidesteps this question because type identity is largely meaningless in C. 
>> Every value of variably-modified type is (or decays to) a pointer, and all 
>> pointers in C freely convert to one another (within the object/function 
>> categories). _Generic is based on type compatibility, not equality. So in 
>> that sense, the standard doesn’t say anything inconsistent with theory 
>> because it doesn’t even try to say anything.
>> 
>> The reason it is not quite true is that C does have rules for compatible and 
>> composite types, and alas, those rules for variably-modified types are not 
>> consistent with theory. Two VLA types of compatible element type are always 
>> statically considered compatible, and it’s simply UB if the sizes aren’t the 
>> same. The composite type of a VLA and a fixed-size array type is always the 
>> fixed-size array type. The standard is literally incomplete about the 
>> composite type of two VLAs; if you use a ternary operator where both 
>> operands are casts to VLA types, the standard just says it’s straight-up 
>> just undefined behavior (because one of the types has a bound that’s 
>> unevaluated) and doesn’t even bother telling us what the static type is 
>> supposed to be.
>> 
>> Yes, I guess this is all true.
>> 
>> But let's rephrase my point a bit more precisely: One could take 
>> a strict subset of C that includes variably modified types but 
>> obviously has to forbid a lot other things (e.g. arbitrary pointer 
>> conversions or unsafe down-casts and much more) and make this a 
>> memory-safe language with dependent types. This would also 
>> require adding run-time checks at certain places where there 
>> is now UB, in particular where two VLA types need to be compatible.
>> 
>> Mmm. You can certainly subset C to the point that it’s memory-safe, but
>> it wouldn’t really be anything like C anymore. As long as C has a heap,
>> I don’t see any path to achieving temporal safety without significant
>> extensions to the language. But if we’re just talking about spatial safety,
>> then sure, that could be a lot closer to C today.
>> 
>> Is that your vision, then, that you’d like to see the same sort of checks
>> that -fbounds-safety does, but you want them based firmly in the language
>> as a dynamic check triggered by pointer type conversion, with bounds
>> specified using variably-modified types? It’s a pretty elegant vision, and
>> I can see the attraction. It has some real merits, which I’ll get to below.
>> I do see at least two significant challenges, though.
>> 
>> The first and biggest problem is that, in general, array bounds can only be
>> expressed on a pointer value if it’s got pointer to array type. Most C array
>> code today works primarily with pointers to elements; programmers just use
>> array types to create concrete arrays, and they very rarely use pointers to
>> array type at all. There are a bunch of reasons for that:
>> 
>> Pointers to arrays have to be dereferenced twice: (*ptr)[idx] instead
>> of ptr[idx].
>> 
>> That makes them more error-prone, because it is easy to do pointer
>> arithmetic at the wrong level, e.g. by writing ptr[idx], which will
>> stride by multiples of the entire array size. That may even pass the
>> compiler without complaint because of C’s laxness about conversions.
>> 
>> Keeping the bound around in the pointer type is more work and doesn’t do
>> anything useful right now.
>> 
>> A lot of C programmers dislike nested declarator syntax and can’t remember
>> how it works. Those of us who can write it off the top of our heads are
>> quite atypical.
>> 
>> Now, there is an exception: you can write a parameter using an array type,
>> and it actually declares a pointer parameter. You could imagine using this
>> as a syntax for an enforceable array bound for arguments, although the
>> committee did already decide that these bounds were meaningless without
>> static. Unfortunately, you can’t do this in any other position and still
>> end up with just a pointer, so it’s not helpful as a general syntax for
>> associating bounds with pointers.
>> 
>> The upshot is that this isn’t really something people can just adopt by
>> adding annotations. It’s not just a significant rewrite, it’s a rewrite that
>> programmers will have very legitimate objections to. I think that makes this
>> at best a complement to the “sidecar” approach taken by -fbounds-safety
>> where we can track top-level bounds to a specific pointer value.
>> 
>> The second problem is that there are some extralingual problems that
>> -fbounds-safety has to solve around bounds that aren’t just local
>> evaluations of bounds expressions, and a type-conversion-driven approach
>> doesn’t help with any of them.
>> 
>> As you mentioned, the design of variably-modified types is based on
>> evaluating the bounds expression at some specific point in the program
>> execution. Since these types can only be written locally, the evaluation
>> point is obvious. If we wanted to dynamically enforce bounds during
>> initialization, it would simply be another use of the same computed bound:
>> 
>> int count = ...;
>> int (*ptr)[count * 10] = source_ptr;
>> Here we would evaluate count * 10 exactly once and use it both as (1) part
>> of the destination type when initializing ptr with source_ptr and (2)
>> part of the type of ptr for all uses of it. For example, if source_ptr
>> were of type int (*)[100], we would dynamically check that
>> count * 10 <= 100. This all works perfectly with an arbitrary bounds
>> expression; it could even contain an opaque function call.
>> 
>> Note that we don’t need any special behavior specifically for
>> initialization. If we later assign a new value into ptr, we will still be
>> converting the new value to the type int (*)[< count * 10 >], using the
>> value computed at the time of declaration of the variable. This model would
>> simply require that conversion to validate the bounds during assignment just
>> as it would during initialization.
>> 
>> Now, with nested arrays, variance does become a problem. Let’s reduce
>> bounds expression to their evaluated bounds to make this easier to write.
>> 
>> int (*)[11] can be converted to int(*)[10] because we’re simply
>> allowing fewer elements to be used.
>> By the same token, int (*(*)[11])[5] can be converted to
>> int (*(*)[10])[5]. This is the same logic as the above, just with an
>> element type that happens to be a pointer to array type.
>> But int (*(*)[11])[5] cannot be safely converted to int (*(*)[11])[4],
>> because while it’s safe to read an int (*)[4] from this array, it’s
>> not safe to assign one into it.
>> int (* const (*)[11])[5] can be safely converted to
>> int (* const (*)[11])[4], but only if this dialect also enforces const-
>> correctness, at least on array pointers.
>> Anyway, a lot of this changes if we want to use the same concept for
>> non-local pointers to arrays, because we no longer have an obvious point of
>> execution at which to evaluate the bounds expression. Instead, we are forced
>> into re-evaluating it every time we access the variable holding the array.
>> Consider:
>> 
>> struct X {
>>   int count;
>>   int (*ptr)[count * 10]; // using my preferred syntax
>> };
>> 
>> void test(struct X *xp) {
>>   // For the purposes of the conversion check here, the
>>   // source type is int (*)[< xp->count * 10 >], freshly
>>   // evaluated as part of the member access.
>>   int (*local)[100] = xp->ptr;
>> }
>> This has several immediate consequences.
>> 
>> Firstly, we need to already be able to compute the correct bound when we do
>> the dynamic checks for assignments into this field. For local variably-
>> modified types, everything in the expression was already in scope and
>> presumably initialized, so this wasn’t a problem. Here, we’re not helped
>> by scope, and we are dependent on the count field already having been
>> initialized.
>> 
>> Secondly, we must be very concerned about anything that could change the
>> result of this evaluation. So we cannot allow an arbitrary expression;
>> it must be something that we can fully analyze for what could change it.
>> And if refers to variables or fields (which it presumably always will), we
>> must prevent assignments to those, or at least validate that any
>> assignments aren’t causing unsound changes to the bound expression.
>> 
>> Thirdly, that concern must apply non-locally: if we allow the address of the
>> pointer field to be taken (which is totally fine in the local case!),
>> we can no directly reason about mutations through that pointer, so we
>> have to prevent changes to the bounds variables/fields while the pointer is
>> outstanding.
>> 
>> And finally, we must be able to recognize combinations of assignments,
>> because when we’re initializing (or completely rewriting) this structure,
>> we will need to able to assign to both count and ptr and not have the
>> same restrictions in place that we would for separate assignments.
>> 
>> None of this falls out naturally from separate, local language rules; it
>> all has to be invented for the purpose of serving this dynamic check. And
>> in fact, -fbounds-safety has to do all of this already just to make
>> basic checks involving pointers in structs work.
>> 
>> If that can all be established, though, I think the type-conversion-based
>> approach using variably-modified types has some very nice properties as a
>> complement to what we’re doing in -fbounds-safety.
>> 
>> For one, it interacts with the -fbounds-safety analysis very cleanly. If
>> bounds in types are dynamically enforced (which is not true in normal C,
>> but could be in this dialect), then the type becomes a source for reliable
>> reliable information for the bounds-safety analysis. Conversely, if
>> a pointer is converted to a variably-modified type, the analysis done
>> by -bounds-safety could be used as an input to the conversion check.
>> 
>> For another, I think it may lead towards an cleaner story for arrays of
>> pointers to arrays than -fbounds-safety can achieve today, as long as
>> the inner arrays are of uniform length.
>> 
>> But ultimately, I think it’s still at best a complement to the attributes
>> we need for -fbounds-safety.
>> 
>> John.
>> 
>

Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

Reply via email to