> On Mar 28, 2025, at 5:51 AM, Yeoul Na <yeoul...@apple.com> wrote:
>
>> On Mar 27, 2025, at 9:17 AM, Qing Zhao <qing.z...@oracle.com> wrote:
>>
>> Yeoul,
>>
>> Thanks for the writeup.
>>
>> So, basically, this writeup insisted on introducing a new “structure scope” (similar to the instance scope in C++) into the C language ONLY for the counted_by attribute:
>>
>> 1. Inside the counted_by attribute, name lookup proceeds:
>>
>>    A. inside the current structure first (the NEW structure scope added to C);
>>    B. then outside the structure (the other existing C scopes: local scope or global scope).
>>
>> 2. To reference a variable outside the structure scope whose name conflicts with a structure member, a new builtin function “__builtin_global_ref” is introduced.
>>
>> (I think the name __builtin_global_ref might not be accurate, because the outer scope might be either the global scope or a local scope.)
>
> Clarification: __builtin_global_ref sees the global scope directly. This is similar to the global scope resolution syntax (‘::’) in C++.
>
>   constexpr int len = 10;
>
>   void foo (void)
>   {
>     const int len = 20;
>
>     struct s {
>       int len;
>       int *__counted_by(__builtin_global_ref(len)) buf; // refers to global ‘len’
>     };
>   }
>
> Here are some reasons why we chose to provide a global scope resolution builtin, rather than a builtin that sees an outer scope or only a local scope:
>
> 1) The builtin is a substitute for a “scope resolution specifier”. Scope specifiers are typically meant to choose a “specific” scope.
> 2) To the best of my knowledge, there is no precedent in any other C-family language for providing scope resolution for local scopes.
> 3) Local variables that cause name conflicts can easily be renamed.
> 4) If we provided a builtin that selects the outer scope instead, there would be no way to choose a global ‘len’ when it is shadowed by a local variable, so the member would have to be renamed anyway in order to choose the global `len`.
> 5) This way, code can be written compatibly in both C and C++.
>
>> 3. Where there is a conflict between counted_by and a VLA, such as:
>>
>>   constexpr int len = 10;
>>
>>   struct s {
>>     int len;
>>     int *__counted_by(len) buf; // refers to struct member `len`.
>>     int arr[len];               // refers to global constexpr `len`
>>   };
>>
>> the compiler issues a warning asking the user to use __builtin_global_ref to disambiguate.
>
> Additionally, our proposal suggests __builtin_member_ref to explicitly refer to a member in a similar situation.
> The builtin could be replaced by ‘__self’ or some other syntax once the standard committee decides in the future, but earlier in the thread JeanHeyd pointed out that:
>
> "I would like to gently push back about __self__, or __self, or self, because all of these identifiers are fairly common identifiers in code. When I was writing the paper for __self_func ( https://thephd.dev/_vendor/future_cxx/papers/C%20-%20__self_func.html ), I searched GitHub and other source code indexing and repository services: __self, __self__, and self have a substantial amount of uses. If there's an alternative spelling to consider, I think that would be helpful."
>
> That being said, once we agree on the right syntax for accessing a member, our proposal doesn’t object to introducing it and allowing it to be used optionally.
>
> Thus, instead of trying to settle on a particular syntax right now, I think using a builtin will allow us to migrate easily to a new syntax later by guarding the current usage under a macro.
>
> Writing the builtin could be cumbersome, but it only needs to be written when there is an ambiguity. Btw, I’m open to any other name suggestions for the builtins!
>
>> Is the above a correct understanding of your writeup?
>
> Yes, it’s mostly correct, except for the clarifications I made above. Thank you!
>
>> From my understanding:
>>
>> 1. This design started from C++’s point of view, adding a new “structure scope” to C;
>> 2. This design conflicts with the current default scope rule for VLAs in C (which is based on the default C scopes).
>> In the above example that mixes counted_by and a VLA, it’s very weird that there are two different name lookup rules inside the same structure.
>> It’s clearly a design bug. Either VLA or counted_by needs to be fixed to make them consistent.
>>
>> I personally do not completely object to introducing a new “structure scope” into C, but it’s very hard for me to accept that there are two different name lookup rules inside the same structure: one rule for VLAs and another for the counted_by attribute. (If we introduce a new “structure scope” to C, I think it would be better to change VLAs to use the “structure scope” too, though I’m not sure whether this is feasible.)
>>
>> I still think that introducing a new keyword “__self” for referring to a member variable inside a structure, without adding a new “structure scope”, would be the best approach to resolve this issue in C.
>>
>> However, I really hope the discussion can converge soon. So, I am okay with adding a new “structure scope” if most people agree on that approach.
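The VLA half of the conflict Qing describes follows from ordinary C name lookup: a structure's member names live in that structure's own name space, so an identifier in an array bound inside the structure never finds them. A minimal sketch in standard C, assuming an enumerator as a stand-in for the thread's `constexpr int len = 10;` (so it also compiles before C23); `arr_elems` is a hypothetical helper for illustration. The sketch deliberately leaves out `__counted_by`, whose member-first lookup is the proposal under discussion rather than settled C.

```c
#include <stddef.h>

/* Stand-in for `constexpr int len = 10;` from the thread. An enumerator
   is also an ordinary identifier, and it works in pre-C23 compilers. */
enum { len = 10 };

struct s {
    int len;       /* member: lives in struct s's own member name space */
    int arr[len];  /* ordinary lookup finds the enumerator: 10 ints */
};

/* Hypothetical helper: reports how many elements s.arr has. */
size_t arr_elems(void) {
    struct s probe = { .len = 3 };  /* the member's value plays no role */
    return sizeof probe.arr / sizeof probe.arr[0];
}
```

Under the lookup rule proposed for counted_by, the same `len` written inside `__counted_by(len)` in this struct would instead name the member, which is exactly the two-rules-in-one-struct situation being debated.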
>
> Thanks for the flexibility!
>
>> Qing
>>
>>> On Mar 26, 2025, at 12:59, Yeoul Na <yeoul...@apple.com> wrote:
>>>
>>> Hi all,
>>>
>>> Thanks for all the discussions.
>>>
>>> I posted the design rationale for our current approach in https://discourse.llvm.org/t/rfc-forward-referencing-a-struct-member-within-bounds-annotations/85510. It clarifies some of the questions asked in this thread. The document also proposes diagnostics to mitigate potential ambiguity, and proposes new builtins that can be used as a suppression and disambiguation mechanism.
>>>
>>> Best regards,
>>> Yeoul
>>>
>>>> On Mar 26, 2025, at 9:11 AM, Yeoul Na <yeoul...@apple.com> wrote:
>>>>
>>>> Sorry for the delay.
>>>>
>>>> I’m planning on sending out our design rationale for the current approach, without the new syntax, today.
>>>>
>>>> - Yeoul
>>>>
>>>>> On Mar 14, 2025, at 9:22 PM, John McCall <rjmcc...@apple.com> wrote:
>>>>>
>>>>> On 14 Mar 2025, at 15:18, Martin Uecker wrote:
>>>>> On Friday, 14.03.2025 at 14:42 -0400, John McCall wrote:
>>>>> On 14 Mar 2025, at 14:13, Martin Uecker wrote:
>>>>> On Friday, 14.03.2025 at 10:11 -0700, David Tarditi wrote:
>>>>> Hi Martin,
>>>>> The C design of VLAs misunderstood dependent typing.
>>>>>
>>>>> They probably did not care about theory, but the design is not inconsistent with theory.
>>>>>
>>>>> This is almost true, but for bad reasons. The theory of dependent types is heavily concerned with deciding whether two types are the same, and C simply sidesteps this question because type identity is largely meaningless in C. Every value of variably-modified type is (or decays to) a pointer, and all pointers in C freely convert to one another (within the object/function categories). _Generic is based on type compatibility, not equality. So in that sense, the standard doesn’t say anything inconsistent with theory because it doesn’t even try to say anything.
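The point that _Generic keys on compatibility rather than type identity, and that C's compatibility rules ignore a VLA's runtime bound, can be observed directly. A minimal sketch, assuming a C11 (or later) compiler with VLA support; `incomplete_matches_fixed` and `vla_matches_ten` are hypothetical names for illustration.

```c
/* Hypothetical helpers: each returns 1 if _Generic picks the
   int (*)[...] association, 0 if it falls through to default. */

int incomplete_matches_fixed(void) {
    int a[10];
    /* &a has type int (*)[10]: not the same type as int (*)[],
       but compatible with it, and compatibility is all _Generic checks. */
    return _Generic(&a, int (*)[]: 1, default: 0);
}

int vla_matches_ten(int n) {     /* n must be > 0 */
    int (*q)[n] = 0;             /* pointer to a VLA of n ints; never dereferenced */
    /* The runtime bound plays no part in the compatibility check,
       so this selects 1 even when n != 10. */
    return _Generic(q, int (*)[10]: 1, default: 0);
}
```

The second helper shows the static side of the gap: the types count as compatible regardless of `n`, and nothing is checked at run time.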
>>>>>
>>>>> The reason it is not quite true is that C does have rules for compatible and composite types, and alas, those rules for variably-modified types are not consistent with theory. Two VLA types of compatible element type are always statically considered compatible, and it’s simply UB if the sizes aren’t the same. The composite type of a VLA and a fixed-size array type is always the fixed-size array type. The standard is literally incomplete about the composite type of two VLAs; if you use a ternary operator where both operands are casts to VLA types, the standard just says it’s straight-up undefined behavior (because one of the types has a bound that’s unevaluated) and doesn’t even bother telling us what the static type is supposed to be.
>>>>>
>>>>> Yes, I guess this is all true.
>>>>> But let's rephrase my point a bit more precisely: one could take a strict subset of C that includes variably modified types but obviously has to forbid a lot of other things (e.g. arbitrary pointer conversions, unsafe down-casts, and much more) and make it a memory-safe language with dependent types. This would also require adding run-time checks at certain places where there is now UB, in particular where two VLA types need to be compatible.
>>>>>
>>>>> Mmm. You can certainly subset C to the point that it’s memory-safe, but it wouldn’t really be anything like C anymore. As long as C has a heap, I don’t see any path to achieving temporal safety without significant extensions to the language. But if we’re just talking about spatial safety, then sure, that could be a lot closer to C today.
>>>>>
>>>>> Is that your vision, then: that you’d like to see the same sort of checks that -fbounds-safety does, but you want them based firmly in the language as a dynamic check triggered by pointer type conversion, with bounds specified using variably-modified types? It’s a pretty elegant vision, and I can see the attraction. It has some real merits, which I’ll get to below.
>>>>>
>>>>> I do see at least two significant challenges, though.
>>>>>
>>>>> The first and biggest problem is that, in general, array bounds can only be expressed on a pointer value if it has pointer-to-array type. Most C array code today works primarily with pointers to elements; programmers just use array types to create concrete arrays, and they very rarely use pointers to array type at all. There are a bunch of reasons for that:
>>>>>
>>>>> • Pointers to arrays have to be dereferenced twice: (*ptr)[idx] instead of ptr[idx].
>>>>> • That makes them more error-prone, because it is easy to do pointer arithmetic at the wrong level, e.g. by writing ptr[idx], which will stride by multiples of the entire array size. That may even pass the compiler without complaint because of C’s laxness about conversions.
>>>>> • Keeping the bound around in the pointer type is more work and doesn’t do anything useful right now.
>>>>> • A lot of C programmers dislike nested declarator syntax and can’t remember how it works. Those of us who can write it off the top of our heads are quite atypical.
>>>>>
>>>>> Now, there is an exception: you can write a parameter using an array type, and it actually declares a pointer parameter. You could imagine using this as a syntax for an enforceable array bound for arguments, although the committee already decided that these bounds are meaningless without static. Unfortunately, you can’t do this in any other position and still end up with just a pointer, so it’s not helpful as a general syntax for associating bounds with pointers.
>>>>>
>>>>> The upshot is that this isn’t really something people can just adopt by adding annotations. It’s not just a significant rewrite, it’s a rewrite that programmers will have very legitimate objections to. I think that makes this at best a complement to the “sidecar” approach taken by -fbounds-safety, where we can track top-level bounds to a specific pointer value.
>>>>>
>>>>> The second problem is that there are some extralingual problems that -fbounds-safety has to solve around bounds that aren’t just local evaluations of bounds expressions, and a type-conversion-driven approach doesn’t help with any of them.
>>>>>
>>>>> As you mentioned, the design of variably-modified types is based on evaluating the bounds expression at some specific point in the program execution. Since these types can only be written locally, the evaluation point is obvious. If we wanted to dynamically enforce bounds during initialization, it would simply be another use of the same computed bound:
>>>>>
>>>>>   int count = ...;
>>>>>   int (*ptr)[count * 10] = source_ptr;
>>>>>
>>>>> Here we would evaluate count * 10 exactly once and use it both as (1) part of the destination type when initializing ptr with source_ptr and (2) part of the type of ptr for all uses of it. For example, if source_ptr were of type int (*)[100], we would dynamically check that count * 10 <= 100. This all works perfectly with an arbitrary bounds expression; it could even contain an opaque function call.
>>>>>
>>>>> Note that we don’t need any special behavior specifically for initialization. If we later assign a new value into ptr, we will still be converting the new value to the type int (*)[< count * 10 >], using the value computed at the time of declaration of the variable. This model would simply require that conversion to validate the bounds during assignment just as it would during initialization.
>>>>>
>>>>> Now, with nested arrays, variance does become a problem. Let’s reduce bounds expressions to their evaluated bounds to make this easier to write.
>>>>>
>>>>> • int (*)[11] can be converted to int (*)[10] because we’re simply allowing fewer elements to be used.
>>>>> • By the same token, int (*(*)[11])[5] can be converted to int (*(*)[10])[5]. This is the same logic as above, just with an element type that happens to be a pointer to array type.
>>>>> • But int (*(*)[11])[5] cannot be safely converted to int (*(*)[11])[4], because while it’s safe to read an int (*)[4] from this array, it’s not safe to assign one into it.
>>>>> • int (* const (*)[11])[5] can be safely converted to int (* const (*)[11])[4], but only if this dialect also enforces const-correctness, at least on array pointers.
>>>>>
>>>>> Anyway, a lot of this changes if we want to use the same concept for non-local pointers to arrays, because we no longer have an obvious point of execution at which to evaluate the bounds expression. Instead, we are forced into re-evaluating it every time we access the variable holding the array. Consider:
>>>>>
>>>>>   struct X {
>>>>>     int count;
>>>>>     int (*ptr)[count * 10]; // using my preferred syntax
>>>>>   };
>>>>>
>>>>>   void test(struct X *xp) {
>>>>>     // For the purposes of the conversion check here, the
>>>>>     // source type is int (*)[< xp->count * 10 >], freshly
>>>>>     // evaluated as part of the member access.
>>>>>     int (*local)[100] = xp->ptr;
>>>>>   }
>>>>>
>>>>> This has several immediate consequences.
>>>>>
>>>>> Firstly, we need to already be able to compute the correct bound when we do the dynamic checks for assignments into this field. For local variably-modified types, everything in the expression was already in scope and presumably initialized, so this wasn’t a problem. Here, we’re not helped by scope, and we are dependent on the count field already having been initialized.
>>>>>
>>>>> Secondly, we must be very concerned about anything that could change the result of this evaluation. So we cannot allow an arbitrary expression; it must be something that we can fully analyze for what could change it. And if it refers to variables or fields (which it presumably always will), we must prevent assignments to those, or at least validate that any assignments aren’t causing unsound changes to the bound expression.
>>>>>
>>>>> Thirdly, that concern must apply non-locally: if we allow the address of the pointer field to be taken (which is totally fine in the local case!), we can no longer directly reason about mutations through that pointer, so we have to prevent changes to the bounds variables/fields while the pointer is outstanding.
>>>>>
>>>>> And finally, we must be able to recognize combinations of assignments, because when we’re initializing (or completely rewriting) this structure, we will need to be able to assign to both count and ptr without the same restrictions in place that we would have for separate assignments.
>>>>>
>>>>> None of this falls out naturally from separate, local language rules; it all has to be invented for the purpose of serving this dynamic check. And in fact, -fbounds-safety already has to do all of this just to make basic checks involving pointers in structs work.
>>>>>
>>>>> If that can all be established, though, I think the type-conversion-based approach using variably-modified types has some very nice properties as a complement to what we’re doing in -fbounds-safety.
>>>>>
>>>>> For one, it interacts with the -fbounds-safety analysis very cleanly. If bounds in types are dynamically enforced (which is not true in normal C, but could be in this dialect), then the type becomes a source of reliable information for the bounds-safety analysis. Conversely, if a pointer is converted to a variably-modified type, the analysis done by -fbounds-safety could be used as an input to the conversion check.
>>>>>
>>>>> For another, I think it may lead towards a cleaner story for arrays of pointers to arrays than -fbounds-safety can achieve today, as long as the inner arrays are of uniform length.
>>>>>
>>>>> But ultimately, I think it’s still at best a complement to the attributes we need for -fbounds-safety.
>>>>>
>>>>> John.
>>>>
>>>
>>
>
> Yeoul