On Thu, Mar 06, 2025 at 04:27:49PM -0800, Yeoul Na wrote: > Thanks for writing up the RFC and keeping us in the loop. Are > you planning to add “__self.” to GCC's C++ compiler as well in
Isn't this strictly a C feature? [1] > the future? The problem we have with “__self” being a default > way of annotating bounds is that C++ compatibility because bounds > annotations are supposed to work in headers shared between C and C++ > and C++ should be able to parse it to secure the boundary between the But this is just a header macro definition issue, not a language issue. There's plenty of C-only stuff in headers, but it's trivial to avoid parsing it in C++: #ifndef __cplusplus # define __counted_by(x) __attribute__((__counted_by__(x))) #else # define __counted_by(x) #endif Linux is actually already doing a form of this so we can use counted_by in UAPI header files. > two languages. Another problem is the usability. The user will have to > write more code “__self.” all the time in the most common use cases, > which would be a huge regression for the usability of the language. I think it's an overstatement to consider it a "huge regression" when this is a new feature. Also, again, this is trivially solved with macros. I'm fully expecting to make the "__self" transition in Linux by just adding a new macro for the expression-capable counted_by and adjusting the existing macro: #define __counted_by(x) __attribute__((__counted_by__(__self.(x)))) #define __counted_by_expr(x) __attribute__((__counted_by__(x))) This will immediately work for all users of the existing feature in Linux. Now, I suppose you might be trying to take into account the private use of differing non-public implementations. :) In that case, yeah, I could see there would be some work to do. But it should still be pretty easy to write a regex to do all the replacements on such a code base, and then build the results to find any globals that got included: perl -pi.old -e 's{\b__counted_by\((.*)\)([^\)]*)$}{ my $x = $1; my $end = $2; $x =~ s/([a-zA-Z_][a-zA-Z0-9_\.]*)/__self.$1/g; "counted_by($x)$end"; }ex;' > > On Mar 6, 2025, at 2:03 PM, Yeoul Na <yeoul...@apple.com> wrote: > > > > + John & Félix & Patryk & Henrik > > > >> On Mar 6, 2025, at 1:44 PM, Qing Zhao <qing.z...@oracle.com> wrote: > >> [...] > >> This code is bad. The size expression "n+10" of the VLA "a" follows the > >> default > >> scoping rule of C, as a result, "n" refers to the local variable "n" that > >> is defined > >> outside of the structure "foo"; However, the argument "n" of the > >> counted_by > >> attribute of the flexible array member b[] follows the new scoping rule, > >> it refers > >> to the member variable "n" inside this structure. > >> > >> It's clear that the current design of the counted_by argument introduced a > >> new > >> scoping rule into C, resulting an inconsistent scoping resolution > >> situation in > >> C language. > >> > >> This is a design mistake, and should be fixed. > > We will have a different proposal based on reporting diagnostics on the > name conflicts. We need to diagnose the name conflicts like above anyway > because in code like that almost always the struct contains a buffer > and its size as the fields. Given that the program’s intention would > be more likely to pick up the member `n`, instead of some random global > happened to be with the same name in the same translation unit. Therefore, > we should diagnose such cases to avoid mistakes and avoid the program > silently working with an unintended way with the user mistake. I'm all for better diagnostics, but since C doesn't have a way specify scope for a named variable, I don't see how such a diagnostic would be actionable. int nr; struct foo { int nr; u8 array[] __counted_by(nr); }; With this syntax (which would be using the "instance scope" that doesn't exist in C), there is no way in C to refer to the global "nr". Therefore, these expressions must be have an object reference, which is what __self provides. Now "__self.nr" and "nr" are distinct. If, however, you're saying that member names used in counted_by cannot alias with any globals, then I think that is overly restrictive. > Also, this program will have a different meaning in C++, so that’s another > reason to always diagnose with such ambiguity. But this feature is C only, so there is no "C++ meaning". > Also, the bounds annotation > user might have just forgotten to add “__self.” I don't have a solution for this risk, so from that perspective, it does appear that member names used in counted_by must not alias global names. But I think that is a new scope, with very weird semantics. It means that if a global appears in any header that happens to match the member name used in counted_by, the compilation fails. This seems risky too. But I think that such a construction results in a completely new scope (a namespace where all globals are merged into a given struct's member namespace) and creates a very fragile state. With __self, we are exclusively using global scope with a special "global" named __self. (It cannot be local scope because then the meaning of the counted_by expression would be expected to change if there were any local variables aliasing global variables, which is even more wild.) > because it’s so > intuitive to use the member name inside the attributes (I know what’s > “intuitive" depends on people’s background, but that’s what we > observed from massive adoption experience within Apple). This leaves > the feature error-prone, because the most intuitive syntax for bounds > annotations will be compiled into a different meaning (using the global > as the size instead of the peer member). So we should really diagnose > it even if we add “__self" to avoid the mistake. So, I think your earlier observation is correct about the "common" case for counted_by: it is just a single member name and nothing else: no expression, no globals, etc etc. I think the answer here is exactly what I suggested above, making it a macro issue, and leaving the feature itself to require __self. Then authors can easily deal with the common case not needing __self, but complex expressions can be constructed without them being ambiguous: #define __counted_by(x) __attribute__((__counted_by__(__self.(x)))) #define __counted_by_expr(x) __attribute__((__counted_by__(x))) struct easy { int length; u8 bytes[] __counted_by(length); }; struct complex { u64 long packet_info; u32 flags; u16 whole_size; u8 bytes[] __counted_by_expr(__self.whole_size - sizeof(struct complex)); }; u32 nr_cpus; struct sys_info { unsigned long details; struct cpu_info cpu[] __counted_by_expr(nr_cpus); }; > Now, if we always diagnose it, then the lookup order doesn’t really > matter anymore. That means we will have an option to keep the current > lookup rule of C, and pick up the member name only when the global name > is not available (just one possible option). I see “__self.” being > used as a suppression mechanism if the programmer cannot change the name > of the conflicting global or member. But that doesn’t mean “__self” > should be a default way of writing the code. Suppression mechanisms are > typically only used to suppress the warnings and disambiguate. And this > would mean we also need a way to disambiguate it to mean global. C++ > already has `::` but C doesn’t currently have a scope qualifier but > in order to use this new bounds safety feature, we may need to invent > something. Adding a new syntax is a risk so until we standardize it I > would suggest something like `__builtin_global_ref()` I'm not opposed to adding something like __builtin_global_ref(), but I don't think it really solves the confusion related to the existing scope resolution, as Qing showed with a VLA that uses local scope. void boo (int k) { const int n = 10; // a local variable n struct foo { int n; // a member variable n int a[n + 10]; // for VLA, this n refers to the local variable n. char b[] __counted_by(n); // aaagh }; ... } However, my "hide __self in the macro" solution doesn't make the above any less weird to read, so perhaps this isn't as useful an argument as I'd like. :) > [...] > We have a way to not change the meaning of the existing code without > introducing a new syntax, but diagnosing already error-prone code that > should apply to both VLAs and bounds annotations. We are planning to > write up a proposal to the C standard soon. What does "soon" mean? Can you send it out next week? It sounds like you're proposing adding "instance scope", which merges an object's member names with the global scope variable names. And to access global scope in the face of an aliased name, __builtin_global_ref(NAME) would be needed. (Such a builtin would be useful in other areas too, e.g. local scope to access an aliased name...) It does seem like this solves all the same problems, but I still don't like how fragile it is in the face of adding a global that might alias a used member name. When this happens in local scope, it's limited to the given function, and change any behaviors. For instance scope it means that suddenly no globals can be added that alias with used member names. And on the GCC side, all "instance scope" usage would just have parsing delayed until the end of the struct parisng. Thanks for the discussion! -Kees [1] https://clang.llvm.org/docs/BoundsSafety.html -- Kees Cook