On Thu, Mar 6, 2025 at 4:28 PM Yeoul Na <yeoul...@apple.com> wrote:
>
> Hi Qing,
>
> Thanks for writing up the RFC and keeping us in the loop. Are you planning to
> add “__self.” to GCC's C++ compiler as well in the future? The problem we
> have with “__self” being a default way of annotating bounds is that C++
> compatibility because bounds annotations are supposed to work in headers
> shared between C and C++ and C++ should be able to parse it to secure the
> boundary between the two languages. Another problem is the usability. The
> user will have to write more code “__self.” all the time in the most common
> use cases, which would be a huge regression for the usability of the language.

There are usability concerns, but they aren't as important as conflicting with
current symbol resolution methods. In the thread on Discourse, it was brought
up (paraphrased badly) that using the designator syntax in the suggested way
may conflict with future C standard committee syntax decisions. What we have
now conflicts with C's current syntax.

> We are planning to write up alternative proposal without having to introduce
> a new syntax to the C standard. We’ll discuss how we address problems raised
> here. Please see my inlined comments.

Thanks! :-)

> Best,
> Yeoul
>
> > On Mar 6, 2025, at 2:03 PM, Yeoul Na <yeoul...@apple.com> wrote:
> >
> > + John & Félix & Patryk & Henrik
> >
> >> On Mar 6, 2025, at 1:44 PM, Qing Zhao <qing.z...@oracle.com> wrote:
> >>
> >> Hi,
> >>
> >> Since I sent the patch series for “extend counted_by attribute to pointer
> >> fields of structure” two months ago, a lot of discussion were invoked both
> >> in GCC community and CLANG community:
> >>
> >> https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673837.html
> >> https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854/131?u=gwelymernans
> >>
> >> After reading all these discussions, understanding, studying, more
> >> discussions, and finally making the whole picture clearer, we came up with
> >> a proposal to change the current design and add a new syntax for the
> >> argument of counted_by attribute.
> >>
> >> The original idea of the new syntax was from Joseph, Michael and Martin,
> >> Bill and Kees involved in the whole process of the proposal, providing a
> >> lot of suggestions and comments. Really appreciate for the help from all
> >> of them.
> >> > >> In this thread, I am also CC’ing several people from Apple who worked on > >> the -fbounds-safety > >> project on CLANG side: yeoul...@apple.com <mailto:yeoul...@apple.com>, > >> d_tard...@apple.com <mailto:d_tard...@apple.com>, dl...@apple.com > >> <mailto:dl...@apple.com>, > >> and dcough...@apple.com <mailto:dcough...@apple.com>. > >> > >> Please take a look at the proposal in below. > >> > >> Let me know if you have any comments and suggestions. > >> > >> Thanks. > >> > >> Qing. > >> > >> ========================================= > >> > >> New syntax for the argument of counted_by attribute > >> --An extension to C language > >> > >> Outline > >> > >> 0. A simple summary of the proposal > >> > >> 1. The motivation > >> 1.1 The current syntax of the counted_by argument might break existing > >> legal C code > >> 1.2 New requests from the users of the counted_by attribute > >> 1.2.1 Refer to a field in the nested structure > >> 1.2.2 Refer to globals or locals > >> 1.2.3 Represent simple expression > >> 1.2.4 Forward referencing > >> > >> 2. The requirement > >> > >> 3. The proposed new syntax > >> 3.1 Legal C code with VLA works correctly when mixing with counted_by > >> 3.2 Satisfy all the new requests > >> 3.2.1 Refer to a field in the nested structure > >> 3.2.2 Refer to globals or locals > >> 3.2.3 Represent simple expression > >> 3.3 How to resolve the forward reference issue in section 1.2.4? > >> > >> Appendix A: Scope of variables in C and C++ > >> --The hints to the design of counted_by in C > >> Appendix B: An example in linux kernel that the global cannot be "const" > >> qualified > >> > >> > >> 0. 
A simple summary of the proposal > >> > >> We propose a new syntax to the argument of the counted_by attribute: > >> * Introduce a new keyword, __self, to represent the new concept, > >> "the current object" of the nearest non-anonymous enclosing structure, > >> which allows the object of the structure to refer to its own member inside > >> the structure definition. > >> > >> * With the new keyword, __self, the member variable can be referenced > >> by appending the member access operator "." to "__self", such as, > >> __self.member. > >> > >> * This new keyword is invalid except in the bounds checking attributes, > >> such as "counted_by", etc., inside a structure definition. > >> > >> * Simple expression is enabled by this new keyword inside the attribute > >> counted_by with the following limitation: > >> A. no side-effect is allowed; > >> and > >> B. the operators of the expression are simple arithmetic operators, and the > >> operands could be one of: > >> B.1 __self.member or __self.member1.member2...(for nested structure); > >> B.2 constant; > >> B.3 locals that will not be changed after initialization; > >> B.4 globals that will not be changed after initialization; > >> > >> > >> 1. The motivation > >> > >> There are two major motivations for this new syntax. > >> > >> 1.1 The current syntax of the counted_by argument might break existing > >> legal C code > >> > >> The counted_by attribute is currently defined as: > >> (https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-counted_005fby-variable-attribute) > >> > >> counted_by (count) > >> The counted_by attribute may be attached to the C99 flexible array member > >> of a structure. It indicates that the number of the elements of the array > >> is > >> given by the field "count" in the same structure as the flexible array > >> member. 
> >> > >> For example: > >> > >> int count; > >> struct X { > >> int count; > >> char array[] __attribute__ ((counted_by (count))); > >> }; > >> > >> In the above, the argument of the attribute "count" is an identifier that > >> will be > >> looked up in the scope of the enclosing structure "X". Due to this new > >> scope > >> of variable, the identifier "count" refers to the member variable "count" > >> of this > >> structure but not the global variable defined outside of the structure. > >> > >> This is a new scope of variable that is added to the C language. In C, the > >> default available scopes of variable include only two scopes, global scope > >> and local scope. > >> > >> The global scope refers to the region outside any function or block. > >> The variables declared here are accessible throughout the entire program. > >> > >> The local scope refers to the region enclosed between the { } braces, which > >> represent the boundary of a function or a block inside functions. The > >> variables > >> declared within a function or a block are only accessible locally inside > >> that > >> function or that block and other blocks nested inside. > >> > >> (Please see Appendix A for more details on scope of variables in C and C++ > >> and why the current design of counted_by attribute is a disaster to C) > >> > >> Note, the { } brace that marks the boundary of a structure does not change > >> the current scope of the variable with the default scoping rules in C. > >> > >> As a result, in the above example, with C's default scoping rule, the > >> "count” > >> inside counted_by attribute _should_ refer to the global variable "count" > >> but > >> not the member variable in the enclosing structure. > >> > >> A more compelling example is shown below when mixing counted_by attribute > >> with C's Variable Length Array (VLA). 
> >>
> >> void boo (int k)
> >> {
> >>   const int n = 10;  // a local variable n
> >>   struct foo {
> >>     int n;           // a member variable n
> >>     int a[n + 10];   // for VLA, this n refers to the local variable n.
> >>     char b[] __attribute__ ((counted_by(n)));
> >>                      // for counted_by, this n refers to the member variable n.
> >>   };
> >> }
> >>
> >> This code is bad. The size expression "n+10" of the VLA "a" follows the
> >> default scoping rule of C, as a result, "n" refers to the local variable
> >> "n" that is defined outside of the structure "foo"; However, the argument
> >> "n" of the counted_by attribute of the flexible array member b[] follows
> >> the new scoping rule, it refers to the member variable "n" inside this
> >> structure.
> >>
> >> It's clear that the current design of the counted_by argument introduced a
> >> new scoping rule into C, resulting an inconsistent scoping resolution
> >> situation in C language.
> >>
> >> This is a design mistake, and should be fixed.
>
> We will have a different proposal based on reporting diagnostics on the name
> conflicts. We need to diagnose the name conflicts like above anyway because
> in code like that almost always the struct contains a buffer and its size as
> the fields. Given that the program’s intention would be more likely to pick
> up the member `n`, instead of some random global happened to be with the same
> name in the same translation unit. Therefore, we should diagnose such cases
> to avoid mistakes and avoid the program silently working with an unintended
> way with the user mistake. Also, this program will have a different meaning
> in C++, so that’s another reason to always diagnose with such ambiguity.
> Also, the bounds annotation user might have just forgotten to add “__self.”
> because it’s so intuitive to use the member name inside the attributes (I
> know what’s “intuitive" depends on people’s background, but that’s what we
> observed from massive adoption experience within Apple).
> This leaves the feature error-prone, because the most intuitive syntax for
> bounds annotations will be compiled into a different meaning (using the
> global as the size instead of the peer member). So we should really diagnose
> it even if we add “__self" to avoid the mistake.

We can, of course, address any error-prone mistakes with diagnostics. The issue
is that the code above is perfectly correct with our current syntax. So what
kind of diagnostics would you add for it that makes the scoping issue go away?

> Now, if we always diagnose it, then the lookup order doesn’t really matter
> anymore. That means we will have an option to keep the current lookup rule of
> C, and pick up the member name only when the global name is not available
> (just one possible option). I see “__self.” being used as a suppression
> mechanism if the programmer cannot change the name of the conflicting global
> or member. But that doesn’t mean “__self” should be a default way of writing
> the code. Suppression mechanisms are typically only used to suppress the
> warnings and disambiguate. And this would mean we also need a way to
> disambiguate it to mean global. C++ already has `::` but C doesn’t currently
> have a scope qualifier but in order to use this new bounds safety feature, we
> may need to invent something. Adding a new syntax is a risk so until we
> standardize it I would suggest something like `__builtin_global_ref()`

Using '__self' only to disambiguate a potential conflict may be enough, and
would address the usability issue (even though I said it wasn't as important, I
don't believe it's something we should simply ignore either). Though we're
still essentially introducing an entirely new scoping rule into C. That's a far
larger undertaking than adding a syntax to attributes, and I for one am nervous
about doing so, because far lesser features are debated by the standards
committee to ensure that they're done correctly.
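
For reference, here is a minimal sketch of the lookup rule C applies today. It
uses an enumeration constant instead of a VLA so that it stays plain ISO C,
and the names (`n`, `struct foo`, `elems_in_a`, `check_scoping`) are only
illustrative, not taken from any patch. Member names live in their own name
space in C, so the `n` in the array bound should resolve to the file-scope
constant and not to the member declared just above it:

```c
#include <assert.h>
#include <stddef.h>

enum { n = 4 };              /* ordinary identifier at file scope */

struct foo {
    int n;                   /* member: lives in a separate name space in C */
    int a[n + 10];           /* this n is the enum constant, so a has 14 elements */
};

/* Number of elements in foo.a; sizeof does not evaluate its operand. */
size_t elems_in_a(void)
{
    return sizeof(((struct foo *)0)->a) / sizeof(int);
}

/* Illustrative check of the rule described above. */
void check_scoping(void)
{
    assert(elems_in_a() == 14);
}
```

I would expect a C++ compiler to reject the same struct, because there the `n`
in the bound names the non-static member. That is the C/C++ divergence any
design here has to live with.
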
> >> 1.2 New requests from the users of the counted_by attribute > >> > >> The counted_by attribute for Flexible Array Member (FAM) has been adopted > >> in > >> Linux Kernel extensively. New requests came in in order to cover more > >> cases. > >> > >> 1.2.1 Refer to a field in the nested structure > >> > >> This was requested from linux kernel. > >> https://www.spinics.net/lists/linux-rdma/msg127560.html > >> > >> A simplified testing case is: > >> > >> struct Y { > >> int n; > >> int other; > >> } > >> > >> struct Z { > >> struct Y y; > >> int array[] __attribute__ ((counted_by(?y.n))); > >> }; > >> > >> in the above, what should be put instead of "?" to refer to the field "n" > >> of the > >> field "y" of the current object of this struct Z? > >> > >> NOTE, we should completely reject the use cases that refer to a field in an > >> outer structure from an inner non-anonymous structure, such as: > >> > >> struct A { > >> int count; > >> struct B { > >> int other; > >> int z[] __attribute__ ((counted_by(?))); > >> } b; > >> }; > >> > >> In the above, we should not allow the counted_by "?" of the FAM field "z" > >> of > >> the struct B to refer to the member variable "count" of the outer struct A. > >> Otherwise, when an object with the struct B is passed to a function, there > >> will be error when refer to the counted_by of its field "z". > >> > >> However, the counted_by attribute of a field in the inner anonymous > >> structure > >> should be allowed to refer to a field of the outer structure. Since the > >> inner > >> anonymous structure can not be used independently of its enclosing > >> structure, > >> such as: > >> > >> struct A { > >> int count; > >> struct { > >> int other; > >> int z[] __attribute__ ((counted_by(count))); > >> }; > >> } a; > >> > >> In the above testing case, the counted_by attribute for the field "z" of > >> the inner > >> anonymous structure should be able to refer to the field of the outer > >> structure. 
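
To make the distinction above concrete, here is a small sketch in plain C11
with no counted_by involved. `A_named`, `A_anon`, and the helper functions are
names I made up for illustration. A named inner struct is a type of its own and
can be handed around without its parent, while the members of an anonymous
struct only exist as part of the enclosing object:

```c
#include <assert.h>
#include <stddef.h>

struct A_named {
    int count;
    struct B {               /* in C, struct B is also usable outside A_named */
        int other;
    } b;
};

struct A_anon {
    int count;
    struct {                 /* C11 anonymous struct: members belong to A_anon */
        int other;
    };
};

/* Receives a struct B with no enclosing A_named, so there is no "count" to consult. */
int other_of(const struct B *b)
{
    assert(b != NULL);
    return b->other;
}

/* Members of the anonymous struct are reached directly through the outer object. */
int anon_sum(void)
{
    struct A_anon a;
    a.count = 3;
    a.other = 4;
    return a.count + a.other;
}
```

Calling `other_of` on a free-standing `struct B` is the case where an outer
`count` simply is not available; `anon_sum` shows that `a.other` and `a.count`
are peers in the anonymous case.
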
>
> I couldn’t get the relation between the named nested struct and anonymous
> struct here. Members of anonymous structure are essentially part of the outer
> struct. And the members are already accessed the same as direct members of
> the outer struct. It should work as below:
>
> struct A {
>   int count;
>   struct B {
>     int other;
>     int z[] __attribute__ ((counted_by(count))); // error: reference to undefined identifier `count`.
>   } b;
> };
>
> struct A {
>   int count;
>   struct {
>     int other;
>     int z[] __attribute__ ((counted_by(count))); // works as members of anonymous structure is part of structure A
>   };
> } a;
>
> So I don’t see why this will prevent us from doing (counted_by(y.n)) without
> needing any additional prefix.

I think she said that it *would* work that way. :-)

> >> 1.2.2 Refer to globals or locals
> >>
> >> One request from linux kernel is here:
> >> https://lore.kernel.org/all/202309221128.6AC35E3@keescook/
> >>
> >> A simple example is:
> >>
> >> int count;// global variable
> >> struct X {
> >>   int count; // member variable
> >>   char array[] __attribute__ ((counted_by(??count)));
> >>   // How to refer to the global variable "count"
> >>   // but not the member variable "count" of the struct X?
> >> }
> >>
> >> when the counted_by attribute tries to refer to the global variable
> >> "count” outside the structure, how to distinguish it with its member
> >> variable "count"?
>
> Again, this should be diagnosed and the programmer either needs to change the
> name or use a suppression mechanism. As I suggested earlier we can introduce
> something like __builtin_global_ref(), until we get a blessing from the C
> committee to add a scope qualifier syntax in C.
>
> >>
> >> NOTE, Users need to make sure that the global or local variables should
> >> not be changed after they are initialized; otherwise, the results of the
> >> array bound sanitizer and the __builtin_dynamic_object_size is undefined.
> >> > >> Theoretically, We should limit the globals and locals ONLY to const > >> qualified > >> globals and locals to avoid abusing of this feature in the future. > >> However, due > >> to the existing code in linux kernel cannot be easily changed with const > >> qualifier. > >> We have to relax the limitation. See Appendix B for such an example in > >> linux kernel. > >> > >> In the future language extension, We should limit the globals and locals > >> ONLY > >> to const qualified globals and locals. > >> > >> 1.2.3 Represent simple expression > >> > >> This was requested multiple times from Linux kernel. One of the requests > >> is: > >> https://lore.kernel.org/lkml/20210727205855.411487-63-keesc...@chromium.org/ > >> > >> For example: > >> > >> int elm_size; > >> struct X { > >> int count; > >> char array[] __attribute__ ((counted_by(?count * elm_size))); > >> } > >> > >> in the above, what should be put instead of "?" to represent this simple > >> expression? > > It should just work without any prefix because there’s no name conflict here, > it will be clear what each unqualified name is referring to. As stated above, it's not about resolving symbol conflicts but about how we've added a completely new scoping rule to C. It's modeled off of C++'s class scoping rules, but it's a new concept to C. > constexpr int elm_size; > struct X { > int count; > char array[] __attribute__ ((counted_by(count * elm_size))); > } > > I think this is not too different from this: > > int elem_size; > int foo(void) { > int count; > return count * elem_size; > }; > > > > >> > >> NOTE, We should limit simple expressions to: > >> > >> A. no side-effect is allowed, > >> and > >> B. 
the operators of the expression are simple arithmetic operators, and > >> the operands > >> could be one of the following: > >> B.1 the member variable of the enclosing structure or inner structure of > >> the enclosing structure; > >> B.2 constant; > >> B.3 locals that will not be changed after initialization; > >> B.4 globals that will not be changed after initialization; > >> > >> 1.2.4 Forward referencing > >> > >> This request is only for counted_by attribute of pointers. Since the > >> flexible array > >> members(FAM) are always the last field of the containing structure, forward > >> reference issue does not exist for counted_by of FAM. > >> > >> How should we handle the situation when the counted_by attribute refers to > >> a member variable that is declared after the pointer field in the > >> structure? > >> > >> For example: > >> > >> struct bar { > >> char *array __attribute__ ((counted_by(??count))); > >> int count; } > >> > >> in the above, how can we refer to the field "count" that is declared after > >> the > >> pointer field "array" in the structure? > > We should be able to refer to an undeclared field anyway even with “__self." > no? “__self.” doesn’t solve the problem that you should still be able to > forward reference a member. No, it doesn't. Section 3.3 below offers some suggestions on how to deal with forward references. > >> 2. The requirement: > >> > >> This is an extension to C language, We should avoid adding a new scope of > >> variable (as the current syntax of the counted_by attribute for FAM) to > >> break > >> the existing legal C code. We should follow the default C language scoping > >> rules, keep the current valid C code working properly. > > We have a way to not change the meaning of the existing code without > introducing a new syntax, but diagnosing already error-prone code that should > apply to both VLAs and bounds annotations. We are planning to write up a > proposal to the C standard soon. 
I'm not sure how diagnostics alone will solve the issues here. Some diagnostics
can be suppressed. If you make them unsuppressable, then you've effectively
added a new syntax.

> >> 3. The proposed new syntax:
> >>
> >> * Keep the default C scoping rules.
> >>
> >> * Introduce a new keyword, __self, to represent the new concept, "the
> >>   current object” of the nearest non-anonymous enclosing structure, which
> >>   allows the object of the structure to refer to its own member inside the
> >>   structure definition. This is similar as the concept of "this" in C++,
> >>   except that __self should be treated as a special variable but not a
> >>   pointer.
> >>
> >> * With the new keyword, __self, the member variable can be referenced by
> >>   appending the member access operator "." to "__self", such as,
> >>   __self.member. This is similar as referring a member variable through a
> >>   variable with the structure type in the C language.
> >>
> >> * This new keyword is invalid except in the bounds checking attributes,
> >>   such as "counted_by", etc., inside a structure definition.
> >>
> >> * Simple expression is allowed inside the attribute counted_by with the
> >>   following limitation:
> >>
> >>   A. no side-effect is allowed,
> >>   and
> >>   B.
the operators of the expression are simple arithmetic operators, and > >> the operands > >> could be one of: > >> B.1 __self.member or __self.member1.member2...(for nested structure); > >> B.2 constant; > >> B.3 locals that will not be changed after initialization; > >> B.4 globals that will not be changed after initialization; > >> > >> With the new syntax, the problems 1.1, 1.2.1 and 1.2.2 and 1.2.3 can be > >> resolved > >> naturally as following: > >> > >> 3.1 Legal C code with VLA works correctly when mixing with counted_by > >> > >> The previously bad code mixing with VLA is now: > >> > >> void boo (int k) > >> { > >> const int n = 10; // a local variable n > >> struct foo { > >> int n; // a member variable n > >> int a[n + 10]; // for VLA, this n refers to the local variable n. > >> char b[] __attribute__ ((counted_by(__self.n))); > >> // for counted_by, this __self.n refers to the member variable n. > >> }; > >> } > >> > >> Now, We keep the default C scoping rule and make the counted_by referring > >> to member variable in the same structure correctly without ambiguity. > >> > >> 3.2 Satisfy all the new requests > >> > >> With this new syntax, all the new requests in section 1.2 (except 1.2.4 > >> Forward > >> referencing) are resolved naturally. > >> > >> 3.2.1 Refer to a field in the nested structure > >> > >> struct Y { > >> int n; > >> int other; > >> } > >> > >> struct Z { > >> struct Y y; > >> int *array __attribute__ ((counted_by(__self.y.n))); > >> }; > >> > >> 3.2.2 Refer to globals or locals > >> > >> int count; > >> struct X { > >> char others; > >> char array[] __attribute__ ((counted_by(count))); > >> } > >> > >> Since the new syntax keeps the default scoping rule of C language, the > >> "count” > >> without any prefix inside the counted_by attribute refers to the current > >> visible > >> variable in the current scope, that is the global variable "count”. 
> >> > >> 3.2.3 Represent simple expression > >> > >> When we can distinguish globals/locals from the member variables with this > >> new syntax, simple expressions are represented naturally: > >> > >> int elm_size; > >> struct X { > >> int count; > >> int *array __attribute__ ((counted_by(__self.count * elm_size))); > >> } > >> > >> More complicated example: > >> > >> struct foo { > >> int n; > >> float f; > >> } > >> > >> A. > >> #define NETLINK_HEADER_BYTES 8 > >> struct bar1 { > >> struct foo y[5][10]; > >> char *array __attribute__ ((counted_by(__self.y[1][3].n - > >> NETLINK_HEADER_BYTES))); > >> } > >> > >> B. struct bar2 { > >> int n; > >> char *array __attribute__ ((counted_by((struct foo){.n = 4 }.n))); > >> }; > >> > >> C. > >> struct bar3 { > >> int n; > >> char *array __attribute__ ((counted_by((struct foo){.n = 4 }.n + > >> __self.n))); > >> }; > >> > >> > >> 3.3 How to resolve the forward reference issue in section 1.2.4? > >> > >> The new syntax naturally resolved all the problems we listed in section 1.2 > >> except the forward reference issue: > >> > >> If the member variable that is referred inside the counted_by is declared > >> after > >> the pointer field with the counted_by attribute, such as: > >> > >> struct bar { > >> char *array __attribute__ ((counted_by(__self.count))); > >> int count; } > >> > >> In the above code, when "__self.count" is referred, its declaration is not > >> available, > >> compiler doesn't know its type yet. > >> > >> If it is a regular global or a local variable, this is a source code > >> error, C FE reports > >> an error and aborts. User should fix this coding error by adding the > >> declaration > >> of the variable before its first reference in the source code. > >> > >> Theoretically, in C, we should treat this as a source code error too. 
> >> However, due to existing cases in the application (i.e, Linux Kernel), in > >> order to > >> avoid the source code change which might be painful or impossible due to > >> existing ABI, can we accept such cases and handle it in compiler? > >> > >> I think this might be doable during the implementation of the counted_by > >> attribute > >> in C FE: > >> > >> A. when C FE parses the new keyword __self, the whole containing structure > >> has > >> not yet been seen completely, as a result, the FE has to insert a > >> placeholder for > >> __self, and delay the real IR generation after the whole structure being > >> parsed. > >> So, a small late handling ONLY for this placeholder _cannot_ be avoided. > >> > >> B. Then during this late handling of the placeholder, the C FE already > >> parses the > >> whole structure, the declaration of the field is known at that time, the > >> forward > >> reference issue can be resolved naturally. > >> > >> This can be illustrated in the following small example: > >> > >> struct bar { > >> char *array __attribute__ ((counted_by(__self.count))); > >> /* We haven't encountered 'count' yet, so we assume it's something like > >> 'size_t' for now when inserting the placeholder for "__self". */ > >> int count; > >> }; /* At this point, we know everything about the struct, we can handle > >> the placeholder for "__self" and also go back and use 'int" for > >> the type to refer count */ > >> > >> > >> Appendix A: Scope of variables in C and C++ > >> --The hints to the design of counted_by in C > >> > >> Scope of a variable defines the region of the code in which this variable > >> can > >> be accessed and modified. > >> > >> 1. What's common on the scope of variables between C and C++? > >> > >> **First, there are mainly two types of variable scopes: > >> > >> A. Global Scope > >> The global scope refers to the region outside any function or block. 
The > >> variables declared here are accessible throughout the entire program > >> and are called Global Variables. > >> > >> B. Local Scope > >> The local scope refers to the region enclosed between the { } braces, > >> which represent the boundary of a function or a block inside functions. > >> The variables declared within a function or a block are only accessible > >> locally inside that function or that block and other blocks nested inside. > >> > >> NOTE 1: the {} brace that mark the boundary of a structure/class does > >> not change whether the current scope is global or local. > >> > >> **Second, if two variables with same name are defined in different scopes, > >> one in local scope and the other in global scope, the precedence is given > >> to the local variable: > >> > >> [opc@qinzhao~]$ cat t1.c > >> // Global variable > >> int a = 5; > >> int main() { > >> // Local variable with same name as that of > >> // global variable > >> int a = 100; > >> // Accessing a > >> __builtin_printf ("a is %d\n", a); return 0; > >> } > >> [opc@qinzhao~]$ gcc t1.c; ./a.out > >> a is 100 > >> [opc@qinzhao~]$ g++ t1.c; ./a.out > >> a is 100 > >> > >> > >> 2. What's different on the scope of variables between C and C++? > >> > >> C++ has 3 additional variations of scopes: > >> > >> A. Instance Scope (member scope): > >> > >> The instance scope, also called member scope, refers to the region inside > >> a class/structure but outside any member function of the class/structure. > >> The variables, i.e, the data members, declared here are accessible to the > >> whole class/structure. They can be accessed by the object (i.e., the > >> instance) > >> of the class/structure. 
> >> > >> [opc@qinzhao~]$ cat t2.C > >> struct foo { > >> int bar1(void) { return m; }; // m refers to the member variable > >> int bar2(void) { int m = 20; return m; }; // return m refers to the > >> local variable m = 20 > >> int bar3(void) { int m = 30; return this->m; }; // this->m refers to > >> the member variable > >> foo (int val) { m = val; }; // m refers to the member variable > >> int m; // Member variable with instance scope, accessible to the > >> whole structure/class > >> }; > >> > >> int main () > >> { > >> struct foo f(10); > >> __builtin_printf (" bar1 is %d \n", f.bar1()); > >> __builtin_printf (" bar2 is %d \n", f.bar2()); > >> __builtin_printf (" bar3 is %d \n", f.bar3()); > >> return 0; > >> } > >> [opc@qinzhao~]$ g++ t2.C; ./a.out > >> bar1 is 10 bar2 is 20 bar3 is 10 > >> > >> Explanation: The member variable "m" is declared inside the structure > >> "foo" but > >> outside any member function of "foo", it has instance scope. This variable > >> is > >> visible to all the member functions of the structure "foo". when there is > >> a name > >> conflict with a local variable inside a member function, for example, > >> "bar2”, > >> the local variable has higher precedence. When trying to explicitly refer > >> to the > >> member variable in the member function, adding the C++ "this" pointer > >> before > >> it, for example, "bar3”. > >> > >> NOTE 2: the {} brace that marks the boundary of a structure/class changes > >> the > >> variable scope to "instance scope" in C++. > >> > >> B. Static Member Scope > >> > >> The static member scope refers to variables declared with the static > >> keyword > >> within the class/structure. These variables can be accessed using the class > >> name without creating the instance. 
> >>
> >> [opc@qinzhao~]$ cat t3.C
> >> struct foo {
> >>   static int m; // Static member variable with static member scope,
> >>                 // accessible in whole structure/class
> >> };
> >> int foo::m = 10;
> >> int main ()
> >> {
> >>   __builtin_printf (" foo::m is %d\n", foo::m);
> >>   return 0;
> >> }
> >> [opc@qinzhao~]$ g++ t3.C; ./a.out
> >>  foo::m is 10
> >>
> >> NOTE 3: static members in structures are not available in C.
> >>
> >> C. Namespace Scope
> >>
> >> A namespace in C++ is a container that allows users to create a separate
> >> scope where the given variables are defined. It is used to avoid name
> >> conflicts and group related code together. These variables can be accessed
> >> using their namespace name and the scope resolution operator.
> >>
> >> [opc@qinzhao~]$ cat t4.C
> >> namespace foo {
> >>   int m = 10; // Namespace scope variable
> >> };
> >> int main ()
> >> {
> >>   __builtin_printf (" foo::m is %d\n", foo::m);
> >>   return 0;
> >> }
> >> [opc@qinzhao~]$ g++ t4.C; ./a.out
> >>  foo::m is 10
> >>
> >> NOTE 4: namespaces are not available in the C language.
> >>
> >> 3. A simple summary comparing C to C++
> >>
> >> A. There are only two variable scopes in C:
> >>
> >>   global scope
> >>   local scope
> >>
> >> The other 3 variable scopes in C++, i.e., instance scope (member scope),
> >> static member scope, and namespace scope, are not available in C.
> >>
> >> Since there are no static members and no namespaces in the C language,
> >> access to static member variables of a structure or to variables declared
> >> in another namespace is not needed in C at all.
> >>
> >> NOTE 5: However, accessing the member of a structure inside the structure
> >> is needed for the purpose of the counted_by extension in C.
> >>
> >> B. The {} brace that represents the boundary of the structure does not
> >> change the scope of the variable in C, since C doesn't have instance scope
> >> (i.e., member scope).
> >>
> >> The following examples show this limitation in the C language.
> >>
> >> C currently supports variable length arrays (VLAs), whose array size can be
> >> a variable expression. VLAs are only supported in local scopes in C.
> >>
> >> [opc@qinzhao~]$ cat t5.c
> >> void boo (int k)
> >> {
> >>   const int n = 10;
> >>   struct foo {
> >>     int m;
> >>     int a[n + k];
> >>   };
> >> }
> >> [opc@qinzhao~]$ gcc t5.c -S
> >>
> >> Explanation: This is good. The {} brace that marks the boundary of the
> >> structure "foo" does NOT change the scope of the variables n and k; their
> >> definitions reach the declaration of the array member field a[n + k].
> >>
> >> However, when the test case is changed to:
> >> [opc@qinzhao~]$ cat t6.c
> >> void boo (int k)
> >> {
> >>   const int n = 10;
> >>   struct foo {
> >>     int m;
> >>     int a[n + m];
> >>   };
> >> }
> >> [opc@qinzhao~]$ gcc t6.c -S
> >> t6.c: In function ‘boo’:
> >> t6.c:6:15: error: ‘m’ undeclared (first use in this function)
> >>     6 |     int a[n + m];
> >>       |               ^
> >>
> >> Explanation: C does not have the concept of instance scope (member scope),
> >> so there is no syntax provided to access the instance scope (member scope)
> >> variables inside the structure. Therefore, the member variable "m" is not
> >> visible inside the declaration of the array member field a[n + m].
> >>
> >> 4. What are the possible approaches for the counted_by attribute as a C
> >> extension?
> >>
> >> The major thing for this extension is:
> >> Adding a new language feature in C to access the member variables inside a
> >> structure.
> >>
> >> Based on the previous comparison between C and C++, there are two possible
> >> approaches:
> >>
> >> A. Add a new variable scope: instance scope (member scope) into C
> >>
> >> The definition of the new instance scope of C is:
> >>
> >> The instance scope, also called member scope, refers to the region inside
> >> a structure. The variables, i.e., the members, declared here are accessible
> >> to the whole structure. They can be accessed by the object (i.e., the
> >> instance) of the structure.
> >>
> >> The {} brace that marks the boundary of a structure will change the
> >> variable scope to "instance scope"; a variable name conflict between other
> >> scopes (including global/local) and instance scope will give precedence to
> >> instance scope.
> >>
> >> The compiler's implementation of this approach could be:
> >> ** a new variable scope, "instance scope", is added into the C FE;
> >> ** the "instance scope" has higher precedence than the current
> >>    global/local scope;
> >> ** the {} brace for the boundary of a structure is the boundary for the
> >>    "instance scope";
> >> ** a member variable that is referenced inside this structure could be
> >>    treated as this->member;
> >> ** a reference to a global variable inside the structure needs a new syntax.
> >>
> >> B. Add a new syntax to access instance scope (member scope) variables
> >> within the structure while keeping C's default scoping rules.
> >>
> >> The {} brace that marks the boundary of a structure will NOT change the
> >> variable scope. There are still only two variable scopes, global and local.
> >>
> >> In order to explicitly access a member inside a structure, a new syntax
> >> needs to be added. This new syntax could reuse the current designator
> >> syntax in C (prefixing the member variable with "."), or add a new keyword
> >> similar to "this", such as "__self", and prefix the member variable with
> >> "__self.".
> >>
> >> With the above approach A, we can keep the current syntax for counted_by,
> >> but it is not clear how easy it would be to extend it to simple
> >> expressions and nested structures.
> >>
> >> However, the major problem with this approach is that it changes the
> >> default scoping rule of the C language. This additional variable scope
> >> will break existing legal C code:
> >>
> >> [opc@qinzhao~]$ cat t7.c
> >> void boo (int k)
> >> {
> >>   const int n = 10; // a local variable n
> >>   struct foo {
> >>     int n; // a member variable n
> >>     int a[n + 10]; // currently, this n refers to the local variable n.
> >>   };
> >> }
> >>
> >> If we take approach A, then within the structure "foo" the VLA a[n+10]
> >> will refer to the member variable n, and no longer to the local variable n.
> >> Existing code with VLAs might work incorrectly.
> >>
> >> One could argue for adding the new variable scope only for the counted_by
> >> attribute and not for VLAs, but then how should the following case be
> >> handled?
> >>
> >> [opc@qinzhao~]$ cat t8.c
> >> void boo (int k)
> >> {
> >>   const int n = 10; // a local variable n
> >>   struct foo {
> >>     int n; // a member variable n
> >>     int a[n + 10]; // for VLA, this n refers to the local variable n.
> >>     char *b __attribute__ ((counted_by(n + 10)));
> >>     // for counted_by, this n refers to the member variable n.
> >>   };
> >> }
> >>
> >> This will be a disaster.
> >>
> >> So, I think that approach A is not the right direction for a C
> >> extension.
> >>
> >> With the above approach B, a new syntax needs to be implemented,
> >> and all the previous source code changes in applications need to be
> >> modified.
> >>
> >> But I still think that approach B is the right direction to go.
> >> (Please refer to:
> >> ****** Scope of variables in C++
> >> https://www.geeksforgeeks.org/scope-of-variables-in-c/
> >> ****** Scope of variables in C
> >> https://www.geeksforgeeks.org/scope-rules-in-c/)
> >>
> >>
> >> Appendix B: An example in the Linux kernel where the global cannot be
> >> "const" qualified
> >>
> >> In the Linux kernel, the globals that will be referenced inside the
> >> counted_by attribute don’t change value, but they cannot be marked "const"
> >> since they are initialized during very early kernel boot.
> >>
> >> They _become_ architecturally read-only, i.e., they are in a memory region
> >> that is flipped to read-only after boot is finished.
> >>
> >
>