Hi Qing,

Thanks for writing up the RFC and keeping us in the loop. Are you planning to 
add “__self.” to GCC's C++ compiler as well in the future? The problem we have 
with “__self” being the default way of annotating bounds is C++ compatibility: 
bounds annotations are supposed to work in headers shared between C and C++, 
and C++ should be able to parse them to secure the boundary between the two 
languages. Another problem is usability. The user will have to write the extra 
“__self.” all the time in the most common use cases, which would be a huge 
usability regression for the language.

We are planning to write up an alternative proposal that does not require 
introducing a new syntax to the C standard. We’ll discuss how we address the 
problems raised here. Please see my inline comments.

Best,
Yeoul


> On Mar 6, 2025, at 2:03 PM, Yeoul Na <yeoul...@apple.com> wrote:
> 
> + John & Félix & Patryk & Henrik
> 
>> On Mar 6, 2025, at 1:44 PM, Qing Zhao <qing.z...@oracle.com> wrote:
>> 
>> Hi,
>> 
>> Since I sent the patch series for “extend counted_by attribute to pointer 
>> fields of structure” two months ago, a lot of discussion has taken place in 
>> both the GCC community and the Clang community:
>> 
>> https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673837.html
>> https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854/131?u=gwelymernans
>> 
>> After reading all these discussions, understanding, studying, more 
>> discussions, and finally making the whole picture clearer, we came up with a 
>> proposal to change the current design and add a new syntax for the argument 
>> of the counted_by attribute.
>> 
>> The original idea of the new syntax came from Joseph, Michael and Martin; 
>> Bill and Kees were involved in the whole process of the proposal, providing 
>> a lot of suggestions and comments. I really appreciate the help from all of 
>> them.
>> 
>> In this thread, I am also CC’ing several people from Apple who worked on the 
>> -fbounds-safety project on the Clang side: yeoul...@apple.com, 
>> d_tard...@apple.com, dl...@apple.com, and dcough...@apple.com.
>> 
>> Please take a look at the proposal below.
>> 
>> Let me know if you have any comments and suggestions.
>> 
>> Thanks.
>> 
>> Qing.
>> 
>> =========================================
>> 
>> New syntax for the argument of counted_by attribute
>> --An extension to C language  
>> 
>> Outline
>> 
>> 0. A simple summary of the proposal
>> 
>> 1. The motivation
>> 1.1 The current syntax of the counted_by argument might break existing legal 
>> C code
>> 1.2 New requests from the users of the counted_by attribute
>> 1.2.1 Refer to a field in the nested structure
>> 1.2.2 Refer to globals or locals
>> 1.2.3 Represent simple expression
>> 1.2.4 Forward referencing
>> 
>> 2. The requirement
>> 
>> 3. The proposed new syntax
>> 3.1 Legal C code with VLA works correctly when mixing with counted_by
>> 3.2 Satisfy all the new requests
>> 3.2.1  Refer to a field in the nested structure
>> 3.2.2 Refer to globals or locals
>> 3.2.3 Represent simple expression
>> 3.3 How to resolve the forward reference issue in section 1.2.4?
>> 
>> Appendix A: Scope of variables in C and C++
>>     --The hints to the design of counted_by in C
>> Appendix B: An example in linux kernel that the global cannot be "const" 
>> qualified
>> 
>> 
>> 0. A simple summary of the proposal
>> 
>> We propose a new syntax to the argument of the counted_by attribute:  
>> * Introduce a new keyword, __self, to represent the new concept, 
>>  "the current object" of the nearest non-anonymous enclosing structure, 
>>  which allows the object of the structure to refer to its own member inside
>>  the structure definition.  
>> 
>> * With the new keyword, __self, the member variable can be referenced 
>>   by appending the member access operator "." to "__self", such as, 
>>   __self.member.
>> 
>> * This new keyword is invalid except in the bounds checking attributes, 
>>   such as "counted_by", etc., inside a structure definition.
>> 
>> * Simple expression is enabled by this new keyword inside the attribute 
>>   counted_by with the following limitation:
>> A. no side-effect is allowed;
>> and
>> B. the operators of the expression are simple arithmetic operators, and the 
>>    operands could be one of:
>>  B.1 __self.member or __self.member1.member2...(for nested structure);
>>  B.2 constant;
>>  B.3 locals that will not be changed after initialization;
>>  B.4 globals that will not be changed after initialization;
>> 
>> 
>> 1. The motivation  
>> 
>> There are two major motivations for this new syntax.  
>> 
>> 1.1 The current syntax of the counted_by argument might break existing legal 
>> C code
>> 
>> The counted_by attribute is currently defined as:  
>> (https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-counted_005fby-variable-attribute)
>> 
>> counted_by (count)
>> The counted_by attribute may be attached to the C99 flexible array member 
>> of a structure. It indicates that the number of the elements of the array is 
>> given by the field "count" in the same structure as the flexible array 
>> member.
>> 
>> For example:
>> 
>> int count;
>> struct X {
>> int count;
>> char array[] __attribute__ ((counted_by (count)));  
>> };
>> 
>> In the above, the argument of the attribute "count" is an identifier that 
>> will be 
>> looked up in the scope of the enclosing structure "X". Due to this new scope 
>> of variable, the identifier "count" refers to the member variable "count" of 
>> this 
>> structure but not the global variable defined outside of the structure.   
>> 
>> This is a new scope of variable that is added to the C language. In C, the 
>> default available scopes of variable include only two scopes, global scope 
>> and local scope.  
>> 
>> The global scope refers to the region outside any function or block. 
>> The variables declared here are accessible throughout the entire program.
>> 
>> The local scope refers to the region enclosed between the { } braces, which 
>> represent the boundary of a function or a block inside functions. The 
>> variables 
>> declared within a function or a block are only accessible locally inside 
>> that 
>> function or that block and other blocks nested inside.
>> 
>> (Please see Appendix A for more details on scope of variables in C and C++ 
>> and why the current design of counted_by attribute is a disaster to C)
>> 
>> Note, the { } brace that marks the boundary of a structure does not change
>> the current scope of the variable with the default scoping rules in C.
>> 
>> As a result, in the above example, with C's default scoping rule, the 
>> "count” 
>> inside counted_by attribute _should_ refer to the global variable "count" 
>> but 
>> not the member variable in the enclosing structure.  
>> 
>> A more compelling example is shown below when mixing counted_by attribute 
>> with C's Variable Length Array (VLA).
>> 
>> void boo (int k)
>> {
>> const int n = 10; // a local variable n
>> struct foo {
>>   int n;          // a member variable n
>>   int a[n + 10];  // for VLA, this n refers to the local variable n.
>>   char b[] __attribute__ ((counted_by(n)));
>>     // for counted_by, this n refers to the member variable n.
>> };
>> }
>> 
>> This code is bad. The size expression "n+10" of the VLA "a" follows the 
>> default 
>> scoping rule of C, as a result, "n" refers to the local variable "n" that is 
>> defined 
>> outside of the structure "foo"; However, the argument "n" of the counted_by 
>> attribute of the flexible array member b[] follows the new scoping rule, it 
>> refers 
>> to the member variable "n" inside this structure.   
>> 
>> It's clear that the current design of the counted_by argument introduced a 
>> new scoping rule into C, resulting in inconsistent scope resolution in the 
>> C language.
>> 
>> This is a design mistake, and should be fixed.

We will have a different proposal based on diagnosing name conflicts. We need 
to diagnose name conflicts like the one above anyway, because in code like that 
the struct almost always contains a buffer and its size as fields. Given that, 
the programmer’s intention is much more likely to be the member `n` than some 
random global that happens to have the same name in the same translation unit. 
Therefore, we should diagnose such cases so that a user mistake does not leave 
the program silently working in an unintended way. Also, this program will have 
a different meaning in C++, which is another reason to always diagnose such 
ambiguity. Also, the bounds annotation user might have just forgotten to add 
“__self.” because it’s so intuitive to use the member name inside the 
attributes (I know what’s “intuitive” depends on people’s background, but 
that’s what we observed from massive adoption experience within Apple). This 
leaves the feature error-prone, because the most intuitive syntax for bounds 
annotations will compile to a different meaning (using the global as the size 
instead of the peer member). So we should really diagnose it even if we add 
“__self”, to avoid the mistake.

Now, if we always diagnose it, then the lookup order doesn’t really matter 
anymore. That means we have the option of keeping the current lookup rule of C 
and picking up the member name only when no global of that name is available 
(just one possible option). I see “__self.” being used as a suppression 
mechanism when the programmer cannot change the name of the conflicting global 
or member. But that doesn’t mean “__self” should be the default way of writing 
the code. Suppression mechanisms are typically used only to suppress warnings 
and disambiguate. This also means we need a way to disambiguate in the other 
direction, to mean the global. C++ already has `::`, but C doesn’t currently 
have a scope qualifier, so in order to use this new bounds safety feature we 
may need to invent something. Adding a new syntax is a risk, so until we 
standardize it I would suggest something like `__builtin_global_ref()`.
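
To make the conflict concrete, here is a minimal sketch of the kind of code 
that would be diagnosed. The `COUNTED_BY` macro and the names `buf` and 
`buf_new` are hypothetical, introduced only for illustration; the macro expands 
to nothing on compilers that lack the attribute. With the lookup GCC and Clang 
currently use for counted_by, the argument binds to the member `n`; under plain 
C lookup it would bind to the file-scope `n`, which is very unlikely to be what 
the author meant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical portability macro: use counted_by only where it exists. */
#if defined(__has_attribute)
#  if __has_attribute(counted_by)
#    define COUNTED_BY(member) __attribute__((counted_by(member)))
#  endif
#endif
#ifndef COUNTED_BY
#  define COUNTED_BY(member)
#endif

int n = 1000;  /* unrelated file-scope variable that happens to share the name */

struct buf {
    int n;                      /* the size the author almost certainly means */
    char data[] COUNTED_BY(n);  /* ambiguous to a reader: member n or global n? */
};

/* Allocate a buffer with room for count bytes and record the count. */
struct buf *buf_new(int count)
{
    assert(count >= 0);
    struct buf *b = malloc(sizeof *b + (size_t)count);
    if (b != NULL)
        b->n = count;
    return b;
}
```

Renaming either `n` removes the ambiguity, which is the fix a diagnostic would 
normally point the programmer toward.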


>> 
>> 1.2 New requests from the users of the counted_by attribute
>> 
>> The counted_by attribute for Flexible Array Member (FAM) has been adopted in 
>> Linux Kernel extensively. New requests came in in order to cover more cases. 
>>  
>> 
>> 1.2.1 Refer to a field in the nested structure
>> 
>> This was requested from linux kernel.
>> https://www.spinics.net/lists/linux-rdma/msg127560.html
>> 
>> A simplified testing case is:
>> 
>> struct Y {
>> int n;
>> int other;
>> };
>> 
>> struct Z {
>> struct Y y;
>> int array[]  __attribute__ ((counted_by(?y.n)));
>> };
>> 
>> in the above, what should be put instead of "?" to refer to the field "n" of 
>> the 
>> field "y" of the current object of this struct Z?
>> 
>> NOTE, we should completely reject the use cases that refer to a field in an 
>> outer structure from an inner non-anonymous structure, such as:
>> 
>> struct A {
>> int count;
>> struct B {
>>  int other;
>>  int z[] __attribute__ ((counted_by(?)));
>> } b;
>> };
>> 
>> In the above, we should not allow the counted_by "?" of the FAM field "z" of 
>> the struct B to refer to the member variable "count" of the outer struct A. 
>> Otherwise, when an object with the struct B is passed to a function, there 
>> will be error when refer to the counted_by of its field "z".
>> 
>> However, the counted_by attribute of a field in the inner anonymous 
>> structure 
>> should be allowed to refer to a field of the outer structure. Since the 
>> inner 
>> anonymous structure can not be used independently of its enclosing 
>> structure, 
>> such as:  
>> 
>> struct A {
>> int count;
>> struct {
>>  int other;
>>  int z[] __attribute__ ((counted_by(count)));
>> };
>> } a;
>> 
>> In the above testing case, the counted_by attribute for the field "z" of the 
>> inner 
>> anonymous structure should be able to refer to the field of the outer 
>> structure.

I don’t follow the relation between the named nested struct and the anonymous 
struct here. Members of an anonymous structure are essentially part of the 
outer struct, and they are already accessed the same way as direct members of 
the outer struct. It should work as below:


struct A {
int count;
struct B {
 int other;
 // error: reference to undefined identifier `count`.
 int z[] __attribute__ ((counted_by(count)));
} b;
};


struct A {
int count;
struct {
 int other;
 // works, as members of an anonymous structure are part of structure A
 int z[] __attribute__ ((counted_by(count)));
};
} a;


So I don’t see why this would prevent us from writing counted_by(y.n) without 
any additional prefix.
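
As a small illustration of the member-access point only (not of counted_by 
itself), the sketch below contrasts a named nested struct with a C11 anonymous 
struct. All type and function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct named_outer {
    int count;
    struct inner {      /* named struct type, reached through member `b` */
        int other;
    } b;
};

struct anon_outer {
    int count;
    struct {            /* anonymous struct (C11) */
        int other;
    };
};

/* `other` is addressable as a member of anon_outer itself. */
static_assert(offsetof(struct anon_outer, other) >= sizeof(int),
              "other is laid out after count");

/* Members of a named nested struct need the extra `.b` step. */
int named_other(const struct named_outer *p) { return p->b.other; }

/* Members of an anonymous struct are reached like direct members. */
int anon_other(const struct anon_outer *p) { return p->other; }
```

Because `other` already behaves as a member of the outer struct in ordinary 
expressions, treating it the same way inside a bounds attribute needs no new 
concept.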


>>  
>> 
>> 1.2.2 Refer to globals or locals  
>> 
>> One request from linux kernel is here:
>> https://lore.kernel.org/all/202309221128.6AC35E3@keescook/
>> 
>> A simple example is:
>> 
>> int count;// global variable
>> struct X {
>> int count; // member variable
>> char array[] __attribute__ ((counted_by(??count)));      
>>    //  How to refer to the global variable "count"      
>>    //  but not the member variable "count" of the struct X?
>> }
>> 
>> when the counted_by attribute tries to refer to the global variable "count” 
>> outside
>> the structure, how to distinguish it with its member variable "count"?

Again, this should be diagnosed, and the programmer needs to either change the 
name or use a suppression mechanism. As I suggested earlier, we can introduce 
something like __builtin_global_ref() until we get the C committee’s blessing 
to add a scope qualifier syntax to C.

>> 
>> NOTE, Users need to make sure that the global or local variables should not 
>> be 
>> changed after they are initialized; otherwise, the results of the array 
>> bound 
>> sanitizer and the __builtin_dynamic_object_size is undefined.
>> 
>> Theoretically, We should limit the globals and locals ONLY to const 
>> qualified 
>> globals and locals to avoid abusing of this feature in the future. However, 
>> due 
>> to the existing code in linux kernel cannot be easily changed with const 
>> qualifier. 
>> We have to relax the limitation. See Appendix B for such an example in linux 
>> kernel.  
>> 
>> In the future language extension, We should limit the globals and locals 
>> ONLY 
>> to const qualified globals and locals.
>> 
>> 1.2.3 Represent simple expression
>> 
>> This was requested multiple times from Linux kernel. One of the requests is:
>> https://lore.kernel.org/lkml/20210727205855.411487-63-keesc...@chromium.org/
>> 
>> For example:
>> 
>> int elm_size;
>> struct X {
>> int count;
>> char array[] __attribute__ ((counted_by(?count * elm_size)));
>> }
>> 
>> in the above, what should be put instead of "?" to represent this simple 
>> expression?

It should just work without any prefix because there’s no name conflict here, 
so it is clear what each unqualified name refers to.

// constexpr requires an initializer; the value here is only illustrative
constexpr int elm_size = 4;
struct X {
int count;
char array[] __attribute__ ((counted_by(count * elm_size)));
};

I think this is not too different from this:

int elem_size;
int foo(void) {
  int count = 1; // illustrative value, so the read below is well defined
  return count * elem_size;
}
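
To spell out what such an expression would mean, here is a rough sketch that 
computes the same bound by hand, with no attribute involved. The helper names 
`x_bound` and `x_in_bounds` are hypothetical, and the value of `elm_size` is 
only illustrative.

```c
#include <assert.h>
#include <stddef.h>

int elm_size = 4;          /* file-scope variable; value is illustrative */

struct X {
    int count;             /* member: the per-object count */
    char array[];          /* holds count * elm_size elements */
};

/* The bound that counted_by(count * elm_size) would describe: the
   unqualified `count` is the member, the unqualified `elm_size` is
   the global. */
size_t x_bound(const struct X *x)
{
    assert(x->count >= 0 && elm_size >= 0);
    return (size_t)x->count * (size_t)elm_size;
}

/* Hand-written check standing in for what a compiler could insert. */
int x_in_bounds(const struct X *x, size_t index)
{
    return index < x_bound(x);
}
```

Each name resolves to exactly one declaration, so no prefix is needed to say 
which is which.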



>> 
>> NOTE, We should limit simple expressions to:
>> 
>> A. no side-effect is allowed,
>> and
>> B. the operators of the expression are simple arithmetic operators, and the 
>> operands
>>    could be one of the following:
>>  B.1 the member variable of the enclosing structure or inner structure of 
>> the enclosing structure;
>>  B.2 constant;
>>  B.3 locals that will not be changed after initialization;
>>  B.4 globals that will not be changed after initialization;      
>> 
>> 1.2.4 Forward referencing
>> 
>> This request is only for counted_by attribute of pointers. Since the 
>> flexible array 
>> members(FAM) are always the last field of the containing structure, forward 
>> reference issue does not exist for counted_by of FAM.  
>> 
>> How should we handle the situation when the counted_by attribute refers to
>> a member variable that is declared after the pointer field in the structure?
>> 
>> For example:
>> 
>> struct bar {
>> char *array __attribute__ ((counted_by(??count)));
>> int count;
>> };
>> 
>> in the above, how can we refer to the field "count" that is declared after 
>> the 
>> pointer field "array" in the structure?

We would need to be able to refer to a not-yet-declared field anyway, even with 
“__self.”, no? “__self.” doesn’t solve the problem: you still need to be able 
to forward-reference a member.

>> 
>> 2. The requirement:
>> 
>> This is an extension to C language, We should avoid adding a new scope of 
>> variable (as the current syntax of the counted_by attribute for FAM) to 
>> break 
>> the existing legal C code. We should follow the default C language scoping 
>> rules, keep the current valid C code working properly.

We have a way to keep the meaning of existing code unchanged without 
introducing a new syntax: diagnose code that is already error-prone, and apply 
that diagnosis to both VLAs and bounds annotations. We are planning to write up 
a proposal to the C standard soon.

>> 
>> 3. The proposed new syntax:
>> 
>> * Keep the default C scoping rules.
>> 
>> * Introduce a new keyword, __self, to represent the new concept, "the 
>> current object”
>>   of the nearest non-anonymous enclosing structure, which allows the object 
>> of the 
>>   structure  to refer to its own member inside the structure definition. 
>> This is similar
>>   as the concept of "this" in C++, except that __self should be treated as a 
>> special 
>>   variable but not a pointer.  
>> 
>> * With the new keyword, __self, the member variable can be referenced by 
>> appending
>>   the member access operator "." to "__self", such as, __self.member. This 
>> is similar 
>>   as referring a member variable through a variable with the structure type 
>> in the C 
>>   language.   
>> 
>> * This new keyword is invalid except in the bounds checking attributes, such 
>> as 
>>  "counted_by", etc.,  inside a structure definition.
>> 
>> * Simple expression is allowed inside the attribute counted_by with the 
>> following limitation:
>> 
>> A. no side-effect is allowed,
>> and
>> B. the operators of the expression are simple arithmetic operators, and the 
>> operands 
>>   could be one of:
>>  B.1 __self.member or __self.member1.member2...(for nested structure);
>>  B.2 constant;
>>  B.3 locals that will not be changed after initialization;
>>  B.4 globals that will not be changed after initialization;
>> 
>> With the new syntax, the problems 1.1, 1.2.1 and 1.2.2 and 1.2.3 can be 
>> resolved 
>> naturally as following:
>> 
>> 3.1 Legal C code with VLA works correctly when mixing with counted_by
>> 
>> The previously bad code mixing with VLA is now:
>> 
>> void boo (int k)
>> {
>> const int n = 10; // a local variable n
>> struct foo {
>>   int n;          // a member variable n
>>   int a[n + 10];  // for VLA, this n refers to the local variable n.
>>   char b[] __attribute__ ((counted_by(__self.n)));
>>     // for counted_by, this __self.n refers to the member variable n.
>> };
>> }
>> 
>> Now, We keep the default C scoping rule and make the counted_by referring 
>> to member variable in the same structure correctly without ambiguity.
>> 
>> 3.2 Satisfy all the new requests
>> 
>> With this new syntax, all the new requests in section 1.2 (except 1.2.4 
>> Forward 
>> referencing) are resolved naturally.
>> 
>> 3.2.1 Refer to a field in the nested structure
>> 
>> struct Y {
>> int n;
>> int other;
>> };
>> 
>> struct Z {
>> struct Y y;
>> int *array  __attribute__ ((counted_by(__self.y.n)));
>> };
>> 
>> 3.2.2 Refer to globals or locals  
>> 
>> int count;
>> struct X {
>> char others;
>> char array[] __attribute__ ((counted_by(count)));
>> };
>> 
>> Since the new syntax keeps the default scoping rule of C language, the 
>> "count” 
>> without any prefix inside the counted_by attribute refers to the current 
>> visible 
>> variable in the current scope, that is the global variable "count”.
>> 
>> 3.2.3 Represent simple expression
>> 
>> When we can distinguish globals/locals from the member variables with this 
>> new syntax, simple expressions are represented naturally:
>> 
>> int elm_size;
>> struct X {
>> int count;
>> int *array __attribute__ ((counted_by(__self.count * elm_size)));
>> };
>> 
>> More complicated example:
>> 
>> struct foo {
>> int n;
>> float f;
>> };
>> 
>> A.
>> #define NETLINK_HEADER_BYTES 8
>> struct bar1 {
>> struct foo y[5][10];
>> char *array __attribute__ ((counted_by(__self.y[1][3].n - 
>> NETLINK_HEADER_BYTES)));
>> };
>> 
>> B.
>> struct bar2 {
>> int n;
>> char *array __attribute__ ((counted_by((struct foo){.n = 4 }.n)));
>> };
>> 
>> C.
>> struct bar3 {
>> int n;
>> char *array __attribute__ ((counted_by((struct foo){.n = 4 }.n + __self.n)));
>> };
>> 
>> 
>> 3.3 How to resolve the forward reference issue in section 1.2.4?
>> 
>> The new syntax naturally resolved all the problems we listed in section 1.2 
>> except the forward reference issue:
>> 
>> If the member variable that is referred inside the counted_by is declared 
>> after 
>> the pointer field with the counted_by attribute, such as:
>> 
>> struct bar {
>> char *array __attribute__ ((counted_by(__self.count)));
>> int count;
>> };
>> 
>> In the above code, when "__self.count" is referred, its declaration is not 
>> available, 
>> compiler doesn't know its type yet.  
>> 
>> If it is a regular global or a local variable, this is a source code error, 
>> C FE reports 
>> an error and aborts. User should fix this coding error by adding the 
>> declaration 
>> of the variable before its first reference in the source code.
>> 
>> Theoretically, in C, we should treat this as a source code error too.  
>> However, due to existing cases in the application (i.e, Linux Kernel), in 
>> order to 
>> avoid the source code change which might be painful or impossible due to 
>> existing ABI, can we accept such cases and handle it in compiler?   
>> 
>> I think this might be doable during the implementation of the counted_by 
>> attribute
>> in C FE:
>> 
>> A. when C FE parses the new keyword __self, the whole containing structure 
>> has
>> not yet been seen completely, as a result, the FE has to insert a 
>> placeholder for 
>> __self, and delay the real IR generation after the whole structure being 
>> parsed. 
>> So, a small late handling ONLY for this placeholder _cannot_ be avoided.  
>> 
>> B. Then during this late handling of the placeholder, the C FE already 
>> parses the
>> whole structure, the declaration of the field is known at that time, the 
>> forward 
>> reference issue can be resolved naturally.   
>> 
>> This can be illustrated in the following small example:
>> 
>> struct bar {
>> char *array __attribute__ ((counted_by(__self.count)));      
>>    /* We haven't encountered 'count' yet, so we assume it's something like
>>      'size_t' for now when inserting the placeholder for "__self". */
>> int count;
>> };  /* At this point, we know everything about the struct, we can handle
>>      the placeholder for "__self" and also go back and use 'int" for
>>      the type to refer count */
>> 
>> 
>> Appendix A: Scope of variables in C and C++  
>> --The hints to the design of counted_by in C
>> 
>> Scope of a variable defines the region of the code in which this variable 
>> can 
>> be accessed and modified.  
>> 
>> 1. What's common on the scope of variables between C and C++?
>> 
>> **First, there are mainly two types of variable scopes:  
>> 
>> A. Global Scope
>> The global scope refers to the region outside any function or block. The 
>> variables declared here are accessible throughout the entire program 
>> and are called Global Variables.
>> 
>> B. Local Scope
>> The local scope refers to the region enclosed between the { } braces, 
>> which represent the boundary of a function or a block inside functions. 
>> The variables declared within a function or a block are only accessible 
>> locally inside that function or that block and other blocks nested inside.  
>> 
>> NOTE 1: the {} brace that mark the boundary of a structure/class does 
>> not change whether the current scope is global or local.
>> 
>> **Second, if two variables with same name are defined in different scopes, 
>> one in local scope and the other in global scope, the precedence is given 
>> to the local variable:
>> 
>> [opc@qinzhao~]$ cat t1.c
>> // Global variable
>> int a = 5;
>> int main() {
>> // Local variable with same name as that of
>> // global variable
>> int a = 100;
>> // Accessing a
>> __builtin_printf ("a is %d\n", a);
>> return 0;
>> }
>> [opc@qinzhao~]$ gcc t1.c; ./a.out
>> a is 100
>> [opc@qinzhao~]$ g++ t1.c; ./a.out
>> a is 100
>> 
>> 
>> 2. What's different on the scope of variables between C and C++?
>> 
>> C++ has 3 additional variations of scopes:
>> 
>> A. Instance Scope (member scope):
>> 
>> The instance scope, also called member scope, refers to the region inside 
>> a class/structure but outside any member function of the class/structure. 
>> The variables, i.e, the data members, declared here are accessible to the 
>> whole class/structure. They can be accessed by the object (i.e., the 
>> instance) 
>> of the class/structure.   
>> 
>> [opc@qinzhao~]$ cat t2.C
>> struct foo {
>> int bar1(void) { return m;  };      // m refers to the member variable
>> int bar2(void) { int m = 20; return m;  };      // return m refers to the 
>> local variable m = 20
>> int bar3(void) { int m = 30; return this->m;  };      // this->m refers to 
>> the member variable
>> foo (int val) { m = val; };      // m refers to the member variable
>> int m;      // Member variable with instance scope, accessible to the whole 
>> structure/class
>> };
>> 
>> int main ()
>> {
>> struct foo f(10);
>> __builtin_printf (" bar1 is %d \n", f.bar1());
>> __builtin_printf (" bar2 is %d \n", f.bar2());
>> __builtin_printf (" bar3 is %d \n", f.bar3());
>> return 0;
>> }
>> [opc@qinzhao~]$ g++ t2.C; ./a.out
>> bar1 is 10
>> bar2 is 20
>> bar3 is 10
>> 
>> Explanation: The member variable "m" is declared inside the structure "foo" 
>> but 
>> outside any member function of "foo", it has instance scope. This variable 
>> is 
>> visible to all the member functions of the structure "foo". when there is a 
>> name 
>> conflict with a local variable inside a member function, for example, 
>> "bar2”, 
>> the local variable has higher precedence. When trying to explicitly refer to 
>> the 
>> member variable in the member function, adding the C++ "this" pointer before 
>> it, for example, "bar3”.    
>> 
>> NOTE 2: the {} brace that marks the boundary of a structure/class changes the
>> variable scope to "instance scope" in C++.  
>> 
>> B. Static Member Scope
>> 
>> The static member scope refers to variables declared with the static keyword 
>> within the class/structure. These variables can be accessed using the class 
>> name without creating the instance.
>> 
>> [opc@qinzhao~]$ cat t3.C
>> struct foo {
>> static int m; // Static member variable with static member scope,
>> // accessible in whole structure/class
>> };
>> int foo::m = 10;
>> int main ()
>> {
>> __builtin_printf (" foo::m is %d\n", foo::m);
>> return 0;
>> }
>> [opc@qinzhao~]$ g++ t3.C; ./a.out
>> foo::m is 10
>> 
>> NOTE 3: static member in structure is not available in C.   
>> 
>> C. Namespace Scope
>> 
>> A namespace in C++ is a container that allows users to create a separate 
>> scope 
>> where the given variables are defined. It is used to avoid name conflicts 
>> and group 
>> related code together. These variables can be accessed using their namespace 
>> name and scope resolution operator.
>> 
>> [opc@qinzhao~]$ cat t4.C
>> namespace foo {
>> int m = 10; // Namespace scope variable
>> };
>> int main ()
>> {
>> __builtin_printf (" foo::m is %d\n", foo::m);
>> return 0;
>> }
>> [opc@qinzhao~]$ g++ t4.C; ./a.out
>> foo::m is 10
>> 
>> NOTE 4: namespaces are not available in C language.  
>> 
>> 3. A simple summary comparing C to C++
>> 
>> A. there are only two variable scopes in C:
>> 
>> global scope
>> local scope
>> 
>> all the other 3 variant variable scopes in C++,i.e., instance scope (member 
>> scope), 
>> static member scope, namespace scope,  are not available in C.  
>> 
>> Since there is no static member and namespace in C language, accessing to 
>> static 
>> member variables of a structure or variables declared in another namespace 
>> is 
>> not needed in C at all. 
>> 
>> NOTE 5: However, accessing the member of a structure inside the structure is 
>> needed for the purpose of counted_by extension in C.  
>> 
>> B. the {} brace that represents the boundary of the structure does not 
>> change the 
>> scope of the variable in C since C doesn't have instance scope (i.e.,member 
>> scope);
>> 
>> The following examples show these limitations in the C language.
>> 
>> C currently supports variable length arrays (VLA), whose array size can be 
>> a variable expression.  VLA is only supported in local scopes in C.
>> 
>> [opc@qinzhao~]$ cat t5.c
>> void boo (int k)
>> {
>> const int n = 10;
>> struct foo {
>>   int m;
>>   int a[n + k];
>> };
>> }
>> [opc@qinzhao~]$ gcc t5.c -S
>> 
>> Explanation: This is good. The {} brace that marks the boundary of the 
>> structure "foo” 
>> does NOT change the scope of the variable n and k, their definitions reach 
>> the 
>> declaration of the array member field a[n + k].
>> 
>> However, when changing the testing case as:
>> [opc@qinzhao~]$ cat t6.c
>> void boo (int k)
>> {
>> const int n = 10;
>> struct foo {
>>   int m;
>>   int a[n + m];
>> };
>> }
>> [opc@qinzhao~]$ gcc t6.c -S
>> t6.c: In function ‘boo’:
>> t6.c:6:15: error: ‘m’ undeclared (first use in this function)
>>   6 |     int a[n + m];
>>     |               ^
>> 
>> Explanation: C does not have the concept of instance scope (member scope), 
>> there is no syntax provided to access the instance scope (member scope) 
>> variables inside the structures. Therefore, the reference to the member 
>> variable 
>> "m" inside the declaration of the array member field a[n + m] is not visible.
>> 
>> 4. What are the possible approaches for the counted_by attribute as a C 
>> extension?
>> 
>> The major thing for this extension is:  
>> Adding a new language feature in C to access the member variables inside a 
>> structure.
>> 
>> Based on the previous comparison between C and C++, there are two possible 
>> approaches:
>> 
>> A. Add a new variable scope: instance scope (member scope) into C  
>> 
>> The definition of the new instance scope of C is:
>> 
>> The instance scope, also called member scope, refers to the region inside a 
>> structure. 
>> The variables, i.e, the members, declared here are accessible to the whole 
>> structure. 
>> They can be accessed by the object (i.e., the instance) of the structure. 
>> 
>> The {} brace that marks the boundary of a structure will change the variable 
>> scope 
>> to "instance scope"; a variable name confliction between other scopes 
>> (including global/local) and instance scope will give precedence to instance 
>> scope.  
>> 
>> The compiler's implementation on this approach could be:
>> ** a new variable scope, "instance scope" is added into C FE;
>> ** the "instance scope" has the higher precedence than the current 
>> global/local scope;
>> ** the {} brace for the boundary of a structure is the boundary for the 
>> "instance scope";
>> ** a member variable that is referenced inside this structure could be 
>> treated as this->member.   
>> ** reference to a global variable inside the structure need a new syntax.  
>> 
>> B. Add a new syntax to access instance scope (member scope) variable within
>>    the structure while keeping C's default scoping rules.
>> 
>> The {} brace that marks the boundary of a structure will NOT change the 
>> variable 
>> scope. There are still only two variable scoping, global and local.  
>> 
>> In order to explicitly access a member inside a structure, a new syntax need 
>> to 
>> be added.  This new syntax could reuse the current designator syntax in C 
>> (prefixing the member variable with "."), or adding a new keyword similar as 
>> "this”, 
>> such as, "__self", and prefixing the member variable with “__self."  
>> 
>> With the above approach A, we can keep the current syntax for counted_by;
>> but not sure how easy to extend it for simple expression and nested 
>> structure.
>> 
>> However, the major problem with this approach is: it changes the default 
>> scoping 
>> rule in C languages. this additional variable scoping will break existing 
>> legal C code:
>> 
>> [opc@qinzhao~]$ cat t7.c
>> void boo (int k)
>> {
>> const int n = 10; // a local variable n
>> struct foo {
>>   int n;     // a member variable n
>>   int a[n + 10];  // currently, this n refers to the local variable n.
>> };
>> }
>> 
>> When we take the approach A, within the structure "foo", the VLA a[n+10] 
>> will refer to the member variable n, but not the local variable n anymore. 
>> The existing code with VLA might work incorrectly.
>> 
>> You can argue to only add the new variable scope for counted_by attribute,
>> not for VLA, then how to handle the following case:
>> 
>> [opc@qinzhao~]$ cat t8.c
>> void boo (int k)
>> {
>> const int n = 10; // a local variable n
>> struct foo {
>>   int n;     // a member variable n
>>   int a[n + 10];  // for VLA, this n refers to the local variable n.
>>   char *b __attribute__ ((counted_by(n + 10)));
>>     // for counted_by, this n refers to the member variable n.
>> };
>> }
>> 
>> This will be a disaster.  
>> 
>> So, I think that the approach A is not the right direction for a C extension.
>> 
>> With the above approach B, a new syntax needs to be implemented, 
>> and all the previous source code changes in the application need to be 
>> modified.
>> 
>> But I still think that approach B is the right direction to go.  
>> (Please refer to:
>> ******Scope of variables in C++
>> https://www.geeksforgeeks.org/scope-of-variables-in-c/
>> ******Scope of variables in C
>> https://www.geeksforgeeks.org/scope-rules-in-c/)
>> 
>> 
>> Appendix B: An example in linux kernel that the global cannot be "const" 
>> qualified
>> 
>> In linux kernel, the globals that will be referred inside counted_by 
>> attribute don’t 
>> change value, but they cannot be marked "const" since they are initialized 
>> during 
>> very early kernel boot.
>> 
>> they _become_ architecturally read-only. i.e. they are in a memory region 
>> that 
>> is flipped to read-only after boot is finished.
>> 