Hi, Henrik, John, 

Sorry for my late reply to the thread. 

Before writing this proposal, I spent quite some time trying to understand why
people from the C community and the C++ community have such different views on
the current syntax of counted_by.

I then studied and compared the variable scoping rules in C and C++, and came
up with a small writeup, which is currently in the proposal as:

Appendix A: Scope of variables in C and C++
      --The hints to the design of counted_by in C

I am not sure whether you have read through my proposal, and especially this
Appendix A. If not, could you please read Appendix A and let me know any
comments and suggestions on it? (I am not a language expert, so if I made any
mistakes in the writeup, please let me know.)

For your convenience, I am copying the Appendix A below:

Thanks a lot!

Qing

==============================

Appendix A: Scope of variables in C and C++  
--The hints to the design of counted_by in C

Scope of a variable defines the region of the code in which this variable can 
be accessed and modified.  

1. What is common to the scope of variables in C and C++?

**First, there are mainly two types of variable scopes:  

A. Global Scope
The global scope refers to the region outside any function or block. The 
variables declared here are accessible throughout the entire program 
and are called Global Variables.

B. Local Scope
The local scope refers to the region enclosed between the { } braces, 
which represent the boundary of a function or a block inside functions. 
The variables declared within a function or a block are only accessible 
locally inside that function or that block and other blocks nested inside.  

NOTE 1: the {} brace that marks the boundary of a structure/class does 
not change whether the current scope is global or local.
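
As a small illustration of NOTE 1 on the C side (my own sketch, not part of
the proposal; the names "outer", "inner" and inner_x are made up): in C, a
structure tag declared inside the braces of another structure is still
visible after the closing brace, because those braces do not open a new
scope.

```c
/* The tag "inner" is declared inside the braces of "outer". */
struct outer {
  struct inner {
    int x;
  } in;
  int y;
};

/* Valid C: "struct inner" is visible here, at file scope, because the
   braces of "outer" did not open a new scope.  In C++ the same type would
   have to be named outer::inner. */
static struct inner standalone = { 7 };

int inner_x(void)
{
  return standalone.x;
}
```

Calling inner_x() returns the x member of the file-scope object declared
with the nested tag.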

**Second, if two variables with same name are defined in different scopes, 
one in local scope and the other in global scope, the precedence is given 
to the local variable:

[opc@qinzhao~]$ cat t1.c
// Global variable
int a = 5;
int main() {
 // Local variable with same name as that of
 // global variable
 int a = 100;
 // Accessing a
 __builtin_printf ("a is %d\n", a);
 return 0;
}
[opc@qinzhao~]$ gcc t1.c; ./a.out
a is 100
[opc@qinzhao~]$ g++ t1.c; ./a.out
a is 100


2. What is different about the scope of variables between C and C++?

C++ has 3 additional variations of scopes:

A. Instance Scope (member scope):

The instance scope, also called member scope, refers to the region inside 
a class/structure but outside any member function of the class/structure. 
The variables, i.e., the data members, declared here are accessible to the 
whole class/structure. They can be accessed by the object (i.e., the instance) 
of the class/structure.   

[opc@qinzhao~]$ cat t2.C
struct foo {
 // m refers to the member variable
 int bar1(void) { return m; }
 // m refers to the local variable m = 20
 int bar2(void) { int m = 20; return m; }
 // this->m refers to the member variable
 int bar3(void) { int m = 30; return this->m; }
 // m refers to the member variable
 foo (int val) { m = val; }
 // Member variable with instance scope, accessible to the whole
 // structure/class
 int m;
};

int main ()
{
 struct foo f(10);
 __builtin_printf (" bar1 is %d \n", f.bar1());
 __builtin_printf (" bar2 is %d \n", f.bar2());
 __builtin_printf (" bar3 is %d \n", f.bar3());
 return 0;
}
[opc@qinzhao~]$ g++ t2.C; ./a.out
bar1 is 10
bar2 is 20
bar3 is 10

Explanation: The member variable "m" is declared inside the structure "foo" but
outside any member function of "foo", so it has instance scope. This variable is
visible to all the member functions of the structure "foo". When there is a name
conflict with a local variable inside a member function, for example in "bar2",
the local variable has higher precedence. To refer explicitly to the member
variable in such a member function, prefix it with the C++ "this" pointer, as
in "bar3".

NOTE 2: the {} brace that marks the boundary of a structure/class changes the
variable scope to "instance scope" in C++.  

B. Static Member Scope

The static member scope refers to variables declared with the static keyword 
within the class/structure. These variables can be accessed using the class 
name without creating the instance.

[opc@qinzhao~]$ cat t3.C
struct foo {
 static int m; // Static member variable with static member scope,
// accessible in whole structure/class
};
int foo::m = 10;
int main ()
{
 __builtin_printf (" foo::m is %d\n", foo::m);
 return 0;
}
[opc@qinzhao~]$ g++ t3.C; ./a.out
foo::m is 10

NOTE 3: static member in structure is not available in C.   

C. Namespace Scope

A namespace in C++ is a container that allows users to create a separate scope
where the given variables are defined. It is used to avoid name conflicts and
to group related code together. These variables can be accessed using their
namespace name and the scope resolution operator.

[opc@qinzhao~]$ cat t4.C
namespace foo {
 int m = 10; // Namespace scope variable
};
int main ()
{
 __builtin_printf (" foo::m is %d\n", foo::m);
 return 0;
}
[opc@qinzhao~]$ g++ t4.C; ./a.out
foo::m is 10

NOTE 4: namespaces are not available in C language.  

3. A simple summary comparing C to C++

A. There are only two variable scopes in C:

global scope
local scope

All the other 3 variations of variable scope in C++, i.e., instance scope
(member scope), static member scope, and namespace scope, are not available
in C.

Since there are no static members or namespaces in the C language, accessing
static member variables of a structure or variables declared in another
namespace is not needed in C at all.

NOTE 5: However, accessing the member of a structure inside the structure is 
needed for the purpose of counted_by extension in C.  
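
For reference, here is a minimal sketch of how the attribute is used today on
a flexible array member, where the argument names a sibling member of the same
structure. The names COUNTED_BY, int_buf, int_buf_new and int_buf_sum are my
own, made up for this illustration. The macro falls back to nothing on
compilers that do not report support for counted_by.

```c
#include <stdlib.h>

/* Use the attribute only where the compiler reports support for it;
   otherwise COUNTED_BY expands to nothing. */
#if defined(__has_attribute)
#  if __has_attribute(counted_by)
#    define COUNTED_BY(member) __attribute__((counted_by(member)))
#  endif
#endif
#ifndef COUNTED_BY
#  define COUNTED_BY(member)
#endif

struct int_buf {
  size_t count;                  /* number of elements in data[] */
  int data[] COUNTED_BY(count);  /* the attribute names a sibling member */
};

/* Allocate a buffer with "count" elements holding 0, 1, 2, ... */
struct int_buf *int_buf_new(size_t count)
{
  struct int_buf *p = malloc(sizeof *p + count * sizeof p->data[0]);
  if (p == NULL)
    return NULL;
  p->count = count;              /* set the count before touching data[] */
  for (size_t i = 0; i < count; i++)
    p->data[i] = (int)i;
  return p;
}

/* Sum the elements, bounded by the count member. */
long int_buf_sum(const struct int_buf *p)
{
  long sum = 0;
  for (size_t i = 0; i < p->count; i++)
    sum += p->data[i];
  return sum;
}
```

Note that "count" inside the attribute has to mean the member, which is
exactly the lookup that plain C scoping does not provide.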

B. The {} brace that marks the boundary of a structure does not change the
scope of a variable in C, since C doesn't have instance scope (i.e., member
scope).

The following examples show this limitation in the C language.

C currently supports variable length arrays (VLAs), whose array size can be a 
variable expression.  VLAs are only supported in local scopes in C.

[opc@qinzhao~]$ cat t5.c
void boo (int k)
{
 const int n = 10;
 struct foo {
   int m;
   int a[n + k];
 };
}
[opc@qinzhao~]$ gcc t5.c -S

Explanation: This is good. The {} brace that marks the boundary of the
structure "foo" does NOT change the scope of the variables n and k; their
definitions reach the declaration of the array member field a[n + k].

However, when the test case is changed to:
[opc@qinzhao~]$ cat t6.c
void boo (int k)
{
 const int n = 10;
 struct foo {
   int m;
   int a[n + m];
 };
}
[opc@qinzhao~]$ gcc t6.c -S
t6.c: In function ‘boo’:
t6.c:6:15: error: ‘m’ undeclared (first use in this function)
   6 |     int a[n + m];
     |               ^

Explanation: C does not have the concept of instance scope (member scope), and
provides no syntax to access instance scope (member scope) variables inside
the structure. Therefore, the member variable "m" is not visible inside the
declaration of the array member field a[n + m].

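To make the point above concrete, here is a small sketch (my own, with made-up
names "foo", "m" and add_both) showing that a member name and an ordinary
variable name can coexist in C without conflict: a plain "m" always finds the
variable, and the member can only be reached through an object with "." or
"->".

```c
struct foo {
  int m;            /* member m: only reachable through an object */
};

static int m = 5;   /* ordinary identifier m at file scope */

int add_both(const struct foo *p)
{
  /* Plain "m" is the file-scope variable; the member needs p->m. */
  return m + p->m;
}
```
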
4. What are the possible approaches for the counted_by attribute as a C extension?

The major thing for this extension is:  
Adding a new language feature in C to access the member variables inside a 
structure.

Based on the previous comparison between C and C++, there are two possible 
approaches:

A. Add a new variable scope: instance scope (member scope) into C  

The definition of the new instance scope in C would be:

The instance scope, also called member scope, refers to the region inside a
structure. The variables, i.e., the members, declared here are accessible to
the whole structure. They can be accessed by the object (i.e., the instance)
of the structure.

The {} brace that marks the boundary of a structure will change the variable
scope to "instance scope"; a variable name conflict between other scopes
(including global/local) and instance scope will give precedence to instance
scope.

The compiler's implementation of this approach could be:
 ** a new variable scope, "instance scope", is added to the C FE;
 ** the "instance scope" has higher precedence than the current global/local
    scope;
 ** the {} brace for the boundary of a structure is the boundary for the
    "instance scope";
 ** a member variable that is referenced inside this structure could be
    treated as this->member;
 ** a reference to a global variable inside the structure needs a new syntax.

B. Add a new syntax to access instance scope (member scope) variable within
    the structure while keeping C's default scoping rules.

The {} brace that marks the boundary of a structure will NOT change the
variable scope. There are still only two variable scopes, global and local.

In order to explicitly access a member inside a structure, a new syntax needs
to be added. This new syntax could reuse the current designator syntax in C
(prefixing the member variable with "."), or add a new keyword similar to
"this", such as "__self", and prefix the member variable with "__self.".
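
For readers less familiar with it, the designator syntax mentioned above
already exists in C initializers. The sketch below (my own; the names "foo"
and make_foo are made up) only shows that existing use. Writing ".n" inside
counted_by is the proposed extension and is not shown as compilable code
here.

```c
struct foo {
  int n;
  int m;
};

static int n = 100;  /* file-scope variable with the same name as the member */

struct foo make_foo(void)
{
  /* ".n" names the member; the bare "n" on the right-hand side is the
     file-scope variable.  The two never conflict. */
  struct foo f = { .n = n + 1, .m = 2 };
  return f;
}
```

The "." prefix is what tells the compiler, and the reader, that a member is
meant, which is the property approach B would reuse.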

With approach A above, we can keep the current syntax for counted_by, but I am
not sure how easy it would be to extend it to simple expressions and nested
structures.

However, the major problem with this approach is that it changes the default
scoping rule in the C language. This additional variable scope will break
existing legal C code:

[opc@qinzhao~]$ cat t7.c
void boo (int k)
{
 const int n = 10; // a local variable n
 struct foo {
   int n;     // a member variable n
   int a[n + 10];  // currently, this n refers to the local variable n.
 };
}

When we take the approach A, within the structure "foo", the VLA a[n+10] 
will refer to the member variable n, but not the local variable n anymore. 
The existing code with VLA might work incorrectly.
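
The same lookup rule can also be observed without a VLA inside a structure
(which, as far as I know, GCC accepts only as an extension). In the sketch
below (my own; the names are made up) the bound uses an enum constant, so it
is plain standard C. Today "n" in the array bound is the constant, giving 14
elements; under approach A it would silently start to mean the member.

```c
#include <stddef.h>

enum { n = 4 };      /* ordinary identifier n, an integer constant */

struct foo {
  int n;             /* member n, in the member name space */
  int a[n + 10];     /* today: n is the enum constant, so 14 elements */
};

/* Number of elements of the member a, computed without evaluating
   the null pointer (sizeof does not evaluate its operand here). */
size_t elements_in_a(void)
{
  return sizeof(((struct foo *)0)->a) / sizeof(int);
}
```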

You could argue for adding the new variable scope only for the counted_by
attribute, not for VLAs; but then how should the following case be handled?

[opc@qinzhao~]$ cat t8.c
void boo (int k)
{
 const int n = 10; // a local variable n
 struct foo {
   int n;     // a member variable n
   int a[n + 10];  // for VLA, this n refers to the local variable n.
   char *b __attribute__ ((counted_by(n + 10)));
     // for counted_by, this n refers to the member variable n.
 };
}

This will be a disaster.  

So, I think that the approach A is not the right direction for a C extension.

With approach B above, a new syntax needs to be implemented, and all the
previous source code changes in applications need to be modified.

But I still think that approach B is the right direction to go.  
(Please refer to:
******Scope of variables in C++
https://www.geeksforgeeks.org/scope-of-variables-in-c/
******Scope of variables in C
https://www.geeksforgeeks.org/scope-rules-in-c/)


> On Mar 11, 2025, at 02:33, Henrik Olsson <h_ols...@apple.com> wrote:
> 
> 
> 
>> On Mar 10, 2025, at 11:04 PM, Martin Uecker <uec...@tugraz.at> wrote:
>> 
>> Am Montag, dem 10.03.2025 um 19:30 -0400 schrieb John McCall:
>>> On 10 Mar 2025, at 18:30, Martin Uecker wrote:
>>>> Am Montag, dem 10.03.2025 um 16:45 -0400 schrieb John McCall:
>>>>>> 
>> 
>> ..
>> 
>>>> 
>>>>>> 
>>>>>>>> While the next example is also ok in C++.
>>>>>>>> 
>>>>>>>> constexpr int n = 2;
>>>>>>>> 
>>>>>>>> struct foo {
>>>>>>>>  char buf[n];
>>>>>>>> };
>>>>>>>> 
>>>>>>>> With both declarations of 'n' the example has UB in C++. 
>>>>>>>> So I am not convinced the proposed rules make a lot
>>>>>>>> of sense for C++ either.
>>>>>> 
>>>>>> If C required a diagnostic in your first example, it would actually
>>>>>> put a fair amount of pressure on the C++ committee to get rid of
>>>>>> this spurious UB rule.
>>>> 
>>>> Why would C want a diagnostic here?
>>> 
>>> When I said “your first example”, Martin, I did actually mean your
>>> first example:
>> 
>> Sorry, I meant "there". No reason to be condescending though.
>> 
>>> But I think it’s clear that you and I just differ on some basic design
>>> philosophy, so let’s just end the conversation here.
>> 
>> I think the issue that if one does not agree with the
>> design decisions made previously for the name lookup rules
>> in the C and C++ languages and wants to change those (or
>> adding new inconsistent ones), then this is not simply
>> a question of language design preferences.
>> 
>> Martin
>> 
>>> 
>>> John.
>>> 
>>>>>>>> I still think one could use designator syntax, i.e. '.n', which
>>>>>>>> would be clearer and intuitive for both C and C++ programmers.
>>>>>> 
>>>>>> This doesn’t really solve the ambiguity problem. If n is a field name,
>>>>>> a programmer who writes __counted_by(n) almost certainly means to name
>>>>>> the field. “The proper syntax is .n” is the cause of the bug, not its
>>>>>> solution.
>>>> 
>>>> Field names in C are in a different namespace. So far, when you write
>>>> 'n' this *never* refers to field member in any context.  And I have
>>>> never seen anybody request a warning for the examples above. So, no, 
>>>> "a programmer almost certainly means this" can not possible be true.
>>>> 
>>>> Martin
>> 
> I won't speak for John, but the way I see it you can optimise for 
> theoretically consistent minimalist semantics, or strive to optimise each 
> feature for its most common use cases. That's a difference in design 
> philosophy.
> Although I don't think these ambiguities matter in practice (as long as there 
> are workarounds) since their occurrence will be vanishingly rare, the fact 
> that 'n' currently means one thing doesn't mean that it should have to mean 
> the same thing in another context with different design goals, especially 
> when that is a completely new context. The context of "__counted_by(...)" 
> surrounding the expression should be enough to infer what is meant. The fact 
> that code could be structured in a way to mislead the reader is nothing new 
> to C, and when the context where that even could be a concern is largely 
> hypothetical I don't think eliminating it should be a priority.
> 
> Henrik

