RE: v2.1 Draft for a lengthof paper

2024-08-14 Thread Ballman, Aaron
Sorry for top-posting, my work account is stuck on Outlook. :-/

> For a WG14 paper you should add these findings to support that choice.
> Another option would be for WG14 to standardize the then existing 
> implementation with the double underscores.

+1, it's always good to explain prior art and existing uses as part of the 
paper. However, please also point out that C++ has a prior art as well which is 
slightly different and very much worth considering: they have one API for 
getting the array's rank, and another for getting a specific rank's extent. 
This is a general solution that doesn't require the programmer to have deep 
knowledge of C's declarator syntax and how it relates to multidimensional 
arrays.

That said, I suspect WG14 would not be keen on standardizing `lengthof` without 
an ugly keyword given that there are plenty of other uses of it that would 
break: 

https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src/cmd/mailx/names.c?L53-55
https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_fw.c?L292-294
https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/blob/src/spur64.stack/validImage.c?L7014-7018
(and many, many others)
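
To illustrate the kind of breakage meant here, the following is a minimal sketch. It is not taken from any of the linked projects; the `struct node` type and the `count_three_nodes` helper are hypothetical. It only shows an ordinary user function that happens to be called `lengthof`, which is valid C today.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pre-existing user code: `lengthof` is an ordinary function
 * that counts the nodes of a singly linked list. */
struct node {
    struct node *next;
};

/* This definition would become a syntax error if `lengthof` were a keyword. */
size_t lengthof(const struct node *list)
{
    size_t n = 0;
    for (; list != NULL; list = list->next)
        n++;
    return n;
}

/* Builds a three-node list on the stack and counts it. */
size_t count_three_nodes(void)
{
    struct node c = { NULL };
    struct node b = { &c };
    struct node a = { &b };
    size_t n = lengthof(&a);
    assert(n == 3);
    return n;
}
```

An uglified spelling such as `_Lengthof` sits in the reserved namespace, so code like this would keep compiling.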

>> > As for the parentheses, I personally think lengthof should follow 
>> > similar rules compared to sizeof.
>> 
>> I think most people agree with this.
>
> I still don't, in particular not for standardisation.
> 
> We have to remember that there are many small C compilers out there.

Those compilers already have to handle parsing this for sizeof, so that's not 
particularly compelling (even if we wanted to design C for the lowest common 
denominator of implementation effort, which I'm not convinced is a good 
approach these days). That said, if we went with a rank/extent design, I think 
we'd *have* to use parens because the extent interface would take two operands 
(the array and the rank you're interested in getting the extent of) and it 
would be inconsistent for the rank interface to then not require parens.

~Aaron

-----Original Message-----
From: Jens Gustedt  
Sent: Wednesday, August 14, 2024 2:11 AM
To: Alejandro Colomar ; Xavier Del Campo Romero 

Cc: Gcc Patches ; Daniel Plakosh ; 
Martin Uecker ; Joseph Myers ; Gabriel 
Ravier ; Jakub Jelinek ; Kees Cook 
; Qing Zhao ; David Brown 
; Florian Weimer ; Andreas Schwab 
; Timm Baeder ; A. Jiang 
; Eugene Zelenko ; Ballman, Aaron 

Subject: Re: v2.1 Draft for a lengthof paper

On 14 August 2024 at 01:27:33 CEST, Alejandro Colomar wrote:
> Hi Xavier,
> 
> On Wed, Aug 14, 2024 at 12:38:53AM GMT, Xavier Del Campo Romero wrote:
> > I have been overseeing these last emails -
> 
> Ahhh, good to know; thanks!  :)
> 
> > thank you very much for your
> > efforts, Alex!
> 
> :-)
> 
> > I did not reply until now because I do not have prior experience 
> > with gcc internals, so my feedback would probably have not been that 
> > useful.
> 
> Ok.
> 
> > Those emails from 2020 were in fact discussing two completely 
> > different proposals at once:
> > 
> > 1. Add _Lengthof + #include
> > 2. Allow static qualifier on compound literals
> 
> Yup.
> 
> > Whereas proposal #2 made it into C23 (kudos to Jens Gustedt!), and 
> > as you already know by now, proposal #1 received some negative 
> > feedback, suggesting _Typeof/typeof + some macro magic as a 
> > pragmatic workaround instead.
> 
> The original author of that negative feedback talked to me in private 
> a week ago, and said he likes my proposal.  We have no negative 
> feedback anymore.  :)
> 
> > Since the proposal did not get much traction and I would have been 
> > unable to contribute to gcc myself, I just gave up on it. IIRC the 
> > deadline for new proposals closed soon after, anyway.
> 
> Ok.
> 
> > But I am glad that someone with proper experience took the initiative.
> 
> Fun fact: this is my second non-trivial patch to GCC.  I wouldn't say 
> I had the proper experience with GCC internals when I started this 
> patch set.  But I'm unemployed at the moment, which gives me all the 
> time I need for learning those.  :)
> 
> > I still think the proposal is relevant and has interesting use cases.
> > 
> > > I have only added lengthof for now, not _Lengthof, as suggested by Jens.
> > > Depending on feedback, I'll propose the uglified version.
> > 
> > Probably, all of us know why the uglified version is the usual 
> > approach preferred by the C standard: we do not know how many 
> > applications would break otherwise.
> 
> Yup.
> 
> > However, we see that this trend is now changing with C23, so 
> > probably it makes sense to d

RE: v2.1 Draft for a lengthof paper

2024-08-14 Thread Ballman, Aaron
> I think that this argument falls short. E.g., implementations that already 
> have compound expressions (or lambdas ;-) may provide a quality 
> implementation using `static_assert` and `typeof` alone, and don't have to 
> touch their compiler at all.
>
> We should not impose an implementation in the language where doing it in a 
> header can be completely sufficient.

But can doing this in a header be completely sufficient in practice? e.g., the 
user who passes a pointer rather than an array is in for quite a surprise, or 
passing a struct, or passing a FAM, etc. If we want to put constraints on the 
interface, that may be more challenging to do from a header file than from the 
compiler. offsetof is a cautionary tale in that compilers that want a 
reasonable QoI basically all implement this as a builtin rather than the 
header-only version.
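
As a rough sketch of the concern, assuming GNU C extensions (`__typeof__` and `__builtin_types_compatible_p`, which GCC and Clang provide): the naive sizeof division compiles for a pointer argument and silently yields `sizeof(int *) / sizeof(int)`, whereas a header can reject pointers at compile time. The macro names `NAIVE_LENGTHOF`, `IS_ARRAY`, and `CHECKED_LENGTHOF` are hypothetical, and VLAs and flexible array members are not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Naive header-only version: also compiles for pointers, where the result
 * is sizeof(pointer) / sizeof(element) rather than an element count. */
#define NAIVE_LENGTHOF(a) (sizeof(a) / sizeof((a)[0]))

/* GNU C: nonzero when `a` has array type, i.e. its type differs from the
 * pointer type it would decay to. */
#define IS_ARRAY(a) \
    (!__builtin_types_compatible_p(__typeof__(a), __typeof__(&(a)[0])))

/* Rejects non-arrays at compile time; the struct only hosts the assertion
 * and contributes 0 to the result. */
#define CHECKED_LENGTHOF(a) \
    (0 * sizeof(struct { \
         static_assert(IS_ARRAY(a), "argument must be an array"); \
         char unused_; \
     }) + sizeof(a) / sizeof((a)[0]))

int sum_values(void)
{
    int values[7] = { 1, 2, 3, 4, 5, 6, 7 };
    int sum = 0;

    /* CHECKED_LENGTHOF(&values[0]) would fail to compile here. */
    for (size_t i = 0; i < CHECKED_LENGTHOF(values); i++)
        sum += values[i];
    return sum;
}
```

Whether diagnostics produced this way are good enough for a reasonable QoI is exactly the open question above.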

> Plus, implementing as a macro in a header (probably ) also provides a 
> feature test, for those applications that already have something similar. 
> This was basically what we did for `unreachable` and I think it worked out 
> fine.

True!

I'm still thinking on how important rank + extent is vs overall array length. 
If C had constexpr functions, then I'd almost certainly want array rank and 
extent to be the building blocks and then lengthof can be a constexpr function 
looping over rank and summing extents. But we don't have that yet, and "bird 
hand" vs "bird in bush"... :-D

~Aaron

-----Original Message-----
From: Jens Gustedt  
Sent: Wednesday, August 14, 2024 8:18 AM
To: Ballman, Aaron ; Alejandro Colomar 
; Xavier Del Campo Romero 
Cc: Gcc Patches ; Daniel Plakosh ; 
Martin Uecker ; Joseph Myers ; Gabriel 
Ravier ; Jakub Jelinek ; Kees Cook 
; Qing Zhao ; David Brown 
; Florian Weimer ; Andreas Schwab 
; Timm Baeder ; A. Jiang 
; Eugene Zelenko 
Subject: RE: v2.1 Draft for a lengthof paper

Hi Aaron,

On 14 August 2024 at 13:31:19 CEST, "Ballman, Aaron" wrote:
> Sorry for top-posting, my work account is stuck on Outlook. :-/
> 
> > For a WG14 paper you should add these findings to support that choice.
> > Another option would be for WG14 to standardize the then existing 
> > implementation with the double underscores.
> 
> +1, it's always good to explain prior art and existing uses as part of the 
> paper. However, please also point out that C++ has a prior art as well which 
> is slightly different and very much worth considering: they have one API for 
> getting the array's rank, and another for getting a specific rank's extent. 
> This is a general solution that doesn't require the programmer to have deep 
> knowledge of C's declarator syntax and how it relates to multidimensional 
> arrays.
> 
> That said, I suspect WG14 would not be keen on standardizing `lengthof` 
> without an ugly keyword given that there are plenty of other uses of it that 
> would break: 
> 
> https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/src
> /cmd/mailx/names.c?L53-55
> https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod_f
> w.c?L292-294
> https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/bl
> ob/src/spur64.stack/validImage.c?L7014-7018
> (and many, many others)
> 
> >> > As for the parentheses, I personally think lengthof should follow 
> >> > similar rules compared to sizeof.
> >> 
> >> I think most people agree with this.
> >
> > I still don't, in particular not for standardisation.
> > 
> > We have to remember that there are many small C compilers out there.
> 
> Those compilers already have to handle parsing this for sizeof, so that's not 
> particularly compelling (even if we wanted to design C for the lowest common 
> denominator of implementation effort, which I'm not convinced is a good 
> approach these days). That said, if we went with a rank/extent design, I 
> think we'd *have* to use parens because the extent interface would take two 
> operands (the array and the rank you're interested in getting the extent of) 
> and it would be inconsistent for the rank interface to then not require 
> parens.

I think that this argument falls short. E.g., implementations that already 
have compound expressions (or lambdas ;-) may provide a quality implementation 
using `static_assert` and `typeof` alone, and don't have to touch their 
compiler at all.

We should not impose an implementation in the language where doing it in a 
header can be completely sufficient.

Plus, implementing as a macro in a header (probably ) also provides a 
feature test, for those applications that already have something similar. 
This was basically what we did for `unreachable` and I think it worked out fine.
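
A minimal sketch of that feature-test idea, under stated assumptions: C23 provides `unreachable` as a macro in `<stddef.h>`, so an application can test for it with `#ifdef`. The header that would carry a `lengthof` macro is not settled here, so `<stddef.h>` below is only a placeholder, and `table_length` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>   /* placeholder: assume the library macro would come from here */

/* Application fallback: only define our own version when the implementation
 * has not already provided a `lengthof` macro. */
#ifndef lengthof
#define lengthof(a) (sizeof(a) / sizeof((a)[0]))
#endif

size_t table_length(void)
{
    static const int table[] = { 10, 20, 30, 40 };
    return lengthof(table);
}
```

With a keyword-only design, the `#ifndef` test would not detect the feature and the fallback definition would clash with the keyword.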

Jens

> ~A

RE: v2.1 Draft for a lengthof paper

2024-08-14 Thread Ballman, Aaron
> What regex did you use for searching?

I went cheap and easy rather than trying to narrow down:
https://sourcegraph.com/search?q=context:global+lang:C+lengthof&patternType=regexp&sm=0

> I was thinking of renaming the proposal to elementsof(), to avoid confusion 
> between length of an array and length of a string.  Would you mind checking 
> if elementsof() is ok?

From what I was seeing, it looks to be used more uniformly as a function-like 
macro accepting a single argument.

~Aaron

-----Original Message-----
From: Alejandro Colomar  
Sent: Wednesday, August 14, 2024 8:58 AM
To: Jens Gustedt ; Ballman, Aaron 

Cc: Xavier Del Campo Romero ; Gcc Patches 
; Daniel Plakosh ; Martin Uecker 
; Joseph Myers ; Gabriel Ravier 
; Jakub Jelinek ; Kees Cook 
; Qing Zhao ; David Brown 
; Florian Weimer ; Andreas Schwab 
; Timm Baeder ; A. Jiang 
; Eugene Zelenko 
Subject: Re: v2.1 Draft for a lengthof paper

Hi Aaron, Jens,

On Wed, Aug 14, 2024 at 02:17:52PM GMT, Jens Gustedt wrote:
> On 14 August 2024 at 13:31:19 CEST, "Ballman, Aaron" wrote:
> > Sorry for top-posting, my work account is stuck on Outlook. :-/
> > 
> > > For a WG14 paper you should add these findings to support that choice.
> > > Another option would be for WG14 to standardize the then existing 
> > > implementation with the double underscores.
> > 
> > +1, it's always good to explain prior art and existing uses as part
> > of the paper. However, please also point out that C++ has a prior 
> > art as well which is slightly different and very much worth
> > considering: they have one API for getting the array's rank, and 
> > another for getting a specific rank's extent. This is a general 
> > solution that doesn't require the programmer to have deep knowledge 
> > of C's declarator syntax and how it relates to multidimensional 
> > arrays.

I have added that to my draft.  I'll publish it soon as a reply to the GCC 
mailing list.  See below for details of what I have added for now.

> > 
> > That said, I suspect WG14 would not be keen on standardizing 
> > `lengthof` without an ugly keyword given that there are plenty of other 
> > uses of it that would break:
> > 
> > https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/usr/s
> > rc/cmd/mailx/names.c?L53-55
> > https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/ipod
> > _fw.c?L292-294
> > https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-vm/-/
> > blob/src/spur64.stack/validImage.c?L7014-7018
> > (and many, many others)

What regex did you use for searching?

I was thinking of renaming the proposal to elementsof(), to avoid confusion 
between length of an array and length of a string.  Would you mind checking if 
elementsof() is ok?

> > >> > As for the parentheses, I personally think lengthof should 
> > >> > follow similar rules compared to sizeof.
> > >> 
> > >> I think most people agree with this.
> > >
> > > I still don't, in particular not for standardisation.
> > > 
> > > We have to remember that there are many small C compilers out there.
> > 
> > Those compilers already have to handle parsing this for sizeof, so 
> > that's not particularly compelling

Agree.  I suspect it will be simpler for existing compilers to follow sizeof 
than to have new syntax.  However, it's easy to keep it as a QoI detail, so 
I've temporarily changed the wording to require parentheses, and let 
implementations lift that restriction.
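
For reference, the sizeof rules being compared against look like this in practice (a small sketch; `sizeof_forms_agree` is a hypothetical helper name):

```c
#include <assert.h>
#include <stddef.h>

size_t sizeof_forms_agree(void)
{
    int arr[4] = { 0 };
    size_t from_expr   = sizeof arr;       /* expression operand: parentheses optional */
    size_t from_parens = sizeof(arr);      /* parenthesized expression: also fine */
    size_t from_type   = sizeof(int[4]);   /* type name: parentheses required */

    assert(from_expr == from_parens && from_parens == from_type);
    return from_expr / sizeof arr[0];      /* element count of arr */
}
```

Requiring parentheses in the wording corresponds to accepting only the second and third forms.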

> > (even if we wanted to design C
> > for the lowest common denominator of implementation effort, which 
> > I'm not convinced is a good approach these days).

Off-topic, but I wish that had been the approach when a few implementations (I 
suspect proprietary vendors; this was never disclosed) rejected redefining 
NULL as the right thing: (void *) 0.

I fixed one of the last free-software implementations of NULL that expanded to 
0, and nullptr would probably never have been added if WG14 had not accepted 
the pressure from such horrible implementations.

<https://github.com/cc65/cc65/issues/1823>

> > That said, if we went with a rank/extent design, I think we'd *have* 
> > to use parens because the extent interface would take two operands 
> > (the array and the rank you're interested in getting the extent of) 
> > and it would be inconsistent for the rank interface to then not 
> > require parens.

   Prior art
 C
It is common in C programs to get the number of elements of
an array via the usual sizeof division and wrap it in a
macro.  Common names include:

•

RE: v2.1 Draft for a lengthof paper

2024-08-14 Thread Ballman, Aaron
> I am currently on a summer bike trip, so not able to provide a full reference 
> implementation. But could do so, once I am back.

No need (after thinking on this a bit more, I believe you're right that this 
can be done in a macro-only implementation; we might not go that route in Clang 
because of AST matching needs and whatnot, but that's not an issue), but thank 
you for the offer. Please enjoy your summer bike trip! 😊

> Why would you be looping? lengthof only addresses the outer dimension; sizeof 
> would need a loop, no?

Due to poor reading comprehension, I missed in the paper that lengthof works on 
the outer dimension. 😉 I think having a way to get the flattened size of a 
multidimensional array is a useful feature.
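
A small sketch of the distinction, using the plain sizeof division as a stand-in for the proposed operator (`OUTER_LENGTH` and `flattened_elements` are hypothetical names). The outer dimension and the flattened element count differ for a multidimensional array, and the flattened count is the product of the per-dimension lengths.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the proposed operator: length of the outermost dimension. */
#define OUTER_LENGTH(a) (sizeof(a) / sizeof((a)[0]))

size_t flattened_elements(void)
{
    int grid[3][4] = { { 0 } };
    size_t outer = OUTER_LENGTH(grid);                 /* 3 rows */
    size_t inner = OUTER_LENGTH(grid[0]);              /* 4 ints per row */
    size_t flat  = sizeof(grid) / sizeof(grid[0][0]);  /* all ints in the grid */

    assert(outer == 3 && inner == 4);
    assert(flat == outer * inner);
    return flat;
}
```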

~Aaron

-----Original Message-----
From: Jens Gustedt  
Sent: Wednesday, August 14, 2024 9:25 AM
To: Ballman, Aaron ; Alejandro Colomar 
; Xavier Del Campo Romero 
Cc: Gcc Patches ; Daniel Plakosh ; 
Martin Uecker ; Joseph Myers ; Gabriel 
Ravier ; Jakub Jelinek ; Kees Cook 
; Qing Zhao ; David Brown 
; Florian Weimer ; Andreas Schwab 
; Timm Baeder ; A. Jiang 
; Eugene Zelenko 
Subject: RE: v2.1 Draft for a lengthof paper

On 14 August 2024 at 14:40:41 CEST, "Ballman, Aaron" wrote:
> > I think that this argument falls short. E.g., implementations that 
> > already have compound expressions (or lambdas ;-) may provide a quality 
> > implementation using `static_assert` and `typeof` alone, and don't have to 
> > touch their compiler at all.
> >
> > We should not impose an implementation in the language where doing it in a 
> > header can be completely sufficient.
> 
> But can doing this in a header be completely sufficient in practice? 

I think so.

> e.g., the user who passes a pointer rather than an array is in for quite a 
> surprise, or passing a struct, or passing a FAM, etc. If we want to put 
> constraints on the interface, that may be more challenging to do from a 
> header file than from the compiler. offsetof is a cautionary tale in that 
> compilers that want a reasonable QoI basically all implement this as a 
> builtin rather than the header-only version.

Yes, with the tools that I listed and the ideas that are already in the paper 
you can basically do all that, including giving valuable feedback in case of 
failure.

I am currently on a summer bike trip, so not able to provide a full reference 
implementation. But could do so, once I am back.


> > Plus, implementing as a macro in a header (probably ) also provides 
> > a feature test, for those applications that already have something similar. 
> > This was basically what we did for `unreachable` and I think it worked out 
> > fine.
> 
> True!
> 
> I'm still thinking on how important rank + extent is vs overall array 
> length. If C had constexpr functions, then I'd almost certainly want 
> array rank and extent to be the building blocks and then lengthof can 
> be a constexpr function looping over rank and summing extents. But we 
> don't have that yet, and "bird hand" vs "bird in bush"... :-D

Why would you be looping? lengthof only addresses the outer dimension; sizeof 
would need a loop, no?

Generally I would be opposed to imposing a complicated solution for a simple 
feature.

Jens

> 
> ~Aaron
> 
> -----Original Message-----
> From: Jens Gustedt 
> Sent: Wednesday, August 14, 2024 8:18 AM
> To: Ballman, Aaron ; Alejandro Colomar 
> ; Xavier Del Campo Romero 
> Cc: Gcc Patches ; Daniel Plakosh 
> ; Martin Uecker ; Joseph Myers 
> ; Gabriel Ravier ; Jakub 
> Jelinek ; Kees Cook ; Qing 
> Zhao ; David Brown ; 
> Florian Weimer ; Andreas Schwab 
> ; Timm Baeder ; A. Jiang 
> ; Eugene Zelenko 
> Subject: RE: v2.1 Draft for a lengthof paper
> 
> Hi Aaron,
> 
> On 14 August 2024 at 13:31:19 CEST, "Ballman, Aaron" wrote:
> > Sorry for top-posting, my work account is stuck on Outlook. :-/
> > 
> > > For a WG14 paper you should add these findings to support that choice.
> > > Another option would be for WG14 to standardize the then existing 
> > > implementation with the double underscores.
> > 
> > +1, it's always good to explain prior art and existing uses as part of the 
> > paper. However, please also point out that C++ has a prior art as well 
> > which is slightly different and very much worth considering: they have one 
> > API for getting the array's rank, and another for getting a specific rank's 
> > extent. This is a general solution that doesn't require the programmer to 
> > have deep knowledge of C's declarator syntax and how it relates to 
> > multidimensional arrays.
> > 
> > That said, I suspect WG14 would no

RE: v2.1 Draft for a lengthof paper

2024-08-14 Thread Ballman, Aaron
> Ahh, context:global seems to be what I wanted.  Where is that documented?

For me it is the default when I go to https://sourcegraph.com/search but 
there's documentation at 
https://sourcegraph.com/docs/code-search/working/search_contexts

> Thanks!  I'll rename it to elementsof().

Rather than renaming it, I'd say that the name chosen in the proposed text is a 
placeholder, and have a section in the prose that describes different naming 
choices, pros and cons, suggests a name from you as the author, but asks WG14 
to pick the final name. I know Jens mentioned he doesn't like the name 
`elementsof` and I suspect if we ask five more people we'll get about seven 
more opinions on what the name could/should be. 😝

~Aaron

-----Original Message-----
From: Alejandro Colomar  
Sent: Wednesday, August 14, 2024 10:00 AM
To: Ballman, Aaron 
Cc: Jens Gustedt ; Xavier Del Campo Romero 
; Gcc Patches ; Daniel Plakosh 
; Martin Uecker ; Joseph Myers 
; Gabriel Ravier ; Jakub Jelinek 
; Kees Cook ; Qing Zhao 
; David Brown ; Florian Weimer 
; Andreas Schwab ; Timm Baeder 
; A. Jiang ; Eugene Zelenko 

Subject: Re: v2.1 Draft for a lengthof paper

Hi Aaron,

On Wed, Aug 14, 2024 at 01:21:18PM GMT, Ballman, Aaron wrote:
> > What regex did you use for searching?
> 
> I went cheap and easy rather than trying to narrow down:
> https://sourcegraph.com/search?q=context:global+lang:C+lengthof&patter
> nType=regexp&sm=0

Ahh, context:global seems to be what I wanted.  Where is that documented?

> > I was thinking of renaming the proposal to elementsof(), to avoid confusion 
> > between length of an array and length of a string.  Would you mind checking 
> > if elementsof() is ok?
> 
> From what I was seeing, it looks to be used more uniformly as a 
> function-like macro accepting a single argument.

Thanks!  I'll rename it to elementsof().

Cheers,
Alex

> ~Aaron

--
<https://www.alejandro-colomar.es/>


RE: v2.1 Draft for a lengthof paper

2024-08-14 Thread Ballman, Aaron
> I would love to see a proposal for adding this GNU extension to ISO C.
> Did nobody do it yet?  I could try to, if I find some time.  (But I'll take a 
> longish time for that; if anyone else does it, it would be great.)

It's been discussed but hasn't moved forward because there are design issues 
with it (the odd way in which it produces a resulting value, sometimes 
surprising behavior with how it interacts with flow control, the fact that it 
can't be used in all contexts, etc). The committee was leaning more towards 
lambdas despite those being a bit orthogonal.

~Aaron

-----Original Message-----
From: Alejandro Colomar  
Sent: Wednesday, August 14, 2024 10:48 AM
To: Jens Gustedt 
Cc: Ballman, Aaron ; Xavier Del Campo Romero 
; Gcc Patches ; Daniel Plakosh 
; Martin Uecker ; Joseph Myers 
; Gabriel Ravier ; Jakub Jelinek 
; Kees Cook ; Qing Zhao 
; David Brown ; Florian Weimer 
; Andreas Schwab ; Timm Baeder 
; A. Jiang ; Eugene Zelenko 

Subject: Re: v2.1 Draft for a lengthof paper

On Wed, Aug 14, 2024 at 03:50:21PM GMT, Jens Gustedt wrote:
> > > > 
> > > > That said, I suspect WG14 would not be keen on standardizing 
> > > > `lengthof` without an ugly keyword given that there are plenty of other 
> > > > uses of it that would break:
> > > > 
> > > > https://sourcegraph.com/github.com/illumos/illumos-gate/-/blob/u
> > > > sr/src/cmd/mailx/names.c?L53-55
> > > > https://sourcegraph.com/github.com/Rockbox/rockbox/-/blob/tools/
> > > > ipod_fw.c?L292-294
> > > > https://sourcegraph.com/github.com/OpenSmalltalk/opensmalltalk-v
> > > > m/-/blob/src/spur64.stack/validImage.c?L7014-7018
> > > > (and many, many others)
> > 
> > What regex did you use for searching?
> > 
> > I was thinking of renaming the proposal to elementsof(), to avoid 
> > confusion between length of an array and length of a string.  Would 
> > you mind checking if elementsof() is ok?
> 
> No, not for me. I really want us to consistently talk about 
> array length for this. Consistent terminology is important.

I understand your desire for consistency.  I think your paper is a net 
improvement over the status quo (which is a mix of length, size, and number of 
elements).  After your proposal, there will be only length and number of 
elements.  That's great.

However, strlen(3) came first, and we must respect it.

Since you haven't proposed eliminating "number of elements" from the standard, 
and it would still be used alongside length, I think
elementsof() would be consistent with your view (consistent with "number of 
elements").

Alternatively, you could use a new term, for example extent, for referring to 
the number of elements of an array.  That would be more respectful to 
strlen(3), keeping a strong distinction between string length and array extent.

Or how about always referring to it as "number of elements"?  It's longer to 
type, but would be the most consistent approach.

Also, elementsof() is free to use, while lengthof() has several existing 
incompatible cases (as Aaron has shown), so we can't use that name so freely.

> > I have concerns about a libc (or a predefined macro) implementation:
> > the sizeof division causes double evaluation with any VLAs, while my 
> > implementation for GCC has fewer cases of evaluation, and when it 
> > needs to evaluate, it only does it once.  It would be hard to find a 
> > good wording that would allow an implementation to implement this as a 
> > macro.
> 
> No, we should not allow double evaluation.
> 
> putting this in a `({})`

I would love to see a proposal for adding this GNU extension to ISO C.
Did nobody do it yet?  I could try to, if I find some time.  (But I'll take a 
longish time for that; if anyone else does it, it would be great.)

> and doing a `typedef typeof(X) _my_type;` with the macro parameter `X` 
> at the beginning completely avoids double evaluation. So quality 
> implementations are possible, but perhaps differently and with other builtins 
> than we are imagining. Don't impose the view of one particular implementation 
> onto others.

Ahhh, good.  I haven't thought of that possibility.  Sure, that makes sense 
now.  It gives more strength to your proposal of allowing libc implementations, 
and thus requiring parens in the standard.
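
A minimal sketch of the approach Jens describes, assuming the GNU statement-expression extension `({ })` and `__typeof__` (available in GCC and Clang, not ISO C). The names `LENGTHOF_ONCE` and `lengthof_type_` are hypothetical, the latter standing in for the `_my_type` of the quoted text. It leaves out the `static_assert` checks a quality implementation would add.

```c
#include <assert.h>
#include <stddef.h>

/* GNU C sketch: X appears exactly once in the expansion, inside __typeof__,
 * so the later sizeof operations work on the typedef and never repeat X. */
#define LENGTHOF_ONCE(X) __extension__ ({ \
    typedef __typeof__(X) lengthof_type_; \
    sizeof(lengthof_type_) / sizeof((*(lengthof_type_ *)0)[0]); \
})

size_t length_of_fixed(void)
{
    double samples[6] = { 0 };
    return LENGTHOF_ONCE(samples);
}

size_t length_of_vla(size_t n)
{
    int scratch[n];   /* variable length array; n must be greater than 0 */
    scratch[0] = 0;
    return LENGTHOF_ONCE(scratch);
}
```

Because a statement expression is only valid inside a function body, a macro built this way cannot be used at file scope. That is one of the places where a builtin would behave differently.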

> Somewhere was brought in an argument with `offsetof`. 
> This is exactly what we need. Implementations being able to start with 
> a simple solution (as everybody did in the beginning of `offsetof`), 
> and improve that implementation at their pace when they are ready for 
> it.

Agree.

> > > this was basically what we did for `unreachable` and I think it 
> > > worked out fine.
> 
> I still think that the different options that we had there can be used 
> to ask the right questions for WG14.

I'm looking at it.  I've already taken some parts of it.  :)

Cheers,
Alex

--
<https://www.alejandro-colomar.es/>