Re: WG14 paper for removing restrict from nptr in strtol(3)

2024-07-07 Thread Martin Uecker via Gcc


Hi Alejandro,

if in caller it is known that endptr has access mode "write_only"
then it can conclude that the content of *endptr has access mode
"none", couldn't it?

You also need to discuss backwards compatibility.  Changing
the type of those functions can break valid programs.  You would
need to make a case that this is unlikely to affect any real
world program.

Martin

Am Sonntag, dem 07.07.2024 um 03:58 +0200 schrieb Alejandro Colomar:
> Hi,
> 
> I've incorporated feedback, and here's a new revision, let's call it
> v0.2, of the draft for a WG14 paper.  I've attached the man(7) source,
> and the generated PDF.
> 
> Cheers,
> Alex
> 
> 



Re: WG14 paper for removing restrict from nptr in strtol(3)

2024-07-07 Thread Paul Eggert

On 7/7/24 03:58, Alejandro Colomar wrote:
> I've incorporated feedback, and here's a new revision, let's call it
> v0.2, of the draft for a WG14 paper.

Although I have not followed the email discussion closely, I read v0.2 
and think that as stated there is little chance that its proposal will 
be accepted.


Fundamentally the proposal is trying to say that there are two styles X 
and Y for declaring strtol and similar functions, and that although both 
styles are correct in some sense, style Y is better than style X. 
However, the advantages of Y are not clearly stated and the advantages 
of style X over Y are not admitted, so the proposal is not making the 
case clearly and fairly.


One other thing: to maximize the chance of a proposal being accepted, 
please tailor it for its expected readership. The C committee is expert 
on ‘restrict’, so don’t try to define ‘restrict’ in your own way. Unless 
merely repeating the language of the standard, any definition given for 
‘restrict’ is likely to cause the committee to quibble with the 
restatement of the standard wording. (It is OK to mention some 
corollaries of the standard definition, so long as the corollaries are 
not immediately obvious.)


Here are some comments about the proposal. At the start these comments 
are detailed; towards the end, as I could see the direction the proposal 
was headed and was convinced it wouldn’t be accepted as stated, the 
comments are less detailed.



"The API may copy"

One normally doesn’t think of the application programming interface as 
copying. Please replace the phrase “the API” with “the caller” or “the 
callee” as appropriate. (Although ‘restrict’ can be used in places other 
than function parameters, I don’t think the proposal is concerned about 
those cases and so it doesn’t need to go into that.)



"To avoid violations of for example C11::6.5.16.1p3,"

Code that violates C11::6.5.16.1p3 will do so regardless of whether 
‘restrict’ is present. I would not mention C11::6.5.16.1p3 as it’s a red 
herring. Fundamentally, ‘restrict’ is not about the consequences of 
caching when one does overlapping moves; it’s about caching in a more 
general sense.



“As long as an object is only accessed via one restricted pointer, other 
restricted pointers are allowed to point to the same object.”


“only accessed” → “accessed only”


“This is less strict than I think it should be, but this proposal 
doesn’t attempt to change that definition.”


I would omit this sentence and all similar sentences. Don’t distract the 
reader with other potential proposals. The proposal as it stands is 
complicated enough.



“return ca > a;”
“return ca > *ap;”

I fail to understand why these examples are present. It’s not simply 
that nobody writes code like that: the examples are not on point. I 
would remove the entire programs containing them, along with the 
sections that discuss them. When writing to the C committee one can 
assume the reader is expert in ‘restrict’, there is no need for examples 
such as these.



“strtol(3) accepts 4 objects via pointer parameters and global variables.”

Omit the “(3)”, here and elsewhere, as the audience is the C standard 
committee.


“accepts” is a strange word to use here: normally one says “accepts” to 
talk about parameters, not global variables. Also, “global variables” is 
not right here. The C standard allows strtol, for example, to read and 
write an internal static cache. (Yes, that would be weird, but it’s 
allowed.) I suggest rephrasing this sentence to talk about accessing, 
not accepting.



“endptr access(write_only) ... *endptr access(none)”

This is true for glibc, but it’s not necessarily true for all conforming 
strtol implementations. If endptr is non-null, a conforming strtol 
implementation can both read and write *endptr; it can also both read 
and write **endptr. (Although it would need to write before reading, 
reading is allowed.)



“This qualifier helps catch obvious bugs such as strtol(p, p, 0) and 
strtol(&p, &p, 0) .”


No it doesn’t. Ordinary type checking catches those obvious bugs, and 
‘restrict’ provides no additional help there. Please complicate the 
examples to make the point more clearly.



“The caller knows that errno doesn’t alias any of the function arguments.”

Only because all args are declared with ‘restrict’. So if the proposal 
is accepted, the caller doesn’t necessarily know that.



“The callee knows that *endptr is not accessed.”

This is true for glibc, but not necessarily true for every conforming 
strtol implementation.



“It might seem that it’s a problem that the callee doesn’t know if nptr 
can alias errno or not. However, the callee will not write to the latter 
directly until it knows it has failed,”


Again this is true for glibc, but not necessarily true for every 
conforming strtol implementation.


To my mind this is the most serious objection. The current standard 
prohibits calls like strtol((char *) &errno, 0, 0). The proposal would 

Re: WG14 paper for removing restrict from nptr in strtol(3)

2024-07-07 Thread Alejandro Colomar via Gcc
Hi Martin,

On Sun, Jul 07, 2024 at 09:15:23AM GMT, Martin Uecker wrote:
> 
> Hi Alejandro,
> 
> if in caller it is known that endptr has access mode "write_only"
> then it can conclude that the content of *endptr has access mode
> "none", couldn't it?

Hmmm.  I think you're correct.  I'll incorporate that and see how it
affects the caller.

At first glance, I think it would result in

    nptr     access(read_only)   alias *endptr
    endptr   access(write_only)  unique
    errno    access(read_write)  unique
    *endptr  access(none)        alias nptr

Which is actually having perfect information, regardless of 'restrict'
on nptr.  :-)

> You also need to discuss backwards compatibility.  Changing
> the type of those functions can break valid programs.

I might be forgetting about other possibilities, but the only one I had
in mind that could break API would be function pointers.  However, a
small experiment seems to say it doesn't:

    $ cat strtolp.c
    #include <stdlib.h>

    long
    alx_strtol(const char *nptr, char **restrict endptr, int base)
    {
        return strtol(nptr, endptr, base);
    }

    typedef long (*strtolp_t)(const char *restrict nptr,
                              char **restrict endptr, int base);
    typedef long (*strtolpnr_t)(const char *nptr,
                                char **restrict endptr, int base);

    int
    main(void)
    {
        [[maybe_unused]] strtolp_t    a = &strtol;
        [[maybe_unused]] strtolpnr_t  b = &strtol;
        [[maybe_unused]] strtolp_t    c = &alx_strtol;
        [[maybe_unused]] strtolpnr_t  d = &alx_strtol;
    }

    $ cc -Wall -Wextra strtolp.c
    $

Anyway, I'll say that it doesn't seem to break API.

>  You would
> need to make a case that this is unlikely to affect any real
> world program.

If you have something else in mind that could break API, please let me
know, and I'll add it to the experiments.

Thanks!

Have a lovely day!
Alex



Re: WG14 paper for removing restrict from nptr in strtol(3)

2024-07-07 Thread Martin Uecker via Gcc
Am Sonntag, dem 07.07.2024 um 13:07 +0200 schrieb Alejandro Colomar via Gcc:
> Hi Martin,
> 
> On Sun, Jul 07, 2024 at 09:15:23AM GMT, Martin Uecker wrote:
> > 
> > Hi Alejandro,
> > 
> > if in caller it is known that endptr has access mode "write_only"
> > then it can conclude that the content of *endptr has access mode
> > "none", couldn't it?
> 
> Hmmm.  I think you're correct.  I'll incorporate that and see how it
> affects the caller.
> 
> At first glance, I think it would result in
> 
>   nptr     access(read_only)   alias *endptr
>   endptr   access(write_only)  unique
>   errno    access(read_write)  unique
>   *endptr  access(none)        alias nptr
> 
> Which is actually having perfect information, regardless of 'restrict'
> on nptr.  :-)

Yes, but my point is that even with "restrict" a smarter
compiler could then also be smart enough not to warn even
when *endptr aliases nptr.

> 
> > You also need to discuss backwards compatibility.  Changing
> > the type of those functions can break valid programs.
> 
> I might be forgetting about other possibilities, but the only one I had
> in mind that could break API would be function pointers.  However, a
> small experiment seems to say it doesn't:

Right, the outermost qualifiers are ignored, so this is not a
compatibility problem.  So I think this is not an issue, but
it is worth pointing it out.

Martin

> 
>   $ cat strtolp.c
>   #include <stdlib.h>
> 
>   long
>   alx_strtol(const char *nptr, char **restrict endptr, int base)
>   {
>       return strtol(nptr, endptr, base);
>   }
> 
>   typedef long (*strtolp_t)(const char *restrict nptr,
>                             char **restrict endptr, int base);
>   typedef long (*strtolpnr_t)(const char *nptr,
>                               char **restrict endptr, int base);
> 
>   int
>   main(void)
>   {
>       [[maybe_unused]] strtolp_t    a = &strtol;
>       [[maybe_unused]] strtolpnr_t  b = &strtol;
>       [[maybe_unused]] strtolp_t    c = &alx_strtol;
>       [[maybe_unused]] strtolpnr_t  d = &alx_strtol;
>   }
> 
>   $ cc -Wall -Wextra strtolp.c
>   $
> 
> Anyway, I'll say that it doesn't seem to break API.
> 
> >  You would
> > need to make a case that this is unlikely to affect any real
> > world program.
> 
> If you have something else in mind that could break API, please let me
> know, and I'll add it to the experiments.
> 
> Thanks!
> 
> Have a lovely day!
> Alex
> 



Re: WG14 paper for removing restrict from nptr in strtol(3)

2024-07-07 Thread Alejandro Colomar via Gcc
Hi Paul,

On Sun, Jul 07, 2024 at 12:42:51PM GMT, Paul Eggert wrote:
> On 7/7/24 03:58, Alejandro Colomar wrote:
> 
> > I've incorporated feedback, and here's a new revision, let's call it
> > v0.2, of the draft for a WG14 paper.
> Although I have not followed the email discussion closely, I read v0.2 and
> think that as stated there is little chance that its proposal will be
> accepted.

Thanks for reading thoroughly, and the feedback!

> Fundamentally the proposal is trying to say that there are two styles X and
> Y for declaring strtol and similar functions, and that although both styles
> are correct in some sense, style Y is better than style X. However, the
> advantages of Y are not clearly stated and the advantages of style X over Y
> are not admitted, so the proposal is not making the case clearly and fairly.
> 
> One other thing: to maximize the chance of a proposal being accepted, please
> tailor it for its expected readership. The C committee is expert on
> ‘restrict’, so don’t try to define ‘restrict’ in your own way. Unless merely
> repeating the language of the standard, any definition given for ‘restrict’
> is likely to cause the committee to quibble with the restatement of the
> standard wording. (It is OK to mention some corollaries of the standard
> definition, so long as the corollaries are not immediately obvious.)
> 
> Here are some comments about the proposal. At the start these comments are
> detailed; towards the end, as I could see the direction the proposal was
> headed and was convinced it wouldn’t be accepted as stated, the comments are
> less detailed.
> 
> 
> "The API may copy"
> 
> One normally doesn’t think of the application programming interface as
> copying. Please replace the phrase “the API” with “the caller” or “the
> callee” as appropriate. (Although ‘restrict’ can be used in places other
> than function parameters, I don’t think the proposal is concerned about
> those cases and so it doesn’t need to go into that.)

Ok.

> "To avoid violations of for example C11::6.5.16.1p3,"
> 
> Code that violates C11::6.5.16.1p3 will do so regardless of whether
> ‘restrict’ is present. I would not mention C11::6.5.16.1p3 as it’s a red
> herring. Fundamentally, ‘restrict’ is not about the consequences of caching
> when one does overlapping moves; it’s about caching in a more general sense.

The violations are UB regardless of restrict, but consistent use of
restrict allows the caller to have a rough model of what the callee will
do with the objects, and prevent those violations via compiler
diagnostics.  I've reworded that part to make it more clear why I'm
mentioning that.

> “As long as an object is only accessed via one restricted pointer, other
> restricted pointers are allowed to point to the same object.”
> 
> “only accessed” → “accessed only”

Ok.

> “This is less strict than I think it should be, but this proposal doesn’t
> attempt to change that definition.”
> 
> I would omit this sentence and all similar sentences. Don’t distract the
> reader with other potential proposals. The proposal as it stands is
> complicated enough.

Ok.

> “return ca > a;”
> “return ca > *ap;”
> 
> I fail to understand why these examples are present. It’s not simply that
> nobody writes code like that: the examples are not on point. I would remove
> the entire programs containing them, along with the sections that discuss
> them. When writing to the C committee one can assume the reader is expert in
> ‘restrict’, there is no need for examples such as these.

Those are examples of how consistent use of restrict can --or could, in
the case of g()-- detect, via compiler diagnostics, (likely) violations
of seemingly unrelated parts of the standard, such as the referenced
C11::6.5.16.1p3, or in this case, C11::6.5.8p5.  

> “strtol(3) accepts 4 objects via pointer parameters and global variables.”
> 
> Omit the “(3)”, here and elsewhere, as the audience is the C standard
> committee.

The C standard committee doesn't know about historic use of (3)?  That
predates the standard, and they built on top of that (C originated in
Unix).  While they probably don't care about it anymore, I expect my
paper to be read by other audience, including GCC and glibc, and I
prefer to keep it readable for that audience.  I expect the standard
committee to at least have a rough idea of the existence of this syntax,
and respect it, even if they don't use it or like it.

> “accepts” is a strange word to use here: normally one says “accepts” to talk
> about parameters, not global variables.

The thing is, strtol(3) does not actually access *endptr.  I thought
that might cause more confusion than using "accepts".

> Also, “global variables” is not
> right here. The C standard allows strtol, for example, to read and write an
> internal static cache. (Yes, that would be weird, but it’s allowed.)

That's not part of the API.  A user must not access internal static
cache, and so the implementation is free to assume that it doesn'

Re: WG14 paper for removing restrict from nptr in strtol(3)

2024-07-07 Thread Alejandro Colomar via Gcc
Hi Martin,

On Sun, Jul 07, 2024 at 02:21:17PM GMT, Martin Uecker wrote:
> Am Sonntag, dem 07.07.2024 um 13:07 +0200 schrieb Alejandro Colomar via Gcc:
> > Which is actually having perfect information, regardless of 'restrict'
> > on nptr.  :-)
> 
> Yes, but my point is that even with "restrict" a smarter
> compiler could then also be smart enough not to warn even
> when *endptr aliases nptr.

Hmmm, this is a valid argument.  I feel less strongly about this
proposal now.

I'll document this in the proposal.

Your analyzer would need to be more complex to be able to not trigger
false positives here, but it's possible, so I guess I'm happy with
either case.

Still, removing restrict from strtol(3) would allow changing the
semantics of restrict to be more restrictive (and easier to understand),
so that passing aliasing pointers as restrict pointers would already be
Undefined Behavior, regardless of the accesses by the callee.

But yeah, either way it's good, as far as strtol(3) and gcc-20 are
concerned.  :)

Have a lovely day!
Alex

> > > You also need to discuss backwards compatibility.  Changing
> > > the type of those functions can break valid programs.
> > 
> > I might be forgetting about other possibilities, but the only one I had
> > in mind that could break API would be function pointers.  However, a
> > small experiment seems to say it doesn't:
> 
> Right, the outermost qualifiers are ignored, so this is not a
> compatibility problem.  So I think this is not an issue, but
> it is worth pointing it out.

Yup.

> 
> Martin



RE: [WG14] Request for document number; strtol restrictness

2024-07-07 Thread Daniel Plakosh
Alex,

Your document number is below:

n3294 - strtol(3) et al. shouldn't have a restricted first parameter

Please return the updated document with this number

Best regards,

Dan

Technical Director - Enabling Mission Capability at Scale
Principal Member of the Technical Staff
Software Engineering Institute
Carnegie Mellon University
4500 Fifth Avenue
Pittsburgh, PA 15213
WORK: 412-268-7197
CELL: 412-427-4606

-----Original Message-----
From: Alejandro Colomar  
Sent: Friday, July 5, 2024 3:42 PM
To: dplak...@cert.org
Cc: Martin Uecker ; Jonathan Wakely ; 
Xi Ruoyao ; Jakub Jelinek ; 
libc-al...@sourceware.org; gcc@gcc.gnu.org; Paul Eggert ; 
linux-...@vger.kernel.org; LIU Hao ; Richard Earnshaw 
; Sam James 
Subject: [WG14] Request for document number; strtol restrictness

Hi,

I have a paper for removing restrict from the first parameter of
strtol(3) et al.  The title is

strtol(3) et al. shouldn’t have a restricted first parameter.

If it helps, I already have a draft of the paper, which I attach (both the PDF, 
and the man(7) source).

Cheers,
Alex

--



Re: WG14 paper for removing restrict from nptr in strtol(3)

2024-07-07 Thread Paul Eggert

On 7/7/24 14:42, Alejandro Colomar wrote:
> On Sun, Jul 07, 2024 at 12:42:51PM GMT, Paul Eggert wrote:
> > Also, “global variables” is not
> > right here. The C standard allows strtol, for example, to read and write an
> > internal static cache. (Yes, that would be weird, but it’s allowed.)
> 
> That's not part of the API.  A user must not access internal static
> cache


Although true in the normal (sane) case, as an extension the 
implementation can make such a static cache visible to the user, and in 
this case the caller must not pass cache addresses as arguments to strtol.


For other functions this point is not purely academic. For example, the 
C standard specifies the signature "FILE *fopen(const char *restrict, 
const char *restrict);". If I understand your argument correctly, it 
says that the "restrict"s can be omitted there without changing the set 
of valid programs. But that can't be right, as omitting the "restrict"s 
would make the following code be valid in any platform where sizeof(int)>1:


   char *p = (char *) &errno;
   p[0] = 'r';
   p[1] = 0;
   FILE *f = fopen (p, p);

even though the current standard says this code is invalid.



> > “endptr access(write_only) ... *endptr access(none)”
> > 
> > This is true for glibc, but it’s not necessarily true for all conforming
> > strtol implementations. If endptr is non-null, a conforming strtol
> > implementation can both read and write *endptr;
> 
> It can't, I think.  It's perfectly valid to pass an uninitialized
> endptr, which means the callee must not read the original value.


Sure, but the callee can do something silly like "*endptr = p + 1; 
*endptr = *endptr - 1;". That is, it can read *endptr after writing it, 
without any undefined behavior. (And if the callee is written in 
assembly language it can read *endptr even before writing it - but I 
digress.)


The point is that it is not correct to say that *endptr cannot be read 
from; it can. Similarly for **endptr.




> Here, we need to consider two separate objects.  The object pointed-to
> by *endptr _before_ the object pointed to by endptr is written to, and
> the object pointed-to by *endptr _after_ the object pointed to by endptr
> is written to.


Those are not the only possibilities. The C standard also permits strtol 
to set *endptr to some other pointer value, not pointing anywhere into 
the string being scanned, so long as it sets *endptr correctly before it 
returns.




> > “The caller knows that errno doesn’t alias any of the function arguments.”
> > 
> > Only because all args are declared with ‘restrict’. So if the proposal is
> > accepted, the caller doesn’t necessarily know that.
> 
> Not really.  The caller has created the string (or has received it via a
> restricted pointer)


v0.2 doesn't state the assumption that the caller either created the 
string or received it via a restricted pointer. If this assumption were 
stated clearly, that would address the objection here.




> > “The callee knows that *endptr is not accessed.”
> > 
> > This is true for glibc, but not necessarily true for every conforming strtol
> > implementation.
> 
> The original *endptr may be uninitialized, and so must not be accessed.


**endptr can be read once the callee sets *endptr. **endptr can even be 
written, if the callee temporarily sets *endptr to point to a writable 
buffer; admittedly this would be weird but it's allowed.




> > “It might seem that it’s a problem that the callee doesn’t know if nptr can
> > alias errno or not. However, the callee will not write to the latter
> > directly until it knows it has failed,”
> > 
> > Again this is true for glibc, but not necessarily true for every conforming
> > strtol implementation.
> 
> An implementation is free to set errno = EDEADLK in the middle of it, as
> long as it later removes that.  However, I don't see how it would make
> any sense.


It could make sense in some cases. Here the spec is a bit tricky, but an 
implementation is allowed to set errno = EINVAL first thing, and then 
set errno to some other nonzero value if it determines that the 
arguments are valid. I wouldn't implement strtol that way, but I can see 
where someone else might do that.




> Let's find
> an ISO C function that accepts a non-restrict string:
> 
>     int system(const char *string);
> 
> Does ISO C constrain implementations to support system((char *)&errno)?
> I don't think so.  Maybe it does implicitly because of a defect in the
> wording, but even then it's widely understood that it doesn't.


'system' is a special case since the C standard says 'system' can do 
pretty much anything it likes. That being said, I agree that 
implementations shouldn't need to support calls like atol((char *) 
&errno). Certainly the C standard's description of atol, which defines 
atol's behavior in terms of a call to strtol, means that atol's argument 
in practice must follow the 'restrict' rules.


Perhaps we should report this sort of thing as a defect in the standard. 
It is odd, for example, that fopen's two arguments are both const char 
*restrict, but system's argument lacks the "restr

Re: [RFC] MAINTAINERS: require a BZ account field

2024-07-07 Thread Richard Sandiford via Gcc
Sam James  writes:
> Richard Sandiford  writes:
>
>> Sam James via Gcc  writes:
>>> Hi!
>>>
>>> This comes up in #gcc on IRC every so often, so finally
>>> writing an RFC.
>>>
>> [...]
>>> TL;DR: The proposal is:
>>>
>>> 1) MAINTAINERS should list a field containing either the gcc.gnu.org
>>> email in full, or their gcc username (bikeshedding semi-welcome);
>>>
>>> 2) It should become a requirement that to be in MAINTAINERS, one must
>>> possess a Bugzilla account (ideally using their gcc.gnu.org email).
>>
>> How about the attached as a compromise?  (gzipped as a poor protection
>> against scraping.)
>>
>
> Thanks! This would work for me. A note on BZ below.
>
>> It adds the gcc.gnu.org/bugzilla account name, without the @gcc.gnu.org,
>> as a middle column to the Write After Approval section.  I think this
>> makes it clear that the email specified in the last column should be
>> used for communication.
>>
>> [..]
>>
>> If this is OK, I'll need to update check-MAINTAINERS.py.
>
> For Bugzilla, there's two issues:
> 1) If someone uses an alternative (n...@gcc.gnu.org) email on Bugzilla,
> unless an exception is made (and Jakub indicated he didn't want to add
> more - there's very few right now), they do not have editbugs and cannot
> assign bugs to themselves or edit fields, etc.
>
> This leads to bugs being open when they don't need to be anymore, etc,
> and pinskia and I often have to clean that up.
>
> People with commit access are usually very happy to switch to
> @gcc.gnu.org when I let them know it grants powers!
>
> 2) CCing someone using a n...@gcc.gnu.org email is a pain, but *if* they
> have to use a n...@gcc.gnu.org email, it might be OK if they use the
> email that is listed in MAINTAINERS otherwise. If they use a third email
> then it becomes a pain though, but your proposal helps if it's just two
> emails in use.
>
> (But I'd still really encourage them to not do that, given the lack of
> perms.)
>
> I care about both but 1) > 2) for me, some others here care a lot about 2)
> if they're the ones doing triage and bisecting.

Ah, yeah, I agree with all of the above.  By "communication" I meant
"normal email" -- sorry for the bad choice of words.

For me, the point of the new middle column is to answer "which gcc.gnu.org
account should I use in bugzilla PRs?".  But adding "@gcc.gnu.org" to each
entry might encourage people to use it for normal email too.

After:

  To report problems in GCC, please visit:

    http://gcc.gnu.org/bugs/

how about adding something like:

  If you wish to CC a maintainer in bugzilla, please add @gcc.gnu.org
  to the account name given in the Write After Approval section below.
  Please use the email address given in <...> for direct email communication.

Richard


gcc-15-20240707 is now available

2024-07-07 Thread GCC Administrator via Gcc
Snapshot gcc-15-20240707 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/15-20240707/
and on various mirrors, see https://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 15 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch master 
revision 4594d555aa551a9998fc921363c5f6ea50630d5c

You'll find:

 gcc-15-20240707.tar.xz   Complete GCC

  SHA256=2646f1c36715bcd2195a98cc63ad3259d3c46563774b35afe5ed5052bce812fc
  SHA1=7e1fe8051fda06149d7c38961a27eec4ae6e3b71

Diffs from 15-20240630 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-15
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: WG14 paper for removing restrict from nptr in strtol(3)

2024-07-07 Thread Alejandro Colomar via Gcc
Hi Paul,

On Sun, Jul 07, 2024 at 07:30:43PM GMT, Paul Eggert wrote:
> On 7/7/24 14:42, Alejandro Colomar wrote:
> > On Sun, Jul 07, 2024 at 12:42:51PM GMT, Paul Eggert wrote:
> > > Also, “global variables” is not
> > > right here. The C standard allows strtol, for example, to read and write an
> > > internal static cache. (Yes, that would be weird, but it’s allowed.)
> > 
> > That's not part of the API.  A user must not access internal static
> > cache
> 
> Although true in the normal (sane) case, as an extension the implementation
> can make such a static cache visible to the user, and in this case the
> caller must not pass cache addresses as arguments to strtol.
> 
> For other functions this point is not purely academic. For example, the C
> standard specifies the signature "FILE *fopen(const char *restrict, const
> char *restrict);". If I understand your argument correctly, it says that the
> "restrict"s can be omitted there without changing the set of valid programs.

No, I didn't say that restrict can be removed from fopen(3).

What I say is that in functions that accept pointers that alias each
other, those aliasing pointers should not be restrict.  Usually,
pointers that alias are accessed, and thus they are not specified as
restrict, such as in memmove(3).  However, a small set of functions
accept pointers that alias each other, but one of them is never
accessed; in those few cases, restrict was added to the parameters in
ISO C, but I claim it would be better removed.  We're lucky, and the
small set of functions where this happens don't seem to use any state,
so we don't need to care about implementations using internal buffers
that are passed somehow to the user.

From ISO C, IIRC, the only examples are strtol(3) et al.  Another
example is Plan9's seprint(3) family of functions.  However, Plan9
doesn't use restrict, so it doesn't have it.

> But that can't be right, as omitting the "restrict"s would make the
> following code be valid in any platform where sizeof(int)>1:
> 
>    char *p = (char *) &errno;
>    p[0] = 'r';
>    p[1] = 0;
>    FILE *f = fopen (p, p);
> 
> even though the current standard says this code is invalid.

No, I wouldn't remove any of the restrict qualifiers in fopen(3).

Only from pointers that alias an access(none) pointer.

> > > “endptr   access(write_only) ... *endptr access(none)”
> > > 
> > > This is true for glibc, but it’s not necessarily true for all conforming
> > > strtol implementations. If endptr is non-null, a conforming strtol
> > > implementation can both read and write *endptr;
> > 
> > It can't, I think.  It's perfectly valid to pass an uninitialized
> > endptr, which means the callee must not read the original value.
> 
> Sure, but the callee can do something silly like "*endptr = p + 1; *endptr =
> *endptr - 1;". That is, it can read *endptr after writing it, without any
> undefined behavior. (And if the callee is written in assembly language it
> can read *endptr even before writing it - but I digress.)

But once you modify the pointer provenance, you don't care anymore about
it.  We need to consider the pointers that a function receives, which
are the ones the callee needs to know their provenance.  Of course, a
callee knows what it does, and so doesn't need restrict in local
variables.

C23/N3220::6.7.4.1p9 says:

    An object that is accessed through a restrict-qualified pointer
    has a special association with that pointer.  This association,
    defined in 6.7.4.2, requires that all accesses to that object
    use, directly or indirectly, the value of that pointer.

When you set *endptr = nptr + x, and use the lvalue **endptr, you're
still accessing the object indirectly using the value of nptr.

So, strtol(3) gets 4 objects, let's call them A, B, C, and D.

A is gotten via its pointer nptr.
B is gotten via its pointer endptr.
C is gotten via its pointer *endptr.
D is gotten via the global variable errno.

Object A may be the same as object C.
Object B is unique inside the callee.  Its pointer endptr must be
restrict-qualified to denote its uniqueness.
Object D is unique, but there's no way to specify that.

Object C must NOT be read or written.  The function is of course allowed
to set *endptr to whatever it likes, and then access it however it
likes, but object C must still NOT be accessed, since its pointer may be
uninitialized, and thus point to no object at all.


Maybe I should use abstract names for the objects, to avoid confusing
them with the pointer variables that are used to pass them?

The formal definition of restrict refers to the "object into which it
formerly [in the list of parameter declarations of a function
definition] pointed".  I'm not 100% certain, because this formal
definition is quite unreadable, though.  The more I read it, the less
sure I am about it.

BTW, I noticed something I didn't know:

If L is used to access the value of the object X that it
designates, and X is also mo

Re: [RFC] MAINTAINERS: require a BZ account field

2024-07-07 Thread Sam James via Gcc
Richard Sandiford  writes:

> Sam James  writes:
>> Richard Sandiford  writes:
>>
>>> Sam James via Gcc  writes:
>>>> Hi!
>>>>
>>>> This comes up in #gcc on IRC every so often, so finally
>>>> writing an RFC.
>>>>
>>> [...]
>>>> TL;DR: The proposal is:
>>>>
>>>> 1) MAINTAINERS should list a field containing either the gcc.gnu.org
>>>> email in full, or their gcc username (bikeshedding semi-welcome);
>>>>
>>>> 2) It should become a requirement that to be in MAINTAINERS, one must
>>>> possess a Bugzilla account (ideally using their gcc.gnu.org email).
>>>
>>> How about the attached as a compromise?  (gzipped as a poor protection
>>> against scraping.)
>>>
>>
>> Thanks! This would work for me. A note on BZ below.
>>
>>> It adds the gcc.gnu.org/bugzilla account name, without the @gcc.gnu.org,
>>> as a middle column to the Write After Approval section.  I think this
>>> makes it clear that the email specified in the last column should be
>>> used for communication.
>>>
>>> [..]
>>>
>>> If this is OK, I'll need to update check-MAINTAINERS.py.
>>
>> For Bugzilla, there's two issues:
>> 1) If someone uses an alternative (n...@gcc.gnu.org) email on Bugzilla,
>> unless an exception is made (and Jakub indicated he didn't want to add
>> more - there's very few right now), they do not have editbugs and cannot
>> assign bugs to themselves or edit fields, etc.
>>
>> This leads to bugs being open when they don't need to be anymore, etc,
>> and pinskia and I often have to clean that up.
>>
>> People with commit access are usually very happy to switch to
>> @gcc.gnu.org when I let them know it grants powers!
>>
>> 2) CCing someone using a n...@gcc.gnu.org email is a pain, but *if* they
>> have to use a n...@gcc.gnu.org email, it might be OK if they use the
>> email that is listed in MAINTAINERS otherwise. If they use a third email
>> then it becomes a pain though, but your proposal helps if it's just two
>> emails in use.
>>
>> (But I'd still really encourage them to not do that, given the lack of
>> perms.)
>>
>> I care about both but 1) > 2) for me, some others here care a lot about 2)
>> if they're the ones doing triage and bisecting.
>
> Ah, yeah, I agree with all of the above.  By "communication" I meant
> "normal email" -- sorry for the bad choice of words.

Ah, great!

>
> For me, the point of the new middle column is to answer "which gcc.gnu.org
> account should I use in bugzilla PRs?".  But adding "@gcc.gnu.org" to each
> entry might encourage people to use it for normal email too.
>
> After:
>
>   To report problems in GCC, please visit:
>
>     http://gcc.gnu.org/bugs/
>
> how about adding something like:
>
>   If you wish to CC a maintainer in bugzilla, please add @gcc.gnu.org
>   to the account name given in the Write After Approval section below.
>   Please use the email address given in <...> for direct email communication.
>

Sounds good to me -- thank you! This seems like a solid compromise (or
even a better way of doing what I wanted to begin with).

> Richard

thanks,
sam