Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters

2022-11-28 Thread Alex Colomar via Gcc

Hi Joseph,

On 11/14/22 19:13, Joseph Myers wrote:

On Sun, 13 Nov 2022, Alejandro Colomar via Gcc wrote:


SYNOPSIS:

unary-operator:  . identifier


That's not what you mean.  See the standard syntax.


Yup; typo there.



unary-expression:
   [other alternatives]
   unary-operator cast-expression

unary-operator: one of
   & * + - ~ !


-  It is not an lvalue.

-  This means sizeof() and _Lengthof() cannot be applied to them.


sizeof can be applied to non-lvalues.


thinko there.  I fixed it in a subsequent email.




-  This prevents ambiguity with a designator in an initializer-list within
a nested braced-initializer.


No, it doesn't.  See my previous points about syntactic disambiguation
being a separate matter from "one parse would result in a constraint
violation, so choose another parse that doesn't" (necessarily, because the
constraint violation that results could in general be at an arbitrary
distance from the point where a choice of parse has to be made).  Or see
e.g. the disambiguation rule about enum type specifiers: there is an
explicit rule "If an enum type specifier is present, then the longest
possible sequence of tokens that can be interpreted as a specifier
qualifier list is interpreted as part of the enum type specifier." that
ensures that "enum e : long int;" interprets "long int" as the enum type
specifier, rather than "long" as the enum type specifier and "int" as
another type specifier in the sequence of declaration specifiers, even
though the latter parse would result in a constraint violation later.


I get it.  It's only unambiguous if there's lookahead.



Also, requiring unbounded lookahead to determine what kind of construct is
being parsed may be considered questionable for C.  (If you have an
initializer starting .a.b.c.d.e, possibly with array element access as
well, those could all be designators or .a might be a reference to a
parameter of struct or union type and .b.c.d.e a sequence of references to
members within it and disambiguation under your rule would depend on
whether an '=' follows such an unbounded sequence.)


I'm thinking of an idea for this.




-  The type of a .identifier is always an incomplete type.

-  This prevents circular dependencies involving sizeof() or _Lengthof().


We have typeof as well, which can be applied to expressions with
incomplete type.


Yes, but it would not be problematic in the two-pass parsing I have in mind.




-  Shadowing rules apply.

-  This prevents ambiguity.


"Shadowing rules apply" isn't much of a specification.  You need detailed
wording that would be added to 6.2.1 Scopes of identifiers (or equivalent
elsewhere) to make it clear exactly what scopes apply for identifiers
looked up using this construct.


Yeah, I guess.  I'm being easy for this draft.  I'll try to be more 
precise for future revisions.





-
void foo(struct bar { int x; char c[.x] } a, int x);

Explanation:
-  Because of shadowing rules, [.x] refers to the struct member.


I really don't think standardizing VLAs-in-structures would be a good
idea.  Certainly it would be a massive pain to specify meaningful
semantics for them and this outline doesn't even attempt to work through
the consequences of removing the rule that "If an identifier is declared
as having a variably modified type, it shall be an ordinary identifier (as
defined in 6.2.3), have no linkage, and have either block scope or
function prototype scope.".


Maybe.  I didn't have them in mind until Martin mentioned them.  Now 
that he mentioned them, I'd like at least to be careful so that any new 
syntax doesn't do something that impedes adding them in the future, if 
it is ever considered desirable.




The idea that .x as an expression might refer to either a member or a
parameter is also a massive change to the namespace rules, where at
present those are in completely different namespaces and so in any given
context a name only needs looking up as one or the other.

Again, proposals should be *minimal*.


Yes.  I only want to have a rough discussion about what the entire
feature would look like in an ideal future where everything is added.
Otherwise, adding a minimal feature without considering this future
might do something that prevents some part of it from being implemented
due to backwards compatibility.


So I'd like to discuss the whole idea before then going to a minimal 
proposal that will be *much* smaller than this idea that I'm discussing.


I'm happy with the Linux man-pages implementing the whole idea (even if 
it's impossible to implement it in C ever), and letting ISO C / GCC 
implement initially (and possibly ever) only the minimal stuff.




 And even when they are, many issues
may well arise in practice (see the long list of constexpr issues in my
commit message for that C2x feature, for example, which I expect to turn
into multiple NB comments and at least two accompanying documents).


Sure; I expect that.


Cheers,

Alex


Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters

2022-11-28 Thread Alex Colomar via Gcc

Hi Joseph,

On 11/14/22 19:26, Joseph Myers wrote:

On Mon, 14 Nov 2022, Alejandro Colomar via Gcc wrote:


To quote the convenor in WG14 reflector message 18575 (17 Nov
2020) when I asked about its status, "The author asked me not to put those
on the agenda.  He will supply updated versions later.".


Since his email is not in the paper, would you mind forwarding him this
suggestion of mine of renaming it to avoid confusion with string lengths?  Or
maybe point him to the mailing list discussion[1]?

[1]:



I don't have his email address (I don't see any emails from him on the
reflector since I joined it in 2001).


Meh; thanks.  Would you mind mentioning this issue to whoever defends
his document, whenever you talk about it?


Thanks,

Alex







Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters

2022-11-28 Thread Alex Colomar via Gcc

Hi Martin,

On 11/13/22 15:58, Martin Uecker wrote:

Am Sonntag, den 13.11.2022, 15:02 +0100 schrieb Alejandro Colomar:


On 11/13/22 14:33, Alejandro Colomar wrote:

Hi Martin,

On 11/13/22 14:19, Alejandro Colomar wrote:

But there are not only syntactical problems, because
also the type of the parameter might become relevant
and then you can get circular dependencies:

void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);


This seems to be a difficult stone in the road.


But note that GNU forward declarations solve this nicely.


Okay, so GNU declarations basically work by duplicating (some of) the 
declarations.


How about the compiler parsing the parameter list twice?  One pass for
getting the declarations and their types, but not resolving any
sizeof(), _Lengthof(), or typeof() when they contain .identifier (or
expressions containing it); in those cases, leave the type incomplete, to
be completed in the second pass.  As if the programmer had specified
the forward declarations, but it's the compiler that gets them
automatically.


I guess asking the compiler to do two passes on the param list isn't as
bad as asking it to do unbounded lookahead.  In this case it's bounded:  look
ahead till the end of the param list; get as much info as possible, and
then do it again to complete.  Anything not yet clear after two passes
is not valid.


So, for

void foo(char (*a)[sizeof(*.b)], char (*b)[sizeof(*.a)]);

in the first pass, the compiler would read:

char (*a)[sizeof(*.b)];  // sizeof .identifier; incomplete type; continue parsing
char (*b)[sizeof(*.a)];  // sizeof .identifier; incomplete type; continue parsing


At the end of the first pass, the compiler only knows:

char (*a)[];
char (*b)[];

At the second pass, when evaluating sizeof(), since the types of the
operands are still incomplete, it can't be evaluated, and therefore
there's an error at the first sizeof(*.b): *.b has incomplete type.


---

Let's show a distinct case:

void foo(char (*a)[sizeof(*.b)], char (*b)[10]);

After the first pass, the compiler would know:

char (*a)[];
char (*b)[10];

At the second pass, sizeof(*.b) would be evaluated undoubtedly to 
sizeof(char[10]), and the parameter list would then be fine.


Does this 2-pass parsing make sense to you?  Did I miss any details?
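
The two-pass idea described above can be modelled with a small toy, sketched here under loud assumptions: this is not compiler code, the names toy_param and toy_two_pass are made up for the illustration, each parameter is reduced to "an array bound that is either a constant or sizeof(*.x) of one other parameter", and the second pass is assumed to use only what the first pass already knew.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one parameter (hypothetical, for illustration only). */
struct toy_param {
    int    dep;             /* index of the parameter named in sizeof(*.x), or -1 */
    size_t bound;           /* array bound; given if dep == -1, else computed */
    int    known_in_pass1;  /* was the type complete after the first pass? */
};

/* Returns 0 if every type is complete after two passes, -1 otherwise. */
int toy_two_pass(struct toy_param *p, size_t n)
{
    /* Pass 1: a type is complete only if it needs no .identifier. */
    for (size_t i = 0; i < n; i++)
        p[i].known_in_pass1 = (p[i].dep < 0);

    /* Pass 2: resolve each sizeof(*.x) from what pass 1 already knew. */
    for (size_t i = 0; i < n; i++) {
        if (p[i].dep < 0)
            continue;
        assert((size_t) p[i].dep < n);
        if (!p[p[i].dep].known_in_pass1)
            return -1;  /* the operand still has incomplete type */
        p[i].bound = p[p[i].dep].bound;  /* sizeof(char[N]) == N */
    }
    return 0;
}
```

In this model, the mutually dependent declaration (a refers to b, b refers to a) is rejected, while the case where b has the constant bound 10 resolves a's bound to 10.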







I am not sure what would be the best way to fix it. One
could specify that parameters referred to by
the .identifier syntax must be of some integer type and
that the sub-expression .identifier is always
converted to a 'size_t'.


That makes sense, but then overnight some quite useful thing came to my mind
that would not be possible with this limitation:




char *
stpecpy(char dst[.end - .dst], char *src, char end[1])


Heh, I got an off-by-one error.  It should be dst[.end - .dst + 1], of course,
and then the result of the whole expression would be 0, which is fine as size_t.

So, never mind.


.end and .dst would have pointer size though.


{
  for (/* void */; dst <= end; dst++) {
    *dst = *src++;
    if (*dst == '\0')
      return dst;
  }
  /* Truncation detected */
  *end = '\0';

#if !defined(NDEBUG)
  /* Consume the rest of the input string. */
  while (*src++) {};
#endif

  return end + 1;
}
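
For reference, the stpecpy() sketch quoted above can be written in today's C by dropping the hypothetical [.end - .dst + 1] bound.  The following is a minimal version under that assumption: it omits the debug-only loop that consumes the rest of the input, `end` is taken to point to the last byte of the buffer, and the assert is an addition for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Copy src into [dst, end]; return a pointer to the terminating NUL,
 * or end + 1 if the string was truncated. */
char *stpecpy(char *dst, const char *src, char *end)
{
    assert(dst != NULL && src != NULL && end != NULL);

    /* If dst == end + 1 (a previous truncation), the loop is skipped. */
    for (/* void */; dst <= end; dst++) {
        *dst = *src++;
        if (*dst == '\0')
            return dst;
    }
    /* Truncation detected */
    *end = '\0';

    return end + 1;
}
```

Calls can be chained by passing the previous return value as dst; once a call returns end + 1, later calls in the chain keep returning end + 1 and leave the buffer terminated.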

And I forgot to say it:  Default promotions rank high (probably the highest) in
my list of most hated features^Wbugs in C.


If you replaced them with explicit conversion you then have
to add by hand all the time, I am pretty sure most people
would hate this more. (and it could also hide bugs)


I wouldn't convert it to size_t, but
rather follow normal promotion rules.


The point of making it size_t is that you then
do not need to know the type of the parameter to make
sense of the expression. If the type matters, then you get
mutual dependencies as in the example above.


Since you can use anything between INTMAX_MIN and UINTMAX_MAX for accessing an
array (which took me some time to understand), I'd also allow the same here. So,
the type of the expression between [] could perfectly be signed or unsigned.

So, you could use size_t for very high indices, or e.g. ptrdiff_t if you want to
allow negative numbers.  In the function above, since dst can be a pointer to
one-past-the-end (it represents a previous truncation; that's why the test
dst<=end), forcing a size_t conversion would disallow that syntax.


Yes, this then does not work.
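
As a small illustration of the signed-index point (a sketch; elem_relative is a name made up for this example): a subscript may be negative, as long as the resulting element still lies inside the array the pointer points into.

```c
#include <assert.h>
#include <stddef.h>

/* Read the element `off` positions away from `mid`; off may be negative.
 * The caller must guarantee that mid + off stays inside the same array. */
int elem_relative(const int *mid, ptrdiff_t off)
{
    assert(mid != NULL);
    return mid[off];
}
```

With int a[5] = {10, 20, 30, 40, 50}, the call elem_relative(a + 2, -2) reads a[0].  Forcing the index to size_t would not express this.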


Cheers,

Alex



Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters

2022-11-29 Thread Alex Colomar via Gcc

Hi Martin, Joseph,

On 11/29/22 18:00, Martin Uecker wrote:

Am Dienstag, dem 29.11.2022 um 16:53 + schrieb Jonathan Wakely:

On Tue, 29 Nov 2022 at 16:49, Joseph Myers wrote:


On Tue, 29 Nov 2022, Michael Matz via Gcc wrote:


like.  But I'm generally doubtful of this whole feature within C itself.
It serves a purpose in documentation, so in man-pages it seems fine enough
(but then still could use a different punctuator to not be confusable with
C syntax).


In man-pages you don't need to invent syntax at all.  You can write

int f(char buf[n], int n);

and in the context of a man page it will be clear to readers what is
meant,


Considerably more clear than new invented syntax IMHO.


True, but I think it would be a mistake to use code in
man pages which then does not work as expected (or even
is subtly wrong) in actual code.


Exactly.  Using your proposed syntax (which was my first draft) would 
have probably been the source of hidden bugs, since it might work (read 
compile) in some cases, but with wrong results.


I prefer this hypothetical syntax, which at most will cause compile errors.

Cheers,

Alex



Martin







Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters

2022-11-29 Thread Alex Colomar via Gcc

Hi Martin and Michael,

On 11/29/22 17:58, Uecker, Martin wrote:


Hi,

Am Dienstag, dem 29.11.2022 um 15:44 + schrieb Michael Matz:

Hey,

On Tue, 29 Nov 2022, Uecker, Martin wrote:


It does not require any changes on how arrays are represented.

As part of VM-types the size becomes part of the type and this
can be used for static or dynamic analysis, e.g. you can
- today - get a run-time bounds violation with the sanitizer:

void foo(int n, char (*buf)[n])
{
   (*buf)[n] = 1;
}


This can already be statically analyzed as being wrong, no need for
dynamic checking.


In this toy example, but in general it can be checked
only at run-time by using the information about the
dynamic bound.
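
To make the point about VM types concrete: with a pointer-to-array parameter the bound is part of the pointed-to type, so sizeof(*buf) is evaluated at run time and yields n.  A minimal sketch, assuming a compiler with VLA support; the function names bound_of and fill are made up here.

```c
#include <assert.h>
#include <stddef.h>

/* The bound n is carried by the type char[n] that buf points to. */
size_t bound_of(int n, char (*buf)[n])
{
    assert(n > 0);
    return sizeof(*buf);  /* evaluated at run time: n * sizeof(char) */
}

/* Fill exactly the n elements promised by the parameter's type. */
void fill(int n, char (*buf)[n], char c)
{
    assert(n > 0);
    for (size_t i = 0; i < sizeof(*buf); i++)
        (*buf)[i] = c;
}
```

A caller with char a[12] would pass &a together with 12; that dynamic bound is the information a sanitizer or static analyzer can check accesses against.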


What I mean is the checking of the claimed contract.
Above you assure for the function body that buf has n elements.


Yes.


This is also a pre-condition for calling this function and
_that_ can't be checked in all  cases because:

   void foo (int n, char (*buf)[n]) { (*buf)[n-1] = 1; }
   void callfoo(char * buf) { foo(10, buf); }

buf doesn't have a known size.


This does not type check.


  And a pre-condition that can't be checked
is no pre-condition at all, as only then it can become a guarantee
for the body.


The example above should look like:

void foo(int n, char (*buf)[n]);

void callfoo(char (*buf)[12]) { foo(10, buf); }

This could be checked by an UB sanitizer as calling
the function with an argument of incompatible type
is UB (but we currently do not do this)


If you think about

void foo(int n, char buf[n]);

void callfoo(char *buf) { foo(10, buf); }


Then you are right that this can not be checked at this
time. But this  does not mean it is useless because we
still can detect inconsistencies in other cases:

void callfoo(int n, char buf[n - 1]) { foo(n, buf); }

We could also - in the future - have a warning about all
situations where bound information is lost, making sure
that preconditions are always checked for people who
consistently use these annotations.



The compiler has no choice but to trust the user that the
pre-condition for calling foo is fulfilled.  I can see how
being able to just check half of the contract might be
useful, but if it doesn't give full checking then
any proposal for syntax should be even more obviously
orthogonal than the current one.


Your argument is not clear to me.



For

void foo(int n, char buf[n]);

it semantically has no meaning according to the C standard,
but a compiler could still warn.


Hmm?  Warn about what in this decl?


I meant, we could warn about something like this
because it is likely an error:

void foo(int n, char buf[n])
{
   buf[n] = 1;
}



It could also warn for

void foo(int n, char buf[n]);

int main()
{
     char buf[9];
     foo(buf);
}


You mean if you write 'foo(10,buf)' (the above, as is, is simply a
syntax error for non-matching number of args).  Or was it a mispaste
and you mean  the one from the godbolt link, i.e.:


I meant:

char buf[9];
foo(10, buf);

In fact, it turns out we warn already:

https://godbolt.org/z/qcvsv87Ev


void foo(char buf[10]){ buf[9] = 1; }
int main()
{
     char buf[9];
     foo(buf);
}

?  If so, yeah, we warn already.  I don't think this is an argument
for (or against) introducing new syntax.
...


It is an argument for having this syntax, because we could
extend such warning (those we already have and those we
could still add) to more common cases such as

void foo(char buf[.n], size_t n);

In my opinion, this would be a huge step forward for
safety of C programs as we already have a lot of
infrastructure for checking bounds.

Of course, the existing GNU extension would achieve
the same thing:

void foo(size_t n; char buf[n], size_t n);




But in general: This feature is useful not only for documentation
but also for analysis.


Which feature are we talking about now?  The ones you used all work today,
as you demonstrated.  I thought we would be talking about that ".whatever"
syntax to refer to arbitrary parameters, even following ones?  I think a
disrupting syntax change like that should have a higher bar than "in some
cases, depending on circumstance, we might even be able to warn".


We can use our existing features and then apply them
to cases where the bound is specified after the pointer,
which is more common in practice.


Yep; basically adding some (not perfect, but some) static analysis to 
many libc function calls.


Also, considering the issues with sizeof and arrays, and the lack of a 
_Nitems() [proposed as _Lengthof()] operator, there's a lot of manual 
work in array (read pointer) parameters.


However, a hypothetical _Nitems() operator could make use of this 
syntactic sugar, and be more useful than just providing static analysis. 
 Using _Nitems() on a VMT (including pointer parameters) could be 
specified to return the number of elements, so I foresee code like:



void foo(int arr[nmemb], size_t nmemb)
{
// _Nitems() evaluates to nmemb
for (size_t i = 0; i < _Nitems(

Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters

2022-11-29 Thread Alex Colomar via Gcc

On 11/29/22 18:19, Alex Colomar wrote:

Hi Martin, Joseph,

On 11/29/22 18:00, Martin Uecker wrote:

Am Dienstag, dem 29.11.2022 um 16:53 + schrieb Jonathan Wakely:

On Tue, 29 Nov 2022 at 16:49, Joseph Myers wrote:


On Tue, 29 Nov 2022, Michael Matz via Gcc wrote:


like.  But I'm generally doubtful of this whole feature within C itself.
It serves a purpose in documentation, so in man-pages it seems fine enough
(but then still could use a different punctuator to not be confusable with
C syntax).


In man-pages you don't need to invent syntax at all.  You can write

int f(char buf[n], int n);

and in the context of a man page it will be clear to readers what is
meant,


Considerably more clear than new invented syntax IMHO.


True, but I think it would be a mistake to use code in
man pages which then does not work as expected (or even
is subtly wrong) in actual code.


Exactly.  Using your


s/your/Joseph's/

proposed syntax (which was my first draft) would 
have probably been the source of hidden bugs, since it might work (read 
compile) in some cases, but with wrong results.


I prefer this hypothetical syntax, which at most will cause compile errors.

Cheers,

Alex



Martin









Re: struct sockaddr_storage

2023-01-24 Thread Alex Colomar via Gcc

Hi Richard,

On 1/23/23 17:28, Richard Biener wrote:

The common initial sequence of structures is only allowed if the structures form
part of a union (which is why to avoid UB you need a union; and still, you need
to make sure you don't invoke UB in a different way).



GCC only allows it if the union is visible as part of the access, that
is, it allows it
under its rule of allowing punning for union accesses and not specially because
of the initial sequence rule.  So

  u.a.x = 1;
  ... = u.b.x;

is allowed but

   struct A *p = &u.a;
   p->x = 1;
   struct B *q = &u.b;
   ... = q->x;

is UB with GCC if struct A and B are the union members with a common
initial sequence.


Yep.  That's why we need a union that is defined in libc, so that it can 
be used both in and out of glibc.  sockaddr_storage can be reconverted 
to that purpose.


Cheers,

Alex



Re: struct sockaddr_storage

2023-01-24 Thread Alex Colomar via Gcc

Hi Jakub,

On 1/23/23 17:37, Jakub Jelinek wrote:

Please see transparent_union documentation in GCC documentation.
E.g. 
https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Common-Type-Attributes.html#index-transparent_005funion-type-attribute
transparent_union doesn't change anything regarding type punning, it is
solely about function arguments, how arguments of that type are passed
(as first union member) and that no casts to the union are needed from
the member types.


Yep, when I wrote that I didn't fully understand it.  Now I got it. 
I'll prepare some better suggestion about a fix.


Thanks.


And, with LTO TU boundaries are lost, inlining or other IPA optimizations
(including say modref) work in between TUs.


Yeah, that's why we need a fix.  Compilers will only get better at 
optimizing, so UB will sooner or later be a problem.


Cheers,

Alex



Re: struct sockaddr_storage

2023-01-24 Thread Alex Colomar via Gcc

Hi Rich,

On 1/24/23 12:16, Rich Felker wrote:

On Fri, Jan 20, 2023 at 12:06:50PM +0200, Stefan Puiu via Libc-alpha wrote:

Hi Alex,

On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar
 wrote:


Hi!

I just received a report about struct sockaddr_storage in the man pages.  It
reminded me of some concern I've always had about it: it doesn't seem to be a
usable type.

It has some alignment promises that make it "just work" most of the time, but
it's still a UB mine, according to ISO C.

According to strict aliasing rules, if you declare a variable of type 'struct
sockaddr_storage', that's what you get, and trying to access it later as some
other sockaddr_* is simply not legal.  The compiler may assume those accesses
can't happen, and optimize as it pleases.


Can you detail the "is not legal" part? How about the APIs like
connect() etc that use pointers to struct sockaddr, where the
underlying type is different, why would that be legal while using
sockaddr_storage isn't?


Because they're specified to take different types. In C, any struct
pointer type can legally point to any other struct type. You just
can't dereference through it with the wrong type.


Yep.  Which basically means that users need to treat sockaddr structures 
as black boxes.  Otherwise, there's going to be undefined behavior at 
some point.  Because of course, you can't know the right type before 
reading the first field, which is already UB.
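
One technique that stays within the aliasing rules, shown here only as a sketch and not as something proposed in this thread, is to copy the leading field out with memcpy(3) instead of dereferencing through a mismatched struct type.  The toy_addr_* structures and toy_family() are hypothetical stand-ins, not the real sockaddr types.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical address structures sharing a leading family field. */
struct toy_addr_a { unsigned short family; unsigned short port; };
struct toy_addr_b { unsigned short family; char path[32]; };

/* Read the family by copying bytes; no lvalue of the wrong struct type
 * is ever used to access the object. */
unsigned short toy_family(const void *addr)
{
    unsigned short family;

    assert(addr != NULL);
    memcpy(&family, addr, sizeof(family));  /* first member is at offset 0 */
    return family;
}
```

Once the family is known, the caller can pick the matching structure type for any further access.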



How the
implementation of connect etc. handle this is an implementation
detail. You're allowed to pass pointers to struct sockaddr_in, etc. to
connect etc. simply because the specification says you are.


While the implementation has some more freedom regarding UB, in this 
case it's waiting for a compiler optimization to break this code, so I'd 
go the safe way and use standard C techniques in libc so that there are 
no long-term UB issues.


As a side effect, user code that currently invokes UB could be changed 
to have defined behavior.




In any case, sockaddr_storage is a legacy thing designed by folks who
didn't understand the rules of the C language. It should never appear
in modern code except perhaps with sizeof for allocating buffers. There
is no action that needs to be taken here except documenting that it
should not be used (cannot be used meaningfully without UB).


I agree with you on this.  sockaddr_storage has been broken since day 0. 
 However, for designing a solution for libc using unions, it could be 
useful.




Rich


Cheers,

Alex



Re: struct sockaddr_storage

2023-01-24 Thread Alex Colomar via Gcc

Hi,

After reading more about transparent_union, here's my idea of a fix for 
the API.  old_api() is an example for the libc functions that accept a 
`struct sockaddr *`, and user_code() is an example for user code 
functions that handle sockaddr structures.  The interface would be the 
same as it is currently, but the implementation inside libc would change 
to use a union.  In user code, uses of sockaddr_storage would be made 
safe with these changes, I believe, and new code would be simpler, since 
it wouldn't need casts.




struct sockaddr_storage {
    union {
        struct {
            sa_family_t  ss_family;
        };
        struct sockaddr_in   sin;
        struct sockaddr_in6  sin6;
        struct sockaddr_un   sun;
        // ...
    };
};


union [[gnu::transparent_union]] sockaddr_ptr {
    struct sockaddr_storage  *ss;
    struct sockaddr          *sa;
};


void old_api(union sockaddr_ptr sa);


void old_api(union sockaddr_ptr sa)
{
    struct sockaddr_storage  *ss = sa.ss;

    // Here libc uses the union, so it doesn't invoke UB.
    ss->sun.sun_family = AF_UNIX;
    //...
}


void user_code(void)
{
    struct sockaddr_storage  ss;  // object definition

    // ...

    old_api(&ss);  // The transparent_union makes casts unnecessary.

    switch (ss.ss_family) {
    // This is safe too,
    // thanks to common initial sequence within a union.
    }
}


This would in fact deprecate plain `struct sockaddr`, as Bastien suggested.


Cheers,

Alex




Re: Missed warning (-Wuse-after-free)

2023-02-23 Thread Alex Colomar via Gcc

Hi Martin,

On 2/17/23 14:48, Martin Uecker wrote:

This new wording doesn't even allow one to use memcmp(3);
just reading the pointer value, however you do it, is UB.


memcmp would not use the pointer value but work
on the representation bytes and is still allowed.


Hmm, interesting.  It's rather unspecified behavior.  Still 
unpredictable: (memcmp(&p, &p, sizeof(p)) == 0) might evaluate to true or 
false randomly; the compiler may compile out the call to memcmp(3), 
since it knows it won't produce any observable behavior.




Cheers!

Alex

--

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5





Re: Missed warning (-Wuse-after-free)

2023-02-23 Thread Alex Colomar via Gcc

Hi Martin,

On 2/23/23 20:57, Martin Uecker wrote:

Am Donnerstag, dem 23.02.2023 um 20:23 +0100 schrieb Alex Colomar:

Hi Martin,

On 2/17/23 14:48, Martin Uecker wrote:

This new wording doesn't even allow one to use memcmp(3);
just reading the pointer value, however you do it, is UB.


memcmp would not use the pointer value but work
on the representation bytes and is still allowed.


Hmm, interesting.  It's rather unspecified behavior. Still
unpredictable: (memcmp(&p, &p, sizeof(p)) == 0) might evaluate to true or
false randomly; the compiler may compile out the call to memcmp(3),
since it knows it won't produce any observable behavior.




No, I think several things get mixed up here.

The representation of a pointer that becomes invalid
does not change.

So (0 == memcmp(&p, &p, sizeof(p))) always
evaluates to true.

Also in general, an unspecified value is simply unspecified
but does not change anymore.

Reading an uninitialized value of automatic storage whose
address was not taken is undefined behavior, so everything
is possible afterwards.

An uninitialized variable whose address was taken has a
representation which can represent an unspecified value
or a no-value (trap) representation. Reading the
representation itself is always ok and gives consistent
results. Reading the variable can be undefined behavior
iff it is a trap representation, otherwise you get
the unspecified value which is stored there.

At least this is my reading of the C standard. Compilers
are not fully conformant.
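
The distinction drawn above between a pointer's value and its representation bytes can be sketched like this.  The function name is made up for the example, and the claim that the comparison succeeds rests on the reading of the standard given above, not on a guarantee from any particular compiler.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the bytes of p are unchanged by free(p), 0 if they
 * differ, and -1 if the allocation failed. */
int repr_same_after_free(size_t size)
{
    unsigned char  before[sizeof(char *)];
    char           *p;

    assert(size > 0);

    p = malloc(size);
    if (p == NULL)
        return -1;

    memcpy(before, &p, sizeof(p));  /* snapshot the representation bytes */
    free(p);                        /* the value of p is now invalid */

    /* Only bytes are compared; p is never read as a pointer value. */
    return memcmp(before, &p, sizeof(p)) == 0;
}
```

Note that the code never evaluates p itself after free(3); it only inspects the object that holds it through unsigned char accesses.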


Does all this imply that the following is well defined behavior (and 
shall print what one would expect)?


  free(p);

  (void) &p;  // take the address
  // or maybe we should (void) memcmp(&p, &p, sizeof(p)); ?

  printf("%p\n", p);  // we took previously its address,
  // so now it has to hold consistently
  // the previous value


This feels weird.  And a bit of a Schroedinger's pointer.  I'm not 
entirely convinced, but might be.


Cheers,

Alex




Martin









--

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5





Re: Missed warning (-Wuse-after-free)

2023-02-23 Thread Alex Colomar via Gcc

Hi Serge, Martin,

On 2/24/23 02:21, Serge E. Hallyn wrote:

Does all this imply that the following is well defined behavior (and shall
print what one would expect)?

   free(p);

   (void) &p;  // take the address
   // or maybe we should (void) memcmp(&p, &p, sizeof(p)); ?

   printf("%p\n", p);  // we took previously its address,
   // so now it has to hold consistently
   // the previous value


This feels weird.  And a bit of a Schroedinger's pointer.  I'm not entirely
convinced, but might be.


Again, p is just an n byte variable which happens to have (one hopes)
pointed at a previously malloc'd address.

And I'd argue that pre-C11, this was not confusing, and would not have
felt weird to you.

But I am most grateful to you for having brought this to my attention.
I may not agree with it and not like it, but it's right there in the
spec, so time for me to adjust :)


I'll try to show why this feels weird to me (even in C89):


alx@dell7760:~/tmp$ cat pointers.c
#include <stdio.h>
#include <stdlib.h>


int
main(void)
{
    char  *p, *q;

    p = malloc(42);
    if (p == NULL)
        exit(1);

    q = realloc(p, 42);
    if (q == NULL)
        exit(1);

    (void) &p;  // If we remove this, we get -Wuse-after-free

    printf("(%p == %p) = %i\n", p, q, (p == q));
}
alx@dell7760:~/tmp$ cc -Wall -Wextra pointers.c  -Wuse-after-free=3
alx@dell7760:~/tmp$ ./a.out
(0x5642cd9022a0 == 0x5642cd9022a0) = 1


These pointers point to different objects (actually, one of them doesn't 
even point to an object anymore), so they can't compare equal, according 
to both:






(I believe C89 already had the concept of lifetime well defined as it is 
now, so the object had finished its lifetime after realloc(3)).


How can we justify that true, if the pointers don't point to the same
object?  And how can we justify a hypothetical false (which compilers
don't implement), if compilers will really just read the value?  To
implement this as well defined behavior, it could result in no other
than false, and it would require heavy overhead for the compilers to
detect that the seemingly-equal values are indeed different, don't you
think?  The easiest solution is for the standard to just outlaw
this, IMO.


Maybe it could make an exception for printing, that is, reading a pointer
is not a problem in itself, as long as you don't compare it, but I'm not
such an expert about this.


Cheers,

Alex



-serge


--

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


