AS UN TO

2023-02-06 Thread Manuel Aarón Rojas Salinas via Gcc


Re: [PATCH] sockaddr.3type: BUGS: Document that libc should be fixed using a union

2023-02-06 Thread Rich Felker
On Mon, Feb 06, 2023 at 02:02:23PM +0800, Xi Ruoyao wrote:
> On Sun, 2023-02-05 at 16:31 +0100, Alejandro Colomar via Libc-alpha wrote:
> 
> > The only correct way to use  different  types  in  an  API  is
> > through  a  union.
> 
> I don't think this statement is true (in general).  Technically we can
> write something like this:
> 
> struct sockaddr { ... };
> struct sockaddr_in { ... };
> struct sockaddr_in6 { ... };
> 
> int bind(int fd, const struct sockaddr *addr, socklen_t addrlen)
> {
> if (addrlen < sizeof(struct sockaddr) {
> errno = EINVAL;
> return -1;
> }
> 
> /* cannot use "addr->sa_family" directly: it will be an UB */
> sa_family_t sa_family;
> memcpy(&sa_family, addr, sizeof(sa_family));
> 
> switch (sa_family) {
> case AF_INET:
> return _do_bind_in(fd, (struct sockaddr_in *)addr, addrlen);
> case AF_INET6:
> return _do_bind_in6(fd, (struct sockaddr_in6 *)addr, addrlen);
> /* more cases follow here */
> default:
> errno = EINVAL;
> return -1;
> }
> }
> }
> 
> In this way we can use sockaddr_{in,in6,...} for bind() safely, as long
> as we can distinguish the "real" type of addr using the leading byte
> sequence (and the caller uses it carefully).
> 
> But obviously sockaddr_storage can't be distinguished here, so casting a
> struct sockaddr_stroage * to struct sockaddr * and passing it to bind()
> will still be wrong (unless we make sockaddr_storage an union or add
> [[gnu::may_alias]]).

If you wanted to make this work, you can just memcpy sockaddr_storage
to a local object of the right declared type to access it. But this is
only relevant for a userspace implementation of bind() rather than one
that just marshalls it across some syscall boundary to a kernel.

Rich


Re: [PATCH] sockaddr.3type: BUGS: Document that libc should be fixed using a union

2023-02-06 Thread Alejandro Colomar via Gcc

Hi Xi,

On 2/6/23 07:02, Xi Ruoyao wrote:

On Sun, 2023-02-05 at 16:31 +0100, Alejandro Colomar via Libc-alpha wrote:


The only correct way to use  different  types  in  an  API  is
through  a  union.


I don't think this statement is true (in general).  Technically we can
write something like this:

struct sockaddr { ... };
struct sockaddr_in { ... };
struct sockaddr_in6 { ... };

int bind(int fd, const struct sockaddr *addr, socklen_t addrlen)
{
 if (addrlen < sizeof(struct sockaddr) {
 errno = EINVAL;
 return -1;
 }

 /* cannot use "addr->sa_family" directly: it will be an UB */
 sa_family_t sa_family;
 memcpy(&sa_family, addr, sizeof(sa_family));

 switch (sa_family) {
 case AF_INET:
 return _do_bind_in(fd, (struct sockaddr_in *)addr, addrlen);
 case AF_INET6:
 return _do_bind_in6(fd, (struct sockaddr_in6 *)addr, addrlen);
 /* more cases follow here */
 default:
 errno = EINVAL;
 return -1;
 }
 }
}

In this way we can use sockaddr_{in,in6,...} for bind() safely, as long
as we can distinguish the "real" type of addr using the leading byte
sequence (and the caller uses it carefully).


True; I hadn't thought of memcpy()ing the first member of the struct.  That's 
valid; overcomplicated but valid.




But obviously sockaddr_storage can't be distinguished here, so casting a
struct sockaddr_stroage * to struct sockaddr * and passing it to bind()
will still be wrong (unless we make sockaddr_storage an union or add
[[gnu::may_alias]]).


But as you say, it still leaves us with a question.  What should one declare for 
passing to the standard APIs?  It can only be a union.  So we can either tell 
users to each create their own union, or we can make sockaddr_storage be a 
union.  The latter slightly violates POSIX due to namespaces as Rich noted, but 
that's a minor violation, and POSIX can be changed to accomodate for that.


If we change sockaddr_storage to be a union, we have two benefits:

-  Old code which uses sockaddr_storage is made conforming (non-UB) without 
modifying the source.

-  Users can inspect the structures.

If we don't, and deprecate sockaddr_storage, we should tell users to declare 
their own unions _and_ treat all these structures as black boxes which can only 
be read by memcpy()ing their contents.


Which of the two do we want?  I think fixing sockaddr_storage is simpler, and 
allows existing practice of reading these structures.  The other one just makes 
(or rather acknowledges, since it has always been like that) a lot of existing 
code invoke UB, and acknowledges that you can't safely use these structures 
without a lot of workarounding.


Cheers,

Alex

--

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH] sockaddr.3type: BUGS: Document that libc should be fixed using a union

2023-02-06 Thread Rich Felker
On Mon, Feb 06, 2023 at 12:55:12PM +0100, Alejandro Colomar wrote:
> Hi Xi,
> 
> On 2/6/23 07:02, Xi Ruoyao wrote:
> >On Sun, 2023-02-05 at 16:31 +0100, Alejandro Colomar via Libc-alpha wrote:
> >
> >>The only correct way to use  different  types  in  an  API  is
> >>through  a  union.
> >
> >I don't think this statement is true (in general).  Technically we can
> >write something like this:
> >
> >struct sockaddr { ... };
> >struct sockaddr_in { ... };
> >struct sockaddr_in6 { ... };
> >
> >int bind(int fd, const struct sockaddr *addr, socklen_t addrlen)
> >{
> > if (addrlen < sizeof(struct sockaddr) {
> > errno = EINVAL;
> > return -1;
> > }
> >
> > /* cannot use "addr->sa_family" directly: it will be an UB */
> > sa_family_t sa_family;
> > memcpy(&sa_family, addr, sizeof(sa_family));
> >
> > switch (sa_family) {
> > case AF_INET:
> > return _do_bind_in(fd, (struct sockaddr_in *)addr, addrlen);
> > case AF_INET6:
> > return _do_bind_in6(fd, (struct sockaddr_in6 *)addr, addrlen);
> > /* more cases follow here */
> > default:
> > errno = EINVAL;
> > return -1;
> > }
> > }
> >}
> >
> >In this way we can use sockaddr_{in,in6,...} for bind() safely, as long
> >as we can distinguish the "real" type of addr using the leading byte
> >sequence (and the caller uses it carefully).
> 
> True; I hadn't thought of memcpy()ing the first member of the
> struct.  That's valid; overcomplicated but valid.
> 
> >
> >But obviously sockaddr_storage can't be distinguished here, so casting a
> >struct sockaddr_stroage * to struct sockaddr * and passing it to bind()
> >will still be wrong (unless we make sockaddr_storage an union or add
> >[[gnu::may_alias]]).
> 
> But as you say, it still leaves us with a question.  What should one
> declare for passing to the standard APIs?  It can only be a union.
> So we can either tell users to each create their own union, or we
> can make sockaddr_storage be a union.  The latter slightly violates
> POSIX due to namespaces as Rich noted, but that's a minor violation,
> and POSIX can be changed to accomodate for that.

There is absolutely not any need to declare the union for application
code calling the socket APIs. You declare whatever type you will be
using. For binding or connecting a unix socket, sockaddr_un. For IPv6,
sockaddr_in6. Etc. Then you cast the pointer to struct sockaddr * and
pass it to the function.

But normally you don't use declared-type objects for this anyway. You
use the struct sockaddr * obtained from getaddrinfo as an abstract
pointer and never dereference it at all.

> If we change sockaddr_storage to be a union, we have two benefits:
> 
> -  Old code which uses sockaddr_storage is made conforming (non-UB)
> without modifying the source.

It's not conforming. It's just no longer UB as a result of unspecified
implementation choices you'd be making.

> -  Users can inspect the structures.
> 
> If we don't, and deprecate sockaddr_storage, we should tell users to
> declare their own unions _and_ treat all these structures as black
> boxes which can only be read by memcpy()ing their contents.

No, there is no need to tell users to do any such thing. No action is
needed here except for folks to stop using sockaddr_storage.

Rich


Re: [PATCH] sockaddr.3type: BUGS: Document that libc should be fixed using a union

2023-02-06 Thread Alejandro Colomar via Gcc

Hi Rich,

On 2/6/23 14:38, Rich Felker wrote:

There is absolutely not any need to declare the union for application
code calling the socket APIs. You declare whatever type you will be
using. For binding or connecting a unix socket, sockaddr_un. For IPv6,
sockaddr_in6. Etc. Then you cast the pointer to struct sockaddr * and
pass it to the function.


Except that you may be using generic code that may use either of AF_UNIX, 
AF_INET, and AF_INET6.  A web server may very well use all the three.




But normally you don't use declared-type objects for this anyway. You
use the struct sockaddr * obtained from getaddrinfo as an abstract
pointer and never dereference it at all.


That's right.  Most of the time, we should be using getaddrinfo(3), which 
already provides the storage.  I don't know for sure if there are any cases that 
can't be rewritten to work that way.


However, there are some APIs that require you to allocate an object.  For 
example recvfrom(2).  How would you recommend using recvfrom(2), or is it some 
API to avoid?  Example of usage: 
https://man7.org/tlpi/code/online/dist/sockets/id_echo_sv.c.html>.



Cheers,

Alex

--

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


OpenPGP_signature
Description: OpenPGP digital signature


[no subject]

2023-02-06 Thread siraj mohammad via Gcc
Aw6394373


Aw6394373

2023-02-06 Thread siraj mohammad via Gcc
Aw6394373


Re: [PATCH] sockaddr.3type: BUGS: Document that libc should be fixed using a union

2023-02-06 Thread Rich Felker
On Mon, Feb 06, 2023 at 03:11:10PM +0100, Alejandro Colomar wrote:
> Hi Rich,
> 
> On 2/6/23 14:38, Rich Felker wrote:
> >There is absolutely not any need to declare the union for application
> >code calling the socket APIs. You declare whatever type you will be
> >using. For binding or connecting a unix socket, sockaddr_un. For IPv6,
> >sockaddr_in6. Etc. Then you cast the pointer to struct sockaddr * and
> >pass it to the function.
> 
> Except that you may be using generic code that may use either of
> AF_UNIX, AF_INET, and AF_INET6.  A web server may very well use all
> the three.

If you have generic code, the generic code is not what's creating the
object. It got the object from the calling code or from some callback
that allocated it, in which case it doesn't have to care about any of
this.

> >But normally you don't use declared-type objects for this anyway. You
> >use the struct sockaddr * obtained from getaddrinfo as an abstract
> >pointer and never dereference it at all.
> 
> That's right.  Most of the time, we should be using getaddrinfo(3),
> which already provides the storage.  I don't know for sure if there
> are any cases that can't be rewritten to work that way.
> 
> However, there are some APIs that require you to allocate an object.
> For example recvfrom(2).  How would you recommend using recvfrom(2),
> or is it some API to avoid?  Example of usage:
> https://man7.org/tlpi/code/online/dist/sockets/id_echo_sv.c.html>.

If using allocated storage, there's nothing special you have to do.
But if you want to avoid malloc and use a declared object, you have a
few options:

If it's your socket and you know what address family it's associated
with, you just pass an object of that type. recvfrom will always
produce output of the same AF.

If you're in a situation like having been invoked from inetd, you can
use getsockname into a sockaddr_storage object or union or whatever,
then read out the family field (with memcpy if needed). Then you know
the AF to use. Or, you can just recvfrom into a sockaddr_storage
object, determine the family at that time, then memcpy the the
appropriate object type. Alternatively, for the common case where you
want a printable name for the address, you just pass it to getnameinfo
as-is and let getnameinfo deal with how to read it.

Rich


Re: [PATCH] sockaddr.3type: BUGS: Document that libc should be fixed using a union

2023-02-06 Thread Eric Blake via Gcc
On Sun, Feb 05, 2023 at 04:28:36PM +0100, Alejandro Colomar wrote:

Regardless of the merits of the patch, let's not introduce typos:

> +++ b/man3type/sockaddr.3type
> @@ -120,6 +120,26 @@ .SH NOTES
>  .I 
>  and
>  .IR  .
> +.SH BUGS
> +.I sockaddr_storage
> +was designed back when strict aliasing wasn't a problem.
> +Back then,
> +one would define a variable of that type,
> +and then access it as any of the other
> +.IR sockaddr_ *
> +types,
> +depending on the value of the first member.
> +This is Undefined Behavior.
> +However, there is no way to use these APIs without invoking Unedfined 
> Behavior,

Undefined

> +either in the user program or in libc,
> +so it is still recommended to use this method.
> +The only correct way to use different types in an API is through a union.
> +However,
> +that union must be implemented in the library,
> +since the type must be shared between the library and user code,
> +so libc should be fixed by implementing
> +.I sockaddr_storage
> +as a union.
>  .SH SEE ALSO
>  .BR accept (2),
>  .BR bind (2),

Also, while I could raise the issue with the Austin Group on your
behalf to get the POSIX wording improved, I think it would work better
if you initiate a bug report rather than having me do it:

https://www.austingroupbugs.net/main_page.php

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [PATCH] sockaddr.3type: BUGS: Document that libc should be fixed using a union

2023-02-06 Thread Alejandro Colomar via Gcc

Hello Eric,

On 2/6/23 19:45, Eric Blake wrote:

On Sun, Feb 05, 2023 at 04:28:36PM +0100, Alejandro Colomar wrote:

Regardless of the merits of the patch, let's not introduce typos:


+++ b/man3type/sockaddr.3type
@@ -120,6 +120,26 @@ .SH NOTES
  .I 
  and
  .IR  .
+.SH BUGS
+.I sockaddr_storage
+was designed back when strict aliasing wasn't a problem.
+Back then,
+one would define a variable of that type,
+and then access it as any of the other
+.IR sockaddr_ *
+types,
+depending on the value of the first member.
+This is Undefined Behavior.
+However, there is no way to use these APIs without invoking Unedfined Behavior,


Undefined


+either in the user program or in libc,
+so it is still recommended to use this method.
+The only correct way to use different types in an API is through a union.
+However,
+that union must be implemented in the library,
+since the type must be shared between the library and user code,
+so libc should be fixed by implementing
+.I sockaddr_storage
+as a union.
  .SH SEE ALSO
  .BR accept (2),
  .BR bind (2),


Also, while I could raise the issue with the Austin Group on your
behalf to get the POSIX wording improved, I think it would work better
if you initiate a bug report rather than having me do it:

https://www.austingroupbugs.net/main_page.php


I gave it a try after a year since my last attempt.  The open group website is 
one of the worst I've ever seen.  I'm sorry, I can't get it to work.  Could you 
please report a bug on my behalf?


I have an opengroup account, and supposedly am subscribed to the austin-group-l 
mailing list.  But I'm not able to log it.


Thanks,

Alex





--

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


OpenPGP_signature
Description: OpenPGP digital signature