Re: [PATCH] sockaddr.3type: BUGS: Document that libc should be fixed using a union

2023-02-06 Thread Eric Blake via Gcc
On Sun, Feb 05, 2023 at 04:28:36PM +0100, Alejandro Colomar wrote:

Regardless of the merits of the patch, let's not introduce typos:

> +++ b/man3type/sockaddr.3type
> @@ -120,6 +120,26 @@ .SH NOTES
>  .I 
>  and
>  .IR  .
> +.SH BUGS
> +.I sockaddr_storage
> +was designed back when strict aliasing wasn't a problem.
> +Back then,
> +one would define a variable of that type,
> +and then access it as any of the other
> +.IR sockaddr_ *
> +types,
> +depending on the value of the first member.
> +This is Undefined Behavior.
> +However, there is no way to use these APIs without invoking Unedfined 
> Behavior,

Undefined

> +either in the user program or in libc,
> +so it is still recommended to use this method.
> +The only correct way to use different types in an API is through a union.
> +However,
> +that union must be implemented in the library,
> +since the type must be shared between the library and user code,
> +so libc should be fixed by implementing
> +.I sockaddr_storage
> +as a union.
>  .SH SEE ALSO
>  .BR accept (2),
>  .BR bind (2),

Also, while I could raise the issue with the Austin Group on your
behalf to get the POSIX wording improved, I think it would work better
if you initiate a bug report rather than having me do it:

https://www.austingroupbugs.net/main_page.php

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [PATCH] sockaddr.3type: Document that sockaddr_storage is the API to be used

2023-03-30 Thread Eric Blake via Gcc
On Thu, Mar 30, 2023 at 07:13:11PM +0200, Alejandro Colomar wrote:
> POSIX.1 Issue 8 will fix the long-standing issue with sockaddr APIs,
> which inevitably caused UB either on user code, libc, or more likely,
> both.  sockaddr_storage has been clarified to be implemented in a manner
> that aliasing it is safe (suggesting a unnamed union, or other compiler
> magic).
> 
> Link: 
> Reported-by: Bastien Roucariès 
> Reported-by: Alejandro Colomar 
> Cc: glibc 
> Cc: GCC 
> Cc: Eric Blake 
> Cc: Stefan Puiu 
> Cc: Igor Sysoev 
> Cc: Rich Felker 
> Cc: Andrew Clayton 
> Cc: Richard Biener 
> Cc: Zack Weinberg 
> Cc: Florian Weimer 
> Cc: Joseph Myers 
> Cc: Jakub Jelinek 
> Cc: Sam James 
> Signed-off-by: Alejandro Colomar 
> ---
> 
> Hi all,
> 
> This is my proposal for documenting the POSIX decission of fixing the
> definition of sockaddr_storage.  Bastien, I believe you had something
> similar in mind; please review.  Eric, thanks again for the fix!  Could
> you please also have a look at this?
> 
> Cheers,
> 
> Alex
> 
>  man3type/sockaddr.3type | 22 ++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/man3type/sockaddr.3type b/man3type/sockaddr.3type
> index 32c3c5bd0..d1db87d5d 100644
> --- a/man3type/sockaddr.3type
> +++ b/man3type/sockaddr.3type
> @@ -23,6 +23,14 @@ .SH SYNOPSIS
>  .PP
>  .B struct sockaddr_storage {
>  .BR "sa_family_t ss_family;" "  /* Address family */"
> +.PP
> +.RS 4
> +/* This structure is not really implemented this way.  It may be
> +\&   implemented with an unnamed union or some compiler magic to
> +\&   avoid breaking aliasing rules when accessed as any other of the
> +\&   sockaddr_* structures documented in this page.  See CAVEATS.
> +\& */

Do we want similar comments in struct sockaddr and/or sockaddr_XX?

> +.RE
>  .B };
>  .PP
>  .BR typedef " /* ... */ " socklen_t;
> @@ -122,6 +130,20 @@ .SH NOTES
>  .I 
>  and
>  .IR  .
> +.SH CAVEATS
> +To avoid breaking aliasing rules,
> +programs that use functions that receive pointers to
> +.I sockaddr
> +structures should declare objects of type
> +.IR sockaddr_storage ,
> +which is defined in a way that it
> +can be accessed as any of the different structures defined in this page.
> +Failure to do so may result in Undefined Behavior.

Existing POSIX already requires sockaddr_storage to be suitably sized
and aligned to overlay with all other sockaddr* types.  What the
recent POSIX bug change does is add wording to emphasize that casts in
any of the 6 directions:

sockaddr* <-> sockaddr_XX*
sockaddr_storage* <-> sockaddr*
sockaddr_storage* <-> sockaddr_XX*

must allow the sa_family/ss_family/sa_family_t member to overlay
without triggering undefined behavior due to bad aliasing, at which
point, access to that member lets you deduce what other object type
you really have.  But you are also correct that merely casting a
pointer to another larger struct that doesn't trigger aliasing, but
then dereferencing beyond the bounds of the original, is not intended
to be portable.  The aliasing diagnostics are suppressed because of
the requirements on the first member, so now the user must now be
careful that their access of remaining members is safe even if the
compiler is no longer helping them because of the magic that
suppressed the aliasing detection.

I agree with your warning that code that can handle generic socket
types should use sockaddr_storage (and not sockaddr) as the original
object (the one object that the standard requires to be suitably sized
and aligned to overlay with the entirety of all other sockaddr types,
rather than just the sa_family_t first member), although we may want
to be more precise that code using a specific protocol type can
directly use the proper sockaddr_XX type rather than having to use an
intermediate sockaddr_storage.

I'm not sure if there are better ways to word that paragraph to convey
the intended sentiment.

> +.PP
> +New functions should be written to accept pointers to
> +.I sockaddr_storage
> +instead of the traditional
> +.IR sockaddr .

I'm less certain about this one.  The POSIX wording specifically chose
to keep existing API/ABI of sockaddr* in all the standardized
functions unchanged, as it would be too invasive to existing code to
change the signatures now.  The burden is on the system headers to
define types so that the necessary casts (present in lots of existing
code because sockaddr* has a bit more type-safety than void*) do not
of themselves cause aliasing issues, and therefore avoid undefined
behavior provided subsequent code accessing through the pointers is
not accessing beyond the bounds of the real object.  The likelihood of
POSIX adding new socket APIs taking sockaddr_storage* just to enforce
non-aliasing seems slim.  But then again, this advice applies to more
than just functions likely to be standardized in a future libc, so
maybe this paragraph is worth it after all.

>  .SH SEE ALSO
>  .BR

Re: [PATCH] sockaddr.3type: Document that sockaddr_storage is the API to be used

2023-04-06 Thread Eric Blake via Gcc
On Wed, Apr 05, 2023 at 02:42:04AM +0200, Alejandro Colomar wrote:
> Hi Eric,
> 
> I'm going to reply both your emails here so that GCC is CCed, and they can
> suggest better stuff.  I'm worried about sending something to POSIX without
> enough eyes checking it.  So this will be a long email.

Because your mail landed in a publicly archived mailing list, the
POSIX folks saw it anyways ;)

...
> > 
> > Whether gcc already has all the attributes you need is not my area of
> > expertise.  In my skim of the glibc list conversation, I saw mention
> > of attribute [[gnu:transparent_union]] rather than [[__may_alias__]] -
> > if that's a better implementation-defined extension that does what we
> > need, then use it.  The standard developers were a bit uncomfortable
> > directly putting [[gnu:transparent_union]] in the standard, but
> > [[__may_alias__]] was noncontroversial (it's in the namespace reserved
> > for the implementation)
> 
> Not really; implementation-defined attributes are required to use an
> implementation-defined prefix like 'gnu::'.  So [[__may_alias__]] is
> reserved by ISO C, AFAIR.  Maybe it would be better to just mention
> attributes without any specific attribute name; being fuzzy about it
> would help avoid making promises that we can't hold.

On this point, the group agreed, and we intentionally loosened to
wording to just mention an implementation-defined extension, rather
than giving any specific attribute name.

...
> 
> I would just make it more fuzzy about which standard version did what.
> How about this?:
> 
> [[
> Note that defining the sockaddr_storage and sockaddr structures using
> only mechanisms defined in editions of the ISO C standard may produce
> aliasing diagnostics.  Because of the large body of existing code
> utilizing sockets in a way that could trigger undefined behavior due
> to strict aliasing rules, this standard mandates that the various socket
> address structures can alias each other for accessing their first member,

The sa_family_t member is not necessarily the first member on all
platforms (it happens to be first in Linux, but as a counter-example,
https://man.freebsd.org/cgi/man.cgi?query=unix&sektion=4 shows
sun_family as the second one-byte field in struct sockaddr_un).  The
emphasis is on derefencing the family member (whatever offset it is
at) to learn what cast to use to then safely access the rest of the
storage.

As such, here's the updated wording that the Austin Group tried today
(and we plan on starting a 30-day interpretation feedback window if
there are still adjustments to be made to the POSIX wording):

https://austingroupbugs.net/view.php?id=1641#c6255

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [PATCH] sockaddr.3type: Document that sockaddr_storage is the API to be used

2023-04-21 Thread Eric Blake via Gcc
On Fri, Apr 21, 2023 at 05:00:14PM +0200, Alejandro Colomar wrote:
> > 
> > The wording I see in 
> > doesn't seem to cover the case of aliasing a sockaddr_storage as a
> > protocol-specific address for setting other members.
> > 
> > Aliasing rules don't allow one to declare an object of type
> > sockaddr_storage and then fill the structure as if it were another
> > structure, even if alignment and size are correct.  We would need
> > some wording that says something like:
> > 
> > When a pointer to a sockaddr_storage structure is first aliased as a
> > pointer to a protocol-specific address structure, the effective type
> > of the object will be set to the protocol-specific structure.

I'll add that as a comment to the Austin Group page; it seems like a
reasonable statement of intent (POSIX already says that struct
sockaddr_storage is sufficiently sized and aligned; all that remains
is for the compiler to be aware that we intend to use a
more-appropriate effective type once we have the storage allocated).

> > 
> > This is similar to what happens when malloc(3) is assigned to a
> > non-character type.  That's a big hammer, but it does the job.  Maybe
> > we would need some looser language?  I CCd GCC, in case they have
> > concerns about this wording.
> > 
> > Cheers,
> > Alex
> > 
> >>
> >> I quite like this way of putting it.  It subsumes both what I wrote and 
> >> the related potential headache with deciding whether the sa_family_t 
> >> field is considered an object or just a range of bytes within a larger 
> >> object.
> >>
> >> zw
> > 
> 
> For the man pages, I've rewritten it to the following:
> 
> 
> $ git diff
> diff --git a/man3type/sockaddr.3type b/man3type/sockaddr.3type
> index 2fdf56c59..e610aa0f5 100644
> --- a/man3type/sockaddr.3type
> +++ b/man3type/sockaddr.3type
> @@ -117,6 +117,14 @@ .SH HISTORY
>  was invented by POSIX.
>  See also
>  .BR accept (2).
> +.PP
> +These structures were invented before modern ISO C strict-aliasing rules.
> +If aliasing rules are applied strictly,
> +these structures would be impossible to use

Maybe "extremely difficult" instead of "impossible" to use (if I
understand this thread correctly, it is possible to memcpy() from one
struct into different storage of a different effective type where the
memcpy()'s intermediate aliasing through char* avoids the UB).

> +without invoking Undefined Behavior (UB).
> +POSIX Issue 8 will fix this by requiring that implementations
> +make sure that these structures
> +can be safely used as they were designed.
>  .SH NOTES
>  .I socklen_t
>  is also defined in
> 
> 
> I guess this is simple enough that it should work as documentation.

It seems fine from my perspective.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [PATCH v3] sockaddr.3type: POSIX Issue 8 will solve strict-aliasing issues with these types

2023-04-21 Thread Eric Blake via Gcc
On Fri, Apr 21, 2023 at 10:27:18PM +0200, Alejandro Colomar wrote:
> Link: 
> Reported-by: Bastien Roucariès 
> Reported-by: Alejandro Colomar 
> Reviewed-by: Eric Blake 
> Cc: glibc 
> Cc: GCC 
> Cc: Stefan Puiu 
> Cc: Igor Sysoev 
> Cc: Rich Felker 
> Cc: Andrew Clayton 
> Cc: Richard Biener 
> Cc: Zack Weinberg 
> Cc: Florian Weimer 
> Cc: Joseph Myers 
> Cc: Jakub Jelinek 
> Cc: Sam James 
> Signed-off-by: Alejandro Colomar 
> ---
> 
> Hi Eric,
> 
> I took your informal review as a "Reviewed-by".  Please confirm.
> I've also modified the small wording thingy you suggested.

For the avoidance of doubt ;)

Reviewed-by: Eric Blake 


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org