Re: struct sockaddr_storage

2023-01-24 Thread Rich Felker
On Fri, Jan 20, 2023 at 12:06:50PM +0200, Stefan Puiu via Libc-alpha wrote:
> Hi Alex,
> 
> On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar
>  wrote:
> >
> > Hi!
> >
> > I just received a report about struct sockaddr_storage in the man pages.  It
> > reminded me of some concern I've always had about it: it doesn't seem to be 
> > a
> > usable type.
> >
> > It has some alignment promises that make it "just work" most of the time, 
> > but
> > it's still a UB mine, according to ISO C.
> >
> > According to strict aliasing rules, if you declare a variable of type 
> > 'struct
> > sockaddr_storage', that's what you get, and trying to access it later as 
> > some
> > other sockaddr_8 is simply not legal.  The compiler may assume those 
> > accesses
> > can't happen, and optimize as it pleases.
> 
> Can you detail the "is not legal" part? How about the APIs like
> connect() etc that use pointers to struct sockaddr, where the
> underlying type is different, why would that be legal while using
> sockaddr_storage isn't?

Because they're specified to take different types. In C, any struct
pointer type can legally point to any other struct type. You just
can't dereference through it with the wrong type. How the
implementation of connect etc. handle this is an implementation
detail. You're allowed to pass pointers to struct sockaddr_in, etc. to
connect etc. simply because the specification says you are.

In any case, sockaddr_storage is a legacy thing designed by folks who
didn't understand the rules of the C language. It should never appear
in modern code except perhaps with sizeof for allocting buffers. There
is no action that needs to be taken here except documenting that it
should not be used (cannot be used meaningfully without UB).

Rich


Re: [PATCH 2/4] libbacktrace: detect executable path on windows

2023-01-24 Thread Eli Zaretskii via Gcc
> Date: Mon, 23 Jan 2023 15:00:56 -0800
> Cc: gcc-patc...@gcc.gnu.org, gcc@gcc.gnu.org
> From: Ian Lance Taylor via Gcc 
> 
> > +#ifdef HAVE_WINDOWS_H
> > +
> > +static char *
> > +windows_get_executable_path (char *buf, backtrace_error_callback 
> > error_callback,
> > +void *data)
> > +{
> > +  if (GetModuleFileNameA (NULL, buf, MAX_PATH - 1) == 0)
> > +{
> > +  error_callback (data,
> > + "could not get the filename of the current 
> > executable",
> > + (int) GetLastError ());
> > +  return NULL;
> > +}
> > +  return buf;
> > +}
> 
> Thanks, but this seems incomplete.  The docs for GetModuleFileNameA
> say that if the pathname is too long to fit into the buffer it returns
> the size of the buffer and sets the error to
> ERROR_INSUFFICIENT_BUFFER.  It seems to me that in that case we should
> allocate a larger buffer and try again.

This is correct in general, but not in this particular case.

> On Windows it seems that MAX_PATH is not
> a true limit, as an extended length path may be up to 32767 bytes.

The limit of 32767 characters (not bytes, AFAIK) is only applicable
when using the Unicode (a.k.a. "wide") versions of the Windows Win32
APIs, see

  
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation

Since the above code uses GetModuleFileNameA, which is an "ANSI"
single-byte API, it is still subject to the MAX_PATH limitation, and
MAX_PATH is defined as 260 on Windows headers.


Re: [PATCH 2/4] libbacktrace: detect executable path on windows

2023-01-24 Thread Ian Lance Taylor via Gcc
On Tue, Jan 24, 2023 at 5:11 AM Eli Zaretskii via Gcc-patches
 wrote:
>
> > Date: Mon, 23 Jan 2023 15:00:56 -0800
> > Cc: gcc-patc...@gcc.gnu.org, gcc@gcc.gnu.org
> > From: Ian Lance Taylor via Gcc 
> >
> > > +#ifdef HAVE_WINDOWS_H
> > > +
> > > +static char *
> > > +windows_get_executable_path (char *buf, backtrace_error_callback 
> > > error_callback,
> > > +void *data)
> > > +{
> > > +  if (GetModuleFileNameA (NULL, buf, MAX_PATH - 1) == 0)
> > > +{
> > > +  error_callback (data,
> > > + "could not get the filename of the current 
> > > executable",
> > > + (int) GetLastError ());
> > > +  return NULL;
> > > +}
> > > +  return buf;
> > > +}
> >
> > Thanks, but this seems incomplete.  The docs for GetModuleFileNameA
> > say that if the pathname is too long to fit into the buffer it returns
> > the size of the buffer and sets the error to
> > ERROR_INSUFFICIENT_BUFFER.  It seems to me that in that case we should
> > allocate a larger buffer and try again.
>
> This is correct in general, but not in this particular case.
>
> > On Windows it seems that MAX_PATH is not
> > a true limit, as an extended length path may be up to 32767 bytes.
>
> The limit of 32767 characters (not bytes, AFAIK) is only applicable
> when using the Unicode (a.k.a. "wide") versions of the Windows Win32
> APIs, see
>
>   
> https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation
>
> Since the above code uses GetModuleFileNameA, which is an "ANSI"
> single-byte API, it is still subject to the MAX_PATH limitation, and
> MAX_PATH is defined as 260 on Windows headers.

Thanks.  Should this code be using GetModuleFileNameW?  Or would that
mean that the later call to open will fail?

260 bytes does not seem like very much for a path name these days.

Ian


Re: struct sockaddr_storage

2023-01-24 Thread Alex Colomar via Gcc

Hi Richard,

On 1/23/23 17:28, Richard Biener wrote:

The common initial sequence of structures is only allowed if the structures form
part of a union (which is why to avoid UB you need a union; and still, you need
to make sure you don't invoke UB in a different way).



GCC only allows it if the union is visible as part of the access, that
is, it allows it
under its rule of allowing punning for union accesses and not specially because
of the initial sequence rule.  So

  u.a.x = 1;
  ... = u.b.x;

is allowed but

   struct A *p = &u.a;
   p->x = 1;
   struct B *q = &u.b;
   ... = q->x;

is UB with GCC if struct A and B are the union members with a common
initial sequence.


Yep.  That's why we need a union that is defined in libc, so that it can 
be used both in and out of glibc.  sockaddr_storage can be reconverted 
to that purpose.


Cheers,

Alex

--




OpenPGP_signature
Description: OpenPGP digital signature


Re: struct sockaddr_storage

2023-01-24 Thread Alex Colomar via Gcc

Hi Jakub,

On 1/23/23 17:37, Jakub Jelinek wrote:

Please see transparent_union documentation in GCC documentation.
E.g. 
https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Common-Type-Attributes.html#index-transparent_005funion-type-attribute
transparent_union doesn't change anything regarding type punning, it is
solely about function arguments, how arguments of that type are passed
(as first union member) and that no casts to the union are needed from
the member types.


Yep, when I wrote that I didn't fully understand it.  Now I got it. 
I'll prepare some better suggestion about a fix.


Thanks.


And, with LTO TU boundaries are lost, inlining or other IPA optimizations
(including say modref) work in between TUs.


Yeah, that's why we need a fix.  Compilers will only get better at 
optimizing, so UB will sooner or later be a problem.


Cheers,

Alex

--




OpenPGP_signature
Description: OpenPGP digital signature


Re: struct sockaddr_storage

2023-01-24 Thread Alex Colomar via Gcc

Hi Rick,

On 1/24/23 12:16, Rich Felker wrote:

On Fri, Jan 20, 2023 at 12:06:50PM +0200, Stefan Puiu via Libc-alpha wrote:

Hi Alex,

On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar
 wrote:


Hi!

I just received a report about struct sockaddr_storage in the man pages.  It
reminded me of some concern I've always had about it: it doesn't seem to be a
usable type.

It has some alignment promises that make it "just work" most of the time, but
it's still a UB mine, according to ISO C.

According to strict aliasing rules, if you declare a variable of type 'struct
sockaddr_storage', that's what you get, and trying to access it later as some
other sockaddr_8 is simply not legal.  The compiler may assume those accesses
can't happen, and optimize as it pleases.


Can you detail the "is not legal" part? How about the APIs like
connect() etc that use pointers to struct sockaddr, where the
underlying type is different, why would that be legal while using
sockaddr_storage isn't?


Because they're specified to take different types. In C, any struct
pointer type can legally point to any other struct type. You just
can't dereference through it with the wrong type.


Yep.  Which basically means that users need to treat sockaddr structures 
as black boxes.  Otherwise, there's going to be undefined behavior at 
some point.  Because of course, you can't know the right type before 
reading the first field, which is already UB.



How the
implementation of connect etc. handle this is an implementation
detail. You're allowed to pass pointers to struct sockaddr_in, etc. to
connect etc. simply because the specification says you are.


While the implementation has some more freedom regarding UB, in this 
case it's waiting for a compiler optimization to break this code, so I'd 
go the safe way and use standard C techniques in libc so that there are 
no long-term UB issues.


As a side effect, user code that currently invokes UB could be changed 
to have defined behavior.




In any case, sockaddr_storage is a legacy thing designed by folks who
didn't understand the rules of the C language. It should never appear
in modern code except perhaps with sizeof for allocting buffers. There
is no action that needs to be taken here except documenting that it
should not be used (cannot be used meaningfully without UB).


I agree with you on this.  sockaddr_storage has been broken since day 0. 
 However, for designing a solution for libc using unions, it could be 
useful.




Rich


Cheers,

Alex

--




OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 2/4] libbacktrace: detect executable path on windows

2023-01-24 Thread Eli Zaretskii via Gcc
> From: Ian Lance Taylor 
> Date: Tue, 24 Jan 2023 06:35:21 -0800
> Cc: g...@hazardy.de, gcc-patc...@gcc.gnu.org, gcc@gcc.gnu.org
> 
> > > On Windows it seems that MAX_PATH is not
> > > a true limit, as an extended length path may be up to 32767 bytes.
> >
> > The limit of 32767 characters (not bytes, AFAIK) is only applicable
> > when using the Unicode (a.k.a. "wide") versions of the Windows Win32
> > APIs, see
> >
> >   
> > https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation
> >
> > Since the above code uses GetModuleFileNameA, which is an "ANSI"
> > single-byte API, it is still subject to the MAX_PATH limitation, and
> > MAX_PATH is defined as 260 on Windows headers.
> 
> Thanks.  Should this code be using GetModuleFileNameW?  Or would that
> mean that the later call to open will fail?

We'd need to use _wopen or somesuch, and the file name will have to be
a wchar_t array, not a char array, yes.  So this is not very practical
when file names need to be passed between functions, unless they are
converted to UTF-8 (and back again before using them in Windows APIs).

And note that even then, the 260-byte limit could be lifted only if
the user has a new enough Windows version _and_ has opted in to the
long-name feature by turning it on in the Registry.  Otherwise, file
names used in "wide" APIs can only break the 260-byte limit if they
use the special format "\\?\D:\foo\bar", which means file names
specified by user outside of the program or file names that come from
other programs will need to be reformatted to this special format.

> 260 bytes does not seem like very much for a path name these days.

That's true.  But complications with using longer file names are still
a PITA on Windows, even though they are a step closer to practically
possible.


Re: [PATCH 2/4] libbacktrace: detect executable path on windows

2023-01-24 Thread Ian Lance Taylor via Gcc
On Tue, Jan 24, 2023 at 8:53 AM Eli Zaretskii via Gcc-patches
 wrote:
>
> > From: Ian Lance Taylor 
> > Date: Tue, 24 Jan 2023 06:35:21 -0800
> > Cc: g...@hazardy.de, gcc-patc...@gcc.gnu.org, gcc@gcc.gnu.org
> >
> > > > On Windows it seems that MAX_PATH is not
> > > > a true limit, as an extended length path may be up to 32767 bytes.
> > >
> > > The limit of 32767 characters (not bytes, AFAIK) is only applicable
> > > when using the Unicode (a.k.a. "wide") versions of the Windows Win32
> > > APIs, see
> > >
> > >   
> > > https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation
> > >
> > > Since the above code uses GetModuleFileNameA, which is an "ANSI"
> > > single-byte API, it is still subject to the MAX_PATH limitation, and
> > > MAX_PATH is defined as 260 on Windows headers.
> >
> > Thanks.  Should this code be using GetModuleFileNameW?  Or would that
> > mean that the later call to open will fail?
>
> We'd need to use _wopen or somesuch, and the file name will have to be
> a wchar_t array, not a char array, yes.  So this is not very practical
> when file names need to be passed between functions, unless they are
> converted to UTF-8 (and back again before using them in Windows APIs).
>
> And note that even then, the 260-byte limit could be lifted only if
> the user has a new enough Windows version _and_ has opted in to the
> long-name feature by turning it on in the Registry.  Otherwise, file
> names used in "wide" APIs can only break the 260-byte limit if they
> use the special format "\\?\D:\foo\bar", which means file names
> specified by user outside of the program or file names that come from
> other programs will need to be reformatted to this special format.
>
> > 260 bytes does not seem like very much for a path name these days.
>
> That's true.  But complications with using longer file names are still
> a PITA on Windows, even though they are a step closer to practically
> possible.


OK, thanks.

I'd rather that the patch look like the appended.  Can someone with a
Windows system test to see what that builds and passes the tests?
Thanks.

Ian


Re: struct sockaddr_storage

2023-01-24 Thread Alex Colomar via Gcc

Hi,

After reading more about transparent_unit, here's my idea of a fix for 
the API.  old_api() is an example for the libc functions that accept a 
`struct sockaddr *`, and user_code() is an example for user code 
functions that handle sockaddr structures.  The interface would be the 
same as it is currently, but the implementation inside libc would change 
to use a union.  In user code, uses of sockaddr_storage would be made 
safe with these changes, I believe, and new code would be simpler, since 
it wouldn't need casts.




void old_api(union my_sockaddr_ptr *sa);


struct sockaddr_storage {
union {
struct {
sa_family_t  ss_family;
};
struct sockaddr_in   sin;
struct sockaddr_in6  sin6;
struct sockaddr_un   sun;
// ...
};
};


union [[gnu::transparent_union]] sockaddr_ptr {
struct sockaddr_storage  *ss;
struct sockaddr  *sa;
};


void old_api(struct sockaddr_storage *ss)
{
// Here libc uses the union, so it doesn't invoke UB.
ss->sun.sa_family = AF_UNIX;
//...
}


void user_code(void)
{
struct my_sockaddr_storage  ss;  // object definition

// ...

old_api(&ss);  // The transparent_union allows no casts.

switch (ss.ss_family) {
// This is safe too.
// thanks to common initial sequence within a union.
}
}


This would in fact deprecate plain `struct sockaddr`, as Bastien suggested.


Cheers,

Alex


--




OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 2/4] libbacktrace: detect executable path on windows

2023-01-24 Thread Eli Zaretskii via Gcc
> From: Ian Lance Taylor 
> Date: Tue, 24 Jan 2023 09:58:10 -0800
> Cc: g...@hazardy.de, gcc-patc...@gcc.gnu.org, gcc@gcc.gnu.org
> 
> I'd rather that the patch look like the appended.  Can someone with a
> Windows system test to see what that builds and passes the tests?

ENOPATCH


Re: [PATCH 2/4] libbacktrace: detect executable path on windows

2023-01-24 Thread Ian Lance Taylor via Gcc
On Tue, Jan 24, 2023 at 10:12 AM Eli Zaretskii via Gcc-patches
 wrote:
>
> > From: Ian Lance Taylor 
> > Date: Tue, 24 Jan 2023 09:58:10 -0800
> > Cc: g...@hazardy.de, gcc-patc...@gcc.gnu.org, gcc@gcc.gnu.org
> >
> > I'd rather that the patch look like the appended.  Can someone with a
> > Windows system test to see what that builds and passes the tests?
>
> ENOPATCH

Gah.

Ian
diff --git a/libbacktrace/config.h.in b/libbacktrace/config.h.in
index 94621c2e385..29d1ad3911a 100644
--- a/libbacktrace/config.h.in
+++ b/libbacktrace/config.h.in
@@ -100,6 +100,9 @@
 /* Define to 1 if you have the  header file. */
 #undef HAVE_UNISTD_H
 
+/* Define to 1 if you have the  header file. */
+#undef HAVE_WINDOWS_H
+
 /* Define if -lz is available. */
 #undef HAVE_ZLIB
 
diff --git a/libbacktrace/configure b/libbacktrace/configure
index 6af2c04c81a..0a27cfb7799 100755
--- a/libbacktrace/configure
+++ b/libbacktrace/configure
@@ -13409,6 +13409,19 @@ $as_echo "#define HAVE_LOADQUERY 1" >>confdefs.h
 
 fi
 
+for ac_header in windows.h
+do :
+  ac_fn_c_check_header_mongrel "$LINENO" "windows.h" "ac_cv_header_windows_h" 
"$ac_includes_default"
+if test "x$ac_cv_header_windows_h" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_WINDOWS_H 1
+_ACEOF
+
+fi
+
+done
+
+
 # Check for the fcntl function.
 if test -n "${with_target_subdir}"; then
case "${host}" in
diff --git a/libbacktrace/configure.ac b/libbacktrace/configure.ac
index 39e6bf41e35..e3e10abd7b5 100644
--- a/libbacktrace/configure.ac
+++ b/libbacktrace/configure.ac
@@ -377,6 +377,8 @@ if test "$have_loadquery" = "yes"; then
   AC_DEFINE(HAVE_LOADQUERY, 1, [Define if AIX loadquery is available.])
 fi
 
+AC_CHECK_HEADERS(windows.h)
+
 # Check for the fcntl function.
 if test -n "${with_target_subdir}"; then
case "${host}" in
diff --git a/libbacktrace/fileline.c b/libbacktrace/fileline.c
index 674bf33cdcf..e110b54ee24 100644
--- a/libbacktrace/fileline.c
+++ b/libbacktrace/fileline.c
@@ -47,6 +47,18 @@ POSSIBILITY OF SUCH DAMAGE.  */
 #include 
 #endif
 
+#ifdef HAVE_WINDOWS_H
+#ifndef WIN32_MEAN_AND_LEAN
+#define WIN32_MEAN_AND_LEAN
+#endif
+
+#ifndef NOMINMAX
+#define NOMINMAX
+#endif
+
+#include 
+#endif
+
 #include "backtrace.h"
 #include "internal.h"
 
@@ -155,6 +167,27 @@ macho_get_executable_path (struct backtrace_state *state,
 
 #endif /* !defined (HAVE_MACH_O_DYLD_H) */
 
+#ifdef HAVE_WINDOWS_H
+
+static char *
+windows_get_executable_path (char *buf, backtrace_error_callback 
error_callback,
+void *data)
+{
+  size_t got;
+
+  got = GetModuleFileNameA (NULL, buf, MAX_PATH - 1);
+  if (got == 0
+  || (got == MAX_PATH - 1 && GetLastError () == ERROR_INSUFFICIENT_BUFFER))
+return NULL;
+  return buf;
+}
+
+#else /* !defined (HAVE_WINDOWS_H) */
+
+#define windows_get_executable_path(buf, error_callback, data) NULL
+
+#endif /* !defined (HAVE_WINDOWS_H) */
+
 /* Initialize the fileline information from the executable.  Returns 1
on success, 0 on failure.  */
 
@@ -168,7 +201,11 @@ fileline_initialize (struct backtrace_state *state,
   int called_error_callback;
   int descriptor;
   const char *filename;
+#ifdef HAVE_WINDOWS_H
+  char buf[MAX_PATH];
+#else
   char buf[64];
+#endif
 
   if (!state->threaded)
 failed = state->fileline_initialization_failed;
@@ -192,7 +229,7 @@ fileline_initialize (struct backtrace_state *state,
 
   descriptor = -1;
   called_error_callback = 0;
-  for (pass = 0; pass < 8; ++pass)
+  for (pass = 0; pass < 9; ++pass)
 {
   int does_not_exist;
 
@@ -224,6 +261,9 @@ fileline_initialize (struct backtrace_state *state,
case 7:
  filename = macho_get_executable_path (state, error_callback, data);
  break;
+   case 8:
+ filename = windows_get_executable_path (buf, error_callback, data);
+ break;
default:
  abort ();
}


Re: [PATCH 2/4] libbacktrace: detect executable path on windows

2023-01-24 Thread Björn Schäpers

Am 24.01.2023 um 17:52 schrieb Eli Zaretskii:

From: Ian Lance Taylor 
Date: Tue, 24 Jan 2023 06:35:21 -0800
Cc: g...@hazardy.de, gcc-patc...@gcc.gnu.org, gcc@gcc.gnu.org


On Windows it seems that MAX_PATH is not
a true limit, as an extended length path may be up to 32767 bytes.


The limit of 32767 characters (not bytes, AFAIK) is only applicable
when using the Unicode (a.k.a. "wide") versions of the Windows Win32
APIs, see

   
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation

Since the above code uses GetModuleFileNameA, which is an "ANSI"
single-byte API, it is still subject to the MAX_PATH limitation, and
MAX_PATH is defined as 260 on Windows headers.


Thanks.  Should this code be using GetModuleFileNameW?  Or would that
mean that the later call to open will fail?


We'd need to use _wopen or somesuch, and the file name will have to be
a wchar_t array, not a char array, yes.  So this is not very practical
when file names need to be passed between functions, unless they are
converted to UTF-8 (and back again before using them in Windows APIs).

And note that even then, the 260-byte limit could be lifted only if
the user has a new enough Windows version _and_ has opted in to the
long-name feature by turning it on in the Registry.  Otherwise, file
names used in "wide" APIs can only break the 260-byte limit if they
use the special format "\\?\D:\foo\bar", which means file names
specified by user outside of the program or file names that come from
other programs will need to be reformatted to this special format.


260 bytes does not seem like very much for a path name these days.


That's true.  But complications with using longer file names are still
a PITA on Windows, even though they are a step closer to practically
possible.


That was basically also my reasoning for choosing the A variant instead of W.