https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82946
--- Comment #7 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 16 Nov 2017, msebor at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82946
>
> --- Comment #6 from Martin Sebor <msebor at gcc dot gnu.org> ---
> (In reply to rguent...@suse.de from comment #5)
> > This means you can very well replace memcpy with strcpy if you know
> > there's a '\0' in and only in the right place.
>
> Sure, except when dealing with a string literal we know that the source is a
> string literal and not a pointer representation disguised as a sequence of
> bytes.  The optimization I'm referring to is specifically for string literals:
>
>   unsigned g (struct A *a)
>   {
>     strcpy (a->d, "123");   // here we have a literal, not the
>                             // representation of a pointer
>     return strlen (a->d);   // a->d must be a valid pointer
>   }
>
> > We certainly have to treat literal pointers encoded in any form
> > conservatively.  I don't see how they are against any standard.  There's
> > other clearly "valid" optimizations missing in GCC that look more
> > important to implement.
>
> The C and C++ standards are clear as to what are valid pointer values and how
> they can come about.  Copying the representation from an arbitrary constant of
> an incompatible type into a pointer object is certainly not one of them.
> I.e., this:
>
>   const char a[] = "123";
>   char *p;
>   memcpy (&p, a, sizeof p);
>   strlen (p);
>
> is undefined, but this is of course valid:
>
>   const char a[4] = "123";
>   char *p;
>   char *q = a;
>   memcpy (&p, &q, sizeof p);
>   strlen (p);
>
> because it just copies the representation of what's known to be a valid
> pointer value into another pointer object of a compatible type.
>
> The point is that the bytes of no string literal can also be a valid pointer
> value, even if it happens to have the same representation as one, and this can
> be exploited to allow the optimization above.  It will not invalidate any
> correct programs.
> It would be not only invalid but downright silly for a
> program to represent valid addresses as string literals.  Embedded programs of
> course do hardcode pointer values, but not in string literals: they hardcode
> them as integers, e.g.,
>
>   void *my_register = (void*)0x123;
>
> but never like so:
>
>   char my_register[] = "123";

Ok.  I guess I have some patches somewhere that "properly" distinguish
string literals during points-to analysis, which might help this case.
Or maybe not.  As with the other cases, I have a hard time imagining how
to implement this and how to transfer such knowledge to the
alias-oracle / IL.
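
For reference, here is a minimal sketch in C of the two well-defined cases from the quoted comment. The thread never shows the definition of struct A, so the single `char *d` member is an assumption, as is the helper name `copy_valid_pointer`. The sketch only shows the source-level behaviour under discussion, not anything about how GCC implements or would implement the optimization.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout: the thread does not show struct A. */
struct A { char *d; };

/* The case from comment #6: the source of strcpy is a string literal,
   so after the copy a->d is expected to still point to "123". */
size_t g(struct A *a)
{
    assert(a != NULL && a->d != NULL);
    strcpy(a->d, "123");   /* copies the four bytes '1' '2' '3' '\0' */
    return strlen(a->d);   /* length of the string just stored */
}

/* Hypothetical helper: copying the representation of a valid pointer
   into another pointer object of a compatible type is well defined. */
size_t copy_valid_pointer(void)
{
    const char a[4] = "123";
    const char *q = a;          /* q holds a valid pointer value */
    const char *p;
    memcpy(&p, &q, sizeof p);   /* p now has the same value as q */
    return strlen(p);
}
```

Called with a->d pointing at a buffer of at least four bytes, g returns 3, which is the constant the string-literal argument above would allow strlen to be folded to.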