Re: strlen optimizations based on whether stpcpy is declared?

2017-10-02 Thread Jakub Jelinek
On Sun, Oct 01, 2017 at 03:52:39PM -0600, Martin Sebor wrote:
> While debugging some of my tests I noticed unexpected differences
> between the results depending on whether or not the stpcpy function
> is declared.  It turns out that the differences are caused by
> the handle_builtin_strcpy function in tree-ssa-strlen.c testing
> for stpcpy having been declared:
> 
>   if (srclen == NULL_TREE)
>     switch (bcode)
>       {
>       case BUILT_IN_STRCPY:
>       case BUILT_IN_STRCPY_CHK:
>       case BUILT_IN_STRCPY_CHKP:
>       case BUILT_IN_STRCPY_CHK_CHKP:
>         if (lhs != NULL_TREE || !builtin_decl_implicit_p (BUILT_IN_STPCPY))
>           return;
> 
> and taking different paths depending on whether or not the test
> succeeds.
> 
> As far as I can see, the tests have been there since the pass was
> added, but I don't understand from the comments in the file what
> their purpose is or why optimization decisions involving one set
> of functions (I think strcpy and strcat at a minimum) are based
> on whether another function has been declared or not.
> 
> Can you explain what they're for?

The reason is that stpcpy is not a standard C function, so in non-POSIX
environments one could have stpcpy with completely unrelated prototype
used for something else.  In such case we don't want to introduce stpcpy
into a TU that didn't have such a call.  So, we use the existence of
a matching prototype as a sign that stpcpy can be synthesized.

Jakub


Re: strlen optimizations based on whether stpcpy is declared?

2017-10-02 Thread Alan Modra
On Mon, Oct 02, 2017 at 09:11:53AM +0200, Jakub Jelinek wrote:
> On Sun, Oct 01, 2017 at 03:52:39PM -0600, Martin Sebor wrote:
> > While debugging some of my tests I noticed unexpected differences
> > between the results depending on whether or not the stpcpy function
> > is declared.  It turns out that the differences are caused by
> > the handle_builtin_strcpy function in tree-ssa-strlen.c testing
> > for stpcpy having been declared:
> > 
> > >   if (srclen == NULL_TREE)
> > >     switch (bcode)
> > >       {
> > >       case BUILT_IN_STRCPY:
> > >       case BUILT_IN_STRCPY_CHK:
> > >       case BUILT_IN_STRCPY_CHKP:
> > >       case BUILT_IN_STRCPY_CHK_CHKP:
> > >         if (lhs != NULL_TREE || !builtin_decl_implicit_p (BUILT_IN_STPCPY))
> > >           return;
> > 
> > and taking different paths depending on whether or not the test
> > succeeds.
> > 
> > > As far as I can see, the tests have been there since the pass was
> > added, but I don't understand from the comments in the file what
> > their purpose is or why optimization decisions involving one set
> > of functions (I think strcpy and strcat at a minimum) are based
> > on whether another function has been declared or not.
> > 
> > Can you explain what they're for?
> 
> The reason is that stpcpy is not a standard C function, so in non-POSIX
> environments one could have stpcpy with completely unrelated prototype
> used for something else.  In such case we don't want to introduce stpcpy
> into a TU that didn't have such a call.  So, we use the existence of
> > a matching prototype as a sign that stpcpy can be synthesized.

Why is the test for stpcpy being declared done for the strcpy cases
rather than the stpcpy cases?

-- 
Alan Modra
Australia Development Lab, IBM


Re: strlen optimizations based on whether stpcpy is declared?

2017-10-02 Thread Jakub Jelinek
On Mon, Oct 02, 2017 at 09:05:06PM +1030, Alan Modra wrote:
> > > and taking different paths depending on whether or not the test
> > > succeeds.
> > > 
> > > > As far as I can see, the tests have been there since the pass was
> > > added, but I don't understand from the comments in the file what
> > > their purpose is or why optimization decisions involving one set
> > > of functions (I think strcpy and strcat at a minimum) are based
> > > on whether another function has been declared or not.
> > > 
> > > Can you explain what they're for?
> > 
> > The reason is that stpcpy is not a standard C function, so in non-POSIX
> > environments one could have stpcpy with completely unrelated prototype
> > used for something else.  In such case we don't want to introduce stpcpy
> > into a TU that didn't have such a call.  So, we use the existence of
> > > a matching prototype as a sign that stpcpy can be synthesized.
> 
> Why is the test for stpcpy being declared done for the strcpy cases
> rather than the stpcpy cases?

Because the optimization is about strcpy followed by some call that would
like to know the length of the string, so we want to replace the strcpy call
by stpcpy and use the lhs of the stpcpy call minus the first argument as the
length instead of yet another strlen call (or similar).

If we don't know that stpcpy is available and can be safely used, we can't
do that.
The reason why a matching prototype is sufficient is that if you have a
matching prototype (and no -fno-builtin-stpcpy or -fno-builtin) and have
calls to that function in your code, GCC considers it a builtin and
optimizes them according to the behavior of the builtin.
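
A minimal sketch of the effect, written by hand rather than taken from the
pass: the first function is what a program might contain, the second is
roughly how it behaves once strcpy has been replaced by stpcpy.  The function
names are made up for illustration, and the feature test macro assumes a C
library (glibc, for instance) that exposes stpcpy under POSIX.1-2008.

```c
/* Ask the headers for POSIX.1-2008 names such as stpcpy.  Assumption: the
   C library honors this feature test macro; it must precede all includes.  */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What the source says: copy, then measure the copy with a second pass
   over the destination.  (Hypothetical helper name.)  */
static size_t
copy_then_strlen (char *d, const char *s)
{
  assert (d != NULL && s != NULL);
  strcpy (d, s);
  return strlen (d);
}

/* Hand-written equivalent of the transformed code: stpcpy returns a pointer
   to the terminating NUL in D, so the length is the return value minus the
   first argument.  (Hypothetical helper name.)  */
static size_t
copy_with_stpcpy (char *d, const char *s)
{
  assert (d != NULL && s != NULL);
  char *end = stpcpy (d, s);
  return (size_t) (end - d);
}
```

Both functions return 5 when the source string is "hello"; the second one
does not need to walk the destination a second time.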

Jakub


Re: strlen optimizations based on whether stpcpy is declared?

2017-10-02 Thread Martin Sebor

On 10/02/2017 04:40 AM, Jakub Jelinek wrote:
> On Mon, Oct 02, 2017 at 09:05:06PM +1030, Alan Modra wrote:
> > > > and taking different paths depending on whether or not the test
> > > > succeeds.
> > > >
> > > > As far as I can see, the tests have been there since the pass was
> > > > added, but I don't understand from the comments in the file what
> > > > their purpose is or why optimization decisions involving one set
> > > > of functions (I think strcpy and strcat at a minimum) are based
> > > > on whether another function has been declared or not.
> > > >
> > > > Can you explain what they're for?
> > >
> > > The reason is that stpcpy is not a standard C function, so in non-POSIX
> > > environments one could have stpcpy with completely unrelated prototype
> > > used for something else.  In such case we don't want to introduce stpcpy
> > > into a TU that didn't have such a call.  So, we use the existence of
> > > a matching prototype as a sign that stpcpy can be synthesized.
> >
> > Why is the test for stpcpy being declared done for the strcpy cases
> > rather than the stpcpy cases?
>
> Because the optimization is about strcpy followed by some call that would
> like to know the length of the string, so we want to replace the strcpy call
> by stpcpy and use the lhs of the stpcpy call minus the first argument as the
> length instead of yet another strlen call (or similar).
>
> If we don't know that stpcpy is available and can be safely used, we can't
> do that.
> The reason why a matching prototype is sufficient is that if you have a
> matching prototype (and no -fno-builtin-stpcpy or -fno-builtin) and have
> calls to that function in your code, GCC considers it a builtin and
> optimizes them according to the behavior of the builtin.


Thanks.  That makes sense to me.  The wrinkle with this approach
is that the same code (same function) has different effects on
the compiler (as in, is subject to different optimization
decisions, or can cause false positives/negatives) depending
whether some unrelated code (in another function) calls
__builtin_stpcpy or calls (and declares) stpcpy, or does neither.
This is probably not very common in application programs but it
does happen often in the GCC test suite (this is the second time
I've been bitten by it in just a few months).

IIUC, ideally, the decision whether or not to make
the transformation would be based on whether stpcpy is called
by the function on the result of a prior strcpy/strcat.  A less
ideal solution, but probably a good enough one to avoid the kind
of surprises I ran into, would only check whether stpcpy is called
by each function.  Is there a way to do either without making
overly intrusive changes to the pass?  It seems the latter should
be doable simply by scanning each function for calls to stpcpy
first, either by the strlen pass itself or by some earlier pass.

Or is there something I'm missing that makes this not feasible?

Martin



Re: strlen optimizations based on whether stpcpy is declared?

2017-10-02 Thread Jakub Jelinek
On Mon, Oct 02, 2017 at 09:00:41AM -0600, Martin Sebor wrote:
> Thanks.  That makes sense to me.  The wrinkle with this approach
> is that the same code (same function) has different effects on
> the compiler (as in, is subject to different optimization
> decisions, or can cause false positives/negatives) depending
> whether some unrelated code (in another function) calls
> __builtin_stpcpy or calls (and declares) stpcpy, or does neither.
> This is probably not very common in application programs but it
> does happen often in the GCC test suite (this is the second time
> I've been bitten by it in just a few months).

Why is that a problem?  In most user code, people just
#include  or #include  and depending on feature
test macros, either stpcpy is available, or not.
For GCC testsuite the tests that specially test for these transformations
have or intentionally don't have the stpcpy prototype available.

> IIUC, ideally, the decision whether or not to make
> the transformation would be based on whether stpcpy is called
> by the function on the result of a prior strcpy/strcat.  A less

I don't understand this suggestion.  Usually there is no stpcpy call
anywhere, we still want to make the transformation if we can assume
the library provides it.  So you'd penalize a lot of code for no benefit.

Jakub


Re: strlen optimizations based on whether stpcpy is declared?

2017-10-02 Thread Martin Sebor

On 10/02/2017 09:06 AM, Jakub Jelinek wrote:
> On Mon, Oct 02, 2017 at 09:00:41AM -0600, Martin Sebor wrote:
> > Thanks.  That makes sense to me.  The wrinkle with this approach
> > is that the same code (same function) has different effects on
> > the compiler (as in, is subject to different optimization
> > decisions, or can cause false positives/negatives) depending
> > whether some unrelated code (in another function) calls
> > __builtin_stpcpy or calls (and declares) stpcpy, or does neither.
> > This is probably not very common in application programs but it
> > does happen often in the GCC test suite (this is the second time
> > I've been bitten by it in just a few months).
>
> Why is that a problem?  In most user code, people just
> #include  or #include  and depending on feature
> test macros, either stpcpy is available, or not.
> For GCC testsuite the tests that specially test for these transformations
> have or intentionally don't have the stpcpy prototype available.


It's a gotcha for those writing GCC tests who are unaware of this
subtlety.  Some tests that exercise both built-in functions define
macros to call them:

  #define stpcpy __builtin_stpcpy
  #define strcpy __builtin_strcpy

Other tests declare them:

  extern char* stpcpy (char*, const char*);
  extern char* strcpy (char*, const char*);

Other tests still exercise one function at a time.  As I said,
it's surprising when the tests have different effects even though
the calls to these functions are otherwise identical, because for
other standard functions they behave the same.  I spent close to
an hour the other day debugging two of my tests side by side
trying to understand why they were behaving differently before
it dawned on me that the cause was in the strlen pass.


> > IIUC, ideally, the decision whether or not to make
> > the transformation would be based on whether stpcpy is called
> > by the function on the result of a prior strcpy/strcat.  A less
>
> I don't understand this suggestion.  Usually there is no stpcpy call
> anywhere, we still want to make the transformation if we can assume
> the library provides it.  So you'd penalize a lot of code for no benefit.


Ah, okay I get it now.   After re-reading some of the comments
in the file and some more testing I see the pass transforms all
calls to strcpy to stpcpy whose source length is unknown and
the length of whose destination is later needed.  It does that
because the latter length can be computed more efficiently by
subtracting the first argument from the stpcpy return value.

And the decision whether or not to make use of stpcpy is based
on the presence of its declaration.

I also take back what I said about application programs being
unaffected by this.  Using the declaration to make these decisions
results in less optimal code when compiling in strict conformance
mode (e.g., -std=c11 or -std=c++14) than in "relaxed mode"
(-std=gnu11 or -std=gnu++14).  This can be seen using the following
test case:

  #include <string.h>

  void f (char *d, const char *s)
  {
    strcpy (d, s);

    if (__builtin_strlen (d) != __builtin_strlen (s))
      __builtin_abort ();
  }

I understand this is because, as you said, in strict mode, stpcpy
could be declared to be a different symbol.  After our discussion
I will (hopefully) remember this and avoid getting surprised by
it in the future.  But it still feels like a subtlety that should
be more prominently advertised somehow/somewhere to help others
avoid falling into the same trap.

Martin


Re: strlen optimizations based on whether stpcpy is declared?

2017-10-02 Thread Jakub Jelinek
On Mon, Oct 02, 2017 at 10:41:59AM -0600, Martin Sebor wrote:
> I also take back what I said about application programs being
> unaffected by this.  Using the declaration to make these decisions
> results in less optimal code when compiling in strict conformance
> mode (e.g., -std=c11 or -std=c++14) than in "relaxed mode"
> (-std=gnu11 or -std=gnu++14).  This can be seen using the following
> test case:

Only for C, and even for -std=c99 or -std=c11 one can use -D_GNU_SOURCE
or -D_POSIX_C_SOURCE=200809 or -D_XOPEN_SOURCE=700 and various others
to make stpcpy available.  For C++ we define _GNU_SOURCE (unfortunately)
unconditionally.

>   #include <string.h>
>
>   void f (char *d, const char *s)
>   {
>     strcpy (d, s);
>
>     if (__builtin_strlen (d) != __builtin_strlen (s))
>       __builtin_abort ();
>   }
> 
> I understand this is because, as you said, in strict mode, stpcpy
> could be declared to be a different symbol.  After our discussion
> I will (hopefully) remember this and avoid getting surprised by
> it in the future.  But it still feels like a subtlety that should
> be more prominently advertised somehow/somewhere to help others
> avoid falling into the same trap.

Why should it be advertised?  It is an optimization.  We use it when
we feel it is safe to do so.  It isn't something that should be documented
in user manuals IMNSHO.

Jakub


Re: strlen optimizations based on whether stpcpy is declared?

2017-10-02 Thread Martin Sebor

On 10/02/2017 11:07 AM, Jakub Jelinek wrote:
> On Mon, Oct 02, 2017 at 10:41:59AM -0600, Martin Sebor wrote:
> > I also take back what I said about application programs being
> > unaffected by this.  Using the declaration to make these decisions
> > results in less optimal code when compiling in strict conformance
> > mode (e.g., -std=c11 or -std=c++14) than in "relaxed mode"
> > (-std=gnu11 or -std=gnu++14).  This can be seen using the following
> > test case:
>
> Only for C, and even for -std=c99 or -std=c11 one can use -D_GNU_SOURCE
> or -D_POSIX_C_SOURCE=200809 or -D_XOPEN_SOURCE=700 and various others
> to make stpcpy available.  For C++ we define _GNU_SOURCE (unfortunately)
> unconditionally.


That's not what I see.  Making the stpcpy declaration available
doesn't enable the transformation in strict mode.  What's more,
in strict mode GCC transforms stpcpy calls to strcpy.




> >   #include <string.h>
> >
> >   void f (char *d, const char *s)
> >   {
> >     strcpy (d, s);
> >
> >     if (__builtin_strlen (d) != __builtin_strlen (s))
> >       __builtin_abort ();
> >   }
> >
> > I understand this is because, as you said, in strict mode, stpcpy
> > could be declared to be a different symbol.  After our discussion
> > I will (hopefully) remember this and avoid getting surprised by
> > it in the future.  But it still feels like a subtlety that should
> > be more prominently advertised somehow/somewhere to help others
> > avoid falling into the same trap.
>
> Why should it be advertised?  It is an optimization.  We use it when
> we feel it is safe to do so.  It isn't something that should be documented
> in user manuals IMNSHO.


By "others" I was referring to other GCC developers.  I also
had in mind something a little less subjective than "do it when
you feel it's safe."  Clearly, in the case of strcpy to stpcpy,
the decision doesn't depend on how we feel but on what a valid
program in a given conformance mode can do.

But I think it would be helpful even to users to document the
rules for when a call to a standard library function can be
expected to be emitted (or optimized inline) and when it can
be expected to be transformed to another.

In the case of strcpy, GCC not only transforms it to stpcpy but
it also does the opposite transformation.  I.e., in strict mode
it transforms even calls to __builtin_stpcpy to strcpy.  That
would make sense to me because of what we said about defining
one's own non-standard stpcpy.  But GCC doesn't do that for
calls to other such ("semi-standard") built-in functions.  For
example, for a call to __builtin_strdup GCC emits a call to
strdup regardless of the language conformance mode.  Ditto for
__builtin_index.  The inconsistency raises questions about what
is actually intended.

IMO, a reasonable question a GCC user might ask is: when I make
a call to a standard library function via __builtin_foo() in
a language conformance mode where foo is not a standard function,
can I expect GCC to transform it to some equivalent call to
a function that is defined by the standard (or expand it inline)?
I don't know what the answer should be, but whatever we might
want it to be, it seems worth documenting.

Martin


Re: strlen optimizations based on whether stpcpy is declared?

2017-10-02 Thread Joseph Myers
On Mon, 2 Oct 2017, Martin Sebor wrote:

> IMO, a reasonable question a GCC user might ask is: when I make
> a call to a standard library function via __builtin_foo() in
> a language conformance mode where foo is not a standard function,
> can I expect GCC to transform it to some equivalent call to
> a function that is defined by the standard (or expand it inline)?
> I don't know what the answer should be, but whatever we might
> want it to be, it seems worth documenting.

It may transform it, but is not required to do so; calling foo is also OK.

 has my 
analysis of what I think is correct in this area (which may not be the 
same as what is currently implemented).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: strlen optimizations based on whether stpcpy is declared?

2017-10-02 Thread Jakub Jelinek
On Mon, Oct 02, 2017 at 11:45:24AM -0600, Martin Sebor wrote:
> What's more, in strict mode GCC transforms stpcpy calls to strcpy.

Only if the result is not needed or if the length of the source string
is already known.  And we do that transformation regardless of strict
mode.  If the result is needed, I don't believe
we ever do that, it would be a clear opposite of optimization,
replace one call that does both the copying and computing the length
(well, end address, but that is simple pointer arithmetic away from that)
to computing the length one way and copying another way.
This is different for mempcpy where the result can be computed quite cheaply
and thus what kind of builtin is used doesn't matter that much, only that
we shouldn't introduce a less standard call when more standard one was used
in the source.
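
To make the two cheap cases concrete, here is a small hand-written sketch
using only standard C.  It is not what the compiler emits, and both helper
names are invented for the example: the first shows why stpcpy is unnecessary
once the source length is known, the second shows that the value mempcpy
would return is just the destination plus the byte count.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* End pointer of a string copy when the source length is already known:
   no stpcpy is needed, the end is simply D + LEN.  (Hypothetical helper.)  */
static char *
copy_known_length (char *d, const char *s, size_t len)
{
  assert (strlen (s) == len);   /* precondition of this sketch */
  strcpy (d, s);                /* the more standard call */
  return d + len;               /* what stpcpy (d, s) would have returned */
}

/* The value mempcpy (d, s, n) would return is D + N by definition, so it
   can be derived from plain memcpy.  (Hypothetical helper.)  */
static void *
copy_bytes_return_end (void *d, const void *s, size_t n)
{
  return (char *) memcpy (d, s, n) + n;
}
```

In both helpers the extra result costs one pointer addition, which is why the
choice of builtin matters much less here than in the unknown-length strcpy
case.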

Jakub


Re: strlen optimizations based on whether stpcpy is declared?

2017-10-02 Thread Martin Sebor

On 10/02/2017 11:54 AM, Joseph Myers wrote:
> On Mon, 2 Oct 2017, Martin Sebor wrote:
>
> > IMO, a reasonable question a GCC user might ask is: when I make
> > a call to a standard library function via __builtin_foo() in
> > a language conformance mode where foo is not a standard function,
> > can I expect GCC to transform it to some equivalent call to
> > a function that is defined by the standard (or expand it inline)?
> > I don't know what the answer should be, but whatever we might
> > want it to be, it seems worth documenting.
>
> It may transform it, but is not required to do so; calling foo is also OK.
>
>  has my
> analysis of what I think is correct in this area (which may not be the
> same as what is currently implemented).


Thanks!  That's a perfect reference -- it even discusses
__builtin_foo! :)  The past discussion of the stpcpy specifics
is especially helpful(*).  I'll have to take some time to digest
it but from what I've read so far just mentioning what you said
above should be in line with everyone's view (IIUC).  If that's so
I'll propose an update to the Other Builtins section to mention
something along those lines.

Martin

[*] I would still be curious to know why stpcpy alone is treated
special and not other built-ins as well, and what happened to
Richard's patch referenced in that thread:
  https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00357.html
It doesn't look like it was ever committed (I don't see
the changes in c/c-decl.c/cp/decl.c).


Re: strlen optimizations based on whether stpcpy is declared?

2017-10-02 Thread Martin Sebor

On 10/02/2017 12:00 PM, Jakub Jelinek wrote:
> On Mon, Oct 02, 2017 at 11:45:24AM -0600, Martin Sebor wrote:
> > What's more, in strict mode GCC transforms stpcpy calls to strcpy.
>
> Only if the result is not needed or if the length of the source string
> is already known.  And we do that transformation regardless of strict
> mode.  If the result is needed, I don't believe
> we ever do that, it would be a clear opposite of optimization,
> replace one call that does both the copying and computing the length
> (well, end address, but that is simple pointer arithmetic away from that)
> to computing the length one way and copying another way.


I see.  Okay, that explains that.

What about the part about:

  for -std=c99 or -std=c11 one can use -D_GNU_SOURCE
  or -D_POSIX_C_SOURCE=200809 or -D_XOPEN_SOURCE=700 and various
  others to make stpcpy available.

It's not what happens.  But what does seem to do it is when, in
addition to defining one of the macros, stpcpy is also explicitly
declared in the program, like so:

#include <string.h>

extern char* stpcpy (char*, const char*);

void __attribute__ ((noclone)) f (char *d, const char *s)
{
  strcpy (d, s);

  if (__builtin_strlen (d) != __builtin_strlen (s))
 __builtin_abort ();
}

Then the strcpy is transformed into stpcpy.  That seems odd and
like a bug, do you agree?


> This is different for mempcpy where the result can be computed quite cheaply
> and thus what kind of builtin is used doesn't matter that much, only that
> we shouldn't introduce a less standard call when more standard one was used
> in the source.


This makes sense as a rule of thumb.  The strcpy to stpcpy
transformation is less intuitive and, if it has consensus,
I think it would be a good example of the acceptable tradeoffs
that GCC developers without the benefit of all the past
discussions and decisions would find helpful to be aware of.

Martin


pass manager question

2017-10-02 Thread Sandra Loosemore
Is there an idiom for target-specific back end code to ask the pass 
manager if a particular pass (e.g., "split1") has already run?


I have some nios2 addressing mode improvement patches in the works that 
depend on deferring splitting of some complex address forms until after 
cse and fwprop, instead of during expand.  Once "split1" has run, 
TARGET_LEGITIMATE_ADDRESS_P shouldn't consider those address forms valid 
any more.  For now I've solved this problem by adding a target-specific 
pass immediately after "split1" that does nothing but set a flag, but 
that seems kind of hacky.  If I can get at the information from the pass 
manager's public interface, that seems like a better solution, but I've 
gotten rather lost in that code.  :-(
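
For readers unfamiliar with the workaround, here is a toy model of it in
plain C.  Nothing in it is a GCC interface: the pass structure, the flag,
the pipeline table and the address check are all stand-ins invented for
this sketch.  It only illustrates a do-nothing pass placed right after
"split1" flipping a flag that the address check consults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a pass: a name plus a callback.  Not a GCC type.  */
struct toy_pass
{
  const char *name;
  void (*execute) (void);
};

/* Flag similar in spirit to reload_completed; the name is hypothetical.  */
static bool toy_split1_completed = false;

static void toy_noop (void) {}
static void toy_mark_split1_done (void) { toy_split1_completed = true; }

/* Toy pipeline: the marker pass sits immediately after "split1".  */
static const struct toy_pass toy_pipeline[] = {
  { "cse1", toy_noop },
  { "fwprop1", toy_noop },
  { "split1", toy_noop },
  { "mark_split1_done", toy_mark_split1_done },
};

/* Hypothetical address check: complex address forms are accepted only
   until the marker pass has run.  */
static bool
toy_legitimate_address_p (bool is_complex_form)
{
  return !is_complex_form || !toy_split1_completed;
}

/* Run passes in order, up to and including the one named LAST.  */
static void
toy_run_passes_through (const struct toy_pass *passes, size_t n,
                        const char *last)
{
  assert (passes != NULL && last != NULL);
  for (size_t i = 0; i < n; i++)
    {
      passes[i].execute ();
      if (strcmp (passes[i].name, last) == 0)
        break;
    }
}
```

Running the toy pipeline through "split1" leaves complex forms valid;
running it through "mark_split1_done" makes the check reject them while
simple forms stay valid.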


I suppose another alternative is adding a split1_completed variable akin 
to reload_completed, but I'm hesitant to touch target-independent code 
for things that aren't generally useful.


Any suggestions/recommendations?

-Sandra