Re: Missed optimization with const member

2017-07-05 Thread Oleg Endo
Hi,

On Wed, 2017-07-05 at 02:02 +0200, Geza Herman wrote:
> 
> Here's what happens: in callInitA(), an Object is put onto the stack (which
> has a const member variable, initialized to 0). Then somefunction is called
> (which is intentionally not defined). Then ~Object() is called, which
> has an "if", which has a not-immediately-obvious, but always false
> condition. Compiling with -O3, everything gets inlined.
> 
> My question is about the inlined ~Object(). As m_initType is always 0,
> why does GCC not optimize the destructor away? GCC inserts code that
> checks the value of m_initType.
> 
> Is it because such construct is rare in practice? Or is it hard to do an 
> optimization like that?

It's not safe to optimize it away because the compiler does not know
what "somefunction" does.  Theoretically, it could cast the "Object&"
to some subclass and overwrite the const variable.  "const" does not
mean that the memory location is read-only in some way.

For some more explanations about const, see e.g. here 
https://stackoverflow.com/questions/4486326/does-const-just-mean-read-only-or-something-more

You can try showing "somefunction" to the compiler (i.e. put it into
the same translation unit in your example, or build the whole thing
with LTO) and see what happens.

Cheers,
Oleg


whereis asm_nodes? how to migrate it for GCC v6.x?

2017-07-05 Thread Leslie Zhai

Hi GCC developers,

There was

extern GTY(()) struct asm_node *asm_nodes;

for GCC v4.x, but how do I migrate it to v6.x? There is no asm_nodes
deprecation entry in ChangeLog-201X or in the git log of cgraph.h... Please
give me some hint, thanks a lot!


--
Regards,
Leslie Zhai https://reviews.llvm.org/p/xiangzhai/





Re: whereis asm_nodes? how to migrate it for GCC v6.x?

2017-07-05 Thread Segher Boessenkool
Hi Leslie,

On Wed, Jul 05, 2017 at 05:36:15PM +0800, Leslie Zhai wrote:
> There was
> 
> extern GTY(()) struct asm_node *asm_nodes;
> 
> for GCC v4.x, but how do I migrate it to v6.x? There is no asm_nodes
> deprecation entry in ChangeLog-201X or in the git log of cgraph.h... Please
> give me some hint, thanks a lot!

symtab->first_asm_symbol ?


Segher


Re: Missed optimization with const member

2017-07-05 Thread Jonathan Wakely
On 5 July 2017 at 10:13, Oleg Endo wrote:
> Hi,
>
> On Wed, 2017-07-05 at 02:02 +0200, Geza Herman wrote:
>>
>> Here's what happens: in callInitA(), an Object is put onto the stack (which
>> has a const member variable, initialized to 0). Then somefunction is called
>> (which is intentionally not defined). Then ~Object() is called, which
>> has an "if", which has a not-immediately-obvious, but always false
>> condition. Compiling with -O3, everything gets inlined.
>>
>> My question is about the inlined ~Object(). As m_initType is always 0,
>> why does GCC not optimize the destructor away? GCC inserts code that
>> checks the value of m_initType.
>>
>> Is it because such construct is rare in practice? Or is it hard to do an
>> optimization like that?
>
> It's not safe to optimize it away because the compiler does not know
> what "somefunction" does.  Theoretically, it could cast the "Object&"
> to some subclass and overwrite the const variable.  "const" does not
> mean that the memory location is read-only in some way.

No, that would be undefined behaviour. The data member is defined as
const, so it's not possible to write to that member without undefined
behaviour. A variable defined with a const type is not the same as a
variable accessed through a pointer/reference to a const type.

Furthermore, casting the Object to a derived class would also be
undefined, because the dynamic type is Object, not some derived type.

I think the reason it's not optimized away is for this case:

void somefunction(const Object& object)
{
  void* p = const_cast<Object*>(&object);
  object.~Object();
  new(p) Object();
}

This means that after calling somefunction there could be a different
object at the same location (with a possibly different value for that
member).

However, for this specific case that's also not possible, because
there are no constructors that could initialize the member to anything
except zero. But that requires more than just control-flow analysis,
it also requires analysing all the available constructors to check
there isn't one that does:

Object::Object(int) : m_initType(1) { }
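
A minimal sketch of the replacement scenario described above, assuming a hypothetical Object with two constructors (the original test program is not shown in this thread, so the class layout and the names replaceWithOther and valueAfterReplacement are illustrative only). The object lives in raw storage and is read back through the pointer returned by placement new, which sidesteps the question of whether the old name may still be used:

```cpp
#include <new>

// Hypothetical stand-in for the class under discussion.
struct Object {
    const int m_initType;
    Object() : m_initType(0) {}
    explicit Object(int) : m_initType(1) {}
    ~Object() {}
};

// Ends the lifetime of the object it is given and creates a new one in
// the same storage, using the constructor that sets m_initType to 1.
Object* replaceWithOther(const Object& object) {
    void* p = const_cast<Object*>(&object);
    object.~Object();
    return new (p) Object(42);
}

// Returns the member value seen after the replacement.
int valueAfterReplacement() {
    void* raw = ::operator new(sizeof(Object));
    Object* original = new (raw) Object();          // m_initType == 0
    Object* current = replaceWithOther(*original);
    int seen = current->m_initType;                 // read via the new pointer
    current->~Object();
    ::operator delete(raw);
    return seen;
}
```

With a second constructor like this in play, the value observed after the call is no longer the one the caller initialized, which is why the set of available constructors matters to the analysis.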


Re: Missed optimization with const member

2017-07-05 Thread Geza Herman

On 07/05/2017 01:14 PM, Jonathan Wakely wrote:
> I think the reason it's not optimized away is for this case:
>
> void somefunction(const Object& object)
> {
>    void* p = const_cast<Object*>(&object);
>    object.~Object();
>    new(p) Object();
> }
>
> This means that after calling somefunction there could be a different
> object at the same location (with a possibly different value for that
> member).

https://stackoverflow.com/questions/39382501/what-is-the-purpose-of-stdlaunder

I don't know whether the answer at Stack Overflow is right, but: the
compiler can assume that m_initType doesn't change if I don't launder
it. (In this case, though, I don't know how I would tell the compiler
that it must call the destructor "laundered".)

> However, for this specific case that's also not possible, because
> there are no constructors that could initialize the member to anything
> except zero. But that requires more than just control-flow analysis,
> it also requires analysing all the available constructors to check
> there isn't one that does:
>
> Object::Object(int) : m_initType(1) { }

My real code has a constructor like that. But in this case, the compiler
knows the value Object was initialized with, and could optimize the access
to m_initType in ~Object() too.


Re: Missed optimization with const member

2017-07-05 Thread Oleg Endo
On Wed, 2017-07-05 at 12:14 +0100, Jonathan Wakely wrote:
> 
> No, that would be undefined behaviour. The data member is defined as
> const, so it's not possible to write to that member without undefined
> behaviour. A variable defined with a const type is not the same as a
> variable accessed through a pointer/reference to a const type.
> 
> Furthermore, casting the Object to a derived class would also be
> undefined, because the dynamic type is Object, not some derived type.

Ugh, you're right.  Sorry for the misleading examples.

> 
> I think the reason it's not optimized away is for this case:
> 
> void somefunction(const Object& object)
> {
>   void* p = const_cast<Object*>(&object);
>   object.~Object();
>   new(p) Object();
> }
> 
> This means that after calling somefunction there could be a different
> object at the same location (with a possibly different value for that
> member).

This is basically what I was trying to say.  The function call acts as
an optimization barrier in this case because it could do anything with
that memory address it gets passed.  If the example is changed to:

void somefunction(const Object& object);

void callInitA(Object& x) {
    Object o;
    somefunction(x);
}

we can observe that everything of "Object o" gets optimized away as
expected.  And for that, the member doesn't even need to be const.
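
The contrast can be sketched with a hypothetical two-function example (the Object definition, the g_cleanups counter, and the names escaping and nonEscaping are illustrative and not taken from the original test program). In the first function the address of the local object is handed to a call the compiler may not be able to see through; in the second the local never escapes:

```cpp
// Counts how often the destructor's conditional branch actually ran.
static int g_cleanups = 0;

// Hypothetical stand-in for the class under discussion.
struct Object {
    const int m_initType;
    Object() : m_initType(0) {}
    ~Object() { if (m_initType != 0) ++g_cleanups; }
};

// Stand-in for a function that would normally live in another
// translation unit; it is defined here only so the sketch links.
void somefunction(const Object&) {}

// The address of o is passed to somefunction, so without seeing the
// callee the compiler has to assume the storage may have been touched.
void escaping() {
    Object o;
    somefunction(o);
}

// o is never passed anywhere, so nothing outside this function can
// reach it and the whole object can be dropped.
void nonEscaping(Object& x) {
    Object o;
    somefunction(x);
}
```

At run time neither function ever takes the branch; the difference is only in what the optimizer is allowed to conclude when the callee is not visible.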

Is there actually any particular optimization mechanism in GCC that
takes advantage of "const"?  I'm not aware of any such thing.

Cheers,
Oleg


Re: Missed optimization with const member

2017-07-05 Thread Martin Sebor

On 07/04/2017 06:02 PM, Geza Herman wrote:
> Hi,
>
> I've included a small program at the end of my email.
>
> Here's what happens: in callInitA(), an Object is put onto the stack (which
> has a const member variable, initialized to 0). Then somefunction is called
> (which is intentionally not defined). Then ~Object() is called, which
> has an "if", which has a not-immediately-obvious, but always false
> condition. Compiling with -O3, everything gets inlined.
>
> My question is about the inlined ~Object(). As m_initType is always 0,
> why does GCC not optimize the destructor away? GCC inserts code that
> checks the value of m_initType.
>
> Is it because such construct is rare in practice? Or is it hard to do an
> optimization like that?
>
> I've checked other compilers (clang, icc, msvc), none of them optimized
> this.
>
> (I've tried gcc 5.1, 6.1, 7.1)


I agree this would be a useful optimization.  I recently raised
bug 80794 with this same observation/request.  There are some
concerns with it (Jonathan gave a partial example(*)) but I don't
think they necessarily rule it out for the reasons discussed in
the bug (my comments 2 and 5).  It would be helpful if you added
your observations/interest there.

In general, GCC doesn't implement these kinds of optimizations
(based on the const qualifier) because its optimizer doesn't have
a way to distinguish between a const reference (or pointer) to
a non-const object that can be safely modified by casting
the constness away and a reference to a const-qualified object
that cannot.  With this limitation overcome it should be possible
to implement this and other interesting optimizations that exploit
the const qualifier (another opportunity is discussed in bug 81009).
To avoid breaking existing (though I suspect rare) programs that
play tricks like those in the examples, GCC might need to expose
its internal annotation to users so they can decorate functions and
either opt in to the optimization or opt out of it.  The annotation
could take the form of a new attribute similar to that already
provided by some compilers.  For instance, some make it possible to
annotate function arguments as "in" or "out" in addition to const.
(Another possibility is to make use of restrict in C++.)

Martin

[*] While the example (copied below) is valid, accessing the object
after somefunction() has returned via a reference or pointer to it
is not.

  void somefunction(const Object& object)
  {
    void* p = const_cast<Object*>(&object);
    object.~Object();
    new(p) Object();
  }

In simple cases like the one above, decorating the reference with
the __restrict keyword would invalidate the code and make the
optimization safe.

  void somefunction(const Object& __restrict object);


RFC: Add ___tls_get_addr

2017-07-05 Thread H.J. Lu
On x86-64, __tls_get_addr has to realign the stack so that binaries compiled by
GCCs older than GCC 4.9.4:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066

continue to work even if vector instructions are used by functions called
from __tls_get_addr, which assume 16-byte stack alignment as specified
by the x86-64 psABI.

We are considering adding an alternative interface, ___tls_get_addr, to
glibc, which doesn't realign the stack.  Compilers that properly align the
stack for TLS will generate calls to ___tls_get_addr, instead of
__tls_get_addr, if ___tls_get_addr is available.

Any comments?


-- 
H.J.


Re: RFC: Add ___tls_get_addr

2017-07-05 Thread Carlos O'Donell
On 07/05/2017 11:38 AM, H.J. Lu wrote:
> On x86-64, __tls_get_addr has to realign the stack so that binaries compiled by
> GCCs older than GCC 4.9.4:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> 
> continue to work even if vector instructions are used by functions called
> from __tls_get_addr, which assume 16-byte stack alignment as specified
> by the x86-64 psABI.
> 
> We are considering adding an alternative interface, ___tls_get_addr, to
> glibc, which doesn't realign the stack.  Compilers that properly align the
> stack for TLS will generate calls to ___tls_get_addr, instead of
> __tls_get_addr, if ___tls_get_addr is available.
> 
> Any comments?

Overall I agree with the idea.

However, the triple underscore will eventually lead to human error. Can we
avoid that by changing the name in a meaningful way? I don't really care
what you choose.

-- 
Cheers,
Carlos.


Re: RFC: Add ___tls_get_addr

2017-07-05 Thread Szabolcs Nagy
On 05/07/17 16:38, H.J. Lu wrote:
> On x86-64, __tls_get_addr has to realign the stack so that binaries compiled by
> GCCs older than GCC 4.9.4:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> 
> continue to work even if vector instructions are used by functions called
> from __tls_get_addr, which assume 16-byte stack alignment as specified
> by the x86-64 psABI.
> 
> We are considering adding an alternative interface, ___tls_get_addr, to
> glibc, which doesn't realign the stack.  Compilers that properly align the
> stack for TLS will generate calls to ___tls_get_addr, instead of
> __tls_get_addr, if ___tls_get_addr is available.
> 
> Any comments?
> 
> 

What happens when a new compiler generating the new symbol
is used with an old glibc?




Re: RFC: Add ___tls_get_addr

2017-07-05 Thread Rich Felker
On Wed, Jul 05, 2017 at 08:38:50AM -0700, H.J. Lu wrote:
> On x86-64, __tls_get_addr has to realign the stack so that binaries compiled by
> GCCs older than GCC 4.9.4:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> 
> continue to work even if vector instructions are used by functions called
> from __tls_get_addr, which assume 16-byte stack alignment as specified
> by the x86-64 psABI.
> 
> We are considering adding an alternative interface, ___tls_get_addr, to
> glibc, which doesn't realign the stack.  Compilers that properly align the
> stack for TLS will generate calls to ___tls_get_addr, instead of
> __tls_get_addr, if ___tls_get_addr is available.
> 
> Any comments?

I think it's unnecessary. The fast path of __tls_get_addr is trivial
to write in asm, where alignment doesn't matter, and then the cost of
alignment only enters in the slow path anyway. Implementations like
glibc and musl that need to be compatible with old binaries can easily
do this, and ones which know they'll only be running binaries built
without the gcc bug don't need to care.

(FWIW, it can probably be done without asm too, just using an
intermediate function with the right attribute to force realignment,
if you have an attribute to suppress use of vector instructions in the
top-level fast-path __tls_get_addr C.)
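
The split described above can be modelled roughly as follows. This is a simplified sketch and not glibc's or musl's actual code: TlsIndex, ThreadDtv, slow_path, tls_get_addr_model and the REALIGN_STACK macro are hypothetical names, and the per-thread table is reduced to a vector of byte blocks. The point is only that the fast path is a bounds check, a load and an add, while allocation lives in a separate slow-path function, which on x86 with GCC could carry the force_align_arg_pointer attribute:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical argument: which module's TLS block, and the offset in it.
struct TlsIndex {
    std::size_t module;
    std::size_t offset;
};

// Hypothetical, much simplified per-thread table of TLS blocks.
struct ThreadDtv {
    std::vector<std::vector<char>> blocks;  // empty entry: not allocated yet
    std::size_t block_size;                 // size used for every block
    int slow_calls;                         // how often the slow path ran
};

// Ask the compiler to realign the stack on entry where the attribute
// is known to exist; elsewhere the macro expands to nothing.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

// Slow path: grows the table and allocates the block on first use.
REALIGN_STACK char* slow_path(ThreadDtv& dtv, const TlsIndex& ti) {
    ++dtv.slow_calls;
    if (ti.module >= dtv.blocks.size())
        dtv.blocks.resize(ti.module + 1);
    if (dtv.blocks[ti.module].empty())
        dtv.blocks[ti.module].assign(dtv.block_size, 0);
    return dtv.blocks[ti.module].data() + ti.offset;
}

// Fast path: if the block already exists, return base + offset.
char* tls_get_addr_model(ThreadDtv& dtv, const TlsIndex& ti) {
    if (ti.module < dtv.blocks.size() && !dtv.blocks[ti.module].empty())
        return dtv.blocks[ti.module].data() + ti.offset;
    return slow_path(dtv, ti);
}
```

In this model only the first access to a module's block pays for the slow path; every later lookup stays on the fast path.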

Note that if you make the change and have gcc generate calls to the
new ___tls_get_addr symbol, it's going to be problematic for people
trying to link to older glibc versions that don't have it.

Rich


Re: RFC: Add ___tls_get_addr

2017-07-05 Thread Florian Weimer
On 07/05/2017 05:38 PM, H.J. Lu wrote:
> We are considering adding an alternative interface, ___tls_get_addr, to
> glibc, which doesn't realign the stack.  Compilers that properly align the
> stack for TLS will generate calls to ___tls_get_addr, instead of
> __tls_get_addr, if ___tls_get_addr is available.

Why can't we add a new symbol version instead?  Those who recompile or
relink their applications would have to use fixed compilers.

Thanks,
Florian


Re: RFC: Add ___tls_get_addr

2017-07-05 Thread H.J. Lu
On Wed, Jul 5, 2017 at 8:53 AM, Szabolcs Nagy wrote:
> On 05/07/17 16:38, H.J. Lu wrote:
>> On x86-64, __tls_get_addr has to realign the stack so that binaries compiled by
>> GCCs older than GCC 4.9.4:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
>>
>> continue to work even if vector instructions are used by functions called
>> from __tls_get_addr, which assume 16-byte stack alignment as specified
>> by the x86-64 psABI.
>>
>> We are considering adding an alternative interface, ___tls_get_addr, to
>> glibc, which doesn't realign the stack.  Compilers that properly align the
>> stack for TLS will generate calls to ___tls_get_addr, instead of
>> __tls_get_addr, if ___tls_get_addr is available.
>>
>> Any comments?
>>
>>
>
> What happens when a new compiler generating the new symbol
> is used with an old glibc?
>

The compiler shouldn't do that.

-- 
H.J.


Re: RFC: Add ___tls_get_addr

2017-07-05 Thread H.J. Lu
On Wed, Jul 5, 2017 at 8:51 AM, Carlos O'Donell wrote:
> On 07/05/2017 11:38 AM, H.J. Lu wrote:
>> On x86-64, __tls_get_addr has to realign the stack so that binaries compiled by
>> GCCs older than GCC 4.9.4:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
>>
>> continue to work even if vector instructions are used by functions called
>> from __tls_get_addr, which assume 16-byte stack alignment as specified
>> by the x86-64 psABI.
>>
>> We are considering adding an alternative interface, ___tls_get_addr, to
>> glibc, which doesn't realign the stack.  Compilers that properly align the
>> stack for TLS will generate calls to ___tls_get_addr, instead of
>> __tls_get_addr, if ___tls_get_addr is available.
>>
>> Any comments?
>
> Overall I agree with the idea.
>
> However, the triple underscore will eventually lead to human error. Can we
> avoid that by changing the name in a meaningful way? I don't really care
> what you choose.
>

___tls_get_addr is generated by compilers, not by programmers.
The linker needs to check it.  The shorter the function name, the better.

-- 
H.J.


Re: RFC: Add ___tls_get_addr

2017-07-05 Thread H.J. Lu
On Wed, Jul 5, 2017 at 9:27 AM, Florian Weimer wrote:
> On 07/05/2017 06:21 PM, H.J. Lu wrote:
>> On Wed, Jul 5, 2017 at 9:18 AM, Florian Weimer wrote:
>>> On 07/05/2017 06:17 PM, H.J. Lu wrote:
>>>> On Wed, Jul 5, 2017 at 9:09 AM, Florian Weimer wrote:
>>>>> On 07/05/2017 05:38 PM, H.J. Lu wrote:
>>>>>> We are considering adding an alternative interface, ___tls_get_addr, to
>>>>>> glibc, which doesn't realign the stack.  Compilers that properly align
>>>>>> the stack for TLS will generate calls to ___tls_get_addr, instead of
>>>>>> __tls_get_addr, if ___tls_get_addr is available.
>>>>>
>>>>> Why can't we add a new symbol version instead?  Those who recompile or
>>>>> relink their applications would have to use fixed compilers.
>>>>>
>>>>
>>>> When GCC 4.8 is used to compile packages under the new glibc, the new
>>>> version of ___tls_get_addr will be used.  But the alignment will be wrong.
>>>
>>> GCC 4.8 is out of support.  Do not use it, or use it along with a
>>> matching glibc.
>>
>> We do support using old GCCs with the new glibc; we just don't support
>> building glibc with them.
>
> There is no promise that we will work around compiler bugs in old GCCs,
> though.

We are supporting older GCCs as much as we can.


-- 
H.J.


Re: Missed optimization with const member

2017-07-05 Thread Geza Herman

On 07/05/2017 01:26 PM, Geza Herman wrote:
> On 07/05/2017 01:14 PM, Jonathan Wakely wrote:
>> I think the reason it's not optimized away is for this case:
>>
>> void somefunction(const Object& object)
>> {
>>    void* p = const_cast<Object*>(&object);
>>    object.~Object();
>>    new(p) Object();
>> }
>>
>> This means that after calling somefunction there could be a different
>> object at the same location (with a possibly different value for that
>> member).
>
> https://stackoverflow.com/questions/39382501/what-is-the-purpose-of-stdlaunder
>
> I don't know whether the answer at Stack Overflow is right, but: the
> compiler can assume that m_initType doesn't change if I don't launder
> it. (In this case, though, I don't know how I would tell the compiler
> that it must call the destructor "laundered".)

Have a look at this doc:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0532r0.pdf

It explicitly says that after "new(p) Object();", it is UB to access
object (even the mutable members!) via p. If you want to access it, you
have to use the pointer which "new" returned. Or std::launder must be
used (C++17 proposal).
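
A minimal sketch of the two access routes mentioned above, assuming a hypothetical struct X with a const member (it is not the Object from the original program, and replacedValueIsVisible is an illustrative name). The pointer returned by placement new can always be used; the old pointer is passed through std::launder when the code is compiled as C++17 or later:

```cpp
#include <new>

// Hypothetical type with a const member.
struct X {
    const int n;
    explicit X(int v) : n(v) {}
};

// Replaces an X living in raw storage and reports whether both access
// routes see the value of the replacement object.
bool replacedValueIsVisible() {
    void* raw = ::operator new(sizeof(X));
    X* p = new (raw) X(1);
    p->~X();
    X* q = new (raw) X(2);      // pointer returned by placement new

    int viaNew = q->n;
#if __cplusplus >= 201703L
    // p holds the same address; std::launder yields a pointer that is
    // allowed to refer to the new object.
    int viaLaunder = std::launder(p)->n;
#else
    int viaLaunder = viaNew;    // std::launder is unavailable before C++17
#endif

    q->~X();
    ::operator delete(raw);
    return viaNew == 2 && viaLaunder == 2;
}
```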




Re: Missed optimization with const member

2017-07-05 Thread Yuri Gribov
On Wed, Jul 5, 2017 at 12:14 PM, Jonathan Wakely wrote:
> On 5 July 2017 at 10:13, Oleg Endo wrote:
>> Hi,
>>
>> On Wed, 2017-07-05 at 02:02 +0200, Geza Herman wrote:
>>>
>>> Here's what happens: in callInitA(), an Object is put onto the stack (which
>>> has a const member variable, initialized to 0). Then somefunction is called
>>> (which is intentionally not defined). Then ~Object() is called, which
>>> has an "if", which has a not-immediately-obvious, but always false
>>> condition. Compiling with -O3, everything gets inlined.
>>>
>>> My question is about the inlined ~Object(). As m_initType is always 0,
>>> why does GCC not optimize the destructor away? GCC inserts code that
>>> checks the value of m_initType.
>>>
>>> Is it because such construct is rare in practice? Or is it hard to do an
>>> optimization like that?
>>
>> It's not safe to optimize it away because the compiler does not know
>> what "somefunction" does.  Theoretically, it could cast the "Object&"
>> to some subclass and overwrite the const variable.  "const" does not
>> mean that the memory location is read-only in some way.
>
> No, that would be undefined behaviour. The data member is defined as
> const, so it's not possible to write to that member without undefined
> behaviour. A variable defined with a const type is not the same as a
> variable accessed through a pointer/reference to a const type.
>
> Furthermore, casting the Object to a derived class would also be
> undefined, because the dynamic type is Object, not some derived type.

There's a related PR about this
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67886). [basic.life]
allows devirtualizing the second call to foo in
  Base *p = new Derived;
  p->foo();
  p->foo();
to
  Base *p = new Derived;
  Derived::foo(p);
  Derived::foo(p);
because the first call is not allowed to change the dynamic type of p. Clang
optimizes this but GCC does not.
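
A self-contained version of the snippet, with hypothetical Base and Derived definitions filled in (the thread does not show them, and callTwice is an illustrative name). Whether a particular compiler devirtualizes the second call depends on the compiler and its flags; the sketch only shows the shape of the code in question:

```cpp
// Hypothetical class hierarchy for the two-call example.
struct Base {
    virtual ~Base() {}
    virtual int foo() = 0;
};

struct Derived : Base {
    int calls = 0;
    // Returns how many times foo has been called on this object.
    int foo() override { return ++calls; }
};

// Two virtual calls through the same pointer. The dynamic type is known
// to be Derived at the first call, and the first call cannot change it
// for later uses of p.
int callTwice() {
    Base* p = new Derived;
    p->foo();
    int second = p->foo();
    delete p;
    return second;
}
```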

> I think the reason it's not optimized away is for this case:
>
> void somefunction(const Object& object)
> {
>   void* p = const_cast<Object*>(&object);
>   object.~Object();
>   new(p) Object();
> }
>
> This means that after calling somefunction there could be a different
> object at the same location (with a possibly different value for that
> member).
>
> However, for this specific case that's also not possible, because
> there are no constructors that could initialize the member to anything
> except zero. But that requires more than just control-flow analysis,
> it also requires analysing all the available constructors to check
> there isn't one that does:
>
> Object::Object(int) : m_initType(1) { }