Re: Missed optimization with const member
Hi,

On Wed, 2017-07-05 at 02:02 +0200, Geza Herman wrote:
>
> Here's what happens: in callInitA(), an Object is put onto the stack (which
> has a const member variable, initialized to 0). Then somefunction is called
> (which is intentionally not defined). Then ~Object() is called, which
> has an "if", which has a not-immediately-obvious, but always false
> condition. Compiling with -O3, everything gets inlined.
>
> My question is about the inlined ~Object(). As m_initType is always 0,
> why doesn't GCC optimize the destructor away? GCC inserts code that
> checks the value of m_initType.
>
> Is it because such a construct is rare in practice? Or is it hard to do an
> optimization like that?

It's not safe to optimize it away because the compiler does not know
what "somefunction" does. Theoretically, it could cast the "Object&"
to some subclass and overwrite the const variable. "const" does not
mean that the memory location is read-only in some way.

For some more explanations about const, see e.g. here
https://stackoverflow.com/questions/4486326/does-const-just-mean-read-only-or-something-more

You can try showing "somefunction" to the compiler (i.e. put it into
the same translation unit in your example, or build the whole thing
with LTO) and see what happens.

Cheers,
Oleg
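The suggestion to show "somefunction" to the compiler can be tried with a sketch like the one below. The original test program is not included in this thread, so this is a hypothetical reconstruction: only the names Object, m_initType, somefunction and callInitA come from the messages, and the counter g_cleanups and the exact shape of the destructor are assumptions made for illustration.

```cpp
#include <cassert>

// Assumed helper: counts how often the "cleanup" branch of the destructor runs.
static int g_cleanups = 0;

struct Object {
    const int m_initType;              // const data member, always 0 in this sketch
    Object() : m_initType(0) {}
    ~Object() {
        if (m_initType != 0)           // always false for the only constructor
            ++g_cleanups;
    }
};

// Defined in the same translation unit, so the optimizer can see that the
// function neither destroys the object nor creates a new one in its storage.
static void somefunction(const Object& object) {
    (void)object;
}

int callInitA() {
    {
        Object o;
        somefunction(o);
    }                                  // ~Object() runs here
    return g_cleanups;
}
```

Whether the check in the destructor actually disappears depends on the compiler version and flags; comparing the assembly from `-O3 -S` with and without the function body visible is one way to find out.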
whereis asm_nodes? how to migrate it for GCC v6.x?
Hi GCC developers,

There was

  extern GTY(()) struct asm_node *asm_nodes;

for GCC v4.x, but how do I migrate it to v6.x? There is no deprecation entry for asm_nodes in ChangeLog-201X, nor in the git log of cgraph.h... Please give me a hint, thanks a lot!

--
Regards,
Leslie Zhai
https://reviews.llvm.org/p/xiangzhai/
Re: whereis asm_nodes? how to migrate it for GCC v6.x?
Hi Leslie, On Wed, Jul 05, 2017 at 05:36:15PM +0800, Leslie Zhai wrote: > There was > > extern GTY(()) struct asm_node *asm_nodes; > > for GCC v4.x, but how to migrate it for v6.x? there is no asm_nodes > deprecated log in ChangeLog-201X nor git log cgraph.h... please give me > some hint, thanks a lot! symtab->first_asm_symbol ? Segher
Re: Missed optimization with const member
On 5 July 2017 at 10:13, Oleg Endo wrote:
> Hi,
>
> On Wed, 2017-07-05 at 02:02 +0200, Geza Herman wrote:
>>
>> Here's what happens: in callInitA(), an Object is put onto the stack (which
>> has a const member variable, initialized to 0). Then somefunction is called
>> (which is intentionally not defined). Then ~Object() is called, which
>> has an "if", which has a not-immediately-obvious, but always false
>> condition. Compiling with -O3, everything gets inlined.
>>
>> My question is about the inlined ~Object(). As m_initType is always 0,
>> why doesn't GCC optimize the destructor away? GCC inserts code that
>> checks the value of m_initType.
>>
>> Is it because such a construct is rare in practice? Or is it hard to do an
>> optimization like that?
>
> It's not safe to optimize it away because the compiler does not know
> what "somefunction" does. Theoretically, it could cast the "Object&"
> to some subclass and overwrite the const variable. "const" does not
> mean that the memory location is read-only in some way.

No, that would be undefined behaviour. The data member is defined as
const, so it's not possible to write to that member without undefined
behaviour. A variable defined with a const type is not the same as a
variable accessed through a pointer/reference to a const type.

Furthermore, casting the Object to a derived class would also be
undefined, because the dynamic type is Object, not some derived type.

I think the reason it's not optimized away is for this case:

  void somefunction(const Object& object)
  {
    void* p = const_cast<Object*>(&object);
    object.~Object();
    new(p) Object();
  }

This means that after calling somefunction there could be a different
object at the same location (with a possibly different value for that
member).

However, for this specific case that's also not possible, because
there are no constructors that could initialize the member to anything
except zero. But that requires more than just control-flow analysis;
it also requires analysing all the available constructors to check
there isn't one that does:

  Object::Object(int) : m_initType(1) { }
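To make the destroy-and-recreate case concrete, here is a minimal sketch, assuming an Object with both the default constructor and the extra Object(int) constructor mentioned above. The helper recreate, the Values struct and the function observe are hypothetical names. Raw storage and the pointer returned by placement new are used so that every access in the sketch is well-defined.

```cpp
#include <cassert>
#include <new>

struct Object {
    const int m_initType;
    Object() : m_initType(0) {}
    explicit Object(int) : m_initType(1) {}   // the extra constructor from the message
    ~Object() {}
};

// Hypothetical variant of somefunction: ends the lifetime of the caller's
// object, creates a new one in the same storage, and returns a pointer to it.
Object* recreate(const Object& object) {
    void* p = const_cast<Object*>(&object);
    object.~Object();
    return new (p) Object(1);
}

struct Values { int before; int after; };

Values observe() {
    alignas(Object) unsigned char storage[sizeof(Object)];
    Object* first = new (storage) Object();   // m_initType == 0
    Values v;
    v.before = first->m_initType;
    Object* second = recreate(*first);        // same address, different object
    v.after = second->m_initType;             // read through the new pointer
    second->~Object();
    return v;
}
```

The const member never changes within one object's lifetime, yet the storage holds two different values over time, which is why the set of available constructors matters for the optimization.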
Re: Missed optimization with const member
On 07/05/2017 01:14 PM, Jonathan Wakely wrote:
> I think the reason it's not optimized away is for this case:
>
>   void somefunction(const Object& object)
>   {
>     void* p = const_cast<Object*>(&object);
>     object.~Object();
>     new(p) Object();
>   }
>
> This means that after calling somefunction there could be a different
> object at the same location (with a possibly different value for that
> member).

https://stackoverflow.com/questions/39382501/what-is-the-purpose-of-stdlaunder

I don't know whether the answer at stackoverflow is right, but: the compiler can assume that m_initType doesn't change if I don't launder it. (In this case, though, I don't know how I would tell the compiler that it must call the destructor "laundered".)

> However, for this specific case that's also not possible, because
> there are no constructors that could initialize the member to anything
> except zero. But that requires more than just control-flow analysis,
> it also requires analysing all the available constructors to check
> there isn't one that does:
>
>   Object::Object(int) : m_initType(1) { }

My real code has a constructor like that. But in this case, the compiler knows the value Object was initialized with, and could optimize the access to m_initType in ~Object() too.
Re: Missed optimization with const member
On Wed, 2017-07-05 at 12:14 +0100, Jonathan Wakely wrote:
>
> No, that would be undefined behaviour. The data member is defined as
> const, so it's not possible to write to that member without undefined
> behaviour. A variable defined with a const type is not the same as a
> variable accessed through a pointer/reference to a const type.
>
> Furthermore, casting the Object to a derived class would also be
> undefined, because the dynamic type is Object, not some derived type.

Ugh, you're right. Sorry for the misleading examples.

>
> I think the reason it's not optimized away is for this case:
>
>   void somefunction(const Object& object)
>   {
>     void* p = const_cast<Object*>(&object);
>     object.~Object();
>     new(p) Object();
>   }
>
> This means that after calling somefunction there could be a different
> object at the same location (with a possibly different value for that
> member).

This is basically what I was trying to say. The function call acts as an optimization barrier in this case because it could do anything with the memory address it gets passed. If the example is changed to:

  void somefunction(const Object& object);

  void callInitA(Object& x)
  {
    Object o;
    somefunction(x);
  }

we can observe that everything of "Object o" gets optimized away as expected. And for that, the member doesn't even need to be const.

Is there actually any particular optimization mechanism in GCC that takes advantage of "const"? I'm not aware of any such thing.

Cheers,
Oleg
Re: Missed optimization with const member
On 07/04/2017 06:02 PM, Geza Herman wrote:
> Hi,
>
> I've included a small program at the end of my email.
>
> Here's what happens: in callInitA(), an Object is put onto the stack (which
> has a const member variable, initialized to 0). Then somefunction is called
> (which is intentionally not defined). Then ~Object() is called, which
> has an "if", which has a not-immediately-obvious, but always false
> condition. Compiling with -O3, everything gets inlined.
>
> My question is about the inlined ~Object(). As m_initType is always 0,
> why doesn't GCC optimize the destructor away? GCC inserts code that
> checks the value of m_initType.
>
> Is it because such a construct is rare in practice? Or is it hard to do an
> optimization like that?
>
> I've checked other compilers (clang, icc, msvc), none of them optimized
> this.
>
> (I've tried gcc 5.1, 6.1, 7.1)

I agree this would be a useful optimization. I recently raised bug 80794 with this same observation/request. There are some concerns with it (Jonathan gave a partial example(*)) but I don't think they necessarily rule it out, for the reasons discussed in the bug (my comments 2 and 5). It would be helpful if you added your observations/interest there.

In general, GCC doesn't implement these kinds of optimizations (based on the const qualifier) because its optimizer doesn't have a way to distinguish between a const reference (or pointer) to a non-const object, which can be safely modified by casting the constness away, and a reference to a const-qualified object, which cannot. With this limitation overcome it should be possible to implement this and other interesting optimizations that exploit the const qualifier (another opportunity is discussed in bug 81009).

To avoid breaking existing (though I suspect rare) programs that play tricks like those in the examples, GCC might need to expose its internal annotation to users to decorate functions and either opt in to the optimization or opt out of it. The annotation could take the form of a new attribute similar to that already provided by some compilers. For instance, some make it possible to annotate function arguments as "in" or "out" in addition to const. (Another possibility is to make use of restrict in C++.)

Martin

[*] While the example (copied below) is valid, accessing the object after somefunction() has returned via a reference or pointer to it is not.

  void somefunction(const Object& object)
  {
    void* p = const_cast<Object*>(&object);
    object.~Object();
    new(p) Object();
  }

In simple cases like the one above, decorating the reference with the __restrict keyword would invalidate the code and make the optimization safe.

  void somefunction(const Object& __restrict object);
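The distinction between a const reference to a non-const object and a reference to a const-qualified object can be illustrated with a small sketch. The names bump and modifiableCase are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

// Casts away const and writes through the reference. This is well-defined
// only when the referenced object was not itself defined as const.
void bump(const int& ref) {
    const_cast<int&>(ref) += 1;
}

int modifiableCase() {
    int value = 41;      // a non-const object, seen by bump() through a const reference
    bump(value);         // allowed: the object itself is modifiable
    return value;
}

// By contrast, the following would be undefined behaviour, because the
// object is defined with a const type:
//
//   const int fixed = 41;
//   bump(fixed);
```

From inside bump() both cases look identical, which is the limitation described above: the callee's signature alone does not tell the optimizer which kind of object it was given.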
RFC: Add ___tls_get_addr
On x86-64, __tls_get_addr has to realign the stack so that binaries compiled by
GCCs older than GCC 4.9.4:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066

continue to work even if vector instructions are used by functions called
from __tls_get_addr, which assumes 16-byte stack alignment as specified
by the x86-64 psABI.

We are considering adding an alternative interface, ___tls_get_addr, to
glibc, which doesn't realign the stack. Compilers which properly align the stack
for TLS generate calls to ___tls_get_addr, instead of __tls_get_addr,
if ___tls_get_addr is available.

Any comments?

--
H.J.
Re: RFC: Add ___tls_get_addr
On 07/05/2017 11:38 AM, H.J. Lu wrote:
> On x86-64, __tls_get_addr has to realign the stack so that binaries compiled by
> GCCs older than GCC 4.9.4:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
>
> continue to work even if vector instructions are used by functions called
> from __tls_get_addr, which assumes 16-byte stack alignment as specified
> by the x86-64 psABI.
>
> We are considering adding an alternative interface, ___tls_get_addr, to
> glibc, which doesn't realign the stack. Compilers which properly align the stack
> for TLS generate calls to ___tls_get_addr, instead of __tls_get_addr,
> if ___tls_get_addr is available.
>
> Any comments?

Overall I agree with the idea.

However, the triple underscore will lead to eventual human error. Can we avoid that error by changing the name in a meaningful way? I don't care what you choose really.

--
Cheers,
Carlos.
Re: RFC: Add ___tls_get_addr
On 05/07/17 16:38, H.J. Lu wrote:
> On x86-64, __tls_get_addr has to realign the stack so that binaries compiled by
> GCCs older than GCC 4.9.4:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
>
> continue to work even if vector instructions are used by functions called
> from __tls_get_addr, which assumes 16-byte stack alignment as specified
> by the x86-64 psABI.
>
> We are considering adding an alternative interface, ___tls_get_addr, to
> glibc, which doesn't realign the stack. Compilers which properly align the stack
> for TLS generate calls to ___tls_get_addr, instead of __tls_get_addr,
> if ___tls_get_addr is available.
>
> Any comments?

What happens when a new compiler generating the new symbol is used with an old glibc?
Re: RFC: Add ___tls_get_addr
On Wed, Jul 05, 2017 at 08:38:50AM -0700, H.J. Lu wrote:
> On x86-64, __tls_get_addr has to realign the stack so that binaries compiled by
> GCCs older than GCC 4.9.4:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
>
> continue to work even if vector instructions are used by functions called
> from __tls_get_addr, which assumes 16-byte stack alignment as specified
> by the x86-64 psABI.
>
> We are considering adding an alternative interface, ___tls_get_addr, to
> glibc, which doesn't realign the stack. Compilers which properly align the stack
> for TLS generate calls to ___tls_get_addr, instead of __tls_get_addr,
> if ___tls_get_addr is available.
>
> Any comments?

I think it's unnecessary. The fast path of __tls_get_addr is trivial to write in asm, where alignment doesn't matter, and then the cost of alignment only enters in the slow path anyway. Implementations like glibc and musl that need to be compatible with old binaries can easily do this, and ones which know they'll only be running binaries built without the gcc bug don't need to care.

(FWIW, it can probably be done without asm too, just using an intermediate function with the right attribute to force realignment, if you have an attribute to suppress use of vector instructions in the top-level fast-path __tls_get_addr C.)

Note that if you make the change and have gcc generate calls to the new ___tls_get_addr symbol, it's going to be problematic for people trying to link to older glibc versions that don't have it.

Rich
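The "intermediate function with the right attribute" idea can be sketched roughly as follows. This is not glibc or musl code: lookup, slow_lookup and the globals are hypothetical stand-ins for the fast and slow paths, and the sketch assumes GCC's x86 function attribute force_align_arg_pointer, which is compiled out on other targets. Suppressing vector instructions in the fast path is not shown.

```cpp
#include <cassert>

// Assumption: force_align_arg_pointer is only meaningful on x86 with a
// GNU-compatible compiler, so it is disabled elsewhere.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

static int g_slow_calls = 0;      // counts slow-path entries (illustration only)
static int g_block = 0;           // stand-in for an allocated TLS block
static void* g_cached = nullptr;  // stand-in for an already-resolved entry

// Slow path: may call arbitrary code that assumes 16-byte stack alignment,
// so the stack is realigned on entry. Kept out of line so that the
// realigning prologue is not merged into the fast path.
static REALIGN_STACK NOINLINE void* slow_lookup() {
    ++g_slow_calls;
    g_cached = &g_block;
    return g_cached;
}

// Fast path: a simple check that pays no realignment cost.
void* lookup() {
    if (g_cached != nullptr)
        return g_cached;
    return slow_lookup();
}
```

With this split, the realignment cost is paid only on the first lookup; later calls return from the fast path.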
Re: RFC: Add ___tls_get_addr
On 07/05/2017 05:38 PM, H.J. Lu wrote:
> We are considering adding an alternative interface, ___tls_get_addr, to
> glibc, which doesn't realign the stack. Compilers which properly align the stack
> for TLS generate calls to ___tls_get_addr, instead of __tls_get_addr,
> if ___tls_get_addr is available.

Why can't we add a new symbol version instead? Those who recompile or relink their applications would have to use fixed compilers.

Thanks,
Florian
Re: RFC: Add ___tls_get_addr
On Wed, Jul 5, 2017 at 8:53 AM, Szabolcs Nagy wrote:
> On 05/07/17 16:38, H.J. Lu wrote:
>> On x86-64, __tls_get_addr has to realign the stack so that binaries compiled by
>> GCCs older than GCC 4.9.4:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
>>
>> continue to work even if vector instructions are used by functions called
>> from __tls_get_addr, which assumes 16-byte stack alignment as specified
>> by the x86-64 psABI.
>>
>> We are considering adding an alternative interface, ___tls_get_addr, to
>> glibc, which doesn't realign the stack. Compilers which properly align the stack
>> for TLS generate calls to ___tls_get_addr, instead of __tls_get_addr,
>> if ___tls_get_addr is available.
>>
>> Any comments?
>
> What happens when a new compiler generating the new symbol
> is used with an old glibc?

The compiler shouldn't do that.

--
H.J.
Re: RFC: Add ___tls_get_addr
On Wed, Jul 5, 2017 at 8:51 AM, Carlos O'Donell wrote:
> On 07/05/2017 11:38 AM, H.J. Lu wrote:
>> On x86-64, __tls_get_addr has to realign the stack so that binaries compiled by
>> GCCs older than GCC 4.9.4:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
>>
>> continue to work even if vector instructions are used by functions called
>> from __tls_get_addr, which assumes 16-byte stack alignment as specified
>> by the x86-64 psABI.
>>
>> We are considering adding an alternative interface, ___tls_get_addr, to
>> glibc, which doesn't realign the stack. Compilers which properly align the stack
>> for TLS generate calls to ___tls_get_addr, instead of __tls_get_addr,
>> if ___tls_get_addr is available.
>>
>> Any comments?
>
> Overall I agree with the idea.
>
> However, the triple underscore will lead to eventual human error. Can we
> avoid that error by changing the name in a meaningful way? I don't care
> what you choose really.

___tls_get_addr is generated by compilers, not by programmers. The linker needs to check it. The shorter the function name, the better.

--
H.J.
Re: RFC: Add ___tls_get_addr
On Wed, Jul 5, 2017 at 9:27 AM, Florian Weimer wrote:
> On 07/05/2017 06:21 PM, H.J. Lu wrote:
>> On Wed, Jul 5, 2017 at 9:18 AM, Florian Weimer wrote:
>>> On 07/05/2017 06:17 PM, H.J. Lu wrote:
>>>> On Wed, Jul 5, 2017 at 9:09 AM, Florian Weimer wrote:
>>>>> On 07/05/2017 05:38 PM, H.J. Lu wrote:
>>>>>> We are considering adding an alternative interface, ___tls_get_addr, to
>>>>>> glibc, which doesn't realign the stack. Compilers which properly align
>>>>>> the stack for TLS generate calls to ___tls_get_addr, instead of
>>>>>> __tls_get_addr, if ___tls_get_addr is available.
>>>>>
>>>>> Why can't we add a new symbol version instead? Those who recompile or
>>>>> relink their applications would have to use fixed compilers.
>>>>
>>>> When GCC 4.8 is used to compile packages under the new glibc, the new
>>>> version of ___tls_get_addr will be used. But the alignment will be wrong.
>>>
>>> GCC 4.8 is out of support. Do not use it, or use it along with a
>>> matching glibc.
>>
>> We do support using old GCCs with the new glibc; we just don't support
>> using them to build glibc.
>
> There is no promise that we will work around compiler bugs in old GCCs,
> though.

We are supporting older GCCs as much as we can.

--
H.J.
Re: Missed optimization with const member
On 07/05/2017 01:26 PM, Geza Herman wrote:
> On 07/05/2017 01:14 PM, Jonathan Wakely wrote:
>> I think the reason it's not optimized away is for this case:
>>
>>   void somefunction(const Object& object)
>>   {
>>     void* p = const_cast<Object*>(&object);
>>     object.~Object();
>>     new(p) Object();
>>   }
>>
>> This means that after calling somefunction there could be a different
>> object at the same location (with a possibly different value for that
>> member).
>
> https://stackoverflow.com/questions/39382501/what-is-the-purpose-of-stdlaunder
>
> I don't know whether the answer at stackoverflow is right, but: the
> compiler can assume that m_initType doesn't change if I don't launder
> it. (In this case, though, I don't know how I would tell the compiler
> that it must call the destructor "laundered".)

Have a look at this doc: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0532r0.pdf

It explicitly says that after "new(p) Object();", it is UB to access the object (even the mutable members!) via p. If you want to access it, you have to use the pointer which "new" returned. Or, std::launder must be used (C++17 proposal).
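The two access routes described here, the pointer returned by placement new and std::launder, might look like this in a small sketch. The function readAfterReplace is a hypothetical name, and the std::launder branch is only compiled when the compiler is in C++17 mode or later; otherwise the sketch falls back to the pointer returned by new.

```cpp
#include <cassert>
#include <new>

struct Object {
    const int m_initType;
    explicit Object(int t) : m_initType(t) {}
};

int readAfterReplace() {
    alignas(Object) unsigned char storage[sizeof(Object)];

    Object* old_ptr = new (storage) Object(0);   // first object, m_initType == 0
    old_ptr->~Object();                          // end its lifetime

    Object* fresh = new (storage) Object(1);     // second object at the same address

#if __cplusplus >= 201703L
    // std::launder yields a pointer to the object that now lives at the
    // address old_ptr represents, so the read below sees the new object.
    int value = std::launder(old_ptr)->m_initType;
#else
    // Before C++17, use the pointer that placement new returned.
    int value = fresh->m_initType;
#endif

    fresh->~Object();
    return value;
}
```

Reading through old_ptr directly, without std::launder, is the access the paper describes as undefined for a type with a const member.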
Re: Missed optimization with const member
On Wed, Jul 5, 2017 at 12:14 PM, Jonathan Wakely wrote:
> On 5 July 2017 at 10:13, Oleg Endo wrote:
>> Hi,
>>
>> On Wed, 2017-07-05 at 02:02 +0200, Geza Herman wrote:
>>>
>>> Here's what happens: in callInitA(), an Object is put onto the stack (which
>>> has a const member variable, initialized to 0). Then somefunction is called
>>> (which is intentionally not defined). Then ~Object() is called, which
>>> has an "if", which has a not-immediately-obvious, but always false
>>> condition. Compiling with -O3, everything gets inlined.
>>>
>>> My question is about the inlined ~Object(). As m_initType is always 0,
>>> why doesn't GCC optimize the destructor away? GCC inserts code that
>>> checks the value of m_initType.
>>>
>>> Is it because such a construct is rare in practice? Or is it hard to do an
>>> optimization like that?
>>
>> It's not safe to optimize it away because the compiler does not know
>> what "somefunction" does. Theoretically, it could cast the "Object&"
>> to some subclass and overwrite the const variable. "const" does not
>> mean that the memory location is read-only in some way.
>
> No, that would be undefined behaviour. The data member is defined as
> const, so it's not possible to write to that member without undefined
> behaviour. A variable defined with a const type is not the same as a
> variable accessed through a pointer/reference to a const type.
>
> Furthermore, casting the Object to a derived class would also be
> undefined, because the dynamic type is Object, not some derived type.

There's a related PR about this (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67886). basic.lifetime allows devirtualizing the second call to foo in

  Base *p = new Derived;
  p->foo();
  p->foo();

to

  Base *p = new Derived;
  Derived::foo(p);
  Derived::foo(p);

because the first call is not allowed to change the dynamic type of p. Clang optimizes this but GCC does not.

> I think the reason it's not optimized away is for this case:
>
>   void somefunction(const Object& object)
>   {
>     void* p = const_cast<Object*>(&object);
>     object.~Object();
>     new(p) Object();
>   }
>
> This means that after calling somefunction there could be a different
> object at the same location (with a possibly different value for that
> member).
>
> However, for this specific case that's also not possible, because
> there are no constructors that could initialize the member to anything
> except zero. But that requires more than just control-flow analysis,
> it also requires analysing all the available constructors to check
> there isn't one that does:
>
>   Object::Object(int) : m_initType(1) { }
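The devirtualization case from this message can be written out as a compilable sketch. The class bodies and the function callTwice are assumptions added for illustration; only the Base/Derived/foo pattern comes from the message.

```cpp
#include <cassert>

struct Base {
    virtual ~Base() {}
    virtual int foo() { return 1; }
};

struct Derived : Base {
    int foo() override { return 2; }
};

int callTwice() {
    Base* p = new Derived;
    int a = p->foo();   // dynamic type is known to be Derived at this point
    int b = p->foo();   // candidate for devirtualization: the first call
                        // may not change the dynamic type of *p
    delete p;
    return a + b;
}
```

Whether a given compiler turns the second call into a direct call to Derived::foo depends on the compiler and its version; the generated assembly is the place to check.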