https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91771
Bug ID: 91771
Summary: Optimization fails to inline final override.
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: carlo at gcc dot gnu.org
Target Milestone: ---
Compiling the following code snippet:
struct Base
{
  int foo(int n) { return do_foo(n); }
  virtual int do_foo(int n) = 0;
};

struct Derived : public Base
{
  int do_foo(int n) override final { return n + 2; }
};

int f(Derived& d)
{
  return d.do_foo(40);
}
with g++ -S -O2 f.cxx
results correctly in the assembly code:
_Z1fR7Derived:
.LFB2:
        .cfi_startproc
        movl    $42, %eax
        ret
Obviously, this does not happen when the 'final' keyword is removed.
However, when we change f() to return foo instead of do_foo:
  return d.foo(40);
the assembly code of f() changes to:
_Z1fR7Derived:
.LFB2:
        .cfi_startproc
        movq    (%rdi), %rax
        leaq    _ZN7Derived6do_fooEi(%rip), %rdx
        movq    (%rax), %rax
        cmpq    %rdx, %rax
        jne     .L5
        movl    $42, %eax
        ret
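In other words, it failed to inline the call unconditionally: the
constant result is guarded by a run-time comparison of the vtable entry
against Derived::do_foo.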
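For illustration, the two call paths can be put side by side in one file
and inspected with g++ -S -O2. This is only a sketch: the function names
via_do_foo, via_foo and via_qualified are hypothetical, and the qualified
call in via_qualified is one possible source-level workaround (a qualified
name suppresses virtual dispatch), not something the compiler does on its
own. Whether a given GCC version folds each function to a constant has to
be checked in the generated assembly.

```cpp
struct Base
{
  int foo(int n) { return do_foo(n); }  // non-virtual wrapper, as in the report
  virtual int do_foo(int n) = 0;
};

struct Derived : public Base
{
  int do_foo(int n) override final { return n + 2; }
};

// Direct call on the static type Derived: the case reported to fold to 42.
int via_do_foo(Derived& d) { return d.do_foo(40); }

// Call through the inherited wrapper: the case reported to keep a run-time check.
int via_foo(Derived& d) { return d.foo(40); }

// Hypothetical workaround: a qualified call bypasses virtual dispatch entirely.
int via_qualified(Derived& d) { return d.Derived::do_foo(40); }
```

All three functions return the same value at run time; only the generated
code is expected to differ.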
I find this bad because std::pmr::memory_resource follows this exact
pattern:
class memory_resource
{
  ...
  void*
  allocate(size_t __bytes, size_t __alignment = _S_max_align)
  { return do_allocate(__bytes, __alignment); }

  void
  deallocate(void* __p, size_t __bytes, size_t __alignment = _S_max_align)
  { return do_deallocate(__p, __bytes, __alignment); }
  ...
  virtual void*
  do_allocate(size_t __bytes, size_t __alignment) = 0;

  virtual void
  do_deallocate(void* __p, size_t __bytes, size_t __alignment) = 0;
  ...
};
I'd really like to use std::pmr::memory_resource right now, but only if
the compiler performs the above optimization. Then I can specify 'final'
for the do_allocate and do_deallocate of my ultra-fast pool memory
allocators and get rid of the indirection of the virtual functions by
making sure the caller has the right type. That is normally the case for
the lowest-level memory resource classes; only 'upstream' classes will be
called through the memory_resource::allocate() member function of the base
class, and at that point we are already one 'level' higher in the memory
resource hierarchy, so speed is no longer as much of a requirement.
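The intended usage might look roughly like the sketch below. Everything in
it is an assumption made for illustration: BumpResource is a hypothetical,
deliberately trivial bump allocator standing in for a real pool allocator,
it assumes the supplied buffer is aligned for the largest alignment
requested, and it makes the do_* overrides public so that a caller holding
the concrete type can reach them directly. It requires C++17.

```cpp
#include <cstddef>
#include <memory_resource>
#include <new>

// Hypothetical bump allocator over a caller-supplied buffer.
// Assumes `buf` itself is suitably aligned for every requested alignment.
class BumpResource : public std::pmr::memory_resource
{
public:
  BumpResource(std::byte* buf, std::size_t size) : buf_(buf), size_(size) {}

  std::size_t used() const { return offset_; }

  // 'final' overrides, as the report suggests for lowest-level resources.
  void* do_allocate(std::size_t bytes, std::size_t alignment) override final
  {
    // Round the current offset up to the alignment (a power of two).
    std::size_t start = (offset_ + alignment - 1) & ~(alignment - 1);
    if (start > size_ || bytes > size_ - start)
      throw std::bad_alloc();
    offset_ = start + bytes;
    return buf_ + start;
  }

  // A bump allocator never frees individual blocks.
  void do_deallocate(void*, std::size_t, std::size_t) override final {}

  bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override final
  { return this == &other; }

private:
  std::byte* buf_;
  std::size_t size_;
  std::size_t offset_ = 0;
};
```

A caller that holds a BumpResource& can use either r.allocate(...) or
r.do_allocate(...); the request here is that the first form be optimized
as well as the second.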