https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91771
Bug ID: 91771
Summary: Optimization fails to inline final override.
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: carlo at gcc dot gnu.org
Target Milestone: ---

Compiling the following code snippet:

struct Base
{
  int foo(int n) { return do_foo(n); }
  virtual int do_foo(int n) = 0;
};

struct Derived : public Base
{
  int do_foo(int n) override final { return n + 2; }
};

int f(Derived& d)
{
  return d.do_foo(40);
}

with

g++ -S -O2 f.cxx

correctly results in the assembly code:

_Z1fR7Derived:
.LFB2:
        .cfi_startproc
        movl    $42, %eax
        ret

This obviously does not happen when the 'final' keyword is removed.

However, when we change f() to return foo instead of do_foo:

  return d.foo(40);

the assembly code of f() changes to:

_Z1fR7Derived:
.LFB2:
        .cfi_startproc
        movq    (%rdi), %rax
        leaq    _ZN7Derived6do_fooEi(%rip), %rdx
        movq    (%rax), %rax
        cmpq    %rdx, %rax
        jne     .L5
        movl    $42, %eax
        ret

In other words, it failed to do the inlining unconditionally: the
constant result is now guarded by a run-time comparison of the vtable
entry against Derived::do_foo.

The reason I find this bad is that std::pmr::memory_resource follows
this exact pattern:

class memory_resource
{
  ...
  void*
  allocate(size_t __bytes, size_t __alignment = _S_max_align)
  { return do_allocate(__bytes, __alignment); }

  void
  deallocate(void* __p, size_t __bytes, size_t __alignment = _S_max_align)
  { return do_deallocate(__p, __bytes, __alignment); }
  ...
  virtual void*
  do_allocate(size_t __bytes, size_t __alignment) = 0;

  virtual void
  do_deallocate(void* __p, size_t __bytes, size_t __alignment) = 0;
  ...
};

I'd really like to use std::pmr::memory_resource, but only if the
compiler performs the above optimization. Then I can specify 'final'
for the do_allocate and do_deallocate of my ultra-fast pool memory
allocators and get rid of the indirection of the virtual functions by
making sure the caller has the right type. That is normally the case
for the lowest-level memory resource classes; only 'upstream' classes
will be called through the memory_resource::allocate() member function
of the base class, in which case we're already one 'level' higher in
the memory resource hierarchy, so speed isn't as much of a requirement
anymore.
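
As an illustration of the pattern under discussion, here is a minimal sketch of one possible way to sidestep the guarded call when the caller already holds the concrete type. Base and Derived mirror the snippet in this report. The shadowing Derived::foo() and the helper g() are hypothetical additions, not part of the report and not something libstdc++ does. A qualified call such as Derived::do_foo(n) is never dispatched virtually, so no vtable comparison should be needed on that path. Whether GCC then folds f() to a constant has to be checked against the generated assembly.

```cpp
// Sketch only. Base and Derived follow the bug report; Derived::foo()
// and g() are hypothetical additions for illustration.
struct Base {
    // Non-virtual wrapper that dispatches through the vtable.
    int foo(int n) { return do_foo(n); }
    virtual int do_foo(int n) = 0;
};

struct Derived : public Base {
    int do_foo(int n) override final { return n + 2; }

    // Hypothetical workaround: shadow Base::foo. The qualified name
    // Derived::do_foo suppresses virtual dispatch for this call.
    int foo(int n) { return Derived::do_foo(n); }
};

// Caller holds the concrete type: name lookup finds Derived::foo.
int f(Derived& d) { return d.foo(40); }

// Caller holds only the base type: still goes through Base::foo and
// therefore through the virtual call.
int g(Base& b) { return b.foo(40); }
```

Both paths return the same value; only the dispatch differs. The drawback is that every derived class has to repeat the wrapper, which is the duplication the requested optimization would make unnecessary.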