https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63164
Thomas de Bock <tdebock at DRWUK dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tdebock at DRWUK dot com

--- Comment #9 from Thomas de Bock <tdebock at DRWUK dot com> ---
I managed to implement this optimization at rtti.cc:763 by simply adding a
check for whether the target type is final (though ideally it should also
recognize non-final types with a final destructor), then building a generic
tree that compares the address of the source object's vtable with that of the
target type's vtable at runtime.

The issue with that is exactly as Florian says: whether this is a valid
solution depends on the source either being statically compiled or
__GXX_MERGED_TYPEINFO_NAMES being enabled (which it is not by default). Only
then can the optimization be applied without ever needing to call
__dynamic_cast or compare the type names.

Looking into the way clang implements this optimization, we can see at
CGExprCXX.cpp:2311 that it decides whether or not the optimization should be
applied:

    // If the destination is effectively final, the cast succeeds if and only
    // if the dynamic type of the pointer is exactly the destination type.
    bool IsExact = !IsDynamicCastToVoid &&
                   CGM.getCodeGenOpts().OptimizationLevel > 0 &&
                   DestRecordTy->getAsCXXRecordDecl()->isEffectivelyFinal() &&
                   CGM.getCXXABI().shouldEmitExactDynamicCast(DestRecordTy);

with ItaniumCXXABI.cpp:225:

    bool shouldEmitExactDynamicCast(QualType DestRecordTy) override {
      return hasUniqueVTablePointer(DestRecordTy);
    }

and ItaniumCXXABI.cpp:194:

    bool hasUniqueVTablePointer(QualType RecordTy) {
      const CXXRecordDecl *RD = RecordTy->getAsCXXRecordDecl();

      // Under -fapple-kext, multiple definitions of the same vtable may be
      // emitted.
      if (!CGM.getCodeGenOpts().AssumeUniqueVTables ||
          getContext().getLangOpts().AppleKext)
        return false;

      // If the type_info* would be null, the vtable might be merged with that of
      // another type.
      if (!CGM.shouldEmitRTTI())
        return false;

      // If there's only one definition of the vtable in the program, it has a
      // unique address.
      if (!llvm::GlobalValue::isWeakForLinker(CGM.getVTableLinkage(RD)))
        return true;

      // Even if there are multiple definitions of the vtable, they are required
      // by the ABI to use the same symbol name, so should be merged at load
      // time. However, if the class has hidden visibility, there can be
      // different versions of the class in different modules, and the ABI
      // library might treat them as being the same.
      if (CGM.GetLLVMVisibility(RD->getVisibility()) !=
          llvm::GlobalValue::DefaultVisibility)
        return false;

      return true;
    }

As Florian said, would there not still be a case with this code in which the
optimization is applied but gives the wrong result? When a library is loaded
with dlopen and RTLD_LOCAL, the types cannot be resolved against each other,
so two equivalent types (one from the executable, one from the library loaded
with dlopen) will not share a vtable, causing the optimized dynamic_cast code
to incorrectly decide that the types are not the same.

If this is not the case, could we not replicate clang's behaviour by checking
the value of __GXX_MERGED_TYPEINFO_NAMES around rtti.cc:763? The typeinfo
header can simply check the __GXX_MERGED_TYPEINFO_NAMES value and adjust its
implementation accordingly, but:

- Is it still possible, at the point where the generic tree for dynamic_cast
  (or hopefully the optimization) is constructed, to check the value of the
  __GXX_MERGED_TYPEINFO_NAMES preprocessor macro?
- If not, would it be realistic to implement the optimization by adding an
  additional function (maybe in the typeinfo header) that handles these
  final-type dynamic_casts, checks the value of __GXX_MERGED_TYPEINFO_NAMES,
  and then decides based on that whether to simply compare vtable pointers or
  to compare the type names too?
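
To make the second option more concrete, here is a rough user-level sketch of
what such a helper could look like. It is only an approximation under several
assumptions: it compares std::type_info objects rather than raw vtable
pointers (which portable C++ cannot read), it needs C++14 for std::is_final,
and it only handles a non-virtual, unambiguous base. The names
same_dynamic_type, exact_final_cast, Shape, Circle and Square are hypothetical
and exist neither in libstdc++ nor in the compiler.

```cpp
#include <cassert>
#include <type_traits>
#include <typeinfo>

// Hypothetical helper: decide whether two type_info objects denote the same
// type. Address equality is the cheap check. Without merged typeinfo names,
// distinct addresses do not prove distinct types, so we fall back to the
// library's own operator== (which may compare the type names).
inline bool same_dynamic_type(const std::type_info& a, const std::type_info& b) {
    if (&a == &b)
        return true;
#if defined(__GXX_MERGED_TYPEINFO_NAMES) && __GXX_MERGED_TYPEINFO_NAMES
    return false;  // assumption: merged names imply one type_info per type
#else
    return a == b;
#endif
}

// Hypothetical exact cast to a final class: succeeds if and only if the
// dynamic type of *p is exactly Target. Assumes Source is a non-virtual,
// unambiguous, accessible base of Target, so static_cast is valid.
template <class Target, class Source>
Target* exact_final_cast(Source* p) {
    static_assert(std::is_final<Target>::value, "Target must be final");
    static_assert(std::is_polymorphic<Source>::value, "Source must be polymorphic");
    static_assert(std::is_base_of<Source, Target>::value, "Source must be a base of Target");
    if (p == nullptr)
        return nullptr;
    return same_dynamic_type(typeid(*p), typeid(Target))
               ? static_cast<Target*>(p)
               : nullptr;
}

// Illustrative types only.
struct Shape { virtual ~Shape() = default; };
struct Circle final : Shape {};
struct Square : Shape {};

// Usage: the result should agree with dynamic_cast within a single module.
inline void usage_example() {
    Circle c;
    Square s;
    Shape* pc = &c;
    Shape* ps = &s;
    assert(exact_final_cast<Circle>(pc) == dynamic_cast<Circle*>(pc));
    assert(exact_final_cast<Circle>(ps) == dynamic_cast<Circle*>(ps));
}
```

This sketch does not settle the RTLD_LOCAL question above. It only shows where
the choice between an address-only comparison and a fallback comparison could
live if it had to be made in library code instead of in the front end.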