ChuanqiXu9 wrote:

> @usx95 may be able to help with the reproducer.
>
> In the meantime, I'm trying to collect some information on the compile times.
> So far it looks like we have a ~10-15x compile time regression on some
> translation units. Without this patch `-ftime-report` shows:
>
> ```
> ===-------------------------------------------------------------------------===
>                           Clang front-end time report
> ===-------------------------------------------------------------------------===
>   Total Execution Time: 39.1940 seconds (39.7238 wall clock)
>
>    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>   28.2611 ( 77.5%)   1.8439 ( 67.3%)  30.1050 ( 76.8%)  30.5230 ( 76.8%)  Clang front-end timer
>    8.1911 ( 22.5%)   0.8980 ( 32.7%)   9.0891 ( 23.2%)   9.2009 ( 23.2%)  Reading modules
>   36.4522 (100.0%)   2.7419 (100.0%)  39.1940 (100.0%)  39.7238 (100.0%)  Total
> ```
>
> With it:
>
> ```
> ===-------------------------------------------------------------------------===
>                           Clang front-end time report
> ===-------------------------------------------------------------------------===
>   Total Execution Time: 466.7373 seconds (1251.6300 wall clock)
>
>    ---User Time---   --System Time--   --User+System--    ---Wall Time---  --- Name ---
>  404.7200 ( 96.1%)  40.6383 ( 88.8%) 445.3583 ( 95.4%)  471.9647 ( 37.7%)  Clang front-end timer
>   15.2098 (  3.6%)   3.3586 (  7.3%)  18.5684 (  4.0%)  398.1242 ( 31.8%)  Reading modules
>  420.9899 (100.0%)  45.7474 (100.0%) 466.7373 (100.0%) 1251.6300 (100.0%)  Total
> ```
>
> `perf record -g` / `perf report` give the following picture:
>
> ```
>   Children      Self  Command  Shared Object  Symbol
> +   94.85%     0.00%  clang    clang          [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformCallExpr(clang::CallExpr*) [clone .__uniq.16014532493918845222783194145290083557]
> +   93.47%     0.00%  clang    clang          [.] clang::Sema::InstantiateFunctionDefinition(clang::SourceLocation, clang::FunctionDecl*, bool, bool, bool)
> +   93.37%    83.51%  clang    clang          [.] clang::ASTReader::LoadExternalSpecializations(clang::Decl const*, bool)
> +   93.19%     0.00%  clang    clang          [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformCompoundStmt(clang::CompoundStmt*, bool) [clone .__uniq.16014532493918845222783194
> +   93.08%     0.00%  clang    clang          [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformUnresolvedLookupExpr(clang::UnresolvedLookupExpr*, bool) [clone .__uniq.1601453249
> +   92.98%     0.00%  clang    clang          [.] clang::Sema::BuildTemplateIdExpr(clang::CXXScopeSpec const&, clang::SourceLocation, clang::LookupResult&, bool, clang::TemplateArgumentListInfo const*)
> +   92.44%     0.00%  clang    clang          [.] clang::Sema::CheckVarTemplateId(clang::VarTemplateDecl*, clang::SourceLocation, clang::SourceLocation, clang::TemplateArgumentListInfo const&)
> +   92.08%     0.00%  clang    clang          [.] clang::Sema::InstantiateVariableInitializer(clang::VarDecl*, clang::VarDecl*, clang::MultiLevelTemplateArgumentList const&)
> +   91.87%     0.00%  clang    clang          [.] clang::VarTemplateDecl::getPartialSpecializations(llvm::SmallVectorImpl<clang::VarTemplatePartialSpecializationDecl*>&) const
> +   91.18%     0.00%  clang    clang          [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformBinaryOperator(clang::BinaryOperator*) [clone .__uniq.1601453249391884522278319414
> +   91.07%     0.00%  clang    clang          [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformExprs(clang::Expr* const*, unsigned int, bool, llvm::SmallVectorImpl<clang::Expr*>
> +   90.70%     0.01%  clang    clang          [.] clang::Sema::InstantiateVariableDefinition(clang::SourceLocation, clang::VarDecl*, bool, bool, bool)
> +   90.41%     0.01%  clang    clang          [.] clang::Sema::BuildDeclarationNameExpr(clang::CXXScopeSpec const&, clang::DeclarationNameInfo const&, clang::NamedDecl*, clang::NamedDecl*, clang::TemplateArgu
> +   90.29%     0.00%  clang    clang          [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformInitListExpr(clang::InitListExpr*) [clone .__uniq.16014532493918845222783194145290
> +   89.92%     0.00%  clang    clang          [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformParenExpr(clang::ParenExpr*) [clone .__uniq.16014532493918845222783194145290083557
> +   89.23%     0.00%  clang    clang          [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformConditionalOperator(clang::ConditionalOperator*) [clone .__uniq.160145324939188452
> +   84.49%     0.02%  clang    clang          [.] clang::Sema::RequireCompleteTypeImpl(clang::SourceLocation, clang::QualType, clang::Sema::CompleteTypeKind, clang::Sema::TypeDiagnoser*)
> +   84.47%     0.00%  clang    clang          [.] clang::Sema::InstantiateClassTemplateSpecialization(clang::SourceLocation, clang::ClassTemplateSpecializationDecl*, clang::TemplateSpecializationKind, bool)
> +   84.07%     0.01%  clang    clang          [.] clang::Sema::InstantiateClass(clang::SourceLocation, clang::CXXRecordDecl*, clang::CXXRecordDecl*, clang::MultiLevelTemplateArgumentList const&, clang::Templa
> +   82.84%     0.02%  clang    clang          [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformType(clang::TypeLocBuilder&, clang::TypeLoc) [clone .__uniq.1601453249391884522278
> +   82.23%     0.02%  clang    clang          [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformTemplateSpecializationType(clang::TypeLocBuilder&, clang::TemplateSpecializationTy
> +   81.99%     0.01%  clang    clang          [.] (anonymous namespace)::TemplateInstantiator::TransformTemplateArgument(clang::TemplateArgumentLoc const&, clang::TemplateArgumentLoc&, bool) [clone .__uniq.16
> +   81.54%     0.00%  clang    clang          [.] clang::Sema::RequireCompleteDeclContext(clang::CXXScopeSpec&, clang::DeclContext*)
> +   80.18%     0.01%  clang    clang          [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformType(clang::TypeSourceInfo*) [clone .__uniq.16014532493918845222783194145290083557
> +   79.88%     0.12%  clang    clang          [.] clang::Sema::CheckTemplateIdType(clang::TemplateName, clang::SourceLocation, clang::TemplateArgumentListInfo&)
> ```
>
> I can try to build clang with better debug information and get a higher
> fidelity profile, but hopefully this already shows the direction to look at.
Thanks. It looks like `ASTReader::LoadExternalSpecializations(const Decl *D, bool OnlyPartial)` is the hot spot; I hadn't considered that. Maybe the problem is `findAll()`, since it always loads all the specializations. Or the problem may be that we call `findAll()` too many times. I'll take a look. A profile with more detail would definitely be helpful.

https://github.com/llvm/llvm-project/pull/83237
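To separate the two hypotheses (the cost of `findAll()` itself versus how often it is called), here is a toy C++ model. It is not Clang code: `SpecTable`, `loadAll`, `loadMatching` and `simulate` are hypothetical names, and it assumes that entries stay in the lookup table after being loaded, so every "load all" call revisits every entry, while a lookup keyed by a hash of the template arguments touches only the matching entries. Whether Clang's real lookup table behaves this way is an assumption that a higher-fidelity profile would need to confirm. The 83.51% *self* time attributed to `LoadExternalSpecializations` above is consistent with (but does not prove) time spent iterating inside that function rather than in its callees.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical model of a module's specialization lookup table (not a Clang
// API). Key: hash of the template arguments; value: "already loaded" flag.
// A multimap is used because different argument lists may share a hash.
struct SpecTable {
  std::unordered_multimap<std::size_t, bool> Entries;
  std::size_t EntriesVisited = 0; // work counter: entries examined so far

  void add(const std::string &Args) {
    Entries.emplace(std::hash<std::string>{}(Args), false);
  }

  // Analogue of a "find all" strategy: examines every entry on each call,
  // even entries that were already loaded by an earlier call.
  void loadAll() {
    for (auto &KV : Entries) {
      ++EntriesVisited;
      KV.second = true;
    }
  }

  // Analogue of a keyed lookup: examines only entries whose hash matches
  // the hash of the requested template arguments.
  void loadMatching(const std::string &Args) {
    auto Range = Entries.equal_range(std::hash<std::string>{}(Args));
    for (auto It = Range.first; It != Range.second; ++It) {
      ++EntriesVisited;
      It->second = true;
    }
  }
};

// Runs `Queries` lookups against a table of `N` specializations and returns
// the total number of entries examined by the chosen strategy.
std::size_t simulate(std::size_t N, std::size_t Queries, bool UseLoadAll) {
  assert(N > 0 && "need at least one specialization");
  SpecTable T;
  for (std::size_t I = 0; I < N; ++I)
    T.add("int, " + std::to_string(I));
  for (std::size_t Q = 0; Q < Queries; ++Q) {
    if (UseLoadAll)
      T.loadAll();
    else
      T.loadMatching("int, " + std::to_string(Q % N));
  }
  return T.EntriesVisited;
}
```

With 1000 specializations, a single `loadAll()` examines 1000 entries, which is cheap; 1000 repeated calls examine 1,000,000, while 1000 keyed lookups examine about 1000. In this model the quadratic cost comes from repetition, which is why the call count is worth checking as well as the behaviour of a single call.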