llvmbot wrote:
@llvm/pr-subscribers-clang-analysis

Author: Balázs Benics (steakhal)

<details>
<summary>Changes</summary>

This patch adds documentation about the design of the Summary Extraction part of the Scalable Static Analysis Framework (SSAF). It mainly focuses on how the custom FrontendAction would load the different analyses (their extraction parts) and on the formats it should export to. Each FrontendAction invocation would process a single TU, extracting summaries from it and serializing the results into a file in the desired format. The details are not polished yet, but I think it's still beneficial to have some guidance on how the upcoming components will fit together, hence this document. I'll come back to this document to keep it up to date as we proceed with the upstreaming.

---
Full diff: https://github.com/llvm/llvm-project/pull/172876.diff


3 Files Affected:

- (added) clang/docs/ScalableStaticAnalysisFramework/Framework.rst (+13)
- (added) clang/docs/ScalableStaticAnalysisFramework/SummaryExtraction.rst (+105)
- (modified) clang/docs/index.rst (+1)


``````````diff
diff --git a/clang/docs/ScalableStaticAnalysisFramework/Framework.rst b/clang/docs/ScalableStaticAnalysisFramework/Framework.rst
new file mode 100644
index 0000000000000..83983995b38f7
--- /dev/null
+++ b/clang/docs/ScalableStaticAnalysisFramework/Framework.rst
@@ -0,0 +1,13 @@
+==================================
+Scalable Static Analysis Framework
+==================================
+
+This is a framework for writing cross-translation-unit analyses in a scalable and extensible way.
+
+.. toctree::
+   :caption: Table of Contents
+   :numbered:
+   :maxdepth: 1
+   :glob:
+
+   *
\ No newline at end of file
diff --git a/clang/docs/ScalableStaticAnalysisFramework/SummaryExtraction.rst b/clang/docs/ScalableStaticAnalysisFramework/SummaryExtraction.rst
new file mode 100644
index 0000000000000..1223145e71d71
--- /dev/null
+++ b/clang/docs/ScalableStaticAnalysisFramework/SummaryExtraction.rst
@@ -0,0 +1,105 @@
+==================
+Summary Extraction
+==================
+
+The simplest way to understand the lifetime of a summary extraction is to follow the handlers of the ``FrontendAction`` that implements it.
+Three of its APIs matter here, and they are invoked in this order:
+
+  - ``BeginInvocation()``: Checks the command-line arguments related to summary extraction.
+  - ``CreateASTConsumer()``: Creates the ``ASTConsumer`` objects for the different summary extractors.
+  - ``EndSourceFile()``: Serializes and writes the extracted summaries.
+
+Implementation details
+**********************
+
+Global Registries
+=================
+
+The framework uses *registries* as the extension point for adding new summary analyses or serialization formats.
+
+A *registry* is a global function that returns a reference to function-local static storage holding objects made up of function pointers.
+Think of it as a cookbook holding recipes, where each recipe holds the instructions for cooking (or *constructing*) the *thing*.
+An entry is added to the *registry* (or *cookbook*) by defining a translation-unit-local static object whose constructor inserts the given function pointers (the *recipe*) into the ``vector``/``set``/``map`` backing the *registry*.
+When the executable starts, it constructs these global objects, and their constructors populate the registries as a side effect.
+
+**Pros**:
+
+  - Decentralizes registration: no single place in the source code has to spell out all of the analyses/formats.
+  - Plays nicely with downstream extensibility: downstream users can add their own analyses/formats without touching the framework's source code, while still benefiting from the upstream-provided ones.
+  - Works with both static and dynamic linking; in other words, plugins built as shared objects compose naturally.
+
+**Cons**:
+
+  - Registration adds a tiny start-up cost to every ``clang`` invocation, even those that don't use the summary extraction framework.
+  - Because registration is decoupled, the set of registered entries becomes a global program property, which is potentially harder to reason about.
+  - Complicates testing.
+  - The function pointers add a layer of indirection, making it harder to see where the indirect calls go when statically inspecting the code in an IDE.
+
+The general idea
+----------------
+
+.. code-block:: c++
+
+  //--- SomeRegistry.h
+  struct Registrar {
+    Registrar(std::string Name, void (*Printer)());
+  };
+  struct Info {
+    void (*Printer)();
+    // Place more function pointers if needed.
+  };
+  std::map<std::string, Info>& getRegistry();
+
+  //--- SomeRegistry.cpp
+  std::map<std::string, Info>& getRegistry() {
+    static std::map<std::string, Info> Storage;
+    return Storage;
+  }
+  Registrar::Registrar(std::string Name, void (*Printer)()) {
+    bool Inserted = getRegistry().try_emplace(std::move(Name), Info{Printer}).second;
+    assert(Inserted && "Name was already present in the registry");
+    (void)Inserted;
+  }
+
+  //--- MyAnalysis.cpp
+  void MyAnalysisPrinter() {
+    printf("MyAnalysisPrinter\n");
+  }
+  static Registrar MyAnalysis("awesome-analysis", &MyAnalysisPrinter);
+
+  //--- Framework.cpp
+  void print_all() {
+    for (const auto &[Name, Entry] : getRegistry()) {
+      (*Entry.Printer)(); // Invoke the customized printer.
+    }
+  }
+
+Details of ``BeginInvocation()``
+================================
+
+#. Process the different fields populated from the command line, and ensure that mandatory flags are set, etc.
+#. For each requested analysis, check whether there is a matching ``TUSummaryExtractorInfo`` in the static registry, and diagnose if not.
+#. Parse the format name and check whether there is a matching ``FormatInfo`` in the format registry.
+#. Lastly, forward the ``BeginInvocation()`` call to the wrapped ``FrontendAction``.
+
+
+Details of ``CreateASTConsumer()``
+==================================
+
+#. Create the consumer of the wrapped ``FrontendAction`` by calling ``CreateASTConsumer()`` on it.
+#. Call ``ssaf::makeTUSummaryExtractor()`` for each requested analysis name.
+
+   #. Look up the relevant *Info* object in the *summary registry* and call its ``Factory`` function pointer to create the corresponding ``ASTConsumer``.
+   #. A mutable ``TUSummaryBuilder`` reference is passed to the constructor, so the analysis implementation can create ``EntityID`` objects and map them to ``TUSummaryData`` objects. To achieve this, the analysis's custom metadata needs to inherit from ``TUSummaryData``.
+
+#. Lastly, add all of these ``ASTConsumer`` objects to a ``MultiplexConsumer`` and return it.
+
+
+Details of ``EndSourceFile()``
+==============================
+
+#. Call the virtual ``writeTUSummary()`` on the serialization format, which dispatches to the desired format handler (such as JSON, binary, or a custom format provided by a plugin).
+
+   #. Create the directory structure for the enabled analyses.
+   #. Serialize ``entities``, ``entity_linkage``, etc., by calling the matching virtual functions, which dispatch to the concrete implementation.
+   #. Likewise, for each enabled analysis, take its ``EntityID`` to ``TUSummaryData`` mapping and serialize it using the analysis-provided ``Serialize`` function pointer.
diff --git a/clang/docs/index.rst b/clang/docs/index.rst
index 70c8737a2fe0d..a0d0401ed1c86 100644
--- a/clang/docs/index.rst
+++ b/clang/docs/index.rst
@@ -27,6 +27,7 @@ Using Clang as a Compiler
    ClangStaticAnalyzer
    ThreadSafetyAnalysis
    SafeBuffers
+   ScalableStaticAnalysisFramework/Framework
    DataFlowAnalysisIntro
    FunctionEffectAnalysis
    AddressSanitizer

``````````

</details>


https://github.com/llvm/llvm-project/pull/172876
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
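The registry lookups described under "Details of ``BeginInvocation()``" and "Details of ``CreateASTConsumer()``" in SummaryExtraction.rst can be illustrated with a small, self-contained C++ sketch that does not depend on Clang. Everything in it is hypothetical: `Consumer` stands in for `clang::ASTConsumer`, `NoopConsumer` for the wrapped action's consumer, `SummaryBuilder` for `TUSummaryBuilder`, `ExtractorInfo` for `TUSummaryExtractorInfo`, and `MultiplexSketch` for `clang::MultiplexConsumer`. None of these names or signatures come from the patch; the sketch only shows the registry-plus-multiplexer shape under those assumptions.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for clang::ASTConsumer: one callback per top-level declaration.
struct Consumer {
  virtual ~Consumer() = default;
  virtual void handleDecl(const std::string &DeclName) = 0;
};

// Hypothetical stand-in for the consumer created by the wrapped FrontendAction.
struct NoopConsumer : Consumer {
  void handleDecl(const std::string &) override {}
};

// Hypothetical stand-in for TUSummaryBuilder: collects what the extractors report.
struct SummaryBuilder {
  std::vector<std::string> Records;
};

// Registry entry (the "recipe"): a factory creating an extractor bound to the builder.
struct ExtractorInfo {
  std::unique_ptr<Consumer> (*Factory)(SummaryBuilder &);
};

std::map<std::string, ExtractorInfo> &getExtractorRegistry() {
  static std::map<std::string, ExtractorInfo> Storage;
  return Storage;
}

// TU-local static objects of this type register an entry during static initialization.
struct Registrar {
  Registrar(const std::string &Name, ExtractorInfo Info) {
    getExtractorRegistry().emplace(Name, Info);
  }
};

// Forwards every callback to all owned consumers, in insertion order.
struct MultiplexSketch : Consumer {
  std::vector<std::unique_ptr<Consumer>> Consumers;
  void handleDecl(const std::string &DeclName) override {
    for (auto &C : Consumers)
      C->handleDecl(DeclName);
  }
};

// BeginInvocation()-like check: collect requested names missing from the registry.
std::vector<std::string> findUnknownAnalyses(const std::vector<std::string> &Requested) {
  std::vector<std::string> Unknown;
  for (const auto &Name : Requested)
    if (getExtractorRegistry().count(Name) == 0)
      Unknown.push_back(Name);
  return Unknown;
}

// CreateASTConsumer()-like step: the wrapped consumer plus one extractor per analysis.
// Assumes the names were already validated by findUnknownAnalyses().
std::unique_ptr<Consumer> makeConsumers(std::unique_ptr<Consumer> Wrapped,
                                        const std::vector<std::string> &Requested,
                                        SummaryBuilder &Builder) {
  auto Multi = std::make_unique<MultiplexSketch>();
  Multi->Consumers.push_back(std::move(Wrapped));
  for (const auto &Name : Requested)
    Multi->Consumers.push_back(getExtractorRegistry().at(Name).Factory(Builder));
  return Multi;
}

// Example extractor, registered under the hypothetical name "decl-recorder".
struct DeclRecorder : Consumer {
  SummaryBuilder &Builder;
  explicit DeclRecorder(SummaryBuilder &B) : Builder(B) {}
  void handleDecl(const std::string &DeclName) override {
    Builder.Records.push_back("decl-recorder:" + DeclName);
  }
};
static Registrar DeclRecorderEntry(
    "decl-recorder", {[](SummaryBuilder &B) -> std::unique_ptr<Consumer> {
      return std::make_unique<DeclRecorder>(B);
    }});
```

Collecting every unknown name before diagnosing lets a single error list all of them; the document only says to "diagnose if not", so this is a choice of the sketch rather than of SSAF.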
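The serialization step under "Details of ``EndSourceFile()``" in SummaryExtraction.rst can likewise be sketched: a format registry maps a format name to a factory, the virtual `writeTUSummary()` dispatches to the concrete format, and each analysis contributes a `Serialize` function pointer. This is a minimal sketch under stated assumptions: `TUSummaryStub`, the `EntityID` alias, `SummaryFormat`, `LineFormat`, `getFormatRegistry()`, `getSerializerRegistry()`, and `writeWithFormat()` are all hypothetical; output goes to a string instead of a directory tree, and each analysis's data is reduced to one `int` per entity.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical stand-ins: an entity ID and a simplified per-TU summary.
using EntityID = int;
struct TUSummaryStub {
  std::map<EntityID, std::string> Entities;                   // ID -> entity name
  std::map<std::string, std::map<EntityID, int>> PerAnalysis; // analysis -> (ID -> data)
};

// Analysis-provided serializer, kept as a plain function pointer.
using SerializeFn = std::string (*)(int Data);
std::map<std::string, SerializeFn> &getSerializerRegistry() {
  static std::map<std::string, SerializeFn> Storage;
  return Storage;
}

// Base class of all output formats; writeTUSummary() is the virtual entry point.
struct SummaryFormat {
  virtual ~SummaryFormat() = default;
  virtual std::string writeTUSummary(const TUSummaryStub &S) = 0;
};

// A toy text format: one line per entity, then one line per analysis record.
struct LineFormat final : SummaryFormat {
  std::string writeTUSummary(const TUSummaryStub &S) override {
    std::ostringstream OS;
    for (const auto &[ID, Name] : S.Entities)
      OS << "entity " << ID << ' ' << Name << '\n';
    for (const auto &[Analysis, Records] : S.PerAnalysis) {
      SerializeFn Serialize = getSerializerRegistry().at(Analysis);
      for (const auto &[ID, Data] : Records)
        OS << Analysis << ' ' << ID << ' ' << Serialize(Data) << '\n';
    }
    return OS.str();
  }
};

// Format registry: name -> factory function pointer.
using FormatFactory = std::unique_ptr<SummaryFormat> (*)();
std::map<std::string, FormatFactory> &getFormatRegistry() {
  static std::map<std::string, FormatFactory> Storage;
  return Storage;
}

// EndSourceFile()-like step. The format name is assumed to have been validated
// earlier (as in the BeginInvocation() step), so at() is not expected to throw.
std::string writeWithFormat(const std::string &FormatName, const TUSummaryStub &S) {
  std::unique_ptr<SummaryFormat> Format = getFormatRegistry().at(FormatName)();
  return Format->writeTUSummary(S); // Virtual dispatch to the concrete format.
}

// Example registrations, performed during static initialization.
std::string serializeDeclCount(int N) { return "count=" + std::to_string(N); }
[[maybe_unused]] static const bool Registered = [] {
  getFormatRegistry()["lines"] = []() -> std::unique_ptr<SummaryFormat> {
    return std::make_unique<LineFormat>();
  };
  getSerializerRegistry()["decl-count"] = &serializeDeclCount;
  return true;
}();
```

In this sketch the format decides the layout while each analysis only turns its own data into a string, so adding a format does not touch any analysis; the patch does not say how the real `Serialize` pointers interact with multiple formats.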
