llvmbot wrote:
@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balazs Benics (steakhal)

<details>
<summary>Changes</summary>

---
Full diff: https://github.com/llvm/llvm-project/pull/126724.diff

3 Files Affected:

- (modified) clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst (+93-3)
- (added) clang/docs/analyzer/images/flamegraph.png ()
- (added) clang/docs/analyzer/images/uftrace_detailed.png ()

``````````diff
diff --git a/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst b/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
index 3ee6e117a846528..6d1a5f126223d93 100644
--- a/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
+++ b/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
@@ -5,6 +5,9 @@ Performance Investigation
 Multiple factors contribute to the time it takes to analyze a file with Clang Static Analyzer.
 A translation unit contains multiple entry points, each of which take multiple steps to analyze.
 
+Performance analysis using ``-ftime-trace``
+===========================================
+
 You can add the ``-ftime-trace=file.json`` option to break down the analysis time into individual entry points and steps within each entry point.
 You can explore the generated JSON file in a Chromium browser using the ``chrome://tracing`` URL,
 or using `speedscope <https://speedscope.app>`_.
@@ -19,9 +22,8 @@ Here is an example of a time trace produced with
 .. code-block:: bash
    :caption: Clang Static Analyzer invocation to generate a time trace of string.c analysis.
 
-   clang -cc1 -nostdsysteminc -analyze -analyzer-constraints=range \
-         -setup-static-analyzer -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
-         -verify ./clang/test/Analysis/string.c \
+   clang -cc1 -analyze -verify clang/test/Analysis/string.c \
+         -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
          -ftime-trace=trace.json -ftime-trace-granularity=1
 
 .. image:: ../images/speedscope.png
@@ -45,3 +47,91 @@ Note: Both Chrome-tracing and speedscope tools might struggle with time traces a
 Luckily, in most cases the default max-steps boundary of 225 000 produces the traces of approximately that size
 for a single entry point.
 You can use ``-analyze-function=get_global_options`` together with ``-ftime-trace`` to narrow down analysis to a specific entry point.
+
+
+Performance analysis using ``perf``
+===================================
+
+`Perf <https://perfwiki.github.io/main/>`_ is a tool for conducting sampling-based profiling.
+It's easy to start profiling; there are only two prerequisites:
+build with ``-fno-omit-frame-pointer`` and with debug info (``-g``).
+You can use release builds, but the easiest option is probably to set ``CMAKE_BUILD_TYPE=RelWithDebInfo``
+along with ``CMAKE_CXX_FLAGS="-fno-omit-frame-pointer"`` when configuring ``llvm``.
+Here is how to `get started <https://llvm.org/docs/CMake.html#quick-start>`_ if you run into trouble.
+
+.. code-block:: bash
+   :caption: Running the Clang Static Analyzer through ``perf`` to gather samples of the execution.
+
+   # -F: Sampling frequency, use `-F max` for maximal frequency
+   # -g: Enable call-graph recording for both kernel and user space
+   perf record -F 99 -g -- clang -cc1 -analyze -verify clang/test/Analysis/string.c \
+         -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection
+
+Once you have the profile data, you can use it to produce a Flame graph.
+A Flame graph is a visual representation of the stack frames of the samples.
+Common stack frame prefixes are squashed together, making up a wider bar.
+The wider the bar, the more time was spent under that particular stack frame,
+giving a sense of how the overall execution time was spent.
+
+Clone the `FlameGraph <https://github.com/brendangregg/FlameGraph>`_ git repository,
+as we will use some scripts from there to convert the ``perf`` samples into a Flame graph.
+It's also useful to check out Brendan Gregg's (the author of FlameGraph)
+`homepage <https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html>`_.
+
+
+.. code-block:: bash
+   :caption: Converting the ``perf`` profile into a Flame graph, then opening it in Firefox.
+
+   perf script | /path/to/FlameGraph/stackcollapse-perf.pl > perf.folded
+   /path/to/FlameGraph/flamegraph.pl perf.folded > perf.svg
+   firefox perf.svg
+
+.. image:: ../images/flamegraph.png
+
+
+Performance analysis using ``uftrace``
+======================================
+
+`uftrace <https://github.com/namhyung/uftrace/wiki/Tutorial#getting-started>`_ is a great tool to generate rich profile data
+that you can use to focus and drill down into the timeline of your application.
+We will use it to generate Chromium trace JSON.
+In contrast to ``perf``, this approach statically instruments every function, so it should be more precise and thorough than sampling-based approaches.
+In contrast to using ``-ftime-trace``, functions don't need to opt in to be profiled using ``llvm::TimeTraceScope``.
+All functions are profiled due to static instrumentation.
+
+There is only one prerequisite to use this tool.
+You need to build the binary you are about to instrument using ``-pg`` or ``-finstrument-functions``.
+This will make it run substantially slower but allows rich instrumentation.
+It will also consume many gigabytes of storage for a single trace unless filter flags are used during recording.
+
+.. code-block:: bash
+   :caption: Recording with ``uftrace``, then dumping the result as a Chrome trace JSON.
+
+   uftrace record clang -cc1 -analyze -verify clang/test/Analysis/string.c \
+         -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection
+   uftrace dump --filter=".*::AnalysisConsumer::HandleTranslationUnit" --time-filter=300 --chrome > trace.json
+
+.. image:: ../images/uftrace_detailed.png
+
+In this picture, you can see the functions below the Static Analyzer's entry point that take at least 300 nanoseconds to run, visualized by Chrome's ``about:tracing`` page.
+You can also see how deep the call stacks can get due to AST visitors.
+
+Using different filters can reduce the number of functions to record.
+For the common options, refer to the ``uftrace`` `documentation <https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_.
+
+Similar filters can be applied for dumping too. That way you can reuse the same (detailed)
+recording to selectively focus on some special part using a refinement of the filter flags.
+Remember, the trace JSON needs to fit into Chrome's ``about:tracing`` or `speedscope <https://speedscope.app>`_,
+thus it needs to be of a limited size.
+If you do not apply filters on recording, you will collect a large trace and every dump operation
+would need to sieve through the much larger recording, which may be annoying if done repeatedly.
+
+If the trace JSON is still too large to load, have a look at the dump as plain text and look for frequent entries that refer to non-interesting parts.
+Once you have some of those, add them as ``--hide`` flags to the ``uftrace dump`` call.
+To see what functions appear frequently in the trace, use this command:
+
+.. code-block:: bash
+
+   cat trace.json | grep -Po '"name":"(.+)"' | sort | uniq -c | sort -nr | head -n 50
+
+``uftrace`` can also dump the report as a Flame graph using ``uftrace dump --flame-graph``.
diff --git a/clang/docs/analyzer/images/flamegraph.png b/clang/docs/analyzer/images/flamegraph.png
new file mode 100644
index 000000000000000..b16ec90b9e600db
Binary files /dev/null and b/clang/docs/analyzer/images/flamegraph.png differ
diff --git a/clang/docs/analyzer/images/uftrace_detailed.png b/clang/docs/analyzer/images/uftrace_detailed.png
new file mode 100644
index 000000000000000..fcf681909d07068
Binary files /dev/null and b/clang/docs/analyzer/images/uftrace_detailed.png differ

``````````

</details>

https://github.com/llvm/llvm-project/pull/126724
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
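
The patch finds frequent trace entries with a `grep | sort | uniq -c` pipeline over `trace.json`. For readers who prefer to parse the JSON rather than match it textually, the same count could be sketched in Python as below. This is only an illustration and not part of the patch: the helper name `top_event_names` is hypothetical, and the sketch assumes the dump follows the Chrome trace event layout, with events either in a top-level list or under a `"traceEvents"` key, each carrying a `"name"` field.

```python
import json
from collections import Counter


def top_event_names(trace_text, limit=50):
    """Return the `limit` most frequent event names in a Chrome trace JSON string."""
    data = json.loads(trace_text)
    # Assumption: events are a top-level list or live under "traceEvents".
    events = data["traceEvents"] if isinstance(data, dict) else data
    counts = Counter(
        event["name"]
        for event in events
        if isinstance(event, dict) and "name" in event
    )
    return counts.most_common(limit)


if __name__ == "__main__":
    # Tiny made-up trace for illustration; the function names are hypothetical.
    sample = json.dumps({"traceEvents": [
        {"ph": "B", "name": "clang::Stmt::children", "ts": 1},
        {"ph": "E", "name": "clang::Stmt::children", "ts": 2},
        {"ph": "B", "name": "llvm::outs", "ts": 3},
    ]})
    for name, count in top_event_names(sample):
        print(f"{count:7d} {name}")
```

Like the shell pipeline, this counts begin and end events separately, so a function's count is roughly twice its number of calls. Names that dominate the list are candidates for `--hide` flags on the next `uftrace dump`.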