[clang] [analyzer][docs] Document how to use perf and uftrace to debug performance issues (PR #126724)

via cfe-commits Tue, 11 Feb 2025 04:54:36 -0800

llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balazs Benics (steakhal)

<details>
<summary>Changes</summary>



---
Full diff: https://github.com/llvm/llvm-project/pull/126724.diff


3 Files Affected:

- (modified) clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst (+93-3)
- (added) clang/docs/analyzer/images/flamegraph.png () 
- (added) clang/docs/analyzer/images/uftrace_detailed.png () 


``````````diff
diff --git a/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst b/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
index 3ee6e117a846528..6d1a5f126223d93 100644
--- a/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
+++ b/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
@@ -5,6 +5,9 @@ Performance Investigation
 Multiple factors contribute to the time it takes to analyze a file with Clang Static Analyzer.
 A translation unit contains multiple entry points, each of which take multiple steps to analyze.
 
+Performance analysis using ``-ftime-trace``
+===========================================
+
 You can add the ``-ftime-trace=file.json`` option to break down the analysis time into individual entry points and steps within each entry point.
 You can explore the generated JSON file in a Chromium browser using the ``chrome://tracing`` URL,
 or using `speedscope <https://speedscope.app>`_.
@@ -19,9 +22,8 @@ Here is an example of a time trace produced with
 .. code-block:: bash
    :caption: Clang Static Analyzer invocation to generate a time trace of string.c analysis.
 
-   clang -cc1 -nostdsysteminc -analyze -analyzer-constraints=range \
-         -setup-static-analyzer -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
-         -verify ./clang/test/Analysis/string.c \
+   clang -cc1 -analyze -verify clang/test/Analysis/string.c \
+         -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
          -ftime-trace=trace.json -ftime-trace-granularity=1
 
 .. image:: ../images/speedscope.png
@@ -45,3 +47,91 @@ Note: Both Chrome-tracing and speedscope tools might struggle with time traces a
 Luckily, in most cases the default max-steps boundary of 225 000 produces the traces of approximately that size
 for a single entry point.
 You can use ``-analyze-function=get_global_options`` together with ``-ftime-trace`` to narrow down analysis to a specific entry point.
+
+
+Performance analysis using ``perf``
+===================================
+
+`Perf <https://perfwiki.github.io/main/>`_ is a tool for conducting sampling-based profiling.
+It's easy to start profiling; there are only two prerequisites:
+build with ``-fno-omit-frame-pointer`` and with debug info (``-g``).
+You can use release builds, but probably the easiest is to set ``CMAKE_BUILD_TYPE=RelWithDebInfo``
+along with ``CMAKE_CXX_FLAGS="-fno-omit-frame-pointer"`` when configuring ``llvm``.
+Here is how to `get started <https://llvm.org/docs/CMake.html#quick-start>`_ if you need help with the build.
+
+.. code-block:: bash
+   :caption: Running the Clang Static Analyzer through ``perf`` to gather samples of the execution.
+
+   # -F: Sampling frequency, use `-F max` for maximal frequency
+   # -g: Enable call-graph recording for both kernel and user space
+   perf record -F 99 -g -- clang -cc1 -analyze -verify clang/test/Analysis/string.c \
+         -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection
+
+Once you have the profile data, you can use it to produce a Flame graph.
+A Flame graph is a visual representation of the stack frames of the samples.
+Common stack frame prefixes are squashed together, making up a wider bar.
+The wider the bar, the more time was spent under that particular stack frame,
+giving a sense of how the overall execution time was spent.
+
+Clone the `FlameGraph <https://github.com/brendangregg/FlameGraph>`_ git repository,
+as we will use some scripts from there to convert the ``perf`` samples into a Flame graph.
+It's also useful to check out Brendan Gregg's (the author of FlameGraph)
+`homepage <https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html>`_.
+
+
+.. code-block:: bash
+   :caption: Converting the ``perf`` profile into a Flame graph, then opening it in Firefox.
+
+   perf script | /path/to/FlameGraph/stackcollapse-perf.pl > perf.folded
+   /path/to/FlameGraph/flamegraph.pl perf.folded  > perf.svg
+   firefox perf.svg
+
+.. image:: ../images/flamegraph.png
+
+
+Performance analysis using ``uftrace``
+======================================
+
+`uftrace <https://github.com/namhyung/uftrace/wiki/Tutorial#getting-started>`_ is a great tool to generate rich profile data
+that you can use to focus and drill down into the timeline of your application.
+We will use it to generate Chromium trace JSON.
+In contrast to ``perf``, this approach statically instruments every function, so it should be more precise and thorough than sampling-based approaches.
+In contrast to using ``-ftime-trace``, functions don't need to opt-in to be profiled using ``llvm::TimeTraceScope``.
+All functions are profiled due to static instrumentation.
+
+There is only one prerequisite to use this tool.
+You need to build the binary you are about to instrument using ``-pg`` or ``-finstrument-functions``.
+This will make it run substantially slower but allows rich instrumentation.
+It will also consume many gigabytes of storage for a single trace unless filter flags are used during recording.
+
+.. code-block:: bash
+   :caption: Recording with ``uftrace``, then dumping the result as a Chrome trace JSON.
+
+   uftrace record clang -cc1 -analyze -verify clang/test/Analysis/string.c \
+         -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection
+   uftrace dump --filter=".*::AnalysisConsumer::HandleTranslationUnit" --time-filter=300 --chrome > trace.json
+
+.. image:: ../images/uftrace_detailed.png
+
+In this picture, you can see the functions below the Static Analyzer's entry point that take at least 300 nanoseconds to run, visualized by Chrome's ``about:tracing`` page.
+You can also see how deep the function calls can get due to AST visitors.
+
+Using different filters can reduce the number of functions to record.
+For the common options, refer to the ``uftrace`` `documentation <https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_.
+
+Similar filters can be applied for dumping too. That way you can reuse the same (detailed)
+recording to selectively focus on some special part using a refinement of the filter flags.
+Remember, the trace JSON needs to fit into Chrome's ``about:tracing`` or `speedscope <https://speedscope.app>`_,
+thus it needs to be of a limited size.
+If you do not apply filters on recording, you will collect a large trace and every dump operation
+would need to sieve through the much larger recording which may be annoying if done repeatedly.
+
+If the trace JSON is still too large to load, have a look at the dump as plain text and look for frequent entries that refer to non-interesting parts.
+Once you have some of those, add them as ``--hide`` flags to the ``uftrace dump`` call.
+To see what functions appear frequently in the trace, use this command:
+
+.. code-block:: bash
+
+   cat trace.json | grep -Po '"name":"(.+)"' | sort | uniq -c | sort -nr | head -n 50
+
+``uftrace`` can also dump the report as a Flame graph using ``uftrace dump --flame-graph``.
diff --git a/clang/docs/analyzer/images/flamegraph.png b/clang/docs/analyzer/images/flamegraph.png
new file mode 100644
index 000000000000000..b16ec90b9e600db
Binary files /dev/null and b/clang/docs/analyzer/images/flamegraph.png differ
diff --git a/clang/docs/analyzer/images/uftrace_detailed.png b/clang/docs/analyzer/images/uftrace_detailed.png
new file mode 100644
index 000000000000000..fcf681909d07068
Binary files /dev/null and b/clang/docs/analyzer/images/uftrace_detailed.png differ

``````````

</details>
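
The last step of the `uftrace` section in the diff (find frequent entries, then hide them) could be sketched roughly as follows. This is only an illustration, not part of the PR: the sample file content is hypothetical, and it assumes the Chrome trace has one event per line with a `"name"` field, as in the patch's own `grep` command. It matches names with `[^"]*` instead of `.+` so a line holding several quoted fields cannot be over-matched, and it counts only begin (`"ph":"B"`) events so each call is counted once.

```shell
# Hypothetical sample standing in for real `uftrace dump --chrome` output.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"traceEvents":[
{"ts":1.000,"ph":"B","pid":1,"name":"clang::Stmt::getBeginLoc"},
{"ts":1.100,"ph":"E","pid":1,"name":"clang::Stmt::getBeginLoc"},
{"ts":1.200,"ph":"B","pid":1,"name":"clang::Stmt::getBeginLoc"},
{"ts":1.300,"ph":"E","pid":1,"name":"clang::Stmt::getBeginLoc"},
{"ts":1.400,"ph":"B","pid":1,"name":"llvm::FoldingSetBase::FindNodeOrInsertPos"},
{"ts":1.500,"ph":"E","pid":1,"name":"llvm::FoldingSetBase::FindNodeOrInsertPos"}
]}
EOF

# Keep begin events only, extract the function name, and rank by frequency.
top_names=$(grep '"ph":"B"' "$sample" \
  | grep -o '"name":"[^"]*"' \
  | sed -e 's/^"name":"//' -e 's/"$//' \
  | sort | uniq -c | sort -nr | head -n 50)
printf '%s\n' "$top_names"

# Turn the most frequent name into a candidate flag for `uftrace dump`.
# Assumption: the name contains no spaces (not true for e.g. operator names).
most_frequent=$(printf '%s\n' "$top_names" | head -n 1 | awk '{print $2}')
hide_flag="--hide=$most_frequent"
echo "$hide_flag"

rm -f "$sample"
```

Whether a frequent function is actually uninteresting is still a manual call; the script only surfaces candidates for the `--hide` flags mentioned in the documentation.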


https://github.com/llvm/llvm-project/pull/126724