[clang] [analyzer][docs] Document how to use perf and uftrace to debug performance issues (PR #126520)

via cfe-commits Mon, 10 Feb 2025 06:15:33 -0800

llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balazs Benics (steakhal)

<details>
<summary>Changes</summary>



---

Patch is 1.37 MiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/126520.diff


3 Files Affected:

- (modified) clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst 
(+91) 
- (added) clang/docs/analyzer/images/flamegraph.svg (+27641) 
- (added) clang/docs/analyzer/images/uftrace_detailed.png () 


``````````diff
diff --git a/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst 
b/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
index 3ee6e117a846528..c33853aca76168d 100644
--- a/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
+++ b/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
@@ -5,6 +5,9 @@ Performance Investigation
 Multiple factors contribute to the time it takes to analyze a file with Clang 
Static Analyzer.
 A translation unit contains multiple entry points, each of which take multiple 
steps to analyze.
 
+Performance analysis using ``-ftime-trace``
+===========================================
+
 You can add the ``-ftime-trace=file.json`` option to break down the analysis 
time into individual entry points and steps within each entry point.
 You can explore the generated JSON file in a Chromium browser using the 
``chrome://tracing`` URL,
 or using `speedscope <https://speedscope.app>`_.
@@ -45,3 +48,91 @@ Note: Both Chrome-tracing and speedscope tools might 
struggle with time traces a
 Luckily, in most cases the default max-steps boundary of 225 000 produces the 
traces of approximately that size
 for a single entry point.
 You can use ``-analyze-function=get_global_options`` together with 
``-ftime-trace`` to narrow down analysis to a specific entry point.
+
+
+Performance analysis using ``perf``
+===================================
+
+`Perf <https://perfwiki.github.io/main/>`_ is an excellent tool for 
sampling-based profiling of an application.
+It's easy to start profiling, you only have 2 prerequisites.
+Build with ``-fno-omit-frame-pointer`` and debug info (``-g``).
+You can use release builds, but probably the easiest is to set the 
``CMAKE_BUILD_TYPE=RelWithDebInfo``
+along with ``CMAKE_CXX_FLAGS="-fno-omit-frame-pointer"`` when configuring 
``llvm``.
+Here is how to `get started <https://llvm.org/docs/CMake.html#quick-start>`_ 
if you are in trouble.
+
+.. code-block:: bash
+   :caption: Running the Clang Static Analyzer through ``perf`` to gather 
samples of the execution.
+
+   # -F: Sampling frequency, use `-F max` for maximal frequency
+   # -g: Enable call-graph recording for both kernel and user space
+   perf record -F 99 -g --  clang -cc1 -nostdsysteminc -analyze 
-analyzer-constraints=range \
+         -setup-static-analyzer 
-analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
+         -verify ./clang/test/Analysis/string.c
+
+Once you have the profile data, you can use it to produce a Flame graph.
+A Flame graph is a visual representation of the stack frames of the samples.
+Common stack frame prefixes are squashed together, making up a wider bar.
+The wider the bar, the more time was spent under that particular stack frame,
+giving a sense of how the overall execution time was spent.
+
+Clone the `FlameGraph <https://github.com/brendangregg/FlameGraph>`_ git 
repository,
+as we will use some scripts from there to convert the ``perf`` samples into a 
Flame graph.
+It's also useful to check out Brendan Gregg's (the author of FlameGraph)
+`homepage <https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html>`_.
+
+
+.. code-block:: bash
+   :caption: Converting the ``perf`` profile into a Flamegraph, then opening 
it in Firefox.
+
+   perf script | /path/to/FlameGraph/stackcollapse-perf.pl > perf.folded
+   /path/to/FlameGraph/flamegraph.pl perf.folded  > perf.svg
+   firefox perf.svg
+
+.. image:: ../images/flamegraph.svg
+
+
+Performance analysis using ``uftrace``
+======================================
+
+`uftrace <https://github.com/namhyung/uftrace/wiki/Tutorial#getting-started>`_ 
is a great tool to generate rich profile data
+that you could use to focus and drill down into the timeline of your 
application.
+We will use it to generate Chromium trace JSON.
+In contrast to ``perf``, this approach statically instruments every function, 
so it should be more precise and through than the sampling-based approaches 
like ``perf``.
+In contrast to using `-ftime-trace`, functions don't need to opt-in to be 
profiled using ``llvm::TimeTraceScope``.
+All functions are profiled due to static instrumentation.
+
+There is only one prerequisite to use this tool.
+You need to build the binary you are about to instrument using ``-pg`` or 
``-finstrument-functions``.
+This will make it run substantially slower but allows rich instrumentation.
+
+.. code-block:: bash
+   :caption: Recording with ``uftrace``, then dumping the result as a Chrome 
trace JSON.
+
+   uftrace record clang -cc1 -nostdsysteminc -analyze 
-analyzer-constraints=range \
+         -setup-static-analyzer 
-analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
+         -verify ./clang/test/Analysis/string.c
+   uftrace dump --filter=".*::AnalysisConsumer::HandleTranslationUnit" 
--time-filter=300 --chrome > trace.json
+
+.. image:: ../images/uftrace_detailed.png
+
+In this picture, you can see the functions below the Static Analyzer's entry 
point, which takes at least 300 nanoseconds to run, visualized by Chrome's 
``about:tracing`` page
+You can also see how deep function calls we may have due to AST visitors.
+
+Using different filters can reduce the number of functions to record.
+For the `common options 
<https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_,
 refer to the ``uftrace`` documentation.
+
+Similar filters could be applied for dumping too. That way you can reuse the 
same (detailed)
+recording to selectively focus on some special part using a refinement of the 
filter flags.
+Remember, the trace JSON needs to fit into Chrome's ``about:tracing`` or 
`speedscope <https://speedscope.app>`_,
+thus it needs to be of a limited size.
+In that case though, every dump operation would need to sieve through the 
whole recording if called repeatedly.
+
+If the trace JSON is still too large to load, have a look at the dump and look 
for frequent entries that refer to non-interesting parts.
+Once you have some of those, add them as ``--hide`` flags to the ``uftrace 
dump`` call.
+To see what functions appear frequently in the trace, use this command:
+
+.. code-block:: bash
+
+   cat trace.json | grep -Po '"name":"(.+)"' | sort | uniq -c | sort -nr | 
head -n 50
+
+``uftrace`` can also dump the report as a Flame graph using ``uftrace dump 
--framegraph``.
diff --git a/clang/docs/analyzer/images/flamegraph.svg 
b/clang/docs/analyzer/images/flamegraph.svg
new file mode 100644
index 000000000000000..d3c4a22c9ff536a
--- /dev/null
+++ b/clang/docs/analyzer/images/flamegraph.svg
@@ -0,0 +1,27641 @@
+<?xml version="1.0" standalone="no"?>
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" 
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd";>
+<svg version="1.1" width="1200" height="934" onload="init(evt)" viewBox="0 0 
1200 934" xmlns="http://www.w3.org/2000/svg"; 
xmlns:xlink="http://www.w3.org/1999/xlink";>
+<!-- Flame graph stack visualization. See 
https://github.com/brendangregg/FlameGraph for latest version, and 
http://www.brendangregg.com/flamegraphs.html for examples. -->
+<!-- NOTES:  -->
+<defs>
+       <linearGradient id="background" y1="0" y2="1" x1="0" x2="0" >
+               <stop stop-color="#eeeeee" offset="5%" />
+               <stop stop-color="#eeeeb0" offset="95%" />
+       </linearGradient>
+</defs>
+<style type="text/css">
+       text { font-family:Verdana; font-size:12px; fill:rgb(0,0,0); }
+       #search, #ignorecase { opacity:0.1; cursor:pointer; }
+       #search:hover, #search.show, #ignorecase:hover, #ignorecase.show { 
opacity:1; }
+       #subtitle { text-anchor:middle; font-color:rgb(160,160,160); }
+       #title { text-anchor:middle; font-size:17px}
+       #unzoom { cursor:pointer; }
+       #frames > *:hover { stroke:black; stroke-width:0.5; cursor:pointer; }
+       .hide { display:none; }
+       .parent { opacity:0.5; }
+</style>
+<script type="text/ecmascript">
+<![CDATA[
+       "use strict";
+       var details, searchbtn, unzoombtn, matchedtxt, svg, searching, 
currentSearchTerm, ignorecase, ignorecaseBtn;
+       function init(evt) {
+               details = document.getElementById("details").firstChild;
+               searchbtn = document.getElementById("search");
+               ignorecaseBtn = document.getElementById("ignorecase");
+               unzoombtn = document.getElementById("unzoom");
+               matchedtxt = document.getElementById("matched");
+               svg = document.getElementsByTagName("svg")[0];
+               searching = 0;
+               currentSearchTerm = null;
+
+               // use GET parameters to restore a flamegraphs state.
+               var params = get_params();
+               if (params.x && params.y)
+                       zoom(find_group(document.querySelector('[x="' + 
params.x + '"][y="' + params.y + '"]')));
+                if (params.s) search(params.s);
+       }
+
+       // event listeners
+       window.addEventListener("click", function(e) {
+               var target = find_group(e.target);
+               if (target) {
+                       if (target.nodeName == "a") {
+                               if (e.ctrlKey === false) return;
+                               e.preventDefault();
+                       }
+                       if (target.classList.contains("parent")) unzoom(true);
+                       zoom(target);
+                       if (!document.querySelector('.parent')) {
+                               // we have basically done a clearzoom so clear 
the url
+                               var params = get_params();
+                               if (params.x) delete params.x;
+                               if (params.y) delete params.y;
+                               history.replaceState(null, null, 
parse_params(params));
+                               unzoombtn.classList.add("hide");
+                               return;
+                       }
+
+                       // set parameters for zoom state
+                       var el = target.querySelector("rect");
+                       if (el && el.attributes && el.attributes.y && 
el.attributes._orig_x) {
+                               var params = get_params()
+                               params.x = el.attributes._orig_x.value;
+                               params.y = el.attributes.y.value;
+                               history.replaceState(null, null, 
parse_params(params));
+                       }
+               }
+               else if (e.target.id == "unzoom") clearzoom();
+               else if (e.target.id == "search") search_prompt();
+               else if (e.target.id == "ignorecase") toggle_ignorecase();
+       }, false)
+
+       // mouse-over for info
+       // show
+       window.addEventListener("mouseover", function(e) {
+               var target = find_group(e.target);
+               if (target) details.nodeValue = "Function: " + 
g_to_text(target);
+       }, false)
+
+       // clear
+       window.addEventListener("mouseout", function(e) {
+               var target = find_group(e.target);
+               if (target) details.nodeValue = ' ';
+       }, false)
+
+       // ctrl-F for search
+       // ctrl-I to toggle case-sensitive search
+       window.addEventListener("keydown",function (e) {
+               if (e.keyCode === 114 || (e.ctrlKey && e.keyCode === 70)) {
+                       e.preventDefault();
+                       search_prompt();
+               }
+               else if (e.ctrlKey && e.keyCode === 73) {
+                       e.preventDefault();
+                       toggle_ignorecase();
+               }
+       }, false)
+
+       // functions
+       function get_params() {
+               var params = {};
+               var paramsarr = window.location.search.substr(1).split('&');
+               for (var i = 0; i < paramsarr.length; ++i) {
+                       var tmp = paramsarr[i].split("=");
+                       if (!tmp[0] || !tmp[1]) continue;
+                       params[tmp[0]]  = decodeURIComponent(tmp[1]);
+               }
+               return params;
+       }
+       function parse_params(params) {
+               var uri = "?";
+               for (var key in params) {
+                       uri += key + '=' + encodeURIComponent(params[key]) + 
'&';
+               }
+               if (uri.slice(-1) == "&")
+                       uri = uri.substring(0, uri.length - 1);
+               if (uri == '?')
+                       uri = window.location.href.split('?')[0];
+               return uri;
+       }
+       function find_child(node, selector) {
+               var children = node.querySelectorAll(selector);
+               if (children.length) return children[0];
+       }
+       function find_group(node) {
+               var parent = node.parentElement;
+               if (!parent) return;
+               if (parent.id == "frames") return node;
+               return find_group(parent);
+       }
+       function orig_save(e, attr, val) {
+               if (e.attributes["_orig_" + attr] != undefined) return;
+               if (e.attributes[attr] == undefined) return;
+               if (val == undefined) val = e.attributes[attr].value;
+               e.setAttribute("_orig_" + attr, val);
+       }
+       function orig_load(e, attr) {
+               if (e.attributes["_orig_"+attr] == undefined) return;
+               e.attributes[attr].value = e.attributes["_orig_" + attr].value;
+               e.removeAttribute("_orig_"+attr);
+       }
+       function g_to_text(e) {
+               var text = find_child(e, "title").firstChild.nodeValue;
+               return (text)
+       }
+       function g_to_func(e) {
+               var func = g_to_text(e);
+               // if there's any manipulation we want to do to the function
+               // name before it's searched, do it here before returning.
+               return (func);
+       }
+       function update_text(e) {
+               var r = find_child(e, "rect");
+               var t = find_child(e, "text");
+               var w = parseFloat(r.attributes.width.value) -3;
+               var txt = find_child(e, 
"title").textContent.replace(/\([^(]*\)$/,"");
+               t.attributes.x.value = parseFloat(r.attributes.x.value) + 3;
+
+               // Smaller than this size won't fit anything
+               if (w < 2 * 12 * 0.59) {
+                       t.textContent = "";
+                       return;
+               }
+
+               t.textContent = txt;
+               var sl = t.getSubStringLength(0, txt.length);
+               // check if only whitespace or if we can fit the entire string 
into width w
+               if (/^ *$/.test(txt) || sl < w)
+                       return;
+
+               // this isn't perfect, but gives a good starting point
+               // and avoids calling getSubStringLength too often
+               var start = Math.floor((w/sl) * txt.length);
+               for (var x = start; x > 0; x = x-2) {
+                       if (t.getSubStringLength(0, x + 2) <= w) {
+                               t.textContent = txt.substring(0, x) + "..";
+                               return;
+                       }
+               }
+               t.textContent = "";
+       }
+
+       // zoom
+       function zoom_reset(e) {
+               if (e.attributes != undefined) {
+                       orig_load(e, "x");
+                       orig_load(e, "width");
+               }
+               if (e.childNodes == undefined) return;
+               for (var i = 0, c = e.childNodes; i < c.length; i++) {
+                       zoom_reset(c[i]);
+               }
+       }
+       function zoom_child(e, x, ratio) {
+               if (e.attributes != undefined) {
+                       if (e.attributes.x != undefined) {
+                               orig_save(e, "x");
+                               e.attributes.x.value = 
(parseFloat(e.attributes.x.value) - x - 10) * ratio + 10;
+                               if (e.tagName == "text")
+                                       e.attributes.x.value = 
find_child(e.parentNode, "rect[x]").attributes.x.value + 3;
+                       }
+                       if (e.attributes.width != undefined) {
+                               orig_save(e, "width");
+                               e.attributes.width.value = 
parseFloat(e.attributes.width.value) * ratio;
+                       }
+               }
+
+               if (e.childNodes == undefined) return;
+               for (var i = 0, c = e.childNodes; i < c.length; i++) {
+                       zoom_child(c[i], x - 10, ratio);
+               }
+       }
+       function zoom_parent(e) {
+               if (e.attributes) {
+                       if (e.attributes.x != undefined) {
+                               orig_save(e, "x");
+                               e.attributes.x.value = 10;
+                       }
+                       if (e.attributes.width != undefined) {
+                               orig_save(e, "width");
+                               e.attributes.width.value = 
parseInt(svg.width.baseVal.value) - (10 * 2);
+                       }
+               }
+               if (e.childNodes == undefined) return;
+               for (var i = 0, c = e.childNodes; i < c.length; i++) {
+                       zoom_parent(c[i]);
+               }
+       }
+       function zoom(node) {
+               var attr = find_child(node, "rect").attributes;
+               var width = parseFloat(attr.width.value);
+               var xmin = parseFloat(attr.x.value);
+               var xmax = parseFloat(xmin + width);
+               var ymin = parseFloat(attr.y.value);
+               var ratio = (svg.width.baseVal.value - 2 * 10) / width;
+
+               // XXX: Workaround for JavaScript float issues (fix me)
+               var fudge = 0.0001;
+
+               unzoombtn.classList.remove("hide");
+
+               var el = document.getElementById("frames").children;
+               for (var i = 0; i < el.length; i++) {
+                       var e = el[i];
+                       var a = find_child(e, "rect").attributes;
+                       var ex = parseFloat(a.x.value);
+                       var ew = parseFloat(a.width.value);
+                       var upstack;
+                       // Is it an ancestor
+                       if (0 == 0) {
+                               upstack = parseFloat(a.y.value) > ymin;
+                       } else {
+                               upstack = parseFloat(a.y.value) < ymin;
+                       }
+                       if (upstack) {
+                               // Direct ancestor
+                               if (ex <= xmin && (ex+ew+fudge) >= xmax) {
+                                       e.classList.add("parent");
+                                       zoom_parent(e);
+                                       update_text(e);
+                               }
+                               // not in current path
+                               else
+                                       e.classList.add("hide");
+                       }
+                       // Children maybe
+                       else {
+                               // no common path
+                               if (ex < xmin || ex + fudge >= xmax) {
+                                       e.classList.add("hide");
+                               }
+                               else {
+                                       zoom_child(e, xmin, ratio);
+                                       update_text(e);
+                               }
+                       }
+               }
+               search();
+       }
+       function unzoom(dont_update_text) {
+               unzoombtn.classList.add("hide");
+               var el = document.getElementById("frames").children;
+               for(var i = 0; i < el.length; i++) {
+                       el[i].classList.remove("parent");
+                       el[i].classList.remove("hide");
+                       zoom_reset(el[i]);
+                       if(!dont_update_text) update_text(el[i]);
+               }
+               search();
+       }
+       function clearzoom() {
+               unzoom();
+
+               // remove zoom state
+               var params = get_params();
+               if (params.x) delete params.x;
+               if (params.y) delete params.y;
+               history.replaceState(null, null, parse_params(params));
+       }
+
+       // search
+       function toggle_ignorecase() {
+               ignorecase = !ignorecase;
+               if (ignorecase) {
+                       ignorecaseBtn.classList.add("show");
+               } else {
+                       ignorecaseBtn.classList.remove("show");
+               }
+               reset_search();
+               search();
+       }
+       function reset_search() {
+               var el = document.querySelectorAll("#frames rect");
+               for (var i = 0; i < el.length; i++) {
+                       orig_load(el[i], "fill")
+               }
+               var params = get_params();
+               delete params.s;
+               history.replaceState(null, null, parse_params(params));
+       }
+       function search_prompt() {
+               if (!searching) {
+                       var term = prompt("Enter a search term (regexp " +
+                           "allowed, eg: ^ext4_)"
+                           + (ignorecase ? ", ignoring case" : "")
+                           + "\nPress Ctrl-i to toggle case sensitivity", "");
+                       if (term != null) search(term);
+               } else {
+                       reset_search();
+                       searching = 0;
+                       currentSearchTerm = null;
+                       searchbtn.classList.remove("show");
+                       searchbtn.firstChild.nodeValue = "Search"
+                       matchedtxt.classList.add("hide");
+                       matchedtxt.firstChild.nodeValue = ""
+               }
+       }
+       function search(term) {
+               if (term) currentSearchTerm = term;
+
+               var re = new RegExp(currentSearchTerm, ignorecase ? 'i' : '');
+               var el = document.getElementById("frames").children;
+               var matches = new Object();
+               var maxwidth = 0;
+               for (var i = 0; i < el.length; i++) {
+                       var e = el[i];
+                       var func = g_to_func(e);
+                       var rect = find_child(e, "rect");
+                       if (func == null || rect == null)
+                               continue;
+
+                       // Save max width. Only works as we have a root frame
+                       var w = parseFloat(rect.attributes.width.value);
+                       if (w > maxwidth)
+                               maxwidth = w;
+
+                       if (func.match(re)) {
+                               // highlight
+                               var x = parseFloat(rect.attributes.x.value);
+                               orig_save(rect, "fill");
+                               rect.attributes.fill.value = "rgb(230,0,230)";
+
+                               // remember matches
+                               if (matches[x] == undefined) {
+                                       matches[x] = w;
+                               } else {
+                                       if (w > matches[x]) {
+                                               // overwrite with parent
+                                               matches[x] = w;
+                                       }
+                               }
+                               searching = 1;
+                       }
+               }
+               if (!searching)
+                       return;
+               var params = get_params();
+               params.s = currentSearchTerm;
+               history.replaceState(null, null, parse_params(params));
+
+               searchbtn.classList.add("show");
+               searchbtn.firstChild.nodeValue = "Reset Search";
+
+               // calculate percent matched, excluding vertical overlap
+               var count = 0;
+               var lastx = -1;
+               var lastw = 0;
+               var keys = Array();
+               for (k in matches) {
+                       if (matches.hasOwnProperty(k))
+                               keys.push(k);
+               }
+               // sort the matched frames by their x location
+               // ascending, then width descending
+               keys.sort(function(a, b){
+                       return a - b;
+               });
+               // Step through frames saving only the biggest bottom-up frames
+               // thanks to the sort order. This relies on the tree property
+               // where children are always smaller than their parents.
+               var fudge = 0.0001;     // JavaScript floating point
+               for (var k in keys) {
+                       var x = parseFloat(keys[k]);
+                       var w = matches[keys[k]];
+                       if (x >= lastx + lastw - fudge) {
+                               count += w;
+                               lastx = x;
+                               lastw = w;
+                       }
+               }
+               // display matched percent
+               matchedtxt.classList.remove("hide");
+               var pct = 100 * count / maxwidth;
+               if (pct != 100) pct = pct.toFixed(1)
+               matchedtxt.firstChild.nodeValue = "Matched: " + pct + "%";
+       }
+]]>
+</script>
+<rect x="0.0" y="0" width="1200.0" height="934.0" fill="url(#background)"  />
+<text id="title" x="600.00" y="24" >Flame Graph</text>
+<text id="details" x="10.00" y="917" > </text>
+<text id="unzoom" x="10.00" y="24" class="hide">Reset Zoom</text>
+<text id="search" x="1090.00" y="24" >Search</text>
+<text id="ignorecase" x="1174.00" y="24" >ic</text>
+<text id="matched" x="1090.00" y="917" > </text>
+<g id="frames">
+<g >
+<title>[unknown] (140,334 samples, 0.09%)</title><rect x="311.8" y="469" 
width="1.1" height="15.0" fill="rgb(210,24,5)" rx="2" ry="2" />
+<text  x="314.78" y="479.5" ></text>
+</g>
+<g >
+<title>clang::Sema::CheckSingleAssignmentConstraints (259,475 samples, 
0.17%)</title><rect x="665.4" y="325" width="2.0" height="15.0" 
fill="rgb(254,226,54)" rx="2" ry="2" />
+<text  x="668.36" y="335.5" ></text>...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/126520
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [analyzer][docs] Document how to use perf and uftrace to debug performance issues (PR #126520)

Reply via email to