llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT--> @llvm/pr-subscribers-clang-static-analyzer-1 Author: Balazs Benics (steakhal) <details> <summary>Changes</summary> --- Patch is 1.37 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/126520.diff 3 Files Affected: - (modified) clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst (+91) - (added) clang/docs/analyzer/images/flamegraph.svg (+27641) - (added) clang/docs/analyzer/images/uftrace_detailed.png () ``````````diff diff --git a/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst b/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst index 3ee6e117a846528..c33853aca76168d 100644 --- a/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst +++ b/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst @@ -5,6 +5,9 @@ Performance Investigation Multiple factors contribute to the time it takes to analyze a file with Clang Static Analyzer. A translation unit contains multiple entry points, each of which take multiple steps to analyze. +Performance analysis using ``-ftime-trace`` +=========================================== + You can add the ``-ftime-trace=file.json`` option to break down the analysis time into individual entry points and steps within each entry point. You can explore the generated JSON file in a Chromium browser using the ``chrome://tracing`` URL, or using `speedscope <https://speedscope.app>`_. @@ -45,3 +48,91 @@ Note: Both Chrome-tracing and speedscope tools might struggle with time traces a Luckily, in most cases the default max-steps boundary of 225 000 produces the traces of approximately that size for a single entry point. You can use ``-analyze-function=get_global_options`` together with ``-ftime-trace`` to narrow down analysis to a specific entry point. + + +Performance analysis using ``perf`` +=================================== + +`Perf <https://perfwiki.github.io/main/>`_ is an excellent tool for sampling-based profiling of an application. +It's easy to start profiling, you only have 2 prerequisites. +Build with ``-fno-omit-frame-pointer`` and debug info (``-g``). +You can use release builds, but probably the easiest is to set the ``CMAKE_BUILD_TYPE=RelWithDebInfo`` +along with ``CMAKE_CXX_FLAGS="-fno-omit-frame-pointer"`` when configuring ``llvm``. +Here is how to `get started <https://llvm.org/docs/CMake.html#quick-start>`_ if you are in trouble. + +.. code-block:: bash + :caption: Running the Clang Static Analyzer through ``perf`` to gather samples of the execution. + + # -F: Sampling frequency, use `-F max` for maximal frequency + # -g: Enable call-graph recording for both kernel and user space + perf record -F 99 -g -- clang -cc1 -nostdsysteminc -analyze -analyzer-constraints=range \ + -setup-static-analyzer -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \ + -verify ./clang/test/Analysis/string.c + +Once you have the profile data, you can use it to produce a Flame graph. +A Flame graph is a visual representation of the stack frames of the samples. +Common stack frame prefixes are squashed together, making up a wider bar. +The wider the bar, the more time was spent under that particular stack frame, +giving a sense of how the overall execution time was spent. + +Clone the `FlameGraph <https://github.com/brendangregg/FlameGraph>`_ git repository, +as we will use some scripts from there to convert the ``perf`` samples into a Flame graph. +It's also useful to check out Brendan Gregg's (the author of FlameGraph) +`homepage <https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html>`_. + + +.. code-block:: bash + :caption: Converting the ``perf`` profile into a Flamegraph, then opening it in Firefox. + + perf script | /path/to/FlameGraph/stackcollapse-perf.pl > perf.folded + /path/to/FlameGraph/flamegraph.pl perf.folded > perf.svg + firefox perf.svg + +.. image:: ../images/flamegraph.svg + + +Performance analysis using ``uftrace`` +====================================== + +`uftrace <https://github.com/namhyung/uftrace/wiki/Tutorial#getting-started>`_ is a great tool to generate rich profile data +that you could use to focus and drill down into the timeline of your application. +We will use it to generate Chromium trace JSON. +In contrast to ``perf``, this approach statically instruments every function, so it should be more precise and through than the sampling-based approaches like ``perf``. +In contrast to using `-ftime-trace`, functions don't need to opt-in to be profiled using ``llvm::TimeTraceScope``. +All functions are profiled due to static instrumentation. + +There is only one prerequisite to use this tool. +You need to build the binary you are about to instrument using ``-pg`` or ``-finstrument-functions``. +This will make it run substantially slower but allows rich instrumentation. + +.. code-block:: bash + :caption: Recording with ``uftrace``, then dumping the result as a Chrome trace JSON. + + uftrace record clang -cc1 -nostdsysteminc -analyze -analyzer-constraints=range \ + -setup-static-analyzer -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \ + -verify ./clang/test/Analysis/string.c + uftrace dump --filter=".*::AnalysisConsumer::HandleTranslationUnit" --time-filter=300 --chrome > trace.json + +.. image:: ../images/uftrace_detailed.png + +In this picture, you can see the functions below the Static Analyzer's entry point, which takes at least 300 nanoseconds to run, visualized by Chrome's ``about:tracing`` page +You can also see how deep function calls we may have due to AST visitors. + +Using different filters can reduce the number of functions to record. +For the `common options <https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_, refer to the ``uftrace`` documentation. + +Similar filters could be applied for dumping too. That way you can reuse the same (detailed) +recording to selectively focus on some special part using a refinement of the filter flags. +Remember, the trace JSON needs to fit into Chrome's ``about:tracing`` or `speedscope <https://speedscope.app>`_, +thus it needs to be of a limited size. +In that case though, every dump operation would need to sieve through the whole recording if called repeatedly. + +If the trace JSON is still too large to load, have a look at the dump and look for frequent entries that refer to non-interesting parts. +Once you have some of those, add them as ``--hide`` flags to the ``uftrace dump`` call. +To see what functions appear frequently in the trace, use this command: + +.. code-block:: bash + + cat trace.json | grep -Po '"name":"(.+)"' | sort | uniq -c | sort -nr | head -n 50 + +``uftrace`` can also dump the report as a Flame graph using ``uftrace dump --framegraph``. diff --git a/clang/docs/analyzer/images/flamegraph.svg b/clang/docs/analyzer/images/flamegraph.svg new file mode 100644 index 000000000000000..d3c4a22c9ff536a --- /dev/null +++ b/clang/docs/analyzer/images/flamegraph.svg @@ -0,0 +1,27641 @@ +<?xml version="1.0" standalone="no"?> +<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"> +<svg version="1.1" width="1200" height="934" onload="init(evt)" viewBox="0 0 1200 934" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"> +<!-- Flame graph stack visualization. See https://github.com/brendangregg/FlameGraph for latest version, and http://www.brendangregg.com/flamegraphs.html for examples. --> +<!-- NOTES: --> +<defs> + <linearGradient id="background" y1="0" y2="1" x1="0" x2="0" > + <stop stop-color="#eeeeee" offset="5%" /> + <stop stop-color="#eeeeb0" offset="95%" /> + </linearGradient> +</defs> +<style type="text/css"> + text { font-family:Verdana; font-size:12px; fill:rgb(0,0,0); } + #search, #ignorecase { opacity:0.1; cursor:pointer; } + #search:hover, #search.show, #ignorecase:hover, #ignorecase.show { opacity:1; } + #subtitle { text-anchor:middle; font-color:rgb(160,160,160); } + #title { text-anchor:middle; font-size:17px} + #unzoom { cursor:pointer; } + #frames > *:hover { stroke:black; stroke-width:0.5; cursor:pointer; } + .hide { display:none; } + .parent { opacity:0.5; } +</style> +<script type="text/ecmascript"> +<![CDATA[ + "use strict"; + var details, searchbtn, unzoombtn, matchedtxt, svg, searching, currentSearchTerm, ignorecase, ignorecaseBtn; + function init(evt) { + details = document.getElementById("details").firstChild; + searchbtn = document.getElementById("search"); + ignorecaseBtn = document.getElementById("ignorecase"); + unzoombtn = document.getElementById("unzoom"); + matchedtxt = document.getElementById("matched"); + svg = document.getElementsByTagName("svg")[0]; + searching = 0; + currentSearchTerm = null; + + // use GET parameters to restore a flamegraphs state. + var params = get_params(); + if (params.x && params.y) + zoom(find_group(document.querySelector('[x="' + params.x + '"][y="' + params.y + '"]'))); + if (params.s) search(params.s); + } + + // event listeners + window.addEventListener("click", function(e) { + var target = find_group(e.target); + if (target) { + if (target.nodeName == "a") { + if (e.ctrlKey === false) return; + e.preventDefault(); + } + if (target.classList.contains("parent")) unzoom(true); + zoom(target); + if (!document.querySelector('.parent')) { + // we have basically done a clearzoom so clear the url + var params = get_params(); + if (params.x) delete params.x; + if (params.y) delete params.y; + history.replaceState(null, null, parse_params(params)); + unzoombtn.classList.add("hide"); + return; + } + + // set parameters for zoom state + var el = target.querySelector("rect"); + if (el && el.attributes && el.attributes.y && el.attributes._orig_x) { + var params = get_params() + params.x = el.attributes._orig_x.value; + params.y = el.attributes.y.value; + history.replaceState(null, null, parse_params(params)); + } + } + else if (e.target.id == "unzoom") clearzoom(); + else if (e.target.id == "search") search_prompt(); + else if (e.target.id == "ignorecase") toggle_ignorecase(); + }, false) + + // mouse-over for info + // show + window.addEventListener("mouseover", function(e) { + var target = find_group(e.target); + if (target) details.nodeValue = "Function: " + g_to_text(target); + }, false) + + // clear + window.addEventListener("mouseout", function(e) { + var target = find_group(e.target); + if (target) details.nodeValue = ' '; + }, false) + + // ctrl-F for search + // ctrl-I to toggle case-sensitive search + window.addEventListener("keydown",function (e) { + if (e.keyCode === 114 || (e.ctrlKey && e.keyCode === 70)) { + e.preventDefault(); + search_prompt(); + } + else if (e.ctrlKey && e.keyCode === 73) { + e.preventDefault(); + toggle_ignorecase(); + } + }, false) + + // functions + function get_params() { + var params = {}; + var paramsarr = window.location.search.substr(1).split('&'); + for (var i = 0; i < paramsarr.length; ++i) { + var tmp = paramsarr[i].split("="); + if (!tmp[0] || !tmp[1]) continue; + params[tmp[0]] = decodeURIComponent(tmp[1]); + } + return params; + } + function parse_params(params) { + var uri = "?"; + for (var key in params) { + uri += key + '=' + encodeURIComponent(params[key]) + '&'; + } + if (uri.slice(-1) == "&") + uri = uri.substring(0, uri.length - 1); + if (uri == '?') + uri = window.location.href.split('?')[0]; + return uri; + } + function find_child(node, selector) { + var children = node.querySelectorAll(selector); + if (children.length) return children[0]; + } + function find_group(node) { + var parent = node.parentElement; + if (!parent) return; + if (parent.id == "frames") return node; + return find_group(parent); + } + function orig_save(e, attr, val) { + if (e.attributes["_orig_" + attr] != undefined) return; + if (e.attributes[attr] == undefined) return; + if (val == undefined) val = e.attributes[attr].value; + e.setAttribute("_orig_" + attr, val); + } + function orig_load(e, attr) { + if (e.attributes["_orig_"+attr] == undefined) return; + e.attributes[attr].value = e.attributes["_orig_" + attr].value; + e.removeAttribute("_orig_"+attr); + } + function g_to_text(e) { + var text = find_child(e, "title").firstChild.nodeValue; + return (text) + } + function g_to_func(e) { + var func = g_to_text(e); + // if there's any manipulation we want to do to the function + // name before it's searched, do it here before returning. + return (func); + } + function update_text(e) { + var r = find_child(e, "rect"); + var t = find_child(e, "text"); + var w = parseFloat(r.attributes.width.value) -3; + var txt = find_child(e, "title").textContent.replace(/\([^(]*\)$/,""); + t.attributes.x.value = parseFloat(r.attributes.x.value) + 3; + + // Smaller than this size won't fit anything + if (w < 2 * 12 * 0.59) { + t.textContent = ""; + return; + } + + t.textContent = txt; + var sl = t.getSubStringLength(0, txt.length); + // check if only whitespace or if we can fit the entire string into width w + if (/^ *$/.test(txt) || sl < w) + return; + + // this isn't perfect, but gives a good starting point + // and avoids calling getSubStringLength too often + var start = Math.floor((w/sl) * txt.length); + for (var x = start; x > 0; x = x-2) { + if (t.getSubStringLength(0, x + 2) <= w) { + t.textContent = txt.substring(0, x) + ".."; + return; + } + } + t.textContent = ""; + } + + // zoom + function zoom_reset(e) { + if (e.attributes != undefined) { + orig_load(e, "x"); + orig_load(e, "width"); + } + if (e.childNodes == undefined) return; + for (var i = 0, c = e.childNodes; i < c.length; i++) { + zoom_reset(c[i]); + } + } + function zoom_child(e, x, ratio) { + if (e.attributes != undefined) { + if (e.attributes.x != undefined) { + orig_save(e, "x"); + e.attributes.x.value = (parseFloat(e.attributes.x.value) - x - 10) * ratio + 10; + if (e.tagName == "text") + e.attributes.x.value = find_child(e.parentNode, "rect[x]").attributes.x.value + 3; + } + if (e.attributes.width != undefined) { + orig_save(e, "width"); + e.attributes.width.value = parseFloat(e.attributes.width.value) * ratio; + } + } + + if (e.childNodes == undefined) return; + for (var i = 0, c = e.childNodes; i < c.length; i++) { + zoom_child(c[i], x - 10, ratio); + } + } + function zoom_parent(e) { + if (e.attributes) { + if (e.attributes.x != undefined) { + orig_save(e, "x"); + e.attributes.x.value = 10; + } + if (e.attributes.width != undefined) { + orig_save(e, "width"); + e.attributes.width.value = parseInt(svg.width.baseVal.value) - (10 * 2); + } + } + if (e.childNodes == undefined) return; + for (var i = 0, c = e.childNodes; i < c.length; i++) { + zoom_parent(c[i]); + } + } + function zoom(node) { + var attr = find_child(node, "rect").attributes; + var width = parseFloat(attr.width.value); + var xmin = parseFloat(attr.x.value); + var xmax = parseFloat(xmin + width); + var ymin = parseFloat(attr.y.value); + var ratio = (svg.width.baseVal.value - 2 * 10) / width; + + // XXX: Workaround for JavaScript float issues (fix me) + var fudge = 0.0001; + + unzoombtn.classList.remove("hide"); + + var el = document.getElementById("frames").children; + for (var i = 0; i < el.length; i++) { + var e = el[i]; + var a = find_child(e, "rect").attributes; + var ex = parseFloat(a.x.value); + var ew = parseFloat(a.width.value); + var upstack; + // Is it an ancestor + if (0 == 0) { + upstack = parseFloat(a.y.value) > ymin; + } else { + upstack = parseFloat(a.y.value) < ymin; + } + if (upstack) { + // Direct ancestor + if (ex <= xmin && (ex+ew+fudge) >= xmax) { + e.classList.add("parent"); + zoom_parent(e); + update_text(e); + } + // not in current path + else + e.classList.add("hide"); + } + // Children maybe + else { + // no common path + if (ex < xmin || ex + fudge >= xmax) { + e.classList.add("hide"); + } + else { + zoom_child(e, xmin, ratio); + update_text(e); + } + } + } + search(); + } + function unzoom(dont_update_text) { + unzoombtn.classList.add("hide"); + var el = document.getElementById("frames").children; + for(var i = 0; i < el.length; i++) { + el[i].classList.remove("parent"); + el[i].classList.remove("hide"); + zoom_reset(el[i]); + if(!dont_update_text) update_text(el[i]); + } + search(); + } + function clearzoom() { + unzoom(); + + // remove zoom state + var params = get_params(); + if (params.x) delete params.x; + if (params.y) delete params.y; + history.replaceState(null, null, parse_params(params)); + } + + // search + function toggle_ignorecase() { + ignorecase = !ignorecase; + if (ignorecase) { + ignorecaseBtn.classList.add("show"); + } else { + ignorecaseBtn.classList.remove("show"); + } + reset_search(); + search(); + } + function reset_search() { + var el = document.querySelectorAll("#frames rect"); + for (var i = 0; i < el.length; i++) { + orig_load(el[i], "fill") + } + var params = get_params(); + delete params.s; + history.replaceState(null, null, parse_params(params)); + } + function search_prompt() { + if (!searching) { + var term = prompt("Enter a search term (regexp " + + "allowed, eg: ^ext4_)" + + (ignorecase ? ", ignoring case" : "") + + "\nPress Ctrl-i to toggle case sensitivity", ""); + if (term != null) search(term); + } else { + reset_search(); + searching = 0; + currentSearchTerm = null; + searchbtn.classList.remove("show"); + searchbtn.firstChild.nodeValue = "Search" + matchedtxt.classList.add("hide"); + matchedtxt.firstChild.nodeValue = "" + } + } + function search(term) { + if (term) currentSearchTerm = term; + + var re = new RegExp(currentSearchTerm, ignorecase ? 'i' : ''); + var el = document.getElementById("frames").children; + var matches = new Object(); + var maxwidth = 0; + for (var i = 0; i < el.length; i++) { + var e = el[i]; + var func = g_to_func(e); + var rect = find_child(e, "rect"); + if (func == null || rect == null) + continue; + + // Save max width. Only works as we have a root frame + var w = parseFloat(rect.attributes.width.value); + if (w > maxwidth) + maxwidth = w; + + if (func.match(re)) { + // highlight + var x = parseFloat(rect.attributes.x.value); + orig_save(rect, "fill"); + rect.attributes.fill.value = "rgb(230,0,230)"; + + // remember matches + if (matches[x] == undefined) { + matches[x] = w; + } else { + if (w > matches[x]) { + // overwrite with parent + matches[x] = w; + } + } + searching = 1; + } + } + if (!searching) + return; + var params = get_params(); + params.s = currentSearchTerm; + history.replaceState(null, null, parse_params(params)); + + searchbtn.classList.add("show"); + searchbtn.firstChild.nodeValue = "Reset Search"; + + // calculate percent matched, excluding vertical overlap + var count = 0; + var lastx = -1; + var lastw = 0; + var keys = Array(); + for (k in matches) { + if (matches.hasOwnProperty(k)) + keys.push(k); + } + // sort the matched frames by their x location + // ascending, then width descending + keys.sort(function(a, b){ + return a - b; + }); + // Step through frames saving only the biggest bottom-up frames + // thanks to the sort order. This relies on the tree property + // where children are always smaller than their parents. + var fudge = 0.0001; // JavaScript floating point + for (var k in keys) { + var x = parseFloat(keys[k]); + var w = matches[keys[k]]; + if (x >= lastx + lastw - fudge) { + count += w; + lastx = x; + lastw = w; + } + } + // display matched percent + matchedtxt.classList.remove("hide"); + var pct = 100 * count / maxwidth; + if (pct != 100) pct = pct.toFixed(1) + matchedtxt.firstChild.nodeValue = "Matched: " + pct + "%"; + } +]]> +</script> +<rect x="0.0" y="0" width="1200.0" height="934.0" fill="url(#background)" /> +<text id="title" x="600.00" y="24" >Flame Graph</text> +<text id="details" x="10.00" y="917" > </text> +<text id="unzoom" x="10.00" y="24" class="hide">Reset Zoom</text> +<text id="search" x="1090.00" y="24" >Search</text> +<text id="ignorecase" x="1174.00" y="24" >ic</text> +<text id="matched" x="1090.00" y="917" > </text> +<g id="frames"> +<g > +<title>[unknown] (140,334 samples, 0.09%)</title><rect x="311.8" y="469" width="1.1" height="15.0" fill="rgb(210,24,5)" rx="2" ry="2" /> +<text x="314.78" y="479.5" ></text> +</g> +<g > +<title>clang::Sema::CheckSingleAssignmentConstraints (259,475 samples, 0.17%)</title><rect x="665.4" y="325" width="2.0" height="15.0" fill="rgb(254,226,54)" rx="2" ry="2" /> +<text x="668.36" y="335.5" ></text>... [truncated] `````````` </details> https://github.com/llvm/llvm-project/pull/126520 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits