Hi Eric,

thanks for the feedback.

> Something I think would be good would be to compare/contrast against rr as
> an "exploring alternatives" section of the document.

I'll include that. I've done some comparative research on rr and I think I can provide valuable input.

> I think the document should also be made available/adapted to be part of
> the documentation on "why lldb is implementing this feature/what it can be
> used for/why".

I think this information is scattered throughout the document, but I'll make sure to answer this in one of the first paragraphs.

Thanks!

- Walter

On Fri, Sep 18, 2020 at 19:58, Eric Christopher <echri...@gmail.com> wrote:

> Hi Walter,
>
> I've only done a brief scan of the document but, in general, I'm favorable
> of the goals, aim, and approach. Something I think would be good would be
> to compare/contrast against rr as an "exploring alternatives" section of
> the document. I think the document should also be made available/adapted to
> be part of the documentation on "why lldb is implementing this feature/what
> it can be used for/why".
>
> Thanks so much for starting this and looking forward to the work and
> collaboration.
>
> -eric
>
> On Thu, Sep 17, 2020 at 8:28 PM Walter via lldb-dev <lldb-dev@lists.llvm.org> wrote:
>
>> Hi all,
>>
>> Here I propose, along with Greg Clayton, Processor Trace support for LLDB.
>> I’m attaching a link to the document that contains this proposal if that’s
>> easier to read for you:
>> https://docs.google.com/document/d/1cOVTGp1sL_HBXjP9eB7qjVtDNr5xnuZvUUtv43G5eVI/edit#heading=h.t5mblb9ugv8f
>>
>> Please make any comments on this mailing list.
>>
>> If you want to quickly know what Processor Trace can do, you can read this:
>> https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace
>>
>> Any comments are appreciated, especially the ones regarding the commands the
>> user will interact with.
>>
>> Thanks,
>>
>> Walter Erquinigo.
>>
>>
>> # RFC: Processor Trace Support in LLDB
>>
>>
>> # What is processor tracing?
>>
>> Processor tracing works by capturing information about the execution of a
>> process so that the control flow of the program can be reconstructed later.
>> Implementations of this are Intel Processor Trace for X86, x86_64
>> (https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html)
>> and ARM CoreSight for some ARM devices
>> (https://developer.arm.com/ip-products/system-ip/coresight-debug-and-trace).
>>
>> As a clarifying example, with these technologies it’s possible to trace all
>> the threads of a process, and after the process has finished, reconstruct
>> every single instruction address each thread has executed. This could
>> include some additional information like timestamps, async CPU events,
>> kernel instructions, bus clock ratio changes, etc. On the other hand, memory
>> and registers are not traced as a way to limit the size of the trace.
>>
>>
>> # Intel Processor Trace as the first implementation
>>
>> We’ll focus on Intel Processor Trace (Intel PT), but in a generic way so
>> that in the future similar technologies can be onboarded in LLDB.
>>
>> Intel PT has the following features:
>>
>> * Control flow tracing in a highly encoded format
>> * 3% to 5% slowdown when capturing
>> * No memory nor registers captured
>> * Kernel tracing support
>> * Timestamps of branches are produced, which can be used for profiling
>> * Adjustable size of trace buffer
>> * Supported on most Intel CPUs since 2015
>> * X86 and x86_64 only
>> * Official support only on Linux
>> * Basic support on Windows
>> * Decoding/analysis can be done on any operating system
>>
>> A very nice introduction to Intel PT can be found at
>> https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html
>> and
>> https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace.
>> Totally recommended to fully grasp the impact of this project.
>>
>> More technical details are in
>> https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/perf-intel-pt.txt.
>>
>> Even more technical details are in the processor manual
>> https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3c-part-3-manual.pdf
>>
>>
>> # Basic Definitions
>>
>> * Trace file: A trace file basically contains the information of the
>>   target addresses of each branch or jump within the program execution in a
>>   highly encoded format.
>> * Capturing: The act of tracing a process and producing a trace file.
>> * Decoding: Decoding outputs a sequential list of instructions given a
>>   trace file and the images of a process.
>>   Decoding is generally an offline step as it’s expensive.
>> * Trace buffer: In order to limit the size of the trace, an in-memory
>>   circular buffer can be used, keeping the most recent branching information.
>>   The trace file is a snapshot of this.
>> * Gap: Sporadically some branching information can be lost or be
>>   impossible to decode, which creates a gap in the reconstructed control flow.
>>
>>
>> # New LLDB features
>>
>> * Loading traces: We want to load traces, potentially from other computers,
>>   and have LLDB symbolicate them. A flow like the following should be possible:
>>
>> ```
>> $ trace load /path/to/trace
>> $ trace dump --instructions
>> pid: '1234', tid: '1981309'
>> a.out`main
>> [57] 0x400549 <+13>: movl %eax, -0x4(%rbp)
>> a.out`bar()
>> [56] 0x40053b <+46>: retq
>> [55] 0x40053a <+45>: leave
>> [54] 0x400537 <+42>: movl -0x4(%rbp), %eax
>> [53] 0x400535 <+40>: jle 0x400525 ; <+24> at main.cpp:7
>> [52] 0x400531 <+36>: cmpl $0x3, -0x8(%rbp)
>> [51] 0x40052d <+32>: addl $0x1, -0x8(%rbp)
>> [50] 0x40052a <+29>: addl %eax, -0x4(%rbp)
>> a.out`foo()
>> [49] 0x400567 <+15>: retq
>> [48] 0x400566 <+14>: popq %rbp
>> [47] 0x400563 <+11>: movl -0x4(%rbp), %eax
>> [46] 0x40055c <+4>: movl $0x2a, -0x4(%rbp)
>>
>> ...
>> [1] 0x400559 <+1>: movq %rsp, %rbp
>> [0] 0x400558 <+0>: pushq %rbp
>>
>> // Format:
>> // [instruction index] <instruction disassembly>
>> ```
>>
>>   Notice the resemblance to loading a core file, but in this case we can get
>>   the control flow, printed in reverse order in this example.
>>
>> * Decoding: LLDB can use libipt (https://github.com/intel/libipt), which
>>   is the low level Intel PT decoding library, to convert trace files into
>>   instructions.
>> * Showing instructions: LLDB can output the list of instructions of the
>>   control flow, as shown above.
>> * Showing function calls: Similarly, LLDB can print a hierarchical view of
>>   the function calls. A flow like this should be possible:
>>
>> ```
>> $ trace load /path/to/trace
>> $ trace dump --function-calls
>> pid: '1234', tid: '1981309'
>> [50] a.out`bar() 0x40052a
>> [45] a.out`zaz() 0x400558
>> [40] a.out`baz() 0x400559
>> [30] a.out`foo() 0x400567
>> [0] a.out`main 0x400000
>> ```
>>
>>   This functionality allows LLDB to reconstruct the call stack at any point
>>   and potentially do reverse debugging.
>>
>> * Capturing: LLDB can also do the Intel PT capturing of a live process, so
>>   that at any stop the user can do reverse stepping or simply inspect the
>>   trace. A possible flow is:
>>
>> ```
>> $ <stopped at main>
>> $ b main.cpp:50
>> $ trace start intel-pt // this initiates the tracing
>> $ continue
>> $ <stopped at main.cpp:50>
>> $ trace dump --instructions
>> pid: '1234', tid: '1981309'
>> a.out`main
>> [57] 0x400549 <+13>: movl %eax, -0x4(%rbp)
>> a.out`bar()
>> [56] 0x40053b <+46>: retq
>> [55] 0x40053a <+45>: leave
>> ```
>>
>> * Displaying time information: If the trace contains timing information,
>>   we could also display it along with each instruction, e.g.
>>
>> ```
>> a.out`bar()
>> [56: 1600284226]: 0x40053b <+46>: retq
>> ...
>> [4: 1600284200]: 0x40053a <+45>: leave
>> // Format:
>> // [instruction index: unix timestamp] <instruction disassembly>
>> ```
>>
>>   Furthermore, we could display the time spent in each function.
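
To make the "time spent in each function" idea concrete, here is a minimal Python sketch. It is not part of the proposal or of LLDB: the `TimedInstruction` record, the `time_per_function` helper and the sample data are all hypothetical. It assumes every decoded instruction already carries a timestamp and a symbolicated function name, and it charges the gap between two consecutive timestamps to the function that was running at the earlier instruction. Real Intel PT timing is coarser than one timestamp per instruction, so this only illustrates the bookkeeping.

```python
from collections import defaultdict
from typing import NamedTuple


class TimedInstruction(NamedTuple):
    """Hypothetical record for one decoded instruction."""
    index: int      # position in the decoded trace (0 = oldest)
    timestamp: int  # arbitrary time unit, assumed to be non-decreasing
    function: str   # symbolicated function name


def time_per_function(instructions):
    """Charge the gap between consecutive timestamps to the function
    that was executing at the earlier of the two instructions."""
    ordered = sorted(instructions, key=lambda i: i.index)
    totals = defaultdict(int)
    for current, following in zip(ordered, ordered[1:]):
        totals[current.function] += following.timestamp - current.timestamp
    return dict(totals)


# Made-up sample data, oldest instruction first.
trace = [
    TimedInstruction(0, 100, "main"),
    TimedInstruction(1, 103, "foo()"),
    TimedInstruction(2, 110, "foo()"),
    TimedInstruction(3, 112, "bar()"),
    TimedInstruction(4, 120, "main"),
]
print(time_per_function(trace))
```

The last instruction in the trace contributes no time of its own, since nothing follows it. A hierarchical (inclusive) view would additionally need the function-call reconstruction described above.
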
>>
>>
>> # Future LLDB features
>>
>> * Reverse Stepping: With the hierarchical reconstruction of the function
>>   calls, along with the individual instructions, LLDB can offer reverse
>>   stepping. Operations like reverse-next, reverse-step-out, reverse-continue
>>   could work by traversing the trace. We plan to work on this once the
>>   features presented above are in place.
>> * Trace-based profiling
>> * SB API of the mentioned features
>>
>>
>> # Why is this useful?
>>
>> * Bug root-causing:
>>   * For example, a crash in a production Release build ends up being
>>     analyzed with logs, a coredump, and a stack trace. Logs are not
>>     comprehensive, and a stack trace only contains the final state of the
>>     program. Providing the user with the control flow of the last milliseconds
>>     gives a tremendous amount of information that is game-changing in
>>     root-causing issues. It could be said that the user goes from a single stack
>>     trace to a list of stack traces.
>>   * Reverse stepping enables more efficient debugging, as it reduces the
>>     number of iterations needed to root-cause bugs. More often than not,
>>     reproducing a bug takes a considerable amount of time, and the user needs to
>>     reproduce it several times until the correct breakpoints are hit. Giving the
>>     user the information of what has been executed so far can help them figure
>>     out where to place a breakpoint, or very easily figure out what went wrong.
>> * Low cost: unlike other similar technologies, Intel PT has an almost
>>   negligible performance cost regardless of whether the build is optimized or
>>   not, making it appealing to a wide range of scenarios.
>> * This infrastructure can be used for enabling other tools like
>>   non-sample-based profilers with instruction-level accuracy, security
>>   analyzers that check if certain memory regions are executed, and trace
>>   comparators, which could find bugs by comparing similar traces.
>>
>>
>> # Goals of this document:
>>
>> * Gather feedback on the basic Trace implementation, which would include
>>   the following basic operations: loading, decoding, and dumping.
>> * Create awareness about this work.
>> * Get a green light on the current set of patches implementing this
>>   feature starting with https://reviews.llvm.org/D85705.
>>
>>
>> # Non-Goals:
>>
>> * Discuss how reverse-stepping will be implemented. This can be left for
>>   another discussion. Once the Trace architecture is in place and robust,
>>   reverse-stepping can then be discussed, as it’s a more controversial change
>>   than this one.
>> * Explain Intel PT thoroughly.
>>
>>
>> # Existing Tool Support
>>
>> * GDB has a basic implementation of the features above
>>   (https://sourceware.org/gdb/onlinedocs/gdb/Process-Record-and-Replay.html)
>>   and some ideas are taken from there.
>> * Perf is a standalone tool that can do capturing and decoding.
>> * The Linux kernel has full support for doing capturing at thread, logical
>>   cpu or cgroup level.
>> * Intel developed a basic version of Intel PT support in LLDB as an
>>   external plugin.
>>   https://reviews.llvm.org/D33674,
>>   https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b.
>>
>>
>> # New Trace Commands
>>
>> Based on this patch https://reviews.llvm.org/D85705, there
>> would be a common Trace class along with plug-in implementations.
>>
>>
>> ## Trace loading
>>
>>
>> ### $ trace load /path/to/trace/settings/file.json
>>
>> As decoding a trace requires the images of the object files, the trace files
>> and some CPU information, it’s convenient to have a JSON file that describes
>> an entire trace session. The following JSON schema could be used.
>>
>> ```
>> {
>>   "trace": {
>>     … // plug-in specific information
>>   },
>>   "processes": [ // process information common to all trace plug-ins
>>     {
>>       "pid": integer,
>>       "triple": string, // llvm-triple
>>       "threads": [
>>         {
>>           "tid": integer,
>>           "traceFile": string
>>         }
>>       ],
>>       "modules": [
>>         {
>>           "systemPath": string, // original path of the module at runtime
>>           "file"?: string, // copy of the file if not available at "systemPath"
>>           "loadAddress": string, // string address in hex or decimal form
>>           "uuid"?: string,
>>         }
>>       ]
>>     }
>>   ]
>> }
>> // Notes:
>> // All paths are either absolute or relative to the settings file.
>> ```
>>
>> **Corefiles:**
>>
>> We plan to extend this schema to support corefiles, but we would leave it
>> out of this discussion, as it can easily be seen as an extension of this basic
>> schema.
>>
>> **Implementation details:**
>>
>> To make our first implementation easier, we’ll ask for an individual trace
>> file per thread. This is the simpler collection mode for Intel PT.
>>
>> The entire json file will be translated into a Trace object, which contains
>> the trace information of each thread and process in it.
>>
>> Each process in the json file will be represented as a new Target.
>> Similarly, threads and modules for each target will be created following the
>> json file. This is very similar to what loading a minidump or coredump does.
>>
>> Each Target will be associated with a Trace, and multiple targets can share
>> the same Trace. The contract is that Trace is assumed to end at the current
>> PC of each thread of the target.
>>
>>
>> ### $ trace schema <plug-in>
>>
>> This command prints the JSON schema of the trace settings file for the
>> provided plug-in. It would output something similar to this:
>>
>> ```
>> {
>>   "trace": {
>>     "type": "intel-pt",
>>     "pt_cpu": {
>>       "vendor": "intel" | "unknown",
>>       "family": integer,
>>       "model": integer,
>>       "stepping": integer
>>     }
>>   },
>>   "processes": [
>>     {
>>       "pid": integer,
>>       "triple": string, // llvm-triple
>>       "threads": [
>>         {
>>           "tid": integer,
>>           "traceFile": string
>>         }
>>       ],
>>       "modules": [
>>         {
>>           "systemPath": string, // original path of the module at runtime
>>           "file"?: string, // copy of the file if not available at "systemPath"
>>           "loadAddress": string, // string address in hex or decimal form
>>           "uuid"?: string,
>>         }
>>       ]
>>     }
>>   ]
>> }
>> // Notes:
>> // All paths are either absolute or relative to the settings file.
>> ```
>>
>>
>> ### $ trace dump [--verbose] [-t tid1] [-t tid2] ...
>>
>> Print the trace information corresponding to the provided thread ids of the
>> currently selected target, which would mainly include the same information
>> as the trace settings file. If no tid is provided, the currently selected
>> thread is used. This would be useful for debugging.
>> The information would be like:
>>
>> Modules:
>>
>>   <module info like systemPath, file, load address, uuid, size>
>>
>> Threads:
>>
>>   <thread info like location of trace file, number of instructions (if
>>   already decoded), number of function calls (if already decoded)>
>>
>> If <--verbose> is passed, the original settings.json file is printed as
>> well.
>>
>>
>> ## Decoder-based commands
>>
>> The following commands require decoding the trace and are of the form
>> “trace dump <action> [-t <tid>]”. If tids are not specified, then the
>> current thread of the current target will be used.
>>
>>
>> ### $ trace dump --instructions [-t <tid>] [-c <count> = 10] [-o <offset> = 0]
>>
>> This command would print the last <count> instructions starting at the
>> given offset from the last instruction in the trace. The output would be
>> similar to that of the “disassembly” command and would include timing
>> information if available.
>>
>> ```
>> $ trace dump --instructions -c 5
>> pid: '1234', tid: '1981309'
>> a.out`main
>> [57] 0x400549 <+13>: movl %eax, -0x4(%rbp)
>> a.out`bar()
>> [56] 0x40053b <+46>: retq
>> [55] 0x40053a <+45>: leave
>> [54] error -13. 'no memory mapped at this address'
>> a.out`foo()
>> [53] 0x400567 <+15>: retq
>> ```
>>
>> Repeating the command would continue printing where it was left off in the
>> last run.
>>
>> **Implementation details:**
>>
>> Each instruction output by the decoder is either an actual instruction or an
>> error. An error can be caused by a collection error (e.g. internal CPU
>> buffer overflow error) or a decoding error (e.g. the image of an object file
>> is missing while decoding). These errors represent gaps in the trace and the
>> user should know about them, so we print them accordingly in this dump.
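
As an illustration of the "instruction or error" model and of the -c/-o windowing, here is a small Python sketch. It is not LLDB code and does not use libipt: the `Instruction` and `DecodeError` types and the `dump_instructions` helper are hypothetical names, and the output format only loosely follows the example dump above (no symbolication, no function headers).

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Instruction:
    address: int
    disassembly: str


@dataclass
class DecodeError:
    code: int
    message: str


# Every position in the decoded trace holds either an instruction or an error.
TraceItem = Union[Instruction, DecodeError]


def dump_instructions(items: List[TraceItem], count: int = 10, offset: int = 0) -> List[str]:
    """Format the last `count` items after skipping `offset` items from the
    end, newest first. Errors keep their own index so gaps stay visible."""
    end = len(items) - offset
    start = max(end - count, 0)
    lines = []
    for index in range(end - 1, start - 1, -1):
        item = items[index]
        if isinstance(item, DecodeError):
            lines.append(f"[{index}] error {item.code}. '{item.message}'")
        else:
            lines.append(f"[{index}] {item.address:#x}: {item.disassembly}")
    return lines


# Made-up decoded trace, oldest item first, with one gap at index 1.
items = [
    Instruction(0x400567, "retq"),
    DecodeError(-13, "no memory mapped at this address"),
    Instruction(0x40053A, "leave"),
    Instruction(0x40053B, "retq"),
]
for line in dump_instructions(items, count=3):
    print(line)
```

Repeating the command to continue where the previous dump stopped would then amount to calling the helper again with `offset` increased by the number of items already printed.
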
>>
>> Each instruction (including errors) has an index in the decoded trace, and
>> serves as a checkpoint.
>>
>>
>> ### $ trace dump --function-calls [-t <tid>] [-c <count> = 10] [-o <offset> = 0] [--flat]
>>
>> This command would print the hierarchical list of function calls. Similar to
>> the “--instructions” command, it would show the last <count> function
>> calls with the given offset from the last instruction. Timing information
>> would be included if available.
>>
>> ```
>> $ trace dump --function-calls
>> pid: '1234', tid: '1981309'
>> [50] a.out`bar() 0x40052a
>> [45] a.out`zaz() 0x400558
>> [40] a.out`baz() 0x400559
>> [30] a.out`foo() 0x400567
>> [0] a.out`main 0x400000
>> ```
>>
>> Repeating the command would continue printing where it was left off in the
>> last run.
>>
>> If <--flat> is passed, then instead of a hierarchical view, a flat list
>> would be produced.
>>
>>
>> ## Capturing command
>>
>>
>> ### $ trace start <plugin_name> [-t <tid>] [--all] [-b <buffer_size_in_KB>]
>>
>> This command will start tracing the given thread of the currently selected
>> target, or all the threads of that target if “--all” is passed. If “--all”
>> is passed, any thread created after this command will also be traced
>> automatically.
>>
>> In addition, the optional -b parameter can define the size of each trace buffer
>> to be created. I haven’t yet decided on a default one, but 1M might be
>> acceptable, as it traces around 1 million instructions on average according
>> to Intel, and that’s more than enough for a useful analysis.
>>
>> For an initial implementation, the plugin_name parameter will be required
>> (e.g. intel-pt). Later a more automated mechanism for finding the right
>> plugin can be implemented.
>>
>> **Implementation notes:**
>>
>> There’s already a basic implementation in lldb as an external plugin.
>> It’s in
>> https://reviews.llvm.org/source/llvm-github/browse/master/lldb/tools/intel-features/intel-pt/
>> created by
>> https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b.
>> It hasn’t received much attention and has been mostly unmaintained since it
>> was created. It’s already capable of tracing a given thread and collecting
>> the trace buffer. We plan to reuse that logic, which is already working.
>>
>> A Trace object will be created and will be associated with the current
>> Target.
>>
>> Any interaction with trace, like dumping instructions, will trigger a fetch
>> of the most recent trace buffer, unless it hasn’t changed.
>>
>> When multiple threads are traced, each one will have its own trace buffer,
>> as sharing one buffer in multiple threads requires knowing when each context
>> switch happened so that the decoded trace can be split correctly among
>> threads. This is beyond the scope of the initial version of this project.
>>
>>
>> ### $ trace save /path/to/file.json [--copy-images]
>>
>> This creates a bundle trace with settings saved in the given json file for
>> the current process. By default, it doesn’t create any copy of the images
>> loaded on the process, unless the “--copy-images” parameter is specified.
>> That parameter is useful for analyzing the trace in a machine other than
>> where it was captured.
>>
>>
>> # Remote Protocol Changes
>>
>> No remote protocol changes are required, as
>> https://reviews.llvm.org/D33674 and
>> https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b
>> already created them some years ago.
>>
>>
>> # Build Requirements
>>
>> In order to build LLDB with this support, it has to be linked with a build
>> of libipt (https://github.com/intel/libipt), which is the decoder.
>>
>>
>> # Operating System Requirements for Collection/Tracing
>>
>> Collection can only be done on Linux if the file
>> /sys/bus/event_source/devices/intel_pt/type is defined. The logic gating
>> this feature is already checked in and defined in
>> https://reviews.llvm.org/D33674.
>>
>>
>> # Testing
>>
>> It’s fortunately straightforward to test this feature. It’s possible to
>> capture traces with perf or with the future “trace start” / “trace save”
>> commands and create trace bundles with their corresponding settings .json
>> file. Analyzing those traces should give the same results on any machine,
>> making testing deterministic.
>> https://reviews.llvm.org/D85705 and
>> descendants already implement some deterministic tests.
>>
>> _______________________________________________
>> lldb-dev mailing list
>> lldb-dev@lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>
>

--
- Walter Erquínigo Pezo