https://bugs.kde.org/show_bug.cgi?id=492835

            Bug ID: 492835
           Summary: No way to tell callgrind/cachegrind about just-in-time
                    compiled frames
    Classification: Developer tools
           Product: valgrind
           Version: 3.23.0
          Platform: unspecified
                OS: Linux
            Status: REPORTED
          Severity: wishlist
          Priority: NOR
         Component: callgrind
          Assignee: josef.weidendor...@gmx.de
          Reporter: timona...@gmail.com
  Target Milestone: ---

Created attachment 173455
  --> https://bugs.kde.org/attachment.cgi?id=173455&action=edit
callgrind.out before and after attempting to "augment" it, plus matching perf
map file.

SUMMARY

I'm working on MoarVM, a language runtime that includes a jit compiler.

When a program generates its own code, there are already several ways to tell
a variety of developer tools what those memory regions are, along with
whatever extra information you like:

 * In order to unwind stack frames coming from jitted code, GDB lets you notify
it of new functions being created. You store whatever extra custom info you
need in client-program memory, and then load a .so into gdb itself as a "jit
reader". I believe you can then also add a lot of other information if you
like, since much of the GDB API for blocks and symbols is usable.
     * MoarVM jit reader (not merged or part of any release yet):
https://github.com/MoarVM/MoarVM/commit/3c63afed7d524852aab9a9335b9d1089d2f5410b
     * gdb documentation about the jit reader:
https://sourceware.org/gdb/current/onlinedocs/gdb.html/Writing-JIT-Debug-Info-Readers.html
 * In order to get samples merged into the frames they belong to when recording
with perf, you can write a perf-$PID.map to /tmp that assigns a name to a start
address + length tuple
     *
https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jit-interface.txt
     * lots of VMs have some kind of flag or agent/plugin that creates the perf
maps, for example the perf-map-agent for java
     * When using just a perf map, the "annotate" function in `perf report`
will not work. This is a view that shows machine code and source code lines
along with counts of corresponding samples.
     * MoarVM looks at the env var MVM_JIT_PERF_MAP; if it is set to anything
other than the empty string, it generates the file.
 * For more complex needs, your program can write out a "jitdump" file that is
then used with `perf inject` to augment a `perf.data` file with recordings.
`perf inject` can also be used to add buildid annotations in the right places
of a `perf.data` file.
     *
https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/perf-inject.txt
     *
https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jitdump-specification.txt
     * I believe that jitdump is what makes it possible to "annotate" jitted
frames in the `perf report` UI.
 * libunwind has an interface where you can describe how registers relevant to
unwinding change at different positions for the instruction pointer:
     * https://www.nongnu.org/libunwind/man/libunwind-dynamic(3).html
 * surely there are other things like this that I haven't encountered yet
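
To illustrate how little a perf map carries, here is a minimal Python sketch
that parses one and resolves an address to a frame name. It assumes the
"START SIZE symbolname" layout with hexadecimal numbers described in
jit-interface.txt. The sample entries (frame_a, frame_b and their addresses)
are made up for illustration and are not taken from the attachment.

```python
import bisect
from typing import List, NamedTuple, Optional


class MapEntry(NamedTuple):
    start: int
    size: int
    name: str


def parse_perf_map(text: str) -> List[MapEntry]:
    """Parse perf map text: one 'START SIZE name' entry per line, hex numbers."""
    entries = []
    for line in text.splitlines():
        parts = line.strip().split(None, 2)  # the name itself may contain spaces
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, size, name = parts
        entries.append(MapEntry(int(start, 16), int(size, 16), name))
    entries.sort(key=lambda e: e.start)
    return entries


def lookup(entries: List[MapEntry], addr: int) -> Optional[MapEntry]:
    """Return the entry whose [start, start + size) range contains addr."""
    starts = [e.start for e in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0 and addr < entries[i].start + entries[i].size:
        return entries[i]
    return None


# Hypothetical sample data, not real MoarVM output.
sample = "90cb000 1a0 frame_a\n90cb200 40 frame_b (file.raku:12)\n"
entries = parse_perf_map(sample)
print(lookup(entries, 0x90CB010))
```

A range lookup like this is all a tool needs to attach a name to an
instruction address; line numbers would need more than the perf map holds.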

When recording the execution of MoarVM, the results from callgrind and
cachegrind (and probably other tools as well), when loaded into kcachegrind,
have just the memory address of the given jitted frames as their name, and
"(unknown)" as their location, and the "source code" tab shows "The function is
located in this ELF object:" "(unknown)", which makes sense.

callgrind_annotate simply shows these frames like "???:0x00000000090cb000
[???]"

I tried an approach similar to `perf inject` where I located the "definitions"
of strings in the fn category that match the address of each entry of the perf
map file I have. Then later, whenever I see the same fn or cfn number, I also
spit out a fl/cfi and ob/cob.

Unfortunately I had to do a lot of guesswork with how fl/fi, cfl/cfi, and such
really work, so the file I ended up with seems to have the "augmented" frames
kind of split in half, if that makes sense. If I understand correctly, they
appear once with correct call chains leading up to them but no callees, and
once with no call chains leading up to them, but with a list of callees that
appears more sensible.
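
For reference, a much reduced variant of this kind of augmentation can be
sketched in Python as below. It is not the script I used: it only rewrites
the name in "fn=(N) 0x..." and "cfn=(N) 0x..." definitions and does not emit
any fl/cfi or ob/cob lines, so it sidesteps the guesswork rather than
resolving it. It assumes that unresolved frames are written with the raw
address as their name, and it relies on the name compression of the callgrind
format, where "(N) name" defines an id and a bare "(N)" refers back to it. The
excerpt and the name frame_a are hypothetical.

```python
import re
from typing import Dict, List

# Matches a definition such as "fn=(12) 0x00000000090cb000" or the cfn= form.
# Assumption: unresolved frames are written with the raw address as the name.
FN_DEF = re.compile(r"^(c?fn)=\((\d+)\) (0x[0-9a-fA-F]+)$")


def rename_jit_frames(lines: List[str], names: Dict[int, str]) -> List[str]:
    """Replace address-only function names with names keyed by start address."""
    out = []
    for line in lines:
        m = FN_DEF.match(line)
        if m and int(m.group(3), 16) in names:
            kind, ident, addr = m.groups()
            line = f"{kind}=({ident}) {names[int(addr, 16)]}"
        out.append(line)
    return out


# Hypothetical, hand-written excerpt; not real callgrind output.
excerpt = [
    "fn=(7) MVM_jit_code_enter",
    "cfn=(8) 0x00000000090cb000",
    "fn=(8)",
]
result = rename_jit_frames(excerpt, {0x90CB000: "frame_a"})
print("\n".join(result))
```

Because later references use only the bare "(8)", changing the one definition
is enough to rename the frame everywhere in the file.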

I have attached a zstd compressed tarball with the before and after "augmented"
callgrind.out files, as well as the perf map file I used to do the
augmentation. It is helpful to know that jitted frames in MoarVM are called
directly from MVM_jit_code_enter (not sure if there are exceptions, I don't
think there are.)

STEPS TO REPRODUCE
1. Install a rakudo package. For experimenting with the jit and perf map,
anything newer than ~6 years old should work.
2. Here's an invocation of callgrind + rakudo that gives you a perf map file as
well. If you don't have a /usr/share/dict/words, any file with a few tens of
thousands of lines should be more than enough to generate jitted frames and
some decent recording data.
    env MVM_JIT_PERF_MAP=1 valgrind --tool=callgrind --dump-instr=yes -- \
        rakudo-m -e 'my %idx; for "/usr/share/dict/words".IO.lines { for .comb { %idx{$_}++ } }; say %idx.sort.tail(10);'
3. Additionally, you can supply a path via the `MVM_JIT_DUMP_BYTECODE` env var
where moar will write the compiled native code (no headers or anything, just
the bytes that we jump into). There will be a subfolder with the PID in its
name.
4. Check the resulting callgrind.out.$PID file with callgrind_annotate and/or
kcachegrind.

OBSERVED RESULT
The resulting output contains many raw memory addresses with no indication
where they come from, instead of function names, source file paths, and line
numbers.

Additionally, the "source code" and "machine code" tabs in KCachegrind have no
way to find a place to look for details on these functions.
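
To get a rough idea of how many frames are affected, something like the
following Python sketch could be run over the lines of a callgrind.out.$PID
file. It assumes that an unresolved frame is defined with its raw address as
the name. The excerpt is hand-written for illustration and is not real
callgrind output.

```python
import re
from typing import Iterable, Set

# Assumption: an unresolved frame is defined with its raw address as the name,
# for example "fn=(8) 0x00000000090cb000".
ADDR_ONLY = re.compile(r"^c?fn=\(\d+\) (0x[0-9a-fA-F]+)$")


def unresolved_addresses(lines: Iterable[str]) -> Set[int]:
    """Collect the addresses of functions that have no symbol name."""
    found = set()
    for line in lines:
        m = ADDR_ONLY.match(line.rstrip("\n"))
        if m:
            found.add(int(m.group(1), 16))
    return found


# Hypothetical excerpt standing in for the lines of a callgrind.out.$PID file.
excerpt = [
    "fn=(7) MVM_jit_code_enter\n",
    "cfn=(8) 0x00000000090cb000\n",
    "fn=(9) 0x00000000090cb200\n",
]
addrs = unresolved_addresses(excerpt)
print(len(addrs), sorted(hex(a) for a in addrs))
```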

EXPECTED RESULT
I would hope that at least function name, file name and line number could
become available for jitted frames. The MoarVM JIT can also associate line
numbers with addresses in the jitted code, so it would be great if it was also
possible to make that visible in kcachegrind and callgrind_annotate.

Ideally, there would not be "yet another" way to make jitted frame information
useful in valgrind, on top of what gdb, perf, libunwind, and however many
other projects have already built. That said, if the interface is simple
enough, for example just a few valgrind client requests like the ones you can
already use to teach memcheck about custom memory allocators, it would be okay
if it doesn't fully match any existing interface. It could perhaps be made
partially compatible with gdb's jit reader API.

I would very much prefer not having to create a real, full, and proper ELF
structure, whether in memory or on disk.

Having the jitted frames immediately work instead of first having to run some
kind of script or tool to combine callgrind.out and whatever has the needed
extra information would be much preferred, if possible.
