On Thu, Mar 20, 2025, at 10:28 AM, Serhei Makarov wrote:
> On Tue, Dec 10, 2024, at 4:42 PM, Serhei Makarov wrote:
>> This email sketches an 'unwinder cache' interface for libdwfl, derived
>> from recent eu-stacktrace code [1] and based on Christian Hergert's
>> adaptation of eu-stacktrace as sysprof-live-unwinder [2]. The intent is
>> to remove the need for a lot of boilerplate code that would be
>> identical among profiling tools that use libdwfl to unwind perf_events
>> stack samples. But since this becomes a library design, it's in need of
>> feedback and bikeshedding.
> In advance of finishing up the Dwfl_Process_Tracker patchset
> (initial version currently under review),
> wanted to post an update summarizing the current api design.
I redid the performance analysis based on the released code. I'm seeing a
reduction of sysprof-live-unwinder overhead from 7~8% to 3~6% (with the
framepointer version of sysprof providing a baseline of about 1.5%). There is
some variance (apparently, the lower the overhead gets, the harder it is to
keep conditions identical), but performance is improving; the next
bottleneck to look at is dwfl_linux_proc_report. I may want to automate the
performance testing procedure to get numbers precise enough that I'm
comfortable reporting them.
There is an issue where sysprofd exits spontaneously (cleanly, with a "Stopping
RAPL monitor" message) that I'm still trying to understand. Hypotheses: it may
be a polkit issue, or sysprof-live-unwinder may not be handling some error
result gracefully. (The eu-stacktrace tool + prototype sysprof patches don't
exhibit this behaviour.) I expect I'll need to make another revision of the
sysprof-live-unwinder patches at
https://git.sr.ht/~serhei/sysprof-experiments/log/serhei/live-unwinder
With the Elf caching in Dwfl_Process_Tracker, I counted (via the attached simple
patch) how many times an Elf struct was retrieved from the cache vs. how many
were newly created. On a quick test with a swaywm system, the 'created' number
stabilized at ~186 Elf structs, with the 'cached' number at ~400 and rising
as I kept running the profiler. On gnome3, 'created' stabilized at ~282 structs,
with the 'cached' number at ~500 and rising. Obviously, this is not super
meaningful, as the cached/created ratio can be made arbitrarily good by running
the profiler for longer periods of time :p but it's worth verifying that the
caching works.
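For anyone who wants to reason about these counters without building elfutils, here is a minimal standalone sketch of the same bookkeeping. It is not the libdwfl code: the names toy_cache, toy_entry, toy_cache_lookup and toy_cache_hit_ratio are hypothetical, and the cache is a fixed-size table keyed only by module name, whereas the real tracker also has to deal with file descriptors, mtimes and locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TOY_CACHE_SLOTS 64

/* Hypothetical stand-in for one cached Elf entry; not the libdwfl layout.  */
struct toy_entry
{
  char name[128];
  bool used;
};

struct toy_cache
{
  struct toy_entry slots[TOY_CACHE_SLOTS];
  unsigned long created;	/* lookups that had to create a new entry */
  unsigned long cached;		/* lookups served from an existing entry */
};

/* Look up NAME, creating an entry on a miss.  Returns true on a cache hit.
   If the table is full, the miss is still counted but nothing is stored.
   Names longer than the slot buffer are truncated and so never hit.  */
static bool
toy_cache_lookup (struct toy_cache *c, const char *name)
{
  assert (c != NULL && name != NULL);
  int free_slot = -1;
  for (int i = 0; i < TOY_CACHE_SLOTS; i++)
    {
      if (c->slots[i].used && strcmp (c->slots[i].name, name) == 0)
	{
	  c->cached++;
	  return true;
	}
      if (!c->slots[i].used && free_slot < 0)
	free_slot = i;
    }
  if (free_slot >= 0)
    {
      c->slots[free_slot].used = true;
      snprintf (c->slots[free_slot].name, sizeof c->slots[free_slot].name,
		"%s", name);
    }
  c->created++;
  return false;
}

/* Fraction of all lookups that were served from the cache.  */
static double
toy_cache_hit_ratio (const struct toy_cache *c)
{
  unsigned long total = c->created + c->cached;
  return total ? (double) c->cached / (double) total : 0.0;
}
```

With a zero-initialized struct toy_cache, looking up "libc.so.6" twice and "libm.so.6" once leaves created at 2 and cached at 1. Since 'created' is bounded by the number of distinct modules while 'cached' grows with every repeated lookup, the ratio drifts towards 1 the longer the profiler runs, which is why the figures above are only a sanity check.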
All the best,
Serhei
diff --git a/libdwfl/dwfl_process_tracker_find_elf.c b/libdwfl/dwfl_process_tracker_find_elf.c
index 72621bb1..d966346f 100644
--- a/libdwfl/dwfl_process_tracker_find_elf.c
+++ b/libdwfl/dwfl_process_tracker_find_elf.c
@@ -38,6 +38,9 @@
#include "libdwflP.h"
+static int created_elf = 0;
+static int cached_elf = 0;
+
/* TODO: Consider making this a public api, dwfl_process_tracker_find_cached_elf. */
bool
find_cached_elf (Dwfl_Process_Tracker *tracker,
@@ -68,6 +71,8 @@ find_cached_elf (Dwfl_Process_Tracker *tracker,
*elfp = ent->elf;
*file_name = strdup(ent->module_name);
*fdp = ent->fd;
+ cached_elf ++;
+ fprintf(stderr, "= dwfl_process_tracker_find_elf retrieves CACHED (%d created / %d cached) name=%s fd=%d elfp=%p ref_count=%d\n", created_elf, cached_elf, ent->module_name, ent->fd, ent->elf, ent->elf->ref_count); /* DEBUG */
return true;
}
@@ -118,6 +123,8 @@ cache_elf (Dwfl_Process_Tracker *tracker,
ent->last_mtime = sb.st_mtime;
}
rwlock_unlock(tracker->elftab_lock);
+ created_elf ++;
+ fprintf(stderr, "+ dwfl_process_tracker_find_elf CREATES (%d created / %d cached) new name=%s file_name=%s fd=%d elfp=%p ref_count=%d\n", created_elf, cached_elf, ent->module_name, file_name, fd, elf, elf ? elf->ref_count : 0); /* DEBUG */
return true;
}