On Wed, Apr 22, 2026 at 4:09 PM Kugan Vivekanandarajah via Gcc
<[email protected]> wrote:
>
> Hi,
>
> Building on our discussions from the last Cauldron, we propose creating a
> native, simplified AutoFDO tool for GCC to replace our current reliance on
> external Google tools, which are not actively maintained. I'll follow up
> with a detailed design document as soon as we have consensus on the proposal.
>
> Thanks,
> Kugan
>
> Summary
> =======
>
> We propose a standalone, minimal tool for generating AutoFDO profiles
> that can be consumed by the GCC AutoFDO toolchain, with the goal of
> integrating it into the GCC repository. The tool would support three
> modes: (1) offline: read an existing perf.data (single-process or
> system-wide) and produce a profile for a target binary; (2) direct:
> attach to a process via the PMU (LBR or BRBE / SPE), bypassing perf
> record, and build the profile from the live sample stream; (3)
> system-wide: read perf.data from a system-wide collection (e.g. perf
> record -a), filter samples by the target application, and generate a
> gcov/profile for that application. This keeps the design simple, the
> dependencies minimal, and the tool easy to maintain in step with GCC.
>
> Motivation
> ==========
>
> - Current AutoFDO tools (e.g. from Google) are not widely used with
> GCC. LLVM has a similar profile-creation tool integrated with the
> compiler. A tool tightly coupled to GCC would be easier to develop and
> to update alongside the compiler.
>
> - A lightweight tool that generates AutoFDO profiles for the GCC
> AutoFDO toolchain (with minimal perf parsing and minimal DWARF) can
> be memory efficient.
>
> - An optional mode that reads PMU data directly via perf_event_open (LBR
> or BRBE, and SPE) avoids an intermediate perf.data file, making the tool
> even more memory efficient.
>
> Goals
> =====
>
> 1. Simplicity - One job: turn samples (from file or live) into
> AutoFDO profiles that can be consumed by the GCC AutoFDO toolchain.
>
> 2. Minimal dependencies - Depend only on libraries such as libdwarf
> for DWARF parsing (no large frameworks).

Could we reuse libbacktrace's code here instead of requiring libdwarf
and libelf as dependencies?
Or maybe the libiberty/simple-object code?

libbacktrace/elf.c is even copyrighted by the FSF. I can also see this
tool being supported on macOS (though I don't know if there is a
perf-like tool there).

LTO used to depend on libelf, but that dependency was removed years ago
in favor of simple-object.
libbacktrace will be updated to support newer DWARF standards as they
are added to GCC/gas, especially for line-table support.
Depending on libdwarf has the extra issue of needing a version that
supports DWARF 5 (and soon DWARF 6).

Thanks,
Andrea

>
> 3. Input modes - Offline (perf.data, single-process or system-wide),
> direct (the tool runs the workload and reads LBR, BRBE, or SPE data via
> the PMU, bypassing perf record), and system-wide (perf.data from a
> system-wide collection, then generate gcov for a chosen application).
>
> 4. Easier maintenance - Ideally part of, or released with, GCC for fast
> iteration when the profile format or DWARF expectations change.
>
> Requirements
> ============
>
> - Read perf.data and parse branch stack (LBR or BRBE) records to obtain
> (address, count) for the target binary; support SPE (ARM Statistical
> Profiling Extension) as an extension point (parse SPE/AUX records when
> present, with either a stub or a full implementation).
>
> - Parse MMAP2 (and MMAP) records to map runtime addresses to the
> profiled binary and file offsets. Support system-wide perf.data:
> filter samples by target binary (using MMAP2/COMM/pid) and produce
> gcov (or AutoFDO profile) for that application only.
>
> - Direct mode: run a user-supplied command, attach via perf_event_open,
> read LBR or BRBE (or SPE) from the kernel ring buffer, and parse MMAP2/MMAP
> from the same stream; produce the same profile format as offline mode
> without writing perf.data.
>
> - Use a minimal DWARF subset to map instruction addresses to (source
> file, line, discriminator) for the target binary (e.g. line table,
> address ranges, minimal subprogram info).
>
> - Emit profile output in the format consumed by the GCC AutoFDO
> toolchain (e.g. gcov-style or the format used by -fauto-profile).
>
> - Portability: support Linux (perf_event_open, LBR or BRBE, SPE); other
> hosts/PMUs can be added later without changing the core design.
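For the MMAP2 requirement above, a minimal sketch in C of the address-mapping and filtering step, assuming the MMAP2 records have already been decoded into a simplified, hypothetical struct mapping (the real record carries more fields, such as tid, inode and protection flags). All helper names are illustrative. A sample is kept only if its pid and address fall inside a mapping of the target binary; the runtime address is then turned into a file offset, and from there into the link-time address that DWARF uses, via the containing PT_LOAD segment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of a decoded PERF_RECORD_MMAP2 record,
   keeping only the fields needed to attribute samples to a binary.  */
struct mapping
{
  uint32_t pid;         /* Process that owns the mapping.  */
  uint64_t start;       /* Runtime start address.  */
  uint64_t len;         /* Length in bytes.  */
  uint64_t pgoff;       /* File offset that START maps.  */
  const char *filename; /* Path of the mapped file.  */
};

/* Return the mapping of TARGET in process PID that contains ADDR, or
   NULL.  Samples from other processes or other files (as in a
   system-wide "perf record -a") get NULL and are dropped.  */
static const struct mapping *
find_target_mapping (const struct mapping *maps, size_t n, uint32_t pid,
                     uint64_t addr, const char *target)
{
  for (size_t i = 0; i < n; i++)
    {
      const struct mapping *m = &maps[i];
      if (m->pid == pid
          && addr >= m->start && addr - m->start < m->len
          && strcmp (m->filename, target) == 0)
        return m;
    }
  return NULL;
}

/* Runtime address -> offset within the mapped file.  */
static uint64_t
addr_to_file_offset (const struct mapping *m, uint64_t addr)
{
  return addr - m->start + m->pgoff;
}

/* File offset -> link-time virtual address (what DWARF uses), given the
   p_offset and p_vaddr of the PT_LOAD segment containing OFFSET.  */
static uint64_t
file_offset_to_vaddr (uint64_t offset, uint64_t p_offset, uint64_t p_vaddr)
{
  return offset - p_offset + p_vaddr;
}
```

A linear scan keeps the sketch short; a real tool would more likely keep per-pid mappings sorted by start address and handle mappings that get replaced (e.g. after exec).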
>
> Use Cases
> =========
>
> - Offline from an existing perf.data: the user has perf.data from a
> single-process run; the tool produces a profile for the target binary.
>
> - Direct ("run and profile" in one shot): the user runs the tool with a
> command; the tool executes it, attaches via the PMU, and collects
> samples (no perf.data file).
>
> - System-wide: the user has perf.data from "perf record -a"; the tool
> filters by target binary (-b <binary>) and produces a profile for that
> application only.
>
> - CI / automated builds: a script runs perf record and then the tool,
> runs the tool in direct mode, or runs system-wide perf and then the
> tool with -b <binary>.
>
> Dependencies
> ============
>
> The tool depends only on libraries such as libdwarf for DWARF
> parsing (and libelf, as typically required by libdwarf for ELF access).
> Perf/PMU data is read via standard system interfaces (perf_event_open
> for direct mode; a perf.data file for offline/system-wide mode). There
> is no dependency on the perf userspace tool in direct mode; in offline
> mode, the input is a perf.data file (produced by perf record or any
> other writer of that format).
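To illustrate the direct mode mentioned above, a sketch of how the tool might configure perf_event_open for user-space branch-stack sampling. The event (CPU cycles), the sample period and the helper names are assumptions for illustration only; a real tool would likely pick a PMU-specific event such as taken branches. MMAP2 and COMM side-band records are requested in the same ring buffer as the samples, which is what would let direct mode work without perf.data.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill ATTR for user-space branch-stack sampling.  The event choice and
   period are placeholders, not a recommendation.  */
static void
setup_branch_stack_attr (struct perf_event_attr *attr, uint64_t period)
{
  memset (attr, 0, sizeof *attr);
  attr->size = sizeof *attr;
  attr->type = PERF_TYPE_HARDWARE;
  attr->config = PERF_COUNT_HW_CPU_CYCLES;
  attr->sample_period = period;
  /* Record IP, pid/tid and the branch stack with each sample.  */
  attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID
                      | PERF_SAMPLE_BRANCH_STACK;
  attr->branch_sample_type = PERF_SAMPLE_BRANCH_ANY
                             | PERF_SAMPLE_BRANCH_USER;
  /* Ask for MMAP2 and COMM side-band records in the same ring buffer.  */
  attr->mmap = 1;
  attr->mmap2 = 1;
  attr->comm = 1;
  attr->exclude_kernel = 1;
  attr->exclude_hv = 1;
  /* Start disabled; counting begins when the workload calls exec.  */
  attr->disabled = 1;
  attr->enable_on_exec = 1;
}

/* glibc has no wrapper for perf_event_open, so invoke the syscall.  */
static int
sys_perf_event_open (struct perf_event_attr *attr, pid_t pid, int cpu,
                     int group_fd, unsigned long flags)
{
  return (int) syscall (SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}
```

In this sketch the tool would fork the workload, have the child wait until the parent has called sys_perf_event_open for the child's pid (cpu = -1) and mmap'ed the ring buffer, and then let the child exec the command; enable_on_exec starts sampling at that point.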
>
> Scope of the Tool
> =================
>
> Parse only branch stack (LBR or BRBE), MMAP2 (and MMAP), and optionally
> SPE. Map addresses to source (file, line, discriminator) via a minimal
> DWARF subset (line table, address ranges; libdwarf sufficient). Output
> is the profile format consumed by the GCC AutoFDO toolchain. Offline:
> read perf.data (single-process or system-wide; if system-wide, filter by
> -b <binary>). Direct: run user command, attach via perf_event_open,
> read from kernel ring buffer (no perf.data).
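The address-to-source step can be as small as a binary search over the decoded line table. A minimal sketch, assuming the line program has already been decoded (by libdwarf, or by libbacktrace-style code as suggested in the reply above) into a hypothetical array of rows sorted by address, where each row covers addresses up to the next row:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded line-table row; rows are sorted by address.  */
struct line_row
{
  uint64_t addr;
  const char *file;
  unsigned line;
  unsigned discriminator;
  int end_sequence;   /* Nonzero for DW_LNE_end_sequence rows.  */
};

/* Return the row describing PC, or NULL if PC is outside every sequence.  */
static const struct line_row *
lookup_line (const struct line_row *rows, size_t n, uint64_t pc)
{
  size_t lo = 0, hi = n;
  /* Find the first row whose address is greater than PC.  */
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (rows[mid].addr <= pc)
        lo = mid + 1;
      else
        hi = mid;
    }
  if (lo == 0)
    return NULL;   /* PC is below the first row.  */
  const struct line_row *r = &rows[lo - 1];
  /* An end_sequence row only marks the end of a range.  */
  return r->end_sequence ? NULL : r;
}
```

The discriminator is carried alongside the line so that samples on the same source line but in different basic blocks stay distinct in the profile.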
>
> Benefits
> ========
>
> A small codebase and minimal dependencies ease review. Keeping the tool
> in-tree with GCC allows fast iteration when the profile format or DWARF
> expectations change. A single workflow supports offline, direct, and
> system-wide use.
>
> Integration
> ===========
>
> The tool will be kept as part of the GCC repository (e.g. contrib/ or a
> dedicated directory), built and installed with GCC, so it stays in step
> with the compiler and profile format.
>
> Technical Outline
> =================
>
> Input: perf.data (offline or system-wide) or live via perf_event_open
> (direct). Processing: (address, count) from LBR/BRBE/SPE; MMAP2 for
> address->binary mapping and filtering by target when system-wide;
> minimal DWARF for address->(file, line, discriminator); aggregation for
> the GCC AutoFDO toolchain. Output: a profile in the toolchain's format.
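The aggregation step can be sketched as follows, in the style existing AutoFDO converters typically use: every branch-stack entry counts one taken branch, and each pair of consecutive entries yields a fall-through range from the older branch's target to the newer branch's source. This assumes perf's convention that entry 0 is the most recent branch; the fixed-size tables and helper names are hypothetical stand-ins for a real hash table.

```c
#include <stddef.h>
#include <stdint.h>

/* One branch-stack entry: a taken branch from FROM to TO.  Entry 0 is
   the most recent branch, as in perf's PERF_SAMPLE_BRANCH_STACK.  */
struct branch { uint64_t from, to; };

/* Hypothetical count table: (from, to) for branches, (begin, end) for
   ranges.  Entries beyond capacity are dropped in this sketch.  */
struct count { uint64_t a, b, n; };
struct counts { struct count items[64]; size_t len; };

static void
bump (struct counts *c, uint64_t a, uint64_t b)
{
  for (size_t i = 0; i < c->len; i++)
    if (c->items[i].a == a && c->items[i].b == b)
      {
        c->items[i].n++;
        return;
      }
  if (c->len < sizeof c->items / sizeof c->items[0])
    c->items[c->len++] = (struct count) { a, b, 1 };
}

/* Fold one sample's branch stack into branch and range counts.  Between
   two consecutive taken branches, execution runs linearly from the older
   branch's target to the newer branch's source.  */
static void
aggregate_sample (const struct branch *stack, size_t n,
                  struct counts *branches, struct counts *ranges)
{
  for (size_t i = 0; i < n; i++)
    bump (branches, stack[i].from, stack[i].to);
  for (size_t i = 1; i < n; i++)
    if (stack[i].to <= stack[i - 1].from)   /* Skip inconsistent ranges.  */
      bump (ranges, stack[i].to, stack[i - 1].from);
}
```

Range counts, once mapped through MMAP2 and the line table, give per-line execution counts, while branch counts give the edge information.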
>
> Extensions
> ==========
>
> Add support for gathering branch profiles and memory profiles as an
> extension to the current gcov format.
>
>
