On Wed, Apr 22, 2026 at 4:09 PM Kugan Vivekanandarajah via Gcc <[email protected]> wrote:
>
> Hi,
>
> Building on our discussions from the last Cauldron, we propose creating a
> native, simplified AutoFDO tool for GCC to replace our current reliance on
> external Google tools which are not actively maintained. I'll follow up with a
> detailed design document as soon as we have a consensus on the proposal.
>
> Thanks,
> Kugan
>
> Summary
> =======
>
> We propose a standalone, minimal tool for generating AutoFDO profiles
> that can be consumed by the GCC AutoFDO toolchain, with the goal of
> integrating it into the GCC repository. The tool would
> support: (1) offline: read existing perf.data (single-process or
> system-wide) and produce a profile for a target binary; (2) direct:
> attach to a process via the PMU (LBR or BRBE / SPE), bypassing perf
> record and building the profile from the live sample stream; (3)
> system-wide: read perf.data from system-wide collection (e.g. perf
> record -a), filter samples by the target application, and generate
> gcov/profile for that application. This keeps the design simple,
> dependencies minimal, and the tool easier to maintain in step with GCC.
>
> Motivation
> ==========
>
> - Current AutoFDO tools (e.g. from Google) are not widely used with
>   GCC. LLVM has a similar profile-creation tool integrated with the
>   compiler. A tightly coupled tool for GCC would allow for easy
>   development and upgrades.
>
> - A lightweight tool that generates AutoFDO profiles for the GCC
>   AutoFDO toolchain (with minimal perf parsing and minimal DWARF) can
>   be memory efficient.
>
> - An optional mode to pipe PMU data directly via perf_event_open (LBR or
>   BRBE, and SPE) makes the tool more memory efficient.
>
> Goals
> =====
>
> 1. Simplicity - One job: turn samples (from file or live) into
>    AutoFDO profiles that can be consumed by the GCC AutoFDO toolchain.
>
> 2. Minimal dependencies - Dependent only on libraries such as libdwarf
>    for DWARF parsing (no large frameworks).

Could we reuse libbacktrace's code here instead of requiring libdwarf
and libelf as dependencies? Or maybe the libiberty simple-object code?
libbacktrace/elf.c is even copyrighted by the FSF. I can even see this
tool being supported on Mac OS (though I don't know whether there is a
perf-like tool there). LTO used to depend on libelf, but that
dependency was removed years ago in favor of simple-object.

libbacktrace will be updated to support newer DWARF standards as they
are added to GCC/gas, especially when it comes to line information
support. Depending on libdwarf has the extra issue of requiring a
version that supports DWARF 5 (and soon 6).

Thanks,
Andrea

>
> 3. Input modes - Offline (perf.data, single-process or system-wide),
>    direct (tool runs the workload and reads LBR or BRBE (or SPE) via
>    the PMU, bypassing perf record), and system-wide (perf.data from
>    system-wide collection, then generate gcov for a chosen application).
>
> 4. Easier maintenance - Ideally part of or released with GCC for fast
>    iteration when profile format or DWARF expectations change.
>
> Requirements
> ============
>
> - Read perf.data and parse branch stack (LBR or BRBE) records to obtain
>   (address, count) for the target binary; support SPE (ARM Statistical
>   Profiling Extension) as an extension point (parse SPE/AUX records when
>   present, stub or full implementation).
>
> - Parse MMAP2 (and MMAP) records to map runtime addresses to the
>   profiled binary and file offsets. Support system-wide perf.data:
>   filter samples by target binary (using MMAP2/COMM/pid) and produce
>   gcov (or AutoFDO profile) for that application only.
>
> - Direct mode: run a user-supplied command, attach via perf_event_open,
>   read LBR or BRBE (or SPE) from the kernel ring buffer, and parse MMAP2/MMAP
>   from the same stream; produce the same profile format as offline mode
>   without writing perf.data.
>
> - Use a minimal DWARF subset to map instruction addresses to (source
>   file, line, discriminator) for the target binary (e.g. line table,
>   address ranges, minimal subprogram info).
>
> - Emit profile output in the format consumed by the GCC AutoFDO
>   toolchain (e.g. gcov-style or the format used by -fauto-profile).
>
> - Portability: support Linux (perf_event_open, LBR or BRBE, SPE); other
>   hosts/PMUs can be added later without changing the core design.
>
> Use Cases
> =========
>
> - Offline from existing perf.data: user has perf.data from a
>   single-process run; tool produces profile for the target binary.
>
> - Direct: one-shot "run and profile": user runs the tool with a
>   command; tool executes it, attaches via PMU, collects samples (no
>   perf.data file).
>
> - System-wide: user has perf.data from "perf record -a"; tool filters
>   by target binary (-b <binary>) and produces profile for that
>   application only.
>
> - CI / automated builds: script runs perf record then tool, or tool in
>   direct mode, or system-wide perf then tool with -b <binary>.
>
> Dependencies
> ============
>
> The tool is dependent only on libraries such as libdwarf for DWARF
> parsing (and libelf as typically required by libdwarf for ELF access).
> Perf/PMU data is read via standard system interfaces (e.g. perf_event_open
> for direct mode; perf.data file for offline/system-wide). No dependency
> on the perf userspace tool for direct mode; for offline mode, input is
> a perf.data file (produced by perf record or any writer of that format).
>
> Scope of the Tool
> =================
>
> Parse only branch stack (LBR or BRBE), MMAP2 (and MMAP), and optionally
> SPE. Map addresses to source (file, line, discriminator) via a minimal
> DWARF subset (line table, address ranges; libdwarf sufficient). Output
> is the profile format consumed by the GCC AutoFDO toolchain. Offline:
> read perf.data (single-process or system-wide; if system-wide, filter by
> -b <binary>). Direct: run user command, attach via perf_event_open,
> read from kernel ring buffer (no perf.data).
>
> Benefits
> ========
>
> Small codebase and minimal dependencies ease review. In-tree with GCC
> allows fast iteration when profile format or DWARF expectations change.
> Single workflow supports offline, direct, and system-wide use.
>
> Integration
> ===========
>
> The tool will be kept as part of the GCC repository (e.g. contrib/ or a
> dedicated directory), built and installed with GCC, so it stays in step
> with the compiler and profile format.
>
> Technical Outline
> =================
>
> Input: perf.data (offline or system-wide) or live via perf_event_open
> (direct). Processing: (address, count) from LBR/BRBE/SPE; MMAP2 for
> address->binary and filtering by target when system-wide; minimal DWARF
> for address->(file, line, discriminator); aggregate for GCC AutoFDO
> toolchain. Output: profile in toolchain format.
>
> Extensions
> ==========
>
> Add support for gathering branch profiles and memory profile as an
> extension to the current gcov format.
>
>
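
To make the MMAP2 requirement quoted above concrete (map runtime
addresses to the profiled binary and file offsets), here is a minimal
sketch in C. It is an illustration under stated assumptions, not a
proposed implementation: struct map_entry and addr_to_file_offset are
hypothetical names, and the struct keeps only the addr, len and pgoff
values carried by a PERF_RECORD_MMAP2 event instead of the real record
layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one PERF_RECORD_MMAP2 event.  Only
   the three values needed for address translation are kept; this is
   not the on-disk record layout.  */
struct map_entry
{
  uint64_t start;  /* runtime start address of the mapping (addr) */
  uint64_t len;    /* length of the mapping in bytes (len) */
  uint64_t pgoff;  /* file offset at which the mapping begins (pgoff) */
};

/* Translate the sampled runtime address ADDR to an offset in the
   mapped file.  Returns 1 and stores the result in *OFFSET when ADDR
   lies inside one of the N mappings in MAPS, and 0 otherwise.  */
static int
addr_to_file_offset (const struct map_entry *maps, size_t n,
                     uint64_t addr, uint64_t *offset)
{
  for (size_t i = 0; i < n; i++)
    {
      /* Compare the distance from the start against the length so
         that start + len cannot overflow.  */
      if (addr >= maps[i].start && addr - maps[i].start < maps[i].len)
        {
          *offset = addr - maps[i].start + maps[i].pgoff;
          return 1;
        }
    }
  return 0;
}
```

For example, with the mappings {0x555555554000, 0x1000, 0x0} and
{0x555555555000, 0x2000, 0x1000}, the address 0x555555555123 falls in
the second mapping and translates to file offset 0x1123. DWARF line
tables are keyed by link-time virtual addresses, so a real tool would
still need to convert that offset to a virtual address through the ELF
program headers before the line lookup.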
