Hi,

Building on our discussions from the last Cauldron, we propose creating a
native, simplified AutoFDO tool for GCC to replace our current reliance on
external Google tools, which are not actively maintained. I'll follow up with a
detailed design document as soon as we have consensus on the proposal.

Thanks,
Kugan

Summary
=======

We propose a standalone, minimal tool for generating AutoFDO profiles
that can be consumed by the GCC AutoFDO toolchain, with the goal of
integrating it into the GCC repository. The tool would support three
input modes: (1) offline: read an existing perf.data file
(single-process or system-wide) and produce a profile for a target
binary; (2) direct: attach to a process via the PMU (LBR or BRBE / SPE),
bypassing perf record and building the profile from the live sample
stream; (3) system-wide: read perf.data from a system-wide collection
(e.g. perf record -a), filter samples by the target application, and
generate a gcov profile for that application. This keeps the design
simple and the dependencies minimal, and makes the tool easier to
maintain in step with GCC.

Motivation
==========

- Current AutoFDO tools (e.g. from Google) are not widely used with
GCC. LLVM has a similar profile-creation tool integrated with the
compiler. A tightly coupled tool for GCC would make it easier to
develop and upgrade the tool alongside the compiler.

- A lightweight tool that generates AutoFDO profiles for the GCC
AutoFDO toolchain (with minimal perf parsing and minimal DWARF) can
be memory efficient.

- An optional mode that reads PMU data directly via perf_event_open
(LBR or BRBE, and SPE), with no intermediate perf.data file, makes the
tool more memory efficient.

Goals
=====

1. Simplicity - One job: turn samples (from file or live) into
AutoFDO profiles that can be consumed by the GCC AutoFDO toolchain.

2. Minimal dependencies - Dependent only on libraries such as libdwarf
for DWARF parsing (no large frameworks).

3. Input modes - Offline (perf.data, single-process or system-wide),
direct (tool runs the workload and reads LBR or BRBE (or SPE) via
the PMU, bypassing perf record), and system-wide (perf.data from
system-wide collection, then generate a gcov profile for a chosen
application).

4. Easier maintenance - Ideally part of or released with GCC for fast
iteration when profile format or DWARF expectations change.

Requirements
============

- Read perf.data and parse branch stack (LBR or BRBE) records to obtain
(address, count) for the target binary; support SPE (ARM Statistical
Profiling Extension) as an extension point (parse SPE/AUX records when
present, stub or full implementation).

- Parse MMAP2 (and MMAP) records to map runtime addresses to the
profiled binary and file offsets. Support system-wide perf.data:
filter samples by target binary (using MMAP2/COMM/pid) and produce
a gcov (or AutoFDO) profile for that application only.

- Direct mode: run a user-supplied command, attach via perf_event_open,
read LBR or BRBE (or SPE) from the kernel ring buffer, and parse MMAP2/MMAP
from the same stream; produce the same profile format as offline mode
without writing perf.data.

- Use a minimal DWARF subset to map instruction addresses to (source
file, line, discriminator) for the target binary (e.g. line table,
address ranges, minimal subprogram info).

- Emit profile output in the format consumed by the GCC AutoFDO
toolchain (e.g. gcov-style or the format used by -fauto-profile).

- Portability: support Linux (perf_event_open, LBR or BRBE, SPE); other
hosts/PMUs can be added later without changing the core design.
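
As an illustration of the MMAP2 requirement above, here is a minimal
Python sketch of mapping a sampled runtime address to a file offset and
dropping samples that do not belong to the target binary. The Mapping
class and to_file_offset helper are hypothetical names used only for
this example; the fields mirror what an MMAP2 record carries (pid,
start address, length, page offset, filename). The real tool would
decode these records from perf.data or the ring buffer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    """Hypothetical, simplified view of one MMAP2 record."""
    pid: int
    start: int      # runtime start address of the mapping
    length: int     # length of the mapping in bytes
    pgoff: int      # file offset at which the mapping begins
    filename: str   # path of the mapped file


def to_file_offset(mappings, pid, addr, target) -> Optional[int]:
    """Return the file offset of addr within target, or None if addr
    does not fall inside a mapping of target for this pid."""
    for m in mappings:
        if m.pid != pid or m.filename != target:
            continue
        if m.start <= addr < m.start + m.length:
            return addr - m.start + m.pgoff
    return None


# Made-up mappings, as a system-wide capture might contain.
mappings = [
    Mapping(100, 0x400000, 0x2000, 0x0, "/usr/bin/app"),
    Mapping(100, 0x7F0000000000, 0x1000, 0x0, "/lib/libc.so.6"),
    Mapping(200, 0x500000, 0x1000, 0x1000, "/usr/bin/other"),
]

# (pid, sampled address) pairs; only the first belongs to the target.
samples = [(100, 0x400123), (100, 0x7F0000000010), (200, 0x500040)]

offsets = [to_file_offset(mappings, pid, addr, "/usr/bin/app")
           for pid, addr in samples]
kept = [off for off in offsets if off is not None]
print([hex(off) for off in kept])
```

A linear scan keeps the sketch short; an interval structure keyed by
pid would be a natural choice for large system-wide captures.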

Use Cases
=========

- Offline from existing perf.data: user has perf.data from a
single-process run; tool produces profile for the target binary.

- Direct (one-shot "run and profile"): user runs the tool with a
command; tool executes it, attaches via PMU, and collects samples (no
perf.data file).

- System-wide: user has perf.data from "perf record -a"; tool filters
by target binary (-b <binary>) and produces profile for that
application only.

- CI / automated builds: script runs perf record then tool, or tool in
direct mode, or system-wide perf then tool with -b <binary>.

Dependencies
============

The tool is dependent only on libraries such as libdwarf for DWARF
parsing (and libelf as typically required by libdwarf for ELF access).
Perf/PMU data is read via standard system interfaces (e.g. perf_event_open
for direct mode; perf.data file for offline/system-wide). No dependency
on the perf userspace tool for direct mode; for offline mode, input is
a perf.data file (produced by perf record or any writer of that format).

Scope of the Tool
=================

Parse only branch stack (LBR or BRBE), MMAP2 (and MMAP), and optionally
SPE. Map addresses to source (file, line, discriminator) via a minimal
DWARF subset (line table, address ranges; libdwarf sufficient). Output
is the profile format consumed by the GCC AutoFDO toolchain. Offline:
read perf.data (single-process or system-wide; if system-wide, filter by
-b <binary>). Direct: run user command, attach via perf_event_open,
read from kernel ring buffer (no perf.data).
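
To make the address-to-source step concrete, here is a minimal Python
sketch of looking up (file, line, discriminator) in an already decoded
line table. LineRow and lookup are hypothetical names for this example;
it assumes the rows have been extracted from DWARF (e.g. via libdwarf)
and sorted by address, with end-of-sequence rows marking gaps.

```python
import bisect
from typing import NamedTuple, Optional, Tuple


class LineRow(NamedTuple):
    """Hypothetical, flattened row of a decoded DWARF line table."""
    address: int
    file: str
    line: int
    discriminator: int
    end_sequence: bool = False


def lookup(rows, keys, addr) -> Optional[Tuple[str, int, int]]:
    """Find the row covering addr. rows must be sorted by address and
    keys must be the matching list of row addresses."""
    i = bisect.bisect_right(keys, addr) - 1
    if i < 0 or rows[i].end_sequence:
        return None  # before the first row, or past the end of a sequence
    return (rows[i].file, rows[i].line, rows[i].discriminator)


# Made-up line table for illustration.
rows = [
    LineRow(0x1000, "a.c", 10, 0),
    LineRow(0x1008, "a.c", 11, 0),
    LineRow(0x1010, "a.c", 11, 1),
    LineRow(0x1020, "a.c", 0, 0, end_sequence=True),
]
keys = [r.address for r in rows]  # built once, reused for every lookup

print(lookup(rows, keys, 0x100C))
print(lookup(rows, keys, 0x1014))
print(lookup(rows, keys, 0x1020))
```

Each row covers the addresses from its own address up to the next row,
which is why a binary search for the last row at or below the sampled
address is enough here.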

Benefits
========

A small codebase and minimal dependencies ease review. Keeping the tool
in-tree with GCC allows fast iteration when profile format or DWARF
expectations change. A single workflow supports offline, direct, and
system-wide use.

Integration
===========

The tool will be kept as part of the GCC repository (e.g. contrib/ or a
dedicated directory), built and installed with GCC, so it stays in step
with the compiler and profile format.

Technical Outline
=================

Input: perf.data (offline or system-wide) or live via perf_event_open
(direct). Processing: (address, count) from LBR/BRBE/SPE; MMAP2 for
address->binary and filtering by target when system-wide; minimal DWARF
for address->(file, line, discriminator); aggregate for GCC AutoFDO
toolchain. Output: profile in toolchain format.
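
As a rough illustration of the processing step, the Python sketch below
turns branch stacks into branch counts and sequential range counts, one
common way to derive (address, count) data from LBR-style records. The
aggregate function is a hypothetical name, and the sketch assumes each
stack lists (from, to) pairs with the most recent branch first; it is
not a statement of how the proposed tool will be implemented.

```python
from collections import Counter


def aggregate(branch_stacks):
    """Count taken branches and the straight-line ranges between them.

    Each stack is a list of (from_addr, to_addr) pairs, assumed to be
    ordered most recent branch first.
    """
    branch_counts = Counter()
    range_counts = Counter()
    for stack in branch_stacks:
        for i, (src, dst) in enumerate(stack):
            branch_counts[(src, dst)] += 1
            if i > 0:
                # Execution ran sequentially from the target of this
                # older branch to the source of the next, newer branch.
                newer_src = stack[i - 1][0]
                if dst <= newer_src:
                    range_counts[(dst, newer_src)] += 1
    return branch_counts, range_counts


# Two made-up samples sharing part of their branch history.
stacks = [
    [(0x130, 0x200), (0x110, 0x120), (0x50, 0x100)],
    [(0x110, 0x120), (0x50, 0x100)],
]

branches, ranges = aggregate(stacks)
for (begin, end), count in sorted(ranges.items()):
    print(f"range {begin:#x}-{end:#x}: {count}")
```

The range counts would then be spread over the instructions in each
range and mapped to (file, line, discriminator) before being written
out in the profile format.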

Extensions
==========

Add support for gathering branch profiles and memory profiles as an
extension to the current gcov format.

 
