> Hi all,
>
> Per prior discussions, here is the specification we are proposing for
> version 4 of the AutoFDO GCOV profile format. This is a complete re-design
> focused around being a backwards-compatible and extensible format which
> can be partially read by the compiler.
>
> =================
>
> Table of Contents
>
> 1. Introduction
> 2. Motivation
> 3. Open Questions
> 4. Binary Format
>    4.1. File Header
>    4.2. Section Layout
>    4.3. String Table Section (GCOV_STRING_TABLE)
>    4.4. Summary Section (GCOV_SUMMARY_INFO)
>    4.5. File Names Section (GCOV_FILE_NAMES)
>    4.6. Symbol Names Section (GCOV_SYMBOL_NAMES)
>    4.7. Symbol Info Section (GCOV_SYMBOL_INFO)
>    4.8. Location Encoding
>    4.9. Compact Encoding Mode
> 5. Textual Format
>    5.1. Overview
>    5.2. Grammar
>    5.3. Example
> 6. Extensibility
>    6.1. Binary Format
>    6.2. Textual Format
>    6.3. Forward Compatibility
> 7. Experimental Results
>
>
> 1. Introduction
>
> The existing GCOV AutoFDO profile formats (versions 2 and 3) use a
> fixed, non-extensible tag-length sequential layout. Version 4 redesigns
> the format to be a section-based one with the following goals:
>
> - The ability to partially read the profile, i.e. unrequired sections are
>   simply skipped by a reader.
>
> - Forward compatibility through typed location records with a
>   trailing size field for unknown types. This is useful (for example) for
>   future work regarding branch profiling for better if-conversion
>   heuristics.
>
> - Reduced file size through trie-compressed string tables,
>   optional discriminators, zero-count elision, and variable-width
>   sample counts, as detailed in later sections.
>
> - Optional compact encoding using Protocol Buffers-style varints
>   for further size reduction.
>
> - A companion human-readable textual format for inspection.
>
> Measured against a GCC bootstrap profile (58 MB in v3), the v4
> binary format achieves -43% (33 MB) in normal mode and -72%
> (16 MB) in compact mode. Against SPEC CPU 2017 profiles, it
> also achieves -43% in normal mode and -72% in compact mode on average.
>
> All multi-byte integers in the binary format are unsigned and
> stored in big-endian byte order.
>
> "Varint" refers to the variable-length integer encoding defined by Protocol
> Buffers: each byte uses 7 data bits with the MSB as a continuation flag,
> least significant group first
> (see https://protobuf.dev/programming-guides/encoding/#varints).
>
> 2. Motivation
>
> The primary issue with the existing GCOV format is the lack of
> extensibility and backwards-compatibility. Adding features to the format
> necessitates bumping the version number, which also requires updating all
> of the tooling working with the format.
>
> Secondly, the profile is required to be streamed in its entirety from disk
> whenever it is read, which leads to duplicate work when multiple TUs are
> compiled as the compiler has to read the entire profile per TU. Having the
> ability to read the profile partially allows getting rid of most of this
> duplicate work.
>
> 3. Open Questions
>
> - Can the GCOV profile format be renamed to something better? The
>   current name is confusing and collides with the existing GCC PGO
>   format. Perhaps "AFDO" could work.
I would vote for renaming. Gcov is already confusing for -fprofile-use,
but that name comes from history. With auto-fdo it is even more confusing
since the file format is not compatible. .afdo works for me (or perhaps
something with gcc in the name).

>
> 4. Binary Format
>
> 4.1. File Header
>
> The file begins with a fixed header followed by a section table.
>
> GCOV_MAGIC            4 bytes   ASCII 'gcov' (0x67636F76)

If we rename it, perhaps 'afdo'. The gcov tools can be extended to
recognize it meaningfully.

> GCOV_VERSION          4 bytes   0x00000004
> GCOV_HEADER_BITMASK   1 byte
>     Bit 7    : compact flag
>     Bits 0-6 : reserved
> GCOV_NUM_SECTIONS     7 bytes   Number of entries in the
>                                 section table (excludes the
>                                 two fixed sections below)
>
> Immediately following are offset/size pairs for two fixed
> sections, then the section table:
>
> GCOV_SUMMARY_OFFSET      8 bytes
> GCOV_SUMMARY_SIZE        8 bytes
> GCOV_FILE_NAMES_OFFSET   8 bytes
> GCOV_FILE_NAMES_SIZE     8 bytes
>
> GCOV_SECTION (repeated GCOV_NUM_SECTIONS times):
>     GCOV_SECTION_OFFSET  8 bytes
>     GCOV_SECTION_SIZE    8 bytes
>
> All offsets are byte offsets from the start of the file.
>
> The offsets for the summary and file names sections are provided
> for fast access. There is no specified ordering of any of the
> sections. The indexing is done based on the order they are placed
> within the file, and is inclusive of the two sections mentioned
> before.
>
> 4.2. Section Layout
>
> Each section begins with a one-byte bitmask:
>
> GCOV_SECTION_BITMASK  1 byte
>     Bit 7    : compact flag for this section
>     Bits 0-6 : section type
>
> Defined section types:
>
>     0x01  GCOV_STRING_TABLE
>     0x02  GCOV_SUMMARY_INFO
>     0x03  GCOV_FILE_NAMES
>     0x04  GCOV_SYMBOL_NAMES
>     0x05  GCOV_SYMBOL_INFO
>
> The section bitmask is followed immediately by the section data as
> defined in the subsections below.
>
> 4.3. String Table Section (GCOV_STRING_TABLE)
>
> Each translation unit has its own string table section, storing
> its symbol name strings in a path-compressed trie. This exploits
> the shared prefixes common in C++ mangled names (e.g.
> "_ZNSt6vector...").
>
> GCOV_NUM_STRINGS      4 bytes   Total number of strings
>
> Followed by a serialized trie starting at the root node:
>
> GCOV_TRIE_NODE:
>     TRIE_NODE_BITMASK            1 byte
>         Bit 7: terminal flag (a complete string ends here)
>         Bits 0-6: number of children (0-127)
>     TRIE_TERMINAL_ID             4 bytes (present only if terminal)
>         String index for this terminal
>     TRIE_CHILD (repeated for each child):
>         TRIE_EDGE_LABEL_LENGTH   2 bytes
>         TRIE_EDGE_LABEL          variable (TRIE_EDGE_LABEL_LENGTH bytes)
>         GCOV_TRIE_NODE (recursively)
>
> The trie is path-compressed: chains of nodes where each node has
> exactly one child and is not a terminal are collapsed into a single
> edge with a multi-character label. To reconstruct a string,
> edge labels are concatenated from the root to the terminal node, like a
> typical trie.

Path compression looks like a good idea that I did not think of
previously. Do we want independent string tables in order to support
independent reading? Note that with profile-use we will possibly end up
with massive TUs anyway (which is a problem e.g. for DWARF).

>
> 4.4. Summary Section (GCOV_SUMMARY_INFO)
>
> The summary section contains aggregate statistics for the entire
> profile.
>
> SUMMARY_TOTAL_COUNT            8 bytes
> SUMMARY_MAX_COUNT              8 bytes
> SUMMARY_MAX_FN_COUNT           8 bytes
> SUMMARY_NUM_COUNTS             8 bytes
> SUMMARY_NUM_FUNCTIONS          8 bytes
> SUMMARY_NUM_DETAILED_ENTRIES   8 bytes
>
> Followed by detailed histogram entries:
>
> SUMMARY_DETAILED_ENTRY (repeated SUMMARY_NUM_DETAILED_ENTRIES times):
>     ENTRY_CUTOFF       4 bytes   Cutoff value
>     ENTRY_MIN_COUNT    8 bytes   Minimum count at this percentile
>     ENTRY_NUM_COUNTS   8 bytes   Number of counters at or above this
>                                  minimum

We may want to add more fields later. I wonder if we want some method of
declaring which fields are present and what their sizes are.

>
> 7.1. SPEC CPU 2017
>
> Profile        v3     v4     v4c    v4     v4c
> -------        --     --     ---    -----  -----
> cpugcc_r       7.1M   4.1M   1.9M   -43%   -74%
> cpuxalan_r     1.3M   790K   516K   -40%   -61%
> deepsjeng_r    103K   59K    30K    -44%   -71%
> exchange2_r    27K    16K    7.7K   -43%   -71%
> leela_r        299K   195K   113K   -35%   -62%
> mcf_r          27K    16K    7.8K   -43%   -71%
> omnetpp_r      1.2M   764K   425K   -37%   -65%
> perlbench_r    412K   229K   105K   -44%   -75%
> specrand_ir    3.3K   1.6K   705    -53%   -79%
> x264_r         434K   247K   113K   -43%   -74%
> xz_r           75K    46K    23K    -39%   -69%
>
> TOTAL          11M    6.4M   3.2M   -42%   -71%
>
> 7.2. GCC Bootstrap
>
> Profile        v3     v4     v4c    v4     v4c
> -------        --     --     ---    -----  -----
> all.fda        56M    32M    16M    -43%   -71%

This does look nice. I am looking forward to discussing it tomorrow.
Did you do any comparison with LLVM's implementation?

Honza

>
> --
> Regards,
> Dhruv
>
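For reference while reading section 4.3, here is a minimal sketch of how a
reader might walk the serialized trie in normal (non-compact) mode. It is
not taken from any existing implementation. The function name
decode_string_table is hypothetical, the input is assumed to start right
after the one-byte section bitmask, GCOV_NUM_STRINGS is assumed to equal
the number of terminal nodes, and the two names in the example are made up.

```python
import struct


def decode_string_table(data: bytes) -> dict[int, bytes]:
    """Decode a normal-mode GCOV_STRING_TABLE body into {string index: bytes}.

    Assumption: `data` starts right after the one-byte section bitmask.
    """
    (num_strings,) = struct.unpack_from(">I", data, 0)  # GCOV_NUM_STRINGS
    strings: dict[int, bytes] = {}

    def read_node(pos: int, prefix: bytes) -> int:
        """Decode one GCOV_TRIE_NODE at `pos`; return the offset just past it."""
        bitmask = data[pos]  # TRIE_NODE_BITMASK
        pos += 1
        if bitmask & 0x80:  # bit 7: a complete string ends at this node
            (string_id,) = struct.unpack_from(">I", data, pos)  # TRIE_TERMINAL_ID
            pos += 4
            strings[string_id] = prefix
        for _ in range(bitmask & 0x7F):  # bits 0-6: number of children
            (label_len,) = struct.unpack_from(">H", data, pos)  # TRIE_EDGE_LABEL_LENGTH
            pos += 2
            label = data[pos:pos + label_len]  # TRIE_EDGE_LABEL
            pos += label_len
            # The child node follows its edge label; its string is prefix + label.
            pos = read_node(pos, prefix + label)
        return pos

    read_node(4, b"")  # the root node follows the 4-byte string count
    # Assumption: GCOV_NUM_STRINGS counts the terminal nodes.
    if len(strings) != num_strings:
        raise ValueError("terminal count does not match GCOV_NUM_STRINGS")
    return strings


# Hand-built table holding "_ZN3foo" (index 0) and "_ZN3bar" (index 1).
example = (
    b"\x00\x00\x00\x02"          # GCOV_NUM_STRINGS = 2
    b"\x01"                      # root: not terminal, 1 child
    b"\x00\x04" b"_ZN3"          # path-compressed edge "_ZN3"
    b"\x02"                      # inner node: not terminal, 2 children
    b"\x00\x03" b"foo"           # edge "foo"
    b"\x80" b"\x00\x00\x00\x00"  # terminal, index 0, no children
    b"\x00\x03" b"bar"           # edge "bar"
    b"\x80" b"\x00\x00\x00\x01"  # terminal, index 1, no children
)
```

The shared "_ZN3" prefix is stored once, which is where the size win on
mangled names comes from. Strings are kept as bytes because the proposal
does not state a character encoding.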
