From: Dhruv Chawla <dhr...@nvidia.com>

Introduction
------------

Per PR120229 (gcc.gnu.org/PR120229), the auto-profile pass cannot distinguish
profile information for function_instance objects that share the same base
name once suffixes are removed. To fix this, source file names should be
tracked in the GCOV file information to disambiguate functions.

This issue occurs when LTO creates privatized clones because static functions
with the same name end up in the same partition.

Proposed solution
-----------------

1. In the string_table section of the GCOV file, each function name will have
   the source file name that it came from recorded after it, as an index into
   a list of file names that is written first. The current layout of the file
   is:

     GCOV_TAG_AFDO_FILE_NAMES
     <function 1>
     <function 2>
     ...

   With this change the layout becomes:

     GCOV_TAG_AFDO_FILE_NAMES
     <file name 1>
     <file name 2>
     ...
     <function 1> <file 1 idx>
     <function 2> <file 2 idx>
     ...

2. AUTO_PROFILE_VERSION will be increased from 2 to 3 as this is a breaking
   change to the GCOV file format used by AutoFDO.

A patch is attached with this RFC for a prototype implementation. There is an
open question here:

What about backwards compatibility? Should a lack of source file-name
information be handled in the code (to keep supporting version 2)?

Example
-------

As an example, consider the following code:

=== test.c ===
#define TRIP 1000000000

__attribute__((noinline, noipa)) static void effect_1() {}
__attribute__((noinline, noipa)) static void effect_2() {}
__attribute__((noinline, noipa)) static int foo() { return 5; }

// Prevent GCC from optimizing the loop
__attribute__((noinline, noipa)) int use(int x) {
  volatile int y = x;
  return x;
}

extern void global();

int main() {
  // 1'000'000'000
  for (int i = 0; i < TRIP; i++) {
    // Call only 50% of the time
    if (use(i) < TRIP / 2) {
      global();
    }

    if (foo() < 5) {
      effect_1();
    } else {
      effect_2();
    }
  }
}

=== test-2.c ===
__attribute__((noinline, noipa)) static void do_nothing() {}
__attribute__((noinline, noipa)) static void effect_1() { do_nothing(); }
__attribute__((noinline, noipa)) static void effect_2() { do_nothing(); }

void global() {
  effect_1();
  effect_2();
}
=== ===

There are four LTO privatized clones created here, two for effect_1() and two
for effect_2(). If effect_1.lto_priv.0 and effect_2.lto_priv.0 are created for
test.c, and effect_1.lto_priv.1 and effect_2.lto_priv.1 are created for
test-2.c, then:

- effect_1.lto_priv.0 is never executed
- effect_2.lto_priv.0 is executed 100% of the time
- effect_1.lto_priv.1 and effect_2.lto_priv.1 are executed 50% of the time

This is reflected in the gcov dump:

main total:3475985 head:0
  <...>
  11: 429139 effect_2.lto_priv.0:421383
  14: 0
  5: global total:407915
    0: 204155 effect_1.lto_priv.1:203895
    0.1: 203760 effect_2.lto_priv.1:203976
use total:436707 head:436706
  0: 436707
foo total:434247 head:434246
  0: 434247
effect_2.lto_priv.0 total:421383 head:421383
  0: 421383
do_nothing total:407756 head:407756
  0: 407756
effect_2.lto_priv.1 total:203976 head:203976
  0: 203976 do_nothing:204004
effect_1.lto_priv.1 total:203895 head:203895
  0: 203895 do_nothing:203752

Note that effect_1.lto_priv.0 does not show up at all.

When annotating the code, auto-profile is not able to distinguish between the
two effect_1 functions and ends up using the effect_1.lto_priv.1 profile for
both functions. It also merges the profiles for both effect_2 clones:

;; Function effect_1 (effect_1.lto_priv.0, funcdef_no=0, decl_uid=23321, cgraph_uid=1, symbol_order=0) (hot)

__attribute__((noipa, noinline, noclone, no_icf))
void effect_1 ()
{
  <bb 2> [count: 209702]:
  return;
}

;; Function effect_2 (effect_2.lto_priv.0, funcdef_no=1, decl_uid=23322, cgraph_uid=2, symbol_order=1) (hot)

__attribute__((noipa, noinline, noclone, no_icf))
void effect_2 ()
{
  <bb 2> [count: 627698]:
  return;
}

;; Function effect_1 (effect_1.lto_priv.1, funcdef_no=4, decl_uid=23329, cgraph_uid=8, symbol_order=6) (hot)

__attribute__((noipa, noinline, noclone, no_icf))
void effect_1 ()
{
  <bb 2> [count: 209702]:
  do_nothing (); [tail call]
  return;
}

;; Function effect_2 (effect_2.lto_priv.1, funcdef_no=5, decl_uid=23330, cgraph_uid=9, symbol_order=7) (hot)

__attribute__((noipa, noinline, noclone, no_icf))
void effect_2 ()
{
  <bb 2> [count: 627698]:
  do_nothing (); [tail call]
  return;
}

effect_1.lto_priv.0 should actually have a 0 count, and the profiles for
effect_2.lto_priv.{0,1} should not be merged.

After adding the file names to the GCOV info, the dump looks like the
following:

main:test.c total:3373660 head:0
  <...>
  11: 421399 effect_2.lto_priv.0:test.c:412102
  14: 0
  5: global:test-2.c total:403456
    0: 201888 effect_1.lto_priv.1:test-2.c:201719
    0.1: 201568 effect_2.lto_priv.1:test-2.c:201696
foo:test.c total:432888 head:432888
  0: 432888
use:test.c total:412260 head:412260
  0: 412260
effect_2.lto_priv.0:test.c total:412104 head:412102
  0: 412104
do_nothing:test-2.c total:403359 head:403359
  0: 403359
effect_1.lto_priv.1:test-2.c total:201719 head:201719
  0: 201719 do_nothing:test-2.c:201619
effect_2.lto_priv.1:test-2.c total:201697 head:201696
  0: 201697 do_nothing:test-2.c:201740

Each function has been annotated with the source file that it came from. This
makes the tree-optimized dump look like the following:

;; Function effect_1 (effect_1.lto_priv.0, funcdef_no=0, decl_uid=23321, cgraph_uid=1, symbol_order=0)

__attribute__((noipa, noinline, noclone, no_icf))
void effect_1 ()
{
  <bb 2> [local count: 1073741824]:
  return;
}

;; Function effect_2 (effect_2.lto_priv.0, funcdef_no=1, decl_uid=23322, cgraph_uid=2, symbol_order=1) (hot)

__attribute__((noipa, noinline, noclone, no_icf))
void effect_2 ()
{
  <bb 2> [count: 412102]:
  return;
}

;; Function effect_1 (effect_1.lto_priv.1, funcdef_no=4, decl_uid=23329, cgraph_uid=8, symbol_order=6) (hot)

__attribute__((noipa, noinline, noclone, no_icf))
void effect_1 ()
{
  <bb 2> [count: 201719]:
  do_nothing (); [tail call]
  return;
}

;; Function effect_2 (effect_2.lto_priv.1, funcdef_no=5, decl_uid=23330, cgraph_uid=9, symbol_order=7) (hot)

__attribute__((noipa, noinline, noclone, no_icf))
void effect_2 ()
{
  <bb 2> [count: 201697]:
  do_nothing (); [tail call]
  return;
}

Here, the functions have been annotated as expected.

Limitations
-----------

As stated in PR120229, source files with the same name will still be broken.
It may be worth adding more of the path into the information to disambiguate
this case.

Bootstrapped and regtested on aarch64-linux-gnu.

Dhruv Chawla (1):
  [RFC][AutoFDO] Source filename tracking in GCOV

 gcc/auto-profile.cc           | 101 ++++++++++++++++++++++++++++++----
 gcc/testsuite/lib/profopt.exp |   2 +-
 2 files changed, 91 insertions(+), 12 deletions(-)

--
2.44.0