From: Dhruv Chawla <dhr...@nvidia.com>

Introduction
------------

Per PR120229 (gcc.gnu.org/PR120229), the auto-profile pass cannot distinguish
profile information for `function_instance's that have the same base name once
suffixes are removed. This happens when an LTO partition contains static
functions with the same name, so that privatized clones are created for them.
To fix this, source file names should be tracked in the GCOV file information
to disambiguate such functions.
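
The collision can be illustrated with a minimal sketch. It assumes, for
illustration only, that the base name is everything before the first '.';
the helpers base_name() and same_base_name() are hypothetical and are not
the logic used in auto-profile.cc:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy NAME into OUT (of size OUT_SIZE), dropping
   everything from the first '.' onwards, e.g. ".lto_priv.0".  */
static void
base_name (const char *name, char *out, size_t out_size)
{
  assert (out_size > 0);
  size_t len = strcspn (name, ".");
  if (len >= out_size)
    len = out_size - 1;   /* truncate rather than overflow OUT */
  memcpy (out, name, len);
  out[len] = '\0';
}

/* Return nonzero if A and B collide once their suffixes are removed.  */
static int
same_base_name (const char *a, const char *b)
{
  char base_a[128], base_b[128];
  base_name (a, base_a, sizeof base_a);
  base_name (b, base_b, sizeof base_b);
  return strcmp (base_a, base_b) == 0;
}
```

Under this assumption, same_base_name ("effect_1.lto_priv.0",
"effect_1.lto_priv.1") is nonzero, so a lookup keyed only on the base name
cannot tell the two clones apart.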

Proposed solution
-----------------

1. In the string_table section of the GCOV file, each function name will be
   followed by an index identifying the source file that it came from. The
   file names themselves are written once, ahead of the function names. The
   current layout of the file is:

   GCOV_TAG_AFDO_FILE_NAMES
   <function 1> <function 2> ...

   With this change the layout becomes:

   GCOV_TAG_AFDO_FILE_NAMES
   <file name 1> <file name 2> ...
   <function 1> <file 1 idx> <function 2> <file 2 idx> ...

2. AUTO_PROFILE_VERSION will be increased from 2 to 3 as this is a breaking
   change to the GCOV file format used by AutoFDO.
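
A rough sketch of how the new layout could be modelled in memory is shown
below. The struct names, field names and lookup_function() are hypothetical
and are not taken from the attached patch; the point is only that a lookup
can now compare both the function name and the file name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory view of one <function> <file idx> pair.  */
struct function_entry
{
  const char *name;     /* function name as stored in the string table */
  unsigned file_idx;    /* index into string_table.file_names */
};

/* Hypothetical in-memory view of the version 3 string table.  */
struct string_table
{
  const char *const *file_names;
  size_t n_files;
  const struct function_entry *functions;
  size_t n_functions;
};

/* Return the index of the entry matching both NAME and FILE, or -1 if
   there is no such entry.  */
static int
lookup_function (const struct string_table *tab, const char *name,
                 const char *file)
{
  for (size_t i = 0; i < tab->n_functions; i++)
    {
      const struct function_entry *fn = &tab->functions[i];
      assert (fn->file_idx < tab->n_files);
      if (strcmp (fn->name, name) == 0
          && strcmp (tab->file_names[fn->file_idx], file) == 0)
        return (int) i;
    }
  return -1;
}
```

Storing an index rather than repeating the file name after every function
keeps the table small when many functions come from the same file.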

A prototype implementation is attached to this RFC. One open question is
backwards compatibility: should the code handle profiles that lack source
file name information, so that version 2 files remain supported?

Example
-------

As an example, consider the following code:

=== test.c ===

#define TRIP 1000000000

__attribute__((noinline, noipa)) static void effect_1() {}
__attribute__((noinline, noipa)) static void effect_2() {}
__attribute__((noinline, noipa)) static int foo() { return 5; }

// Prevent GCC from optimizing the loop
__attribute__((noinline, noipa)) int use(int x) { volatile int y = x; return x; }

extern void global();
int main() {
  // 1'000'000'000
  for (int i = 0; i < TRIP; i++) {
    // Call only 50% of the time
    if (use(i) < TRIP / 2) {
      global();
    }

    if (foo() < 5) {
      effect_1();
    } else {
      effect_2();
    }
  }
}

=== test-2.c ===

__attribute__((noinline, noipa)) static void do_nothing() {}
__attribute__((noinline, noipa)) static void effect_1() { do_nothing(); }
__attribute__((noinline, noipa)) static void effect_2() { do_nothing(); }

void global() { effect_1(); effect_2(); }

=== ===

There are four LTO privatized clones created here, two for effect_1() and
two for effect_2(). If effect_1.lto_priv.0 and effect_2.lto_priv.0 are created
for test.c, and effect_1.lto_priv.1 and effect_2.lto_priv.1 are created for
test-2.c, then:
- effect_1.lto_priv.0 is never executed
- effect_2.lto_priv.0 is executed 100% of the time
- effect_1.lto_priv.1 and effect_2.lto_priv.1 are executed 50% of the time

This is reflected in the gcov dump:

main total:3475985 head:0
  <...>
  11: 429139  effect_2.lto_priv.0:421383
  14: 0
  5: global total:407915
    0: 204155  effect_1.lto_priv.1:203895
    0.1: 203760  effect_2.lto_priv.1:203976
use total:436707 head:436706
  0: 436707
foo total:434247 head:434246
  0: 434247
effect_2.lto_priv.0 total:421383 head:421383
  0: 421383
do_nothing total:407756 head:407756
  0: 407756
effect_2.lto_priv.1 total:203976 head:203976
  0: 203976  do_nothing:204004
effect_1.lto_priv.1 total:203895 head:203895
  0: 203895  do_nothing:203752

Note that effect_1.lto_priv.0 does not show up at all.

When annotating the code, auto-profile is not able to distinguish between
the two effect_1 functions and ends up using the effect_1.lto_priv.1 profile
for both functions. It also merges the profiles for both effect_2 clones:

;; Function effect_1 (effect_1.lto_priv.0, funcdef_no=0, decl_uid=23321, cgraph_uid=1, symbol_order=0) (hot)
__attribute__((noipa, noinline, noclone, no_icf))
void effect_1 ()
{
  <bb 2> [count: 209702]:
  return;
}

;; Function effect_2 (effect_2.lto_priv.0, funcdef_no=1, decl_uid=23322, cgraph_uid=2, symbol_order=1) (hot)
__attribute__((noipa, noinline, noclone, no_icf))
void effect_2 ()
{
  <bb 2> [count: 627698]:
  return;
}

;; Function effect_1 (effect_1.lto_priv.1, funcdef_no=4, decl_uid=23329, cgraph_uid=8, symbol_order=6) (hot)
__attribute__((noipa, noinline, noclone, no_icf))
void effect_1 ()
{
  <bb 2> [count: 209702]:
  do_nothing (); [tail call]
  return;
}

;; Function effect_2 (effect_2.lto_priv.1, funcdef_no=5, decl_uid=23330, cgraph_uid=9, symbol_order=7) (hot)
__attribute__((noipa, noinline, noclone, no_icf))
void effect_2 ()
{
  <bb 2> [count: 627698]:
  do_nothing (); [tail call]
  return;
}

effect_1.lto_priv.0 should actually have a 0 count, and the profiles for
effect_2.lto_priv.{0,1} should not be merged. After adding the file names to
the GCOV info, the dump looks like the following:

main:test.c total:3373660 head:0
  <...>
  11: 421399  effect_2.lto_priv.0:test.c:412102
  14: 0
  5: global:test-2.c total:403456
    0: 201888  effect_1.lto_priv.1:test-2.c:201719
    0.1: 201568  effect_2.lto_priv.1:test-2.c:201696
foo:test.c total:432888 head:432888
  0: 432888
use:test.c total:412260 head:412260
  0: 412260
effect_2.lto_priv.0:test.c total:412104 head:412102
  0: 412104
do_nothing:test-2.c total:403359 head:403359
  0: 403359
effect_1.lto_priv.1:test-2.c total:201719 head:201719
  0: 201719  do_nothing:test-2.c:201619
effect_2.lto_priv.1:test-2.c total:201697 head:201696
  0: 201697  do_nothing:test-2.c:201740

Each function is now annotated with the source file that it came from. With
this information, the tree-optimized dump becomes:

;; Function effect_1 (effect_1.lto_priv.0, funcdef_no=0, decl_uid=23321, cgraph_uid=1, symbol_order=0)
__attribute__((noipa, noinline, noclone, no_icf))
void effect_1 ()
{
  <bb 2> [local count: 1073741824]:
  return;
}

;; Function effect_2 (effect_2.lto_priv.0, funcdef_no=1, decl_uid=23322, cgraph_uid=2, symbol_order=1) (hot)
__attribute__((noipa, noinline, noclone, no_icf))
void effect_2 ()
{
  <bb 2> [count: 412102]:
  return;
}

;; Function effect_1 (effect_1.lto_priv.1, funcdef_no=4, decl_uid=23329, cgraph_uid=8, symbol_order=6) (hot)
__attribute__((noipa, noinline, noclone, no_icf))
void effect_1 ()
{
  <bb 2> [count: 201719]:
  do_nothing (); [tail call]
  return;
}

;; Function effect_2 (effect_2.lto_priv.1, funcdef_no=5, decl_uid=23330, cgraph_uid=9, symbol_order=7) (hot)
__attribute__((noipa, noinline, noclone, no_icf))
void effect_2 ()
{
  <bb 2> [count: 201697]:
  do_nothing (); [tail call]
  return;
}

Here, the functions have been annotated as expected.

Limitations
-----------

As stated in PR120229, functions that come from source files with the same
name still cannot be told apart. It may be worth recording more of the path
in the file information to disambiguate this case.
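
One possible way to use more of the path, sketched here purely as an
assumption and not as part of the attached patch, is to compare the last N
components of each path instead of only the file name. The helper
path_tail() is hypothetical and assumes '/' as the separator:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer into PATH at the start of its last
   N '/'-separated components, or PATH itself if it has fewer than N.  */
static const char *
path_tail (const char *path, unsigned n)
{
  assert (n > 0);
  const char *p = path + strlen (path);
  while (p > path)
    {
      /* P sits just after a separator: one more component is covered.  */
      if (p[-1] == '/' && --n == 0)
        return p;
      p--;
    }
  return path;
}
```

With N = 1, "lib/a/util.c" and "lib/b/util.c" both reduce to "util.c" and
still collide; with N = 2 they reduce to "a/util.c" and "b/util.c" and can be
told apart. How many components to keep, and how to cope with build
directories that differ between the profiling and the optimizing build, is
left open.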

Bootstrapped and regtested on aarch64-linux-gnu.

Dhruv Chawla (1):
  [RFC][AutoFDO] Source filename tracking in GCOV

 gcc/auto-profile.cc           | 101 ++++++++++++++++++++++++++++++----
 gcc/testsuite/lib/profopt.exp |   2 +-
 2 files changed, 91 insertions(+), 12 deletions(-)

-- 
2.44.0
