https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121093

--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Created attachment 61957
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61957&action=edit
patch to autofdo for multiple source locations per single instruction

This patch makes the autofdo tool handle multiple source locations per single
instruction.  It makes profiles more fine-grained but hits a pre-existing
problem with associating locations with inline stacks.  Here is a testcase
that can be trained:

static int p1(int a)
{
        return a+1;
}
static int p2(int a)
{
        return a+2;
}

__attribute__ ((noipa))
int p3 (int a)
{                             /* Line 12 */
        return p1(p2(a));     /* Line 13 */
}

int
main(void)
{
        int ret = 0;
        for (int i = 0; i < 1000000000; i++)
                ret += p3 (0);
        return ret;
}


There is a single add instruction that results from optimizing p1, p2 and p3
together.

We get the following profile:

p3 total:463580 head:191668
  3: 92716
  2.1: p1.__uniq.183670898460993768453328813661018809772 total:370864
    0: 92716
    2: 92716
    11: 92716
    12: 92716
main total:371959 head:0
  1: 0
  2: 0
  3: 0
  3.1: 92977
  3.2: 92977
  4: 93028  p3:95834
  4.1: 92977
  5: 0
  6: 0
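
For reading the dump above, here is a minimal sketch of how an entry maps
back to a source line.  It assumes the usual convention that each entry is
offset[.discriminator]: count, with the offset counted from the line where
the function starts.  The helper name is made up for illustration and is
not part of the autofdo tool.

```c
/* Hypothetical helper, not taken from the autofdo sources.
   Convert a profile offset to an absolute source line, assuming
   offsets are relative to the line on which the function starts.  */
static int
profile_offset_to_line (int function_start_line, int offset)
{
  return function_start_line + offset;
}
```

With p1 starting at line 1 of a.c, offsets 0 and 2 give lines 1 and 3,
while offsets 11 and 12 give lines 12 and 13.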

This profile correctly represents that p1 is inlined into p3, but p2 is
missing (as already discussed in this bug).  However, another problem is that
the p1 profile contains:
    11: 92716
    12: 92716
while p1 has no lines 11 and 12 at all.  These entries correspond to lines 12
and 13 of the source code, which belong to p3.  The problem is that we get:

p3:
.LVL0:
        # DEBUG a => di
.LFB2:
        .file 1 "a.c"
        # a.c:12:1
        .loc 1 12 1 view -0
        .cfi_startproc
        # a.c:13:9
        .loc 1 13 9 view .LVU1
        # DEBUG a => di+0x2
.LBB6:
.LBI6:
        # a.c:1:12
        .loc 1 1 12 view .LVU2
.LBB7:
        # a.c:3:9
        .loc 1 3 9 view .LVU3
        # DEBUG a RESET
        # a.c:3:17
        .loc 1 3 17 is_stmt 0 view .LVU4
        leal    3(%rdi), %eax
.LBE7:
.LBE6:

There is a single lea with locations a.c:1 (entry of p1), a.c:3 (body of p1),
a.c:12 and a.c:13 (the prologue and body of p3).
Since there is an inlined subroutine DIE for p1 with range LBB6...LBE6

        .uleb128 0xb    # (DIE (0xc3) DW_TAG_inlined_subroutine)
        .long   0x111   # DW_AT_abstract_origin
        .quad   .LBI6   # DW_AT_entry_pc
        .byte   .LVU2   # DW_AT_GNU_entry_view
        .quad   .LBB6   # DW_AT_low_pc
        .quad   .LBE6-.LBB6     # DW_AT_high_pc
        .byte   0x1     # DW_AT_call_file (a.c)
        .byte   0xd     # DW_AT_call_line
        .byte   0x10    # DW_AT_call_column
        .byte   0x1     # DW_AT_GNU_discriminator

all of this is assigned to p1's body by the autofdo tools as well as by gdb.
As discussed with Richi on IRC, there does not seem to be a way to
differentiate multiple locations with different inline stacks in DWARF 5,
which is quite a problem here.
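
To illustrate why all four locations end up in p1, here is a rough sketch of
an address-only lookup.  It is not the code used by the autofdo tool or by
gdb; the struct, the function name and the addresses in the usage note are
hypothetical.  The only point is that the lookup key is the PC, so every
line-table row sharing that PC gets the same inline stack.

```c
#include <stddef.h>
#include <stdint.h>

/* A DW_TAG_inlined_subroutine reduced to its PC range (hypothetical type).  */
struct inline_range
{
  const char *callee;   /* name reached via DW_AT_abstract_origin */
  uint64_t low_pc;      /* first address covered */
  uint64_t high_pc;     /* first address past the range */
};

/* Return the index of the last range containing PC, or -1 when PC belongs
   only to the outer function.  Ranges are assumed to be ordered from the
   outermost to the innermost inline instance.  Note that the source line
   of the row is not an input at all.  */
static int
innermost_range (uint64_t pc, const struct inline_range *ranges, size_t n)
{
  int found = -1;
  for (size_t i = 0; i < n; i++)
    if (pc >= ranges[i].low_pc && pc < ranges[i].high_pc)
      found = (int) i;
  return found;
}
```

If the lea sits at a made-up address 0x10 and p1's range is [0x10, 0x13),
the rows for lines 12, 13, 1 and 3 all carry PC 0x10, so each of them
resolves to index 0, that is p1, even though lines 12 and 13 belong to p3.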
