https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121093
--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Created attachment 61957
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61957&action=edit
patch to autofdo for multiple source locations per single instruction

This is a patch that makes the autofdo tool handle multiple source locations
per single instruction. It makes profiles more fine-grained but hits a
pre-existing problem with associating locations to inline stacks.

Here is a testcase that can be trained:

static int p1(int a)
{
  return a+1;
}
static int p2(int a)
{
  return a+2;
}
__attribute__ ((noipa))
int p3 (int a)
{ /* Line 12 */
  return p1(p2(a)); /* Line 13 */
}
int main(void)
{
  int ret;
  for (int i = 0; i < 1000000000; i++)
    ret += p3 (0);
  return ret;
}

There is a single add instruction that is the result of optimizing p1, p2 and p3
together. We get the following profile:

p3 total:463580 head:191668
 3: 92716
 2.1: p1.__uniq.183670898460993768453328813661018809772 total:370864
  0: 92716
  2: 92716
  11: 92716
  12: 92716
main total:371959 head:0
 1: 0
 2: 0
 3: 0
 3.1: 92977
 3.2: 92977
 4: 93028 p3:95834
 4.1: 92977
 5: 0
 6: 0

It correctly represents that p1 is inlined in p3, but p2 is missing (as
discussed in this bug already). However, another problem is that the p1 profile
contains:

  11: 92716
  12: 92716

while p1 has no lines 11 and 12 at all. These correspond to lines 12 and 13 of
the source code. The problem is that we get:

p3:
.LVL0:
        # DEBUG a => di
.LFB2:
        .file 1 "a.c"
        # a.c:12:1
        .loc 1 12 1 view -0
        .cfi_startproc
        # a.c:13:9
        .loc 1 13 9 view .LVU1
        # DEBUG a => di+0x2
.LBB6:
.LBI6:
        # a.c:1:12
        .loc 1 1 12 view .LVU2
.LBB7:
        # a.c:3:9
        .loc 1 3 9 view .LVU3
        # DEBUG a RESET
        # a.c:3:17
        .loc 1 3 17 is_stmt 0 view .LVU4
        leal    3(%rdi), %eax
.LBE7:
.LBE6:

There is a single lea with locations a.c:1 (entry of p1), a.c:3 (body of p1),
a.c:12 and a.c:13 (which are the prologue and body of p3).
Since there is a subprogram of p1 with range LBB6...LBE6

        .uleb128 0xb    # (DIE (0xc3) DW_TAG_inlined_subroutine)
        .long   0x111   # DW_AT_abstract_origin
        .quad   .LBI6   # DW_AT_entry_pc
        .byte   .LVU2   # DW_AT_GNU_entry_view
        .quad   .LBB6   # DW_AT_low_pc
        .quad   .LBE6-.LBB6     # DW_AT_high_pc
        .byte   0x1     # DW_AT_call_file (a.c)
        .byte   0xd     # DW_AT_call_line
        .byte   0x10    # DW_AT_call_column
        .byte   0x1     # DW_AT_GNU_discriminator

this is all assigned by the autofdo tools, as well as by gdb, to p1's body.

As discussed with Richi on IRC, there does not seem to be a way to differentiate
multiple locations with different inline stacks in DWARF 5, which is quite a
problem here.