https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121093
--- Comment #4 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
> in the end I'm not sure what's "wrong" here and why you think you are
> missing p2 - p2 is not executed, you shouldn't get any profile on it.

Seems we kind of disagree on how "executed" is defined. If you compile with -O0, then p2 is executed and you can set a breakpoint in it:

(gdb) break p2
Breakpoint 1 at 0x40112c: file t.c, line 8.
(gdb) r
Starting program: /tmp/a.out

Breakpoint 1, p2 (a=0) at t.c:8
8         return a+2;

In the optimized binary, p1, p2 and part of p3 are all executed as a single instruction:

        .loc 1 14 9 view .LVU1
.LBB6:
.LBI6:
        .loc 1 2 12 view .LVU2
.LBB7:
        .loc 1 4 9 view .LVU3
        .loc 1 4 17 is_stmt 0 view .LVU4
        leal    3(%rdi), %eax
.LBE7:
.LBE6:

I believe that debug markers are designed to make debugging of an optimized binary closer to debugging of an unoptimized binary in such situations, and it seems reasonable to expect that if I set a breakpoint in p2 it will trigger in both the optimized and the unoptimized binary.

If you do
  i++;
  i++;
  i++;
which is equivalent code but without putting things into random inlines, it will work, since the debug statements will not be discarded.

I actually wrote the block removal code a long time ago, but that was before the debug statement work. AFDO needs a similar kind of behaviour, since it reads the profile of an optimized binary and retrofits it to not yet fully optimized code, and it relies on debug info to hold this together.

This is a bit of an extreme example, and it is easy to fix the issue at profile read-in. However, it is based on what happens in deepsjeng. In C++, if you have getters/setters and iterators for everything, multiple calls often get combined, and if we lose the locations we may end up losing info on loop headers, which confuses the hot/cold logic.
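For readers without the attached testcase, the situation described in this comment can be sketched roughly as follows. This is a hypothetical reconstruction, not the original t.c: the function bodies, their line positions and the name p3_flat are assumptions chosen only so that the inlined helpers fold into one add of 3, matching the `leal 3(%rdi), %eax` shown in the comment.

```c
/* Hypothetical reconstruction, NOT the original t.c from the report.
   Function bodies and line layout are assumptions. */

/* Small helpers that the optimizer is expected to inline. */
static inline int p1(int a)
{
    return a + 1;
}

static inline int p2(int a)
{
    return a + 2;
}

/* After inlining, the work of p1 and p2 can fold into a single add of 3,
   so one instruction carries source locations from several functions. */
int p3(int a)
{
    return p2(p1(a));
}

/* The same computation written without inline helpers (assumed name). */
int p3_flat(int i)
{
    i++;
    i++;
    i++;
    return i;
}
```

Compiling something like this with `gcc -O2 -g -S` and comparing the `.loc` directives around the add in p3 and p3_flat should show whether locations from the inlined bodies survive. To repeat the gdb session, a small main that calls p3 would have to be added and the file built with `-O0 -g`.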