https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916
--- Comment #7 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
LLVM also gets execution counts wrong, just in a different (and less harmful)
way:

test:270773509:9780
 1: 9116
 2: 51984      for (
 4: 51984      i<s    <- this is i<s and should also have large count
 5: 7081488    i++
 6: 7081488    a[i]++
 7: 8576
main:36431:0
 1: 0
 2.1: 9051
 3: 9278 test:9780
 4: 0

I am confused why the autofdo tool does this. In the internal loop we output:

.L4:
        .loc 1 10 11 is_stmt 1 view .LVU8      <- a[i]
        .loc 1 10 15 is_stmt 0 view .LVU9      <- ++
        movdqa  a(%rax), %xmm0
        addq    $16, %rax
        paddd   %xmm1, %xmm0
        movaps  %xmm0, a-16(%rax)
        .loc 1 9 15 is_stmt 1 view .LVU10      <- i++
        .loc 1 8 16 view .LVU11                <- i<s
        cmpq    %rax, %rdx
        jne     .L4

Exchanging the last two .loc directives to

        .loc 1 8 16 view .LVU11                <- i<s
        .loc 1 9 15 is_stmt 1 view .LVU10      <- i++

yields:

test total:2652901 head:4123
 3: 0
 4: 4123
 5: 1322715
 6: 1322715
 7: 3348
main total:3983 head:0
 1: 0
 2.1: 1916
 3: 2067 test:1925
 4: 0

So it seems that the tool takes only the first location of the sample, which
is odd, since debug stmts may come from multiple original basic blocks and
this fact is not visible. Ideally we could do something like:

.L4:
        .loc 1 10 11 is_stmt 1 view .LVU8      <- a[i]
        movdqa  a(%rax), %xmm0
        .loc 1 9 15 is_stmt 1 view .LVU10      <- i++
        addq    $16, %rax
        .loc 1 10 15 is_stmt 0 view .LVU9      <- ++
        paddd   %xmm1, %xmm0
        movaps  %xmm0, a-16(%rax)
        .loc 1 8 16 view .LVU11                <- i<s
        cmpq    %rax, %rdx
        jne     .L4

This would make things work (since there are no chained debug stmts) and
breakpoints would be less surprising, but I understand it is not designed to
work this way....

LLVM does:

.LBB0_4:                                # =>This Inner Loop Header: Depth=1
        .loc 0 10 15 is_stmt 1 discriminator 33   # ll.c:10:15
        movdqa  (%rsi,%rdi), %xmm1
        movdqa  16(%rsi,%rdi), %xmm2
        psubd   %xmm0, %xmm1
        psubd   %xmm0, %xmm2
        movdqa  %xmm1, (%rsi,%rdi)
        movdqa  %xmm2, 16(%rsi,%rdi)
        .loc 0 9 15 discriminator 33              # ll.c:9:15
        addq    $32, %rsi
        cmpq    %rsi, %rdx
        jne     .LBB0_4

So it has only lines 9 and 10.

The large discriminator numbers seem to be the FS discriminator encoding. LLVM
assigns discriminators twice. The first assignment is done similarly to ours,
but scaled up. I think it is supposed to handle the case where a statement
gets duplicated into multiple basic blocks, as a[i]++ is. So it has:

        .loc 0 10 15 is_stmt 1 discriminator 33   # ll.c:10:15
        movdqa  (%rsi,%rdi), %xmm1
        movdqa  16(%rsi,%rdi), %xmm2
        psubd   %xmm0, %xmm1
        psubd   %xmm0, %xmm2
        movdqa  %xmm1, (%rsi,%rdi)
        movdqa  %xmm2, 16(%rsi,%rdi)

for the vectorized body and

        .loc 0 10 15 is_stmt 1                    # ll.c:10:15
        leaq    (%rcx,%rdx,4), %rdi
        incl    (%rsi,%rdi)

for the epilogue. The tool has a -fuse_discriminator_encoding option which then
merges the values back. I will look into what this really does.