https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80952

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|missed-optimization         |
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-06-02
     Ever confirmed|0                           |1

--- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
Confirmed. It's caused by the -fprofile-update option that was added in GCC 7.1:
https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html

By default (when one uses -pthread), it is -fprofile-update=atomic.
That mode guarantees that the collected profile is not corrupted, because
non-atomic counter updates from multiple threads are racy.

Using -fprofile-generate and -fprofile-update=single causes:

pr80952.cpp: In function ‘main._omp_fn.0’:
pr80952.cpp:40:1: error: corrupted profile info: profile data is not flow-consistent
 }
 ^
pr80952.cpp:40:1: error: corrupted profile info: number of executions for edge 3-4 thought to be 32
pr80952.cpp:40:1: error: corrupted profile info: number of executions for edge 3-13 thought to be -2
pr80952.cpp:40:1: error: corrupted profile info: number of executions for edge 11-8 thought to be -239929
pr80952.cpp:40:1: error: corrupted profile info: number of executions for edge 11-12 thought to be 247808373

Running perf confirms that the locked (atomic) counter updates are the bottleneck:

    0.00 :        4014a3:       lock addq $0x1,0x204024(%rip)        # 6054d0 <__gcov0.main._omp_fn.0+0x30>
   49.28 :        4014ac:       cmp    %ecx,%r8d
    0.00 :        4014af:       jl     4014c3 <main._omp_fn.0+0x93>
    0.00 :        4014b1:       lock addq $0x1,0x203ffe(%rip)        # 6054b8 <__gcov0.main._omp_fn.0+0x18>
         :                              {
         :                                      if (dividend % divisor == 0) {
   50.12 :        4014ba:       mov    %esi,%eax
    0.01 :        4014bc:       cltd

Well, I had planned to provide a profile update method that uses function-local
counters, which are merged into the global counters at function exit.
That would definitely help in this example.
