Gprof can account for less than 1/3 of execution time?!?!

2010-02-21 Thread Jon Turner

I have recently encountered a gross inaccuracy in gprof that
I can't explain. Yes, I know gprof uses a sampling technique
so I should not expect a high level of precision, but the results
I am getting clearly reflect a more fundamental issue.

The program in question has been compiled with -pg for all
source code files. The time command reports 20 seconds of user
time (which is consistent with personal observation) but
the gprof output accounts for only about 6 seconds of the
execution time. I have eliminated all IO from the program,
and the results remain consistent.

Gprof is sampling the program every 10 ms, so in the observed
20 seconds of execution time, it should be taking 2000 samples,
which should be enough to avoid any grows inconsistencies.

Any ideas would be appreciated.

Jon


Re: Gprof can account for less than 1/3 of execution time?!?!

2010-02-21 Thread Jon Turner

Maucci, Cyrille wrote:

Hello Jon,

I'm used to gprof on HPUX and can tell you that on HPUX when we gprof
an executable, its only works on all the objects present in the
executable but not the shared libs. Actually on HPUX, either you
choose to gprof the exe or the libs but not both. When you want both
you go to more advanced tools like Caliper.

So I don't know which platform you were running on there and if gcc's
gprof works as HPUX's gprof, but if there's the same limitation as
with HPUX's gprof, maybe this is what you've hit?

>
This was run on an AMD Opteron running Linux.


++Cyrille PS: how can you claim you have eliminated all I/Os?


Not sure what you are asking. I have deleted all input
and output statements from the program. I replace the
original input by a subroutine that generates the test
data internally. In this case, I am running the program
just to get the gprof data.

If gprof were missing 10% of the execution time, I would
shrug and say no big deal. But it's missing 70% of the
execution time, which seems to imply something
fundamentally wrong with either gprof or the way
I am using it.

Jon


-Original Message- From: gcc-ow...@gcc.gnu.org
[mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Jon Turner Sent: Sunday,
February 21, 2010 7:27 PM To: gcc@gcc.gnu.org Subject: Gprof can
account for less than 1/3 of execution time?!?!

I have recently encountered a gross inaccuracy in gprof that I can't
explain. Yes, I know gprof uses a sampling technique so I should not
expect a high level of precision, but the results I am getting
clearly reflect a more fundamental issue.

The program in question has been compiled with -pg for all source
code files. The time command reports 20 seconds of user time (which
is consistent with personal observation) but the gprof output
accounts for only about 6 seconds of the execution time. I have
eliminated all IO from the program, and the results remain
consistent.

Gprof is sampling the program every 10 ms, so in the observed 20
seconds of execution time, it should be taking 2000 samples, which
should be enough to avoid any grows inconsistencies.

Any ideas would be appreciated.

Jon


Re: Gprof can account for less than 1/3 of execution time?!?!

2010-02-21 Thread Jon Turner

Yes, it is statically linked. In any case, there is very
little usage of external libraries here.

Jon

Alan Modra wrote:

On Sun, Feb 21, 2010 at 12:27:04PM -0600, Jon Turner wrote:

The program in question has been compiled with -pg for all
source code files.


Linked statically too?  If not, the missing time is probably spent in
libc.so or other shared libraries.



Re: Gprof can account for less than 1/3 of execution time?!?!

2010-02-22 Thread Jon Turner

You're not listening. I am using -pg and the program is statically
linked. The concern I am raising is not about the function counting,
but the reported running times, which are determined by sampling
(read the gprof manual, if this is news to you).

In this case, the mcount overhead cannot account for the discrepancy,
since that would cause gprof to OVER-estimate the run time, while
in this case it is UNDER-estimating. It's missing about 70% of the
actual running time in the program.

It conceivably I am doing something wrong. I hope so, since
once I know what it is, I can fix it. But at the moment,
it's hard to avoid the suspicion that something about the
gprof implementation is deeply flawed.

Jon

Joern Rennecke wrote:

Quoting Michael Matz :


Hi,

On Sun, 21 Feb 2010, Jon Turner wrote:


I have recently encountered a gross inaccuracy in gprof that
I can't explain. Yes, I know gprof uses a sampling technique


This is incorrect.  Code compiled with -pg will call mcount on each
function entry.  If there are many calls (compared to other computations)
the mcount overhead might become fairly large.


The mcount overhead actually depends on the machine description, although
most ports have standardized on a very runtime profligate scheme.


Re: Gprof can account for less than 1/3 of execution time?!?!

2010-02-22 Thread Jon Turner

Ok, this is not as simple as I would like to make it,
but hopefully it's simple enough. I've placed a tar file at

www.arl.wustl.edu/~jst/gprof-tarfile

In it, you will find a directory with all the source code
needed to observe the problem for yourself.
The top level directory contains a linux executable called
maxFlo, which you should be able to run on a linux box
as is. But if you want/need to compile things yourself,
type "make clean" and "make all" in the top level
directory and you should get a fresh copy of maxFlo.

The basic test, with my results appears below.

% time maxFlo
20.356u 0.001s 0:20.38 99.8%0+0k 0+0io 0pf+0w

Note, that's 20 seconds of user time. Now for gprof,

% maxFlo
% gprof maxFlo | more
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self  self total
 time   seconds   secondscalls   s/call   s/call  name
 18.87  1.17 1.1727649 0.00 0.00  shortPath::findPath()
 18.39  2.31 1.14 58654752 0.00 0.00  flograph::res(int, int) 
const
 13.87  3.17 0.86 61391904 0.00 0.00  graph::mate(int, int) 
const
 12.18  3.93 0.76 55889952 0.00 0.00  graph::next(int, int) 
const
  6.29  4.32 0.39 61391904 0.00 0.00  graph::left(int) const
  5.65  4.67 0.35 65055289 0.00 0.00  graph::term(int) const
  5.16  4.99 0.32 61391904 0.00 0.00  digraph::tail(int) const
  4.03  5.24 0.25 70488952 0.00 0.00  graph::n() const
  3.63  5.46 0.23  9232153 0.00 0.00  list::operator&=(int)
  2.10  5.59 0.13  9165337 0.00 0.00  list::operator<<=(int)
  1.94  5.71 0.12 58654752 0.00 0.00  graph::m() const
  1.45  5.80 0.0927649 0.00 0.00  list::makeSpace()
  1.29  5.88 0.08  9165337 0.00 0.00  list::operator[](int) 
const
  1.21  5.96 0.08  9165337 0.00 0.00  graph::first(int) const
  0.81  6.01 0.05  2737152 0.00 0.00  flograph::addFlow(int, 
int, int)
  0.81  6.06 0.0527648 0.00 0.00  augPath::augment()
  0.65  6.10 0.04  2737152 0.00 0.00  min(int, int)
  0.48  6.13 0.03 15466777 0.00 0.00  flograph::src() const
  0.48  6.16 0.03 fatal(char*)
  0.24  6.17 0.02 operator<<(std::ostream&, 
list const&)
  0.16  6.18 0.01  9287496 0.00 0.00  flograph::snk() const
  0.16  6.19 0.01  9165338 0.00 0.00  list::empty() const
  0.16  6.20 0.01 list::clear()
  0.00  6.20 0.0027649 0.00 0.00  list::freeSpace()
  0.00  6.20 0.0027649 0.00 0.00  list::list(int)
  0.00  6.20 0.0027649 0.00 0.00  list::~list()
  0.00  6.20 0.00 1140 0.00 0.00  digraph::join(int, int)
  0.00  6.20 0.00 1140 0.00 0.00  flograph::changeCap(int, 
int)
  0.00  6.20 0.001 0.00 0.00  global constructors keyed 
to _Z7badCasei
  0.00  6.20 0.001 0.00 0.00  global constructors keyed 
to _ZN4list9makeSpaceEv
  0.00  6.20 0.001 0.00 0.00  global constructors keyed 
to _ZN4misc6cflushERSic
  0.00  6.20 0.001 0.00 0.00  global constructors keyed 
to _ZN5graph9makeSpaceEv
  0.00  6.20 0.001 0.00 0.00  global constructors keyed 
to _ZN7augPathC2ER8flographRi
  0.00  6.20 0.001 0.00 0.00  global constructors keyed 
to _ZN7digraph9makeSpaceEv
  0.00  6.20 0.001 0.00 0.00  global constructors keyed 
to _ZN8flograph9makeSpaceEv
  0.00  6.20 0.001 0.00 0.00  global constructors keyed 
to _ZN9shortPathC2ER8flograph
Ri
  0.00  6.20 0.001 0.00 0.00  
__static_initialization_and_destruction_0(int, int)
  0.00  6.20 0.001 0.00 0.00  
__static_initialization_and_destruction_0(int, int)
  0.00  6.20 0.001 0.00 0.00  
__static_initialization_and_destruction_0(int, int)
  0.00  6.20 0.001 0.00 0.00  
__static_initialization_and_destruction_0(int, int)
  0.00  6.20 0.001 0.00 0.00  
__static_initialization_and_destruction_0(int, int)
  0.00  6.20 0.001 0.00 0.00  
__static_initialization_and_destruction_0(int, int)
  0.00  6.20 0.001 0.00 0.00  
__static_initialization_and_destruction_0(int, int)
  0.00  6.20 0.001 0.00 0.00  
__static_initialization_and_destruction_0(int, int)
  0.00  6.20 0.001 0.00 0.00  badCase(int)
  0.00  6.20 0.001 0.00 0.00  graph::makeSpace()
  0.00  6.20 0.001 0.00 0.00  graph::graph(int, int)
  0.00  6.20 0.001   

Re: Gprof can account for less than 1/3 of execution time?!?!

2010-02-22 Thread Jon Turner

Doh! Thanks, Nathan. I think you put your finger on it.
I was well aware of the overhead that gprof can introduce,
but did not recognize that this overhead was not being
counted by gprof.

Jon

Nathan Froyd wrote:

On Mon, Feb 22, 2010 at 03:23:52PM -0600, Jon Turner wrote:

In it, you will find a directory with all the source code
needed to observe the problem for yourself.
The top level directory contains a linux executable called
maxFlo, which you should be able to run on a linux box
as is. But if you want/need to compile things yourself,
type "make clean" and "make all" in the top level
directory and you should get a fresh copy of maxFlo.


So, compiling maxFlo with no -pg option:

@nightcrawler:~/src/gprof-trouble-case$ time ./maxFlo 


real0m3.465s
user0m3.460s
sys 0m0.000s

Compiling maxFlo with -pg option:

@nightcrawler:~/src/gprof-trouble-case$ time ./maxFlo 


real0m9.780s
user0m9.760s
sys 0m0.010s

Notice that ~60% of the running time with gprof enabled is simply
overhead from call counting and the like.  That time isn't recorded by
gprof.  That alone accounts for your report about gprof ignoring 2/3 of
the execution time.

Checking to see whether maxFlo is a dynamic executable (since you
claimed earlier that you were statically linking your program):

@nightcrawler:~/src/gprof-trouble-case$ ldd ./maxFlo 
  linux-vdso.so.1 =>  (0x7fff2977f000)

  libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x7fb422c21000)
  libm.so.6 => /lib/libm.so.6 (0x7fb42299d000)
  libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x7fb422786000)
  libc.so.6 => /lib/libc.so.6 (0x7fb422417000)
  /lib64/ld-linux-x86-64.so.2 (0x7fb422f31000)

So calls to shared library functions (such as functions in libm) will
not be caught by gprof.  Those calls count account for a significant
amount of running time of your program and gprof can't tell you about
them.

Inspecting the gmon.out file:

@nightcrawler:~/src/gprof-trouble-case$ gprof maxFlo gmon.out
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self  self total   
 time   seconds   secondscalls   s/call   s/call  name
 16.09  0.37 0.3727649 0.00 0.00  shortPath::findPath()

 12.61  0.66 0.29 55889952 0.00 0.00  graph::next(int,int) const
 11.96  0.94 0.28 61391904 0.00 0.00  graph::mate(int,int) const
 10.87  1.19 0.25 58654752 0.00 0.00  flograph::res(int,int) 
const
 10.44  1.43 0.24 _fini
  6.96  1.59 0.16 65055289 0.00 0.00  graph::term(int) const
  6.96  1.75 0.16 61391904 0.00 0.00  digraph::tail(int) const
[...lots of stuff elided...]
  0.00  2.30 0.001 0.00 0.00  graph

gprof is telling you about 2.3 seconds of your execution time.  With the
factors above accounted for, that doesn't seem unreasonable.

-Nathan