Gprof can account for less than 1/3 of execution time?!?!
I have recently encountered a gross inaccuracy in gprof that I can't explain. Yes, I know gprof uses a sampling technique so I should not expect a high level of precision, but the results I am getting clearly reflect a more fundamental issue. The program in question has been compiled with -pg for all source code files. The time command reports 20 seconds of user time (which is consistent with personal observation) but the gprof output accounts for only about 6 seconds of the execution time. I have eliminated all IO from the program, and the results remain consistent. Gprof is sampling the program every 10 ms, so in the observed 20 seconds of execution time, it should be taking 2000 samples, which should be enough to avoid any grows inconsistencies. Any ideas would be appreciated. Jon
Re: Gprof can account for less than 1/3 of execution time?!?!
Maucci, Cyrille wrote: Hello Jon, I'm used to gprof on HPUX and can tell you that on HPUX when we gprof an executable, its only works on all the objects present in the executable but not the shared libs. Actually on HPUX, either you choose to gprof the exe or the libs but not both. When you want both you go to more advanced tools like Caliper. So I don't know which platform you were running on there and if gcc's gprof works as HPUX's gprof, but if there's the same limitation as with HPUX's gprof, maybe this is what you've hit? > This was run on an AMD Opteron running Linux. ++Cyrille PS: how can you claim you have eliminated all I/Os? Not sure what you are asking. I have deleted all input and output statements from the program. I replace the original input by a subroutine that generates the test data internally. In this case, I am running the program just to get the gprof data. If gprof were missing 10% of the execution time, I would shrug and say no big deal. But it's missing 70% of the execution time, which seems to imply something fundamentally wrong with either gprof or the way I am using it. Jon -Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Jon Turner Sent: Sunday, February 21, 2010 7:27 PM To: gcc@gcc.gnu.org Subject: Gprof can account for less than 1/3 of execution time?!?! I have recently encountered a gross inaccuracy in gprof that I can't explain. Yes, I know gprof uses a sampling technique so I should not expect a high level of precision, but the results I am getting clearly reflect a more fundamental issue. The program in question has been compiled with -pg for all source code files. The time command reports 20 seconds of user time (which is consistent with personal observation) but the gprof output accounts for only about 6 seconds of the execution time. I have eliminated all IO from the program, and the results remain consistent. Gprof is sampling the program every 10 ms, so in the observed 20 seconds of execution time, it should be taking 2000 samples, which should be enough to avoid any grows inconsistencies. Any ideas would be appreciated. Jon
Re: Gprof can account for less than 1/3 of execution time?!?!
Yes, it is statically linked. In any case, there is very little usage of external libraries here. Jon Alan Modra wrote: On Sun, Feb 21, 2010 at 12:27:04PM -0600, Jon Turner wrote: The program in question has been compiled with -pg for all source code files. Linked statically too? If not, the missing time is probably spent in libc.so or other shared libraries.
Re: Gprof can account for less than 1/3 of execution time?!?!
You're not listening. I am using -pg and the program is statically linked. The concern I am raising is not about the function counting, but the reported running times, which are determined by sampling (read the gprof manual, if this is news to you). In this case, the mcount overhead cannot account for the discrepancy, since that would cause gprof to OVER-estimate the run time, while in this case it is UNDER-estimating. It's missing about 70% of the actual running time in the program. It conceivably I am doing something wrong. I hope so, since once I know what it is, I can fix it. But at the moment, it's hard to avoid the suspicion that something about the gprof implementation is deeply flawed. Jon Joern Rennecke wrote: Quoting Michael Matz : Hi, On Sun, 21 Feb 2010, Jon Turner wrote: I have recently encountered a gross inaccuracy in gprof that I can't explain. Yes, I know gprof uses a sampling technique This is incorrect. Code compiled with -pg will call mcount on each function entry. If there are many calls (compared to other computations) the mcount overhead might become fairly large. The mcount overhead actually depends on the machine description, although most ports have standardized on a very runtime profligate scheme.
Re: Gprof can account for less than 1/3 of execution time?!?!
Ok, this is not as simple as I would like to make it, but hopefully it's simple enough. I've placed a tar file at www.arl.wustl.edu/~jst/gprof-tarfile In it, you will find a directory with all the source code needed to observe the problem for yourself. The top level directory contains a linux executable called maxFlo, which you should be able to run on a linux box as is. But if you want/need to compile things yourself, type "make clean" and "make all" in the top level directory and you should get a fresh copy of maxFlo. The basic test, with my results appears below. % time maxFlo 20.356u 0.001s 0:20.38 99.8%0+0k 0+0io 0pf+0w Note, that's 20 seconds of user time. Now for gprof, % maxFlo % gprof maxFlo | more Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds secondscalls s/call s/call name 18.87 1.17 1.1727649 0.00 0.00 shortPath::findPath() 18.39 2.31 1.14 58654752 0.00 0.00 flograph::res(int, int) const 13.87 3.17 0.86 61391904 0.00 0.00 graph::mate(int, int) const 12.18 3.93 0.76 55889952 0.00 0.00 graph::next(int, int) const 6.29 4.32 0.39 61391904 0.00 0.00 graph::left(int) const 5.65 4.67 0.35 65055289 0.00 0.00 graph::term(int) const 5.16 4.99 0.32 61391904 0.00 0.00 digraph::tail(int) const 4.03 5.24 0.25 70488952 0.00 0.00 graph::n() const 3.63 5.46 0.23 9232153 0.00 0.00 list::operator&=(int) 2.10 5.59 0.13 9165337 0.00 0.00 list::operator<<=(int) 1.94 5.71 0.12 58654752 0.00 0.00 graph::m() const 1.45 5.80 0.0927649 0.00 0.00 list::makeSpace() 1.29 5.88 0.08 9165337 0.00 0.00 list::operator[](int) const 1.21 5.96 0.08 9165337 0.00 0.00 graph::first(int) const 0.81 6.01 0.05 2737152 0.00 0.00 flograph::addFlow(int, int, int) 0.81 6.06 0.0527648 0.00 0.00 augPath::augment() 0.65 6.10 0.04 2737152 0.00 0.00 min(int, int) 0.48 6.13 0.03 15466777 0.00 0.00 flograph::src() const 0.48 6.16 0.03 fatal(char*) 0.24 6.17 0.02 operator<<(std::ostream&, list const&) 0.16 6.18 0.01 9287496 0.00 0.00 flograph::snk() const 0.16 6.19 0.01 9165338 0.00 0.00 list::empty() const 0.16 6.20 0.01 list::clear() 0.00 6.20 0.0027649 0.00 0.00 list::freeSpace() 0.00 6.20 0.0027649 0.00 0.00 list::list(int) 0.00 6.20 0.0027649 0.00 0.00 list::~list() 0.00 6.20 0.00 1140 0.00 0.00 digraph::join(int, int) 0.00 6.20 0.00 1140 0.00 0.00 flograph::changeCap(int, int) 0.00 6.20 0.001 0.00 0.00 global constructors keyed to _Z7badCasei 0.00 6.20 0.001 0.00 0.00 global constructors keyed to _ZN4list9makeSpaceEv 0.00 6.20 0.001 0.00 0.00 global constructors keyed to _ZN4misc6cflushERSic 0.00 6.20 0.001 0.00 0.00 global constructors keyed to _ZN5graph9makeSpaceEv 0.00 6.20 0.001 0.00 0.00 global constructors keyed to _ZN7augPathC2ER8flographRi 0.00 6.20 0.001 0.00 0.00 global constructors keyed to _ZN7digraph9makeSpaceEv 0.00 6.20 0.001 0.00 0.00 global constructors keyed to _ZN8flograph9makeSpaceEv 0.00 6.20 0.001 0.00 0.00 global constructors keyed to _ZN9shortPathC2ER8flograph Ri 0.00 6.20 0.001 0.00 0.00 __static_initialization_and_destruction_0(int, int) 0.00 6.20 0.001 0.00 0.00 __static_initialization_and_destruction_0(int, int) 0.00 6.20 0.001 0.00 0.00 __static_initialization_and_destruction_0(int, int) 0.00 6.20 0.001 0.00 0.00 __static_initialization_and_destruction_0(int, int) 0.00 6.20 0.001 0.00 0.00 __static_initialization_and_destruction_0(int, int) 0.00 6.20 0.001 0.00 0.00 __static_initialization_and_destruction_0(int, int) 0.00 6.20 0.001 0.00 0.00 __static_initialization_and_destruction_0(int, int) 0.00 6.20 0.001 0.00 0.00 __static_initialization_and_destruction_0(int, int) 0.00 6.20 0.001 0.00 0.00 badCase(int) 0.00 6.20 0.001 0.00 0.00 graph::makeSpace() 0.00 6.20 0.001 0.00 0.00 graph::graph(int, int) 0.00 6.20 0.001
Re: Gprof can account for less than 1/3 of execution time?!?!
Doh! Thanks, Nathan. I think you put your finger on it. I was well aware of the overhead that gprof can introduce, but did not recognize that this overhead was not being counted by gprof. Jon Nathan Froyd wrote: On Mon, Feb 22, 2010 at 03:23:52PM -0600, Jon Turner wrote: In it, you will find a directory with all the source code needed to observe the problem for yourself. The top level directory contains a linux executable called maxFlo, which you should be able to run on a linux box as is. But if you want/need to compile things yourself, type "make clean" and "make all" in the top level directory and you should get a fresh copy of maxFlo. So, compiling maxFlo with no -pg option: @nightcrawler:~/src/gprof-trouble-case$ time ./maxFlo real0m3.465s user0m3.460s sys 0m0.000s Compiling maxFlo with -pg option: @nightcrawler:~/src/gprof-trouble-case$ time ./maxFlo real0m9.780s user0m9.760s sys 0m0.010s Notice that ~60% of the running time with gprof enabled is simply overhead from call counting and the like. That time isn't recorded by gprof. That alone accounts for your report about gprof ignoring 2/3 of the execution time. Checking to see whether maxFlo is a dynamic executable (since you claimed earlier that you were statically linking your program): @nightcrawler:~/src/gprof-trouble-case$ ldd ./maxFlo linux-vdso.so.1 => (0x7fff2977f000) libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x7fb422c21000) libm.so.6 => /lib/libm.so.6 (0x7fb42299d000) libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x7fb422786000) libc.so.6 => /lib/libc.so.6 (0x7fb422417000) /lib64/ld-linux-x86-64.so.2 (0x7fb422f31000) So calls to shared library functions (such as functions in libm) will not be caught by gprof. Those calls count account for a significant amount of running time of your program and gprof can't tell you about them. Inspecting the gmon.out file: @nightcrawler:~/src/gprof-trouble-case$ gprof maxFlo gmon.out Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds secondscalls s/call s/call name 16.09 0.37 0.3727649 0.00 0.00 shortPath::findPath() 12.61 0.66 0.29 55889952 0.00 0.00 graph::next(int,int) const 11.96 0.94 0.28 61391904 0.00 0.00 graph::mate(int,int) const 10.87 1.19 0.25 58654752 0.00 0.00 flograph::res(int,int) const 10.44 1.43 0.24 _fini 6.96 1.59 0.16 65055289 0.00 0.00 graph::term(int) const 6.96 1.75 0.16 61391904 0.00 0.00 digraph::tail(int) const [...lots of stuff elided...] 0.00 2.30 0.001 0.00 0.00 graph gprof is telling you about 2.3 seconds of your execution time. With the factors above accounted for, that doesn't seem unreasonable. -Nathan