Bug/limitation in 'time'
Configuration Information [Automatically generated, do not change]:
Machine: i686
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='i686' -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='$
uname output: Linux Ubuntu1m 3.2.0-35-generic-pae #55-Ubuntu SMP Wed Dec 5 18:04:39 UTC 2012 i686 i686 i$
Machine Type: i686-pc-linux-gnu

Bash Version: 4.2
Patch Level: 24
Release Status: release

Description:
        The basic problem is that if you time an operation that starts lots of subprocesses - an operation whose execution is dominated by starting short-lived subprocesses - then 'time' does not accurately record CPU time. In fact it is off by a lot: it reports the system as being about 75% idle when the task is entirely CPU bound. I've verified that the task is CPU bound by using iostat, mpstat, perf, and a few other programs.

        My theory is that the process creation and teardown time is 'lost' and attributed to neither the parent nor the child process. Therefore this is probably a limitation of the numbers made available by the kernel, and therefore it is probably best addressed (in the short term anyway) by acknowledging this limitation in the man page.

        I think it's important because when I hit this problem (using $(expr) for looping in shell scripts is slow) I initially assumed that my task was not CPU bound, because that is what 'time' told me. This then led me down the wrong path in my investigation.

        I ran bashbug from a VM running Linux, but I have reproduced this bug on several 'normal' installs of Ubuntu, both 32-bit and 64-bit, on six-core/twelve-thread machines.

Repeat-By:
        The great thing about this bug/weakness is that it is extremely easy to reproduce. The shell script below does the trick.
    #!/bin/bash
    # Warning: this code is excessively slow
    function ExprCount() {
        i=$1
        while [ $i -gt 0 ]; do
            i=$(expr $i - 1)
            #sleep 0.001
        done
        echo Just did $1 iterations using expr math
    }
    time ExprCount 1000

Here are sample results from running it - note that I'm using both the bash 'time' builtin and /usr/bin/time, and their results are quite similar, apart from rounding differences.

    $ /usr/bin/time counttest.sh
    Just did 1000 iterations using expr math

    real    0m0.737s
    user    0m0.016s
    sys     0m0.124s
    0.01user 0.12system 0:00.74elapsed 19%CPU (0avgtext+0avgdata 5376maxresident)k
    0inputs+0outputs (0major+322859minor)pagefaults 0swaps

In particular, notice that the sum of user+sys time is significantly lower than 'real' time, despite the fact that this test is 100% CPU bound.

Fix:
        The documentation for the 'time' builtin (and the stand-alone 'time' command, for that matter) should be updated to acknowledge that the CPU time totals will fall short, sometimes significantly.
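A quick way to see the shortfall without external tools is the shell's `times` builtin, which prints the shell's and its children's accumulated user/sys times from the same kernel child accounting that `time` displays. This is an editorial sketch, not from the report; the function name and count are illustrative:

```shell
# Sketch: fork-heavy countdown; each iteration forks an expr subprocess.
count() {
    i=$1
    while [ "$i" -gt 0 ]; do
        i=$(expr "$i" - 1)
    done
    echo "counted down from $1"
}
count 100
# Print accumulated user/sys for the shell and for its children;
# comparing the children's totals to wall-clock time shows the gap.
times
```

Comparing the children's totals printed by `times` against the wall-clock time of the run reproduces the discrepancy Bruce describes.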
RE: Bug/limitation in 'time'
Thanks -- good to know that there is a fast and POSIX-compliant method of doing this. I should have included my optimized counting loop -- it's what we switched to when we realized that $(expr) was a problem. Here it is now:

    # This code performs quite well
    function BashCount() {
        i=$1
        while [ $i -gt 0 ]; do
            (( i-- ))
        done
        echo Just did $1 iterations using bash math
    }
    time BashCount 15

It's a *lot* faster, of course. BTW, I've poked around in the 'time' source code enough to know that it is just displaying the results of wait3(), so the misleading CPU consumption information is ultimately a wait3()/kernel issue. However, showing this in the documentation would be great.

-Original Message-
From: Chris F.A. Johnson [mailto:ch...@cfajohnson.com]
Sent: Saturday, March 16, 2013 7:16 PM
To: bug-bash@gnu.org
Cc: Chris Down; Bruce Dawson
Subject: Re: Bug/limitation in 'time'

On Sun, 17 Mar 2013, Chris Down wrote:
> Hi Bruce,
>
> On 2013-03-16 17:41, Bruce Dawson wrote:
>> I think it's important because when I hit this problem (using $(expr)
>> for looping in shell scripts is slow) I initially assumed that my
>> task was not CPU bound, because that is what 'time' told me. This
>> then led me down the wrong path in my investigation.
>
> No comment on the issue itself, but this is just not a good way of
> writing bash arithmetic loops:
>
>> #!/bin/bash
>> # Warning: this code is excessively slow
>> function ExprCount() {
>>     i=$1
>>     while [ $i -gt 0 ]; do
>>         i=$(expr $i - 1)
>>         #sleep 0.001
>>     done
>>     echo Just did $1 iterations using expr math
>> }
>> time ExprCount 1000
>
> You're forking 1000 subshells for `expr' when you can quite easily do
> it on your current shell. A better way would be to write it like this:
>
>     ExprCount() {
>         for (( i = $1 ; i > 0 ; i-- )); do
>             :
>         done
>         echo "$1 iterations"
>     }

Or, in a POSIX-compliant manner:

    ExprCount() {
        i=$1
        while [ $(( i -= 1 )) -ge 0 ]; do
            :
        done
        echo Just did $1 iterations using expr math
    }

-- 
Chris F.A. Johnson, <http://cfajohnson.com/>
Author:
  Pro Bash Programming: Scripting the GNU/Linux Shell (2009, Apress)
  Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
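Chris Johnson's POSIX-compliant loop, made self-contained so it can be run directly (an editorial sketch; the function name and echoed message are adjusted since no expr is involved):

```shell
# POSIX-compliant countdown: $(( )) arithmetic assignment runs in the
# current shell process, so no subprocess is forked per iteration.
PosixCount() {
    i=$1
    while [ $(( i -= 1 )) -ge 0 ]; do
        :
    done
    echo "Just did $1 iterations using shell arithmetic"
}
PosixCount 1000
```

Because the arithmetic runs in the current shell, this version avoids the per-iteration fork entirely, which is why it is so much faster than the $(expr) loop.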
RE: Bug/limitation in 'time'
Yep, we've changed our loops to use roughly that syntax. Unfortunately a lot of online resources recommend the $(expr) technique. My understanding is that using $(expr) is more portable, because i-- is bash-specific, but I don't really know.

-Original Message-
From: Chris Down [mailto:ch...@chrisdown.name]
Sent: Saturday, March 16, 2013 6:58 PM
To: Bruce Dawson
Cc: bug-bash@gnu.org; b...@packages.debian.org
Subject: Re: Bug/limitation in 'time'

Hi Bruce,

On 2013-03-16 17:41, Bruce Dawson wrote:
> I think it's important because when I hit this problem (using $(expr)
> for looping in shell scripts is slow) I initially assumed that my task
> was not CPU bound, because that is what 'time' told me. This then led
> me down the wrong path in my investigation.

No comment on the issue itself, but this is just not a good way of writing bash arithmetic loops:

> #!/bin/bash
> # Warning: this code is excessively slow
> function ExprCount() {
>     i=$1
>     while [ $i -gt 0 ]; do
>         i=$(expr $i - 1)
>         #sleep 0.001
>     done
>     echo Just did $1 iterations using expr math
> }
> time ExprCount 1000

You're forking 1000 subshells for `expr' when you can quite easily do it on your current shell. A better way would be to write it like this:

    ExprCount() {
        for (( i = $1 ; i > 0 ; i-- )); do
            :
        done
        echo "$1 iterations"
    }

Best,
Chris
RE: Bug/limitation in 'time'
The man page is clear that it is displaying the results of wait3(). However, it doesn't mention that this means that sub-process startup time is not accounted for. That's what I feel should be clarified. Otherwise a CPU-bound task may appear not to be CPU bound.

My expectation is that the sum of 'user' and 'sys' time should equal the elapsed time, because the overall task is 100% CPU bound (I've confirmed this). It is unfortunate that the sub-process startup time is not accounted for in 'user' or 'sys' time, and I think it would be appropriate to document this.

Anyway, if the docs can be updated I think that would be great. If not, no worries.

-Original Message-
From: Pierre Gaston [mailto:pierre.gas...@gmail.com]
Sent: Saturday, March 16, 2013 11:31 PM
To: Bruce Dawson
Cc: bug-bash@gnu.org
Subject: Re: Bug/limitation in 'time'

On Sun, Mar 17, 2013 at 4:33 AM, Bruce Dawson wrote:
> Thanks -- good to know that there is a fast and POSIX compliant method
> of doing this. I should have included my optimized counting loop --
> it's what we switched to when we realized that $(expr) was a problem.
> Here it is now:
>
>     # This code performs quite well
>     function BashCount() {
>         i=$1
>         while [ $i -gt 0 ]; do
>             (( i-- ))
>         done
>         echo Just did $1 iterations using bash math
>     }
>     time BashCount 15
>
> It's a *lot* faster, of course. BTW, I've poked around in the 'time'
> source code enough to know that it is just displaying the results of
> wait3(), so the misleading CPU consumption information is ultimately a
> wait3()/kernel issue. However showing this in the documentation would
> be great.

At least the man page of time on my Ubuntu system is pretty clear about what it does. The result does not strike me as impossible, though; I can imagine a lot of real time spent waiting for the scheduler to run expr and then to run bash again.
I tried a little experiment that I think shows the importance of the scheduler on the real time result. I ran, at the same time, this little loop with different "niceness":

    i=0;time while ((i++<1));do /bin/echo -n;done
    sudo nice -n 19 bash -c 'i=0;time while ((i++<1));do /bin/echo -n;done' 2>&1 | sed s/^/19:\ / &
    sudo nice -n -20 bash -c 'i=0;time while ((i++<1));do /bin/echo -n;done' 2>&1 | sed s/^/-20:\ /

I get:

    -20: real   0m9.331s
    -20: user   0m0.468s
    -20: sys    0m1.504s
    19:  real   0m14.004s
    19:  user   0m0.532s
    19:  sys    0m1.660s

so the nicer loop takes twice as much real time, indicating that much of the real time is spent waiting for the process to run.
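Pierre's experiment can be rerun without sudo in a reduced form (an editorial sketch: the iteration count is illustrative, /bin/echo is assumed to exist, and only raising niceness is shown since lowering it below 0 needs root). Under bash, prefix each command with `time` to compare wall times:

```shell
# Sketch: the same fork-heavy loop at normal and raised niceness.
# Each iteration forks /bin/echo, so process creation dominates;
# a lower priority stretches wall time while user+sys barely move.
loop='i=0; while [ "$i" -lt 200 ]; do /bin/echo -n; i=$((i+1)); done'
sh -c "$loop"                # normal priority
nice -n 19 sh -c "$loop"     # lowest priority; no root needed
echo "both runs finished"
```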
RE: Bug/limitation in 'time'
Chris Down pointed that out. My loop now looks like this -- portable (I believe) and fast:

    BashCount() {
        for (( i = $1 ; i > 0 ; i-- )); do
            :
        done
        echo Just did $1 iterations using bash math
    }

-Original Message-
From: Linda Walsh [mailto:b...@tlinx.org]
Sent: Sunday, March 17, 2013 8:36 PM
To: Bruce Dawson
Cc: 'Chris Down'; bug-bash@gnu.org; b...@packages.debian.org
Subject: Re: Bug/limitation in 'time'

Bruce Dawson wrote:
> Yep, we've changed our loops to use roughly that syntax. Unfortunately
> a lot of online resources recommend the $(expr) technique. My
> understanding is that using $(expr) is more portable, because i-- is
> bash specific, but I don't really know.

but $((i-=1)) isn't Bash specific and does the same thing...
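For reference, the two counting styles side by side as a runnable sketch (function names are editorial; the builtin variant is spelled with POSIX $(( )) rather than bash's for (( )), per Linda's point, so it runs in any sh). Under bash, prefix each call with `time` to see the difference:

```shell
# Sketch: expr-based countdown (one fork per iteration) vs builtin
# shell arithmetic (no forks). Note for (( ... )) is a bash/ksh
# extension; the $(( )) form below is portable.
ExprStyle() {
    i=$1
    while [ "$i" -gt 0 ]; do i=$(expr "$i" - 1); done
    echo "$1 iterations via expr"
}
BuiltinStyle() {
    i=$1
    while [ "$i" -gt 0 ]; do i=$(( i - 1 )); done
    echo "$1 iterations via builtin arithmetic"
}
ExprStyle 100
BuiltinStyle 100
```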
RE: Bug/limitation in 'time' (kernel setings?)...
I'll give those a try.

BTW, I just posted the blog post to share what I'd found. You can see it here:

http://randomascii.wordpress.com/2013/03/18/counting-to-ten-on-linux/

I hope it's accurate, and I do think it would be worth mentioning the issue in the documentation for 'time' and the bash 'time' internal command.

-Original Message-
From: Linda Walsh [mailto:b...@tlinx.org]
Sent: Monday, March 18, 2013 7:15 PM
To: Bruce Dawson
Cc: bug-bash@gnu.org; b...@packages.debian.org
Subject: Re: Bug/limitation in 'time' (kernel setings?)...

Bruce Dawson wrote:
> Configuration Information [Automatically generated, do not change]:
> Machine: i686
> OS: linux-gnu
> Compiler: gcc
> Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='i686'
> -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='$
> uname output: Linux Ubuntu1m 3.2.0-35-generic-pae #55-Ubuntu SMP
> Wed Dec 5 18:04:39 UTC 2012 i686 i686 i$
> Machine Type: i686-pc-linux-gnu
>
> Bash Version: 4.2
> Patch Level: 24
> Release Status: release
>
> Description:
>         The basic problem is that if you time an operation that starts
> lots of sub processes - an operation whose execution is dominated by
> starting short-lived subprocesses - then time does not accurately
> record CPU time. In fact it is off by a lot. It reports the system as
> being about 75% idle when the task is entirely CPU bound. I've
> verified that the task is CPU bound by using iostat, mpstat, perf,
> and a few other programs.

BTW, a few things you didn't mention: your kernel version, and what scheduler your kernel is using... Also the settings of TICK_CPUACCOUNTING vs. IRQ_TIME_ACCOUNTING from your /proc/config.gz might be interesting. Those have to do with cpu accounting; the default is 'simple tick based cputime accounting', vs. a non-default of 'fine granularity task level IRQ accounting'. The latter might give you more accurate timing info for your tests, but I really have no clue... just something to try, I suppose.

Also the setting of your HZ/clock value: HZ_100/HZ_250/HZ_1000 (1/X for per-second tick granularity). Do you have NO_HZ set? HIGH_RES_TIMERS? Also important -- how your kernel is compiled: PREEMPT_NONE, PREEMPT_VOLUNTARY, or PREEMPT. I'm not sure how the RCU settings figure in, but there are settings there that sound like they might affect how fast you get rescheduled, like 'Accelerate last non-dyntick-idle CPU grace period' (RCU_FAST_NO_HZ), and RCU_BOOST.

None of that, of course, is BASH related, but it might very well affect your issue, if you wanted to try different kernel options. No guarantee that any would affect it either... but I thought I'd mention some settings that might affect it. Even the power level settings of your machine -- power savings, vs. on-demand, vs. max performance... It sounds like the ACCOUNTING thing might be most pertinent, but I thought I'd throw in the others just to be thorough...
RE: Bug/limitation in 'time' (kernel setings?)...
Thanks Pierre.

The profile results, especially the Zoom profiler screenshot, show that virtually all of the CPU time being consumed is from bash and its child processes. The system is otherwise idle, with no other processes running to any significant degree. My system is ~99.5% idle when I'm not running the test, and ~91.5% idle when the test is running. Even when I do run other processes simultaneously they don't slow down the bash/expr task, because with twelve hardware threads there are plenty to go around.

-Original Message-
From: Pierre Gaston [mailto:pierre.gas...@gmail.com]
Sent: Tuesday, March 19, 2013 2:00 AM
To: Bruce Dawson
Cc: bug-bash@gnu.org; b...@packages.debian.org
Subject: Re: Bug/limitation in 'time' (kernel setings?)...

On Tue, Mar 19, 2013 at 5:03 AM, Bruce Dawson wrote:
> I'll give those a try.
>
> BTW, I just posted the blog post to share what I'd found. You can see
> it here:
>
> http://randomascii.wordpress.com/2013/03/18/counting-to-ten-on-linux/
>
> I hope it's accurate, and I do think it would be worth mentioning the
> issue in the documentation for 'time' and the bash 'time' internal
> command.

For what it's worth, I still think that time is not lying (though the man page warns about possible inaccuracies). Your loop with expr might be "cpu bound" but it does not run often because other processes are given a chance to run.
RE: Bug/limitation in 'time' (kernel setings?)...
Hmmm -- again, that doesn't match what I see. bash is suspended while it waits for expr to run, and the instant that expr completes, bash runs again. This can be seen in the profiler screenshot. The CPU is busy at all times, and always busy on exactly one process.

The scheduler is invoked whenever one process readies another, so the time-slice length doesn't matter in this scenario -- bash resumes execution the instant that expr completes. If the scheduler didn't do this -- instantly wake up a process when its wait is satisfied -- then vast amounts of time would be wasted. The time-slice length is, I believe, only relevant for preempting processes. On a heavily loaded system processes are time-sliced between CPUs at a resolution dictated by the time-slice length, but since my system is 90+% idle this isn't relevant.

-Original Message-
From: Linda Walsh [mailto:b...@tlinx.org]
Sent: Tuesday, March 19, 2013 8:19 AM
To: Bruce Dawson
Cc: 'Pierre Gaston'; bug-bash@gnu.org; b...@packages.debian.org
Subject: Re: Bug/limitation in 'time' (kernel setings?)...

I think you misunderstand what Pierre is saying -- your process is scheduled out to give others a *chance* to run, and the scheduler isn't called often enough to roll you back in immediately when it finds out that no one else needs their time ticks. That's why I mentioned all of the scheduling info in the kernel. Another thing to play with is the length of time slices, but I have a feeling that as soon as you hit the eval you are yielding the cpu to the scheduler, so while you may be *waiting* 100% on the cpu, much of that is spent waiting for cpu, not using cpu.

Bruce Dawson wrote:
> Thanks Pierre.
>
> The profile results, especially the zoom profiler screen shot, show
> that virtually all of the CPU time being consumed is from bash and its
> child processes. The system is otherwise idle with no other processes
> running to any significant degree. My system is ~99.5% idle when I'm
> not running the test, and ~91.5% idle when the test is running.
>
> -Original Message-
> From: Pierre Gaston [mailto:pierre.gas...@gmail.com]
> For what it's worth, I still think that time is not lying (though the
> man page warns about possible inaccuracies). Your loop with expr might
> be "cpu bound" but it does not run often because other processes are
> given a chance to run.
RE: Bug/limitation in 'time' (kernel setings?)...
How many CPUs do you have? 20% usage doesn't mean anything without knowing that. On a five-core system 20% usage would imply one full core.

> It's your kernel settings that are causing the issue.

Hmmm? My kernel is correctly ensuring that there is no idle time. It is switching instantly between bash and expr. If your kernel is not then you should get that fixed -- going idle is not efficient. I'm not happy with my kernel for moving the processes around to so many different cores, but that is by design, and normally works well.

I'm also told that 'top' is problematic for determining CPU usage. I'm not sure what its limitations are.

-Original Message-
From: Linda Walsh [mailto:b...@tlinx.org]
Sent: Tuesday, March 19, 2013 9:10 AM
To: Bruce Dawson; bug-bash
Subject: Re: Bug/limitation in 'time' (kernel setings?)...

Bruce Dawson wrote:
> Hmmm -- again, that doesn't match what I see. bash is suspended while
> it waits for expr to run, and the instant that expr completes bash
> runs again. This can be seen in the profiler screenshot. The CPU is
> busy at all times, and always busy on exactly one process.
---
I don't see that on my system. I ran:

    exprcount() { for (( i=$1; i>0; i=$(expr $i - 1) )); do :; done; }
    time exprcount 1000
    2.05sec 0.12usr 1.33sys (71.12% cpu)
    Ishtar:law/bin> time exprcount 1
    20.43sec 1.62usr 12.98sys (71.48% cpu)

But 'top' never shows over 20% usage, vs.:

    function f { i=1000; while ((i--)); do :; done; }
    time f
    31.73sec 29.25usr 2.41sys (99.76% cpu)

and top shows 100% cpu usage for 'f'. I.e., my system seems to include scheduling time (or a large part of it) as part of the 'time' figures.

Note: you said your loop took 2 seconds on your 3.2GHz sys. My system was idling at 1.6GHz nearly the whole test. It's your kernel settings that are causing the issue.
RE: Bug/limitation in 'time' (kernel setings?)...
ngle task. sched_latency_ns on my system is 24,000,000, just as on yours.

    $ uname -a
    Linux brucedglados2 3.2.0-38-generic-pae #60-Ubuntu SMP Wed Feb 13 13:47:26 UTC 2013 i686 i686 i386 GNU/Linux

I tried grabbing /proc/config.gz but that file doesn't exist. Blog post is here:

http://randomascii.wordpress.com/2013/03/18/counting-to-ten-on-linux/

> If that latency of 24 ms is how long before the scheduler wakes up and
> switches tasks

No. Definitely not. The 24 ms scheduler latency is for rescheduling of long-running tasks. It does not apply to the case where a process is waiting on some event, such as a child process dying. I am 100% positive about this, on Linux and on Windows. Both operating systems can, if conditions demand it, schedule threads/processes hundreds of times in 24 ms. I look at process scheduling data on Windows all the time, and if Linux was that much less efficient then Steve Ballmer would be laughing maniacally and Linux wouldn't be used on servers.

> Note.. you said your loop took 2 seconds on your 3.2GHz sys.

Yes, but that was when my system was running at 1.1 GHz. When my system ramps up to its full speed the 1,000 iterations are done in less than a second. Our systems are quite consistent there.

-Original Message-
From: Linda Walsh [mailto:b...@tlinx.org]
Sent: Tuesday, March 19, 2013 3:41 PM
To: Bruce Dawson
Cc: 'bug-bash'
Subject: Re: Bug/limitation in 'time' (kernel setings?)...

Linda Walsh wrote:
>
> Bruce Dawson wrote:
>> How many CPUs do you have? 20% usage doesn't mean anything without
>> knowing that. On a five-core system 20% usage would imply one full core.
---
Another tool to check out if you have it is 'qps'... it can show the time slice/schedule interval... I can see it scheduling 24ms intervals for bash when it runs a separate proc/loop. It sounds a bit sketchy to me why it would take 76ms for it to be scheduled for another run, but... that's about what it measures out to in real time. The fact that your kernel doesn't show you that... I don't know what to say, other than it's how your kernel is configured.

Now looking in my /proc/sys/kernel I see sched_latency_ns = 24,000,000 (24 ms). I also see the scheduler doesn't wake up all the time, but about every 4 ms. (I inserted the commas):

    /proc/sys/kernel# for i in sched*; do
    >     echo $i: $(<$i)
    > done
    sched_autogroup_enabled: 1
    sched_child_runs_first: 0
    sched_domain:
    sched_latency_ns: 24,000,000
    sched_migration_cost_ns: 500,000
    sched_min_granularity_ns: 3,000,000
    sched_nr_migrate: 32
    sched_rt_period_us: 1,000,000
    sched_rt_runtime_us: 950,000
    sched_shares_window_ns: 10,000,000
    sched_time_avg_ms: 1000
    sched_tunable_scaling: 1
    sched_wakeup_granularity_ns: 4,000,000

So if sched_child_runs_first = 0, the parent keeps running at first. If that latency of 24 ms is how long before the scheduler wakes up and switches tasks, that's 24 ms wasted, then the same for the child. That's about a 33% duty cycle there -- which can likely be tweaked by tweaking the above numbers...
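Linda's tunable dump can be scripted (an editorial sketch assuming a Linux /proc; the function name is illustrative and the available files vary by kernel version):

```shell
# Sketch: print the CFS scheduler tunables, if present. On non-Linux
# systems, or kernels without these files, it prints nothing.
show_sched_tunables() {
    for f in /proc/sys/kernel/sched_*; do
        # skip non-files (e.g. the sched_domain directory) and
        # unmatched globs
        [ -f "$f" ] && printf '%s: %s\n' "${f##*/}" "$(cat "$f")"
    done
    return 0
}
show_sched_tunables
```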
RE: Bug/limitation in 'time' (kernel setings?)...
Note that the ~2ms/process create is when the power management fails to get invoked. It drops to ~0.6ms/process create when the CPU ramps up to full speed. The chart you list from twelve years ago shows 6.0 ms to spawn a new process, so my machine is about ten times faster -- that seems about right. Cloning a process is not comparable, because in my test the kernel has to load and exec() expr, not just fork() bash.

> Right there, it shows you that the scheduler is taking almost 33% of
> your time.

I don't follow your logic. How do you know that it is the scheduler? I can see that when you run your schedtool script it shows 15% to 34.4% of CPU time consumed. I say that the rest is process startup; you say scheduler. I'm not sure what evidence points to the scheduler, and I'm not sure what math you used to come up with the 33% figure. You'll have to elucidate.

> Then your kernel is misconfigured for your workload.

What? My kernel is executing the tasks fine. When my CPU runs at full speed it is slightly faster than yours, and when it runs at 1/3 speed it is slower than yours. In both cases 'time' reports that the CPU is not pegged when it is. Our kernels are behaving quite similarly.

> What documentation would you have updated?

The bash documentation where it mentions the 'time' command, and the 'time' documentation.

-Original Message-
From: Linda Walsh [mailto:b...@tlinx.org]
Sent: Wednesday, March 20, 2013 1:12 AM
To: Bruce Dawson
Cc: bug-bash
Subject: Re: Bug/limitation in 'time' (kernel setings?)...

Bruce Dawson wrote:
> The reply below is way too long, but oh well.
>
> The summary is:
> I am 100% certain that my ExprCount() loop is CPU bound.

Your stats claim 530 iterations/second when spawning expr. That's ~2ms/process create. In an article from 14 years ago on a 200MHz Pentium, Linux process creates were ~1ms:

http://www.linuxjournal.com/article/2941
(table @ http://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/029/2941/2941f1.jpg)

If you believe that a modern processor running at 1.6GHz takes twice as long as a 14-year-old Pentium, then there is little likelihood that you will be convinced by any logical argument. You need to rebuild your kernel with the config option turned on -- in fact, you need to build your own kernel if you want to do any sort of kernel benchmarks.

> You seem convinced that the task is not CPU bound on my machine, but
> I'm convinced that it is.
---
Then your kernel is misconfigured for your workload.

You said:
> I am 100% certain that my ExprCount() loop is CPU bound. It is
> unfortunate that time gives inaccurate information about this. It
> would be nice if the documentation acknowledged that in order to save
> future generations from confusion.

What documentation would you have updated?

Here is a perl program:

    #!/usr/bin/perl
    die "need count" unless @ARGV;
    my $count=$ARGV[0];
    my $counter=$count;
    while ($counter-- > 0) {
        my $pid;
        if (my $pid=fork()) {
            waitpid($pid,0);
        } else {
            exec "/bin/true";  # use 'exit(0)' to just time forks w/no image load
        }
    }

-
    > /usr/bin/time /tmp/sp.pl 1
    > time schedtool -a 3 -e /tmp/sp.pl 1
    6.83sec 0.05usr 0.97sys (15.05% cpu)
---
But if I don't confine it to 1 cpu:

    > time /tmp/sp.pl 1
    10.54sec 0.20usr 2.59sys (26.52% cpu)

Just forks:

    > time /tmp/sp.pl 1
    7.23sec 0.12usr 2.36sys (34.35% cpu)
    > time /tmp/sp.pl 1
    7.21sec 0.12usr 2.35sys (34.39% cpu)
---
1 cpu forks:

    > time schedtool -a 3 -e /tmp/sp.pl 1
    4.32sec 0.06usr 0.95sys (23.69% cpu)
    Ishtar:law> time schedtool -a 3 -e /tmp/sp.pl 1
    4.29sec 0.05usr 0.96sys (23.80% cpu)
-
Right there, it shows you that the scheduler is taking almost 33% of your time. That may be cpu, but it is not cpu that your program is using. That's why I say your accounting is off. From these figures, I would say it's closer to 1000/second.
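Linda's perl fork test has a direct shell analogue (an editorial sketch; the count is illustrative). Under bash, wrapping the call in `time` shows user+sys falling well short of wall time, just as in her measurements:

```shell
# Sketch: fork+exec N short-lived /bin/true processes serially, then
# print the shell's and children's accumulated CPU times via `times`.
spawn_loop() {
    n=$1
    while [ "$n" -gt 0 ]; do
        /bin/true          # fork + exec a trivial program
        n=$(( n - 1 ))
    done
    echo "spawned $1 processes"
}
spawn_loop 300
times   # the children's user+sys understates the real per-fork cost
```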
RE: Bug/limitation in 'time' (kernel setings?)...
I looked at the CPU sampling data to try to confirm what is going on. The profile data (captured by Zoom, using perf) shows ~90% of the samples hitting in various invocations of expr. The tool won't summarize call stacks across processes, so I manually looked at a bunch of them. I looked at samples in 15 invocations of expr. Each invocation runs for less than a ms and the sampling frequency is 1 kHz, so no individual process got more than one sample. The samples I saw were:

- page_fault in module vmlinux
- page_fault -> flush_tlb_page in module vmlinux
- do_lookup_x in ld-2.15.so
- perf_event__output_id_sample in vmlinux
- page_fault -> do_page_fault in vmlinux
- do_lookup_x in ld-2.15.so
- stub_execve -> load_elf_binary -> exec_mmap in vmlinux
- page_fault -> handle_pte_fault -> copy_page_c in vmlinux
- page_fault -> handle_pte_fault -> page_add_file_rmap in vmlinux
- system_call_fastpath -> sys_mmap -> mmap_region in vmlinux
- page_fault -> do_page_fault -> handle_mm_fault in vmlinux
- in unknown module
- page_fault -> handle_pte_fault -> page_add_file_rmap in vmlinux
- in unknown module
- page_fault -> do_page_fault -> up_read in vmlinux

To summarize, the Zoom profiler says that ~90% of CPU time when running ExprCount() is spent in expr. Of that time, it says that more than half is spent in page_fault or mmap. That sounds consistent with the claim that it is process startup that is consuming the missing CPU time that 'time' fails to account for.

I see no indications in any of the data of a significant amount of CPU time being spent in the OS scheduler. Neither the context switch data nor the sampling data supports that conclusion. I could be wrong of course, but I think my conclusion makes sense, and it is supported by significant evidence. FWIW.

-Original Message-
From: Bruce Dawson [mailto:brucedaw...@cygnus-software.com]
Sent: Wednesday, March 20, 2013 7:44 AM
To: 'Linda Walsh'
Cc: 'bug-bash'
Subject: RE: Bug/limitation in 'time' (kernel setings?)...
RE: Bug/limitation in 'time' (kernel setings?)...
Hey, fascinating stuff here. I'm learning a lot of useful new tools and techniques. I like your perl scripts. I tend to avoid perl because I don't know it well and I'm not a fan. Personal preference.

Good point about using my 'nop' program. It might simplify testing slightly, but shouldn't change the results.

> 2) Better, why not use the simple 'fork'/'exit', which doesn't call a
> separate program;

Remember that this investigation started with a real problem, and that real problem was associated with using $(expr) for counting. fork/exit is a totally different scenario.

When I run your first perl program -- I call it serialperl.pl because it is a serial process where, roughly speaking, only one process is executing at a time -- it can do about 5,000 process creates per second. 'time' shows 21% CPU usage on my four-core eight-thread system. However "iostat -c 1" shows about 13.6% CPU utilization, which corresponds to a bit more than one core. So, as I have always claimed, 'time' is underestimating the CPU time required to execute my task. The sampling data I provided (separate e-mail) shows that this is mostly from page faults, which are associated with process startup.

Your second perl program -- I call it parallelperl.pl because it runs a lot of processes in parallel -- is trickier to measure. It runs really fast, but if I pass too big a number to it in order to get it to run long enough for a useful test then it kills my box. Apparently having 25,000 processes running in parallel is too much for Ubuntu to handle, even with 32 GB of RAM. Pity.

However, I did get enough data to confirm my theory about why it runs so much faster than serialperl.pl. It's not because scheduler overhead is lower (in fact, having 20,000 processes running in parallel is a huge burden on the scheduler). It's because it is running in parallel. When I run "parallelperl.pl 2" I see CPU usage (measured by "iostat -c 1" running in another window) spike to ~80%.
It would probably go higher if I could get a long enough running test without DOSing my machine. So, the reason parallelperl.pl runs faster is that it is running in parallel.

> Note that in executing 'no program' (just doing a fork), but with the
> task switching (scheduler activity), we get, at best, ~35% cpu usage:

To be clear, we get 35% CPU usage reported by 'time'. That is entirely different from actual total CPU usage. I think we agree about that; we're just debating where that extra time is going. Clarity is critical in these tricky discussions. In fact, I think it is crucial to report both what 'time' says the process and its children are doing, and what "iostat -c 1" says the overall system is doing during the run. Otherwise we are trying to infer complex behavior with one eye closed. The 'time' results are only of interest if we compare them to "iostat -c 1" or some similar system-wide measurement.

> The difference between the fork {wait/exit} and the fork(all) wait(all)
> is that the scheduler is not invoked each time through the loop.

I don't think that is a correct analysis, and it is not justified by the evidence. I think the change in performance is because the forked copies are able to run in parallel. "iostat -c 1" strongly supports this conclusion. Even if I'm wrong, you have hardly shown that the difference is the scheduler. To do that you would need sampling profiler data showing a majority of samples hitting in the scheduler. I've done that test, and it shows the majority of samples hitting in page faults, associated with process startup.

I've provided significant evidence that process startup is the culprit:
- CPU scheduling graphs (pretty pictures) showing expr consuming the extra time
- sampling data showing page faults in expr consuming the extra time

I have seen no evidence for the scheduler being the issue. If the scheduler was the issue then that would suggest a hugely inefficient scheduler, which is a bold claim to make.
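The serial-versus-parallel distinction Bruce draws can be sketched in shell (editorial; the counts are illustrative). Under bash, timing each call shows the parallel variant finishing in much less wall time for a similar amount of total CPU work:

```shell
# Sketch: serial vs parallel subprocess creation. Serial waits for each
# child before forking the next; parallel backgrounds every child so
# the forks overlap across cores, cutting wall-clock time.
serial() {
    i=$1
    while [ "$i" -gt 0 ]; do /bin/true; i=$(( i - 1 )); done
    echo "serial: $1 children"
}
parallel() {
    i=$1
    while [ "$i" -gt 0 ]; do /bin/true & i=$(( i - 1 )); done
    wait    # reap all backgrounded children
    echo "parallel: $1 children"
}
serial 200
parallel 200
```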
Either way, the consistent and pervasive conclusion is that time's CPU consumption results are significantly understated when multiple processes are involved. It would be nice to have that acknowledged in the documentation. If not, well, that's why I have a blog.

-Original Message-
From: Linda Walsh [mailto:b...@tlinx.org]
Sent: Wednesday, March 20, 2013 11:38 AM
To: Bruce Dawson
Cc: 'bug-bash'
Subject: Re: Bug/limitation in 'time' (kernel setings?)...

1) Why not use your 'null' program, which in your blog you state makes little difference.

2) Better, why not use the simple 'fork'/'exit', which doesn't call a separate program; (that's one reason for switching to perl or, better, would be C; Note