Bug/limitation in 'time'

2013-03-16 Thread Bruce Dawson
Configuration Information [Automatically generated, do not change]:
Machine: i686
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='i686'
-DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='$
uname output: Linux Ubuntu1m 3.2.0-35-generic-pae #55-Ubuntu SMP Wed Dec 5
18:04:39 UTC 2012 i686 i686 i$
Machine Type: i686-pc-linux-gnu

Bash Version: 4.2
Patch Level: 24
Release Status: release

Description:
The basic problem is that if you time an operation that starts lots
of subprocesses - an operation whose execution is dominated by starting
short-lived subprocesses - then time does not accurately record CPU time. In
fact it is off by a lot. It reports the system as being about 75% idle when
the task is entirely CPU bound. I've verified that the task is CPU bound by
using iostat, mpstat, perf, and a few other programs.

My theory is that the process creation and teardown time is 'lost' and not
attributed to the parent or child process. Therefore this is probably a
limitation of the numbers made available by the kernel. And therefore it is
probably best addressed (in the short term anyway) by acknowledging this
limitation in the man page.

I think it's important because when I hit this problem (using $(expr) for
looping in shell scripts is slow) I initially assumed that my task was not
CPU bound, because that is what 'time' told me. This then led me down the
wrong path in my investigation.

I ran bashbug from a VM running Linux but I have reproduced this bug on
several 'normal' installs of Ubuntu, both 32-bit and 64-bit, on
six-core/twelve-thread machines.

Repeat-By:
   The great thing about this bug/weakness is that it is extremely easy
to reproduce. The shell script below does the trick.

#!/bin/bash
# Warning: this code is excessively slow
function ExprCount() {
    i=$1
    while [ $i -gt 0 ]; do
        i=$(expr $i - 1)
        #sleep 0.001
    done
    echo Just did $1 iterations using expr math
}
time ExprCount 1000

Here are sample results from running it - note that I'm using both the bash
'time' builtin (inside the script) and /usr/bin/time (wrapping the whole
script), and their results are quite similar, except for rounding
differences.

$ /usr/bin/time counttest.sh
Just did 1000 iterations using expr math

real    0m0.737s
user    0m0.016s
sys     0m0.124s
0.01user 0.12system 0:00.74elapsed 19%CPU (0avgtext+0avgdata
5376maxresident)k
0inputs+0outputs (0major+322859minor)pagefaults 0swaps

In particular notice that the sum of user+sys time is significantly lower
than 'real' time, despite the fact that this test is 100% CPU bound.
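
A system-wide measurement confirms it. For example (a sketch, assuming
sysstat's iostat is installed; on an otherwise idle machine with N logical
CPUs, %user + %system should rise by roughly one core's worth, i.e. 100/N
percent, while the loop runs):

iostat -c 1 &       # system-wide CPU utilization, one sample per second
./counttest.sh      # the repro script above
kill %1             # stop iostat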

Fix:
The documentation for the 'time' built-in (and the 'time'
stand-alone command for that matter) should be updated to acknowledge that
the CPU time totals will be short, sometimes significantly.





RE: Bug/limitation in 'time'

2013-03-16 Thread Bruce Dawson
Thanks -- good to know that there is a fast and POSIX compliant method of
doing this. I should have included my optimized counting loop -- it's what
we switched to when we realized that $(expr) was a problem. Here it is now:

# This code performs quite well
function BashCount() {
    i=$1
    while [ $i -gt 0 ]; do
        (( i-- ))
    done
    echo Just did $1 iterations using bash math
}
time BashCount 15

It's a *lot* faster, of course. BTW, I've poked around in the 'time' source
code enough to know that it is just displaying the results of wait3(), so
the misleading CPU consumption information is ultimately a wait3()/kernel
issue. However showing this in the documentation would be great.
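
A minimal way to see the effect without expr at all (a sketch -- the exact
numbers will vary, but 'real' typically far exceeds user+sys in the first
case even though the loop never blocks on I/O):

time for ((i = 0; i < 1000; i++)); do /bin/true; done   # fork+exec each pass
time for ((i = 0; i < 1000; i++)); do true; done        # builtin, no subprocess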

-Original Message-
From: Chris F.A. Johnson [mailto:ch...@cfajohnson.com] 
Sent: Saturday, March 16, 2013 7:16 PM
To: bug-bash@gnu.org
Cc: Chris Down; Bruce Dawson
Subject: Re: Bug/limitation in 'time'

On Sun, 17 Mar 2013, Chris Down wrote:

> Hi Bruce,
>
> On 2013-03-16 17:41, Bruce Dawson wrote:
>> I think it's important because when I hit this problem (using $(expr) 
>> for looping in shell scripts is slow) I initially assumed that my 
>> task was not CPU bound, because that is what 'time' told me. This 
>> then led me down the wrong path in my investigation.
>
> No comment on the issue itself, but this is just not a good way of 
> writing bash arithmetic loops:
>
>> #!/bin/bash
>> # Warning: this code is excessively slow
>> function ExprCount() {
>>     i=$1
>>     while [ $i -gt 0 ]; do
>>         i=$(expr $i - 1)
>>         #sleep 0.001
>>     done
>>     echo Just did $1 iterations using expr math
>> }
>> time ExprCount 1000
>
> You're forking 1000 subshells for `expr' when you can quite easily do 
> it on your current shell. A better way would be to write it like this:
>
> ExprCount() {
>     for (( i = $1 ; i > 0 ; i-- )); do
>         :
>     done
>     echo "$1 iterations"
> }

Or, in a POSIX-compliant manner:

ExprCount() {
    i=$1
    while [ $(( i -= 1 )) -ge 0 ]; do
        :
    done
    echo Just did $1 iterations using expr math
}

-- 
Chris F.A. Johnson, <http://cfajohnson.com/>
Author:
Pro Bash Programming: Scripting the GNU/Linux Shell (2009, Apress)
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)




RE: Bug/limitation in 'time'

2013-03-16 Thread Bruce Dawson
Yep, we've changed our loops to use roughly that syntax. Unfortunately a lot
of online resources recommend the $(expr) technique. My understanding is
that using $(expr) is more portable, because i-- is bash specific, but I
don't really know.
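
For what it's worth, POSIX arithmetic expansion gives both portability and
speed (a sketch; $(( )) is required by POSIX sh, unlike the bash-only (( ))
or i--):

i=$1
while [ "$i" -gt 0 ]; do
    i=$((i - 1))    # pure shell arithmetic, no subprocess forked
done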

-Original Message-
From: Chris Down [mailto:ch...@chrisdown.name] 
Sent: Saturday, March 16, 2013 6:58 PM
To: Bruce Dawson
Cc: bug-bash@gnu.org; b...@packages.debian.org
Subject: Re: Bug/limitation in 'time'

Hi Bruce,

On 2013-03-16 17:41, Bruce Dawson wrote:
> I think it's important because when I hit this problem (using $(expr) 
> for looping in shell scripts is slow) I initially assumed that my task 
> was not CPU bound, because that is what 'time' told me. This then led 
> me down the wrong path in my investigation.

No comment on the issue itself, but this is just not a good way of writing
bash arithmetic loops:

> #!/bin/bash
> # Warning: this code is excessively slow
> function ExprCount() {
>     i=$1
>     while [ $i -gt 0 ]; do
>         i=$(expr $i - 1)
>         #sleep 0.001
>     done
>     echo Just did $1 iterations using expr math
> }
> time ExprCount 1000

You're forking 1000 subshells for `expr' when you can quite easily do it on
your current shell. A better way would be to write it like this:

ExprCount() {
    for (( i = $1 ; i > 0 ; i-- )); do
        :
    done
    echo "$1 iterations"
}

Best,

Chris




RE: Bug/limitation in 'time'

2013-03-17 Thread Bruce Dawson
The man page is clear that it is displaying the results of wait3(). However
it doesn't mention that this means that sub-process startup time is not
accounted for. That's what I feel should be clarified. Otherwise a CPU bound
task may appear to not be CPU bound.

My expectation is that the sum of 'user' and 'sys' time should equal the
elapsed time because the overall task is 100% CPU bound (I've confirmed
this). It is unfortunate that the sub-process startup time is not accounted
for in 'user' or 'sys' time, and I think it would be appropriate to document
this.
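
As a point of contrast (a sketch): a loop that spawns no subprocesses reports
user+sys almost exactly equal to real, because nothing is lost to process
setup and teardown:

time bash -c 'i=1000; while (( i-- )); do :; done'   # user+sys ~= real here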

Anyway, if the docs can be updated I think that would be great. If not, no
worries.

-Original Message-
From: Pierre Gaston [mailto:pierre.gas...@gmail.com] 
Sent: Saturday, March 16, 2013 11:31 PM
To: Bruce Dawson
Cc: bug-bash@gnu.org
Subject: Re: Bug/limitation in 'time'

On Sun, Mar 17, 2013 at 4:33 AM, Bruce Dawson
 wrote:
> Thanks -- good to know that there is a fast and POSIX compliant method 
> of doing this. I should have included my optimized counting loop -- 
> it's what we switched to when we realized that $(expr) was a problem. Here
it is now:
>
> # This code performs quite well
> function BashCount() {
>     i=$1
>     while [ $i -gt 0 ]; do
>         (( i-- ))
>     done
>     echo Just did $1 iterations using bash math
> }
> time BashCount 15
>
> It's a *lot* faster, of course. BTW, I've poked around in the 'time' 
> source code enough to know that it is just displaying the results of 
> wait3(), so the misleading CPU consumption information is ultimately a 
> wait3()/kernel issue. However showing this in the documentation would be
great.

At least the man page of time on my Ubuntu system is pretty clear about
what it does.

The result does not strike me as impossible though; I can imagine a lot of
real time being spent waiting for the scheduler to run expr and then to run
bash again.

I tried a little experiment that I think shows the importance of the
scheduler on the real time result:
I ran, at the same time, this little loop with different "niceness":

i=0;time while ((i++<1));do /bin/echo -n;done

sudo nice -n 19 bash -c 'i=0;time while ((i++<1));do /bin/echo -n;done' \
    2>&1 | sed 's/^/19: /' &
sudo nice -n -20 bash -c 'i=0;time while ((i++<1));do /bin/echo -n;done' \
    2>&1 | sed 's/^/-20: /'

I get:

-20: real    0m9.331s
-20: user    0m0.468s
-20: sys     0m1.504s

19:  real    0m14.004s
19:  user    0m0.532s
19:  sys     0m1.660s

so the nicer loop takes about 50% more real time, indicating that much of the
real time is spent waiting for the process to run.




RE: Bug/limitation in 'time'

2013-03-17 Thread Bruce Dawson
Chris Down pointed that out. My loop now looks like this -- portable (I
believe) and fast:

BashCount() {
    for (( i = $1 ; i > 0 ; i-- )); do
        :
    done
    echo Just did $1 iterations using bash math
}

-Original Message-
From: Linda Walsh [mailto:b...@tlinx.org] 
Sent: Sunday, March 17, 2013 8:36 PM
To: Bruce Dawson
Cc: 'Chris Down'; bug-bash@gnu.org; b...@packages.debian.org
Subject: Re: Bug/limitation in 'time'



Bruce Dawson wrote:
> Yep, we've changed our loops to use roughly that syntax. Unfortunately 
> a lot of online resources recommend the $(expr) technique. My 
> understanding is that using $(expr) is more portable, because i-- is 
> bash specific, but I don't really know.

but $((i-=1)) isn't Bash specific and does the same thing...




RE: Bug/limitation in 'time' (kernel settings?)...

2013-03-18 Thread Bruce Dawson
I'll give those a try.

BTW, I just posted a blog post to share what I'd found. You can see it
here:

http://randomascii.wordpress.com/2013/03/18/counting-to-ten-on-linux/

I hope it's accurate, and I do think it would be worth mentioning the issue
in the documentation for 'time' and the bash 'time' internal command.

-Original Message-
From: Linda Walsh [mailto:b...@tlinx.org] 
Sent: Monday, March 18, 2013 7:15 PM
To: Bruce Dawson
Cc: bug-bash@gnu.org; b...@packages.debian.org
Subject: Re: Bug/limitation in 'time' (kernel settings?)...



Bruce Dawson wrote:
> Configuration Information [Automatically generated, do not change]:
> Machine: i686
> OS: linux-gnu
> Compiler: gcc
> Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='i686'
> -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='$ uname output: Linux 
> Ubuntu1m 3.2.0-35-generic-pae #55-Ubuntu SMP Wed Dec 5
> 18:04:39 UTC 2012 i686 i686 i$
> Machine Type: i686-pc-linux-gnu
> 
> Bash Version: 4.2
> Patch Level: 24
> Release Status: release
> 
> Description:
> The basic problem is that if you time an operation that starts 
> lots of sub processes - an operation whose execution is dominated by 
> starting short-lived subprocesses - then time does not accurately 
> record CPU time. In fact it is off by a lot. It reports the system as 
> being about 75% idle when the task is entirely CPU bound. I've 
> verified that the task is CPU bound by using iostat, mpstat, perf, and a
few other programs.

BTW, a few things you didn't mention: your kernel version, and what
scheduler your kernel is using...

Also the settings of TICK_CPU_ACCOUNTING vs. IRQ_TIME_ACCOUNTING from your
'/proc/config.gz' might be interesting...

Those have to do with cpu accounting; the default is 'simple tick based
cputime accounting', vs. a non-default of 'fine granularity task level IRQ
accounting'.

The latter might give you more accurate timing info for your tests,
but really I have no clue .. just something to try, I suppose.

Also the setting of your HZ/clock value:
HZ_100/HZ_250/HZ_1000 (an HZ of X gives 1/X second tick granularity).

Do you have 'NO_HZ' set, HIGH_RES_TIMERS?

Also important -- how your kernel is compiled:
PREEMPT_NONE, PREEMPT_VOLUNTARY, or PREEMPT
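
For reference, a quick way to check most of these on a stock Ubuntu kernel
(a sketch; /proc/config.gz only exists when the kernel was built with
CONFIG_IKCONFIG_PROC, but Ubuntu ships its config under /boot):

grep -E 'CONFIG_(HZ|NO_HZ|HIGH_RES_TIMERS|PREEMPT)' /boot/config-$(uname -r)
zgrep -E 'CONFIG_(HZ|NO_HZ|HIGH_RES_TIMERS|PREEMPT)' /proc/config.gz 2>/dev/null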

I'm not sure how the RCU settings figure in, but there are settings
there that sound like they might affect how fast you get rescheduled, like
'Accelerate last non-dyntick-idle CPU grace period' (RCU_FAST_NO_HZ), and
RCU_BOOST.

None of that, of course, is BASH related, but it might very well
affect your issue, if you wanted to try different kernel options.  No
guarantee that any would affect it either ... but I thought I'd mention some
settings that might affect it.  Even the power level settings of
your machine -- power savings, vs. on-demand, vs. max performance...

It sounds like the ACCOUNTING thing might be most pertinent, but
thought I'd throw in the others just to be thorough...







RE: Bug/limitation in 'time' (kernel settings?)...

2013-03-19 Thread Bruce Dawson
Thanks Pierre.

The profile results, especially the zoom profiler screen shot, show that
virtually all of the CPU time being consumed is from bash and its child
processes. The system is otherwise idle with no other processes running to
any significant degree. My system is ~99.5% idle when I'm not running the
test, and ~91.5% idle when the test is running.

Even when I do run other processes simultaneously they don't slow down the
bash/expr task because with twelve hardware threads there are lots to go
around.

-Original Message-
From: Pierre Gaston [mailto:pierre.gas...@gmail.com] 
Sent: Tuesday, March 19, 2013 2:00 AM
To: Bruce Dawson
Cc: bug-bash@gnu.org; b...@packages.debian.org
Subject: Re: Bug/limitation in 'time' (kernel settings?)...

On Tue, Mar 19, 2013 at 5:03 AM, Bruce Dawson
 wrote:
> I'll give those a try.
>
> BTW, I just posted the blog post to share what I'd found. You can see 
> it
> here:
>
> http://randomascii.wordpress.com/2013/03/18/counting-to-ten-on-linux/
>
> I hope it's accurate, and I do think it would be worth mentioning the 
> issue in the documentation for 'time' and the bash 'time' internal
command.
>
For what it's worth, I still think that time is not lying (though the man
page warns about possible inaccuracies). Your loop with expr might be "cpu
bound" but it does not run often because other processes are given a chance
to run.




RE: Bug/limitation in 'time' (kernel settings?)...

2013-03-19 Thread Bruce Dawson
Hmmm -- again, that doesn't match what I see. bash is suspended while it
waits for expr to run, and the instant that expr completes bash runs again.
This can be seen in the profiler screenshot. The CPU is busy at all times,
and always busy on exactly one process.

The scheduler is invoked whenever one process readies another so the time
slice length doesn't matter in this scenario -- bash resumes execution the
instant that expr completes. If the scheduler didn't do this -- instantly
wake up a process when its wait is satisfied -- then vast amounts of time
would be wasted.

The time slice length is, I believe, only relevant for preempting processes.
On a heavily loaded system processes are time sliced between CPUs at a
resolution dictated by the time slice length, but since my system is 90+%
idle this isn't relevant.

-Original Message-
From: Linda Walsh [mailto:b...@tlinx.org] 
Sent: Tuesday, March 19, 2013 8:19 AM
To: Bruce Dawson
Cc: 'Pierre Gaston'; bug-bash@gnu.org; b...@packages.debian.org
Subject: Re: Bug/limitation in 'time' (kernel settings?)...

I think you misunderstand...what Pierre is saying -- your process is
scheduled out to give others a *chance* to run and the scheduler isn't
called often enough to roll you back in immediately when it finds out that
no one else needs their time ticks...

That's why I mentioned all of the scheduling info in the kernel.

Another thing to play with is length of time slices, but I have a feeling as
soon as you hit the eval you are yielding the cpu to the scheduler, so while
you may be *waiting* 100% on the cpu, much of that is spent waiting for cpu,
not using cpu.



Bruce Dawson wrote:
> Thanks Pierre.
> 
> The profile results, especially the zoom profiler screen shot, show 
> that virtually all of the CPU time being consumed is from bash and its 
> child processes. The system is otherwise idle with no other processes 
> running to any significant degree. My system is ~99.5% idle when I'm 
> not running the test, and ~91.5% idle when the test is running.

> -Original Message-
> From: Pierre Gaston [mailto:pierre.gas...@gmail.com] For what it's 
> worth, I still thinks that time is not lying (though the man page 
> warns about possible inaccuracies), Your loop with expr might be "cpu 
> bound" but it does not run often because other processes are given a 
> chance to run.
> 
> 




RE: Bug/limitation in 'time' (kernel settings?)...

2013-03-19 Thread Bruce Dawson
How many CPUs do you have? 20% usage doesn't mean anything without knowing
that. On a five-core system 20% usage would imply one full core.

> It's your kernel settings that are causing issue.

Hmmm? My kernel is correctly ensuring that there is no idle time. It is
switching instantly between bash and expr. If your kernel is not then you
should get that fixed -- going idle is not efficient.

I'm not happy with my kernel for moving the processes around to so many
different cores, but that is by-design, and normally works well. 

I'm also told that 'top' is problematic for determining CPU usage. I'm not
sure what its limitations are.
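
For a per-CPU view that sidesteps some of top's ambiguity on multi-core
machines, mpstat (from the same sysstat package as iostat) may be a better
bet (a sketch):

mpstat -P ALL 1     # one line per CPU, updated every second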

-Original Message-
From: Linda Walsh [mailto:b...@tlinx.org] 
Sent: Tuesday, March 19, 2013 9:10 AM
To: Bruce Dawson; bug-bash
Subject: Re: Bug/limitation in 'time' (kernel settings?)...



Bruce Dawson wrote:
> Hmmm -- again, that doesn't match what I see. bash is suspended while 
> it waits for expr to run, and the instant that expr completes bash runs
again.
> This can be seen in the profiler screenshot. The CPU is busy at all 
> times, and always busy on exactly one process.
---
I don't see that on my system.

I ran:

exprcount() {
    for (( i=$1;i>0;i=$(expr $i-1) )); do :; done
}
time exprcount 1000
2.05sec 0.12usr 1.33sys (71.12% cpu)
Ishtar:law/bin> time exprcount 1
20.43sec 1.62usr 12.98sys (71.48% cpu)
---
But 'top' never shows over 20% usage, vs.

function f { i=1000;while ((i--));do :; done ;}
time f
31.73sec 29.25usr 2.41sys (99.76% cpu)

and top shows 100% cpu usage for 'f'

i.e. My system seems to include scheduling time (or a large part of it) as
part of the 'time' figures.

Note..  you said your loop took 2 seconds on your 3.2GHz sys.

My system was idling at 1.6GHz nearly the whole test.


It's your kernel settings that are causing issue.




RE: Bug/limitation in 'time' (kernel settings?)...

2013-03-19 Thread Bruce Dawson
ngle task.

sched_latency_ns on my system is 24,000,000 (24 ms), just as on yours.

$uname -a
Linux brucedglados2 3.2.0-38-generic-pae #60-Ubuntu SMP Wed Feb 13 13:47:26
UTC 2013 i686 i686 i386 GNU/Linux

I tried grabbing /proc/config.gz but that file doesn't exist.

Blog post is here:
http://randomascii.wordpress.com/2013/03/18/counting-to-ten-on-linux/

> If that latency of 24 is how long before the scheduler wakes up and
switches tasks

No. Definitely not. The 24 ms scheduler latency is for rescheduling of long
running tasks. It does not apply to the case where a process is waiting on
some event, such as a child process dying. I am 100% positive about this, on
Linux and on Windows. Both operating systems can, if conditions demand it,
schedule threads/processes hundreds of times in 24 ms. I look at process
scheduling data on Windows all the time and if Linux was that much less
efficient then Steve Ballmer would be laughing maniacally and Linux wouldn't
be used on servers.

> Note..  you said your loop took 2 seconds on your 3.2GHz sys.

Yes, but that was when my system was running at 1.1 GHz. When my system
ramps up to its full speed then the 1,000 iterations are done in less than a
second. Our systems are quite consistent there.
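
The frequency ramping is easy to watch while the test runs (a sketch; these
sysfs paths are standard wherever cpufreq is enabled):

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor    # e.g. ondemand
watch -n1 cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq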

-Original Message-
From: Linda Walsh [mailto:b...@tlinx.org] 
Sent: Tuesday, March 19, 2013 3:41 PM
To: Bruce Dawson
Cc: 'bug-bash'
Subject: Re: Bug/limitation in 'time' (kernel settings?)...



Linda Walsh wrote:
> 
> Bruce Dawson wrote:
>> How many CPUs do you have? 20% usage doesn't mean anything without 
>> knowing that. On a five-core system 20% usage would imply one full core.
---
Another tool to check out if you have it is 'qps'... it can show the time
slice/schedule interval ... I can see it scheduling 24ms intervals for bash
when it runs a separate proc/loop.  It sounds a bit sketchy to me why it
would take 76ms for it to be scheduled for another run, but ... that's about
what it measures out to in real time.

The fact that your kernel doesn't show you that ... I don't know what to
say, other than it's how your kernel is configured.

Now looking in my /proc/sys/kernel

I see sched_latency_ns = 24,000,000 (24 ms).

I also see the scheduler doesn't wake up all the time,
but about every 4 ms.  (I inserted the commas below):
proc/sys/kernel# for i in sched*;do
> echo $i: $(<$i)
> done
sched_autogroup_enabled: 1
sched_child_runs_first: 0
sched_domain:
sched_latency_ns: 24,000,000
sched_migration_cost_ns: 500,000
sched_min_granularity_ns: 3,000,000
sched_nr_migrate: 32
sched_rt_period_us: 1,000,000
sched_rt_runtime_us: 950,000
sched_shares_window_ns: 10,000,000
sched_time_avg_ms: 1000
sched_tunable_scaling: 1
sched_wakeup_granularity_ns: 4,000,000

So if sched_child_runs_first = 0, the parent keeps running at first. If that
latency of 24 ms is how long before the scheduler wakes up and switches tasks,
that's 24 ms wasted, then the same for the child.  That's about a 33% duty
cycle there -- which can likely be changed by tweaking the above numbers...
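
These are runtime tunables, so experiments are cheap (a sketch, as root;
values revert at reboot):

echo 1 > /proc/sys/kernel/sched_child_runs_first
echo 1000000 > /proc/sys/kernel/sched_min_granularity_ns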




RE: Bug/limitation in 'time' (kernel settings?)...

2013-03-20 Thread Bruce Dawson
Note that the ~2ms/process create is when the power management fails to get
invoked. It drops to ~0.6ms/process create when the CPU ramps up to full
speed. The chart you list from fourteen years ago shows 6.0 ms to spawn a new
process, so my machine is about ten times faster -- seems about right.
Cloning a process is not comparable because in my test the kernel has to
load and exec() expr, not just fork() bash.

> Right there, it shows you that the scheduler is taking almost 33% of your
time.

I don't follow your logic. How do you know that it is the scheduler? I can
see that when you run your schedtool script it shows 15% to 34.4% of CPU
time consumed. I say that the rest is process startup, you say scheduler.
I'm not sure what evidence points to the scheduler, and I'm not sure what
math you used to come up with the 33% figure. You'll have to elucidate.

> Then your kernel is misconfigured for your workload.

What? My kernel is executing the tasks fine. When my CPU runs at full speed
it is slightly faster than yours, and when it runs at 1/3 speed it is slower
than yours. In both cases 'time' reports that the CPU is not pegged when it
is. Our kernels are behaving quite similarly.

> What documentation would you have updated?

The bash documentation where it mentions the 'time' command, and the 'time'
documentation.

-Original Message-
From: Linda Walsh [mailto:b...@tlinx.org] 
Sent: Wednesday, March 20, 2013 1:12 AM
To: Bruce Dawson
Cc: bug-bash
Subject: Re: Bug/limitation in 'time' (kernel settings?)...



Bruce Dawson wrote:
> The reply below is way too long, but oh well.
> 
> The summary is:
> I am 100% certain that my ExprCount() loop is CPU bound.

Your stats claim 530 iterations/second when spawning expr.

That's ~2ms/process create.

In an article from 14 years ago on a 200MHz Pentium, Linux process creates
were ~1ms.

http://www.linuxjournal.com/article/2941
(table @
http://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/029/2941/2941f1.jpg)

If you believe that a modern processor running at 1.6GHz takes twice as
long as a 14-year-old Pentium, then there is little likelihood that you will
be convinced by any logical argument.



You need to rebuild your kernel with the config option turned on -- in fact,
you need to build your own kernel if you want to do any sort of kernel
benchmarks.


> You seem convinced that the task is not CPU bound on my machine, but 
> I'm convinced that it is.
---
Then your kernel is misconfigured for your workload.

You said:
> I am 100% certain that my ExprCount() loop is CPU bound. It is 
> unfortunate that time gives inaccurate information about this. It 
> would be nice if the documentation acknowledged that in order to save 
> future generations from confusion.

What documentation would you have updated?


Here is a perl program:

#!/usr/bin/perl
die "need count" unless @ARGV;
my $count = $ARGV[0];
my $counter = $count;
while ($counter-- > 0) {
    if (my $pid = fork()) {
        waitpid($pid, 0);    # serial: wait for each child before forking again
    } else {
        exec "/bin/true";    # use 'exit(0)' to just time forks w/no image load
    }
}
-
> /usr/bin/time /tmp/sp.pl 1
> time schedtool -a 3 -e /tmp/sp.pl 1
6.83sec 0.05usr 0.97sys (15.05% cpu)
---
But if I don't confine it to 1 cpu:
> time /tmp/sp.pl 1
10.54sec 0.20usr 2.59sys (26.52% cpu)

Just forks:
> time /tmp/sp.pl 1
7.23sec 0.12usr 2.36sys (34.35% cpu)
> time /tmp/sp.pl 1
7.21sec 0.12usr 2.35sys (34.39% cpu)
---
1cpu forks:

> time schedtool -a 3 -e /tmp/sp.pl 1
4.32sec 0.06usr 0.95sys (23.69% cpu)
Ishtar:law> time schedtool -a 3 -e /tmp/sp.pl 1
4.29sec 0.05usr 0.96sys (23.80% cpu)

-

Right there, it shows you that the scheduler is taking almost 33% of your
time.

That may be cpu, but it is not cpu that your program is using.
That's why I say your accounting is off.







From these figures, I would say it's closer to 1000/second





RE: Bug/limitation in 'time' (kernel settings?)...

2013-03-20 Thread Bruce Dawson
I looked at the CPU sampling data to try to confirm what is going on. The
profile data (captured by zoom, using perf) shows ~90% of the samples
hitting in various invocations of expr. The tool won't summarize call stacks
across processes so I manually looked at a bunch of them. I looked at
samples in 15 invocations of expr. Each invocation runs for less than a ms
and the sampling frequency is one KHz so no individual process got more than
one sample. The samples I saw were: 

- page_fault in module vmlinux
- page_fault-> flush_tlb_page in module vmlinux
- do_lookup_x in ld-2.15.so
- perf_event__output_id_sample in vmlinux
- page_fault-> do_page_fault in vmlinux
- do_lookup_x in ld-2.15.so
- stub_execve-> load_elf_binary-> exec_mmap in vmlinux
- page_fault-> handle_pte_fault-> copy_page_c in vmlinux
- page_fault-> handle_pte_fault-> page_add_file_rmap in vmlinux
- system_call_fastpath-> sys_mmap-> mmap_region in vmlinux
- page_fault-> do_page_fault-> handle_mm_fault in vmlinux
-  in unknown module
- page_fault-> handle_pte_fault-> page_add_file_rmap in vmlinux
-  in unknown module
- page_fault-> do_page_fault-> up_read in vmlinux

To summarize, the zoom profiler says that ~90% of CPU time when running
ExprCount() is spent in expr. Of that time it says that more than half of it
is spent in page_fault or mmap. That sounds consistent with the claim that
it is process startup that is consuming the missing CPU time that 'time'
fails to account for.
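
For reference, roughly how such a capture can be reproduced with perf alone
(a sketch; zoom drives perf underneath, and counttest.sh stands in for the
repro script from my first message):

perf record -F 1000 -a -g ./counttest.sh   # 1 kHz samples, all CPUs, call graphs
perf report                                # then inspect the expr samples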

I see no indications in any of the data of a significant amount of CPU time
being spent in the OS scheduler. Neither the context switch data nor the
sampling data supports that conclusion.

I could be wrong of course but I think my conclusion makes sense, and it is
supported by significant evidence.

FWIW.


RE: Bug/limitation in 'time' (kernel settings?)...

2013-03-20 Thread Bruce Dawson
Hey, fascinating stuff here. I'm learning a lot of useful new tools and
techniques. I like your perl scripts.

I tend to avoid perl because I don't know it well and I'm not a fan.
Personal preference. Good point about using my 'nop' program. It might
simplify testing slightly, but shouldn't change the results.

> 2) Better, why not use the simple 'fork'/exit' which doesn't call a
separate program;

Remember that this investigation started with a real problem, and that real
problem was associated with using $(expr) for counting. fork/exit is a
totally different scenario.

When I run your first perl program -- I call it serialperl.pl because it is
a serial process where, roughly speaking, only one process is executing at a
time -- it can do about 5,000 process creates per second. 'time' shows 21% CPU
usage on my four-core eight-thread system. However "iostat -c 1" shows about
13.6% CPU utilization, which corresponds to a bit more than one core. So, as
I have always claimed, 'time' is underestimating the CPU time required to
execute my task. The sampling data I provided (separate e-mail) shows that
this is mostly from page faults, which are associated with process startup.

Your second perl program -- I call it parallel perl because it runs a lot of
processes in parallel -- is trickier to measure. It runs really fast, but if
I pass too big a number to it in order to get it to run long enough for a
useful test then it kills my box. Apparently having 25,000 processes running
in parallel is too much for Ubuntu to handle, even with 32 GB of RAM. Pity.

However, I did get enough data to confirm my theory about why it runs so
much faster than serialperl.pl. It's not because scheduler overhead is lower
(in fact, having 20,000 processes running in parallel is a huge burden to
the scheduler). It's because it is running in parallel. When I run
"parallelperl.pl 2" then I see CPU usage (measured by "iostat -c 1"
running in another window) spike to ~80%. It would probably go higher if I
could get a long enough running test without DOSing my machine.

So, the reason parallelperl.pl runs faster is because it is running in
parallel.

> Note that in executing 'no program' (just doing a fork), but with the task
switching (scheduler activity), we get, at best ~ 35% cpu usage:

To be clear, we get 35% CPU usage reported by 'time'. That is entirely
different from actual total CPU usage. I think we agree about that, we're
just debating where that extra time is going. Clarity is critical in these
tricky discussions. In fact, I think it is crucial to report both what
'time' is saying the process and its children are doing, and what "iostat -c
1" says the overall system is doing during the run. Otherwise we are trying
to infer complex behavior with one eye closed. The 'time' results are only
of interest if we compare them to "iostat -c 1" or some similar system-wide
measurement.
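
Concretely, a harness like this captures both views in one run (a sketch):

iostat -c 1 > iostat.log &   # system-wide view, 1-second samples
time ./counttest.sh          # per-process-tree view from 'time'
kill %1                      # then compare the two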

> The difference between the fork {wait/exit} and the fork(all) wait(all);
is that the scheduler is not invoked each time through the loop.

I don't think that is a correct analysis and it is not justified by the
evidence. I think the change in performance is because the forked copies are
able to run in parallel. "iostat -c 1" strongly supports this conclusion.
Even if I'm wrong you have hardly shown that the difference is the
scheduler. To do that you would have to have sampling profiler data that
showed a majority of samples hitting in the scheduler. I've done that test
and it shows the majority of samples hitting in page faults, associated with
process startup.

I've provided significant evidence that process startup is the culprit:
- CPU scheduling graphs (pretty pictures) showing expr consuming the extra
time
- sampling data showing page faults in expr consuming the extra time

I have seen no evidence for the scheduler being the issue. If the scheduler
was the issue then that would suggest a hugely inefficient scheduler, which
is a bold claim to make.

Either way, the consistent and pervasive conclusion is that time's CPU
consumption results are significantly understated when multiple processes
are involved. It would be nice to have that acknowledged in the
documentation. If not, well, that's why I have a blog.

-Original Message-
From: Linda Walsh [mailto:b...@tlinx.org] 
Sent: Wednesday, March 20, 2013 11:38 AM
To: Bruce Dawson
Cc: 'bug-bash'
Subject: Re: Bug/limitation in 'time' (kernel settings?)...

1) Why not use your 'null' program, which in your blog you state makes
little difference.
2) Better, why not use the simple 'fork'/'exit', which doesn't call a separate
program; (that's one reason for switching to perl, or better would be C; Note