On Sun, 16 Mar 2008, Dag Wieers wrote:

On Fri, 14 Mar 2008, Sam Morris wrote:
On Fri, 2008-03-14 at 05:29 +0100, Dag Wieers wrote:
On Tue, 26 Feb 2008, Sam Morris wrote:

I've only seen the below happen once out of having run dstat dozens of
times, so it's not that high a priority. :)

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
 8   0  92   0   0   0|   0     0 |  28k 1452B|   0     0 |  58   257
 7   2  91   0   0   0|   0   172k|  19k  926B|   0     0 |  67   238
 7   0  93   0   0   0|   0     0 |  10k  506B|   0     0 |  16   182
11   2  87   0   0   0|   0    44k|  23k 1212B|   0     0 |  62   362
 8   0  92   0   0   0|   0     0 |  23k 1229B|   0     0 |  45   213
 7   1  91   0   1   0|   0     0 |  10k  530B|   0     0 |  40   203
15   2  79   2   2   0|   0    68k|   0   120B|   0     0 |  23   168
12   1  86   0   1   0|   0  8192B|  23k 1327B|   0     0 |  76   317
 8   1  91   0   0   0|   0     0 |  20k 1235B|   0     0 |  37   238
Traceback (most recent call last):
 File "/usr/bin/dstat", line 1969, in ?
   main()
 File "/usr/bin/dstat", line 1914, in main
   o.extract()
 File "/usr/bin/dstat", line 511, in extract
   self.val[name][i] = 100.0 * (self.cn2[name][i] - self.cn1[name][i]) / (sum(self.cn2[name]) - sum(self.cn1[name]))
ZeroDivisionError: float division

It would also be useful to know exactly in which module/plugin this happens
:) So I guess I need to improve the code to show that information as well.

OK, fixing the "Interrupted system call" problem now allows me to reproduce this problem, and it turns out to be in the cpu plugin.

The ZeroDivisionError means that, for some reason,

        sum(self.cn2[name]) == sum(self.cn1[name])

which should not be the case unless both snapshots are taken within the same interval. And that is what happens when I suspend the tty for some time. I don't know (yet) how I can fix this :-/
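
To illustrate the failing case, here is a minimal sketch of a guarded version of that calculation. It is not the actual dstat code: the function name cpu_percentages and the arguments prev and curr are hypothetical stand-ins for self.cn1[name] and self.cn2[name], and returning zeros is only one possible way of handling an empty interval.

```python
def cpu_percentages(prev, curr):
    """Return per-state CPU percentages between two tick snapshots.

    prev and curr are equal-length lists of cumulative tick counters
    (hypothetical stand-ins for self.cn1[name] and self.cn2[name]).
    """
    total = sum(curr) - sum(prev)
    if total <= 0:
        # No ticks elapsed between the two snapshots (or the counters
        # went backwards): report zeros instead of dividing by zero.
        return [0.0] * len(curr)
    return [100.0 * (c - p) / total for p, c in zip(prev, curr)]


# Two identical snapshots, as seen after suspending the tty.
print(cpu_percentages([10, 20, 70], [10, 20, 70]))
# A normal interval with 4 elapsed ticks.
print(cpu_percentages([0, 0, 0], [1, 1, 2]))
```

With identical snapshots the total is zero, which is exactly the condition that raised the ZeroDivisionError in the traceback.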

But at least I can now reproduce it !

Sam, could you please retry the version from subversion ?

We made some changes to the scheduling code which triggered this bug on a stopped terminal. The problem is that at 2 different points in time it returns exactly the same number of ticks, which normally should not happen. This means that either the cpu did not do anything in between (like when you hibernate or suspend your system) or the 2 points in time are very close to each other (which is again weird since we have at least 1 second intervals).

I noticed that also on VMware, where time and cpu ticks can be variable (they synchronise time each minute and gradually correct it, which means that when dstat thinks it is 1 second later, in fact it is only a few milliseconds later, hence the problem...)

I can imagine that a variable cpu speed can impact this problem as well.

But now it is fixed and dstat will show when it is losing ticks and how many. This is easy to trigger by stopping your terminal (ctrl-s) for about 15 seconds or by suspending your computer with dstat running.
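
As a rough sketch of how lost ticks could be counted (an assumption for illustration, not necessarily how dstat does it): compare the wall-clock time elapsed since the previous update with the expected interval. The function name missed_ticks and its parameters are hypothetical.

```python
def missed_ticks(last_update, now, interval=1.0):
    """Return how many whole update intervals were skipped.

    last_update and now are wall-clock timestamps in seconds;
    interval is the expected delay between two updates.
    """
    elapsed = now - last_update
    # One interval is always expected; anything beyond that was missed.
    missed = int(elapsed / interval) - 1
    return max(missed, 0)


# Terminal stopped for about 15 seconds with a 1 second interval.
print(missed_ticks(100.0, 115.0))
```

An update that arrives on time, or early as in the VMware case, yields zero missed ticks with this approach.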

I am interested to hear how it works for you.

BTW Thanks for reporting this bug !
--
--   dag wieers,  [EMAIL PROTECTED],  http://dag.wieers.com/   --
[Any errors in spelling, tact or fact are transmission errors]


