Bug#467615: dstat: Aborts with ZeroDivisionError

Dag Wieers Sun, 23 Mar 2008 05:48:43 -0700

On Sun, 16 Mar 2008, Dag Wieers wrote:

On Fri, 14 Mar 2008, Sam Morris wrote:

On Fri, 2008-03-14 at 05:29 +0100, Dag Wieers wrote:

On Tue, 26 Feb 2008, Sam Morris wrote:

I've only seen the below happen once out of having run dstat dozens of
times, so it's not that high a priority. :)

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
 8   0  92   0   0   0|   0     0 |  28k 1452B|   0     0 |  58   257
 7   2  91   0   0   0|   0   172k|  19k  926B|   0     0 |  67   238
 7   0  93   0   0   0|   0     0 |  10k  506B|   0     0 |  16   182
11   2  87   0   0   0|   0    44k|  23k 1212B|   0     0 |  62   362
 8   0  92   0   0   0|   0     0 |  23k 1229B|   0     0 |  45   213
 7   1  91   0   1   0|   0     0 |  10k  530B|   0     0 |  40   203
15   2  79   2   2   0|   0    68k|   0   120B|   0     0 |  23   168
12   1  86   0   1   0|   0  8192B|  23k 1327B|   0     0 |  76   317
 8   1  91   0   0   0|   0     0 |  20k 1235B|   0     0 |  37   238
Traceback (most recent call last):
 File "/usr/bin/dstat", line 1969, in ?
   main()
 File "/usr/bin/dstat", line 1914, in main
   o.extract()
 File "/usr/bin/dstat", line 511, in extract
   self.val[name][i] = 100.0 * (self.cn2[name][i] - self.cn1[name][i]) /(sum(self.cn2[name]) - sum(self.cn1[name]))
ZeroDivisionError: float division


It would also be useful to know exactly in which module/plugin this
happens :) So I guess I need to improve the code to show that information
as well.

Ok, fixing the "Interrupted system call" problem now allows me to
reproduce this problem, and the problem happens to be in the cpu plugin.


It speaks for itself that for some reason

        sum(self.cn2[name]) == sum(self.cn1[name])

which should not be the case unless for some reason both snapshots happen
in the same interval. And that is what happens when I suspend the tty for
some time. I don't know (yet) how I can fix this :-/
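
To make the failing case concrete, here is a minimal sketch of the
calculation with a guard for a zero denominator. It is not the code from
dstat; the function name cpu_percentages and the arguments prev and curr
are made up for illustration and stand in for self.cn1[name] and
self.cn2[name] from the traceback.

```python
def cpu_percentages(prev, curr):
    """Return per-field CPU percentages between two tick snapshots.

    prev and curr are equal-length sequences of cumulative tick counters
    (hypothetical stand-ins for self.cn1[name] and self.cn2[name]).
    """
    total = sum(curr) - sum(prev)
    if total <= 0:
        # No ticks elapsed between the snapshots (stopped tty, suspend,
        # ...): report zeros instead of dividing by zero.
        return [0.0] * len(curr)
    return [100.0 * (c - p) / total for p, c in zip(prev, curr)]
```

Returning zeros is only one possible choice; keeping the previous values
or skipping the update would be equally defensible.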


But at least I can now reproduce it !


Sam, could you please retry the version from subversion ?

We made some changes to the scheduling code which triggered this bug on a
stopped terminal. The problem is that at 2 different points in time it
returns exactly the same number of ticks, which normally should not happen.
This means that either the cpu did not do anything in between (like when
you hibernate or suspend your system) or the 2 points in time are very
close to each other (which is again weird since we have at least 1 second
intervals).

I noticed that also on VMware, where time and cpu ticks can be variable
(they synchronise time each minute and gradually correct time, which means
that when dstat thinks it is 1 second later, in fact it is only a few
milliseconds later, hence the problem...)


I can imagine that a variable cpu speed can impact this problem as well.

But now it is fixed and dstat will show when it is losing ticks and how
many. This is easy to trigger by stopping your terminal (ctrl-s) for about
15 seconds or by suspending your computer with dstat running.
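
As a rough illustration of what counting lost ticks can look like, here
is a small sketch. It is an assumption about the general idea, not the
code that went into subversion; the function missed_ticks and its
parameters are hypothetical.

```python
def missed_ticks(last_time, now, interval=1.0):
    """Return how many whole update intervals were skipped.

    last_time and now are timestamps in seconds; interval is the
    expected delay between two updates. All names are illustrative.
    """
    elapsed = now - last_time
    # One interval is expected between updates, anything beyond that
    # counts as missed. Never report a negative number.
    return max(0, int(elapsed // interval) - 1)
```

With a 1 second interval, an update arriving 15 seconds after the previous
one would count as 14 missed ticks, and an update arriving on time as none.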


I am interested to hear how it works for you.

BTW Thanks for reporting this bug !
--
--   dag wieers,  [EMAIL PROTECTED],  http://dag.wieers.com/   --
[Any errors in spelling, tact or fact are transmission errors]



--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
