Bug#305556: dstat: SIGINT sometimes ignored

Marc Lehmann Thu, 21 Apr 2005 07:56:18 -0700

On Thu, Apr 21, 2005 at 02:43:37PM +0200, Dag Wieers <[EMAIL PROTECTED]> wrote:
> On Thu, 21 Apr 2005, Marc Lehmann wrote:
> > Fast system with no activity :) On my dual-P3 1 GHz dstat usually needs
> > around 4 seconds to start up. On my dual 600 MHz P3 machine it can take up
> > to 10 seconds.
> 
> Ouch, blame Python :) I'm wondering if your system might slow down dstat
> so badly that the job scheduling can't guarantee a near 1-second
> interval.


I guess it's more that dstat does 430 open() calls (and reads 82 files),
while, say, vmstat only opens 5. Reading 82 separate files is always going
to be slow. That's the only reason I still use vmstat sometimes: when a
system starts to thrash, it usually can't start dstat in any reasonable
timeframe :)

(But that's ok for me, and won't deter me from using dstat, really :)

> And then use dstat -t <options>. On my system it results in:

I get this:

1114091418.016
1114091419.017
1114091420.017
1114091421.018
1114091422.019
1114091423.020
1114091424.020
1114091425.021
1114091426.022
1114091427.023
1114091428.025
1114091429.026
1114091430.026
1114091431.027
1114091432.028

So the deviation keeps growing. This happens on my completely idle
dual-Opteron, too:

1114091500.907
1114091501.908
1114091502.908
1114091503.909
1114091504.910
1114091505.911
1114091506.912
1114091507.912
1114091508.913
1114091509.914
1114091510.915
1114091511.916
1114091512.916

Even on my rather busy and slow freenet node (less busy during the daytime,
but dstat still takes a few seconds to start), the deviation grows by about
1 ms per update:

1114091890.894
1114091891.895
1114091892.896
1114091893.897
1114091894.898
1114091895.899
1114091896.900
1114091897.901
1114091898.901
1114091899.902
1114091900.903
1114091901.904
1114091902.905
1114091903.906
1114091904.907
1114091905.908
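A quick way to put a number on this drift is to divide the span between the
first and last timestamps by the number of updates. The sketch below does that
for the freenet-node dump above; the drift_per_update helper is hypothetical,
not part of dstat. It comes out at roughly 0.93 ms per update, in line with the
1 ms/second figure, which adds up to a full second in about 18 minutes.

```python
# Timestamps copied from the freenet-node dump above (dstat -t, 1-second delay).
timestamps = [
    1114091890.894, 1114091891.895, 1114091892.896, 1114091893.897,
    1114091894.898, 1114091895.899, 1114091896.900, 1114091897.901,
    1114091898.901, 1114091899.902, 1114091900.903, 1114091901.904,
    1114091902.905, 1114091903.906, 1114091904.907, 1114091905.908,
]


def drift_per_update(stamps, interval=1.0):
    """Average time, in seconds, by which each update overshoots `interval`."""
    updates = len(stamps) - 1
    return (stamps[-1] - stamps[0]) / updates - interval


drift = drift_per_update(timestamps)
print(f"drift: {drift * 1000:.2f} ms per update")
print(f"one full second of drift after ~{1 / drift / 60:.0f} minutes")
```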

> 1114086819.071|  1   3  96   0   0   0|   0     0 |   0     0 |   0     0 |1082   844 |0.11 0.06 0.08
> 1114086820.072|  1   4  95   0   0   0|   0     0 |   0     0 |   0     0 |1061   885 |0.11 0.06 0.08
> 1114086821.073|  1   3  96   0   0   0|   0     0 |   0     0 |   0     0 |1082   814 |0.11 0.06 0.08
> 
> So you see a 1 ms deviation per second (about 1 second of deviation after 17 minutes).

Obviously a minor bug in dstat then. However, it should not affect
calculations much.

The more I think about it (and the more I test), the more I believe the
non-averaging was just an artifact. dstat clearly does averaging here.

It's possible that the intermediate updates confused me, too.

Sorry for filing a bogus bug report, I'll close it :(

One thing, though: the longer I think about it, the more I come to the
conclusion that the intermediate updates are useless, because by looking at
dstat output you cannot know what the numbers actually show: they are
averaged over an unknown number of seconds.

Personally, I would prefer either a real n-second average, with
intermediate updates and one "scroll" every n seconds, or no averaging at
all for the intermediate reports. Given that I don't really rely on the
intermediate updates (they're just one extra feature over vmstat), and this
is my personal preference, you might simply ignore my thoughts :->
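The first option could look roughly like this: keep a snapshot of a
cumulative counter at the start of each n-second interval, let every
intermediate update report the average since that snapshot together with how
many seconds it covers, and start a new interval once n seconds have passed.
This is only a sketch: IntervalAverager and the read_counter callable are
hypothetical (in practice the counter would come from something like
/proc/stat), and the fake clock exists only to make the demo deterministic.

```python
import time


class IntervalAverager:
    """Average a cumulative counter over the current n-second interval."""

    def __init__(self, n, read_counter, clock=time.monotonic):
        self.n = n
        self.read_counter = read_counter  # hypothetical: returns a running total
        self.clock = clock
        self._start_time = clock()
        self._start_value = read_counter()

    def sample(self):
        """Return (rate, seconds_covered, final) for the current interval.

        `final` is True once n seconds have elapsed; that is the line that
        would "scroll", and the next interval starts from this sample.
        """
        now = self.clock()
        value = self.read_counter()
        elapsed = now - self._start_time
        rate = (value - self._start_value) / elapsed if elapsed > 0 else 0.0
        final = elapsed >= self.n
        if final:
            self._start_time, self._start_value = now, value
        return rate, elapsed, final


# Deterministic demo: a fake clock and counter instead of real system data.
fake_time = [0.0]
fake_count = [0]
avg = IntervalAverager(3, lambda: fake_count[0], clock=lambda: fake_time[0])
for t, count in [(1.0, 10), (2.0, 30), (3.0, 60), (4.0, 70)]:
    fake_time[0], fake_count[0] = t, count
    rate, covered, final = avg.sample()
    print(f"{rate:6.1f}/s over {covered:.0f}s{'  <- scroll' if final else ''}")
```

Because each intermediate value is labelled with the number of seconds it
covers, its meaning is never ambiguous; the second option (no averaging for
intermediate reports) would instead show only the rate since the previous
sample.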

> Now try the same when enabling the following lines:
> 
>           ### Increase precision if we're root (does not seem to have effect)
>       #   if os.geteuid() == 0:
>       #       os.nice(-20)
>       #   sys.setcheckinterval(op.delay / 10000)
> 
> And let me know if this makes a difference for you. It never made a real
> difference for me. (I.e. you may want to try this both as root and as a
> user, and maybe disable the if statement.)

Your time dumps and mine show no problem with precision, but with clock
stability. I tried to find out how dstat does its time scheduling, but
could only find references to ALARM (SIGALRM), which has no stability
guarantees.

Not that deviations of a few ms per second matter to me, but if you want to
make one update per second on average over a long run, then you need to
wait "until the next update is due" rather than "one second between
updates", as the latter doesn't account for the time taken by the update
itself (or, in this case, by delays in ALARM handling that are out of your
control).
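To illustrate the difference, here is a minimal sketch (not dstat's actual
code, which is ALARM-based) of a loop that computes each deadline from a fixed
start time instead of sleeping a fixed interval after every update. The
run_every helper and the simulated 30 ms of work are hypothetical.

```python
import time


def run_every(interval, update, count):
    """Call update() `count` times, aiming for one call per `interval` seconds.

    Each deadline is start + i * interval ("wait until the next update is
    due"), so time spent inside update() or a late wake-up is absorbed by the
    following sleep instead of accumulating as drift. Sleeping a fixed
    `interval` after every update would add those delays up.
    """
    start = time.monotonic()
    for i in range(1, count + 1):
        update()
        delay = start + i * interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        # If delay <= 0 we are already late: skip the sleep and catch up.


if __name__ == "__main__":
    t0 = time.monotonic()

    def slow_update():
        print(f"update at +{time.monotonic() - t0:.3f}s")
        time.sleep(0.03)  # simulate 30 ms spent gathering and printing stats

    run_every(0.1, slow_update, 5)
```

When the loop falls behind by more than one interval it fires updates back to
back to catch up; skipping the missed deadlines is an equally reasonable
choice for a monitoring tool. In a signal-based design, a periodic
signal.setitimer(signal.ITIMER_REAL, interval, interval) timer (Unix, Python
2.6 and later) is reloaded by the kernel and should likewise avoid this kind
of accumulation.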

> I understand, but the code becomes uglier :/ (i.e. I have to indent a

Welcome to the real world :) Nice algorithms often become ugly because of
complicated corner cases, too :)

> complete block + subblocks of code...). But if it can take up to 10 seconds,

Well, vmstat can easily take 10 seconds to start up, too, if your system
is thrashing. It might never start at all :) I don't count that as a big
problem in my book. Being able to press INT (^C) in what might be my last
remaining shell on that machine and get a shell prompt back is vital for
me, though :) Whether with a traceback or not.

> you're absolutely right. I'm going to look into speeding up dstat (or 
> slowing down my system and profiling statements, I bet some modules take 
> a long time and may not be necessary all the time).

I have similar problems with any of my Perl programs. Perl simply has to
read so many files that it is impossible to make them faster, except by
using no modules, which is unacceptable.

I don't think it's dstat's fault at all. It's a mere artifact of dstat
being written in a language that does all linking at runtime.
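Dag's plan to profile which modules are slow to load could start from
something like the sketch below, which times each import separately. The
module list is only a placeholder, not dstat's real import list, and a
module's time includes any dependencies it pulls in that were not loaded yet.
On CPython 3.7 and later, `python -X importtime` reports the same kind of
breakdown without any code.

```python
import importlib
import sys
import time


def time_imports(names):
    """Return (name, seconds) for each not-yet-loaded module, slowest first."""
    results = []
    for name in names:
        if name in sys.modules:
            continue  # already imported: a second import is only a dict lookup
        start = time.perf_counter()
        importlib.import_module(name)
        results.append((name, time.perf_counter() - start))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Placeholder module list; substitute whatever the program actually imports.
for name, seconds in time_imports(
    ["decimal", "json", "email.message", "xml.dom.minidom", "getopt"]
):
    print(f"{name:16s} {seconds * 1000:7.2f} ms")
```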

> > I guess the best thing is to only start catching INT right before initscr,
> > since before that there is no reason even to catch the signal: catching it
> > only adds time before the user gets back to his/her prompt (there is
> > nothing to clean up). Although I suspect that it can't be done with Python.
> 
> It can be done. But it still requires me to import the signal module :)

Hm... do you mean that Python prints a backtrace by default within a
"short" timeframe after starting up? Frankly, I'd report that as a bug
immediately; after all, it would make it impossible to control signal
handling completely from within Python :->
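For what it's worth, the scheme from the quoted paragraph can be sketched with
the standard signal module: reset SIGINT to the operating system's default
action early on, so ^C during a slow startup kills the process at once without
a traceback, and only switch to a catching handler right before the display
(curses) is set up. The function names are hypothetical and the main loop is a
placeholder. Anything running before the first signal.signal() call,
including imports placed above it, still runs under Python's usual
KeyboardInterrupt handler.

```python
import signal


def exit_immediately_on_sigint():
    """Startup phase: nothing to clean up yet."""
    # SIG_DFL hands SIGINT back to the kernel's default action, which
    # terminates the process at once: no Python handler, no traceback.
    signal.signal(signal.SIGINT, signal.SIG_DFL)


def catch_sigint_for_cleanup():
    """Display phase, e.g. right before curses.initscr()."""
    # default_int_handler raises KeyboardInterrupt, so try/finally blocks
    # get a chance to restore the terminal before the program exits.
    signal.signal(signal.SIGINT, signal.default_int_handler)


exit_immediately_on_sigint()
# ... slow imports and the first reads of /proc files would go here ...

catch_sigint_for_cleanup()
try:
    pass  # placeholder for the main display loop
finally:
    pass  # placeholder for cleanup such as curses.endwin()
```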

> > However, despite me bashing on this, it really is a very very minor issue
> > :)
> 
> I understand. But details matter to me too.

Obviously! And that's becoming increasingly rare!

Thanks for investing the time. If vmstat had bugged me enough I would have
written my own vmstat replacement, but it would have been some small
unpublished hack that only worked for me. Thanks for taking the time to do
it properly and release it; I know that's quite a lot of extra work.

-- 
                The choice of a
      -----==-     _GNU_
      ----==-- _       generation     Marc Lehmann
      ---==---(_)__  __ ____  __      [EMAIL PROTECTED]
      --==---/ / _ \/ // /\ \/ /      http://schmorp.de/
      -=====/_/_//_/\_,_/ /_/\_\      XX11-RIPE

