On Tue, Dec 23, 2008 at 07:37:39AM +0000, Doug Winter wrote:
> 75 processes: 1 running, 74 sleeping
> CPU states: 0.5% user, 0.0% nice, 0.0% system, 99.5% idle, 0.0% iowait

This is missing a lot of states; I suspect this is a forked top. Mine shows:

Cpu(s): 8.9%us, 1.0%sy, 0.0%ni, 90.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
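(As an aside, those Cpu(s) percentages are derived from the cumulative
counters in /proc/stat. A rough, self-contained sketch of the arithmetic,
with made-up sample counters -- on a real system you would diff two readings,
since the counters only ever go up:)

```python
# Sketch: turn a /proc/stat "cpu" line into the percentages top displays.
# Field order per proc(5): user, nice, system, idle, iowait, irq (hi),
# softirq (si), steal (st). The sample counters below are invented.
sample = "cpu  890 0 100 9010 0 0 0 0"

names = ["us", "ni", "sy", "id", "wa", "hi", "si", "st"]
values = [int(v) for v in sample.split()[1:1 + len(names)]]
total = sum(values)

percentages = {n: 100.0 * v / total for n, v in zip(names, values)}
print(", ".join("%.1f%%%s" % (percentages[n], n) for n in names))
```

With those sample counters this prints 8.9%us, 1.0%sy, 90.1%id and zeros
elsewhere, i.e. the same shape of line top shows.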
I got iowait (wa) too, but missing are hi, si and st, which are hard
interrupts, soft interrupts and stolen time.

> PID USERNAME THR PRI NICE SIZE RES SHR STATE TIME CPU COMMAND
> 14357 zope 5 20 0 2186M 2036M 4696K sleep 7:59 20.00% python2.4

> The system this dump was taken from is a dual-core Athlon system that was
> under heavy load on a single thread. There are three things that are weird
> here:

The dual-core is important. Single CPU too?

> 1. the overall CPU states show the system being 99.5% idle, and yet the top
> process consumes 20% of CPU. That surely is just wrong ;)

I think you mean the python process. It is single threaded, so that is 20%
of a single core. It does seem low, but it looks like your other cores are
sitting around doing nothing, and the figure is an average. Hit '1' in top
to see a per-CPU percentage.

> 2. even under very heavy load, the process rarely consumes more than 20%
> CPU, when I would expect to see it use 100% of one core, if it was CPU
> bound. But if it's IO bound I would expect to see more iowait, hence...
>
> 3. the iowait is shown at 0.0%, which is very odd.

These look like one and the same problem: if the CPU is not flat out doing
something, then what is it doing? Why, it's waiting!

> During these periods when it shows this behaviour, strace shows the process
> is spending all of it's time in repeated calls to recvfrom(2) - from which
> I would expect to see a high iowait. Yet iowait is never reported at
> higher than 1-2%.

Ah! You've got the terms confused here. The percentages up the top are
*CPU* percentages; recvfrom(2) is a system call that a *process* uses.
Now let's consider two scenarios.

1) You like slow MFM drives. Every time you want to write something to disc,
the CPU has to hang around until the 20-year-old drive finally says "yeah,
got it", so the CPU spends a lot of time in iowait. Crummy SCSI buses used
to do this too. iowait is a deep-down, sitting-right-near-the-metal kind of
hanging around that only CPUs do.
2) You are on an island on a high-latency satellite link that is connected
to your computer's ethernet port. You send something to your remote server
and then wait with a recvfrom(). The iowait is almost 0 because the ethernet
card services the traffic real quick; the CPU hangs around doing nothing and
getting bored. Your process sits in recvfrom() because the 600ms round-trip
time means it has to wait for the far end, a lot. Blocking in a system call
is the very high-level waiting that processes do.

A process sitting in recvfrom() is not sitting in iowait, and it doesn't sit
there grabbing the CPU either. This is Unix: the system call blocks and the
process sleeps, i.e. 0% CPU. It really should use select() first, but for
this bug that's beside the point.

In short, your program is spending a lot of time waiting for messages to
arrive, and Linux, being clever, puts the process to sleep until there is
something to look at. That means the numbers are right. I think you were
getting iowait mixed up with the WCHAN field you can see in top and ps.

I'll leave the bug open momentarily in case I misunderstood something, but
to me your process is fine; the data channel or whatever it is talking to is
your bottleneck. My wild guess is that it's a database being sad.

 - Craig
-- 
Craig Small GnuPG:1C1B D893 1418 2AF4 45EE 95CB C76C E5AC 12CA DFA5
http://www.enc.com.au/ csmall at : enc.com.au
http://www.debian.org/ Debian GNU/Linux, software should be Free
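P.S. A minimal sketch of the select()-before-recvfrom() idea, in Python
since the reporter's process is python2.4. A local socketpair stands in for
the real network peer, and recv() for recvfrom(); the point is the bounded
wait:

```python
import select
import socket

# A connected local socket pair stands in for the real remote peer.
a, b = socket.socketpair()
b.send(b"hello")

# Wait up to 1 second for data instead of blocking indefinitely in recv().
# While the process waits here it is asleep: 0% CPU, and no iowait either.
readable, _, _ = select.select([a], [], [], 1.0)
if readable:
    data = a.recv(1024)
else:
    data = None  # timed out; the far end is being slow

a.close()
b.close()
```

With the timeout you get a chance to log, retry or give up instead of
hanging in the system call until the far end deigns to answer.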