I recently got a new machine (Core i7-2600, Intel DH67BL motherboard,
16 GB RAM) and installed oi_151a from scratch.
After a few days of uptime I noticed a constant system load of about 10%,
although the desktop was idle and I had not started anything that would
cause a permanent load. There was almost no I/O activity, just a few reads
and writes every few seconds. vmstat showed 0-1% user time but
10-13% system time. prstat -v showed user and system time at 0% or far
below 1% for every process.
Over the following days the load increased further. When I took 7 CPU
cores off-line I got about 80% sys load on the remaining core. Where
does it come from?
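
As a sanity check on those numbers: vmstat reports CPU percentages averaged
over all on-line CPUs, so 10-13% system time spread over 8 CPUs is roughly
the same amount of work as 80-104% of a single CPU. A minimal sketch of that
arithmetic, assuming 8 logical CPUs were on-line before 7 were taken off-line
(the function name is made up for illustration):

```python
# Assumption: 8 logical CPUs on-line before 7 of them were taken off-line.
ONLINE_CPUS = 8


def one_cpu_equivalent(sys_percent, cpus=ONLINE_CPUS):
    """Convert a percentage averaged over `cpus` CPUs into percent of one CPU."""
    return sys_percent * cpus


for pct in (10, 13):
    print(f"{pct}% across {ONLINE_CPUS} CPUs = {one_cpu_equivalent(pct)}% of one CPU")
```

So the roughly 80% seen on the single remaining core is consistent with the
same kernel work simply being concentrated on one CPU.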

When I switch from multi-user to single-user mode, the load persists.
When I reboot, everything is fine for a while (0-1% sys load), but the load
slowly starts increasing again. So I have to reboot the machine about
every 2 days, which is very unpleasant.

I tried to analyze the issue using intrstat, lockstat, etc. but have not
got very far.

All of the following commands were run in single-user mode with only one CPU
core on-line. (I hope it's OK to put the output here?)

~ # vmstat 5
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s1 s2 s3 s4   in   sy   cs us sy id
 0 0 0 9070220 2859260 2  4  0  0  0  0  0  5 -1  1 14  397  265  258  0  4 95
 0 0 0 10392120 4142932 24 64 0 0  0  0  0  0  0  0  1  508   99  227  0 21 79
 0 0 0 10392120 4142960 0 0  0  0  0  0  0  0  0  0 13  511   60  229  0 21 79
 0 0 0 10392124 4142964 0 0  0  0  0  0  0  0  0  0 13  509   59  226  0 21 79
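
Note that the first vmstat line is the average since boot, so only the later
samples reflect the current load. A small sketch for pulling the us/sy/id
columns out of output like the above and averaging the system time; the
sample text is just the output pasted from this post, and the helper names
are made up for illustration:

```python
VMSTAT_SAMPLE = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s1 s2 s3 s4   in   sy   cs us sy id
 0 0 0 9070220 2859260 2  4  0  0  0  0  0  5 -1  1 14  397  265  258  0  4 95
 0 0 0 10392120 4142932 24 64 0 0  0  0  0  0  0  0  1  508   99  227  0 21 79
 0 0 0 10392120 4142960 0 0  0  0  0  0  0  0  0  0 13  511   60  229  0 21 79
 0 0 0 10392124 4142964 0 0  0  0  0  0  0  0  0  0 13  509   59  226  0 21 79
"""


def cpu_columns(text):
    """Return a list of (us, sy, id) tuples, one per vmstat data line."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Header lines start with a word ("kthr", "r"), data lines with a number.
        if not fields or not fields[0].isdigit():
            continue
        us, sy, idle = (int(f) for f in fields[-3:])
        rows.append((us, sy, idle))
    return rows


def mean_sys(text):
    """Average the sy column, skipping the first (since-boot) sample."""
    rows = cpu_columns(text)[1:]
    if not rows:
        raise ValueError("need at least two vmstat samples")
    return sum(sy for _, sy, _ in rows) / len(rows)


print(mean_sys(VMSTAT_SAMPLE))
```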

~ # ps -ef
     UID   PID  PPID   C    STIME TTY         TIME CMD
    root     0     0   0   Dec 20 ?           0:01 sched
    root     4     0   0   Dec 20 ?           0:00 kcfpoold
    root     6     0   0   Dec 20 ?           2:34 zpool-rpool
    root     1     0   0   Dec 20 ?           0:00 /sbin/init
    root     2     0   0   Dec 20 ?           0:00 pageout
    root     3     0   0   Dec 20 ?           8:54 fsflush
    root    10     1   0   Dec 20 ?           0:03 /lib/svc/bin/svc.startd
    root    12     1   0   Dec 20 ?           0:08 /lib/svc/bin/svc.configd
  netadm    50     1   0   Dec 20 ?           0:00 /lib/inet/ipmgmtd
   dladm    46     1   0   Dec 20 ?           0:00 /sbin/dlmgmtd
    root   167     0   0   Dec 20 ?           1:50 zpool-tank
    root   232     1   0   Dec 20 ?           0:00 /usr/lib/sysevent/syseventd
    root  9518    10   0 21:01:56 console     0:00 -bash
    root   262     1   0   Dec 20 ?           0:02 devfsadmd
    root   276     1   0   Dec 20 ?           0:00 /usr/lib/power/powerd
    root 10708  9518   0 21:04:53 console     0:00 ps -ef
    root  3222     1   0   Dec 20 ?           0:00 -bash


~ # intrstat
      device |      cpu0 %tim      cpu1 %tim      cpu2 %tim      cpu3 %tim
-------------+------------------------------------------------------------
    e1000g#1 |         1  0.0         0  0.0         0  0.0         0  0.0
      ehci#0 |         1  0.0         0  0.0         0  0.0         0  0.0
      ehci#1 |         1  0.0         0  0.0         0  0.0         0  0.0
      rtls#0 |         1  0.0         0  0.0         0  0.0         0  0.0
(cpu4..7 are all 0.0%)


~ # prstat -v
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
 10711 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   1   0 254   0 prstat/1
  3222 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 bash/1
[cut]
Total: 14 processes, 366 lwps, load averages: 0.02, 0.32, 0.67


~ # lockstat -kIW -D20 sleep 30

Profiling interrupt: 2913 events in 30.028 seconds (97 events/sec)

Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
-------------------------------------------------------------------------------
 2878  99%  99% 0.00      293 cpu[0]                 acpi_cpu_cstate
   12   0%  99% 0.00      224 cpu[0]                 fsflush
   10   0% 100% 0.00      266 cpu[0]                 i86_mwait
[cut]
-------------------------------------------------------------------------------

Is the high count on acpi_cpu_cstate normal?
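
To put that count in proportion, the share of profiling samples per caller
can be recomputed from the rows above. A rough sketch, using only the three
rows shown in this post and the event total from the "Profiling interrupt"
line (the rows hidden behind [cut] are not included, so the shares do not add
up to 100%; the function name is made up for illustration):

```python
LOCKSTAT_ROWS = """\
 2878  99%  99% 0.00      293 cpu[0]                 acpi_cpu_cstate
   12   0%  99% 0.00      224 cpu[0]                 fsflush
   10   0% 100% 0.00      266 cpu[0]                 i86_mwait
"""
TOTAL_EVENTS = 2913  # from "Profiling interrupt: 2913 events in 30.028 seconds"


def caller_shares(rows, total):
    """Map each caller to its percentage of all profiling samples."""
    shares = {}
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue  # not a data row
        count, caller = int(fields[0]), fields[-1]
        shares[caller] = 100.0 * count / total
    return shares


for caller, pct in caller_shares(LOCKSTAT_ROWS, TOTAL_EVENTS).items():
    print(f"{caller:20s} {pct:5.1f}%")
```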

The hotkernel script from the DTrace Toolkit finally froze my system.
After the reboot, hotkernel ran flawlessly.

How can I further analyze this?

Thanks,
Mirko


_______________________________________________
OpenIndiana-discuss mailing list
http://openindiana.org/mailman/listinfo/openindiana-discuss
