Hi,
one of my systems was suffering from very similar symptoms. I had little
chance to debug it, as it was at a remote site in a server house. In my
case the cause was a lack of memory: the system was under significant
memory pressure. I was unable to reproduce it on the small systems I
have at home. I added some memory and set limits for the zones.
One small suggestion: could you write a small script that dumps memory
info (from kernel mdb) and the list of processes to disk, and run it
from crontab every few minutes? It may be unable to store data during
the "hang", but at least you would see the trend.
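Something along these lines might do as a starting point. It is only a
sketch and I have not run it on your box: it assumes Python 3.5 or newer
is installed, that it runs as root (mdb -k needs that), and the choice
of ps and vmstat columns is just my guess at what is useful. The script
name and log path in the usage note are made up.

```python
#!/usr/bin/env python3
"""Append a timestamped memory and process snapshot to a log file."""
import os
import subprocess
import sys
import time

# (label, argv, stdin text). "::memstat" is a standard mdb dcmd; the
# ps and vmstat entries are assumptions, swap in whatever you prefer.
COMMANDS = [
    ("memstat", ["mdb", "-k"], "::memstat\n"),
    ("processes", ["ps", "-eo", "pid,ppid,vsz,rss,pmem,args"], None),
    ("vmstat", ["vmstat", "1", "2"], None),
]


def run(argv, stdin_text=None, timeout=60):
    """Return the command's combined output, or a note if it cannot run."""
    try:
        proc = subprocess.run(
            argv,
            input=stdin_text,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout,
        )
        return proc.stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "[could not run %s: %s]\n" % (" ".join(argv), exc)


def snapshot(log_path, commands=COMMANDS):
    """Append one snapshot, pushing each section to disk as it is written."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        for label, argv, stdin_text in commands:
            log.write("==== %s %s ====\n" % (stamp, label))
            log.write(run(argv, stdin_text))
            # Sync after every command so a hang mid-run loses little.
            log.flush()
            os.fsync(log.fileno())


# Only runs when a log path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    snapshot(sys.argv[1])
```

A root crontab entry could then look like this (minutes listed
explicitly, since older Solaris-style cron may not accept "*/5"):

0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/bin/python3 /root/memtrend.py /var/tmp/memtrend.log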
As for the lost IP address: are you using NWAM?
Best regards,
Milan
On 22.02.2012 07:32, [email protected] wrote:
Hi there,
I'm seeing roughly weekly hangs on a server running OpenIndiana 151a.
I'm using it primarily as a home fileserver with ZFS.
The exact behavior seems to depend on when I notice it, but essentially
the server drops off the network and is only variably responsive when I
try to access the console directly. Sometimes when this happens the
system doesn't respond at all (e.g., not even to keyboard input). One
time I was able to interact with the console (after the server had
disappeared from the network) and tried to see what was going on. I
tried pinging google.com (unreachable, as expected). Next I tried
`ifconfig -a` and got this:
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
        inet 0.0.0.0 netmask ff000000
which explains the lack of connectivity. But after it printed that, it
didn't return. The console still echoed my keyboard input (including
^C, ^Z, etc.), and there was still output coming from other sources
(e.g., I have napp-it running regular snapshots, so I saw a notice that
it had used sudo to run that), but I couldn't get a prompt back. Next I
tried hitting the power button on the machine and got this:
poweroff: initiated by user on /dev/console
in.ndpd[994]: phyint_reach_random: SIOCSLIFLNKINFO (interface e1000g0): Interrupted system call
bootadm: /boot/solaris/bin/extract_boot_filelist is not owned by 101, skipping
syncing file systems... done
WARNING: Power off requested from power button or SC, powering down the system!
followed shortly by:
WARNING: Failed to shut down the system!
I tried looking through the logs for anything interesting but didn't
come up with anything, though to be honest I'm not 100% sure where to
look or what to look for. When the machine drops off the network I can
still access it via IPMI (tried this using both the dedicated jack on
the motherboard and by sharing the Intel NIC--worked in both cases, but
OI was still unresponsive), so I doubt it's a bad NIC. Motherboard is a
Supermicro X9SCM-F.
I know that at least sometimes the system will stop running even my ZFS
snapshots via napp-it, since I've come back to a frozen console that
showed the last snapshot being taken 12+ hours before (they're supposed
to be taken every 15 minutes). My guess is this is just because it takes
me longer to notice sometimes--seems like it's hitting a deadlock
somewhere that eventually grinds everything to a halt (like with the
ifconfig call above).
Also, FWIW, here's what `ifconfig -a` gets me when it works correctly
(MAC address removed, although interestingly it wasn't even printed in
the output above):
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 2
        inet 192.168.10.10 netmask ffffff00
        ether [MAC address here]
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::1/128
e1000g0: flags=20002004841<UP,RUNNING,MULTICAST,DHCP,IPv6> mtu 1500 index 2
        inet6 fe80::225:90ff:fe50:2c2a/10
        ether [MAC address here]
Any ideas/suggestions on where to go from here? Thanks in advance.