Re: [OpenIndiana-discuss] Server hangs weekly

Milan Jurik Fri, 24 Feb 2012 05:46:03 -0800

Hi,

On 23.02.2012 06:37, [email protected] wrote:

Interesting. I have 8GB of memory, which I thought would be enough for my purposes (6 x 2TB drives, RAIDZ-2, no deduping or anything special like that). Most of the time the server is just sitting idle, but perhaps the regular snapshots are causing problems. I'll have to try to track memory usage and see what happens.

ZFS ARC is very aggressive in its memory consumption. Together with other system caches it can consume a lot of free memory, and it is very slow to give it back to the system. And the system seems to react very badly to out-of-memory (OOM) conditions.
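Because the ARC is the first suspect, its current size is worth logging next to everything else. The following is a minimal Python 3 sketch. It assumes the standard illumos `kstat -p zfs:0:arcstats` output, which prints one `module:instance:name:statistic<TAB>value` line per statistic, and it reads only the `size` and `c_max` statistics. The helper names are hypothetical.

```python
import subprocess


def parse_kstat_p(text: str) -> dict:
    """Map statistic name -> integer value from `kstat -p` output."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if not sep:
            continue  # not a "name<TAB>value" line
        try:
            stats[key.rsplit(":", 1)[-1]] = int(value.strip())
        except ValueError:
            pass  # skip non-integer stats such as crtime, snaptime, class
    return stats


def arc_summary(text: str) -> str:
    """One-line summary of the current ARC size against its ceiling (c_max)."""
    stats = parse_kstat_p(text)
    size, c_max = stats["size"], stats["c_max"]
    gib = 2 ** 30
    return (f"ARC {size / gib:.2f} GiB of {c_max / gib:.2f} GiB max "
            f"({100 * size / c_max:.0f}%)")


def read_arcstats() -> str:
    """Run kstat on the live system (illumos/Solaris only)."""
    return subprocess.run(["kstat", "-p", "zfs:0:arcstats"],
                          capture_output=True, text=True, check=True).stdout
```

On the server itself, you would call `print(arc_summary(read_arcstats()))`. Parsing is kept separate from the `kstat` call so the logic can be checked against saved output. The values in the check below are made up for illustration.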

As for NWAM, I believe I'm using it? I haven't turned it off, anyway.

So that could explain why you lose the network. It could be the NWAM daemon itself, or some bad behavior in the network stack that tells the NWAM daemon the network card lost its link, after which the daemon is unable to reconfigure the card. Probably a separate bug.


Best regards,

Milan

Thanks for the feedback.
On Wed, Feb 22, 2012 at 5:27 PM, Daniel Kjar <[email protected]> wrote:
I have this problem on a system that I was using to back up 50 GB of material each night. It would transfer that across the network in ZFS, and that would kill it, but only after a week or so of nightly updates of roughly the same size. This machine has 32 GB of RAM, and a cp process would hang and swallow it all, bringing the system to its knees. I just stopped that big transfer job and called it a night. I am no longer backing up my files to 3 different buildings, but that is better than crashing my Sun Ray server every 5 days.


On 2/22/2012 1:48 AM, Milan Jurik wrote:
Hi,
one of my systems was suffering from very similar symptoms. I had no chance to debug it much, as it was at a remote site in a server house. But in my case it was a lack of memory: the system was under significant memory pressure. I was unable to reproduce it on the small systems I have at home. I added some memory and set limits for zones.
One small suggestion: could you write a small script that dumps memory info (from the kernel debugger, mdb) and the list of processes to disk, and run it from crontab every few minutes? It may be unable to store data during the "hang", but at least you could see the trend.
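Here is a minimal sketch of such a logger. It is written in Python 3, which is an assumption: a stock OI 151a may only ship Python 2.x. It writes one timestamped file per run and fsyncs after each command, so whatever was captured before a hang has the best chance of reaching disk. `echo ::memstat | mdb -k` prints the kernel's memory breakdown and needs root. The log directory and the script name used below are hypothetical.

```python
import datetime
import os
import pathlib
import subprocess
import sys

# Commands to capture; each runs through the shell so the mdb pipe works.
COMMANDS = [
    ("memstat", "echo ::memstat | mdb -k"),    # kernel memory breakdown (root)
    ("processes", "ps -eo pid,rss,vsz,args"),  # per-process memory usage
]


def snapshot(log_dir: pathlib.Path, commands=COMMANDS, now=None) -> pathlib.Path:
    """Write the output of each command to one timestamped file in log_dir."""
    now = now or datetime.datetime.now()
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / now.strftime("mem-%Y%m%d-%H%M%S.txt")
    with path.open("w") as out:
        for label, cmd in commands:
            out.write(f"=== {label}: {cmd}\n")
            try:
                result = subprocess.run(cmd, shell=True, capture_output=True,
                                        text=True, timeout=60)
                out.write(result.stdout + result.stderr)
            except subprocess.TimeoutExpired:
                out.write("(timed out)\n")
            # Push this section to disk now, in case the box hangs mid-run.
            out.flush()
            os.fsync(out.fileno())
    return path


if __name__ == "__main__" and len(sys.argv) > 1:
    print(snapshot(pathlib.Path(sys.argv[1])))
```

A root crontab entry to run it every five minutes might look like this. It uses explicit minute lists because older Solaris-derived cron may not accept `*/5`.

    0,5,10,15,20,25,30,35,40,45,50,55 * * * * python3 /root/memlog.py /var/tmp/memlog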

For lost IP address - are you using NWAM?

Best regards,

Milan

On 22.02.2012 07:32, [email protected] wrote:
Hi there,
I'm seeing roughly weekly hangs on a server running OpenIndiana 151a. I'm using it primarily as a home fileserver with ZFS.
The exact behavior seems to depend on when I notice it, but essentially the server drops off the network and is only variably responsive when I try to access the console directly. Sometimes when this happens the system doesn't respond at all (e.g., not even to keyboard input). One time I was able to interact with the console (after the server had disappeared from the network) and tried to see what was going on. I tried pinging google.com (unreachable, as expected). Next I tried `ifconfig -a` and got this:
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
       inet 127.0.0.1 netmask ff000000
e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
       inet 0.0.0.0 netmask ff000000
which explains the lack of connectivity. But after it printed that, it didn't return. The console still echoed my keyboard input (including ^C, ^Z, etc.), and there was still output coming from other sources (e.g., I have napp-it running regular snapshots, so I saw a notice that it had used sudo to run them), but I couldn't get a prompt back. Next I tried hitting the power button on the machine and got this:

poweroff: initiated by user on /dev/console
in.ndpd[994]: phyint_reach_random: SIOCSLIFLNKINFO (interface e1000g0): Interrupted system call
bootadm: /boot/solaris/bin/extract_boot_filelist is not owned by 101, skipping
syncing file systems... done
WARNING: Power off requested from power button or SC, powering down the system!


followed shortly by:

WARNING: Failed to shut down the system!

I tried looking through the logs for anything interesting but didn't come up with anything, though to be honest I'm not 100% sure where to look or what to look for. When the machine drops off the network I can still access it via IPMI (I tried this using both the dedicated jack on the motherboard and by sharing the Intel NIC; it worked in both cases, but OI was still unresponsive), so I doubt it's a bad NIC. The motherboard is a Supermicro X9SCM-F.
I know that at least sometimes the system will stop running even my ZFS snapshots via napp-it, since I've come back to a frozen console that showed the last snapshot being taken 12+ hours before (they're supposed to be taken every 15 minutes). My guess is that this is just because it sometimes takes me longer to notice. It seems like it's hitting a deadlock somewhere that eventually grinds everything to a halt (like with the ifconfig call above).
Also, FWIW, here's what `ifconfig -a` gets me when it works correctly (MAC address removed, although interestingly it wasn't even printed in the output above):
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
       inet 127.0.0.1 netmask ff000000
e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 2
       inet 192.168.10.10 netmask ffffff00
       ether [MAC address here]
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
       inet6 ::1/128
e1000g0: flags=20002004841<UP,RUNNING,MULTICAST,DHCP,IPv6> mtu 1500 index 2
       inet6 fe80::225:90ff:fe50:2c2a/10
       ether [MAC address here]
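Comparing the two outputs, the failure shows up as e1000g0 holding 0.0.0.0 (flagged DEPRECATED) instead of its DHCP address. A small check for that state could be logged from cron alongside the memory data to pin down when the address disappears. Below is a minimal Python 3 sketch. It assumes the `ifconfig -a` layout shown above, with a `name: flags=...<...>` header followed by indented `inet` lines. The function name is hypothetical.

```python
import re

HEADER = re.compile(r"^(\S+):\s*flags=\S+<([^>]*)>")
INET = re.compile(r"^\s+inet\s+(\S+)")  # does not match "inet6" lines


def unconfigured_ipv4(ifconfig_output: str) -> list:
    """Return names of IPv4 interfaces whose address is 0.0.0.0."""
    bad = []
    current, is_ipv4 = None, False
    for line in ifconfig_output.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            is_ipv4 = "IPv4" in header.group(2).split(",")
            continue
        addr = INET.match(line)
        if addr and current and is_ipv4 and addr.group(1) == "0.0.0.0":
            bad.append(current)
    return bad
```

For the broken output earlier in this message, this should return `['e1000g0']`. For the working output above it should return an empty list.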


Any ideas/suggestions on where to go from here? Thanks in advance.


_______________________________________________
OpenIndiana-discuss mailing list
[email protected]
http://openindiana.org/mailman/listinfo/openindiana-discuss
