I am looking for recommendations for a new rackmount server with a
watchdog(4) device fully supported under OpenBSD 4.2.

Currently I have a pair of Sun Fire v100 servers providing recursive
DNS services;  each of these handles a peak of perhaps 50
requests/second.  One of the two servers will crash hard about once
every two months.  When this happens, the server just stops: no
debugger prompt, no console output.  We've gone so far as to replace the
entire server with an identical v100 built from scratch with a
standard OpenBSD/sparc64 install from CD, and yet the problem still
happens on the same approximate schedule.  I suspect a power glitch.

Since power quality is out of our control, I've been asked by
management to make this problem go away, or at least to hide the
symptoms.  Since I haven't been able to diagnose, much less resolve,
the problem, I figure the next best thing is to make sure that when the
server does freeze, it self-reboots instead of waiting for a human to
respond and manually power-cycle the machine.
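
To make the goal concrete, here is a rough sketch of the setup I have in
mind, assuming the machine has a supported watchdog(4) device and that
the kern.watchdog sysctls behave as the man page describes.  The
30-second period is an arbitrary value chosen for illustration, not a
tested recommendation.

```
# /etc/sysctl.conf fragment (sketch only; values are assumptions)

# Reset the machine if the timer is not retriggered within 30 seconds.
# A period of 0 disables the watchdog.
kern.watchdog.period=30

# Let the kernel retrigger the timer itself, so a hard freeze of the
# whole system leads to a reset.
kern.watchdog.auto=1
```

If I also wanted to catch a wedged userland, my understanding is that
setting kern.watchdog.auto=0 and running watchdogd(8) to retrigger the
timer would be the alternative.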

I see support for the pmc(4) watchdog on UltraSPARC-III systems (my
V100s are IIe, with no watchdog).  Can I safely assume that all new
IIIi servers from Sun (e.g. V125) include the PMC watchdog?

Are there less expensive AMD64 rackmount 1U systems with hardware
watchdogs that I should also consider?


Thanks,

Kevin
