I am looking for recommendations for a new rackmount server with a watchdog(4) device fully supported under OpenBSD 4.2.
Currently I have a pair of Sun Fire V100 servers providing recursive DNS service; each handles a peak of perhaps 50 requests/second. One of the two servers crashes hard about once every two months. When this happens, the server simply stops: no debugger, no console output. We've gone so far as to replace the entire server with an identical V100 built from scratch with a standard OpenBSD/sparc64 install from CD, and yet the problem still recurs on roughly the same schedule. I suspect a power glitch.

Since power quality is out of our control, I've been asked by management to make this problem go away, or at least to hide the symptoms. Because I haven't been able to diagnose the problem, much less resolve it, I figure the next best thing is to make sure that when the server does freeze, it reboots itself instead of waiting for a human to respond and manually power-cycle the machine.

I see that pmc(4) supports the watchdog on UltraSPARC-III systems (my V100s are UltraSPARC-IIe, with no watchdog). Can I safely assume that all new UltraSPARC-IIIi servers from Sun (e.g. the V125) include the PMC watchdog?

Are there less expensive AMD64 1U rackmount systems with hardware watchdogs that I should also consider?

Thanks,
Kevin

