On Sat, 23 Apr 2011 23:42:23 -0400
Ken Wesson <[email protected]> wrote:

> On Sat, Apr 23, 2011 at 11:35 PM, Mike Meyer <[email protected]> wrote:
> > On Sat, 23 Apr 2011 23:19:53 -0400
> > Ken Wesson <[email protected]> wrote:
> >
> >> On Sat, Apr 23, 2011 at 8:13 PM, Mike Meyer <[email protected]> wrote:
> >> > On Sat, 23 Apr 2011 19:41:28 -0400
> >> > Ken Wesson <[email protected]> wrote:
> >> > or you live in a universe where cosmic rays can flip bits and other
> >> > sources of hardware hiccups exist.
> >> Software crashes caused by non-software-bug-triggered memory
> >> corruption seem to me to be exceedingly rare, and they could as easily
> >> strike critical parts of the operating system as a multithreaded
> >> server program (and a large batch of independent C jobs will occupy
> >> more memory and have a correspondingly larger cross section as a
> >> target for such things).
> >> The best recourse if the server gets hit by something like that is
> >> going to be to reboot it.
> >
> > While it might be exceedingly rare on a per-cpu-second basis, if your
> > application runs 7x24 on enough cpus, you can expect to see them at
> > regular intervals. In which case the best recourse - if you want a
> > stable, robust application - is to restart the smallest set of
> > processes that might have been affected by the problem.
> 
> In other words, all of them, since the operating system might have
> been affected by such a problem and if it was, everything else is
> probably affected too.

Let me guess - you're one of these people who reboots systems every
couple of days "just in case"?

Sure, a hardware glitch that affects the OS means you should reboot
the system. Of course, if it affects some user process, it may have
affected the OS without leaving evidence of doing so. Then again, it
may not have. And while you could reboot everything "just in case", a
hardware glitch could also affect the OS without leaving evidence in
any process at all, so by that logic you might as well reboot even
when nothing is wrong, "just in case".

Nah, hardware glitches are either localized, in which case restarting
just the address spaces that failed is sufficient (and has proven so
in practice for years), or they're systemic, in which case you'll have
failures throughout the system. It's pretty easy to tell the
difference between the two and deal with them appropriately.
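For illustration only, here is a minimal sketch of that localized-versus-systemic decision. It assumes a hypothetical heuristic: a failure counts as systemic once at least some fraction of the monitored processes have crashed, and otherwise only the crashed processes get restarted. The `plan_recovery` name, the `RecoveryPlan` type and the 0.5 threshold are illustrative assumptions, not taken from any real supervisor or tool.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RecoveryPlan:
    reboot_host: bool   # True when the failure looks systemic
    restart: List[str]  # processes to restart when the failure looks localized


def plan_recovery(all_procs: Iterable[str],
                  failed: Iterable[str],
                  systemic_fraction: float = 0.5) -> RecoveryPlan:
    """Hypothetical heuristic: if a large enough share of processes failed,
    treat the problem as systemic and reboot; otherwise restart only the
    processes (address spaces) that actually failed."""
    procs = set(all_procs)
    if not procs:
        raise ValueError("no processes to monitor")
    # Ignore reports about processes we are not monitoring.
    failed_set = set(failed) & procs
    fraction = len(failed_set) / len(procs)
    if fraction >= systemic_fraction:
        # Failures throughout the system: restart everything.
        return RecoveryPlan(reboot_host=True, restart=[])
    # Localized failure: restart the smallest affected set.
    return RecoveryPlan(reboot_host=False, restart=sorted(failed_set))


if __name__ == "__main__":
    print(plan_recovery(["web1", "web2", "db", "cache"], ["cache"]))
    print(plan_recovery(["a", "b", "c", "d"], ["a", "b", "c"]))
```

With one crash out of four processes the sketch restarts just that process; with three out of four it recommends a reboot. A real system would likely look at more than a crash count (error types, correlated timing, ECC reports), so the threshold here is only a stand-in.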

        <mike
-- 
Mike Meyer <[email protected]>             http://www.mired.org/consulting.html
Independent Software developer/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.