On Mon, Jan 29, 2001 at 07:53:17PM -0500, Chris Colomb wrote:
> If you're in that percentage that's small comfort.

Quite true.  The choice, however, is to determine what recovery costs are
acceptable.  It may be acceptable to simply tolerate a long fsck, since it's
a once-in-a-blue-moon event.  Or to recover from backup tape.  Or, if neither
of those options is acceptable, to implement a journaling filesystem.
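
For what it's worth, that trade-off can be put in rough numbers.  Here's a
minimal back-of-the-envelope sketch in Python--every figure in it (outage
rate, downtime cost, recovery times, implementation costs) is a made-up
placeholder, not data from any real installation.  The point isn't the
particular totals, just that each option's expected recovery cost can be
weighed against what it costs to put in place:

# Minimal sketch of the cost/benefit comparison described above.
# ALL numbers are hypothetical placeholders -- substitute your own estimates.

OUTAGE_RATE_PER_YEAR = 0.5      # assumed: one unclean shutdown every two years
DOWNTIME_COST_PER_HOUR = 500.0  # assumed: dollars lost per hour the box is down

# Estimated hours to get back on line under each recovery strategy (assumed).
recovery_hours = {
    "tolerate long fsck": 4.0,
    "restore from backup tape": 12.0,
    "journaling filesystem": 0.25,
}

# Annualized cost of putting each strategy in place (assumed).
implementation_cost_per_year = {
    "tolerate long fsck": 0.0,
    "restore from backup tape": 1000.0,  # tapes, drive amortization, operator time
    "journaling filesystem": 2000.0,     # migration, testing, admin training
}

for strategy, hours in recovery_hours.items():
    expected_loss = OUTAGE_RATE_PER_YEAR * hours * DOWNTIME_COST_PER_HOUR
    total = expected_loss + implementation_cost_per_year[strategy]
    print(f"{strategy:28s} expected annual cost: ${total:8.2f}")

Run it with your own numbers and the "right" answer for your shop usually
falls out pretty quickly.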

> There was a hardware failure in the device that switches to battery.
> But that's not really the point.

Oh, but that very much IS the point--there should never be a single point of
failure in such a system!  The tell-tale phrase above is "the device"--it
clearly wasn't redundant, and provided a single point of failure.  (Who made
this system?  I want to avoid it.)

> As you put earlier...gently, gently.

Yes; you're quite right.  I apologize.

> Right. And that's why Linux isn't being used for any of our large data
> mission critical applications. It doesn't take a catastrophic failure to
> put one in such a situation. And it doesn't have to be about a UPS...you
> get the same issues if, say, the kernel panics. Or if someone kicks a cord
> by mistake. Stuff happens, despite one's best efforts to the contrary. 

Well, perhaps.  I see about the same level of kernel panics, driver errors,
etc. on current commercial Linux distributions as I've had to live with in
commercial Unix systems over the last 20 years.  

And while I've been through cord-kicking, too, I always have to look at that
as cockpit error--there shouldn't BE a cord to kick in a mission-critical
installation.

> They were. However the automatic fsck halted, as it should, because the
> damage was too extensive. Which is when manual intervention was required. 

Hmm.  Yes, that's a bad situation.  I'd be even more ticked at whoever
designed the UPS.

> The point being, with IBM AIX, SGI, Sun none of that was necessary as it
> never came to that in the first place.

Well, I certainly have had those problems with all those platforms
at one time or another in the past.  Believe me, it's not a matter of
orders-of-magnitude any more.

> Journaling hardly has to be a universal solution to be extremely valuable
> and very much worth implementing. The 3/4 of the life of Unix argument is
> interesting in that it leads into what I think is really the issue at
> hand, one of best practices. 

Yes.  And incidentally, I thoroughly agree that one problem we're going to
continue to see in Linux is that the "non-sexy" features will be slow in
coming--or won't come, and will be commercial extensions.

> Best practices is all about using the tools available to you in the most
> advantageous way possible, given the state of the technology you're
> working with.  What is best practice changes over time...what could have
> been considered acceptable a couple of years ago can be an unnecessary risk
> today. 

Yes; most certainly agreed.

> What I don't think you're acknowledging in your statements is that, um,
> stuff happens.  Despite all one's best attempts at planning etc. 

No--I've said all along it's a matter of probability and cost/benefit analysis.
That pretty explicitly assumes there's always risk.  The important point is
that there *are* silly things that most certainly can be prevented--and those
live squarely under the rubric of "best practices".

> Some things you can't really plan for, like say a meteor hits your
> machine room (though some folks actually do plan for things like this)
> but are pretty remote possibilities.

I'm *so* glad you added that--I'm one of those folks.  I've planned both
wholly-owned and leased off-site replacement facilities.  I've written
plans to totally reconstruct an installation, up to and including
having to go out and buy new but compatible hardware and restoring
from vault tapes.  There are people out here who make a living out of
being paranoid; the only real question, as that old joke goes, is whether
we're paranoid _enough_.

> Most management I've worked with/for understands that stuff beyond your
> control happens, and you won't get fired for that. What *will* get you
> fired is if there was a prevalent industry practice that you could have
> been implementing to prevent the problem in the first place, and you
> weren't doing it.

Yes; or if you can't show, _before_ the problem, that you considered both
the possibility of that problem and its associated costs, and that the course
you took was one that had been evaluated and approved.  (It's gratifying
to point out to the CEO that you *did* ask for that fully redundant UPS
installation, and when the CFO refused, you had to take the second-best
solution.  It's more gratifying if you can show that your solution still
saved the data...)

I don't really think we're disagreeing about ultimate ends--in a
mission-critical installation, all risks must be assessed, and costs
for mitigating them compared to the perceived benefits.

Our only difference is in the perception of current risk levels in
Linux vs. commercial Unix, and in the relative benefits of journaling
filesystems _for most installations_.  That's fair ground for discussion,
but it has probably passed beyond the bounds of relevance for this list.

Cheers,
-- 
        Dave Ihnat
        [EMAIL PROTECTED]



_______________________________________________
Redhat-list mailing list
[EMAIL PROTECTED]
https://listman.redhat.com/mailman/listinfo/redhat-list
